CN114255285B - Video and urban information model three-dimensional scene fusion method, system and storage medium - Google Patents

Video and urban information model three-dimensional scene fusion method, system and storage medium

Info

Publication number
CN114255285B
Authority
CN
China
Prior art keywords
camera
parameter
parameters
video
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111591333.2A
Other languages
Chinese (zh)
Other versions
CN114255285A (en)
Inventor
陈彪
陈顺清
刘慧敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ogilvy Technology Co ltd
Original Assignee
Ogilvy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ogilvy Technology Co ltd
Priority to CN202111591333.2A
Publication of CN114255285A
Priority to PCT/CN2022/137042 (published as WO2023116430A1)
Application granted
Publication of CN114255285B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T7/00 Image analysis
                    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
                    • G06T7/70 Determining position or orientation of objects or cameras
                        • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
                • G06T19/00 Manipulating 3D models or images for computer graphics
                    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/30 Subject of image; Context of image processing
                        • G06T2207/30244 Camera pose
                • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
                    • G06T2219/20 Indexing scheme for editing of 3D models
                        • G06T2219/2004 Aligning objects, relative positioning of parts
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
                • Y02T10/00 Road transport of goods or passengers
                    • Y02T10/10 Internal combustion engine [ICE] based vehicles
                        • Y02T10/40 Engine management systems

Abstract

The invention relates to the fields of mapping and computer graphics, and in particular to a heuristic-algorithm-based method, system and storage medium for fusing video with an urban information model three-dimensional scene. The method comprises the following steps: generating a feature point file from the video frame and the three-dimensional scene view; initializing the view frustum and camera parameters; setting the update speed, update direction and number of algorithm iterations for the camera parameters, and updating the camera parameters; calculating the camera projection matrix, the camera-space coordinates and the fitness function; when the fitness function does not exceed a set threshold, the currently matched camera parameters are the global optimal solution; if it exceeds the threshold, generating the solution space of all parameters from the results of the current iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking them as the search base points for the next iteration; when the number of algorithm iterations is reached, screening the solution with the smallest fitness value from the candidate optimal solutions and outputting it as the optimal solution. The invention improves the robustness of camera position matching and realizes intelligent matching of camera parameters.

Description

Video and urban information model three-dimensional scene fusion method, system and storage medium
Technical Field
The invention relates to the fields of mapping and computer graphics, and in particular to a heuristic-algorithm-based method, system and storage medium for fusing video with urban information model three-dimensional scenes.
Background
In the field of real-scene three-dimensional GIS, scenes built from three-dimensional models can faithfully reproduce physical-world objects such as terrain, buildings and bridges, with high precision, true scale and a high degree of realism. However, a three-dimensional model captures the state of the world at a single point in time: it is static data that can show neither dynamic behavior nor the current, most up-to-date situation. To address this, real-scene three-dimensional GIS increasingly ingests Internet-of-Things data such as surveillance video to meet business requirements in fields such as security and traffic. Combining video with a three-dimensional scene is generally done in one of two ways: displaying the video in a pop-up window, or fusing it into the three-dimensional scene. The latter approach, also called video fusion, lets users understand the surrounding scene while watching the video; it offers high-fidelity restoration, intuitiveness, close agreement between the video position and the real position, and ease of understanding.
At present, video is fused with a three-dimensional scene mostly in one of two ways: manual operation or automatic mapping. Manual operation requires manually calibrating the video image against the three-dimensional scene, restoring the camera information by adjusting several parameter values such as the camera's position, orientation and pitch angle. The automatic mapping approach computes a projection matrix from the camera's intrinsic and extrinsic parameters to map the video accurately onto the scene. The key step in fusing video with a three-dimensional scene is camera calibration, i.e. estimating the camera's intrinsic and extrinsic parameters. There are generally two routes. The first computes the camera projection matrix by minimizing the reprojection error with least squares; it needs many feature points (at least six pairs), and its limitation is that feature points are hard to select, because the camera's field of view is limited and the model sometimes lacks 3D detail. The second estimates the camera intrinsics in advance with a calibration instrument and then selects at least three feature points from the current scene to estimate the camera extrinsics. Moreover, when feature points are few, both methods' estimates of the camera parameters are strongly affected by noise: even if the reprojection error is small, the recovered camera position may be off.
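For context only, the conventional automatic-mapping route just described (pre-calibrated intrinsics plus a few 2D-3D correspondences for the extrinsics) might look roughly like the following with OpenCV. The point coordinates and intrinsics are invented placeholders; this sketch illustrates the prior-art baseline the invention argues against, not the patented method.

```python
# Hypothetical sketch of the conventional calibration route: intrinsics assumed
# known (pre-calibrated), extrinsics estimated from a few 2D-3D feature pairs.
import numpy as np
import cv2

# Assumed example correspondences: six 3D scene points and their video pixels.
object_points = np.array([
    [10.0, 2.0, 1.5], [12.5, 2.0, 1.5], [12.5, 8.0, 1.5],
    [10.0, 8.0, 4.0], [14.0, 5.0, 6.0], [ 9.0, 5.0, 6.0],
], dtype=np.float64)
image_points = np.array([
    [320.0, 240.0], [400.0, 238.0], [405.0, 150.0],
    [318.0, 120.0], [450.0,  90.0], [290.0,  95.0],
], dtype=np.float64)

# Assumed pre-calibrated intrinsics and negligible lens distortion.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)

# Reprojection error: with few or noisy points this can stay small even when
# the recovered camera position is noticeably off, the limitation noted above.
reproj, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist)
err = float(np.mean(np.linalg.norm(reproj.reshape(-1, 2) - image_points, axis=1)))
R, _ = cv2.Rodrigues(rvec)
print("estimated camera position (world):", (-R.T @ tvec).ravel())
print("mean reprojection error (px):", err)
```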
Disclosure of Invention
To solve the problems in the prior art, the invention provides a heuristic-algorithm-based method, system and storage medium for fusing video with an urban information model three-dimensional scene. Instead of treating the computation of the camera's intrinsic and extrinsic parameters as the central step, the method obtains the projection matrix directly from parameters such as the camera position and observation point, and then uses a heuristic algorithm to search those parameters dynamically until the camera projection error is minimized. This reduces the difficulty of camera calibration and computation, improves the robustness of camera position matching, raises the efficiency of camera parameter estimation, and achieves intelligent matching of camera parameters.
The method for fusing video with an urban information model three-dimensional scene in an embodiment of the invention comprises the following steps:
S1, calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file;
S2, initializing the view frustum and camera parameters;
S3, setting the update speed, update direction and number of algorithm iterations for the camera parameters;
S4, updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
S5, calculating the camera projection matrix;
S6, calculating the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
S7, calculating the fitness function; the fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
S8, judging whether the fitness function exceeds a set threshold; if it does, executing step S9; if it does not, outputting the currently matched camera parameters as the global optimal solution and taking that output as the matching result, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene;
S9, judging whether all parameters have completed the current round of iteration; if so, executing step S10, otherwise executing steps S4 to S8;
S10, generating the solution space of all parameters from the results of the current round of iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking these n groups of solutions as the search base points for the next round of iteration;
S11, judging whether the number of algorithm iterations iters has been reached; if so, screening out the solution with the smallest fitness value from the current candidate optimal solutions of the camera parameters and outputting it as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, updating the speed value V_n and performing steps S4 to S10 again.
An embodiment of the invention further discloses a video and urban information model three-dimensional scene fusion system, which comprises the following modules:
a feature point file generation module, used for calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and for generating a feature point file;
an initialization module, used for initializing the view frustum and camera parameters;
a parameter setting module, used for setting the update speed, update direction and number of algorithm iterations for the camera parameters;
a parameter updating module, used for updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
a computing module, used for computing the camera projection matrix and the fitness function, and for computing the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
The fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
a fitness judging module, used for judging whether the fitness function exceeds the set threshold; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result to obtain the camera position, the actual imaging point and the camera viewing angle, realizing the fusion of the video scene and the three-dimensional scene;
an iteration completion judging module, used for judging whether all parameters have completed the current round of iteration; if so, the solution space of all parameters is generated from the results of this round, and the n groups of solutions with the smallest fitness values are screened out as candidate optimal solutions to serve as the search base points for the next round of iteration; otherwise, control returns to the parameter updating module;
an iteration number judging module, used for judging whether the number of algorithm iterations set by the parameter setting module has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, the speed value V_n is updated and control returns to the parameter updating module.
The storage medium of the invention has computer instructions stored thereon which, when executed by a processor, implement the steps of the video and urban information model three-dimensional scene fusion method of the invention.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a heuristic-algorithm-based method, system and storage medium for fusing video with a three-dimensional scene. Instead of treating the computation of the camera's intrinsic and extrinsic parameters as the central step, the method obtains the projection matrix directly from parameters such as the camera position and observation point, and then uses a heuristic algorithm to search those parameters dynamically until the camera projection error is minimized, which reduces the difficulty and accuracy requirements of camera calibration and computation and improves the robustness of camera position matching.
2. The method supports adaptive search of the camera parameters over multiple feature points and achieves accurate matching between the video and the three-dimensional scene with fewer camera parameters. At the same time, the invention overcomes the inability to obtain accurate camera coordinates, and the coordinate error of the algorithm converges to the optimal solution faster, so automatic matching of the camera parameters is realized and the matching efficiency of the camera parameters is improved.
3. In addition, the approximate positions of the camera and of the observation center point are usually known to the user and can be set as the camera's initial position, which further improves the efficiency of camera parameter estimation.
Drawings
FIG. 1 is a schematic diagram of the automatic feature point matching flow in the three-dimensional scene fusion method according to an embodiment of the present invention;
FIG. 2 is a schematic view of view frustum imaging in an embodiment of the present invention;
FIG. 3 is a schematic flow chart of intelligent camera parameter matching in the three-dimensional scene fusion method according to an embodiment of the present invention.
Detailed Description
The invention discloses a heuristic-algorithm-based method for fusing video with a city information model three-dimensional scene, which mainly addresses the difficulty of intelligently matching camera parameters. In a specific embodiment, the invention provides an improved heuristic algorithm that supports adaptive search of the camera parameters over multiple feature points and achieves accurate matching between the video and the three-dimensional scene with fewer camera parameters.
The technical scheme of the present invention will be described in further detail with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
Example 1
Referring to Figs. 1-3, the heuristic-algorithm-based method for fusing video with an urban information model three-dimensional scene in this embodiment adopts the following technical means: 1) generating a feature point file from the correspondence of the same objects in the video scene and the three-dimensional scene; 2) selecting the initial view frustum parameters, so that the camera is calibrated to the greatest extent with the fewest parameters; 3) adaptively adjusting the camera parameters according to the error values of the coordinate points; and 4) screening out the optimal parameters according to the coordinate-point error values and the iteration effect. Each camera parameter can be estimated roughly from the camera's situation, and the algorithm is not affected by the accuracy of that estimate. Specifically, the method mainly comprises the following steps:
S1, calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file.
Homonymous (same-named) points are marked in the video frame and in the three-dimensional scene view; each pair of homonymous points is a group of feature points, comprising the coordinate Position = (X_i, Y_i, Z_i) in the three-dimensional scene and the two-dimensional image coordinate Position = (m_i, n_i) in the video, where i = 1, 2, 3, ..., k, and all feature point pairs together make up one feature point file. Specifically, from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, each pair of homonymous points in the two views is marked and extracted as a corresponding group of feature points, feature point description and feature point matching are carried out, and all extracted feature point pairs form the feature point file.
The specific flow of feature point matching is as follows:
(1) Acquire the pixel coordinates of the marked homonymous points in the video. To obtain pixel coordinates from the video, a frame can be captured from the video and opened in Photoshop, and Photoshop is then used to read the pixel coordinates of the homonymous points, i.e. the pixel coordinates of the feature points.
(2) Acquire the three-dimensional space coordinates (X, Y, Z) corresponding to the feature points. These coordinates can be picked directly from the three-dimensional information platform.
(3) Generate the feature point file. The feature point file contains the pixel coordinates of the feature points and the corresponding three-dimensional space coordinates.
(4) Match the feature points. Each group of homonymous points is a group of feature points, and feature point matching associates the homonymous points with each other. If the pixel coordinates of feature point A are (m, n) and its three-dimensional space coordinates are (x, y, z), then (m, n) and (x, y, z) form one group of matching results; there are as many groups of matching results as there are groups of feature points.
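As a concrete illustration of steps (1)-(4), a feature point file could be laid out as below. The JSON layout and the coordinate values are assumptions of this sketch, since the patent text does not fix a file format.

```python
# Hypothetical feature point file layout: each record pairs a video pixel
# coordinate (m, n) with the three-dimensional scene coordinate (X, Y, Z)
# of the same-named point.
import json

feature_points = [
    {"pixel": [652, 418], "scene": [113.2647, 23.1301, 12.6]},  # assumed values
    {"pixel": [914, 365], "scene": [113.2652, 23.1305, 12.6]},
    {"pixel": [388, 512], "scene": [113.2641, 23.1298, 2.1]},
    {"pixel": [640, 610], "scene": [113.2648, 23.1299, 2.1]},   # at least 4 pairs, spread over the frame
]

with open("feature_points.json", "w", encoding="utf-8") as f:
    json.dump(feature_points, f, ensure_ascii=False, indent=2)
```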
Calibration of the homonymous points (feature points) can be done manually or automatically. A three-dimensional scene built from oblique photography data matches the video scene closely, so its feature points can be annotated automatically by machine learning, whereas a manually modeled scene needs to be calibrated manually. Table 1 illustrates the annotated feature points:
Table 1: Feature points
In this step, an image feature point matching algorithm, the SIFT algorithm, is used to automatically annotate feature points between the oblique-photography scene and the video image. The main process is as follows: acquire the images to be matched and the corresponding coordinate files, the images to be matched comprising the video frame and the corresponding three-dimensional scene view (the two-dimensional coordinates of the video image can be read directly from the image, and the coordinates of the three-dimensional scene can be obtained directly from the three-dimensional system); extract the feature points; describe the feature points to obtain feature point descriptors; match the feature points; and output the feature point file.
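A minimal sketch of the automatic SIFT-based matching just described, assuming OpenCV as the implementation library (the patent names the SIFT algorithm but no particular library); the file names are placeholders.

```python
# Hedged sketch: SIFT feature matching between a video frame and the
# corresponding three-dimensional scene view, using OpenCV (assumed library).
import cv2

video_frame = cv2.imread("video_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
scene_view = cv2.imread("scene_view.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_v, des_v = sift.detectAndCompute(video_frame, None)  # extract + describe feature points
kp_s, des_s = sift.detectAndCompute(scene_view, None)

# Match descriptors and keep unambiguous pairs (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des_v, des_s, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Each surviving pair links a video pixel to a scene-view pixel; the scene-view
# pixel is then looked up in the 3D system to obtain (X, Y, Z) for the file.
pairs = [(kp_v[m.queryIdx].pt, kp_s[m.trainIdx].pt) for m in good]
print(len(pairs), "matched feature point pairs")
```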
Manual annotation is used when labeling a manually modeled three-dimensional scene against the video image, and several rules should be followed: adjust the three-dimensional scene to an observation angle as close as possible to that of the surveillance video, and scale it so that it is consistent with the video; make the points selected in the video correspond one-to-one with positions in the three-dimensional scene, choosing feature points that are as stable and recognizable as possible; and select no fewer than 4 points in the video, covering the top, bottom, left, right and center of the frame as far as possible.
S2, initializing the view frustum and camera parameters.
The view frustum is the frustum-shaped three-dimensional space bounded by the viewing direction OB (the center line of the frustum), the viewing angle fov (i.e. a camera intrinsic parameter), the far plane (FAR PLANE) and the near plane (NEAR PLANE); objects located between the far plane and the near plane are visible and are imaged on the near plane, as shown in Fig. 2.
The key parameters determining the view frustum are the camera position O = (O_x, O_y, O_z), the viewing direction OB and the vertical viewing angle fov. The viewing direction OB intersects the three-dimensional scene at a point C = (C_x, C_y, C_z), so OC can be used in place of OB. The initial camera parameters can therefore be represented by the initial frustum parameters:
P = (C_x, C_y, C_z, O_x, O_y, O_z, fov)
where C_x, C_y, C_z are the coordinates of the intersection point C of the viewing direction OB with the three-dimensional scene, O_x, O_y, O_z are the coordinates of the camera position O, and fov is the vertical viewing angle of the camera.
In this embodiment the initialized frustum has only these 7 parameters; the remaining parameters can be set directly. To let the camera observe as large a space as possible, the distance OA in Fig. 2 may be set as small as possible and the distance OB as large as possible. The initialized key camera parameters above suffice to determine the view frustum, but they are not limiting and can be extended according to the practical situation.
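To make the 7-parameter initialization concrete, a sketch follows. All numeric values are assumed for illustration, and OA/OB are read here as the near-plane and far-plane distances of Fig. 2, as suggested above.

```python
import numpy as np

# Initial view-frustum / camera parameters P = (Cx, Cy, Cz, Ox, Oy, Oz, fov).
# The values are assumed: O is a rough guess of where the camera is mounted,
# C a rough guess of the scene point it looks at, fov the vertical view angle.
P0 = np.array([
    500.0, 300.0, 0.0,    # C: intersection of the viewing direction with the scene
    480.0, 280.0, 25.0,   # O: camera position
    45.0,                 # fov, in degrees
])

near = 0.1     # distance OA: kept as small as possible
far = 5000.0   # distance OB: kept as large as possible, so more of the scene is visible
```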
S3, setting the updating speed, the updating direction and the algorithm iteration times of the camera parameters.
In this embodiment, the movement speed of each parameter is denoted V and the movement direction s; the speed represents the step length of each move of a parameter, and the movement direction represents the interval within which the parameter may move. The movement speed after updating is V_n = V_1 + (n-1) * Δv, where n is the number of algorithm iterations and Δv is the magnitude of the speed update. The movement direction is s ∈ [-a, a] with a an integer, i.e. s = -a, -a+1, -a+2, ..., a-2, a-1, a, for a total of 2a values; the search neighborhood of each parameter is therefore [-(V*a), V*a]. The number of algorithm iterations is set to iters, and each round of iteration generates (2a)^7 groups of solutions for each candidate parameter set.
In this embodiment, the initial speed of the parameter update is V_1 = (20_{Cx}, 20_{Cy}, 20_{Cz}, 20_{Ox}, 20_{Oy}, 20_{Oz}, 10_{fov}); the movement direction of each parameter is s ∈ [-5, 5], i.e. each parameter can take 5 values to the left and 5 values to the right of the candidate solution as neighborhood values each time, 10 neighborhood values in total; the number of algorithm iterations is iters = 50; and the error threshold is δ = 0.0001.
In this embodiment, each parameter can generate several new candidate values through movement, and the movement direction s and movement speed V of a parameter determine its search neighborhood. The movement range of each parameter needs to be set according to the actual situation, and the movement speed and direction also directly affect the convergence efficiency of the algorithm.
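The neighborhood generation of steps S3-S4 can be sketched as follows, using the embodiment's values. The concrete value of Δv is an assumed placeholder (only the update rule V_n = V_1 + (n-1)*Δv is given), and the zero step is excluded so that each parameter has the 10 neighborhood values described above.

```python
import numpy as np
from itertools import product

V1 = np.array([20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 10.0])  # per-parameter step for (Cx..Oz, fov)
a = 5             # 5 moves to the left + 5 to the right of each parameter
delta_v = 5.0     # assumed speed increment (not specified in this copy of the text)
iters = 50
threshold = 1e-4

directions = [s for s in range(-a, a + 1) if s != 0]        # 2a = 10 movement directions

def velocity(n):
    """Speed value for iteration n: V_n = V_1 + (n - 1) * delta_v."""
    return V1 + (n - 1) * delta_v

def candidates(P, Vn):
    """Candidate parameter vectors reachable from base point P in one round.

    Each parameter i takes the neighborhood values P[i] + Vn[i] * s, and the
    solution space is the Cartesian product over the 7 parameters, i.e. (2a)**7
    candidate vectors per base point (enumerated lazily here)."""
    per_param = [[P[i] + Vn[i] * s for s in directions] for i in range(len(P))]
    for combo in product(*per_param):
        yield np.array(combo)
```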
S4, updating the camera parameters. According to the update speed and direction of each camera parameter, the parameters are updated as P' = P + V_n * s, where P' is the updated camera parameter and P is the camera parameter before the update.
S5, calculating the camera projection matrix ProjectionMatrix.
According to the principle of three-dimensional imaging, the coordinates of an object in camera space equal its world coordinates multiplied by the camera's projection matrix. The projection matrix is closely tied to the view frustum: every update of the camera parameters produces a new frustum, so the projection matrix changes along with the camera parameters. The formula used to calculate the camera projection matrix in this embodiment is:
where Aspect is the aspect ratio of the camera, with value:
or alternatively
where the near plane height is:
and the far plane height is:
far denotes the distance from the far plane to the point O, near denotes the distance from the near plane to the point O, near plane width denotes the width of the near plane, and far plane width denotes the width of the far plane.
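Because the matrix and the Aspect/plane-height formulas above are reproduced only as images in the source, the following sketch substitutes a standard perspective projection built from fov, Aspect, near and far. It follows the row-vector convention Position' = Position * ProjectionMatrix used by the patent and should be read as an assumed stand-in rather than the patent's exact formula.

```python
import numpy as np

def projection_matrix(fov_deg, aspect, near, far):
    """Perspective projection from the frustum parameters (assumed stand-in).

    fov is the vertical viewing angle, aspect = near plane width / near plane height
    (equivalently far plane width / far plane height), and near/far are the
    distances of the near and far planes from the camera position O.
    Written for the row-vector convention Position' = Position * ProjectionMatrix.
    """
    f = 1.0 / np.tan(np.radians(fov_deg) / 2.0)   # cot(fov / 2)
    return np.array([
        [f / aspect, 0.0, 0.0,                               0.0],
        [0.0,        f,   0.0,                               0.0],
        [0.0,        0.0, -(far + near) / (far - near),     -1.0],
        [0.0,        0.0, -2.0 * far * near / (far - near),  0.0],
    ])

# Frustum geometry used for the aspect ratio (standard relations, assumed here):
# near_plane_height = 2 * near * tan(fov / 2), far_plane_height = 2 * far * tan(fov / 2),
# near_plane_width = aspect * near_plane_height, far_plane_width = aspect * far_plane_height.
```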
S6, calculating the objective function from the camera projection matrix and the three-dimensional coordinates of the feature points, the objective function being the camera-space coordinates:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k.
S7, calculating the fitness function.
In this embodiment, the fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points.
S8, judging whether the fitness function exceeds the set threshold. If it does, step S9 is executed; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result and the algorithm ends; the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene.
The value of the fitness function (abbreviated as fitness value) in steps S7 and S8 is an indicator for measuring the error between the current camera parameter and the actual camera parameter. The smaller the error, the better the parameter matching effect, and vice versa.
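A sketch of the fitness evaluation in steps S6-S8 follows. The viewport mapping from normalized coordinates to video pixels and the use of the Euclidean distance as the per-point error are assumptions, since the corresponding formulas appear only as images in the source text.

```python
import numpy as np

def to_image_coords(points_3d, proj, width, height):
    """Project 3D feature points to 2D coordinates (m', n') in the video frame.

    `proj` is assumed to be the combined world-to-camera transform (built from
    O and C) and perspective projection, applied with the row-vector convention
    Position' = Position * ProjectionMatrix, followed by perspective division
    and a conventional viewport transform to pixels (assumed here).
    """
    ones = np.ones((points_3d.shape[0], 1))
    clip = np.hstack([points_3d, ones]) @ proj     # homogeneous row vectors
    ndc = clip[:, :2] / clip[:, 3:4]               # perspective divide
    m = (ndc[:, 0] + 1.0) * 0.5 * width            # assumed viewport mapping
    n = (1.0 - ndc[:, 1]) * 0.5 * height
    return np.stack([m, n], axis=1)

def fitness(observed_2d, points_3d, proj, width, height):
    """Average error between the real (m_i, n_i) and the computed (m'_i, n'_i),
    per step S7; the Euclidean point distance is assumed, since the exact error
    expression appears only as an image in the source."""
    predicted = to_image_coords(points_3d, proj, width, height)
    return float(np.mean(np.linalg.norm(observed_2d - predicted, axis=1)))
```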
S9, judging whether all parameters have completed the current round of iteration; if so, step S10 is executed, otherwise steps S4 to S8 are executed.
S10, generating the solution space of all parameters from the results of the current round of iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking these n groups of solutions as the search base points for the next round of iteration.
The purpose of step S10 is to generate the solution space of all parameters through the current round of iteration and to select from it the several groups of parameters most likely to be the optimal solution, as the search base points for the next round of parameter iteration. When selecting the candidate optimal solutions, screening follows the principle that the smaller the fitness value, the better the parameter matching effect; this is because the graph of the fitness function is U-shaped with a single minimum of 0, so the smaller the fitness value, the smaller the error of the matched parameters. To reduce the number of algorithm iterations while widening the search area, the fitness values can be sorted and the several groups of parameters with the smallest fitness values taken as candidate solutions; the speed and direction of parameter movement are then updated and the solution space of each group of candidate solutions is searched.
S11, judging whether the number of algorithm iterations iters has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, and the algorithm ends; the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene. Otherwise, the speed value V_n is updated and steps S4 to S10 are performed again.
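Putting steps S4-S11 together, a compact sketch of the heuristic search loop might look like the following. The helpers velocity, candidates and fitness are the assumed sketches given earlier, and build_projection(P) is a hypothetical helper standing for the construction of the frustum/view/projection transform from a parameter vector P, whose exact formula is not reproduced in this text.

```python
import numpy as np

def heuristic_search(P0, observed_2d, points_3d, width, height,
                     build_projection, n_best=5, iters=50, threshold=1e-4):
    """Heuristic camera-parameter search (steps S4-S11), sketched under the
    assumptions stated above. Returns the best parameter vector found and its
    fitness value."""
    base_points = [np.asarray(P0, dtype=float)]
    best_P, best_fit = np.asarray(P0, dtype=float), np.inf

    for n in range(1, iters + 1):                      # S11: iteration budget
        Vn = velocity(n)                               # updated speed value V_n
        scored = []
        for P in base_points:
            for cand in candidates(P, Vn):             # S4: P' = P + V_n * s
                proj = build_projection(cand)          # S5
                fit = fitness(observed_2d, points_3d, proj, width, height)  # S6-S7
                if fit <= threshold:                   # S8: within threshold, done
                    return cand, fit
                scored.append((fit, cand))
        scored.sort(key=lambda t: t[0])                # S10: rank by fitness value
        base_points = [c for _, c in scored[:n_best]]  # keep n best as next base points
        if scored[0][0] < best_fit:
            best_fit, best_P = scored[0]
    return best_P, best_fit                            # S11: best solution after iters rounds
```

Enumerating the full (2a)^7 neighborhood per base point, as the description states, quickly becomes expensive; a practical implementation would usually sample or prune the candidate set, which would be a deviation from the literal text.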
Example 2
Based on the same inventive concept as Embodiment 1, this embodiment provides a heuristic-algorithm-based video and urban information model three-dimensional scene fusion system, which comprises the following modules:
a feature point file generation module, used for implementing step S1: calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file;
an initialization module, used for implementing step S2: initializing the view frustum and camera parameters;
a parameter setting module, used for implementing step S3: setting the update speed, update direction and number of algorithm iterations for the camera parameters;
a parameter updating module, used for implementing step S4: updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
a computing module, used for implementing steps S5 to S7: computing the camera projection matrix and the fitness function, and computing the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
The fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
a fitness judging module, used for implementing step S8: judging whether the fitness function exceeds the set threshold; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result to obtain the camera position, the actual imaging point and the camera viewing angle, realizing the fusion of the video scene and the three-dimensional scene;
an iteration completion judging module, used for implementing steps S9 to S10: judging whether all parameters have completed the current round of iteration; if so, the solution space of all parameters is generated from the results of this round, and the n groups of solutions with the smallest fitness values are screened out as candidate optimal solutions to serve as the search base points for the next round of iteration; otherwise, control returns to the parameter updating module;
an iteration number judging module, used for implementing step S11: judging whether the number of algorithm iterations set by the parameter setting module has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, the speed value V_n is updated and control returns to the parameter updating module.
Since the technical solution of this system corresponds to the method of Embodiment 1, the description of this embodiment is relatively brief; for the corresponding technical features, refer to the description of the individual steps in Embodiment 1, which is not repeated here.
Example 3
Based on the same inventive concept as Embodiment 1, this embodiment proposes a corresponding storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the steps of the three-dimensional scene fusion method of Embodiment 1.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (9)

1. A video and urban information model three-dimensional scene fusion method, characterized by comprising the following steps:
S1, calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file;
S2, initializing the view frustum and camera parameters;
S3, setting the update speed, update direction and number of algorithm iterations for the camera parameters;
S4, updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
S5, calculating the camera projection matrix ProjectionMatrix;
S6, calculating the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
S7, calculating the fitness function; the fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
S8, judging whether the fitness function exceeds a set threshold; if it does, executing step S9; if it does not, outputting the currently matched camera parameters as the global optimal solution and taking that output as the matching result, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene;
S9, judging whether all parameters have completed the current round of iteration; if so, executing step S10, otherwise executing steps S4 to S8;
S10, generating the solution space of all parameters from the results of the current round of iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking these n groups of solutions as the search base points for the next round of iteration;
S11, judging whether the number of algorithm iterations iters has been reached; if so, screening out the solution with the smallest fitness value from the current candidate optimal solutions of the camera parameters and outputting it as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, updating the speed value V_n and performing steps S4 to S10 again;
in step S3, the movement speed of each camera parameter is set to V and the movement direction to s; the movement speed after updating is expressed as V_n = V_1 + (n-1) * Δv, where n is the number of algorithm iterations and Δv is the magnitude of the speed update; the movement direction is s ∈ [-a, a] with a an integer, i.e. s = -a, -a+1, -a+2, ..., a-2, a-1, a; and the search neighborhood of each camera parameter is [-(V*a), V*a];
in step S2, the view frustum takes the camera position as its origin and is the frustum-shaped three-dimensional space bounded by the viewing direction OB, the vertical viewing angle fov, the far plane and the near plane; the key parameters of the view frustum include the camera position O = (O_x, O_y, O_z), the viewing direction OB and the vertical viewing angle fov, the intersection of the viewing direction OB with the three-dimensional scene being C = (C_x, C_y, C_z), and the initial camera parameters are expressed as the initial frustum parameters:
P = (C_x, C_y, C_z, O_x, O_y, O_z, fov)
where C_x, C_y, C_z are the coordinates of the intersection point C, O_x, O_y, O_z are the coordinates of the camera position O, and fov is the vertical viewing angle of the camera.
2. The video and urban information model three-dimensional scene fusion method according to claim 1, wherein the formula for calculating the camera projection matrix in step S5 is:
where Aspect is the aspect ratio of the camera, with value:
where the near plane height is:
and the far plane height is:
far denotes the distance from the far plane to the point O, near denotes the distance from the near plane to the point O, near plane width denotes the width of the near plane, and far plane width denotes the width of the far plane.
3. The video and urban information model three-dimensional scene fusion method according to claim 1, wherein in step S10, when selecting the candidate optimal solutions, screening is performed according to the principle that the smallest fitness value gives the best parameter matching effect.
4. The video and urban information model three-dimensional scene fusion method according to claim 1, wherein in step S1, homonymous points are marked in the video frame and the three-dimensional scene view, each pair of homonymous points being a group of feature points comprising the coordinate Position = (X_i, Y_i, Z_i) in the three-dimensional scene and the two-dimensional image coordinate Position = (m_i, n_i) in the video, where i = 1, 2, 3, ..., k, and all feature point pairs make up one feature point file.
5. A video and city information model three-dimensional scene fusion system, characterized by comprising:
a feature point file generation module, used for calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and for generating a feature point file;
an initialization module, used for initializing the view frustum and camera parameters;
a parameter setting module, used for setting the update speed, update direction and number of algorithm iterations for the camera parameters;
a parameter updating module, used for updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
a computing module, used for computing the camera projection matrix ProjectionMatrix and the fitness function, and for computing the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
The fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
a fitness judging module, used for judging whether the fitness function exceeds the set threshold; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result to obtain the camera position, the actual imaging point and the camera viewing angle, realizing the fusion of the video scene and the three-dimensional scene;
an iteration completion judging module, used for judging whether all parameters have completed the current round of iteration; if so, the solution space of all parameters is generated from the results of this round, and the n groups of solutions with the smallest fitness values are screened out as candidate optimal solutions to serve as the search base points for the next round of iteration; otherwise, control returns to the parameter updating module;
an iteration number judging module, used for judging whether the number of algorithm iterations set by the parameter setting module has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, the speed value V_n is updated and control returns to the parameter updating module;
wherein the parameter setting module sets the movement speed of each camera parameter to V and the movement direction to s; the movement speed after updating is expressed as V_n = V_1 + (n-1) * Δv, where n is the number of algorithm iterations and Δv is the magnitude of the speed update; the movement direction is s ∈ [-a, a] with a an integer, i.e. s = -a, -a+1, -a+2, ..., a-2, a-1, a; and the search neighborhood of each camera parameter is [-(V*a), V*a];
in the initialization module, the view frustum takes the camera position as its origin and is the frustum-shaped three-dimensional space bounded by the viewing direction OB, the vertical viewing angle fov, the far plane and the near plane; the key parameters of the view frustum include the camera position O = (O_x, O_y, O_z), the viewing direction OB and the vertical viewing angle fov, the intersection of the viewing direction OB with the three-dimensional scene being C = (C_x, C_y, C_z), and the initial camera parameters are expressed as the initial frustum parameters:
P = (C_x, C_y, C_z, O_x, O_y, O_z, fov)
where C_x, C_y, C_z are the coordinates of the intersection point C, O_x, O_y, O_z are the coordinates of the camera position O, and fov is the vertical viewing angle of the camera.
6. The video and urban information model three-dimensional scene fusion system according to claim 5, wherein the formula for calculating the camera projection matrix in the computing module is:
where Aspect is the aspect ratio of the camera, with value:
or alternatively
where the near plane height is:
and the far plane height is:
far denotes the distance from the far plane to the point O, near denotes the distance from the near plane to the point O, near plane width denotes the width of the near plane, and far plane width denotes the width of the far plane.
7. The system according to claim 5, wherein the iteration completion judging module, when selecting the candidate optimal solutions, performs the screening according to the principle that the smallest fitness value gives the best parameter matching effect.
8. The video and urban information model three-dimensional scene fusion system according to claim 5, wherein the feature point file generation module marks homonymous points in the video frame and the three-dimensional scene view, each pair of homonymous points being a group of feature points comprising the coordinate Position = (X_i, Y_i, Z_i) in the three-dimensional scene and the two-dimensional image coordinate Position = (m_i, n_i) in the video, where i = 1, 2, 3, ..., k, and all feature point pairs make up one feature point file.
9. A storage medium having stored thereon computer instructions which, when executed by a processor, perform the steps of the video and urban information model three-dimensional scene fusion method of any of claims 1-4.
CN202111591333.2A 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium Active CN114255285B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111591333.2A CN114255285B (en) 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium
PCT/CN2022/137042 WO2023116430A1 (en) 2021-12-23 2022-12-06 Video and city information model three-dimensional scene fusion method and system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111591333.2A CN114255285B (en) 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium

Publications (2)

Publication Number Publication Date
CN114255285A CN114255285A (en) 2022-03-29
CN114255285B true CN114255285B (en) 2023-07-18

Family

ID=80797196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111591333.2A Active CN114255285B (en) 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium

Country Status (2)

Country Link
CN (1) CN114255285B (en)
WO (1) WO2023116430A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255285B (en) * 2021-12-23 2023-07-18 奥格科技股份有限公司 Video and urban information model three-dimensional scene fusion method, system and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447869B (en) * 2015-11-30 2019-02-12 四川华雁信息产业股份有限公司 Camera self-calibration method and device based on particle swarm optimization algorithm
CN108537876B (en) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium
CN109035394B (en) * 2018-08-22 2023-04-07 广东工业大学 Face three-dimensional model reconstruction method, device, equipment and system and mobile terminal
CN110648363A (en) * 2019-09-16 2020-01-03 腾讯科技(深圳)有限公司 Camera posture determining method and device, storage medium and electronic equipment
CN111582022B (en) * 2020-03-26 2023-08-29 深圳大学 Fusion method and system of mobile video and geographic scene and electronic equipment
CN111640181A (en) * 2020-05-14 2020-09-08 佳都新太科技股份有限公司 Interactive video projection method, device, equipment and storage medium
CN111836012B (en) * 2020-06-28 2022-05-13 航天图景(北京)科技有限公司 Video fusion and video linkage method based on three-dimensional scene and electronic equipment
CN112053446B (en) * 2020-07-11 2024-02-02 南京国图信息产业有限公司 Real-time monitoring video and three-dimensional scene fusion method based on three-dimensional GIS
CN112258587B (en) * 2020-10-27 2023-07-07 上海电力大学 Camera calibration method based on gray wolf particle swarm mixing algorithm
CN112927353B (en) * 2021-02-25 2023-05-19 电子科技大学 Three-dimensional scene reconstruction method, storage medium and terminal based on two-dimensional target detection and model alignment
CN113658263B (en) * 2021-06-17 2023-10-31 石家庄铁道大学 Visual scene-based electromagnetic interference source visual labeling method
CN114255285B (en) * 2021-12-23 2023-07-18 奥格科技股份有限公司 Video and urban information model three-dimensional scene fusion method, system and storage medium

Also Published As

Publication number Publication date
WO2023116430A1 (en) 2023-06-29
CN114255285A (en) 2022-03-29


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant