CN114255285B - Video and urban information model three-dimensional scene fusion method, system and storage medium - Google Patents

Video and urban information model three-dimensional scene fusion method, system and storage medium

Info

Publication number
CN114255285B
Authority
CN
China
Prior art keywords
camera
parameter
parameters
video
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111591333.2A
Other languages
Chinese (zh)
Other versions
CN114255285A (en)
Inventor
陈彪
陈顺清
刘慧敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ogilvy Technology Co ltd
Original Assignee
Ogilvy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ogilvy Technology Co ltd
Priority to CN202111591333.2A
Publication of CN114255285A
Priority to PCT/CN2022/137042 (published as WO2023116430A1)
Application granted
Publication of CN114255285B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T7/00 Image analysis
                    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
                    • G06T7/70 Determining position or orientation of objects or cameras
                        • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
                • G06T19/00 Manipulating 3D models or images for computer graphics
                    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/30 Subject of image; Context of image processing
                        • G06T2207/30244 Camera pose
                • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
                    • G06T2219/20 Indexing scheme for editing of 3D models
                        • G06T2219/2004 Aligning objects, relative positioning of parts
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
                • Y02T10/00 Road transport of goods or passengers
                    • Y02T10/10 Internal combustion engine [ICE] based vehicles
                        • Y02T10/40 Engine management systems

Abstract

The invention relates to the fields of mapping and computer graphics, and in particular to a heuristic-algorithm-based method, system and storage medium for fusing video with an urban information model three-dimensional scene. The method comprises the following steps: generating a feature point file from the video frame and the three-dimensional scene view; initializing the view frustum and camera parameters; setting the update speed, update direction and number of algorithm iterations for the camera parameters, and updating the camera parameters; calculating the camera projection matrix, the camera-space coordinates and the fitness function; when the fitness function does not exceed a set threshold, the currently matched camera parameters are the global optimal solution; if it exceeds the threshold, generating the solution space of all parameters from the results of the current iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking them as the search base points for the next iteration; when the number of algorithm iterations is reached, screening the solution with the smallest fitness value from the candidate optimal solutions and outputting it as the optimal solution. The invention improves the robustness of camera position matching and realizes intelligent matching of camera parameters.

Description

Video and urban information model three-dimensional scene fusion method, system and storage medium
Technical Field
The invention relates to the fields of mapping and computer graphics, and in particular to a heuristic-algorithm-based method, system and storage medium for fusing video with urban information model three-dimensional scenes.
Background
In the field of real-scene three-dimensional GIS, scenes built from three-dimensional models can faithfully reproduce physical-world objects such as terrain, buildings and bridges, with high precision, true scale and a high degree of realism. However, a three-dimensional model captures the state of the world at a single point in time: it is static data that can show neither dynamic behavior nor the current, most up-to-date situation. To address this, real-scene three-dimensional GIS increasingly ingests Internet-of-Things data such as surveillance video to meet business requirements in fields such as security and traffic. Combining video with a three-dimensional scene is generally done in one of two ways: displaying the video in a pop-up window, or fusing it into the three-dimensional scene. The latter approach, also called video fusion, lets users understand the surrounding scene while watching the video; it offers high-fidelity restoration, intuitiveness, close agreement between the video position and the real position, and ease of understanding.
At present, video is fused with a three-dimensional scene mostly in one of two ways: manual operation or automatic mapping. Manual operation requires manually calibrating the video image against the three-dimensional scene, restoring the camera information by adjusting several parameter values such as the camera's position, orientation and pitch angle. The automatic mapping approach computes a projection matrix from the camera's intrinsic and extrinsic parameters to map the video accurately onto the scene. The key step in fusing video with a three-dimensional scene is camera calibration, i.e. estimating the camera's intrinsic and extrinsic parameters. There are generally two routes. The first computes the camera projection matrix by minimizing the reprojection error with least squares; it needs many feature points (at least six pairs), and its limitation is that feature points are hard to select, because the camera's field of view is limited and the model sometimes lacks 3D detail. The second estimates the camera intrinsics in advance with a calibration instrument and then selects at least three feature points from the current scene to estimate the camera extrinsics. Moreover, when feature points are few, both methods' estimates of the camera parameters are strongly affected by noise: even if the reprojection error is small, the recovered camera position may be off.
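For context only, the conventional automatic-mapping route just described (pre-calibrated intrinsics plus a few 2D-3D correspondences for the extrinsics) might look roughly like the following with OpenCV. The point coordinates and intrinsics are invented placeholders; this sketch illustrates the prior-art baseline the invention argues against, not the patented method.

```python
# Hypothetical sketch of the conventional calibration route: intrinsics assumed
# known (pre-calibrated), extrinsics estimated from a few 2D-3D feature pairs.
import numpy as np
import cv2

# Assumed example correspondences: six 3D scene points and their video pixels.
object_points = np.array([
    [10.0, 2.0, 1.5], [12.5, 2.0, 1.5], [12.5, 8.0, 1.5],
    [10.0, 8.0, 4.0], [14.0, 5.0, 6.0], [ 9.0, 5.0, 6.0],
], dtype=np.float64)
image_points = np.array([
    [320.0, 240.0], [400.0, 238.0], [405.0, 150.0],
    [318.0, 120.0], [450.0,  90.0], [290.0,  95.0],
], dtype=np.float64)

# Assumed pre-calibrated intrinsics and negligible lens distortion.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)

# Reprojection error: with few or noisy points this can stay small even when
# the recovered camera position is noticeably off, the limitation noted above.
reproj, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist)
err = float(np.mean(np.linalg.norm(reproj.reshape(-1, 2) - image_points, axis=1)))
R, _ = cv2.Rodrigues(rvec)
print("estimated camera position (world):", (-R.T @ tvec).ravel())
print("mean reprojection error (px):", err)
```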
Disclosure of Invention
To solve the problems in the prior art, the invention provides a heuristic-algorithm-based method, system and storage medium for fusing video with an urban information model three-dimensional scene. Instead of treating the computation of the camera's intrinsic and extrinsic parameters as the central step, the method obtains the projection matrix directly from parameters such as the camera position and observation point, and then uses a heuristic algorithm to search those parameters dynamically until the camera projection error is minimized. This reduces the difficulty of camera calibration and computation, improves the robustness of camera position matching, raises the efficiency of camera parameter estimation, and achieves intelligent matching of camera parameters.
The method for fusing video with an urban information model three-dimensional scene in an embodiment of the invention comprises the following steps:
S1, calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file;
S2, initializing the view frustum and camera parameters;
S3, setting the update speed, update direction and number of algorithm iterations for the camera parameters;
S4, updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
S5, calculating the camera projection matrix;
S6, calculating the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
S7, calculating the fitness function; the fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
S8, judging whether the fitness function exceeds a set threshold; if it does, executing step S9; if it does not, outputting the currently matched camera parameters as the global optimal solution and taking that output as the matching result, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene;
S9, judging whether all parameters have completed the current round of iteration; if so, executing step S10, otherwise executing steps S4 to S8;
S10, generating the solution space of all parameters from the results of the current round of iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking these n groups of solutions as the search base points for the next round of iteration;
S11, judging whether the number of algorithm iterations iters has been reached; if so, screening out the solution with the smallest fitness value from the current candidate optimal solutions of the camera parameters and outputting it as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, updating the speed value V_n and performing steps S4 to S10 again.
An embodiment of the invention further discloses a video and urban information model three-dimensional scene fusion system, which comprises the following modules:
a feature point file generation module, used for calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and for generating a feature point file;
an initialization module, used for initializing the view frustum and camera parameters;
a parameter setting module, used for setting the update speed, update direction and number of algorithm iterations for the camera parameters;
a parameter updating module, used for updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
a computing module, used for computing the camera projection matrix and the fitness function, and for computing the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
The fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
a fitness judging module, used for judging whether the fitness function exceeds the set threshold; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result to obtain the camera position, the actual imaging point and the camera viewing angle, realizing the fusion of the video scene and the three-dimensional scene;
an iteration completion judging module, used for judging whether all parameters have completed the current round of iteration; if so, the solution space of all parameters is generated from the results of this round, and the n groups of solutions with the smallest fitness values are screened out as candidate optimal solutions to serve as the search base points for the next round of iteration; otherwise, control returns to the parameter updating module;
an iteration number judging module, used for judging whether the number of algorithm iterations set by the parameter setting module has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, the speed value V_n is updated and control returns to the parameter updating module.
The storage medium of the invention has computer instructions stored thereon which, when executed by a processor, implement the steps of the video and urban information model three-dimensional scene fusion method of the invention.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a heuristic-algorithm-based method, system and storage medium for fusing video with a three-dimensional scene. Instead of treating the computation of the camera's intrinsic and extrinsic parameters as the central step, the method obtains the projection matrix directly from parameters such as the camera position and observation point, and then uses a heuristic algorithm to search those parameters dynamically until the camera projection error is minimized, which reduces the difficulty and accuracy requirements of camera calibration and computation and improves the robustness of camera position matching.
2. The method supports adaptive search of the camera parameters over multiple feature points and achieves accurate matching between the video and the three-dimensional scene with fewer camera parameters. At the same time, the invention overcomes the inability to obtain accurate camera coordinates, and the coordinate error of the algorithm converges to the optimal solution faster, so automatic matching of the camera parameters is realized and the matching efficiency of the camera parameters is improved.
3. In addition, the approximate positions of the camera and of the observation center point are usually known to the user and can be set as the camera's initial position, which further improves the efficiency of camera parameter estimation.
Drawings
FIG. 1 is a schematic diagram of the automatic feature point matching flow in the three-dimensional scene fusion method according to an embodiment of the present invention;
FIG. 2 is a schematic view of view frustum imaging in an embodiment of the present invention;
FIG. 3 is a schematic flow chart of intelligent camera parameter matching in the three-dimensional scene fusion method according to an embodiment of the present invention.
Detailed Description
The invention discloses a heuristic-algorithm-based method for fusing video with a city information model three-dimensional scene, which mainly addresses the difficulty of intelligently matching camera parameters. In a specific embodiment, the invention provides an improved heuristic algorithm that supports adaptive search of the camera parameters over multiple feature points and achieves accurate matching between the video and the three-dimensional scene with fewer camera parameters.
The technical scheme of the present invention will be described in further detail with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
Example 1
Referring to Figs. 1-3, the heuristic-algorithm-based method for fusing video with an urban information model three-dimensional scene in this embodiment adopts the following technical means: 1) generating a feature point file from the correspondence of the same objects in the video scene and the three-dimensional scene; 2) selecting the initial view frustum parameters, so that the camera is calibrated to the greatest extent with the fewest parameters; 3) adaptively adjusting the camera parameters according to the error values of the coordinate points; and 4) screening out the optimal parameters according to the coordinate-point error values and the iteration effect. Each camera parameter can be estimated roughly from the camera's situation, and the algorithm is not affected by the accuracy of that estimate. Specifically, the method mainly comprises the following steps:
S1, calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file.
Homonymous (same-named) points are marked in the video frame and in the three-dimensional scene view; each pair of homonymous points is a group of feature points, comprising the coordinate Position = (X_i, Y_i, Z_i) in the three-dimensional scene and the two-dimensional image coordinate Position = (m_i, n_i) in the video, where i = 1, 2, 3, ..., k, and all feature point pairs together make up one feature point file. Specifically, from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, each pair of homonymous points in the two views is marked and extracted as a corresponding group of feature points, feature point description and feature point matching are carried out, and all extracted feature point pairs form the feature point file.
The specific flow of feature point matching is as follows:
(1) Acquire the pixel coordinates of the marked homonymous points in the video. To obtain pixel coordinates from the video, a frame can be captured from the video and opened in Photoshop, and Photoshop is then used to read the pixel coordinates of the homonymous points, i.e. the pixel coordinates of the feature points.
(2) Acquire the three-dimensional space coordinates (X, Y, Z) corresponding to the feature points. These coordinates can be picked directly from the three-dimensional information platform.
(3) Generate the feature point file. The feature point file contains the pixel coordinates of the feature points and the corresponding three-dimensional space coordinates.
(4) Match the feature points. Each group of homonymous points is a group of feature points, and feature point matching associates the homonymous points with each other. If the pixel coordinates of feature point A are (m, n) and its three-dimensional space coordinates are (x, y, z), then (m, n) and (x, y, z) form one group of matching results; there are as many groups of matching results as there are groups of feature points.
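As a concrete illustration of steps (1)-(4), a feature point file could be laid out as below. The JSON layout and the coordinate values are assumptions of this sketch, since the patent text does not fix a file format.

```python
# Hypothetical feature point file layout: each record pairs a video pixel
# coordinate (m, n) with the three-dimensional scene coordinate (X, Y, Z)
# of the same-named point.
import json

feature_points = [
    {"pixel": [652, 418], "scene": [113.2647, 23.1301, 12.6]},  # assumed values
    {"pixel": [914, 365], "scene": [113.2652, 23.1305, 12.6]},
    {"pixel": [388, 512], "scene": [113.2641, 23.1298, 2.1]},
    {"pixel": [640, 610], "scene": [113.2648, 23.1299, 2.1]},   # at least 4 pairs, spread over the frame
]

with open("feature_points.json", "w", encoding="utf-8") as f:
    json.dump(feature_points, f, ensure_ascii=False, indent=2)
```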
Calibration of the homonymous points (feature points) can be done manually or automatically. A three-dimensional scene built from oblique photography data matches the video scene closely, so its feature points can be annotated automatically by machine learning, whereas a manually modeled scene needs to be calibrated manually. Table 1 illustrates the annotated feature points:
Table 1: Feature points
In this step, an image feature point matching algorithm, the SIFT algorithm, is used to automatically annotate feature points between the oblique-photography scene and the video image. The main process is as follows: acquire the images to be matched and the corresponding coordinate files, the images to be matched comprising the video frame and the corresponding three-dimensional scene view (the two-dimensional coordinates of the video image can be read directly from the image, and the coordinates of the three-dimensional scene can be obtained directly from the three-dimensional system); extract the feature points; describe the feature points to obtain feature point descriptors; match the feature points; and output the feature point file.
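A minimal sketch of the automatic SIFT-based matching just described, assuming OpenCV as the implementation library (the patent names the SIFT algorithm but no particular library); the file names are placeholders.

```python
# Hedged sketch: SIFT feature matching between a video frame and the
# corresponding three-dimensional scene view, using OpenCV (assumed library).
import cv2

video_frame = cv2.imread("video_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
scene_view = cv2.imread("scene_view.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_v, des_v = sift.detectAndCompute(video_frame, None)  # extract + describe feature points
kp_s, des_s = sift.detectAndCompute(scene_view, None)

# Match descriptors and keep unambiguous pairs (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des_v, des_s, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Each surviving pair links a video pixel to a scene-view pixel; the scene-view
# pixel is then looked up in the 3D system to obtain (X, Y, Z) for the file.
pairs = [(kp_v[m.queryIdx].pt, kp_s[m.trainIdx].pt) for m in good]
print(len(pairs), "matched feature point pairs")
```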
Manual annotation is used when labeling a manually modeled three-dimensional scene against the video image, and several rules should be followed: adjust the three-dimensional scene to an observation angle as close as possible to that of the surveillance video, and scale it so that it is consistent with the video; make the points selected in the video correspond one-to-one with positions in the three-dimensional scene, choosing feature points that are as stable and recognizable as possible; and select no fewer than 4 points in the video, covering the top, bottom, left, right and center of the frame as far as possible.
S2, initializing the view frustum and camera parameters.
The view frustum is the frustum-shaped three-dimensional space bounded by the viewing direction OB (the center line of the frustum), the viewing angle fov (i.e. a camera intrinsic parameter), the far plane (FAR PLANE) and the near plane (NEAR PLANE); objects located between the far plane and the near plane are visible and are imaged on the near plane, as shown in Fig. 2.
The key parameters determining the view frustum are the camera position O = (O_x, O_y, O_z), the viewing direction OB and the vertical viewing angle fov. The viewing direction OB intersects the three-dimensional scene at a point C = (C_x, C_y, C_z), so OC can be used in place of OB. The initial camera parameters can therefore be represented by the initial frustum parameters:
P = (C_x, C_y, C_z, O_x, O_y, O_z, fov)
where C_x, C_y, C_z are the coordinates of the intersection point C of the viewing direction OB with the three-dimensional scene, O_x, O_y, O_z are the coordinates of the camera position O, and fov is the vertical viewing angle of the camera.
In this embodiment the initialized frustum has only these 7 parameters; the remaining parameters can be set directly. To let the camera observe as large a space as possible, the distance OA in Fig. 2 may be set as small as possible and the distance OB as large as possible. The initialized key camera parameters above suffice to determine the view frustum, but they are not limiting and can be extended according to the practical situation.
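To make the 7-parameter initialization concrete, a sketch follows. All numeric values are assumed for illustration, and OA/OB are read here as the near-plane and far-plane distances of Fig. 2, as suggested above.

```python
import numpy as np

# Initial view-frustum / camera parameters P = (Cx, Cy, Cz, Ox, Oy, Oz, fov).
# The values are assumed: O is a rough guess of where the camera is mounted,
# C a rough guess of the scene point it looks at, fov the vertical view angle.
P0 = np.array([
    500.0, 300.0, 0.0,    # C: intersection of the viewing direction with the scene
    480.0, 280.0, 25.0,   # O: camera position
    45.0,                 # fov, in degrees
])

near = 0.1     # distance OA: kept as small as possible
far = 5000.0   # distance OB: kept as large as possible, so more of the scene is visible
```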
S3, setting the updating speed, the updating direction and the algorithm iteration times of the camera parameters.
In this embodiment, the movement speed of each parameter is denoted V and the movement direction s; the speed represents the step length of each move of a parameter, and the movement direction represents the interval within which the parameter may move. The movement speed after updating is V_n = V_1 + (n-1) * Δv, where n is the number of algorithm iterations and Δv is the magnitude of the speed update. The movement direction is s ∈ [-a, a] with a an integer, i.e. s = -a, -a+1, -a+2, ..., a-2, a-1, a, for a total of 2a values; the search neighborhood of each parameter is therefore [-(V*a), V*a]. The number of algorithm iterations is set to iters, and each round of iteration generates (2a)^7 groups of solutions for each candidate parameter set.
In this embodiment, the initial speed of the parameter update is V_1 = (20_{Cx}, 20_{Cy}, 20_{Cz}, 20_{Ox}, 20_{Oy}, 20_{Oz}, 10_{fov}); the movement direction of each parameter is s ∈ [-5, 5], i.e. each parameter can take 5 values to the left and 5 values to the right of the candidate solution as neighborhood values each time, 10 neighborhood values in total; the number of algorithm iterations is iters = 50; and the error threshold is δ = 0.0001.
In this embodiment, each parameter can generate several new candidate values through movement, and the movement direction s and movement speed V of a parameter determine its search neighborhood. The movement range of each parameter needs to be set according to the actual situation, and the movement speed and direction also directly affect the convergence efficiency of the algorithm.
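The neighborhood generation of steps S3-S4 can be sketched as follows, using the embodiment's values. The concrete value of Δv is an assumed placeholder (only the update rule V_n = V_1 + (n-1)*Δv is given), and the zero step is excluded so that each parameter has the 10 neighborhood values described above.

```python
import numpy as np
from itertools import product

V1 = np.array([20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 10.0])  # per-parameter step for (Cx..Oz, fov)
a = 5             # 5 moves to the left + 5 to the right of each parameter
delta_v = 5.0     # assumed speed increment (not specified in this copy of the text)
iters = 50
threshold = 1e-4

directions = [s for s in range(-a, a + 1) if s != 0]        # 2a = 10 movement directions

def velocity(n):
    """Speed value for iteration n: V_n = V_1 + (n - 1) * delta_v."""
    return V1 + (n - 1) * delta_v

def candidates(P, Vn):
    """Candidate parameter vectors reachable from base point P in one round.

    Each parameter i takes the neighborhood values P[i] + Vn[i] * s, and the
    solution space is the Cartesian product over the 7 parameters, i.e. (2a)**7
    candidate vectors per base point (enumerated lazily here)."""
    per_param = [[P[i] + Vn[i] * s for s in directions] for i in range(len(P))]
    for combo in product(*per_param):
        yield np.array(combo)
```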
S4, updating the camera parameters. According to the update speed and direction of each camera parameter, the parameters are updated as P' = P + V_n * s, where P' is the updated camera parameter and P is the camera parameter before the update.
S5, calculating the camera projection matrix ProjectionMatrix.
According to the principle of three-dimensional imaging, the coordinates of an object in camera space equal its world coordinates multiplied by the camera's projection matrix. The projection matrix is closely tied to the view frustum: every update of the camera parameters produces a new frustum, so the projection matrix changes along with the camera parameters. The formula used to calculate the camera projection matrix in this embodiment is:
where Aspect is the aspect ratio of the camera, with value:
or alternatively
where the near plane height is:
and the far plane height is:
far denotes the distance from the far plane to the point O, near denotes the distance from the near plane to the point O, near plane width denotes the width of the near plane, and far plane width denotes the width of the far plane.
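Because the matrix and the Aspect/plane-height formulas above are reproduced only as images in the source, the following sketch substitutes a standard perspective projection built from fov, Aspect, near and far. It follows the row-vector convention Position' = Position * ProjectionMatrix used by the patent and should be read as an assumed stand-in rather than the patent's exact formula.

```python
import numpy as np

def projection_matrix(fov_deg, aspect, near, far):
    """Perspective projection from the frustum parameters (assumed stand-in).

    fov is the vertical viewing angle, aspect = near plane width / near plane height
    (equivalently far plane width / far plane height), and near/far are the
    distances of the near and far planes from the camera position O.
    Written for the row-vector convention Position' = Position * ProjectionMatrix.
    """
    f = 1.0 / np.tan(np.radians(fov_deg) / 2.0)   # cot(fov / 2)
    return np.array([
        [f / aspect, 0.0, 0.0,                               0.0],
        [0.0,        f,   0.0,                               0.0],
        [0.0,        0.0, -(far + near) / (far - near),     -1.0],
        [0.0,        0.0, -2.0 * far * near / (far - near),  0.0],
    ])

# Frustum geometry used for the aspect ratio (standard relations, assumed here):
# near_plane_height = 2 * near * tan(fov / 2), far_plane_height = 2 * far * tan(fov / 2),
# near_plane_width = aspect * near_plane_height, far_plane_width = aspect * far_plane_height.
```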
S6, calculating the objective function from the camera projection matrix and the three-dimensional coordinates of the feature points, the objective function being the camera-space coordinates:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k.
S7, calculating the fitness function.
In this embodiment, the fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points.
S8, judging whether the fitness function exceeds the set threshold. If it does, step S9 is executed; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result and the algorithm ends; the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene.
The value of the fitness function (abbreviated as fitness value) in steps S7 and S8 is an indicator for measuring the error between the current camera parameter and the actual camera parameter. The smaller the error, the better the parameter matching effect, and vice versa.
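A sketch of the fitness evaluation in steps S6-S8 follows. The viewport mapping from normalized coordinates to video pixels and the use of the Euclidean distance as the per-point error are assumptions, since the corresponding formulas appear only as images in the source text.

```python
import numpy as np

def to_image_coords(points_3d, proj, width, height):
    """Project 3D feature points to 2D coordinates (m', n') in the video frame.

    `proj` is assumed to be the combined world-to-camera transform (built from
    O and C) and perspective projection, applied with the row-vector convention
    Position' = Position * ProjectionMatrix, followed by perspective division
    and a conventional viewport transform to pixels (assumed here).
    """
    ones = np.ones((points_3d.shape[0], 1))
    clip = np.hstack([points_3d, ones]) @ proj     # homogeneous row vectors
    ndc = clip[:, :2] / clip[:, 3:4]               # perspective divide
    m = (ndc[:, 0] + 1.0) * 0.5 * width            # assumed viewport mapping
    n = (1.0 - ndc[:, 1]) * 0.5 * height
    return np.stack([m, n], axis=1)

def fitness(observed_2d, points_3d, proj, width, height):
    """Average error between the real (m_i, n_i) and the computed (m'_i, n'_i),
    per step S7; the Euclidean point distance is assumed, since the exact error
    expression appears only as an image in the source."""
    predicted = to_image_coords(points_3d, proj, width, height)
    return float(np.mean(np.linalg.norm(observed_2d - predicted, axis=1)))
```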
S9, judging whether all parameters have completed the current round of iteration; if so, step S10 is executed, otherwise steps S4 to S8 are executed.
S10, generating the solution space of all parameters from the results of the current round of iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking these n groups of solutions as the search base points for the next round of iteration.
The purpose of step S10 is to generate the solution space of all parameters through the current round of iteration and to select from it the several groups of parameters most likely to be the optimal solution, as the search base points for the next round of parameter iteration. When selecting the candidate optimal solutions, screening follows the principle that the smaller the fitness value, the better the parameter matching effect; this is because the graph of the fitness function is U-shaped with a single minimum of 0, so the smaller the fitness value, the smaller the error of the matched parameters. To reduce the number of algorithm iterations while widening the search area, the fitness values can be sorted and the several groups of parameters with the smallest fitness values taken as candidate solutions; the speed and direction of parameter movement are then updated and the solution space of each group of candidate solutions is searched.
S11, judging whether the number of algorithm iterations iters has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, and the algorithm ends; the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene. Otherwise, the speed value V_n is updated and steps S4 to S10 are performed again.
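Putting steps S4-S11 together, a compact sketch of the heuristic search loop might look like the following. The helpers velocity, candidates and fitness are the assumed sketches given earlier, and build_projection(P) is a hypothetical helper standing for the construction of the frustum/view/projection transform from a parameter vector P, whose exact formula is not reproduced in this text.

```python
import numpy as np

def heuristic_search(P0, observed_2d, points_3d, width, height,
                     build_projection, n_best=5, iters=50, threshold=1e-4):
    """Heuristic camera-parameter search (steps S4-S11), sketched under the
    assumptions stated above. Returns the best parameter vector found and its
    fitness value."""
    base_points = [np.asarray(P0, dtype=float)]
    best_P, best_fit = np.asarray(P0, dtype=float), np.inf

    for n in range(1, iters + 1):                      # S11: iteration budget
        Vn = velocity(n)                               # updated speed value V_n
        scored = []
        for P in base_points:
            for cand in candidates(P, Vn):             # S4: P' = P + V_n * s
                proj = build_projection(cand)          # S5
                fit = fitness(observed_2d, points_3d, proj, width, height)  # S6-S7
                if fit <= threshold:                   # S8: within threshold, done
                    return cand, fit
                scored.append((fit, cand))
        scored.sort(key=lambda t: t[0])                # S10: rank by fitness value
        base_points = [c for _, c in scored[:n_best]]  # keep n best as next base points
        if scored[0][0] < best_fit:
            best_fit, best_P = scored[0]
    return best_P, best_fit                            # S11: best solution after iters rounds
```

Enumerating the full (2a)^7 neighborhood per base point, as the description states, quickly becomes expensive; a practical implementation would usually sample or prune the candidate set, which would be a deviation from the literal text.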
Example 2
Based on the same inventive concept as Embodiment 1, this embodiment provides a heuristic-algorithm-based video and urban information model three-dimensional scene fusion system, which comprises the following modules:
a feature point file generation module, used for implementing step S1: calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file;
an initialization module, used for implementing step S2: initializing the view frustum and camera parameters;
a parameter setting module, used for implementing step S3: setting the update speed, update direction and number of algorithm iterations for the camera parameters;
a parameter updating module, used for implementing step S4: updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
a computing module, used for implementing steps S5 to S7: computing the camera projection matrix and the fitness function, and computing the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
The fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
a fitness judging module, used for implementing step S8: judging whether the fitness function exceeds the set threshold; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result to obtain the camera position, the actual imaging point and the camera viewing angle, realizing the fusion of the video scene and the three-dimensional scene;
an iteration completion judging module, used for implementing steps S9 to S10: judging whether all parameters have completed the current round of iteration; if so, the solution space of all parameters is generated from the results of this round, and the n groups of solutions with the smallest fitness values are screened out as candidate optimal solutions to serve as the search base points for the next round of iteration; otherwise, control returns to the parameter updating module;
an iteration number judging module, used for implementing step S11: judging whether the number of algorithm iterations set by the parameter setting module has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, the speed value V_n is updated and control returns to the parameter updating module.
Since the technical solution of this system corresponds to the method of Embodiment 1, the description of this embodiment is relatively brief; for the corresponding technical features, refer to the description of the individual steps in Embodiment 1, which is not repeated here.
Example 3
Based on the same inventive concept as Embodiment 1, this embodiment proposes a corresponding storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the steps of the three-dimensional scene fusion method of Embodiment 1.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (9)

1. A video and urban information model three-dimensional scene fusion method, characterized by comprising the following steps:
S1, calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and generating a feature point file;
S2, initializing the view frustum and camera parameters;
S3, setting the update speed, update direction and number of algorithm iterations for the camera parameters;
S4, updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
S5, calculating the camera projection matrix ProjectionMatrix;
S6, calculating the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
S7, calculating the fitness function; the fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
S8, judging whether the fitness function exceeds a set threshold; if it does, executing step S9; if it does not, outputting the currently matched camera parameters as the global optimal solution and taking that output as the matching result, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene;
S9, judging whether all parameters have completed the current round of iteration; if so, executing step S10, otherwise executing steps S4 to S8;
S10, generating the solution space of all parameters from the results of the current round of iteration, screening out the n groups of solutions with the smallest fitness values as candidate optimal solutions, and taking these n groups of solutions as the search base points for the next round of iteration;
S11, judging whether the number of algorithm iterations iters has been reached; if so, screening out the solution with the smallest fitness value from the current candidate optimal solutions of the camera parameters and outputting it as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, updating the speed value V_n and performing steps S4 to S10 again;
in step S3, the movement speed of each camera parameter is set to V and the movement direction to s; the movement speed after updating is expressed as V_n = V_1 + (n-1) * Δv, where n is the number of algorithm iterations and Δv is the magnitude of the speed update; the movement direction is s ∈ [-a, a] with a an integer, i.e. s = -a, -a+1, -a+2, ..., a-2, a-1, a; and the search neighborhood of each camera parameter is [-(V*a), V*a];
in step S2, the view frustum takes the camera position as its origin and is the frustum-shaped three-dimensional space bounded by the viewing direction OB, the vertical viewing angle fov, the far plane and the near plane; the key parameters of the view frustum include the camera position O = (O_x, O_y, O_z), the viewing direction OB and the vertical viewing angle fov, the intersection of the viewing direction OB with the three-dimensional scene being C = (C_x, C_y, C_z), and the initial camera parameters are expressed as the initial frustum parameters:
P = (C_x, C_y, C_z, O_x, O_y, O_z, fov)
where C_x, C_y, C_z are the coordinates of the intersection point C, O_x, O_y, O_z are the coordinates of the camera position O, and fov is the vertical viewing angle of the camera.
2. The video and urban information model three-dimensional scene fusion method according to claim 1, wherein the formula for calculating the camera projection matrix in step S5 is:
where Aspect is the aspect ratio of the camera, with value:
where the near plane height is:
and the far plane height is:
far denotes the distance from the far plane to the point O, near denotes the distance from the near plane to the point O, near plane width denotes the width of the near plane, and far plane width denotes the width of the far plane.
3. The video and urban information model three-dimensional scene fusion method according to claim 1, wherein in step S10, when selecting the candidate optimal solutions, screening is performed according to the principle that the smallest fitness value gives the best parameter matching effect.
4. The video and urban information model three-dimensional scene fusion method according to claim 1, wherein in step S1, homonymous points are marked in the video frame and the three-dimensional scene view, each pair of homonymous points being a group of feature points comprising the coordinate Position = (X_i, Y_i, Z_i) in the three-dimensional scene and the two-dimensional image coordinate Position = (m_i, n_i) in the video, where i = 1, 2, 3, ..., k, and all feature point pairs make up one feature point file.
5. A video and city information model three-dimensional scene fusion system, characterized by comprising:
a feature point file generation module, used for calibrating spatial feature points from the video frame and its coordinate file and the three-dimensional scene view and its coordinate file, and for generating a feature point file;
an initialization module, used for initializing the view frustum and camera parameters;
a parameter setting module, used for setting the update speed, update direction and number of algorithm iterations for the camera parameters;
a parameter updating module, used for updating the camera parameters as P' = P + V_n * s according to the update speed and direction of each camera parameter, where P' is the updated camera parameter, P is the camera parameter before the update, V_n is the speed value after the speed update, and s is the movement direction of the parameter;
a computing module, used for computing the camera projection matrix ProjectionMatrix and the fitness function, and for computing the camera-space coordinates from the camera projection matrix and the three-dimensional coordinates of the feature points:
Position' = Position_3D * ProjectionMatrix
where Position_3D denotes the three-dimensional coordinates of a feature point, and the computed camera-space coordinate is Position' = (m'_i, n'_i), i = 1, 2, 3, ..., k;
The fitness function is defined as the average error between the real camera-space coordinates of the feature points and the solved camera-space coordinates:
where (m_i, n_i) are the real camera-space coordinates of a feature point, (m'_i, n'_i) are the computed camera-space coordinates, and k is the number of feature points;
a fitness judging module, used for judging whether the fitness function exceeds the set threshold; if it does not, the currently matched camera parameters are the global optimal solution, which is output as the matching result to obtain the camera position, the actual imaging point and the camera viewing angle, realizing the fusion of the video scene and the three-dimensional scene;
an iteration completion judging module, used for judging whether all parameters have completed the current round of iteration; if so, the solution space of all parameters is generated from the results of this round, and the n groups of solutions with the smallest fitness values are screened out as candidate optimal solutions to serve as the search base points for the next round of iteration; otherwise, control returns to the parameter updating module;
an iteration number judging module, used for judging whether the number of algorithm iterations set by the parameter setting module has been reached; if so, the solution with the smallest fitness value is screened out from the current candidate optimal solutions of the camera parameters and output as the optimal solution, from which the camera position, the actual imaging point and the camera viewing angle are obtained, realizing the fusion of the video scene and the three-dimensional scene; otherwise, the speed value V_n is updated and control returns to the parameter updating module;
wherein the parameter setting module sets the movement speed of each camera parameter to V and the movement direction to s; the movement speed after updating is expressed as V_n = V_1 + (n-1) * Δv, where n is the number of algorithm iterations and Δv is the magnitude of the speed update; the movement direction is s ∈ [-a, a] with a an integer, i.e. s = -a, -a+1, -a+2, ..., a-2, a-1, a; and the search neighborhood of each camera parameter is [-(V*a), V*a];
in the initialization module, the view frustum takes the camera position as its origin and is the frustum-shaped three-dimensional space bounded by the viewing direction OB, the vertical viewing angle fov, the far plane and the near plane; the key parameters of the view frustum include the camera position O = (O_x, O_y, O_z), the viewing direction OB and the vertical viewing angle fov, the intersection of the viewing direction OB with the three-dimensional scene being C = (C_x, C_y, C_z), and the initial camera parameters are expressed as the initial frustum parameters:
P = (C_x, C_y, C_z, O_x, O_y, O_z, fov)
where C_x, C_y, C_z are the coordinates of the intersection point C, O_x, O_y, O_z are the coordinates of the camera position O, and fov is the vertical viewing angle of the camera.
6. The video and urban information model three-dimensional scene fusion system according to claim 5, wherein the formula for calculating the camera projection matrix in the computing module is:
where Aspect is the aspect ratio of the camera, with value:
or alternatively
where the near plane height is:
and the far plane height is:
far denotes the distance from the far plane to the point O, near denotes the distance from the near plane to the point O, near plane width denotes the width of the near plane, and far plane width denotes the width of the far plane.
7. The system according to claim 5, wherein the iteration completion judging module, when selecting the candidate optimal solutions, performs the screening according to the principle that the smallest fitness value gives the best parameter matching effect.
8. The video and urban information model three-dimensional scene fusion system according to claim 5, wherein the feature point file generation module marks homonymous points in the video frame and the three-dimensional scene view, each pair of homonymous points being a group of feature points comprising the coordinate Position = (X_i, Y_i, Z_i) in the three-dimensional scene and the two-dimensional image coordinate Position = (m_i, n_i) in the video, where i = 1, 2, 3, ..., k, and all feature point pairs make up one feature point file.
9. A storage medium having stored thereon computer instructions which, when executed by a processor, perform the steps of the video and urban information model three-dimensional scene fusion method of any of claims 1-4.
CN202111591333.2A 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium Active CN114255285B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111591333.2A CN114255285B (en) 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium
PCT/CN2022/137042 WO2023116430A1 (en) 2021-12-23 2022-12-06 Video and city information model three-dimensional scene fusion method and system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111591333.2A CN114255285B (en) 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium

Publications (2)

Publication Number Publication Date
CN114255285A CN114255285A (en) 2022-03-29
CN114255285B true CN114255285B (en) 2023-07-18

Family

ID=80797196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111591333.2A Active CN114255285B (en) 2021-12-23 2021-12-23 Video and urban information model three-dimensional scene fusion method, system and storage medium

Country Status (2)

Country Link
CN (1) CN114255285B (en)
WO (1) WO2023116430A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255285B (en) * 2021-12-23 2023-07-18 奥格科技股份有限公司 Video and urban information model three-dimensional scene fusion method, system and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447869B (en) * 2015-11-30 2019-02-12 四川华雁信息产业股份有限公司 Camera self-calibration method and device based on particle swarm optimization algorithm
CN108537876B (en) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium
CN109035394B (en) * 2018-08-22 2023-04-07 广东工业大学 Face three-dimensional model reconstruction method, device, equipment and system and mobile terminal
CN110648363A (en) * 2019-09-16 2020-01-03 腾讯科技(深圳)有限公司 Camera posture determining method and device, storage medium and electronic equipment
CN111582022B (en) * 2020-03-26 2023-08-29 深圳大学 Fusion method and system of mobile video and geographic scene and electronic equipment
CN111640181A (en) * 2020-05-14 2020-09-08 佳都新太科技股份有限公司 Interactive video projection method, device, equipment and storage medium
CN111836012B (en) * 2020-06-28 2022-05-13 航天图景(北京)科技有限公司 Video fusion and video linkage method based on three-dimensional scene and electronic equipment
CN112053446B (en) * 2020-07-11 2024-02-02 南京国图信息产业有限公司 Real-time monitoring video and three-dimensional scene fusion method based on three-dimensional GIS
CN112258587B (en) * 2020-10-27 2023-07-07 上海电力大学 Camera calibration method based on gray wolf particle swarm mixing algorithm
CN112927353B (en) * 2021-02-25 2023-05-19 电子科技大学 Three-dimensional scene reconstruction method, storage medium and terminal based on two-dimensional target detection and model alignment
CN113658263B (en) * 2021-06-17 2023-10-31 石家庄铁道大学 Visual scene-based electromagnetic interference source visual labeling method
CN114255285B (en) * 2021-12-23 2023-07-18 奥格科技股份有限公司 Video and urban information model three-dimensional scene fusion method, system and storage medium

Also Published As

Publication number Publication date
WO2023116430A1 (en) 2023-06-29
CN114255285A (en) 2022-03-29


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant