WO2022078240A1 - Camera precise positioning method applied to electronic map, and processing terminal - Google Patents


Info

Publication number
WO2022078240A1
WO2022078240A1 · PCT/CN2021/122607 · CN2021122607W
Authority
WO
WIPO (PCT)
Prior art keywords
camera
feature point
electronic map
dimensional
shooting
Prior art date
Application number
PCT/CN2021/122607
Other languages
French (fr)
Chinese (zh)
Inventor
高星 (Gao Xing)
徐建明 (Xu Jianming)
石立阳 (Shi Liyang)
Original Assignee
佳都科技集团股份有限公司 (PCI Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 佳都科技集团股份有限公司 (PCI Technology Group Co., Ltd.)
Publication of WO2022078240A1 publication Critical patent/WO2022078240A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Instructional Devices (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

A camera precise positioning method applied to an electronic map, and a processing terminal. The method comprises: step 1: obtaining image data of a target region; step 2: generating a three-dimensional model of the target region; step 3: establishing a bag of feature-point words and generating a sparse three-dimensional point cloud from the first image feature points; step 4: obtaining footage and position information of a camera to be positioned, obtaining a target video frame, and extracting second image feature points; step 5: matching feature points and positions to find the captured picture ref; step 6: selecting an initial intrinsic-parameter value to obtain the initial extrinsic parameters of the target video frame; step 7: computing a reprojection error according to the initial extrinsic parameters; step 8: adding a preset value b to the initial intrinsic value to obtain a new initial intrinsic value; and step 9: repeating steps 6 to 8 to obtain the intrinsic parameters, attitude and three-dimensional position corresponding to the minimum reprojection error. The invention makes it possible to determine the specific height of the camera, whether it is indoors or outdoors, and the specific direction the camera faces.

Description

Camera precise positioning method applied to an electronic map, and processing terminal

TECHNICAL FIELD

The present invention relates to the technical field of electronic maps, and in particular to a camera precise positioning method and a processing terminal for use in an electronic map.

BACKGROUND

In a city there are hundreds of thousands of surveillance cameras; even the cameras used by government agencies for public surveillance number in the tens of thousands. For surveillance cameras used in traffic, policing and security applications, it is often necessary, when these cameras are called up on an electronic map, to determine which cameras can cover a target area. Because the exact position, attitude, focal length, distortion parameters and other properties of a camera are difficult to determine, this is often hard to achieve. The ability to position cameras accurately has therefore become an urgent need in the security industry.

In current practice, the GPS coordinates of a camera are usually determined by geographic surveying; for example, the one-camera-one-file module of a networked security surveillance video platform stores each camera's location information (latitude and longitude). This positioning approach entails an enormous workload: professional surveying equipment such as differential GPS receivers and total stations is required, and every camera must be surveyed manually. As a result, in current electronic maps, and three-dimensional electronic maps in particular, the latitude and longitude of a camera are usually marked by hand. Worse still, the latitude and longitude of the object on which the camera is mounted (usually a building) are often used in place of the camera's own coordinates; this is in essence only an approximate position, and accurate positioning cannot be achieved.

With current camera positioning methods, an electronic map cannot reveal a camera's specific height, whether it is indoors or outdoors, or the specific direction it faces. Nor can an algorithm automatically determine, for a given target area, which cameras can cover it, or automatically compute whether other usable surveillance camera resources appear in a particular surveillance video frame, which would enable high-point/low-point linked video jumping.

SUMMARY OF THE INVENTION

In view of the deficiencies of the prior art, a first objective of the present invention is to provide a camera precise positioning method for use in an electronic map, capable of determining the accurate position, attitude and intrinsic parameters of cameras within a target area based on visual coverage in the electronic map;

a second objective of the present invention is to provide a processing terminal capable of solving the same problem.

A technical solution achieving the first objective of the present invention is a camera precise positioning method applied to an electronic map, comprising the following steps:

Step 1: generate a three-dimensional model of the target area from image data of the target area;

Step 2: extract the first image feature points of each captured picture from the image data, and generate a sparse three-dimensional point cloud from the first image feature points in the three-dimensional model;

Step 3: obtain footage from the camera to be positioned, extract any one video frame from the footage, record it as the target video frame, and extract the second image feature points of the target video frame;

Step 4: compare the target video frame with each captured picture of the image data, take the captured picture corresponding to the first image feature points with the highest matching degree as captured picture ref, and establish mappings between the pixel coordinates of the feature points of captured picture ref and, respectively, the pixel coordinates of each feature point of the target video frame and the three-dimensional coordinates of the sparse three-dimensional point cloud;

Step 5: select an initial intrinsic-parameter value for the camera to be positioned, and determine the initial extrinsic parameters of the target video frame from the initial intrinsic value;

Step 6: using the initial extrinsic parameters, perform bundle adjustment with all captured pictures of the image data, and compute the reprojection error;

Step 7: update the initial intrinsic value to obtain a new initial intrinsic value;

Step 8: repeat steps 5 to 7 until the current new initial intrinsic value exceeds a preset value, then stop; compute the reprojection error corresponding to each initial intrinsic value, thereby obtaining the intrinsic parameters, attitude and three-dimensional position corresponding to the minimum reprojection error.
Further, before step 1 is performed, the method also includes step 0: photograph the target area to obtain image data covering the target area, together with the shooting position, attitude and intrinsic/extrinsic parameters of the capture camera.

Further, in step 0 the target area is covered by a roving capture survey, which comprises oblique photography by an aerial drone and photography by a road-surface capture vehicle; alternatively, the drone photographs at high altitude and at low altitude respectively.

Further, in step 2 the first image feature points are corner points of road signs and markings, or general computer vision feature points.

Further, establishing the mappings between the pixel coordinates of the feature points of captured picture ref and, respectively, the pixel coordinates of each feature point of the target video frame and the three-dimensional coordinates of the sparse three-dimensional point cloud specifically means: the pixel coordinates of each feature point of the target video frame correspond to the pixel coordinates of feature points of captured picture ref, and the pixel coordinates of the feature points of captured picture ref in turn correspond to three-dimensional coordinates in the sparse three-dimensional point cloud.

Further, comparing the target video frame with each captured picture of the image data and finding the first image feature points with the highest matching degree to the second image feature points specifically means: a bag of feature-point words is built for each captured picture from its first image feature points; through the bag of feature-point words, the target video frame is matched in feature points and positions against each captured picture of the image data, and the first image feature points with the highest matching degree to the second image feature points are found.

Further, the preset value in step 8 is the range of the field of view of the camera to be positioned.

Further, the starting initial intrinsic value is the lower limit of the field-of-view range of the camera to be positioned, and the initial intrinsic value is updated by adding b = 5° each time.

Further, after step 8 the method also includes:

Step 9: according to the finally computed intrinsic parameters, attitude and three-dimensional position of the camera to be positioned, map them into the three-dimensional model of step 1, and, through spatial analysis and viewshed analysis of the three-dimensional map, determine the specific height of the camera to be positioned in the three-dimensional electronic map, whether it is indoors or outdoors, and whether a given target area in the three-dimensional electronic map can be covered by the camera to be positioned.

A technical solution achieving the second objective of the present invention is a processing terminal, comprising:

a memory for storing program instructions;

a processor for running the program instructions to execute the steps of the camera precise positioning method applied to an electronic map.

The beneficial effects of the present invention are as follows: the invention can determine, in an electronic three-dimensional map, the specific height of a camera, whether it is indoors or outdoors, and the specific direction it faces; it can automatically determine, for a given target area, which cameras can cover it; and it can automatically compute whether other usable surveillance camera resources appear in a particular surveillance video frame, enabling subsequent high-point/low-point linked video jumping applications.

DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic flowchart of a preferred embodiment;

FIG. 2 is a schematic diagram of the processing terminal.

DETAILED DESCRIPTION

To make the objectives, technical solutions and advantages of the present application clearer, specific embodiments of the present application are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here merely explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the present application rather than the entire content. Before the exemplary embodiments are discussed in greater detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes operations (or steps) as sequential processing, many of the operations may be performed in parallel, concurrently or simultaneously, and the order of the operations may be rearranged. The processing may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
As shown in FIG. 1, a camera precise positioning method applied to an electronic map includes the following steps:

Step 1: conduct a roving capture survey of the target area to obtain image data covering the target area and the intrinsic/extrinsic parameters of the capture camera. The target area is usually an urban district of a city or a specific designated area.

In this step, the roving capture survey comprises oblique photography by an aerial drone and photography by a road-surface capture vehicle, such as a street-view capture vehicle, so that the three-dimensional position information and road-surface information of every target object in the target area can be obtained; the road-surface information includes road signs and markings. This combined capture scheme yields the three-dimensional position information and road-surface information of every target object in the target area (including the road surface itself), laying the foundation for the subsequent conversion into a three-dimensional model. Of course, a drone alone or a capture vehicle alone may also be used; a capture vehicle can likewise record the three-dimensional position information and road-surface information of target objects. When only a drone is used, it can photograph at both high altitude and low altitude so as to better obtain the three-dimensional position information of target objects and capture the road surface clearly. High altitude and low altitude here refer only to the relative heights of the drone's shooting positions and do not limit the specific heights.

Step 2: generate a three-dimensional model of the target area from the obtained image data. From the three-dimensional model, the shooting position, attitude and intrinsic/extrinsic parameters of the capture camera at any shooting point in the target area can be obtained.

The capture camera is the camera used during the actual roving survey: for a drone, it is the camera carried on the drone itself; for a street-view capture vehicle, it is the camera carried on the vehicle. Generating the three-dimensional model of the target area from the image data can be done with existing photogrammetric modeling software.

Step 3: extract the first image feature points of each captured picture from the image data, build a bag of feature-point words for each captured picture from its first image feature points, and, using the intrinsic and extrinsic parameters of the capture camera, generate a sparse three-dimensional point cloud from the first image feature points in the three-dimensional model by triangulation; the three-dimensional model here refers to the three-dimensional electronic map.

In this step, the first image feature points may be corner points of road signs and markings extracted by image recognition, or general computer vision feature points of any one of ORB, SIFT and SURF.
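By way of illustration only (this sketch is not part of the patent), the feature extraction named above can be reproduced with OpenCV. ORB is used here, and cv2.SIFT_create() could be substituted; the function name and parameter values are assumptions of the example.

```python
import cv2

def extract_features(image_path, n_features=2000):
    """Detect keypoints and descriptors in one captured picture.

    ORB is one of the general feature types the description lists
    (ORB/SIFT/SURF); n_features=2000 is an arbitrary example value.
    """
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    return keypoints, descriptors
```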
Step 4: obtain the footage and position information of the camera to be positioned. The footage is usually video captured by the camera; any one video frame is taken from the footage and its second image feature points are extracted, and the video frame taken from the footage of the camera to be positioned is recorded as the target video frame. The position information of the camera to be positioned can be obtained from a one-camera-one-file module and is only rough latitude and longitude. The camera to be positioned is usually fixed to a building in some target area; its position is fixed rather than moving, so its shooting angle is usually unchanged. Any single video frame can therefore be taken from the footage, without extracting every frame, and the second image feature points are extracted from that frame.

Here, the second image feature points are of the same type as the first image feature points: if the first image feature points are corner points of road signs and markings, so are the second; if the first image feature points are ORB features, so are the second; and likewise for the other types.

Step 5: using the bag of feature-point words, match the target video frame in feature points and positions against each captured picture of the image data, and find the first image feature points with the highest matching degree to the second image feature points, thereby obtaining the captured picture corresponding to the best-matching first image feature points; this captured picture is recorded as captured picture ref. That is, the second image feature points are matched and compared against the first image feature points of each captured picture in the bag of feature-point words, the first image feature points with the highest matching degree are found, and the captured picture to which they belong is the captured picture ref sought in this step.

Through the matching result of this step, the pixel coordinates of each feature point of the target video frame (i.e., two-dimensional image coordinates) correspond to the pixel coordinates of feature points of captured picture ref, and the pixel coordinates of the feature points of captured picture ref in turn correspond to three-dimensional coordinates in the sparse three-dimensional point cloud; the three-dimensional coordinates are geographic coordinates including height.
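As a minimal sketch of chaining these correspondences (again illustrative, not the patent's implementation), a brute-force matcher with Lowe's ratio test can link target-frame feature points to the point cloud through picture ref; the ref_point3d dictionary layout is an assumption of the example.

```python
import cv2
import numpy as np

def build_2d3d_correspondences(desc_frame, kp_frame, desc_ref, ref_point3d):
    """Match target-frame descriptors against captured picture ref, then
    chain each match through ref's feature index to its triangulated
    3D geographic coordinate (ref_point3d: ref keypoint index -> XYZ)."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)       # Hamming distance suits ORB
    matches = matcher.knnMatch(desc_frame, desc_ref, k=2)
    pts2d, pts3d = [], []
    for pair in matches:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.75 * n.distance and m.trainIdx in ref_point3d:
            pts2d.append(kp_frame[m.queryIdx].pt)   # pixel coords in target frame
            pts3d.append(ref_point3d[m.trainIdx])   # mapped point-cloud coords
    return np.float32(pts2d), np.float32(pts3d)
```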
Step 6: select an initial intrinsic value within the field-of-view range of the camera to be positioned; the starting value is preferably the lower limit of that range. A camera's field-of-view range is set at the factory and currently usually lies between 30° and 150°, so the starting initial intrinsic value may be chosen as 30°. From this initial intrinsic value and the pixel coordinates and three-dimensional coordinates of the feature points matched for the target video frame, the initial extrinsic parameters of the target video frame are obtained with the PnP algorithm.
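The following sketch shows one plausible way to turn a field-of-view guess into a camera matrix and run PnP with OpenCV; square pixels, a centered principal point and zero distortion are simplifying assumptions of the example, not statements of the patent.

```python
import numpy as np
import cv2

def intrinsics_from_fov(fov_deg, width, height):
    """Pinhole camera matrix from a horizontal field-of-view guess:
    f = W / (2 * tan(FOV / 2))."""
    f = width / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])

def initial_extrinsics(pts3d, pts2d, K):
    """Initial pose of the target video frame via PnP; RANSAC tolerates
    residual mismatches from the bag-of-words matching step."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return rvec, tvec
```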
Step 7: using the initial extrinsic parameters, perform bundle adjustment with all captured pictures of the image data (that is, optimize the intrinsic and extrinsic parameters through bundle adjustment), and compute the reprojection error.
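The reprojection error that scores each candidate can be computed as below; this is the standard definition (mean pixel residual), offered as an illustration rather than the patent's exact metric.

```python
import numpy as np
import cv2

def reprojection_error(pts3d, pts2d, rvec, tvec, K, dist=None):
    """Mean pixel distance between the observed feature points and the
    projection of their 3D points under the current parameters."""
    proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, dist)
    return float(np.linalg.norm(proj.reshape(-1, 2) - pts2d, axis=1).mean())
```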
Step 8: add a preset value b to the previous initial intrinsic value to obtain a new initial intrinsic value.

In this step, b = 5°. If the camera's field-of-view range is 30-150° and the initial intrinsic value is chosen as 30°, this is equivalent to selecting an initial intrinsic value every 5°. Steps 6 and 7 are then repeated to obtain the initial extrinsic parameters corresponding to each initial intrinsic value, and the reprojection error corresponding to each set of initial extrinsic parameters, stopping when the new initial intrinsic value exceeds 150°; that is, the initial intrinsic value is confined to the camera's field-of-view range, so that the smallest reprojection error can be found among all the resulting reprojection errors.

Step 9: repeat steps 6 to 8 until the current new initial intrinsic value exceeds the field-of-view range of the camera to be positioned, then stop; compute the reprojection error corresponding to each initial intrinsic value, thereby obtaining the intrinsic parameters, attitude and three-dimensional position corresponding to the minimum reprojection error.
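Putting steps 6 to 9 together gives a simple grid search over the field of view. The sketch below reuses the illustrative helpers defined above (intrinsics_from_fov, initial_extrinsics, reprojection_error) and, as a further simplification, scores each candidate with the PnP pose alone rather than a full bundle adjustment against all captured pictures.

```python
def sweep_fov(pts3d, pts2d, width, height, fov_lo=30, fov_hi=150, step=5):
    """Try an initial intrinsic value every `step` degrees across the
    camera's field-of-view range and keep the minimum-error candidate."""
    best = None
    for fov in range(fov_lo, fov_hi + 1, step):
        K = intrinsics_from_fov(fov, width, height)
        rvec, tvec = initial_extrinsics(pts3d, pts2d, K)
        err = reprojection_error(pts3d, pts2d, rvec, tvec, K)
        if best is None or err < best[0]:
            best = (err, fov, rvec, tvec)
    return best  # minimum reprojection error, FOV, attitude and position
```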
Here, if the camera to be positioned is a static camera, i.e., a camera that cannot rotate and can only shoot in one specified direction, its position and attitude are obtained directly from the minimum reprojection error. If the camera to be positioned is a rotating camera, i.e., a camera that can rotate to shoot in multiple directions, the attitude at the PT0 point is computed from the PTZ value corresponding to the video frame extracted in step 4, while the position remains unchanged, thereby giving the position and attitude information of the camera to be positioned.
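One plausible reading of the PT0 computation is sketched below, under the assumptions that pan is a rotation about the camera's vertical axis and tilt about its horizontal axis; axis conventions vary by vendor and are not specified by the patent.

```python
import numpy as np
import cv2

def rot_y(deg):
    """Rotation about the y (pan) axis."""
    a = np.radians(deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

def rot_x(deg):
    """Rotation about the x (tilt) axis."""
    a = np.radians(deg)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a), np.cos(a)]])

def attitude_at_pt0(rvec_frame, pan_deg, tilt_deg):
    """Undo the sampled frame's pan/tilt to recover the base attitude at
    P = 0, T = 0; the camera's 3D position is left unchanged."""
    R_frame, _ = cv2.Rodrigues(rvec_frame)
    R_base = R_frame @ np.linalg.inv(rot_x(tilt_deg) @ rot_y(pan_deg))
    return cv2.Rodrigues(R_base)[0]
```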
Step 10: according to the finally computed intrinsic parameters, attitude and three-dimensional position of the camera to be positioned, map them into the three-dimensional model of step 2, and, through spatial analysis and viewshed analysis of the three-dimensional map, effectively determine the specific height of the camera to be positioned in the three-dimensional electronic map and whether it is indoors or outdoors, as well as whether a given target area in the three-dimensional electronic map can be covered by the camera.
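A crude line-of-sight check against the reconstructed city mesh illustrates one ingredient of such a viewshed analysis; the trimesh library and this covered/not-covered criterion are choices of the example (a full analysis would also test the field of view and image footprint).

```python
import numpy as np
import trimesh

def can_see(mesh, cam_pos, target_pos):
    """Target is visible if the camera-to-target ray hits no mesh
    geometry closer than the target itself."""
    cam = np.asarray(cam_pos, dtype=float)
    direction = np.asarray(target_pos, dtype=float) - cam
    dist = np.linalg.norm(direction)
    hits, _, _ = mesh.ray.intersects_location([cam], [direction / dist])
    if len(hits) == 0:
        return True
    return np.linalg.norm(hits - cam, axis=1).min() >= dist - 1e-6
```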
The present invention realizes automatic positioning of surveillance cameras on a high-precision three-dimensional visual map (that is, the electronic three-dimensional map), extracts the greatest value from drone oblique photography and capture-vehicle street-view data, and remedies the defects of conventional three-dimensional map models in actual use. Based on the present invention, surveillance video users can determine the accurate positions, attitudes and intrinsic parameters (such as FOV and distortion parameters) of all surveillance cameras within the coverage of the visual map; only then can applications built on the accurate visible region of the video, such as global situational awareness, panoramic backtracking, alarm correlation, point-and-control, person trajectories, high-low linkage and fixed/PTZ (gun-ball) camera linkage, be deployed at scale.

As shown in FIG. 2, the present invention further provides a processing terminal 100, comprising:

a memory 101 for storing program instructions;

a processor 102 for running the program instructions to execute the steps of the camera precise positioning method applied to an electronic map.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by that processor produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture including an instruction apparatus which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on it to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art may make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

  1. 一种应用于电子地图中的摄像头精准定位方法,其特征在于,包括如下步骤:A method for precise positioning of a camera applied to an electronic map, characterized in that it comprises the following steps:
    步骤1:根据目标区域的影像资料,将目标区域生成三维模型;Step 1: Generate a 3D model of the target area according to the image data of the target area;
    步骤2:从影像资料中提取每一个拍摄画面的第一图像特征点,将第一图像特征点在所述三维模型中生成稀疏三维点云;Step 2: extracting the first image feature point of each shooting picture from the image data, and generating a sparse three-dimensional point cloud from the first image feature point in the three-dimensional model;
    步骤3:获得待定位摄像头的拍摄资料,并从拍摄资料中提取任一视频帧,记为目标视频帧,提取目标视频帧的第二图像特征点;Step 3: obtaining the shooting data of the camera to be positioned, and extracting any video frame from the shooting data, denoting it as the target video frame, and extracting the second image feature point of the target video frame;
    步骤4:将目标视频帧与影像资料的每一个拍摄画面进行比较,将匹配程度最高的第一图像特征点对应的拍摄画面作为拍摄画面ref,Step 4: Compare the target video frame with each shooting picture of the image data, and use the shooting picture corresponding to the first image feature point with the highest matching degree as the shooting picture ref,
    拍摄画面ref特征点的像素坐标分别与目标视频帧的每个特征点的像素坐标和稀疏三维点云的三维坐标建立映射关系;A mapping relationship is established between the pixel coordinates of the ref feature points of the captured image and the pixel coordinates of each feature point of the target video frame and the 3D coordinates of the sparse 3D point cloud;
    步骤5:选定待定位摄像头的一个内参初值,根据内参初值确定目标视频帧的初始外参;Step 5: Select an initial value of an internal parameter of the camera to be positioned, and determine the initial external parameter of the target video frame according to the initial value of the internal parameter;
    步骤6:根据初始外参,与影像资料的所有拍摄画面进行光束法平差处理,计算得到重投影误差;Step 6: According to the initial external parameters, perform beam method adjustment processing with all the shooting pictures of the image data, and calculate the reprojection error;
    步骤7:更新内参初值,得到新的内参初值;Step 7: Update the initial value of the internal parameter to obtain a new initial value of the internal parameter;
    步骤8:重复步骤5-步骤7,直至当前新的内参初值超过预设值,则停止,计算得到每一个内参初值对应的重投影误差,从而得到最小重投影误差对应的内参数、姿态和三维位置。Step 8: Repeat steps 5-7 until the current new initial value of the internal parameter exceeds the preset value, then stop, and calculate the reprojection error corresponding to the initial value of each internal parameter, so as to obtain the internal parameter and attitude corresponding to the minimum reprojection error. and 3D position.
  2. 根据权利要求1所述的应用于电子地图中的摄像头精准定位方法,其特征在于,执行所述步骤1之前,还包括步骤0:对目标区域进行拍摄,获得包括目标区域的影像资料和拍摄相机的拍摄位置、 姿态和内/外参数。The method for accurate positioning of a camera applied to an electronic map according to claim 1, wherein before performing the step 1, the method further comprises step 0: photographing the target area, obtaining image data including the target area and the photographing camera The shooting position, attitude and internal/external parameters.
  3. 根据权利要求2所述的应用于电子地图中的摄像头精准定位方法,其特征在于,所述步骤0中,对目标区域采用巡游采集拍摄,巡游采集拍摄包括空中的无人机倾斜拍摄和路面的采集车拍摄,或,无人机分别在高空和低空进行拍摄。The method for precise positioning of a camera applied to an electronic map according to claim 2, wherein, in the step 0, the target area is captured and photographed by a parade, and the parade capture and photography include aerial drone tilt photography and road surface photography. Capture vehicle shooting, or drone shooting at high and low altitudes, respectively.
  4. 根据权利要求1所述的应用于电子地图中的摄像头精准定位方法,其特征在于,所述步骤2中,第一图像特征点为道路标志标线角点或通用计算机视觉特征点。The method for precise positioning of a camera applied to an electronic map according to claim 1, wherein, in the step 2, the first image feature point is a corner point of a road sign marking or a general computer vision feature point.
  5. The camera precise positioning method applied to an electronic map according to claim 1, wherein establishing the mapping relationship between the pixel coordinates of the feature points of captured image ref and both the pixel coordinates of each feature point of the target video frame and the three-dimensional coordinates of the sparse three-dimensional point cloud specifically comprises:
    the pixel coordinates of each feature point of the target video frame correspond to the pixel coordinates of the feature points of ref, and the pixel coordinates of the feature points of ref in turn correspond to the three-dimensional coordinates of the sparse three-dimensional point cloud.
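Read concretely, claim 5 chains two lookups: feature matching gives target-frame pixel to ref pixel, and the reconstruction gives ref pixel to 3D point, so composing them yields the 2D-3D pairs consumed by steps 5-8. A minimal sketch, with matches and ref_px_to_3d as hypothetical inputs:

```python
def chain_correspondences(matches, ref_px_to_3d):
    """matches: iterable of (target_px, ref_px) pixel-coordinate tuples.
    ref_px_to_3d: dict mapping a ref-image pixel tuple to its sparse-cloud
    3D point. Returns the (target pixel, 3D point) correspondences."""
    pairs = []
    for target_px, ref_px in matches:
        point3d = ref_px_to_3d.get(ref_px)  # only ref features with a triangulated point
        if point3d is not None:
            pairs.append((target_px, point3d))
    return pairs
```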
  6. The camera precise positioning method applied to an electronic map according to claim 1, wherein comparing the target video frame with each captured image of the image data to find the first image feature points with the highest matching degree to the second image feature points specifically comprises:
    building a bag of feature-point words for each captured image from the first image feature points, and matching feature points and positions between the target video frame and each captured image in the image data through the bag of words, so as to find the first image feature points with the highest matching degree to the second image feature points.
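Claim 6's retrieval step can be approximated with a standard visual bag of words: quantize each image's descriptors against a learned vocabulary and compare word histograms to shortlist the closest captured image before fine matching. A hedged sketch using scikit-learn's KMeans as a stand-in vocabulary (an assumption; the patent does not specify the codebook):

```python
import numpy as np
from sklearn.cluster import KMeans


def build_vocabulary(descriptor_sets, n_words=200):
    """Cluster descriptors pooled from every captured image into visual words."""
    return KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(descriptor_sets))


def bow_histogram(descriptors, vocab):
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / (hist.sum() or 1.0)  # normalized word histogram


def best_ref_image(target_desc, ref_descriptor_sets, vocab):
    """Index of the captured image whose word histogram is closest to the target's."""
    target_hist = bow_histogram(target_desc, vocab)
    dists = [np.linalg.norm(target_hist - bow_histogram(d, vocab))
             for d in ref_descriptor_sets]
    return int(np.argmin(dists))
```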
  7. The camera precise positioning method applied to an electronic map according to claim 1, wherein the preset value in step 8 is the field-of-view range of the camera to be positioned.
  8. The camera precise positioning method applied to an electronic map according to claim 1, wherein the initial intrinsic value starts from the lower limit of the field-of-view range of the camera to be positioned, and each update increases the value by b = 5°.
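To make the b = 5° step concrete: under a pinhole model the horizontal field of view fixes the focal length via f_x = (W / 2) / tan(FOV_h / 2), where W is the image width in pixels. For a hypothetical 1920-pixel-wide image, moving the guess from 90° to 95° moves f_x from 960 px to roughly 880 px, so the sweep is a coarse but bounded scan of focal-length space between the two ends of the camera's specified FOV range.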
  9. The camera precise positioning method applied to an electronic map according to claim 1, further comprising, after step 8,
    Step 9: mapping the finally computed intrinsic parameters, attitude, and three-dimensional position of the camera to be positioned into the three-dimensional model of step 1, and determining, through spatial analysis and viewshed analysis of the three-dimensional map, the specific height of the camera to be positioned in the three-dimensional electronic map, whether it is indoors or outdoors, and whether a given target area of the three-dimensional electronic map can be covered by the camera to be positioned.
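In its simplest form, the coverage question of step 9 ("can this target area be seen by the camera?") reduces to projecting sample 3D points of the area into the recovered camera and testing image bounds and facing direction. The sketch below makes exactly those two tests and deliberately omits occlusion against the 3D model, which a full viewshed analysis would add by ray-casting into the mesh:

```python
import numpy as np
import cv2


def covered_mask(points3d, K, rvec, tvec, img_w, img_h):
    """Rough visibility test: True where a 3D point projects inside the image
    and lies in front of the camera. Occlusion by the 3D model is NOT tested."""
    R, _ = cv2.Rodrigues(rvec)
    cam_pts = (R @ points3d.T + tvec.reshape(3, 1)).T  # world -> camera frame
    in_front = cam_pts[:, 2] > 0
    proj, _ = cv2.projectPoints(points3d, rvec, tvec, K, None)
    uv = proj.reshape(-1, 2)
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < img_w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < img_h))
    return in_front & inside
```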
  10. A processing terminal, characterized in that it comprises:
    a memory for storing program instructions; and
    a processor for running the program instructions to execute the steps of the camera precise positioning method applied to an electronic map according to any one of claims 1 and 4 to 9.
PCT/CN2021/122607 2020-10-14 2021-10-08 Camera precise positioning method applied to electronic map, and processing terminal WO2022078240A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011100418.1A CN112184890B (en) 2020-10-14 2020-10-14 Accurate positioning method of camera applied to electronic map and processing terminal
CN202011100418.1 2020-10-14

Publications (1)

Publication Number Publication Date
WO2022078240A1 (en)

Family

ID=73950229

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122607 WO2022078240A1 (en) 2020-10-14 2021-10-08 Camera precise positioning method applied to electronic map, and processing terminal

Country Status (2)

Country Link
CN (1) CN112184890B (en)
WO (1) WO2022078240A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184890B (en) * 2020-10-14 2023-06-30 佳都科技集团股份有限公司 Accurate positioning method of camera applied to electronic map and processing terminal
CN115222808B (en) * 2021-06-30 2023-10-20 达闼机器人股份有限公司 Positioning method and device based on unmanned aerial vehicle, storage medium and electronic equipment
CN113920263A (en) * 2021-10-18 2022-01-11 浙江商汤科技开发有限公司 Map construction method, map construction device, map construction equipment and storage medium
CN114782550A (en) * 2022-04-25 2022-07-22 高德软件有限公司 Camera calibration method, device, electronic equipment and program product
CN115883812A (en) * 2022-11-30 2023-03-31 重庆大学 Multi-sensor cooperative station distribution method based on particle swarm optimization

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104596502B (en) * 2015-01-23 2017-05-17 浙江大学 Object posture measuring method based on CAD model and monocular vision
JP6641163B2 (en) * 2015-12-02 2020-02-05 日本放送協会 Object tracking device and its program
CN107507243A (en) * 2016-06-14 2017-12-22 华为技术有限公司 A kind of camera parameters method of adjustment, instructor in broadcasting's video camera and system
CN109945853B (en) * 2019-03-26 2023-08-15 西安因诺航空科技有限公司 Geographic coordinate positioning system and method based on 3D point cloud aerial image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869136A (en) * 2015-01-22 2016-08-17 北京雷动云合智能技术有限公司 Collaborative visual SLAM method based on multiple cameras
US20180297207A1 (en) * 2017-04-14 2018-10-18 TwoAntz, Inc. Visual positioning and navigation device and method thereof
CN109003305A (en) * 2018-07-18 2018-12-14 江苏实景信息科技有限公司 A kind of positioning and orientation method and device
CN110796706A (en) * 2019-11-08 2020-02-14 四川长虹电器股份有限公司 Visual positioning method and system
CN112184890A (en) * 2020-10-14 2021-01-05 佳都新太科技股份有限公司 Camera accurate positioning method applied to electronic map and processing terminal

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439469A (en) * 2022-10-12 2022-12-06 东南大学 Unmanned aerial vehicle-based building defect detection method and device and electronic equipment
CN115439469B (en) * 2022-10-12 2024-03-22 东南大学 Unmanned aerial vehicle-based building defect detection method and device and electronic equipment
CN115861546A (en) * 2022-12-23 2023-03-28 四川农业大学 Crop geometric perception and three-dimensional phenotype reconstruction method based on nerve body rendering
CN115861546B (en) * 2022-12-23 2023-08-08 四川农业大学 Crop geometric perception and three-dimensional phenotype reconstruction method based on nerve volume rendering
CN116630598A (en) * 2023-07-19 2023-08-22 齐鲁空天信息研究院 Visual positioning method and device under large scene, electronic equipment and storage medium
CN116630598B (en) * 2023-07-19 2023-09-29 齐鲁空天信息研究院 Visual positioning method and device under large scene, electronic equipment and storage medium
CN117459688A (en) * 2023-12-26 2024-01-26 海纳云物联科技有限公司 Camera angle marking method, device and medium based on map system
CN117459688B (en) * 2023-12-26 2024-05-03 海纳云物联科技有限公司 Camera angle marking method, device and medium based on map system

Also Published As

Publication number Publication date
CN112184890B (en) 2023-06-30
CN112184890A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
WO2022078240A1 (en) Camera precise positioning method applied to electronic map, and processing terminal
CN112767391B (en) Power grid line part defect positioning method integrating three-dimensional point cloud and two-dimensional image
CN109387186B (en) Surveying and mapping information acquisition method and device, electronic equipment and storage medium
CN106485785B (en) Scene generation method and system based on indoor three-dimensional modeling and positioning
CN109523471B (en) Method, system and device for converting ground coordinates and wide-angle camera picture coordinates
CN109520500B (en) Accurate positioning and street view library acquisition method based on terminal shooting image matching
CN115439424A (en) Intelligent detection method for aerial video image of unmanned aerial vehicle
KR102200299B1 (en) A system implementing management solution of road facility based on 3D-VR multi-sensor system and a method thereof
CN111540048A (en) Refined real scene three-dimensional modeling method based on air-ground fusion
CN112113542A (en) Method for checking and accepting land special data for aerial photography construction of unmanned aerial vehicle
KR101852368B1 (en) Method for underground information based on vrs geometric-correction used by uav taking picture
WO2022077296A1 (en) Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium
CN113012292B (en) AR remote construction monitoring method and system based on unmanned aerial vehicle aerial photography
JP2017201261A (en) Shape information generating system
CN115423863B (en) Camera pose estimation method and device and computer readable storage medium
CN113066112A (en) Indoor and outdoor fusion method and device based on three-dimensional model data
CN108801225A (en) A kind of unmanned plane tilts image positioning method, system, medium and equipment
CN115004273A (en) Digital reconstruction method, device and system for traffic road
WO2020051208A1 (en) Method for obtaining photogrammetric data using a layered approach
Muji et al. Assessment of Digital Elevation Model (DEM) using onboard GPS and ground control points in UAV image processing
CN114659499B (en) Smart city 3D map model photography establishment method based on unmanned aerial vehicle technology
CN116129064A (en) Electronic map generation method, device, equipment and storage medium
US20160086339A1 (en) 2016-03-24 Method of providing cartographic information of an electrical component in a power network
CN110617800A (en) Emergency remote sensing monitoring method, system and storage medium based on civil aircraft
Chen et al. 3D model construction and accuracy analysis based on UAV tilt photogrammetry

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21879285

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21879285

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.09.2023)