CN111241988B - Method for detecting and identifying moving target in large scene by combining positioning information - Google Patents


Info

Publication number
CN111241988B
CN111241988B (application CN202010017124.6A)
Authority
CN
China
Prior art keywords
target
image
longitude
latitude
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010017124.6A
Other languages
Chinese (zh)
Other versions
CN111241988A (en)
Inventor
郑文涛
林姝含
李申达
Current Assignee
Beijing Tianrui Kongjian Technology Co ltd
Original Assignee
Beijing Tianrui Kongjian Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Tianrui Kongjian Technology Co ltd filed Critical Beijing Tianrui Kongjian Technology Co ltd
Priority to CN202010017124.6A
Publication of CN111241988A
Application granted
Publication of CN111241988B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/40 — Scenes; Scene-specific elements in video content
    • G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G — PHYSICS
    • G01 — MEASURING; TESTING
    • G01S — RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00 — Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38 — Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39 — Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42 — Determining position
    • G01S19/45 — Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 — Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 — Geographical information databases
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 — Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 — Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for detecting a moving target in a large scene by combining positioning information. The method first calibrates points in the scene space against their pixel coordinates in the video picture and establishes a target database. After moving-target detection is started, the area where each moving target with longitude/latitude positioning information is located is predicted, the predicted area is mapped into the picture of the large-scene video, and the image of the predicted area is scaled to a uniform size and sent to a target detection processing module. When several targets are detected, all detected targets are matched against targets of the corresponding type in the target database to obtain the best match, whose coordinates are then restored to the large-scene image. The invention greatly increases the processing speed, helps improve detection accuracy, and can be used for real-time video detection and tracking in settings such as airport aircraft operations.

Description

Method for detecting and identifying moving target in large scene by combining positioning information
Technical Field
The invention relates to a method for detecting and identifying a moving target in a large scene by combining positioning information, belonging to the technical field of computer vision.
Background
In recent years, Augmented Reality (AR) technology has been increasingly used in large-scale scene management and control, such as large-square security, airport surface activity guidance and control, port work-area operation monitoring, and industrial-park management. For example, when managing aircraft, vehicles and personnel in an airport flight area, it is often necessary to display these moving objects in enhanced form so that managers can understand and command the scene, that is, to display related information of each object, such as an aircraft's flight number, a vehicle's type, or an operator's personal information, at the corresponding position of the moving object in the video picture. To ensure that this enhanced information display is accurate, the moving targets must be positioned with high precision.
At present, a common way to realize enhanced information display is to obtain the space coordinates of a moving target through a satellite positioning system such as GPS or BeiDou and map them to pixel coordinates in the video picture. However, data acquired from GPS, BeiDou and similar systems generally carry large errors; in addition, the acquisition frequency is low (for example, one fix per second) and is not synchronized with the video. As a result, the pixel coordinates obtained by coordinate mapping can differ substantially from the actual coordinates of the moving target, which degrades the user experience and can even produce erroneous displays.
To solve the above problem, a possible technical architecture is to first detect the moving object in the video image and then fuse the detection with positioning data such as GPS. In large-scene video, to ensure that target detection reaches practical accuracy, the moving target must have a certain pixel resolution, which means the whole large-scene picture must have ultra-high resolution; for example, to detect a distant vehicle in an airport flight-area picture, the picture may need one or more 4K ultra-high-definition frames (3840 × 2160 pixels). The best-performing target detection methods at present are based on deep learning. The Region-based Convolutional Neural Network (R-CNN) first connected target detection with deep convolutional networks and raised detection accuracy to a new level. R-CNN consists of several independent stages: generating candidate windows, feature extraction, SVM classification, and window regression. Because these stages are separate, its detection efficiency is low and it cannot be used on large-scene videos.
To improve the real-time performance of target detection, another possible framework is the single-stage detection algorithm, which works end-to-end: the detection result is obtained from the input image in one step, with all intermediate processing learned by the neural network. Typical methods are YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector). These methods use a lightweight network to connect the input directly to the output, greatly increasing detection speed. With an input image of 416 × 416, the processing speed can reach 50 fps (frames per second), enabling real-time detection. For large-scene video, however, the resolution usually reaches one or more 4K frames, with tens or even hundreds of times as many pixels, so such methods are still far from real time.
Another difficulty of moving-object detection in large scenes is that the size of a moving object on the screen varies greatly with its distance from the camera: a nearby object may occupy half the picture, while a distant one may cover only a few pixels. For deep learning algorithms, this increases the difficulty of model training and reduces the final detection accuracy.
Based on the reasons, the applicant also provides a new technical architecture, and the moving target detection in the large scene is carried out by combining longitude and latitude positioning information. Firstly, establishing a corresponding relation between a coordinate point in a large scene image and a corresponding longitude and latitude coordinate. During detection, according to the type of the target and longitude and latitude data (obtained from a GPS, a Beidou system and the like) with errors, the area where the moving target is located is predicted, and the area is mapped into a picture of a large-scene video to obtain an image block containing the moving target in the picture. And zooming the image block into a preset size, sending the image block into a target detection processing module for target detection to obtain the coordinate position of the target in the image block, restoring the coordinate position of the target into the coordinate position on the large scene image, and finishing the detection of the target. By adopting the technical route, the method can be carried out for each moving target with longitude and latitude positioning information, and the accurate positions of all the moving targets in the large scene image can be obtained, so that the processing speed is greatly increased, and the detection accuracy is improved.
However, to implement the above method, obstacles that may be encountered in practice must be considered. When the area where a moving target is located is predicted from the target's type and its satellite positioning information (longitude/latitude data with errors), the resulting predicted area may contain two or more similar targets (such as vehicles or aircraft). In particular, when the error and delay of the satellite positioning information are large, a large predicted area must be used to guarantee that the target is contained, which greatly increases the probability that two or more similar targets appear in the predicted area. These different targets must therefore be effectively identified and distinguished.
Disclosure of Invention
The present invention is directed to overcome the above-mentioned defects in the prior art, and provides a method for detecting and identifying a moving object in a large scene by combining positioning information, so as to effectively identify the object by combining the positioning information, reduce the data processing amount, increase the processing speed, and facilitate the identification of the object to be detected.
The technical scheme of the invention is as follows: a method for detecting a moving object in a large scene by combining positioning information comprises the following steps:
1) image scaling
Selecting a plurality of mark points in the image, and establishing the corresponding relation between the image coordinates and the accurate longitude and latitude coordinates of the mark points in the image, wherein the image is a large scene image of a corresponding scene.
The image coordinates of the mark points are from the image, and the longitude and latitude coordinates of the mark points are from the electronic map of the corresponding scene with the longitude and latitude information.
The large scene image can be generated by shooting of an integrated panoramic camera or spliced by local scene images with overlapped areas generated by shooting of a group of cameras.
2) Building a target database
According to the actually related target, recording the real type information and the target image of the target, and establishing a target database corresponding to the real type information and the target image of the target.
3) Computing a prediction region
Starting corresponding target detection according to the received target longitude and latitude positioning information, determining a prediction area of the target under the longitude and latitude coordinates according to an error range of the longitude and latitude positioning information and a size range of the target by taking the longitude and latitude coordinates in the longitude and latitude positioning information as a reference, mapping the prediction area of the target under the longitude and latitude coordinates into an image, and determining the prediction area under the image coordinates;
the prediction region in the image coordinates may be a smallest rectangular region in the image that contains a mapping region that maps the prediction region in the longitude and latitude coordinates to a region in the image. In general, four corners of the prediction area under the longitude and latitude coordinates can be mapped into the image, and an area formed by connecting lines of four points of the image mapped to the four corners is used as a mapping area, so that the calculation amount is reduced;
4) scale transformation
Converting the prediction area under the image coordinate into a detection image block with a uniform size according to the target detection requirement;
5) target detection and matching
Carrying out target detection on the image blocks for detection, and if a target is detected, taking the target as a target to be detected (a target corresponding to corresponding longitude and latitude positioning information), and obtaining the position and the size of the target in the image blocks for detection; if a plurality of targets are detected, target images of corresponding target types in a target database are called according to target type information contained in the longitude and latitude positioning information, the detected targets are respectively compared with the corresponding target images, the target closest to the comparison result is taken as the target to be detected (the target corresponding to the corresponding longitude and latitude positioning information), and the position and the size of the target are obtained in an image block for detection;
6) target reduction
And restoring the size and position of the target from the image block for detection to its position in the large scene image according to the coordinate correspondence between the image block for detection and the image, thereby realizing detection and positioning of the moving target in the image.
The large scene image can be acquired by large scene video acquisition equipment which is an integrated panoramic camera or a group of cameras capable of image splicing, and the large scene image is shot by the integrated panoramic camera or is formed by splicing local scene images with overlapped areas shot and generated by the cameras.
The specific method for calibrating the longitude and latitude of the large scene image is as follows: selecting a plurality of mark points which are easy to distinguish from the large scene image, obtaining the image coordinates of each mark point from the large scene image, finding the mark points from the corresponding areas on the map displaying the longitude and latitude data, and obtaining the longitude and latitude coordinates of the mark points, thereby realizing the correspondence between the image coordinates and the longitude and latitude coordinates of each mark point.
The specific way of mapping the prediction area under the longitude and latitude coordinates into the large scene image is as follows: the longitude and latitude coordinates of four vertexes of the rectangular prediction area under the longitude and latitude coordinates are converted into image coordinates, a minimum rectangle containing the four image coordinate points is selected in the large scene image, and an image area defined by the minimum rectangle is used as an image detection area.
For any point requiring conversion from longitude/latitude coordinates to image coordinates, the longitude/latitude distance (or its square) between that point and each marker point is calculated from their longitude/latitude coordinates, and the 8 marker points with the minimum distance to the point are determined accordingly.
The invention has the beneficial effects that: the error-bearing longitude/latitude positioning information (from a satellite positioning system or another positioning system other than video analysis) is used to predict the area where a moving target is located; the prediction area (image block) of the moving target is determined, scaled to a specified size (a fixed size in pixels), and only then subjected to target detection. This greatly reduces the detection range, unifies the image size of the prediction areas, facilitates target detection, greatly reduces the data processing load of target detection, and improves detection precision and accuracy, making real-time detection of moving targets (such as airplanes and automobiles) in a large scene (such as an airport surface) possible. Moreover, because multiple targets in the same prediction area are identified through prior or known information such as target attributes, the target of interest is distinguished from other targets, eliminating interference from other targets in the same prediction area.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
1. General procedure
The whole process of the invention is shown in figure 1.
Firstly, after a large-scene video acquisition device (an integrated panoramic camera or a group of spliced cameras) is fixedly installed, calibrating points in a scene space and pixel coordinates of the points in a video picture, namely establishing a corresponding relation between coordinates (x, y) in a large-scene image and longitude and latitude positioning information (l, t) of a positioning system, and expressing the corresponding relation as a plurality of quadruples (x, y, l, t).
After moving-target detection is started, for each moving target with longitude/latitude positioning information (containing errors), the area where the target is located is predicted from the error-bearing longitude/latitude data and the target's type, and this area is mapped into the picture of the large-scene video to obtain the predicted area in the picture. The predicted area in the picture is scaled to a predetermined size, and the resulting uniform-size image block is sent to the target detection processing module. Because of the longitude/latitude errors, no target may be detected, or several targets may be detected; in the latter case the detected targets must be processed further to select the best one and obtain its coordinate position in the image block. This coordinate position is then restored to the coordinate position on the large scene image, completing the detection of the target.
The target detection processing is carried out aiming at each moving target with longitude and latitude positioning information and target type information, and the accurate positions of all the moving targets in the large scene image can be obtained.
2. Image scaling
After a large scene image (called simply "the image" below) is acquired, the image coordinates (generally an image pixel coordinate system) are first associated with, or calibrated against, longitude and latitude. Obvious marker points are found on the image (such as corner or installation points of road signs, corners of traffic lane lines, or corners of ground fixtures), their image coordinates (x, y) are recorded, their longitude/latitude coordinates (l, t) are read from a map that can display longitude/latitude data, and the quadruple (x, y, l, t) is recorded. Based on this correspondence between image coordinates and longitude/latitude coordinates of the marker points, a target can be mapped in the image, and longitude/latitude coordinates and image coordinates can be converted into each other.
The marker points should be distributed as evenly as possible across the scene. The more marker points there are, the more accurate the subsequent prediction of the target's position area will be; at least about 100 points need to be marked on one large scene image.
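As a concrete illustration of the calibration just described, the quadruples can be kept in a simple table. This is a minimal sketch, not the patent's own code; the marker names and coordinate values are invented for illustration:

```python
# Calibration table: each marker point links image pixel coordinates (x, y)
# to its precise longitude/latitude (l, t), forming a quadruple (x, y, l, t).
calibration = []

def add_marker(x, y, lon, lat):
    """Record one calibrated marker point (e.g. a lane-line corner)."""
    calibration.append((x, y, lon, lat))

# Hypothetical markers read off the image and an electronic map:
add_marker(1024, 768, 116.5871, 40.0799)
add_marker(2048, 512, 116.5892, 40.0812)
add_marker(3071, 900, 116.5913, 40.0785)
```

A real deployment would record at least about 100 such quadruples, as the text above suggests.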
3. Building a target database
In a practical application scene, the objects that can appear in a specific area are limited, so images of these objects can be obtained in advance, and the various object attributes and object images (or image features) are stored in a target database.
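The target database described above can be sketched as a mapping from target type to reference records. The keys and field names below are illustrative assumptions, not the patent's schema:

```python
# Sketch of the target database: each known target type maps to records
# holding real type information and a reference image (or image features).
target_db = {
    "aircraft": [{"id": "B-1234", "image": "aircraft_B1234.png"}],
    "vehicle":  [{"id": "follow-me-07", "image": "vehicle_07.png"}],
}

def reference_targets(target_type):
    """Return the stored reference records for a target type,
    as needed by the matching step later in the method."""
    return target_db.get(target_type, [])
```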
4. Computing a prediction region
According to the requirement of target detection, the area where the corresponding target is located can be predicted by considering the following factors for each piece of longitude and latitude positioning information (l, t).
(1) Longitude/latitude positioning error. The error of the data acquired from GPS, BeiDou and similar systems, together with the error caused by the low acquisition frequency and transmission, is denoted (Δe_l, Δe_t), where Δe_l is the positive/negative error range in the longitude direction and Δe_t the positive/negative error range in the latitude direction. For example, the positioning error of the current GPS system is about 10 meters, taken here as equivalent to a longitude/latitude error of 3×10⁻⁷°.
(2) Size of the target to be detected. In the application scenario assumed by the invention, the type of the target to be detected (e.g. airplane, vehicle, person) is known, so the size of the target can be reasonably bounded. The size is likewise expressed in surface longitude/latitude and is denoted (Δs_l, Δs_t), where Δs_l is the positive/negative size range in the longitude direction and Δs_t the positive/negative size range in the latitude direction. For example, the sizes of an airplane, a vehicle and a pedestrian are approximately 100 meters, 10 meters and 1 meter respectively, corresponding to longitude/latitude ranges of 3×10⁻⁶°, 3×10⁻⁷° and 3×10⁻⁸°.
From the above analysis, the target region can be defined as follows:
(l ± (Δe_l + Δs_l), t ± (Δe_t + Δs_t))    formula (1)
This is a rectangular area on the latitude and longitude coordinate plane, and the image pixel coordinates corresponding to its 4 vertices are now calculated as follows:
suppose n tuples (x) that have been calibratedi,yi,li,ti) (i ═ 1,2 … n), and the latitude and longitude coordinates of the point to be converted are (l, t).
Firstly, calculating the longitude and latitude of the marked point and the distance di of the target point
disti=(li-l)2+(ti-t)2Formula (2)
The 8 calibration points closest to the selected point are then found, and the nonlinear model parameters are solved from their coordinates by the least-squares method, giving the image coordinates corresponding to the point.
There are many types of non-linear models, where a quadratic polynomial model is used,
x = a·l + b·t + c·l·t + d    formula (3)
where x is the image abscissa corresponding to the selected point (l, t), and a, b, c, d are the model parameters.
This model is solved below. Without loss of generality, suppose the quadruples of the 8 marker points closest to (l, t) are
(x_i, y_i, l_i, t_i) (i = 1, 2, …, 8)    formula (4)
Assuming that coefficients a, b, c, d are present, the following equation is satisfied:
x_i = a·l_i + b·t_i + c·l_i·t_i + d  (i = 1, 2, …, 8)    formula (5)
the above formula can be written as:
A·u = v    formula (6)
where
A is the 8 × 4 matrix whose i-th row is (l_i, t_i, l_i·t_i, 1), i = 1, 2, …, 8    formula (7)
u = (a, b, c, d)ᵀ    formula (8)
v = (x_1, x_2, …, x_8)ᵀ    formula (9)
Transformation of equation (6) yields:
u = A⁻¹·v    formula (10)
Since the matrix A is not square, A⁻¹ here denotes the pseudo-inverse (also called the generalized inverse) of A, namely:
A⁻¹ = (AᵀA)⁻¹Aᵀ    formula (11)
Thus the conversion coefficients from longitude/latitude coordinates to image coordinates are obtained, so that for a selected point (l, t) the image abscissa x is
x = [l  t  l·t  1] · u    formula (12)
Substituting formulas (10) and (11) into formula (12) gives
x = [l  t  l·t  1] · (AᵀA)⁻¹Aᵀ · v    formula (13)
The image ordinate y is obtained in the same way:
y = [l  t  l·t  1] · (AᵀA)⁻¹Aᵀ · v′, where v′ = (y_1, y_2, …, y_8)ᵀ    formula (14)
After all four vertices of the longitude/latitude range have been converted to pixel coordinates, the four pixel points will not necessarily form a rectangle, so the smallest rectangle containing all four points is constructed in the image. This rectangle is the prediction area of the target to be detected. Its top-left corner is denoted (x_p, y_p) and its size w_p × h_p.
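The prediction region of formula (1) and the nearest-marker least-squares mapping of formulas (2) through (14) can be sketched as follows. This is an illustrative reconstruction with invented function and variable names, not the patent's own code:

```python
import numpy as np

def prediction_region(l, t, err, size):
    """Formula (1): the rectangular prediction region in longitude/latitude,
    given error ranges err = (Δe_l, Δe_t) and size ranges size = (Δs_l, Δs_t).
    Returns (lon_min, lat_min, lon_max, lat_max)."""
    dl, dt = err[0] + size[0], err[1] + size[1]
    return (l - dl, t - dt, l + dl, t + dt)

def latlon_to_pixel(l, t, markers):
    """Map (longitude, latitude) to image coordinates using the 8 nearest
    calibrated marker quadruples (x_i, y_i, l_i, t_i) and the quadratic
    model x = a*l + b*t + c*l*t + d of formulas (3)-(14)."""
    m = np.asarray(markers, dtype=float)
    d2 = (m[:, 2] - l) ** 2 + (m[:, 3] - t) ** 2      # formula (2)
    near = m[np.argsort(d2)[:8]]                      # 8 nearest markers
    li, ti = near[:, 2], near[:, 3]
    A = np.column_stack([li, ti, li * ti, np.ones(len(near))])  # formula (7)
    pinv = np.linalg.pinv(A)          # pseudo-inverse (A^T A)^-1 A^T, formula (11)
    u = pinv @ near[:, 0]             # x-coefficients, formula (10)
    w = pinv @ near[:, 1]             # y-coefficients, analogous
    row = np.array([l, t, l * t, 1.0])
    return float(row @ u), float(row @ w)             # formulas (13), (14)
```

`np.linalg.pinv` is used instead of forming (AᵀA)⁻¹Aᵀ explicitly because the columns l, t, l·t, 1 are nearly collinear over a small area, and the pseudo-inverse is numerically safer there.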
5. Scale transformation
Because current mainstream target detection algorithms require input images of a fixed size, the prediction region of the target to be detected is scale-transformed to a fixed size W × H (e.g. 512 × 512 pixels), yielding image blocks for detection of uniform size.
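A toy sketch of that scale transformation, using nearest-neighbor sampling on a plain 2-D grid; a real system would use an interpolating resize from an image library, so this only illustrates the idea:

```python
def resize_nearest(block, W, H):
    """Scale a 2-D pixel grid (list of rows) to W columns x H rows by
    nearest-neighbor sampling, producing the fixed-size image block
    for detection described above."""
    h, w = len(block), len(block[0])
    return [[block[i * h // H][j * w // W] for j in range(W)]
            for i in range(H)]
```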
6. Target detection
Target detection is performed on the image block for detection using an existing target detection algorithm (e.g. references [1], [2]). A detected target is represented by its circumscribed rectangle (bounding box), whose top-left corner in the image block for detection has coordinates (x_r, y_r) and whose width and height are (w_r, h_r).
Without loss of generality, the results of target detection are as follows:
(1) no target was detected: if the return result is null, the target detection is finished, and the next moving target (longitude and latitude positioning information) can be processed or the detection is stopped according to actual needs;
(2) One target is detected: the target is taken as the target to be detected, and the coordinates (x_r, y_r) of the top-left corner of its bounding box in the image block for detection and its width and height (w_r, h_r) are returned;
(3) A plurality of targets are detected: at the moment, which target in the area is the target to be detected cannot be judged only by target detection, and the next step of image matching is carried out.
7. Image matching
Matching all detected targets with target information in the target database.
For example, when the target information stored in the target database is a target image, the image P of the corresponding type may be selected from the target database according to the actual type information of the target, and the similarity between each target and the image P is calculated:
S_i = F(I_i, P)    formula (15)
where F denotes an image similarity measure; any suitable prior-art algorithm may be employed, such as the method set forth in reference [3].
The target with the maximum S_i is taken as the target to be detected, and the coordinates (x_r, y_r) of the top-left corner of its bounding box in the image block for detection and its width and height (w_r, h_r) are returned;
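The argmax selection over formula (15) can be sketched as below. The similarity function is deliberately left as a placeholder standing in for any measure F, and the data layout is an assumption for illustration:

```python
def best_match(detections, similarity, reference):
    """Among several detected targets, compute S_i = F(I_i, P) (formula (15))
    for each and return the bounding box of the target with maximum
    similarity. `detections` holds (target_image, bounding_box) pairs;
    `similarity` stands in for any image-similarity measure F."""
    best_box, best_s = None, float("-inf")
    for image, box in detections:
        s = similarity(image, reference)
        if s > best_s:
            best_s, best_box = s, box
    return best_box
```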
8. Coordinate restoration
And mapping the target detected in the image block for detection in the step 6 or 7 to the original image.
This can be achieved by converting the top-left coordinates (x_r, y_r) and width and height (w_r, h_r) of the bounding box in the image block for detection into the top-left coordinates (x_q, y_q) and width and height (w_q, h_q) in the image according to the following formula, where (x_p, y_p) and w_p × h_p are the top-left corner and size of the prediction rectangle and W × H is the size of the image block for detection:
x_q = x_p + x_r · w_p / W,  y_q = y_p + y_r · h_p / H,  w_q = w_r · w_p / W,  h_q = h_r · h_p / H    formula (16)
After the calculation, the image coordinates and the tracking frame of the corresponding moving target are obtained, so that the position information or other related information is enhanced on the large scene image, and the longitude and latitude coordinates or other world coordinate system coordinates of the moving target can also be obtained according to the prior art.
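The coordinate restoration of step 8 reduces to an offset plus a rescale. A hedged sketch, assuming the prediction rectangle with top-left (x_p, y_p) and size w_p × h_p was resized to a W × H detection block (function and parameter names are invented):

```python
def restore_box(block_box, region_origin, region_size, block_size=(512, 512)):
    """Convert a bounding box (x_r, y_r, w_r, h_r) in the W x H image block
    for detection back to scene-image coordinates (x_q, y_q, w_q, h_q)."""
    xr, yr, wr, hr = block_box
    xp, yp = region_origin
    wp, hp = region_size
    W, H = block_size
    sx, sy = wp / W, hp / H            # inverse of the scale transformation
    return (xp + xr * sx, yp + yr * sy, wr * sx, hr * sy)
```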
The invention greatly improves the processing speed because target detection is performed only in a limited area around the target's possible position. Meanwhile, large and small targets are scaled to a uniform image size, which helps improve detection accuracy. According to the applicant's field use, the method can be effectively applied to video detection and tracking of aircraft at airports.
Unless otherwise specified, and except where one technical means is a further limitation of another, the technical means disclosed in the invention may be combined arbitrarily to form a number of different technical solutions.
Reference to the literature
[1]Redmon J,Divvala S,Girshick R,et al.You Only Look Once:Unified,Real-Time Object Detection[J].2015.
[2]Liu W,Anguelov D,Erhan D,et al.SSD:Single Shot MultiBox Detector[J].2015.
[3] Xie Xixiao, Wang Zheng, Xiao Yan, Huang Yuzhu, Ma Wenqu. Image similarity calculation method based on geometric invariant moments [J]. Electronic Technology and Software Engineering, 2017(16):84-85.

Claims (5)

1. A method for detecting a moving object in a large scene by combining positioning information comprises the following steps:
1) image scaling
Selecting a plurality of mark points in an image, establishing a corresponding relation between image coordinates and accurate longitude and latitude coordinates of the mark points in the image, wherein the image is a large scene image of a corresponding scene and is shot by an integrated panoramic camera or formed by splicing local scene images with overlapped areas shot by various cameras;
2) building a target database
Recording the real type information and the target image or the target image characteristic of the target according to the actually related target, and establishing a target database corresponding to the real type information and the target image or the target image characteristic of the target;
3) computing a prediction region
Starting the corresponding target detection according to the received longitude/latitude positioning information of the target; for each moving target with longitude/latitude positioning information, predicting the area where the target is located from the error-bearing longitude/latitude data and the target's own type; taking the longitude/latitude coordinates in the positioning information as a reference, determining the target's prediction area in longitude/latitude coordinates from the error range of the positioning information and the size range of the target; and mapping the prediction area in longitude/latitude coordinates into the image to determine the prediction area in image coordinates, wherein the longitude/latitude positioning information is derived from a positioning system outside the video analysis;
4) scale transformation
Converting the prediction area in image coordinates into detection image blocks of a uniform size, as required by the target detection;
5) target detection and matching
Carrying out target detection on the detection image blocks; if one target is detected, taking it as the target to be detected; if a plurality of targets are detected, retrieving the target images of the corresponding target type from the target database according to the target-type information contained in the longitude/latitude positioning information, comparing each detected target with the corresponding target images, and taking the target with the closest comparison result as the target to be detected;
6) target reduction
And restoring the size and position of the target in the detection image block to the large-scene image according to the coordinate correspondence between the detection image block and the image, thereby realizing detection and localization of the moving target in the image.
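Step 3) of claim 1 can be sketched as follows. The claim does not fix how the positioning-error range and the target's size range are combined into the prediction area, nor the metres-to-degrees conversion; summing the two half-extents and using an equirectangular approximation are assumptions for illustration:

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude

def predict_region_latlon(lon, lat, pos_err_m, target_size_m):
    """Build a prediction area (lon_min, lat_min, lon_max, lat_max) around the
    reported position, sized by positioning error plus target size (assumed
    combination rule)."""
    half_m = pos_err_m + target_size_m            # half-extent in metres
    dlat = half_m / METERS_PER_DEG_LAT
    dlon = half_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)
```

The resulting rectangle in longitude/latitude coordinates is then mapped into the image (claim 4) to obtain the prediction area in image coordinates.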
2. The method of claim 1, wherein the image coordinates of the marker points are obtained from the image, and the longitude/latitude coordinates of the marker points are obtained from an electronic map of the corresponding scene carrying longitude/latitude information.
3. The method of claim 1 or 2, wherein the longitude/latitude calibration of the image is performed by: selecting a plurality of easily distinguishable marker points from the image, obtaining the image coordinates of each marker point from the image, finding the same marker points in the corresponding areas of the electronic map displaying longitude/latitude data, and obtaining their longitude/latitude coordinates, thereby establishing the correspondence between the image coordinates and the longitude/latitude coordinates of each marker point.
4. The method of claim 3, wherein the prediction area in longitude/latitude coordinates is mapped into the image by: converting the longitude/latitude coordinates of the four vertices of the rectangular prediction area into image coordinates, selecting in the image the minimum rectangle containing the four image coordinate points, and taking the image area defined by this minimum rectangle as the image detection area.
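The minimum enclosing rectangle of claim 4 can be sketched as follows (a minimal illustration; the four input points are assumed to have already been converted from longitude/latitude to image coordinates by the calibrated mapping):

```python
def min_enclosing_rect(image_pts):
    """image_pts: the four vertices of the prediction area, in image
    coordinates. Returns (x, y, w, h) of the smallest axis-aligned rectangle
    containing all of them."""
    xs = [p[0] for p in image_pts]
    ys = [p[1] for p in image_pts]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
```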
5. The method as claimed in claim 4, wherein the longitude/latitude coordinates of any point are converted into image coordinates by: selecting the 8 marker points nearest to the point, among which no three points lie on the same straight line; solving the parameters of a nonlinear model from the coordinates of these 8 marker points by the least-squares method; and then obtaining the image coordinates of the point from the model.
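The least-squares fit of claim 5 can be sketched as follows. The claim only says "nonlinear model"; the second-order polynomial in longitude and latitude used here (6 coefficients per image axis, over-determined by 8 marker points) is an assumed instance of such a model, not the patented form:

```python
import numpy as np

def fit_latlon_to_image(markers):
    """markers: list of ((lon, lat), (u, v)) pairs for the 8 nearest marker
    points. Fits an assumed quadratic model u, v = f(lon, lat) per image axis
    by least squares; returns the two coefficient vectors."""
    A = np.array([[1.0, lon, lat, lon * lat, lon**2, lat**2]
                  for (lon, lat), _ in markers])
    u = np.array([uv[0] for _, uv in markers], dtype=float)
    v = np.array([uv[1] for _, uv in markers], dtype=float)
    cu, *_ = np.linalg.lstsq(A, u, rcond=None)
    cv, *_ = np.linalg.lstsq(A, v, rcond=None)
    return cu, cv

def latlon_to_image(lon, lat, cu, cv):
    """Evaluate the fitted model at (lon, lat) to get image coordinates."""
    basis = np.array([1.0, lon, lat, lon * lat, lon**2, lat**2])
    return float(basis @ cu), float(basis @ cv)
```

In practice the no-three-collinear condition of the claim keeps the design matrix well-conditioned; with real longitude/latitude values, centering the coordinates before fitting would likely be advisable.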
CN202010017124.6A 2020-01-08 2020-01-08 Method for detecting and identifying moving target in large scene by combining positioning information Active CN111241988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010017124.6A CN111241988B (en) 2020-01-08 2020-01-08 Method for detecting and identifying moving target in large scene by combining positioning information


Publications (2)

Publication Number Publication Date
CN111241988A CN111241988A (en) 2020-06-05
CN111241988B true CN111241988B (en) 2021-07-13

Family

ID=70865855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010017124.6A Active CN111241988B (en) 2020-01-08 2020-01-08 Method for detecting and identifying moving target in large scene by combining positioning information

Country Status (1)

Country Link
CN (1) CN111241988B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914048B (en) * 2020-07-29 2024-01-05 北京天睿空间科技股份有限公司 Automatic generation method for corresponding points of longitude and latitude coordinates and image coordinates
CN111986233B (en) * 2020-08-20 2023-02-10 西安电子科技大学 Large-scene minimum target remote sensing video tracking method based on feature self-learning
CN112434684B (en) * 2021-01-27 2021-04-27 萱闱(北京)生物科技有限公司 Image display method, medium, device and computing equipment based on target detection
CN112446379B (en) * 2021-02-01 2021-04-20 清华大学 Self-adaptive intelligent processing method for dynamic large scene
CN113569647B (en) * 2021-06-29 2024-02-20 广州赋安数字科技有限公司 AIS-based ship high-precision coordinate mapping method
CN113343933A (en) * 2021-07-06 2021-09-03 安徽水天信息科技有限公司 Airport scene monitoring method based on video target identification and positioning
CN114842173B (en) * 2022-04-15 2023-08-29 北华航天工业学院 Augmented reality system and control method thereof
CN116402857B (en) * 2023-04-14 2023-11-07 北京天睿空间科技股份有限公司 Moving target cross-lens tracking method based on three-dimensional calibration

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104637059A (en) * 2015-02-09 2015-05-20 吉林大学 Night preceding vehicle detection method based on millimeter-wave radar and machine vision
CN108596876A (en) * 2018-03-28 2018-09-28 潍坊路加精工有限公司 A kind of method of automatic setting detection zone
CN108830280A (en) * 2018-05-14 2018-11-16 华南理工大学 A kind of small target detecting method based on region nomination
CN109670462A (en) * 2018-12-24 2019-04-23 北京天睿空间科技股份有限公司 Continue tracking across panorama based on the aircraft of location information
CN110188749A (en) * 2019-05-09 2019-08-30 青岛讯极科技有限公司 Designated vehicle Vehicle License Plate Recognition System and method under a kind of more vehicles

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN104036275B (en) * 2014-05-22 2017-11-28 东软集团股份有限公司 The detection method and its device of destination object in a kind of vehicle blind zone
DE102016224095A1 (en) * 2016-12-05 2018-06-07 Robert Bosch Gmbh Method for calibrating a camera and calibration system

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN104637059A (en) * 2015-02-09 2015-05-20 吉林大学 Night preceding vehicle detection method based on millimeter-wave radar and machine vision
CN108596876A (en) * 2018-03-28 2018-09-28 潍坊路加精工有限公司 A kind of method of automatic setting detection zone
CN108830280A (en) * 2018-05-14 2018-11-16 华南理工大学 A kind of small target detecting method based on region nomination
CN109670462A (en) * 2018-12-24 2019-04-23 北京天睿空间科技股份有限公司 Continue tracking across panorama based on the aircraft of location information
CN110188749A (en) * 2019-05-09 2019-08-30 青岛讯极科技有限公司 Designated vehicle Vehicle License Plate Recognition System and method under a kind of more vehicles

Non-Patent Citations (1)

Title
How does SPP-Net let a CNN accept input images of arbitrary size?;O天涯海阁O;《https://blog.csdn.net/zhangjunhit/article/details/53909548》;20161228;1-2 *

Also Published As

Publication number Publication date
CN111241988A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN111241988B (en) Method for detecting and identifying moving target in large scene by combining positioning information
CN111862126B (en) Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm
CN115439424B (en) Intelligent detection method for aerial video images of unmanned aerial vehicle
CN110825101B (en) Unmanned aerial vehicle autonomous landing method based on deep convolutional neural network
CN103065323B (en) Subsection space aligning method based on homography transformational matrix
CN108446634B (en) Aircraft continuous tracking method based on combination of video analysis and positioning information
KR101261409B1 (en) System for recognizing road markings of image
Kumar et al. A semi-automatic 2D solution for vehicle speed estimation from monocular videos
CN108711172B (en) Unmanned aerial vehicle identification and positioning method based on fine-grained classification
CN112819903A (en) Camera and laser radar combined calibration method based on L-shaped calibration plate
CN109828267A (en) The Intelligent Mobile Robot detection of obstacles and distance measuring method of Case-based Reasoning segmentation and depth camera
CN113903011A (en) Semantic map construction and positioning method suitable for indoor parking lot
CN111914049A (en) Method for mapping longitude and latitude coordinates and image coordinates
CN109472778B (en) Appearance detection method for towering structure based on unmanned aerial vehicle
CN114034296A (en) Navigation signal interference source detection and identification method and system
CN114413958A (en) Monocular vision distance and speed measurement method of unmanned logistics vehicle
KR102490521B1 (en) Automatic calibration through vector matching of the LiDAR coordinate system and the camera coordinate system
CN114663473A (en) Personnel target positioning and tracking method and system based on multi-view information fusion
CN112446915A (en) Picture-establishing method and device based on image group
CN113971697A (en) Air-ground cooperative vehicle positioning and orienting method
CN110415299B (en) Vehicle position estimation method based on set guideboard under motion constraint
CN116358547B (en) Method for acquiring AGV position based on optical flow estimation
CN110706260A (en) Method for detecting moving target in large scene by combining positioning information
CN113112551B (en) Camera parameter determining method and device, road side equipment and cloud control platform
CN115830116A (en) Robust visual odometer method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant