CN113012215A - Method, system and equipment for space positioning - Google Patents

Info

Publication number
CN113012215A
Authority
CN
China
Prior art keywords
target
image
positioning
detected
point
Prior art date
Legal status
Pending
Application number
CN201911333346.2A
Other languages
Chinese (zh)
Inventor
王杰
杨少鹏
程林松
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201911333346.2A
Publication of CN113012215A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models

Abstract

The present application provides a method, a system, and equipment for spatial positioning, and relates to the field of artificial intelligence (AI). The method comprises the following steps: a spatial positioning system obtains an image, the image being captured by a camera arranged at a fixed position in a geographic area and recording a target to be detected; the image is input into a target positioning model to obtain a detection result, wherein the target positioning model performs positioning-point detection on the target to be detected in the image, the detection result comprises the pixel coordinates of the positioning point of the target to be detected in the image, and the positioning point represents the point in the image that corresponds to the geographic position of the target to be detected in the geographic area; and the geographic coordinates of the target to be detected are determined according to the pixel coordinates of the positioning point and the calibration relationship between the image captured by the camera and the geographic area. The method can improve positioning accuracy.

Description

Method, system and equipment for space positioning
Technical Field
The invention relates to the field of Artificial Intelligence (AI), in particular to a method, a system and equipment for spatial positioning.
Background
With the continuous progress of computer vision technology, processing and analyzing images captured by cameras is increasingly used to solve practical problems in the physical world, which has greatly promoted the development of industries such as intelligent transportation, intelligent security, and smart cities.
Spatial positioning, one of the core technologies of computer vision, converts the position of a target in an image into the position of the target in the physical world according to a mapping relationship (also referred to as a calibration relationship) between the image captured by a camera and the captured geographic region of the physical world.
Current spatial positioning methods usually rely on object detection: the target is positioned in space according to the position of the detection frame obtained by object detection in the image. This approach has low accuracy and large errors, and easily causes various misjudgments. For example: a vehicle driving close to a solid line is judged by the system to be crossing the line; a collision is misjudged because the distance between motor vehicles, non-motor vehicles, or pedestrians appears small; the trajectory of a vehicle jitters while driving, making subsequent tracking, trajectory analysis, and the like impossible.
Therefore, how to accurately position the target object in space is a problem to be solved urgently.
Disclosure of Invention
The embodiment of the invention discloses a method, a system and equipment for space positioning, which can reduce the cost and improve the positioning precision.
In a first aspect, the present application provides a method of spatial positioning, the method comprising: a spatial positioning system obtains an image, the image being captured by a camera arranged at a fixed position in a geographic area and recording at least one target to be detected; the spatial positioning system inputs the image into a target positioning model to obtain a detection result, wherein the target positioning model is used to perform positioning-point detection on the target to be detected in the image, the detection result comprises the pixel coordinates of the positioning point of the target to be detected in the image, and the positioning point represents the point in the image that corresponds to the geographic position of the target to be detected in the geographic area; and the spatial positioning system determines the geographic coordinates of the target to be detected according to the pixel coordinates of the positioning point and the calibration relationship between the image captured by the camera and the geographic area.
In the solution provided by the present application, the spatial positioning system performs positioning-point detection on the target to be detected in the image captured by the camera using the target positioning model, so that the pixel coordinates of the positioning point can be obtained, and the geographic coordinates of the target to be detected can then be obtained according to the calibration relationship between the image captured by the camera and the geographic area. In this way, the target to be detected can be positioned more accurately from the image captured by the camera, the applicable scenarios are expanded, and the positioning accuracy is improved.
With reference to the first aspect, in a possible implementation of the first aspect, the spatial positioning system determines an initial target positioning model, where the initial target positioning model is a deep learning model; the spatial positioning system obtains a plurality of sample images carrying annotation information, where the plurality of sample images are obtained by the camera photographing the geographic area, and the annotation information comprises the pixel coordinates of the positioning points of the targets recorded in the sample images; and the spatial positioning system trains the initial target positioning model using the plurality of sample images carrying the annotation information.
In the solution provided by the present application, the spatial positioning system obtains in advance the pixel coordinates of the positioning points of the targets recorded in the sample images, and then trains the initial target positioning model using a plurality of sample images carrying these positioning-point pixel coordinates, so that the trained target positioning model has the ability to predict the positioning points of targets recorded in an image. The model can therefore perform positioning-point detection on an image to be detected that is input into it and output the pixel coordinates of the positioning points of the targets recorded in that image.
With reference to the first aspect, in a possible implementation manner of the first aspect, the spatial positioning system obtains pixel coordinates of a positioning point of the target recorded in the sample image.
In the solution provided by the present application, the accuracy of the pixel coordinates of the positioning point of a target in a sample image directly affects the detection performance of the target positioning model. Therefore, to improve the detection accuracy of the target positioning model, the spatial positioning system needs to accurately obtain the pixel coordinates of the positioning points of the targets recorded in the sample images before training the initial model.
With reference to the first aspect, in a possible implementation of the first aspect, the spatial positioning system obtains an overhead-view image of the geographic area, where the overhead-view image and the sample image are data acquired at the same time; it then obtains the pixel coordinates, in the overhead-view image, of the target recorded in the sample image; the spatial positioning system obtains the geographic position, in the geographic area, of the target recorded in the sample image according to the calibration relationship between the overhead-view image and the geographic area; and finally, the spatial positioning system obtains the pixel coordinates of the positioning point of the target in the sample image according to the calibration relationship between the image captured by the camera and the geographic area.
In the solution provided by the present application, the spatial positioning system can obtain the geographic position, in the geographic area, of the target recorded in the sample image by using the overhead-view image and the calibration relationship between the overhead-view image and the geographic area, where the overhead-view image and the sample image are acquired at the same time.
With reference to the first aspect, in a possible implementation of the first aspect, the target positioning model is further configured to perform position detection and category detection on the target to be detected in the image, the detection result further comprises detection frame information and category information of the target to be detected in the image, and the detection frame information corresponds to the positioning points one to one.
In the solution provided by the present application, when the target positioning model performs positioning-point detection on the input image, for each target to be detected in the image it not only outputs the pixel coordinates of the target's positioning point, but also marks the target with a detection frame and annotates its category (such as motor vehicle or pedestrian). Each detection frame corresponds to a unique positioning point, so that confusion can be avoided and the targets can be easily distinguished when there are many targets to be detected.
With reference to the first aspect, in a possible implementation manner of the first aspect, the geographic position of the target to be detected in the geographic area includes a geographic coordinate of a central point of a vertical projection of the target to be detected in the geographic area.
In the solution provided by the present application, since the target to be detected is not a point but occupies a certain amount of space, the geographic position of the target to be detected can be represented by the geographic coordinates of the central point of its vertical projection in order to represent that position accurately. Optionally, for a roughly symmetrical target to be detected such as a vehicle, the geographic coordinates of the vertical projection of its centroid in the geographic area may also be used to represent its geographic position, since the vertical projection of the centroid and the central point of the target's vertical projection are the same point.
In a second aspect, a spatial positioning system is provided, comprising: an obtaining unit, configured to obtain an image, the image being captured by a camera arranged at a fixed position in a geographic area and recording at least one target to be detected; a positioning-point detection unit, configured to input the image into a target positioning model to obtain a detection result, wherein the target positioning model is used to perform positioning-point detection on the target to be detected in the image, the detection result comprises the pixel coordinates of the positioning point of the target to be detected in the image, and the positioning point represents the point in the image that corresponds to the geographic position of the target to be detected in the geographic area; and a processing unit, configured to determine the geographic coordinates of the target to be detected according to the pixel coordinates of the positioning point and the calibration relationship between the image captured by the camera and the geographic area.
With reference to the second aspect, in a possible implementation of the second aspect, the obtaining unit is further configured to obtain a plurality of sample images carrying annotation information, where the plurality of sample images are obtained by the camera photographing the geographic area, and the annotation information comprises the pixel coordinates of the positioning points of the targets recorded in the sample images; the positioning-point detection unit is further configured to determine an initial target positioning model, where the initial target positioning model is a deep learning model, and to train the initial target positioning model using the plurality of sample images carrying the annotation information.
With reference to the second aspect, in a possible implementation of the second aspect, the obtaining unit is further configured to obtain the pixel coordinates of the positioning point of the target recorded in the sample image.
With reference to the second aspect, in a possible implementation of the second aspect, the obtaining unit is specifically configured to: obtain an overhead-view image of the geographic area, where the overhead-view image and the sample image are data acquired at the same time; obtain the pixel coordinates, in the overhead-view image, of the target recorded in the sample image; obtain the geographic position, in the geographic area, of the target recorded in the sample image according to the calibration relationship between the overhead-view image and the geographic area; and obtain the pixel coordinates of the positioning point of the target in the sample image according to the calibration relationship between the image captured by the camera and the geographic area.
With reference to the second aspect, in a possible implementation of the second aspect, the target positioning model is further configured to perform position detection and category detection on the target to be detected in the image, the detection result further comprises detection frame information and category information of the target to be detected in the image, and the detection frame information corresponds to the positioning points one to one.
With reference to the second aspect, in a possible implementation manner of the second aspect, the geographic position of the target to be detected in the geographic area includes geographic coordinates of a central point of a vertical projection of the target to be detected in the geographic area.
In a third aspect, a computing device is provided. The computing device comprises a processor and a memory, where the memory is configured to store program code and the processor is configured to execute the program code in the memory to perform the method provided in the first aspect or any one of the implementations of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program; when the computer program is executed by a processor, the processor performs the spatial positioning method provided in the first aspect or any one of the implementations of the first aspect.
In a fifth aspect, a computer program product is provided, which includes instructions that, when executed by a computer, enable the computer to perform the flow of the spatial positioning method provided in the first aspect or any one of the implementations of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating an implementation of object positioning using object detection according to an embodiment of the present application;
FIG. 2 is a diagram of a system architecture provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another system architecture provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a spatial location system according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a method for acquiring annotation information according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an object location model according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart illustrating a method for spatial localization according to an embodiment of the present disclosure;
fig. 8 is a schematic flowchart of a method for obtaining a calibration relationship according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
First, a part of words and related technologies referred to in the present application will be explained with reference to the accompanying drawings so as to be easily understood by those skilled in the art.
A geographic region refers to a specific region in the physical world, such as a traffic intersection, a traffic road, a cell doorway, etc.
Spatial calibration (calibration) refers to determining a corresponding relationship between spatial positions of different planes or spaces, and may specifically be calculating a corresponding relationship between a geographic coordinate of a point in a geographic area and a pixel coordinate of the point in an image corresponding to the geographic area, where the corresponding relationship may also be referred to as a calibration relationship.
Homography transformation (homography transform), also called projective transformation, refers to the spatial position transformation relationship between two central projections. It maps points on one projective plane to another projective plane and maps straight lines to straight lines, i.e., it is line-preserving, and the mapping relationship between two projective planes can be represented by a homography transformation matrix.
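For illustration only, the following sketch shows how such a calibration relationship could be estimated and applied, assuming OpenCV is available, that the targets lie on a common ground plane, and that a few surveyed point pairs are known; the point values and function names are assumptions, not part of this application.

```python
# Illustrative sketch: estimate a homography between camera pixel coordinates and
# ground-plane geographic coordinates from a few surveyed correspondences, then
# use it to map pixels to the ground plane.
import cv2
import numpy as np

# Pixel coordinates of reference points in the camera image (assumed values).
pixel_pts = np.array([[102, 540], [860, 512], [640, 220], [150, 240]], dtype=np.float32)
# Ground-plane coordinates of the same points in a local metric system (assumed values).
geo_pts = np.array([[0.0, 0.0], [12.5, 0.0], [12.5, 30.0], [0.0, 30.0]], dtype=np.float32)

# The 3x3 homography H maps homogeneous pixel coordinates to ground-plane coordinates.
H, _ = cv2.findHomography(pixel_pts, geo_pts)

def pixel_to_ground(u: float, v: float, H: np.ndarray) -> tuple:
    """Map a pixel (u, v) to ground-plane coordinates via the homography H."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```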
Spatial localization (location) refers to a process of determining the position of an object in an image or video in a geographic region of the physical world, i.e., determining the geographic coordinates of the object in the geographic region of the physical world according to the pixel coordinates of the object in the image or video.
A positioning point is the point in the image that corresponds to the geographic position of the target. The geographic position of a target generally refers to the central point of the target's vertical projection onto the ground; this point accurately represents the geographic position of the target, and the geographic coordinates of the target are the coordinates of this central point in the geographic area. The position of the target in an image recording the geographic area can be represented by the pixel corresponding to this central point, and that pixel is the positioning point of the target in the image.
An artificial intelligence (AI) model is a machine learning model, which is essentially a mathematical model comprising a large number of parameters and mathematical formulas (or mathematical rules). Its aim is to learn a mathematical expression that captures the correlation between an input value x and an output value y; the mathematical expression that captures this correlation is the trained AI model. Generally, an AI model obtained by training an initial AI model with historical data (i.e., pairs of x and y) can be used to obtain a new y from a new x, thereby implementing predictive analysis; this process is also referred to as inference.
Supervised learning is the process of training an initial AI model with a plurality of training data items carrying annotation information. Each training data item serves as an input to the initial AI model, and its annotation information is the expected output of the initial AI model. During training, a training data item is input into the initial AI model, which performs a series of mathematical calculations on the input to produce an output; the output is compared with the annotation information of the training data, the parameters of the initial AI model are adjusted, and the model is trained iteratively so that its output comes closer and closer to the annotation information corresponding to the input training data. The trained AI model can then be used to predict results for unknown data. For example, the training data may be sample images captured by a camera, and the annotation information carried by each sample image may be the pixel coordinates of the detection frames of the targets recorded in that image. The sample images carrying annotation information are input into the initial AI model in turn for training; after each pass, the detection-frame pixel coordinates output by the model are compared with those in the corresponding annotation information, the parameters of the model are adjusted, and iterative training continues until the detection-frame pixel coordinates output by the model are close to those in the annotation information corresponding to the input sample image. At that point the model is trained, i.e., the trained AI model has the ability to predict the detection frames of targets to be detected in an input image.
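As a minimal sketch of the supervised training loop just described, assuming a PyTorch-style model that regresses detection-frame coordinates; the function names, data loader, and loss choice are illustrative assumptions rather than this application's training procedure.

```python
import torch
from torch import nn

def train(initial_model: nn.Module, train_loader, num_epochs: int = 10) -> nn.Module:
    """Iteratively adjust model parameters so predictions approach the annotation information."""
    optimizer = torch.optim.SGD(initial_model.parameters(), lr=1e-3)
    criterion = nn.SmoothL1Loss()                    # compares predicted and labeled boxes
    for _ in range(num_epochs):
        for images, labeled_boxes in train_loader:   # sample images + annotation information
            pred_boxes = initial_model(images)       # forward calculation
            loss = criterion(pred_boxes, labeled_boxes)
            optimizer.zero_grad()
            loss.backward()                          # back propagation
            optimizer.step()                         # parameter adjustment
    return initial_model
```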
In the process of spatially positioning a target, the related art obtains the detection frame corresponding to the target by means of object detection technology, takes the midpoint of the lower edge or the central point of the detection frame in the image as the position of the target in the image, and then maps that point into the geographic area of the physical world through a homography transformation, thereby obtaining the geographic coordinates of the target. As shown in fig. 1, at a traffic intersection, object detection is performed on an image captured by a camera, the detection targets being the vehicles at the intersection; each vehicle in the image is marked by a rectangular detection frame as the detection result, the position of the vehicle in the image is then represented by the midpoint of the lower edge or the central point of the rectangular detection frame, and the coordinates of that point are taken as the pixel coordinates of the vehicle. However, the geographic coordinates obtained from the pixel coordinates of the lower-edge midpoint (or central point) of the rectangular detection frame and the calibration relationship between the image and the geographic area often differ greatly from the actual geographic coordinates of the target in the geographic area. Using the geographic coordinates obtained in this way for subsequent vehicle violation detection, traffic analysis, and the like therefore introduces large errors. For example, in the traffic intersection scenario shown in fig. 1, the error caused by spatially positioning a vehicle while it turns at the intersection will exceed two meters.
In view of the above problems, the present application provides a spatial positioning method. An image recording at least one target to be detected is acquired by a camera arranged at a fixed position in a geographic area; the image is input into a target positioning model for detection to obtain the pixel coordinates of the positioning point of the target to be detected in the image, where the positioning point represents the point in the image that corresponds to the geographic position of the target to be detected in the geographic area; the geographic coordinates of the target to be detected in the geographic area are then obtained using the pixel coordinates of the positioning point and the calibration relationship between the image captured by the camera and the geographic area. This method reduces cost, improves positioning accuracy, and expands the applicable scenarios.
The target positioning model may be an AI model. Before the AI model is used for detection, an initial AI model needs to be trained using sample images containing targets captured by the camera together with the pixel coordinates of the positioning points in those sample images, so that the trained AI model has positioning-point detection capability and can perform positioning-point detection on images to be detected captured by the camera.
The pixel coordinates of the positioning points in the sample images can be calculated from high-point images or videos captured by, for example, an unmanned aerial vehicle. After such a high-point image or video is obtained, the geographic position of the target in the geographic area is calculated; from this geographic position, the positioning point of the target in the sample image captured by the camera can be obtained, and finally the pixel coordinates of the positioning point in the sample image are obtained.
The technical scheme of the embodiment of the application can be applied to various scenes needing space positioning, including but not limited to traffic intersections, traffic roads, school doorways, community doorways and the like.
The target in the present application includes a vehicle, a pedestrian, an animal, a static object, and the like recorded in an image, and the target in the image that needs to be detected and needs to be spatially located is also referred to as a target to be detected.
The pixel coordinates in the application are coordinates of pixel points in an image, and the pixel coordinates are two-dimensional coordinates.
The geographic coordinates in the present application are three-dimensional coordinate values representing points in a geographic area. It should be understood that the coordinate values of the same point differ between coordinate systems. The geographic coordinates of a point in the present application may be coordinate values in any coordinate system; for example, the geographic coordinates of a target may be a three-dimensional coordinate composed of the longitude, latitude, and altitude corresponding to the target, a three-dimensional coordinate composed of the X, Y, and Z coordinates of the target in a natural coordinate system, or coordinates in another form.
The spatial positioning method provided by the present application is executed by a spatial positioning system. In a specific embodiment, the spatial positioning system may be deployed on any computing device related to spatial positioning. For example, as shown in fig. 2, it may be deployed on one or more computing devices (e.g., a central server) in a cloud environment, or on one or more computing devices (edge computing devices) in an edge environment, which may be servers. The cloud environment refers to a central computing device cluster owned by a cloud service provider and used to provide computing, storage, and communication resources; it has abundant storage and computing resources. An edge environment refers to a cluster of edge computing devices geographically close to the raw data collection devices and used to provide computing, storage, and communication resources. The spatial positioning system in the present application may also be deployed on one or more terminal devices; as shown in fig. 3, the spatial positioning system is deployed on one terminal computing device that has certain computing, storage, and communication resources and may be a computer, a server, and the like. Raw data collection devices are devices that collect the raw data required by the spatial positioning system, including but not limited to video cameras, infrared cameras, and laser radars; they include devices placed at fixed positions on a traffic road to collect raw data of the traffic road (such as video data and infrared data) from their own viewing angle.
The spatial positioning system is used to detect and position targets in images captured by the camera. The spatial positioning system performs positioning-point detection on the target to be detected in the image using a trained target positioning model to obtain the pixel coordinates of the positioning point of the target to be detected in the image, and then obtains the geographic coordinates of the target to be detected according to the calibration relationship between the image captured by the camera and the geographic area. The target positioning model may be the AI model obtained by training the initial AI model described above; it has a positioning-point detection function, i.e., it can perform positioning-point detection on an image to obtain the pixel coordinates of the positioning points of the targets in the image. The units of the spatial positioning system may be divided in various ways, which is not limited in this application. Fig. 4 shows an exemplary division, and the function of each functional unit is briefly described below.
The spatial positioning system 400 comprises a plurality of functional units. An obtaining unit 410 is configured to obtain an image captured by a camera arranged at a fixed position in a geographic area, the image recording at least one target to be detected; a positioning-point detection unit 420 is configured to input the image obtained by the obtaining unit 410 into a target positioning model and perform positioning-point detection on the target to be detected in the image to obtain the pixel coordinates of the positioning point; and a processing unit 430 is configured to determine the geographic coordinates of the target to be detected according to the pixel coordinates of the positioning point detected by the positioning-point detection unit 420 and the calibration relationship between the image captured by the camera and the geographic area.
Optionally, the processing unit 430 is further configured to determine an initial target positioning model and train it using a plurality of sample images carrying annotation information acquired by the obtaining unit 410, so that the trained target positioning model is able to detect the position, category, and positioning-point pixel coordinates of targets in an image. The plurality of sample images are obtained by the camera photographing the geographic area, and the annotation information comprises the category information, detection frame information, and positioning-point pixel coordinates of the targets recorded in the sample images. After training is completed, the processing unit 430 deploys the trained target positioning model to the positioning-point detection unit 420. The positioning-point detection unit 420 is further configured to perform position and category detection on the target to be detected in the image to obtain the category of the target to be detected and its detection frame information in the image, where the detection frame information corresponds to the positioning points one to one.
The spatial positioning method provided by the present application detects the positioning point of a target through the target positioning model and then determines the geographic coordinates of the target in the geographic area according to the pixel coordinates of the detected positioning point, thereby realizing spatial positioning of the target and effectively improving positioning accuracy.
It should be understood that the target positioning model in the present application is a trained AI model. Before being used in the spatial positioning method of the present application, the target positioning model needs to be trained so that it has the ability to predict the pixel coordinates of the positioning point of a target in an image. The target positioning model in the present application may also have the ability to determine the category and position (detection frame information) of a target. Training requires dedicated training data chosen according to the required model capabilities: sample images captured by the camera carrying annotation information, where targets (such as vehicles and pedestrians) are recorded in the sample images and the annotation information comprises the category information, position (detection frame information), and positioning-point pixel coordinates of the targets in the sample images. The category information indicates the category of a target, for example motor vehicle, non-motor vehicle, or pedestrian. The detection frames correspond to the positioning points one to one and are used to mark out the targets in the sample images; for example, the detection frame information corresponding to a rectangular detection frame may consist of four pixel coordinates: the top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate of the frame. Note that the annotation information may be saved in a file format such as extensible markup language (XML) or JavaScript object notation (JSON).
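For illustration, annotation information of one sample image saved as JSON might look as follows; the field names and values are assumptions, not a format defined by this application.

```python
import json

# Hypothetical annotation for one sample image (assumed field names and values).
annotation = {
    "image": "camera_0001.jpg",
    "targets": [
        {
            "category": "motor_vehicle",
            "bbox": [412, 230, 655, 388],     # top-left x, top-left y, bottom-right x, bottom-right y
            "positioning_point": [533, 371],  # pixel coordinates of the positioning point
        },
        {
            "category": "pedestrian",
            "bbox": [120, 300, 160, 410],
            "positioning_point": [140, 402],
        },
    ],
}

with open("camera_0001.json", "w") as f:
    json.dump(annotation, f, indent=2)
```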
The following first describes the process of acquiring training data and annotation information required for training the object localization model in the present application.
The category information and detection frame information of the targets can be obtained by running an object detection algorithm on the sample images, or by manual annotation.
The following describes the process of acquiring the pixel coordinates of the positioning point of the target in the sample image, as shown in fig. 5:
S501: N top-view images of a geographic area are acquired.
Specifically, an unmanned aerial vehicle may hover directly above the geographic area and photograph it vertically to obtain N top-view images of the geographic area, where the N top-view images correspond to the traffic conditions in the geographic area at different times, and N is an integer greater than 1, for example 50.
While the top-view images of the geographic area are being captured, the camera should capture N sample images of the same geographic area at the same times. Each top-view image is taken at the same time as a sample image taken by the camera. For example, if the camera takes two sample images of the geographic area at times t1 and t2, the unmanned aerial vehicle also takes two top-view images of the geographic area at the corresponding times t1 and t2; that is, for the sample image taken by the camera at each time, there is a top-view image of the same geographic area taken at the same time, which ensures that the geographic position of an object recorded in the sample image at a given time is the same as the geographic position of that object recorded in the top-view image at the same time.
S502: pixel coordinates of an object recorded in the overhead view image are acquired.
Specifically, an object detection algorithm, such as the You Only Look Once (YOLO) detector, the Single Shot MultiBox Detector (SSD), or the Faster R-CNN detector (a region-proposal-network-based convolutional neural network), is used to detect the targets in each top-view image and obtain the detection frame (e.g., a rectangular detection frame) of each target recorded in the top-view image; the pixel coordinates of the central point of each detection frame are the pixel coordinates of the corresponding target.
It is worth mentioning that the geographic position of a target in the geographic area may be represented by the geographic coordinates of the central point of the target's vertical projection in the geographic area. In the top-view image, the point corresponding to the geographic position of the target is the same as the central point of the target's detection frame; therefore, the pixel coordinates of the central point of the detection frame in the top-view image are the pixel coordinates of the target's geographic position in the top-view image.
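A trivial sketch of this step, assuming detection frames are given as (x1, y1, x2, y2) pixel corners:

```python
def box_center(box):
    """Return the pixel coordinates of the center of a detection frame (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0
```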
S503: and obtaining the geographic position of the target in the geographic area according to the calibration relation between the overhead view image and the geographic area.
Specifically, the pixel coordinates of the geographic position of the target in the overhead view image are obtained in step S502, and then the geographic position of the target in the geographic area, that is, the geographic coordinates of the center point of the vertical projection of the target in the geographic area, can be obtained through simple calculation by using the calibration relationship between the overhead view image and the geographic area.
It should be understood that the calibration relationship between the overhead view image and the geographic region can be obtained in advance and stored in the spatial positioning system, and the specific obtaining method thereof will be described in detail in the following steps.
S504: and obtaining the pixel coordinates of the positioning point of the target in the sample image according to the calibration relation between the image shot by the camera and the geographical area.
Specifically, since the camera and the unmanned aerial vehicle both photograph the same geographic area, the target recorded in the sample image photographed by the camera and the target recorded in the overhead view image photographed by the unmanned aerial vehicle at the same time are the same target at the same time, and the geographic position corresponding to the pixel coordinate of the positioning point of the target in the sample image and the geographic position corresponding to the pixel coordinate of the central point of the detection frame in the overhead view image are the same geographic position.
In step S503, the geographic position of the target in the geographic area is obtained, and then the pixel coordinates of the positioning point of the target in the sample image captured by the camera can be obtained through a simple coordinate transformation by using the calibration relationship between the sample image and the geographic area. Similarly, the calibration relationship between the sample image and the geographic region can be obtained in advance and stored in the spatial positioning system, and the specific obtaining method will be described in detail in the subsequent steps.
For example, assume that the calibration matrix between the top-view image and the geographic area is H1, i.e., H1 is the transformation matrix that converts the pixel coordinates of a point in the top-view image to geographic coordinates in the geographic area. If the pixel coordinates of the target in the top-view image are (m, n), the geographic coordinates of the target in the geographic area are (a, b, c) = (m, n) * H1. Assume the calibration matrix between the images captured by the camera and the geographic area is H2, so that H2^-1 is the transformation matrix that converts the geographic coordinates of a point in the geographic area to pixel coordinates in the image captured by the camera. According to the calibration relationship between the image captured by the camera and the geographic area, (a, b, c) = (x, y) * H2, and therefore the pixel coordinates of the target's positioning point in the image captured by the camera are (x, y) = (a, b, c) * H2^-1 = (m, n) * H1 * H2^-1. Thus, through two coordinate transformations, the pixel coordinates of the point corresponding to the target's geographic position in the top-view image can be converted into the pixel coordinates of the target's positioning point in the sample image captured by the camera, and the positioning-point pixel coordinates of the target recorded in the sample image can be obtained accurately.
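The two coordinate transformations in this example could be sketched as follows, assuming H1 and H2 are 3x3 matrices acting on homogeneous coordinates of ground-plane points; the column-vector convention used here is an assumption made for illustration.

```python
import numpy as np

def topview_to_camera_pixel(m: float, n: float, H1: np.ndarray, H2: np.ndarray) -> tuple:
    """Map a top-view pixel (m, n) to the positioning-point pixel (x, y) in the camera image."""
    geo = H1 @ np.array([m, n, 1.0])   # top-view pixel -> geographic coordinates (via H1)
    geo /= geo[2]
    pix = np.linalg.inv(H2) @ geo      # geographic coordinates -> camera pixel (via H2^-1)
    return pix[0] / pix[2], pix[1] / pix[2]
```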
After the pixel coordinates of the positioning point of a target in a sample image are obtained, they are used as the annotation information corresponding to that sample image. The annotation information of a plurality of sample images can be obtained by the method described in fig. 5, and these sample images, together with their corresponding annotation information, serve as training samples for the subsequent training of the target positioning model.
It should be noted that, when the top-view images are obtained with a device such as an unmanned aerial vehicle, the vehicle may not be absolutely stationary directly above the geographic area because of its own shaking or external factors (for example, wind or rain). As a result, the same point in the geographic area may correspond to different pixel coordinates in top-view images captured at different times, which in turn makes the obtained geographic position of a target (for example, a vehicle) in the geographic area inaccurate.
Optionally, to eliminate this influence, one possible implementation maps the targets in the top-view images captured at other times into the same top-view image, so that a target has only one pixel coordinate at any time and corresponds to a unique geographic position; in this way the influence of unmanned aerial vehicle shaking and the like can be eliminated.
In a possible application scenario, the unmanned aerial vehicle hovers at a fixed height directly above the geographic area and photographs it vertically to obtain top-view images over a period of time, while the camera photographs the same geographic area. From the obtained top-view images, top-view images captured at N distinct clear moments are selected, and the N sample images captured by the camera at the corresponding moments are determined according to the capture times of the selected N top-view images.
After the N top-view images are selected, any one of them may be chosen as the reference image, and the remaining N-1 top-view images are non-reference images. The aim is to map the targets in all the non-reference images into the reference image so as to improve the accuracy of the obtained geographic positions of the targets.
Then, the static background information of the N top-view images is acquired. In the present application, static background information refers to the image information that remains after the target objects (generally moving objects) are removed, for example the static objects at a traffic intersection recorded in an image, including road markings, warning signs, traffic signal poles, station posts, surrounding buildings, street trees, flower beds, and the like. Specifically, an object detection algorithm may be used to detect the targets in each top-view image and obtain their detection frames; the information inside the detection frames is then covered or deleted, and what remains is the static background information.
It should be noted that the purpose of obtaining the static background information of the top-view images is to conveniently compute the mapping relationship between each non-reference image and the reference image: since static objects do not move over time, the mapping relationship between the reference image and a non-reference image can be calculated simply by obtaining the pixel coordinates of several static objects in both images.
Next, feature matching is performed between the static background information of the reference image and that of each non-reference image, and the coordinate transformation matrix between the reference image and each non-reference image is calculated. An image feature extraction algorithm, for example scale-invariant feature transform (SIFT), speeded up robust features (SURF), or oriented FAST and rotated BRIEF (ORB), may be used to extract features from the static background information of all the top-view images (both the reference image and the non-reference images), yielding the feature points of each top-view image's static background information and the feature vectors describing those feature points.
Further, a feature matching algorithm, for example a brute-force matcher (BFM) or a K-nearest neighbor (KNN) algorithm, is used to match the feature vectors of the feature points of the reference image's static background information with those of each non-reference image's static background information, yielding a number of feature point matching pairs. Each matching pair corresponds to a pair of pixel coordinates, namely the pixel coordinates of the feature point in the reference image and in the non-reference image. From the obtained matching pairs, the mapping relationship between the reference image and each non-reference image can be calculated by homography, giving the coordinate transformation matrix between them. It should be noted that, for each pair of reference and non-reference images, at least three matching pairs are needed to calculate the coordinate transformation matrix; to improve the accuracy of the result, dozens of matching pairs are generally used.
Finally, the targets recorded in all the non-reference images are mapped into the reference image according to the coordinate transformation matrices obtained between the reference image and each non-reference image.
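A sketch of this mapping step, assuming OpenCV: ORB features are extracted from the static backgrounds, matched with a brute-force matcher, a homography is fitted to the matched pairs, and the target points detected in a non-reference top-view image are warped into the reference image. The function name and the fixed number of matches kept are assumptions.

```python
import cv2
import numpy as np

def map_targets_to_reference(ref_bg, nonref_bg, nonref_target_pts):
    """Warp target pixel coordinates from a non-reference top view into the reference top view."""
    orb = cv2.ORB_create()
    kp_ref, des_ref = orb.detectAndCompute(ref_bg, None)     # reference static background
    kp_non, des_non = orb.detectAndCompute(nonref_bg, None)  # non-reference static background

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_non, des_ref), key=lambda m: m.distance)[:50]

    src = np.float32([kp_non[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)           # non-reference -> reference

    pts = np.float32(nonref_target_pts).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```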
The training process of the object localization model will be described below.
By the method shown in fig. 5, the annotation information of a plurality of sample images captured at different times can be obtained, and the sample images carrying annotation information form a training set. Model training is performed with the training samples in this training set, and an initial target positioning model is first determined. Because the target positioning model needs to predict positioning points in addition to detection frames and categories, the structure of the initial target positioning model of the present application is correspondingly adapted.
As shown in fig. 6, the initial target positioning model 600 of the present application mainly comprises three parts: a backbone network 610, a detection network 620, and a loss function calculation unit 630. The backbone network 610 is used to extract features from the input sample image and internally comprises a number of convolutional layers; it may be a visual geometry group network (VGG), a residual network (ResNet), a dense convolutional network (DenseNet), or the like. The detection network 620 is used to detect and identify the features extracted by the backbone network 610 and to output the target category information, target position information (i.e., detection frame information), and the pixel coordinates of each target's positioning point; it likewise comprises a number of convolutional layers that perform further convolution calculations on the output of the backbone network 610.
It should be noted that, compared with a general object detection model (e.g., YOLO or Faster R-CNN), the backbone network 610 of the present application may use the same network. However, because the target positioning model also performs positioning-point detection, in the detection network 620 several channels are added to each convolutional layer responsible for regressing the detection frames: preferably two channels are added to represent the abscissa and ordinate of the positioning point in the sample image. Of course, more channels may be added, each assigned a corresponding physical meaning, for example a channel representing the confidence of the positioning point; this is not limited in the present application.
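One possible layout of such a widened regression head is sketched below, assuming a PyTorch convolutional head that, for each anchor, outputs four detection-frame values, the class scores, and two extra positioning-point coordinates; this is an assumption about one way to add the channels, not the exact network of this application.

```python
import torch
from torch import nn

class PositioningHead(nn.Module):
    """Detection head with two extra channels per anchor for the positioning point."""
    def __init__(self, in_channels: int, num_anchors: int, num_classes: int):
        super().__init__()
        # per anchor: 4 box coordinates + class scores + 2 positioning-point coordinates
        out_channels = num_anchors * (4 + num_classes + 2)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        return self.conv(feature_map)
```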
First, the parameters of the initial target positioning model 600 are initialized, and a sample image is input into the initial target positioning model 600. The backbone network 610 extracts features from the targets recorded in the sample image to obtain abstract features, which are then input into the detection network 620; the detection network performs further detection and identification, predicts the category and position of each target and the pixel coordinates of its positioning point, and outputs them through the corresponding channels to the loss function calculation unit 630. The annotation information corresponding to the sample image is also input into the loss function calculation unit 630, which compares the prediction produced by the detection network 620 with the annotation information, calculates a loss function, and, using the loss function as the objective function, updates the model parameters with a back-propagation algorithm. Sample images carrying annotation information are input in turn, and this training process is executed iteratively until the loss function value converges, i.e., until the calculated loss fluctuates around a certain value, at which point training stops. The target positioning model is then trained: it can detect the category, position, and positioning point of targets in an image and can be used for spatial positioning.
It is worth noting that, because the present application adds two channels to each convolutional layer responsible for regressing the detection frames, the loss function needs to be redesigned. Assuming that the target positioning model of the present application is an improvement of a classical object detection model (e.g., YOLO or Faster R-CNN) whose loss function is Loss1, the loss function Loss of the constructed target positioning model of the present application can be expressed as: Loss = Loss1 + Loss2, where Loss2 is the loss function corresponding to the two newly added channels.
Since the purpose of adding the two channels is to predict the pixel coordinates of the positioning point, Loss2 may be constructed as the distance between the predicted positioning-point pixel coordinates and the positioning-point pixel coordinates in the annotation information. Optionally, this distance may be measured with an L1 norm or an L2 norm, where the L1 norm is the absolute value of the difference between the predicted and true positioning-point pixel coordinates and the L2 norm is the square of that difference; alternatively, Loss2 may be constructed in other ways, which is not limited in this application.
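A sketch of the combined loss under these definitions, assuming Loss1 is whatever loss the underlying detector already computes and that predicted and labeled positioning points are given as (N, 2) tensors; the shapes and equal weighting are assumptions.

```python
import torch

def total_loss(loss1: torch.Tensor,
               pred_points: torch.Tensor,   # (N, 2) predicted positioning-point pixel coordinates
               true_points: torch.Tensor,   # (N, 2) labeled positioning-point pixel coordinates
               use_l2: bool = False) -> torch.Tensor:
    """Loss = Loss1 + Loss2, with Loss2 an L1 or L2 distance between positioning points."""
    diff = pred_points - true_points
    loss2 = (diff ** 2).sum(dim=1).mean() if use_l2 else diff.abs().sum(dim=1).mean()
    return loss1 + loss2
```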
Using the constructed loss function Loss as the objective function, the parameters of the target positioning model are updated with a back-propagation algorithm, and iteration continues until the value of Loss converges, i.e., the value obtained at each calculation fluctuates around a stable value, which indicates that the target positioning model is trained and can be used for spatial positioning.
After the training of the target location model is completed, the target location model may be utilized to perform spatial location, and how to perform spatial location will be described in detail below with reference to fig. 7. As shown in fig. 7, the method includes, but is not limited to, the following steps:
s701: the spatial positioning system acquires an image, the image is obtained by shooting by a camera arranged at a fixed position in a geographical area, and at least one target to be detected is recorded in the image.
Specifically, the spatial positioning system may acquire, from a camera disposed in the geographic area, a piece of video data captured by that camera. The video data is composed of video frames taken at different times and arranged in time order; each video frame is an image reflecting the condition of the geographic area at the moment of capture, and each image records at least one target to be detected.
It should be noted that this camera is the same camera used to capture the sample images; that is, the camera arranged at the fixed position in the geographic area captures both the sample images used to train the target positioning model and the images to be detected that are used for spatial positioning.
It should be understood that, in the present application, a target may refer to a moving object on a traffic road, or a movable object that remains stationary for a period of time, such as a motor vehicle, a pedestrian, a non-motor vehicle, an animal, and the like.
S702: and inputting the image to a target positioning model by the space positioning system to obtain a detection result.
Specifically, the trained target positioning model performs positioning point detection on the target to be detected in the image, and the obtained detection result includes the pixel coordinates of the positioning point of the target to be detected in the image, the category of the target to be detected, and a detection frame. The positioning point represents the point in the image that corresponds to the geographic position of the target to be detected in the geographic area. For example, taking a vehicle as the target to be detected, the geographic position of the vehicle in the geographic area can be obtained; it may refer to the geographic coordinates of the center point of the vertical projection of the vehicle onto the geographic area, and these geographic coordinates can be measured.
S703: and the space positioning system determines the geographic coordinates of the target to be detected according to the pixel coordinates of the positioning points and the calibration relation between the image shot by the camera and the geographic area.
Specifically, after the spatial positioning system obtains the pixel coordinates of the positioning point, the geographic coordinates of the positioning point, which are also the geographic coordinates of the target to be detected, can be obtained through a simple coordinate conversion using the calibration relation between the image captured by the camera and the geographic area, thereby realizing spatial positioning of the target to be detected.
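A minimal sketch of this coordinate conversion, assuming the calibration relation is expressed as a 3x3 homography matrix H (see steps S801-S803 below); the variable names are illustrative.

```python
# Illustrative conversion of a positioning point's pixel coordinates to
# geographic plane coordinates using a 3x3 homography H; names are assumptions.
import numpy as np

def pixel_to_geo(point_xy, H):
    x, y = point_xy
    p = np.array([x, y, 1.0])   # homogeneous pixel coordinates
    g = H @ p
    g = g / g[2]                # normalize the homogeneous component
    return g[0], g[1]           # plane coordinates of the target in the geographic area
```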
The objects to be detected may have different meanings depending on the application scenario. For example, in an application scenario of determining vehicle violation at a traffic intersection, a target to be detected is a vehicle at a target traffic intersection; in an application scene of determining the driving position of a suspicious vehicle, a target to be detected is the suspicious vehicle on a traffic road; when suspicious people in a certain area (such as the interior of a residential area) are determined, the target to be detected is the suspicious people in the area; when determining a dangerous situation in a certain area (e.g., a factory), the target to be detected is a dangerous target (e.g., a malfunctioning machine, a burning object, etc.) in the area.
The spatial positioning system can be deployed on different processing devices according to different application scenarios, for example: the processing device may be a device for determining the geographical coordinates of the vehicle in a traffic management system, a device for determining the geographical coordinates of a suspicious vehicle in a police system, a device for determining the geographical coordinates of a suspicious person in a security management system, or a device for determining the geographical coordinates of a dangerous object in a danger-screening management system.
It should be understood that the calibration relation between the image captured by the camera and the geographic area, and the calibration relation between the overhead view image and the geographic area mentioned in step S503 above, can both be obtained by a variety of methods whose principles are similar. A method for obtaining the calibration relation between the image captured by the camera and the geographic area is described below by way of example with reference to fig. 8:
S801: geographic coordinates of control points in the geographic area are acquired in advance.
In order to obtain the mapping relation between the pixel coordinates of a target and its geographic coordinates in the physical world, some control points in the geographic area need to be selected in advance, and their geographic coordinates acquired and recorded. For example, control points may be selected at a traffic intersection. Control points are usually points with distinctive features so that their pixel positions in the image can be found intuitively, for example, right-angle points of traffic marking lines, the tips of arrows, and corner points of green belts on the traffic road. The geographic coordinates (such as longitude, latitude and altitude) of the control points can be collected manually or by an unmanned vehicle. The selected control points should be distributed evenly over the traffic road rather than clustered around a single straight line, and at least three control points that are not collinear should be observable from the camera's viewing angle; the number of control points can be chosen according to the actual situation.
S802: and acquiring the pixel coordinates of the acquired control point in the image shot by the camera.
An image of the geographic area captured by the camera fixedly arranged in the geographic area is read, and the pixel coordinate values corresponding to the control points observable in the captured image are acquired. These pixel coordinate values can be obtained manually or by a program, for example, by using corner detection and a short-time Fourier transform edge extraction algorithm to obtain the pixel coordinates of the control points of the geographic area in the image.
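As one possible (assumed, not mandated) way to obtain these pixel coordinates programmatically, a standard corner detector such as OpenCV's goodFeaturesToTrack can propose candidate points that are then matched to the surveyed control points; the file name and parameter values below are illustrative.

```python
# Illustrative corner detection for locating control-point pixels; the image
# path and parameter values are assumptions, not part of the patent.
import cv2

img = cv2.imread("camera_view.jpg", cv2.IMREAD_GRAYSCALE)
corners = cv2.goodFeaturesToTrack(img, maxCorners=50, qualityLevel=0.01, minDistance=10)
for x, y in corners.reshape(-1, 2):
    print(x, y)   # candidate pixel positions, e.g. right-angle points of lane markings
```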
S803: and establishing a calibration relation from the image under the camera view angle to the physical world according to the geographic coordinates and the pixel coordinates of the control points. For example, a homography matrix H that converts pixel coordinates into geographic coordinates may be calculated according to the homography principle, the homography transformation formula is (m, n, H) ═ H (x, y), and the H matrix corresponding to the image captured by the camera may be calculated from the pixel coordinates (x, y) and the geographic coordinates (m, n, H) of at least three control points in the image obtained in the aforementioned steps S801 and S802 under the capturing view angle of the camera. It should be understood that if there are a plurality of cameras, the calculated H matrix is different because the shooting angle of view of each camera is different.
It can be seen that, as long as the pixel coordinates of the positioning point of the target to be detected and the calibration relation between the image captured by the camera and the geographic area (i.e., the H matrix corresponding to the image captured by the camera) are available, the geographic coordinates of the target to be detected can be obtained through a simple calculation.
It should be noted that the aforementioned steps S801 to S803 should be executed no later than step S703; the specific execution time is otherwise not limited, and, for example, they may be completed before the training of the target positioning model. In addition, the method for obtaining the calibration relation between the overhead view image and the geographic area is similar to the above and, for brevity, is not described again here.
The method of the embodiments of the present application has been described in detail above. To better implement the above aspects of the embodiments of the present application, corresponding devices for implementing these aspects are provided below.
As shown in fig. 4, the present application also provides a spatial positioning system for performing the aforementioned spatial positioning method. The functional units in the spatial positioning system are not limited by the present application, and each unit in the spatial positioning system can be increased, decreased or combined as needed. Fig. 4 exemplarily provides a division of functional units:
The spatial positioning system 400 comprises an acquisition unit 410, a positioning point detection unit 420 and a processing unit 430.

Specifically, the acquisition unit 410 is configured to perform the aforementioned steps S501, S701, and S801-S802, and optionally the optional methods in those steps.

The positioning point detection unit 420 is configured to perform the aforementioned steps S502 and S702, and optionally the optional methods in those steps.

The processing unit 430 is configured to perform the aforementioned steps S503-S504, S703, and S803, and optionally the optional methods in those steps.

The three units may transmit data to one another through a communication path. It should be understood that each unit included in the spatial positioning system 400 may be a software unit, a hardware unit, or partly a software unit and partly a hardware unit.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application. As shown in fig. 9, the computing device 900 includes: a processor 910, a communication interface 920, and a memory 930, the processor 910, the communication interface 920, and the memory 930 being connected to each other by an internal bus 940. It should be understood that the computing device 900 may be a computing device in cloud computing, or a computing device in an edge environment.
The processor 910 may be formed by one or more general-purpose processors, such as a Central Processing Unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
The bus 940 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 940 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
Memory 930 may include volatile memory (volatile memory), such as Random Access Memory (RAM); the memory 930 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory (flash memory), a Hard Disk Drive (HDD), or a solid-state drive (SSD); the memory 930 may also include combinations of the above categories.
It should be noted that the memory 930 of the computing device 900 stores codes corresponding to the units of the spatial location system 400, and the processor 910 executes the codes to implement the functions of the units of the spatial location system 400, that is, to execute the methods of S701-S703.
The descriptions of the flows corresponding to the above-mentioned figures have respective emphasis, and for parts not described in detail in a certain flow, reference may be made to the related descriptions of other flows.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device that integrates one or more available media, such as a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., an SSD).

Claims (14)

1. A method of spatial localization, the method comprising:
acquiring an image, wherein the image is obtained by shooting by a camera arranged at a fixed position in a geographical area, and at least one target to be detected is recorded in the image;
inputting the image to a target positioning model to obtain a detection result, wherein the target positioning model is used for performing positioning point detection on a target to be detected in the image, the detection result comprises pixel coordinates of a positioning point of the target to be detected in the image, and the positioning point represents a point corresponding to the geographical position of the target to be detected in the geographical area in the image;
and determining the geographic coordinates of the target to be detected according to the pixel coordinates of the positioning point and the calibration relation between the image shot by the camera and the geographic area.
2. The method of claim 1, wherein before the target positioning model is used to perform positioning point detection on the target to be detected in the image, the method further comprises:
determining an initial target positioning model, wherein the initial target positioning model adopts a deep learning model;
acquiring a plurality of sample images carrying annotation information, wherein the plurality of sample images are obtained by shooting the geographic area by the camera, and the annotation information comprises pixel coordinates of positioning points of targets recorded in the sample images;
and training the initial target positioning model by utilizing the plurality of sample images carrying the marking information.
3. The method of claim 2, wherein prior to obtaining the plurality of sample images carrying annotation information, the method further comprises: and acquiring the pixel coordinates of the positioning point of the target recorded in the sample image.
4. The method of claim 3, wherein the obtaining pixel coordinates of a location point of a target recorded in the sample image comprises:
acquiring an overhead view image of the geographic area, wherein the overhead view image and the sample image are data acquired at the same time;
acquiring pixel coordinates of a target recorded in the sample image in the overhead view image;
obtaining the geographic position of the target recorded in the sample image in the geographic area according to the calibration relation between the overhead view image and the geographic area;
and obtaining, according to the calibration relation between the image captured by the camera and the geographic area, the pixel coordinates in the sample image of the positioning point of the target recorded in the sample image.
5. The method according to any one of claims 1 to 4, wherein the target positioning model is further configured to perform position detection and category detection on the target to be detected in the image, and the detection result further includes detection frame information and category information of the target to be detected in the image, where the detection frame information corresponds to the positioning points one to one.
6. The method according to any one of claims 1 to 5, wherein the geographical position of the object to be detected in the geographical area comprises geographical coordinates of a center point of a vertical projection of the object to be detected in the geographical area.
7. A spatial positioning system, comprising:
the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring an image, the image is obtained by shooting by a camera arranged at a fixed position of a geographical area, and at least one target to be detected is recorded in the image;
the positioning point detection unit is used for inputting the image to a target positioning model to obtain a detection result, wherein the target positioning model is used for performing positioning point detection on a target to be detected in the image, the detection result comprises pixel coordinates of a positioning point of the target to be detected in the image, and the positioning point represents a point corresponding to the geographic position of the target to be detected in the geographic area in the image;
and the processing unit is used for determining the geographic coordinates of the target to be detected according to the pixel coordinates of the positioning point and the calibration relation between the image shot by the camera and the geographic area.
8. The spatial location system of claim 7,
the acquiring unit is further configured to acquire a plurality of sample images carrying annotation information, the plurality of sample images are obtained by shooting the geographic area with the camera, and the annotation information includes pixel coordinates of a positioning point of a target recorded in the sample image;
the positioning point detection unit is also used for determining an initial target positioning model, and the initial target positioning model adopts a deep learning model; and training the initial target positioning model by utilizing the plurality of sample images carrying the marking information.
9. The spatial positioning system of claim 8, wherein said acquisition unit is further configured to:
and acquiring the pixel coordinates of the positioning point of the target recorded in the sample image.
10. The spatial positioning system of claim 9, wherein the acquisition unit is specifically configured to:
acquiring an overhead view image of the geographic area, wherein the overhead view image and the sample image are data acquired at the same time;
acquiring pixel coordinates of a target recorded in the sample image in the overhead view image;
obtaining the geographic position of the target recorded in the sample image in the geographic area according to the calibration relation between the overhead view image and the geographic area;
and obtaining, according to the calibration relation between the image captured by the camera and the geographic area, the pixel coordinates in the sample image of the positioning point of the target recorded in the sample image.
11. The spatial positioning system of any one of claims 7-10, wherein the target positioning model is further configured to perform position detection and category detection on the target to be detected in the image, and the detection result further includes detection frame information and category information of the target to be detected in the image, and the detection frame information corresponds to the positioning points one to one.
12. The spatial location system of any of claims 7-11, wherein the geographic location of the object to be detected in the geographic area comprises geographic coordinates of a center point of a vertical projection of the object to be detected in the geographic area.
13. A computing device, comprising a memory and a processor, the processor executing computer instructions stored by the memory to cause the computing device to perform the method of any of claims 1-6.
14. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any of claims 1-6.
CN201911333346.2A 2019-12-20 2019-12-20 Method, system and equipment for space positioning Pending CN113012215A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911333346.2A CN113012215A (en) 2019-12-20 2019-12-20 Method, system and equipment for space positioning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911333346.2A CN113012215A (en) 2019-12-20 2019-12-20 Method, system and equipment for space positioning

Publications (1)

Publication Number Publication Date
CN113012215A true CN113012215A (en) 2021-06-22

Family

ID=76383279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911333346.2A Pending CN113012215A (en) 2019-12-20 2019-12-20 Method, system and equipment for space positioning

Country Status (1)

Country Link
CN (1) CN113012215A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226838A (en) * 2013-04-10 2013-07-31 福州林景行信息技术有限公司 Real-time spatial positioning method for mobile monitoring target in geographical scene
CN104933718A (en) * 2015-06-23 2015-09-23 广东省自动化研究所 Physical coordinate positioning method based on binocular vision
CN108986169A (en) * 2018-07-06 2018-12-11 北京字节跳动网络技术有限公司 Method and apparatus for handling image
CN109325980A (en) * 2018-07-27 2019-02-12 深圳大学 A kind of method, apparatus and manipulator for manipulator positioning target
CN110335312A (en) * 2019-06-17 2019-10-15 武汉大学 A kind of object space localization method neural network based and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223087A (en) * 2021-07-08 2021-08-06 武大吉奥信息技术有限公司 Target object geographic coordinate positioning method and device based on video monitoring
CN113223087B (en) * 2021-07-08 2021-09-21 武大吉奥信息技术有限公司 Target object geographic coordinate positioning method and device based on video monitoring
CN116597014A (en) * 2023-07-19 2023-08-15 太仓庄正数控设备有限公司 Workpiece accurate positioning method and device of straightener
CN116597014B (en) * 2023-07-19 2023-10-13 太仓庄正数控设备有限公司 Workpiece accurate positioning method and device of straightener
CN117315028A (en) * 2023-10-12 2023-12-29 北京多维视通技术有限公司 Method, device, equipment and medium for positioning fire point of outdoor fire scene
CN117315028B (en) * 2023-10-12 2024-04-30 北京多维视通技术有限公司 Method, device, equipment and medium for positioning fire point of outdoor fire scene

Similar Documents

Publication Publication Date Title
EP3581890B1 (en) Method and device for positioning
US10078790B2 (en) Systems for generating parking maps and methods thereof
Zhao et al. Detection, tracking, and geolocation of moving vehicle from uav using monocular camera
Chen et al. Vehicle detection in high-resolution aerial images via sparse representation and superpixels
CN105628951A (en) Method and device for measuring object speed
KR102507501B1 (en) Artificial Intelligence-based Water Quality Contaminant Monitoring System and Method
CN111491131B (en) Method and apparatus for integrating object detection information detected by each object detector
CN110348463B (en) Method and device for identifying vehicle
CN113447923A (en) Target detection method, device, system, electronic equipment and storage medium
CN108932273B (en) Picture screening method and device
CN111738032B (en) Vehicle driving information determination method and device and vehicle-mounted terminal
CN109063549B (en) High-resolution aerial video moving target detection method based on deep neural network
US20190311209A1 (en) Feature Recognition Assisted Super-resolution Method
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
CN113012215A (en) Method, system and equipment for space positioning
CN111881984A (en) Target detection method and device based on deep learning
CN110636248B (en) Target tracking method and device
Zhou et al. Car park occupancy analysis using UAV images
CN113011445A (en) Calibration method, identification method, device and equipment
CN114550016B (en) Unmanned aerial vehicle positioning method and system based on context information perception
Zhang et al. Vehicle localisation and deep model for automatic calibration of monocular camera in expressway scenes
Hu et al. Airport Detection for Fixed-Wing Unmanned Aerial Vehicle Landing Using a Hierarchical Architecture
CN111488771B (en) OCR hooking method, device and equipment
CN112651351A (en) Data processing method and device
CN111372051A (en) Multi-camera linkage blind area detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination