CN117237615A - Supervision target positioning method and device

Info

Publication number: CN117237615A (application CN202311507462.8A); granted as CN117237615B
Authority: CN (China)
Prior art keywords: target, camera, target object, fixed camera, point
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 郑海波, 蔡鹏飞, 王晓梅, 李俊男, 潘思榕, 何子中, 齐洪武
Assignee (original and current): CETC 15 Research Institute
Application filed by CETC 15 Research Institute; priority to CN202311507462.8A

Landscapes

  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of computer vision, and provides a method and a device for positioning a supervision target, wherein the method comprises the following steps: collecting images around a fixed camera, labeling the collected sample images, and at the same time finding the position information of each labeled target object on a GIS (geographic information system) to establish a training data set; training the target recognition positioning model of the fixed camera with the training data set to obtain a trained target recognition positioning model; performing target recognition on the current video stream image with the trained model, and obtaining the position information of the target object; and calculating the azimuth angle and distance of the target object relative to the fixed camera so as to adjust the direction of the fixed camera. The application enables the target object to be positioned accurately and the orientation of the fixed camera relative to the target object to be shot to be adjusted accurately, thereby solving the problem that positioning information is lacking after target recognition by a single camera.

Description

Supervision target positioning method and device
Technical Field
The application relates to the technical field of computer vision, in particular to a method and a device for positioning a supervision target.
Background
Recently, with the development of convolutional neural network (CNN) technologies, image recognition has been widely applied. CNN-based methods have been developed to determine camera direction by estimating the horizon from a single image, but most formulate the problem as regression or classification, imposing strong priors on horizon-related features and the corresponding camera parameters. In practical applications, however, such methods are adversely affected by the many pseudo-parallel line segments present in real scenes.
In computer vision, related methods have proposed localization by matching low- and mid-level appearance features against a large geographic image database. In SFM (structure-from-motion) based frameworks, it has also been proposed to use large numbers of images and substantial computing resources to improve accuracy, essentially placing the recognition database on the cloud. In addition, methods that estimate camera position and orientation by matching ground-view images against satellite images have been implemented.
In the prior art, positioning a target object requires obtaining the camera's intrinsic parameters, including focal length and optical center, and writing the corresponding driver programs. Some older cameras lack a pan-tilt head from which such parameters can be obtained; camera models differ between manufacturers, parameters and operating modes differ between models, and some older cameras have no turntable interface at all, so camera rotation, focusing, and positioning are not unified and operation is complex. The usual remedy is to replace the cameras with a unified model and develop a unified control interface, but this increases deployment cost, wastes resources because the required interfaces still differ, and increases workload.
In the related art, vertical and horizontal building edges are extracted from an omni-directional image composed of four directional images, structural fragments describing the piecewise-linear contour of a building are generated from these edges, and a search is performed over a map area, so that a structure found in a single image is recognized and positioned with reference to the map. However, the inclination of the camera is not considered in the positioning process, so only large-scale search and rough positioning can be supported; the method cannot meet higher-precision requirements and demands considerable user operation. In addition, the following problems exist: the orientation and distance of an external target object in space cannot be precisely determined in the absence of camera installation parameters and optical parameters; the turntables of different cameras come from different manufacturers and models, so no general method can obtain the actual parameters of the current turntable, and the turntables of some older cameras have no corresponding interface at all; and positioning information is lacking after target recognition by a single camera.
Therefore, there is a need to provide a new method of supervised object localization to solve the above problems.
Disclosure of Invention
The application aims to provide a supervision target positioning method and device to solve the following technical problems in the prior art: the orientation and distance of an external target object in space cannot be precisely determined when camera installation parameters and optical parameters are lacking; the turntables of different cameras come from different manufacturers and models, so no general method can obtain the actual parameters of the current turntable, and the turntables of some older cameras have no corresponding interface at all; and positioning information is lacking after target recognition by a single camera.
The first aspect of the present application proposes a supervision target positioning method, comprising: collecting surrounding images of a fixed camera, marking the collected sample images, and finding out position information of a corresponding marked target object on a GIS (geographic information system) at the same time to establish a training data set; training the target recognition positioning model of the fixed camera by using the training data set to obtain a trained target recognition positioning model; performing target recognition on the current video stream image by adopting a trained target recognition positioning model, and obtaining the position information of a target object; and calculating the azimuth angle and the distance of the target object relative to the fixed camera so as to adjust the direction of the fixed camera.
According to an alternative embodiment, the azimuth angle θ of target object point B relative to fixed camera point A in the current video stream image is calculated using the following expressions:

θ = atan2(y, x);

x = cos(A_w)*sin(B_w) - sin(A_w)*cos(B_w)*cos(B_j - A_j);

y = sin(B_j - A_j)*cos(B_w);

wherein θ represents the azimuth angle of target object point B in the current video stream image relative to fixed camera point A; A(A_j, A_w) represents the coordinate position of the fixed camera, A_j being the longitude and A_w the latitude of camera point A; B(B_j, B_w) represents the coordinate position of target object point B in the current video stream image, B_j being the longitude and B_w the latitude of target object point B.
According to an alternative embodiment, the distance S of target object point B relative to fixed camera point A is calculated using the following expression:

S = R*arccos(sin(A_w)*sin(B_w) + cos(A_w)*cos(B_w)*cos(B_j - A_j));

wherein S represents the distance between target object point B and fixed camera point A; R is the average radius of the earth; A_j is the longitude and A_w the latitude of fixed camera point A; B_j is the longitude and B_w the latitude of target object point B.
According to an alternative embodiment, calculating the azimuth angle and distance of the target object relative to the fixed camera to adjust the direction of the fixed camera includes: calculating the azimuth angle of the fixed camera from the camera and a target object in the current video stream image, and adjusting the direction of the fixed camera; when the calculated azimuth angle is larger than the rotatable angle of the camera, adjusting the fixed camera to the maximum azimuth angle of the rotatable range and synchronously adjusting the GIS direction so that the camera direction on the GIS always remains consistent with the actual camera direction, wherein each camera is given a default focal length and a default direction at installation, the position the camera directly faces is the default direction, the azimuth angle of the camera in the default direction is recorded as θ, and the angle of the fixed camera direction on the GIS is adjusted to θ.
According to an alternative embodiment, one or more target objects are identified from the current video stream image, the position coordinates of the identified target objects are obtained, and the position coordinates in the current video stream image are represented by the longitude and latitude of the target object.
According to an alternative embodiment, when a plurality of target objects are identified, taking the average of the image longitudes and the image latitudes of the plurality of target objects as the position coordinates of the target objects in the current video stream image.
According to an alternative embodiment, panoramic images formed by continuous shooting of the fixed camera over different rotation ranges and at different focal lengths are collected, where "fixed camera" denotes a camera installed at a fixed position;
sampling panoramic images at different positions in a panoramic video stream to obtain sample images, calibrating target objects in the sample images, and acquiring longitude and latitude of the calibrated target objects by combining GIS geographic information to establish a training data set, wherein the target objects comprise fixed objects fixed at a designated position or mounted at the designated position.
According to an alternative embodiment, when the target object is not identified or the target object is identified by mistake and the calculated azimuth angle exceeds the maximum rotation range of the fixed camera, the fixed camera on the GIS is adjusted to be in a fixed direction.
According to an alternative embodiment, a yolov5 algorithm is adopted to establish a target identification positioning model, the established training data set is used for training the target identification positioning model, so that the target identification positioning model learns characteristic information of target scenes around a fixed camera, such as buildings, fixed facilities, road traffic, vegetation layout and the like, and fine adjustment is carried out on network parameters inside the target identification positioning model.
The second aspect of the present application provides a supervision target positioning apparatus, which adopts the supervision target positioning method of the present application, and the supervision target positioning apparatus includes: the acquisition processing module acquires images around the fixed camera, marks the acquired sample images, and finds out the position information of the corresponding marked target object on the GIS at the same time so as to establish a training data set; the model training module is used for training the target recognition positioning model of the fixed camera by using the training data set to obtain a trained target recognition positioning model; the target recognition module is used for carrying out target recognition on the current video stream image by adopting a trained target recognition positioning model and obtaining the position information of a target object; and the calculation and adjustment module calculates the azimuth angle and the distance of the target object relative to the fixed camera so as to adjust the direction of the fixed camera.
A third aspect of the present application provides an electronic apparatus, comprising: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect of the present application.
A fourth aspect of the application provides a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect of the application.
The embodiment of the application has the following advantages:
compared with the prior art, the method and the device have the advantages that the fixed camera is combined with the GIS, and the self-established target recognition positioning model is utilized to carry out target recognition on the current video stream image, so that the position information of the target object is obtained;
the method has the advantages that the fixed cameras are used for calculating the azimuth angle and the distance of a target object relative to the fixed cameras at video image contents with different directions, the direction of the camera is calculated reversely, the position of the camera is calculated reversely, namely, the focal length of the current camera is calculated reversely, meanwhile, the actual direction and the actual focal length of the fixed cameras on the corresponding coordinates of the linkage GIS are adjusted, the target object can be positioned accurately, meanwhile, the direction of the fixed cameras relative to the target object to be shot can be adjusted accurately, and the problem that positioning information is lacking after single camera target identification is solved.
In addition, the application can be compatible with various cameras, does not need to change an identification positioning model in the camera, does not need to additionally install positioning and ranging equipment, and is also applicable to some old cameras.
Drawings
FIG. 1 is a flow chart of steps of an example of a supervised target localization method of the present application;
FIG. 2 is a flow chart of a specific example of a supervised target localization method of the present application;
FIG. 3 is a flow chart of an example of an image to be processed to which the supervised object localization method of the present application is applied;
FIG. 4 is a schematic diagram of position coordinate information of a target object B point relative to a fixed camera A point in the supervised target positioning method of the present application;
FIG. 5 is a schematic diagram of the method for supervising the target positioning according to the present application, in which the azimuth and distance of the target object relative to the fixed camera are calculated;
FIG. 6 is a schematic diagram of an example of calculating the focal length of a target object relative to a fixed camera in a supervised target localization method of the present application;
fig. 7 is a block diagram of an example of a supervisory object positioning apparatus of the application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
In view of the above problems, the application provides a supervision target positioning method that trains a target recognition positioning model using video annotation data acquired at a fixed position, uses the video image contents of the fixed camera in different orientations to back-calculate the orientation of the camera and the position of the target object (namely, the current camera focal length), and simultaneously adjusts the actual orientation and actual focal length of the fixed camera at the corresponding coordinates of the linked GIS (geographic information system), thereby solving the problem of lacking positioning information after target recognition by a single camera.
Example 1
FIG. 1 is a flow chart of steps of an example of a supervised object localization method of the present application.
The following describes the present application in detail with reference to fig. 1, 2, 3, 4, 5, 6 and 7.
Referring to fig. 1, 2 and 3, in step S101, images around the fixed camera are collected, the collected sample images are labeled, and meanwhile, position information of a corresponding labeling target object is found on the GIS, so as to establish a training data set.
Specifically, panoramic images formed by continuous shooting of fixed cameras in different rotation ranges under different focal lengths are collected, and the fixed cameras are used for representing cameras installed at fixed positions.
In the image-processing part shown in fig. 2, panoramic images at different positions in the panoramic video stream (for example, panoramic images over a sector scanning range obtained from video; in particular, panoramic video streams recorded at different focal lengths) are sampled to obtain sample images, the target objects in the sample images are calibrated, and the longitude and latitude of each calibrated target object are obtained by combining the geographic information of the GIS. This yields annotation data labeled with target object category and longitude-latitude coordinates, from which the training data set is established ("generating the training data set with position information" in fig. 2). A target object is a fixed object fixed at or mounted at a designated position, for example a certain landmark building.
Specifically, the sample image is an image as shown in fig. 3, in which a first landmark building and a second landmark building are framed with gray rectangular boxes in the right-hand portion of fig. 3. The locations of the first and second landmark buildings on the GIS geographic information map are shown in the left-hand portion of fig. 3.
In a specific embodiment, for example, an annotator labels an image acquired by a camera at a fixed position (also referred to as a fixed camera), and at the same time finds the corresponding labeled target (i.e., the calibrated target object) on the GIS and obtains the longitude-latitude coordinates of the target object, i.e., the target object coordinates B(B_j, B_w).
For example, the acquired panoramic video is split into an image set (comprising a plurality of sample images) sized to the camera's field of view; a labeling tool is used to calibrate the fixed objects within the field of view in sample images acquired by the fixed camera at different focal lengths; the longitude and latitude of each calibrated target object are then obtained by combining GIS geographic information, yielding annotation data labeled with the target object's category (for multi-level classification: first-level, second-level, and third-level labels) and longitude-latitude coordinates. Sample images with position information are thus generated, and the training data set is established.
It should be noted that the foregoing is merely illustrative of the present application and is not to be construed as limiting thereof.
Next, in step S102, the training data set is used to train the target recognition positioning model of the fixed camera, so as to obtain a trained target recognition positioning model.
A target recognition positioning model is established using the yolov5 algorithm and trained with the established training data set, so that the model learns the characteristic information of the target scene around the fixed camera, such as buildings, fixed facilities, road traffic, and vegetation layout, and the network parameters inside the target recognition positioning model are fine-tuned.
As shown in fig. 2, the model-training part configures the yolov5 target positioning and recognition model parameters: yolov5m.pt is set as the initial pre-trained model, the annotated data are used as training data, n epochs are trained (corresponding to loading the target recognition positioning pre-trained model for training in fig. 2), and the final training result is saved as the target recognition positioning prediction model (corresponding to outputting and saving the model training result in fig. 2).

A trained target recognition positioning model is obtained through model training and model parameter adjustment.
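For illustration, this training step can be sketched as follows. This is a minimal sketch assuming the standard ultralytics/yolov5 repository; the data set file name dataset.yaml, the image size, the batch size, and the epoch count are illustrative assumptions, not values fixed by the application:

# Minimal sketch: fine-tune yolov5m.pt on the annotated fixed-camera data set,
# assuming the ultralytics/yolov5 repository. dataset.yaml (hypothetical name)
# lists the train/val image folders and the target-object class names.
import subprocess

subprocess.run([
    "python", "train.py",
    "--img", "640",             # input image size
    "--batch", "16",            # batch size (assumption)
    "--epochs", "100",          # the "n epochs" of the text; the value is an assumption
    "--data", "dataset.yaml",   # training data set with position-labeled target objects
    "--weights", "yolov5m.pt",  # initial pre-trained model, as configured above
], check=True)

# The last checkpoint written by yolov5 (runs/train/exp*/weights/last.pt) is
# kept as the target recognition positioning prediction model.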
In other embodiments, a series model such as R-CNN, SSD, YOLO may be used to create the target recognition model. The foregoing is illustrative only and is not to be construed as limiting the application.
Next, in step S103, a trained object recognition positioning model is used to perform object recognition on the current video stream image, and obtain the position information of the target object.
Specifically, a current video stream image (i.e., an image to be processed, such as the image shown in fig. 3) is acquired, and a trained target recognition model is utilized to perform target recognition on the current video stream image, so that the recognized target object is converted into position coordinates.
One or more target objects are identified from the current video stream image, the position coordinates of the identified one or more target objects are obtained, and the position coordinates of the target objects are characterized using image longitude and image latitude.
When a plurality of target objects are identified, taking the average value of the image longitudes and the image latitudes of the plurality of target objects as the position coordinates of the target objects in the current video stream image.
Specifically, if a plurality of target objects are identified, the average of their coordinates is taken as the position coordinates B(B_j, B_w), where

B_j = (B_1j + B_2j + … + B_nj) / n;

B_w = (B_1w + B_2w + … + B_nw) / n;

wherein B_j is the image longitude of the target object (i.e., fixed object) in the current video stream image; B_w is the image latitude of the target object in the current video stream image; B_ij is the longitude of the i-th fixed object in the current video stream image, specifically the longitude of the i-th calibrated target object; B_iw is the latitude of the i-th fixed object in the current video stream image, specifically the latitude of the i-th calibrated target object; i and n are positive integers, i = 1, 2, 3, …, n.
As shown in fig. 4, when n = 2, the target objects include B_1(J_1, W_1) and B_2(J_2, W_2); B_j is the average of J_1 and J_2, and B_w is the average of W_1 and W_2.
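A minimal sketch of this averaging step (the function name and the sample coordinates are assumptions):

def average_position(targets):
    # targets: list of (B_ij, B_iw) longitude/latitude pairs for the n
    # recognized target objects in the current video stream image
    n = len(targets)
    b_j = sum(lon for lon, _ in targets) / n  # image longitude B_j
    b_w = sum(lat for _, lat in targets) / n  # image latitude B_w
    return b_j, b_w

# n = 2, as in fig. 4: B1(J1, W1) and B2(J2, W2)
print(average_position([(116.40, 39.90), (116.42, 39.92)]))  # ≈ (116.41, 39.91)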
In another embodiment, when no target object is identified, or a target object is misidentified and the calculated azimuth angle clearly exceeds the maximum rotation range of the fixed camera, the fixed camera on the GIS is adjusted to a designated direction, i.e., the default direction set when the camera was installed (for example, the direction the camera directly faces).
In the example of fig. 2, the camera loads the trained image recognition model, performs target recognition on the current video stream image, and reads the longitude-latitude coordinates of the recognized object, which are used for positioning and for adjusting the direction of the camera.
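As a sketch of this recognition step, assuming the ultralytics/yolov5 hub API, a hypothetical checkpoint file last.pt, and a hypothetical lookup table from class names to calibrated GIS coordinates:

import cv2
import torch

# Hypothetical mapping from calibrated target-object class name to (lon, lat)
CALIBRATED_COORDS = {"landmark_building_1": (116.40, 39.90)}

model = torch.hub.load("ultralytics/yolov5", "custom", path="last.pt")
frame = cv2.imread("current_frame.jpg")            # one frame of the video stream
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)     # the hub model expects RGB
results = model(frame)

for *box, conf, cls in results.xyxy[0].tolist():
    name = model.names[int(cls)]
    if name in CALIBRATED_COORDS:
        # longitude/latitude of the recognized target, used for positioning
        print(name, conf, CALIBRATED_COORDS[name])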
The foregoing is illustrative only, and is not to be construed as limiting the present application.
Next, in step S104, the azimuth and distance of the target object with respect to the fixed camera are calculated to adjust the direction of the fixed camera.
For orientation positioning of the camera, the known longitude-latitude coordinate values are converted into an azimuth angle, and the orientation of the camera is adjusted in linkage with the GIS.
Assume north latitude is positive and south latitude negative, and east longitude is positive and west longitude negative. The coordinate position of the camera (i.e., the fixed camera) is A(A_j, A_w). If there is a target object in the current video stream image, the target object is identified and its longitude-latitude coordinates B(B_j, B_w) are derived; if there are a plurality of target objects, the image coordinates are the average value B(B_j, B_w) computed as above. See in particular fig. 5.
Here the target object refers to an object in the current video stream image, such as a building or a tree.
Specifically, the following expressions are used to calculate the azimuth angle θ of target object point B in the current video stream image relative to fixed camera point A, so as to adjust the direction of the fixed camera:

θ = atan2(y, x);

x = cos(A_w)*sin(B_w) - sin(A_w)*cos(B_w)*cos(B_j - A_j);

y = sin(B_j - A_j)*cos(B_w);

wherein θ represents the azimuth angle of target object point B in the current video stream image relative to fixed camera point A; A(A_j, A_w) represents the coordinate position of the fixed camera, A_j being the longitude and A_w the latitude of camera point A; B(B_j, B_w) represents the coordinate position of target object point B in the current video stream image, B_j being the longitude and B_w the latitude of target object point B.
Next, the distance S of target object point B relative to fixed camera point A is calculated using the following expression:

S = R*arccos(sin(A_w)*sin(B_w) + cos(A_w)*cos(B_w)*cos(B_j - A_j));

wherein S represents the distance between target object point B and fixed camera point A; R is the average radius of the earth; A_j is the longitude and A_w the latitude of fixed camera point A; B_j is the longitude and B_w the latitude of target object point B.
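A minimal sketch of these two formulas, assuming coordinates are given in degrees and using a mean Earth radius of 6371 km; the function name and the clamping guard are illustrative additions:

import math

R_EARTH = 6371.0  # average Earth radius, km

def azimuth_and_distance(a_j, a_w, b_j, b_w):
    # A(a_j, a_w): fixed camera longitude/latitude; B(b_j, b_w): target object.
    a_j, a_w, b_j, b_w = map(math.radians, (a_j, a_w, b_j, b_w))
    # Azimuth theta of point B relative to point A (0 = north, clockwise)
    x = math.cos(a_w) * math.sin(b_w) - math.sin(a_w) * math.cos(b_w) * math.cos(b_j - a_j)
    y = math.sin(b_j - a_j) * math.cos(b_w)
    theta = math.degrees(math.atan2(y, x)) % 360.0
    # Great-circle distance S by the spherical law of cosines; the argument of
    # arccos is clamped to [-1, 1] to guard against floating-point round-off.
    c = math.sin(a_w) * math.sin(b_w) + math.cos(a_w) * math.cos(b_w) * math.cos(b_j - a_j)
    s = R_EARTH * math.acos(max(-1.0, min(1.0, c)))
    return theta, s  # degrees, km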
According to the calculated azimuth angle and distance, the orientation of the fixed camera is adjusted in real time, and the orientation of the fixed camera on the linked GIS is adjusted at the same time.
When the calculated azimuth angle is larger than the rotatable angle of the camera (i.e., beyond the range the camera can cover), the fixed camera is adjusted to the maximum azimuth angle of the rotatable range, and the GIS direction is synchronously adjusted so that the camera direction on the GIS always remains consistent with the actual camera direction; each camera has a default focal length and a default direction set at installation.
Optionally, the facing position of the camera is defined as a default direction, the azimuth angle of the camera in the default direction is recorded as θ, and the angle of the fixed camera direction on the GIS is adjusted as θ.
In another embodiment, when the calculated distance or azimuth is beyond the collectable range of the camera, the camera is adjusted to a default focal length and direction.
In yet another embodiment, when the target object is not identified or is erroneously identified and the calculated azimuth is outside the maximum rotation range of the fixed camera, the fixed camera on the GIS is adjusted to a fixed orientation (e.g., default orientation, specific orientation relative to another object is specified, etc.).
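These adjustment rules can be sketched as follows, assuming the desired azimuth is expressed relative to the camera's default direction; the function and the commented-out control calls are hypothetical placeholders, not a real camera or GIS API:

import math

def choose_azimuth(theta_rel, max_angle, default_theta, target_found):
    # theta_rel: desired azimuth relative to the default direction, degrees;
    # max_angle: rotatable range on either side of the default direction.
    if not target_found:
        return default_theta  # fall back to the fixed/default direction
    if abs(theta_rel) > max_angle:
        theta_rel = math.copysign(max_angle, theta_rel)  # clamp to rotatable range
    return default_theta + theta_rel

# The returned angle would be applied to both the physical camera and its GIS
# icon, e.g. set_camera_azimuth(angle); set_gis_azimuth(angle) (hypothetical),
# keeping the GIS direction consistent with the actual camera direction.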
In still another embodiment, the physical size of the target object is known: D is its width in the horizontal direction (the left-right direction in fig. 6) and H is its height in the vertical direction (the up-down direction in fig. 6). On the current video stream image, the identified target object has horizontal width d and vertical height h, see fig. 6. Given the distance S from the identified target object to the camera, the focal length f of the camera is calculated using the following expression:
f = h*S/H, or f = d*S/D;
wherein f represents the focal length of the camera; H is the height of the target object in the vertical direction; h is the vertical height of the target object on the current video stream image; S is the distance from the identified target object to the camera; D is the width of the target object in the horizontal direction (i.e., its horizontal width); and d is the horizontal width of the target object on the current video stream image.
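A short sketch of this similar-triangles (pinhole) back-calculation; the units are assumptions (physical size in meters, image size in pixels, giving f in pixel units):

def back_calculate_focal_length(h_img, s_dist, h_real):
    # f = h * S / H: image height times distance, divided by real height
    # (equivalently f = d * S / D using horizontal widths)
    return h_img * s_dist / h_real

# Example: a landmark 30 m tall, 1200 m away, spanning 450 px vertically
print(back_calculate_focal_length(450.0, 1200.0, 30.0))  # 18000.0 px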
By calculating the azimuth angle and distance of the target object relative to the fixed camera, the orientation of the camera and the position of the target object (namely, the current camera focal length) are back-calculated; at the same time, the actual orientation and actual focal length of the fixed camera are adjusted at the corresponding coordinates of the linked GIS, so that the target object can be positioned accurately and the orientation of the fixed camera relative to the target object to be shot can be adjusted accurately.
It should be noted that the foregoing is merely illustrative of the present application and is not to be construed as limiting thereof.
Furthermore, the drawings are only schematic illustrations of processes involved in a method according to an exemplary embodiment of the present application, and are not intended to be limiting. It will be readily understood that the processes shown in the figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Compared with the prior art, the method and the device have the advantages that the fixed camera is combined with the GIS, and the self-established target recognition positioning model is utilized to carry out target recognition on the current video stream image, so that the position information of the target object is obtained;
the method has the advantages that the fixed cameras are used for calculating the azimuth angle and the distance of a target object relative to the fixed cameras at video image contents with different directions, the direction of the camera is calculated reversely, the position of the camera is calculated reversely, namely, the focal length of the current camera is calculated reversely, meanwhile, the actual direction and the actual focal length of the fixed cameras on the corresponding coordinates of the linkage GIS are adjusted, the target object can be positioned accurately, meanwhile, the direction of the fixed cameras relative to the target object to be shot can be adjusted accurately, and the problem that positioning information is lacking after single camera target identification is solved.
In addition, the application can be compatible with various cameras, does not need to change an identification positioning model in the camera, does not need to additionally install positioning and ranging equipment, and is also applicable to some old cameras.
Example 2
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 7 is a schematic structural view of an example of a supervision target positioning apparatus according to the application.
Referring to fig. 7, a second aspect of the present disclosure provides a supervision target positioning apparatus 600 employing the supervision target positioning method according to the first aspect of the present disclosure. The supervising target positioning device 600 includes an acquisition processing module 610, a model training module 620, a target identification module 630, and a calculation adjustment module 640.
In a specific embodiment, the acquisition processing module 610 acquires the image around the fixed camera, marks the acquired sample image, and finds the position information of the corresponding marked target object on the GIS to create the training data set. The model training module 620 uses the training data set to train the target recognition positioning model of the fixed camera, and obtains a trained target recognition positioning model. The target recognition module 630 performs target recognition on the current video stream image by using the trained target recognition positioning model, and obtains the position information of the target object. The calculation adjustment module 640 calculates the azimuth and distance of the target object with respect to the fixed camera to adjust the direction of the fixed camera.
In a specific embodiment, panoramic images formed by continuous shooting of the fixed camera over different rotation ranges and at different focal lengths are collected, where "fixed camera" denotes a camera installed at a fixed position;
sampling panoramic images at different positions in a panoramic video stream to obtain sample images, calibrating target objects in the sample images, and acquiring longitude and latitude of the calibrated target objects by combining GIS geographic information to establish a training data set, wherein the target objects comprise fixed objects fixed at a designated position or mounted at the designated position.
In an alternative embodiment, a yolov5 algorithm is adopted to establish a target recognition positioning model, the established training data set is used for training the target recognition positioning model, and meanwhile, the target recognition positioning model is enabled to learn information of target scenes around the fixed camera so as to finely adjust parameters of the target recognition positioning model.
Specifically, one or more target objects are identified from the current video stream image, the position coordinates of the identified one or more target objects are obtained, and the position coordinates of the target objects are characterized using image longitude and image latitude.
When a plurality of target objects are identified, taking the average value of the image longitudes and the image latitudes of the plurality of target objects as the position coordinates of the target objects in the current video stream image.
And when the target object is not identified or the target object is identified by mistake and the calculated azimuth angle exceeds the maximum rotation range of the fixed camera, adjusting the fixed camera on the GIS to be in a fixed direction.
Specifically, the azimuth angle θ of target object point B relative to fixed camera point A in the current video stream image is calculated using the following expressions:

θ = atan2(y, x);

x = cos(A_w)*sin(B_w) - sin(A_w)*cos(B_w)*cos(B_j - A_j);

y = sin(B_j - A_j)*cos(B_w);

wherein θ represents the azimuth angle of target object point B in the current video stream image relative to fixed camera point A; A(A_j, A_w) represents the coordinate position of the fixed camera, A_j being the longitude and A_w the latitude of camera point A; B(B_j, B_w) represents the coordinate position of target object point B in the current video stream image, B_j being the longitude and B_w the latitude of target object point B.
Next, the distance S of target object point B relative to fixed camera point A is calculated using the following expression:

S = R*arccos(sin(A_w)*sin(B_w) + cos(A_w)*cos(B_w)*cos(B_j - A_j));

wherein S represents the distance between target object point B and fixed camera point A; R is the average radius of the earth; A_j is the longitude and A_w the latitude of fixed camera point A; B_j is the longitude and B_w the latitude of target object point B.
According to the calculated azimuth angle and distance, the orientation of the fixed camera is adjusted in real time, and the orientation of the fixed camera on the linked GIS is adjusted at the same time. When the calculated distance is larger than the specified distance and the azimuth angle is larger than the set angle, the orientation of the fixed camera is adjusted.
The exemplary embodiments of the present application have been particularly shown and described above. It is to be understood that the application is not limited to the precise constructions, arrangements, and instrumentalities described herein; on the contrary, the application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method of supervising target positioning, comprising:
collecting surrounding images of a fixed camera, marking the collected sample images, and finding out position information of a corresponding marked target object on a GIS (geographic information system) at the same time to establish a training data set;
training the target recognition positioning model of the fixed camera by using the training data set to obtain a trained target recognition positioning model;
performing target recognition on the current video stream image by adopting a trained target recognition positioning model, and obtaining the position information of a target object;
and calculating the azimuth angle and the distance of the target object relative to the fixed camera so as to adjust the direction of the fixed camera.
2. The method for supervising target positioning according to claim 1, wherein,
the azimuth angle θ of target object point B relative to fixed camera point A in the current video stream image is calculated using the following expressions:

θ = atan2(y, x);

x = cos(A_w)*sin(B_w) - sin(A_w)*cos(B_w)*cos(B_j - A_j);

y = sin(B_j - A_j)*cos(B_w);

wherein θ represents the azimuth angle of target object point B in the current video stream image relative to fixed camera point A; A(A_j, A_w) represents the coordinate position of the fixed camera, A_j being the longitude and A_w the latitude of camera point A; B(B_j, B_w) represents the coordinate position of target object point B in the current video stream image, B_j being the longitude and B_w the latitude of target object point B.
3. The method for supervising target positioning according to claim 2, wherein,
the distance S of target object point B relative to fixed camera point A is calculated using the following expression:

S = R*arccos(sin(A_w)*sin(B_w) + cos(A_w)*cos(B_w)*cos(B_j - A_j));

wherein S represents the distance between target object point B and fixed camera point A; R is the average radius of the earth; A_j is the longitude and A_w the latitude of fixed camera point A; B_j is the longitude and B_w the latitude of target object point B.
4. A surveillance target positioning method according to claim 2 or 3, wherein calculating the azimuth and distance of the target object with respect to the fixed camera to adjust the direction of the fixed camera comprises:
according to the camera and a target object in the current video stream image, calculating an azimuth angle of the fixed camera, and adjusting the direction of the fixed camera;
when the calculated azimuth angle is larger than the rotatable angle of the camera, the fixed camera is adjusted to the maximum azimuth angle of the rotatable range and the GIS direction is synchronously adjusted so that the camera direction on the GIS always remains consistent with the actual camera direction, wherein each camera is given a default focal length and a default direction at installation, the position the camera directly faces is the default direction, the azimuth angle of the camera in the default direction is recorded as θ, and the angle of the fixed camera direction on the GIS is adjusted to θ.
5. The method for supervising target positioning according to claim 1, wherein,
one or more target objects are identified from the current video stream image, the position coordinates of the identified target objects are obtained, and the position coordinates in the current video stream image are represented by the longitude and latitude of the target object.
6. The method for supervising target positioning according to claim 5, wherein,
when a plurality of target objects are identified, taking the average value of the image longitudes and the image latitudes of the plurality of target objects as the position coordinates of the target objects in the current video stream image.
7. The method for supervising target positioning according to claim 1, wherein,
collecting panoramic images formed by continuous shooting of a fixed camera in different rotation ranges under different focal lengths, wherein the fixed camera is used for representing a camera arranged at a fixed position;
sampling panoramic images at different positions in a panoramic video stream to obtain sample images, calibrating target objects in the sample images, and acquiring longitude and latitude of the calibrated target objects by combining GIS geographic information to establish a training data set, wherein the target objects comprise fixed objects fixed at a designated position or mounted at the designated position.
8. A supervision target positioning method according to claim 2 or 3,
and when the target object is not identified or the target object is identified by mistake and the calculated azimuth angle exceeds the maximum rotation range of the fixed camera, adjusting the fixed camera on the GIS to be in a fixed direction.
9. The method for supervising target positioning according to claim 7, wherein,
and (3) establishing a target identification positioning model by using a yolov5 algorithm, and training the target identification positioning model by using the established training data set, so that the target identification positioning model learns characteristic information of target scenes around a fixed camera, such as buildings, fixed facilities, road traffic, vegetation layout and the like, so as to finely adjust network parameters in the target identification positioning model.
10. A supervision target positioning apparatus employing the supervision target positioning method according to any one of claims 1 to 9, characterized in that the supervision target positioning apparatus comprises:
the acquisition processing module acquires images around the fixed camera, marks the acquired sample images, and finds out the position information of the corresponding marked target object on the GIS at the same time so as to establish a training data set;
the model training module is used for training the target recognition positioning model of the fixed camera by using the training data set to obtain a trained target recognition positioning model;
the target recognition module is used for carrying out target recognition on the current video stream image by adopting a trained target recognition positioning model and obtaining the position information of a target object;
and the calculation and adjustment module calculates the azimuth angle and the distance of the target object relative to the fixed camera so as to adjust the direction of the fixed camera.
CN202311507462.8A, filed 2023-11-14 - Supervision target positioning method and device - Active - granted as CN117237615B

Priority Applications (1)

CN202311507462.8A (CN117237615B) - priority date 2023-11-14 - filing date 2023-11-14 - Supervision target positioning method and device


Publications (2)

Publication Number - Publication Date
CN117237615A - 2023-12-15
CN117237615B - 2024-02-06

Family

Family ID: 89096978

Family application: CN202311507462.8A (granted as CN117237615B) - priority/filing date 2023-11-14 - Active - Supervision target positioning method and device

Country status: CN - CN117237615B

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106959460A (en) * 2017-05-16 2017-07-18 广州市度量行电子设备有限公司 The GIS collectors and its target point calculating method of a kind of high accuracy positioning
CN108168554A (en) * 2017-11-20 2018-06-15 国网山东省电力公司莱芜供电公司 A kind of quick planing method of unmanned plane power-line patrolling system map
CN111127518A (en) * 2019-12-24 2020-05-08 深圳火星探索科技有限公司 Target tracking method and device based on unmanned aerial vehicle
CN111899298A (en) * 2020-05-08 2020-11-06 中国矿业大学(北京) Position sensing system based on live-action image machine learning
CN111966777A (en) * 2020-10-23 2020-11-20 成都中轨轨道设备有限公司 Railway GIS map data processing method
US20220415027A1 (en) * 2021-06-29 2022-12-29 Shandong Jianzhu University Method for re-recognizing object image based on multi-feature information capture and correlation analysis
CN115565153A (en) * 2022-09-21 2023-01-03 河南科技大学 Improved yolov7 unmanned tractor field obstacle recognition method
CN115713692A (en) * 2022-11-24 2023-02-24 吉林省中农阳光数据有限公司 Method for joint positioning based on combination of camera and map


Also Published As

Publication Number - Publication Date
CN117237615B (en) - 2024-02-06

Similar Documents

Publication Publication Date Title
Zhao et al. Detection, tracking, and geolocation of moving vehicle from uav using monocular camera
US20220077820A1 (en) Method and system for soar photovoltaic power station monitoring
US8698875B2 (en) Estimation of panoramic camera orientation relative to a vehicle coordinate frame
Ragot et al. Benchmark of visual slam algorithms: Orb-slam2 vs rtab-map
US11430199B2 (en) Feature recognition assisted super-resolution method
US8311285B2 (en) Method and system for localizing in urban environments from omni-direction skyline images
CN106403942B (en) Personnel indoor inertial positioning method based on substation field depth image identification
CN111829532B (en) Aircraft repositioning system and method
CN108711172B (en) Unmanned aerial vehicle identification and positioning method based on fine-grained classification
KR102217549B1 (en) Method and system for soar photovoltaic power station monitoring
CN109815831B (en) Vehicle orientation obtaining method and related device
EP4060980A1 (en) Method and device for generating vehicle panoramic surround view image
CN114004977A (en) Aerial photography data target positioning method and system based on deep learning
CN114413909A (en) Indoor mobile robot positioning method and system
CN109345567B (en) Object motion track identification method, device, equipment and storage medium
US20220164595A1 (en) Method, electronic device and storage medium for vehicle localization
CN117237615B (en) Supervision target positioning method and device
JP6916975B2 (en) Sign positioning system and program
CN111950524A (en) Orchard local sparse mapping method and system based on binocular vision and RTK
CN116540760A (en) Method for surrounding and cruising hidden danger by linkage of monitoring equipment and unmanned aerial vehicle
CN105187792A (en) Intelligent monitoring system method based on Android
Sujiwo et al. Robust and accurate monocular vision-based localization in outdoor environments of real-world robot challenge
Nowak et al. Vision-based positioning of electric buses for assisted docking to charging stations
CN114187344A (en) Map construction method, device and equipment
CN117611762B (en) Multi-level map construction method, system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant