CN115330838A - Cross-scale target tracking method and device - Google Patents

Cross-scale target tracking method and device

Info

Publication number
CN115330838A
Authority
CN
China
Prior art keywords
perception
target
images
global
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210999023.2A
Other languages
Chinese (zh)
Inventor
高坤
邵航
夏晗
刘威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze Delta Region Institute of Tsinghua University Zhejiang
Original Assignee
Zhejiang Future Technology Institute (jiaxing)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Future Technology Institute (jiaxing) filed Critical Zhejiang Future Technology Institute (jiaxing)
Priority to CN202210999023.2A priority Critical patent/CN115330838A/en
Publication of CN115330838A publication Critical patent/CN115330838A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a cross-scale target tracking method and device, wherein the method comprises the following steps: building a training data acquisition device and training to obtain a cross-scale target tracking model; constructing a cross-scale panoramic sensing system, collecting images, and calculating the transformation matrices of adjacent global perception images; stitching and fusing a plurality of global perception images using the transformation matrices of adjacent images to obtain a 360-degree panoramic stitched image I_G; calculating the coordinate positioning result of the target in the local perception images; obtaining the tracking track of the target in the corresponding global perception image from the tracking track of the target in the local perception image; and obtaining, through the stitching, the specific trajectory of the target in the 360-degree panoramic stitched image I_G. The panoramic sensing system used by the invention solves the problems of low target perception resolution and poor edge imaging quality and provides data support for training a high-precision target tracking method, so that the target tracking method has high precision and good applicability.

Description

Cross-scale target tracking method and device
Technical Field
The application relates to the technical field of deep learning, in particular to a cross-scale target tracking method and device.
Background
Existing large-scene panoramic imaging approaches generally either stitch the images of multiple cameras into a panorama or use multiple fisheye lenses for panoramic imaging. Both approaches suffer from low perception resolution for small-scale targets and poor edge imaging quality, so that cross-scale target tracking under motion blur, similar-background interference, occlusion, or illumination change has low precision and poor applicability; in addition, these deployments suffer from high equipment prices and high deployment costs.
Therefore, the inventor recognizes that, to meet the requirement of large-scene panoramic target tracking, a large-scene panoramic imaging solution that is simple in structure, convenient to deploy, and capable of cross-scale high-precision target tracking is urgently needed.
Disclosure of Invention
Based on the above, a cross-scale target tracking method and device are provided to solve the above technical problems.
In a first aspect, a method for tracking a cross-scale target includes:
step one, building a training data acquisition device, calibrating the position of the perception imaging camera of the built training data acquisition device, acquiring data by using the built training data acquisition device, and then training to obtain a cross-scale target tracking model;
step two, a cross-scale panoramic sensing system is built, and the cross-scale panoramic sensing system is used to collect images to obtain a plurality of global perception images and a plurality of local perception images; the transformation matrices of adjacent global perception images are obtained by calculation; and the plurality of global perception images are stitched and fused using the transformation matrices of adjacent global perception images to obtain a 360-degree panoramic stitched image I_G;
step three, the plurality of local perception images collected in step two are processed respectively using the cross-scale target tracking model trained in step one to obtain the coordinate position positioning result of the target in the local perception images; the tracking track of the target in the local perception image is calculated based on a Kalman filtering algorithm, and the tracking track of the target in the corresponding global perception image is obtained from the tracking track of the target in the local perception image; and the target tracks in the plurality of global perception images are transformed using the transformation matrices of adjacent global perception images to obtain the specific trajectory of the target in the 360-degree panoramic stitched image I_G.
Optionally, the cross-scale panoramic sensing system comprises twelve groups of fixing devices arranged in a regular dodecagon shape, each group of fixing devices is fixed with one camera array fixing support, and each camera array fixing support is fixed with one group of camera arrays; the twelve groups of camera arrays respectively and correspondingly cover twelve peripheral areas in the circumferential direction of the regular dodecagon;
each group of camera arrays comprises three perception imaging cameras; the three perception imaging cameras comprise two local perception imaging cameras and one global perception imaging camera, and the global perception imaging camera is fixed between the two local perception imaging cameras;
the imaging areas of the global perception imaging cameras of the two adjacent groups of camera arrays have overlapping parts; for each group of camera array, the field angle of the global perception imaging camera is larger than the sum of the field angles of the two local perception imaging cameras, the vertical field angle of the global perception imaging camera is larger than 2 times of the vertical field angle of the local perception imaging camera, and the angle of the field-of-view optical axis of the local perception imaging camera can be adjusted at will.
Further optionally, the training data collecting device includes one camera array fixing bracket and a group of camera arrays fixed on the camera array fixing bracket.
Further optionally, in the cross-scale panoramic sensing system, the optical axes of the fields of view of all the perception imaging cameras are coplanar with the regular dodecagon, and the reverse extension lines of the optical axes of the fields of view of all the global perception imaging cameras pass through the center of the regular dodecagon; the horizontal field angles of all the global perception imaging cameras are larger than or equal to 45 degrees, and the imaging resolutions of all the global perception imaging cameras and all the local perception imaging cameras are no less than 9 million pixels (9 MP).
Further optionally, in step one, the calibrating of the position of the perception imaging camera of the built training data acquisition device, the acquiring of data by using the built training data acquisition device, and the subsequent training to obtain the cross-scale target tracking model specifically include:
position calibration is performed on the two local perception imaging cameras and the global perception imaging camera in the training data acquisition device, and the mapping matrices H_L1→G and H_L2→G of the two local perception imaging cameras relative to the global perception imaging camera are obtained using a feature point matching method;
acquiring image data at a specific place by using a built training data acquisition device to obtain a local perception image data set and a global perception image data set, and marking a positioning frame on a target acquired in the local perception image data set and the global perception image data set; dividing a local perception image data set with labels into a training set, a testing set and a verification set according to a preset proportion;
training on the training set by using an existing target tracking algorithm and the loss function corresponding to the existing target tracking algorithm, and iteratively training the model according to a preset training strategy until the loss function converges, to obtain the cross-scale target tracking model.
Further optionally, the feature point matching method is the SURF feature point matching algorithm; the specific sites include pedestrian streets and intersections.
Further optionally, the preset proportion is 8:1:1; the target tracking algorithm is the ByteTrack algorithm.
Further optionally, the second step specifically includes:
constructing a cross-scale panoramic sensing system, synchronously acquiring image data from the 12 groups of camera arrays by using a square-wave synchronous triggering method, and recording the global perception image captured by the i-th camera array at time t as I_Gi^t and the corresponding local perception images as I_Li^t, where i denotes the index of a group of camera arrays, i ∈ {1, 2, …, 12};
for the global perception image sequence {I_Gi^t} at time t, separately extracting and matching features between adjacent images I_Gi^t and I_Gj^t in the sequence using an image-feature-based matching algorithm to obtain the transformation matrix H_ij^t of adjacent global perception images, i ∈ {1, 2, …, 11}, j = i + 1;
stitching and fusing the 12 global perception images using the transformation matrices H_ij^t of adjacent global perception images to obtain a 360-degree panoramic stitched image I_G.
Further optionally, the step three specifically includes:
respectively processing the 24 local perception images captured by the 12 groups of camera arrays at time t by using the cross-scale target tracking model trained in step one, to obtain the coordinate position positioning results of the targets in the local perception images;
predicting, based on a Kalman filtering algorithm, the position at time t of each target track stored at time t-1, performing similarity matrix calculation based on an IoU strategy or appearance feature vectors, and then performing matching using the Hungarian algorithm;
for the matched track, combining the matching result and the target positioning result at the time t, updating the track at the time t by using a Kalman filtering algorithm, and then taking the updated track as the target tracking track at the time t; putting the unmatched tracks into the unmatched track set, and matching at the next moment; adding the unmatched positioning result into the current tracking track set as a new track, and updating the track according to a Kalman filtering algorithm;
according to the mapping matrices H_L1→G and H_L2→G, mapping the tracking track of the target in the local perception image back to the corresponding global perception image to obtain the tracking track of the target in the global perception image;
for the acquired target tracks of the global perception images, transforming the target tracks of the 12 global perception images using the transformation matrices H_ij^t of adjacent global perception images to obtain the specific track of the target in the 360-degree panoramic stitched image I_G, thereby realizing cross-scale target tracking on a hundred-megapixel-level panoramic image.
In a second aspect, a cross-scale target tracking device includes:
the trans-scale target tracking model training module is used for building a training data acquisition device, calibrating the position of a perception imaging camera of the built training data acquisition device, acquiring data by using the built training data acquisition device, and then training to obtain a trans-scale target tracking model;
the panoramic image fusion module is used for building a trans-scale panoramic sensing system, acquiring images by using the trans-scale panoramic sensing system to obtain a plurality of global sensing images and a plurality of local sensing images, and calculating to obtain a transformation matrix of adjacent images of the global sensing images; splicing and fusing a plurality of global perception images by using the transformation matrix of the adjacent images of the global perception images to obtain a 360-degree panoramic spliced image I G
The cross-scale target tracking module is used for processing a plurality of local perception images collected in the panoramic image fusion module by respectively utilizing a cross-scale target tracking model obtained by training of the cross-scale target tracking model training module to obtain a coordinate position positioning result of the target in the local perception images; calculating a tracking track of the target in the local perception image based on a Kalman filtering algorithm, and obtaining the tracking track of the target in the corresponding global perception image according to the tracking track of the target in the local perception image; transforming target tracks in a plurality of global perception images by using transformation matrixes of adjacent images of the global perception images to obtain a 360-degree panoramic mosaic image I G The specific trajectory of the target.
The invention has at least the following beneficial effects:
the embodiment of the invention provides a large-scene-oriented trans-scale target tracking method, which comprises the steps of utilizing a multilateral heterogeneous panoramic sensing system to carry out multi-scale high-resolution panoramic imaging on a target, then carrying out large-scale tracking on the target through a local sensing image, and finally mapping the target to a global sensing image to realize trans-scale high-precision target track tracking; the panoramic sensing system used by the invention has simple structure and convenient deployment, solves the problems of low target sensing resolution and poor edge imaging quality, meets the data acquisition and training conditions of large-scene panoramic high-precision target tracking, provides data support for the training of a high-precision target tracking method, and then realizes ultrahigh-resolution panoramic target tracking, so that the target tracking method provided by the embodiment of the invention has high precision and good applicability.
Drawings
Fig. 1 is a schematic flowchart of a cross-scale target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a cross-scale panoramic sensing system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training data acquisition device according to an embodiment of the present invention;
fig. 4 is a block diagram of a module architecture of a cross-scale target tracking apparatus according to an embodiment of the present invention.
Description of reference numerals:
1. a fixing device;
2. a camera array fixing bracket;
3. an array of cameras; 31. a first partially-aware imaging camera; 32. a second local perception imaging camera; 33. a global perception imaging camera.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a cross-scale target tracking method is provided. Its main process includes large-scene cross-scale target tracking model training, ultra-high-resolution panoramic image fusion calculation, and ultra-high-resolution panoramic cross-scale target tracking result fusion, and the method adopts a tracking-by-detection strategy to realize target tracking, including the following steps:
S1, building a training data acquisition device, calibrating the position of the perception imaging camera of the built training data acquisition device, acquiring data by using the built training data acquisition device, and then training to obtain a cross-scale target tracking model;
S2, building a cross-scale panoramic sensing system, and collecting images by using the cross-scale panoramic sensing system to obtain a plurality of global perception images and a plurality of local perception images, and calculating the transformation matrices of adjacent global perception images; stitching and fusing the plurality of global perception images using the transformation matrices of adjacent global perception images to obtain a 360-degree panoramic stitched image I_G;

S3, processing the plurality of local perception images acquired in S2 respectively by using the cross-scale target tracking model trained in S1 to obtain the coordinate position positioning result of the target in the local perception images; calculating the tracking track of the target in the local perception image based on a Kalman filtering algorithm, and obtaining the tracking track of the target in the corresponding global perception image from the tracking track of the target in the local perception image; transforming the target tracks in the plurality of global perception images using the transformation matrices of adjacent global perception images to obtain the specific trajectory of the target in the 360-degree panoramic stitched image I_G.
As shown in fig. 2, the cross-scale panoramic sensing system adopts a polygonal heterogeneous design to realize 360-degree panoramic perception imaging; it includes twelve groups of fixing devices 1 arranged in a regular dodecagon, each group of fixing devices 1 carrying one camera array fixing support 2, and each camera array fixing support 2 carrying one group of camera arrays 3. That is, the 12 groups of camera arrays 3 are arranged in a polygonal array on the 12 groups of fixing devices 1, and the 12 groups of camera arrays 3 respectively cover the 12 peripheral regions of the regular dodecagon in the circumferential direction.
In fig. 2 (a), reference numeral 3 denotes a group of camera arrays. Specifically, as shown in fig. 2 (b), each group of camera arrays 3 includes three perception imaging cameras: two local perception imaging cameras (a first local perception imaging camera 31 and a second local perception imaging camera 32) and one global perception imaging camera 33, with the global perception imaging camera 33 fixed between the first local perception imaging camera 31 and the second local perception imaging camera 32.
The imaging areas of the global perception imaging cameras 33 of two adjacent groups of camera arrays 3 have overlapping portions; for each group of camera arrays 3, the field angle of the global perception imaging camera 33 is greater than the sum of the field angles of the two local perception imaging cameras, the vertical field angle of the global perception imaging camera 33 is greater than 2 times the vertical field angle of a local perception imaging camera, and the field-of-view optical axes of the two local perception cameras can be adjusted to any angle to achieve heterogeneous deployment.
The cross-scale panoramic sensing system built in this embodiment comprises 12 global perception imaging cameras 33 and 24 local perception imaging cameras; the field-of-view optical axes of all the perception imaging cameras are coplanar with the regular dodecagon, and the reverse extension lines of the field-of-view optical axes of all the global perception imaging cameras 33 pass through the center of the regular dodecagon; the imaging resolution of all perception imaging cameras (including the global perception imaging cameras and the local perception imaging cameras) is no lower than 9 million pixels (9 MP).
The system is characterized in that, to avoid dead angles in panoramic imaging and to accommodate the design of subsequent algorithms, the imaging areas of adjacent groups of camera arrays 3 must have a certain overlap. Taking the above system as an example, since the 12 arrays together cover 360 degrees, the horizontal field angle of a single global perception imaging camera 33 theoretically needs to exceed 360/12 = 30 degrees, and in practice should exceed 45 degrees to guarantee a sufficient imaging overlap area; for a group of camera arrays 3, the vertical field angle of the global perception camera should be greater than 2 times the vertical field angle of the local perception cameras; and the local perception devices of the system can be adjusted to any angle for heterogeneous deployment, as long as the vertical field angle of the global perception imaging camera 33 remains at least 2 times that of the corresponding local perception imaging camera. The field-of-view arithmetic behind these numbers is sketched below.
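In code, the overlap condition reduces to simple angular arithmetic. Below is a minimal sketch of that check in Python; the field-of-view values are illustrative assumptions, not measured parameters of the described system:

```python
# Minimal sketch of the field-of-view arithmetic behind the 12-array layout.
# The HFOV values below are illustrative assumptions, not system parameters.
NUM_ARRAYS = 12
SECTOR_DEG = 360.0 / NUM_ARRAYS  # each array must cover a 30-degree sector

def neighbour_overlap_deg(global_hfov_deg: float) -> float:
    """Angular overlap shared with each adjacent global camera.

    Cameras sit every SECTOR_DEG degrees around the dodecagon, so a camera
    covering global_hfov_deg overlaps each neighbour by
    (global_hfov_deg - SECTOR_DEG) degrees; a negative value is a blind gap.
    """
    return global_hfov_deg - SECTOR_DEG

for hfov in (30.0, 45.0, 60.0):
    print(f"HFOV {hfov:4.0f} deg -> overlap per neighbour: "
          f"{neighbour_overlap_deg(hfov):5.1f} deg")
# 30 deg is the theoretical minimum (zero overlap); the >= 45 deg requirement
# leaves >= 15 deg of shared imagery for feature matching and seam fusion.
```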
In addition, the structure of the training data acquisition device is shown in fig. 3; it includes one camera array fixing support 2 from the cross-scale panoramic sensing system and a group of camera arrays 3 fixed on the camera array fixing support 2, where the camera array 3 specifically includes two local perception imaging cameras (a first local perception imaging camera 31 and a second local perception imaging camera 32) and a global perception imaging camera 33, and its configuration requirements are the same as those of a group of camera arrays 3 in the cross-scale panoramic sensing system.
In fig. 3, a is an assumed scene tracking target, B is an imaging schematic diagram of the global perception imaging camera 33 capturing a perception target, C, D respectively represent imaging schematic diagrams of the first local perception imaging camera 31 and the second local perception imaging camera 32 capturing a perception target, and E is a schematic diagram of a region corresponding to C, D camera local imaging in B. As can be seen from fig. 3, the target scales of the sensing target imaged on the global sensing imaging camera and the local sensing imaging camera have a large difference, and the target has a small size under global sensing, which results in poor target tracking accuracy of the global sensing image.
Further, in S1, the calibrating of the position of the perception imaging camera of the built training data acquisition device, the data acquisition using the built training data acquisition device, and the subsequent training to obtain the cross-scale target tracking model specifically include the following steps:
(1) Position calibration is performed on the two local perception imaging cameras (the first local perception imaging camera 31 and the second local perception imaging camera 32) and the global perception imaging camera 33 in the training data acquisition device shown in fig. 3, and the mapping matrices H_L1→G and H_L2→G of the two local perception imaging cameras relative to the global perception imaging camera 33 are obtained using a feature point matching method;
(2) Diverse large-scene data acquisition is carried out using the built training data acquisition device: image data are collected at specific places to obtain a local perception image data set and a global perception image data set, and positioning frames are labeled on the targets captured in the local perception image data set and the global perception image data set; the labeled local perception image data set is divided into a training set, a test set, and a verification set according to a preset ratio (8:1:1);
(3) The training set is trained using an existing target tracking algorithm and the loss function corresponding to that algorithm, and the model is iteratively trained according to a preset training strategy (learning rate, training batch size, training epochs, and optimizer) until the loss function converges, yielding the large-scene cross-scale target tracking model and thereby completing the training of the global cross-scale target tracking model. A schematic sketch of such a training loop follows.
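For illustration only, the iterate-until-convergence strategy can be sketched as follows; the model, data set, and loss function are placeholders standing in for the chosen tracker (e.g. a ByteTrack-style detector), and every hyperparameter value is an assumption rather than the patent's preset strategy:

```python
# Schematic train-until-convergence loop; model, dataset, and loss_fn are
# placeholders for the detector-tracker named in the text. All hyperparameter
# values are illustrative assumptions.
import torch
from torch.utils.data import DataLoader

def train(model, train_set, loss_fn, lr=1e-3, batch_size=16,
          max_epochs=100, tol=1e-4):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # preset optimizer
    prev_loss = float("inf")
    for epoch in range(max_epochs):                  # preset training epochs
        epoch_loss = 0.0
        for images, targets in loader:               # preset training batches
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= len(loader)
        if abs(prev_loss - epoch_loss) < tol:        # loss has converged
            break
        prev_loss = epoch_loss
    return model
```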
The feature point matching method includes, but is not limited to, the SURF feature point matching algorithm; the specific places include, but are not limited to, wide, target-dense places such as pedestrian streets, squares, and crossroads; and the target tracking algorithm may be, but is not limited to, the ByteTrack algorithm.
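A minimal sketch of the mapping-matrix calibration via SURF feature matching might look as follows. It assumes an OpenCV build that ships the non-free xfeatures2d module (falling back to SIFT otherwise); the function name and thresholds are illustrative, and the returned matrix plays the role of H_L1→G or H_L2→G:

```python
# Sketch: estimate the mapping matrix of a local camera relative to the global
# camera by SURF feature matching. Requires opencv-contrib with non-free
# modules enabled; SIFT is used as a stand-in when SURF is unavailable.
import cv2
import numpy as np

def calibrate_local_to_global(local_img, global_img, ratio=0.75):
    try:
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    except AttributeError:
        detector = cv2.SIFT_create()  # fallback without the non-free build
    kp_l, des_l = detector.detectAndCompute(local_img, None)
    kp_g, des_g = detector.detectAndCompute(global_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des_l, des_g, k=2):
        # Lowe ratio test keeps only distinctive matches
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:
        raise ValueError("not enough matches to estimate a homography")
    src = np.float32([kp_l[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_g[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC homography: maps local-image coordinates into the global image
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # plays the role of H_L1->G (or H_L2->G)
```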
Further, S2 specifically includes:
S21, constructing the cross-scale panoramic sensing system shown in fig. 2, and synchronously acquiring image data from the 12 groups of camera arrays by using a square-wave synchronous triggering method; the global perception image captured by the i-th camera array at a certain time t is denoted as I_Gi^t and the corresponding local perception images as I_Li^t, where i represents the index of a group of camera arrays, i ∈ {1, 2, …, 12};
S22, for the global perception image sequence {I_Gi^t} at time t, separately extracting and matching features between adjacent images I_Gi^t and I_Gj^t in the sequence using an image-feature-based matching algorithm to obtain the transformation matrix H_ij^t of adjacent global perception images, i ∈ {1, 2, …, 11}, j = i + 1, with which neighboring images can be stitched;
S23, stitching and fusing the 12 global perception images using the transformation matrices H_ij^t of adjacent global perception images to obtain a 360-degree panoramic stitched image I_G, thereby completing the ultra-high-resolution panoramic image fusion calculation; a stitching sketch follows.
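A minimal stitching sketch follows, under the assumption that each H_ij^t maps image j's coordinates into image i's frame (the text does not fix this direction; it is a convention choice). The sketch chains the pairwise matrices to a common reference frame and fuses by simple overwrite rather than seam-aware blending:

```python
# Sketch of S23: chain the 11 pairwise transforms (assumed to map image j
# into image i's frame) to one reference frame and warp all 12 global images
# onto a single canvas. The canvas size and offset are illustrative.
import cv2
import numpy as np

def stitch_panorama(images, H_adj, canvas_size=(12000, 3000)):
    """images: list of 12 global images (0-based);
    H_adj[k]: 3x3 transform mapping images[k+1] into images[k]'s frame."""
    w, h = canvas_size
    # translation so warped content stays inside the canvas
    offset = np.array([[1, 0, w // 4], [0, 1, h // 4], [0, 0, 1]], np.float64)
    pano = np.zeros((h, w, 3), np.uint8)
    T = np.eye(3)
    for idx, img in enumerate(images):
        if idx > 0:
            T = T @ H_adj[idx - 1]        # accumulate into the reference frame
        warped = cv2.warpPerspective(img, offset @ T, (w, h))
        mask = warped.sum(axis=2) > 0
        pano[mask] = warped[mask]          # naive fusion: later image wins
    return pano
```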
Further, S3 specifically includes:
S31, respectively processing the 24 local perception images captured by the 12 groups of camera arrays of the cross-scale panoramic sensing system at time t by using the cross-scale target tracking model trained in S1, to obtain the coordinate position positioning results of the targets in the local perception images;
S32, predicting, based on a Kalman filtering algorithm, the position at time t of each target track stored at time t-1, performing similarity matrix calculation based on an IoU strategy or appearance feature vectors, and then performing matching using the Hungarian algorithm; that is, each target track stored at time t-1 is matched, through its predicted position at time t, against the target positions detected at time t (a minimal sketch of the prediction model follows);
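The prediction step can be illustrated with a constant-velocity Kalman filter over the bounding-box state; the state layout and noise values below are assumptions for illustration, not the patent's tuned parameters:

```python
# Sketch of a constant-velocity Kalman model for track prediction:
# state (cx, cy, w, h, vx, vy, vw, vh); noise values are illustrative.
import numpy as np

class BoxKalman:
    def __init__(self, box):  # box = (cx, cy, w, h)
        self.x = np.r_[box, np.zeros(4)].astype(float)   # state vector
        self.P = np.eye(8) * 10.0                        # state covariance
        self.F = np.eye(8); self.F[:4, 4:] = np.eye(4)   # constant velocity
        self.H = np.eye(4, 8)                            # observe box only
        self.Q = np.eye(8) * 1e-2                        # process noise
        self.R = np.eye(4) * 1e-1                        # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]                                # predicted box at t

    def update(self, z):                                 # z: matched detection
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x += K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(8) - K @ self.H) @ self.P
```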
S33, for each matched track, updating the track at time t with a Kalman filtering algorithm by combining the matching result and the target positioning result at time t, and then taking the updated track as the target tracking track at time t; putting unmatched tracks into the unmatched track set for matching at the next moment; and adding each unmatched positioning result to the current tracking track set as a new track, updating it according to the Kalman filtering algorithm (the association and bookkeeping are sketched below);
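The association step (similarity matrix plus Hungarian matching) and the three-way track bookkeeping can be sketched as follows, assuming axis-aligned boxes and the IoU-based similarity variant; the threshold is illustrative:

```python
# Sketch of S32/S33 association: IoU cost matrix plus Hungarian assignment.
# Track bookkeeping is reduced to its essentials.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match(tracks, detections, iou_thresh=0.3):
    """tracks: boxes predicted at time t by the Kalman filter;
    detections: localization results at time t. Returns matched index pairs,
    unmatched track indices, and unmatched detection indices."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)          # Hungarian algorithm
    matches = [(r, c) for r, c in zip(rows, cols)
               if cost[r, c] < 1.0 - iou_thresh]
    um_t = [r for r in range(len(tracks)) if r not in {m[0] for m in matches}]
    um_d = [c for c in range(len(detections)) if c not in {m[1] for m in matches}]
    return matches, um_t, um_d
# Matched pairs update their Kalman state with the detection; unmatched tracks
# are kept for re-matching at t+1; unmatched detections start new tracks.
```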
S34, according to the mapping matrices H_L1→G and H_L2→G, mapping the tracking track of the target in the local perception image back to the corresponding global perception image to obtain the tracking track of the target in the global perception image;
S35, for the acquired target tracks of the global perception images, transforming the target tracks of the 12 global perception images using the obtained transformation matrices H_ij^t of adjacent global perception images to obtain the specific track of the target in the 360-degree panoramic stitched image I_G, thereby realizing cross-scale target tracking on a hundred-megapixel-level panoramic image; the two coordinate hops are sketched below.
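A minimal sketch of the two coordinate hops in S34 and S35, using cv2.perspectiveTransform; the argument names are hypothetical, and the chaining convention matches the stitching sketch above:

```python
# Sketch of S34/S35: carry a track's points from a local perception image to
# the global image (via H_L->G) and on into the panorama frame (via the
# chained adjacent-image transforms used for stitching).
import cv2
import numpy as np

def map_points(points_xy, H):
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(points_xy, np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H.astype(np.float64)).reshape(-1, 2)

def track_to_panorama(track_xy, H_local_to_global, T_global_to_pano):
    """track_xy: (N, 2) track points in a local perception image.
    H_local_to_global: H_L1->G or H_L2->G from calibration.
    T_global_to_pano: accumulated transform of that global image into I_G."""
    in_global = map_points(track_xy, H_local_to_global)   # S34
    return map_points(in_global, T_global_to_pano)        # S35
```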
Traditional target tracking methods perform target positioning and track tracking on a local scene perceived by a monocular camera, and under panoramic imaging conditions they struggle to cope with the large scale changes that a target undergoes during panoramic imaging. The large-scene panoramic target tracking requirement, namely global perception of a large scene combined with high-precision tracking of multi-scale targets, places higher demands on both panoramic imaging hardware and the target tracking method.
Therefore, the embodiment of the invention provides a large-scene-oriented cross-scale target tracking method, which comprises the steps of utilizing a multilateral heterogeneous panoramic sensing system to carry out multi-scale high-resolution panoramic imaging on a target, then carrying out large-scale tracking on the target through a local sensing image, and finally mapping the target to a global sensing image to realize cross-scale high-precision target track tracking; the panoramic sensing system utilized by the invention has a simple structure and is convenient to deploy, the problems of low target sensing resolution and poor edge imaging quality are solved, the data acquisition and training conditions of large-scene panoramic high-precision target tracking are met, data support is provided for the training of a high-precision target tracking method, and then the ultrahigh-resolution panoramic target tracking is realized based on the designed cross-scale target tracking method, so that the target tracking method provided by the embodiment of the invention has high precision and good applicability.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in order as indicated by the arrows, the steps are not necessarily performed strictly in the order indicated. Unless explicitly stated otherwise herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least a portion of the steps in fig. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a cross-scale target tracking apparatus, comprising:
the cross-scale target tracking model training module 401 is used for building a training data acquisition device, calibrating the position of a perception imaging camera of the built training data acquisition device, acquiring data by using the built training data acquisition device, and then training to obtain a cross-scale target tracking model;
the panoramic image fusion module 402 is used for building a trans-scale panoramic sensing system, collecting images by using the trans-scale panoramic sensing system to obtain a plurality of global sensing images and a plurality of local sensing images, and calculating a transformation matrix of adjacent images of the global sensing images; splicing and fusing a plurality of global perception images by utilizing the transformation matrix of the adjacent images of the global perception image to obtain a 360-degree panoramic spliced image I G
The cross-scale target tracking module 403 is configured to process, by using a cross-scale target tracking model obtained through training by the cross-scale target tracking model training module 401, multiple local perception images acquired by the panoramic image fusion module 402, respectively, to obtain a coordinate position positioning result of a target in the local perception images; calculating a tracking track of the target in the local perception image based on a Kalman filtering algorithm, and obtaining the tracking track of the target in the corresponding global perception image according to the tracking track of the target in the local perception image; transforming target tracks in a plurality of global perception images by using transformation matrixes of adjacent images of the global perception images to obtain a 360-degree panoramic mosaic image I G The specific trajectory of the target.
For specific definition of the cross-scale target tracking device, refer to the above definition of a cross-scale target tracking method, which is not described herein again. The modules in the cross-scale target tracking device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which includes a memory and a processor, the memory storing a computer program that, when executed by the processor, implements all or part of the flow of the method of the above embodiment.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored that, when executed, implements all or part of the flow of the method of the above embodiment.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A cross-scale target tracking method is characterized by comprising the following steps:
step one, building a training data acquisition device, calibrating the position of a perception imaging camera of the built training data acquisition device, acquiring data by using the built training data acquisition device, and then training to obtain a cross-scale target tracking model;
step two, a cross-scale panoramic sensing system is built, the cross-scale panoramic sensing system is used for collecting images to obtain a plurality of global perception images and a plurality of local perception images, and the transformation matrices of adjacent global perception images are obtained through calculation; the plurality of global perception images are stitched and fused using the transformation matrices of adjacent global perception images to obtain a 360-degree panoramic stitched image I_G;

step three, the plurality of local perception images collected in step two are processed respectively using the cross-scale target tracking model trained in step one to obtain the coordinate position positioning result of the target in the local perception images; the tracking track of the target in the local perception image is calculated based on a Kalman filtering algorithm, and the tracking track of the target in the corresponding global perception image is obtained from the tracking track of the target in the local perception image; the target tracks in the plurality of global perception images are transformed using the transformation matrices of adjacent global perception images to obtain the specific trajectory of the target in the 360-degree panoramic stitched image I_G.
2. The cross-scale target tracking method according to claim 1, wherein the cross-scale panoramic sensing system comprises twelve groups of fixing devices arranged in a regular dodecagon, each group of fixing devices is fixed with one camera array fixing bracket, and each camera array fixing bracket is fixed with one group of camera arrays; the twelve groups of camera arrays respectively and correspondingly cover twelve peripheral areas in the circumferential direction of the regular dodecagon;
each group of camera arrays comprises three perception imaging cameras; the three perception imaging cameras comprise two local perception imaging cameras and one global perception imaging camera, and the global perception imaging camera is fixed between the two local perception imaging cameras;
the imaging areas of the global perception imaging cameras of the two adjacent groups of camera arrays have overlapping parts; for each group of camera array, the field angle of the global perception imaging camera is larger than the sum of the field angles of the two local perception imaging cameras, the vertical field angle of the global perception imaging camera is larger than 2 times of the vertical field angle of the local perception imaging camera, and the angle of the field-of-view optical axis of the local perception imaging camera can be adjusted at will.
3. The cross-scale target tracking method according to claim 2, wherein the training data acquisition device comprises one of the camera array fixing brackets and a group of the camera arrays fixed on the camera array fixing bracket.
4. The cross-scale target tracking method according to claim 3, wherein in the cross-scale panoramic sensing system, the optical axes of the fields of view of all the perception imaging cameras are coplanar with the regular dodecagon, and the reverse extension lines of the optical axes of the fields of view of all the global perception imaging cameras pass through the center of the regular dodecagon; the horizontal field angles of all the global perception imaging cameras are larger than or equal to 45 degrees, and the imaging resolutions of all the global perception imaging cameras and all the local perception imaging cameras are no less than 9 million pixels (9 MP).
5. The cross-scale target tracking method according to claim 4, wherein in the first step, the position of the perception imaging camera is calibrated for the built training data acquisition device, the built training data acquisition device is used for data acquisition, and then the training is carried out to obtain the cross-scale target tracking model specifically comprises the following steps:
position calibration is carried out on two local perception imaging cameras and a global perception imaging camera in a training data acquisition device, and a mapping matrix of the two local perception imaging cameras relative to the global perception imaging camera is obtained by utilizing a characteristic point matching method
Figure FDA0003806808350000021
And
Figure FDA0003806808350000022
acquiring image data at a specific place by using a built training data acquisition device to obtain a local perception image data set and a global perception image data set, and marking a positioning frame on a target acquired in the local perception image data set and the global perception image data set; dividing a local perception image data set with labels into a training set, a test set and a verification set according to a preset proportion;
training on the training set by using an existing target tracking algorithm and the loss function corresponding to the existing target tracking algorithm, and iteratively training the model according to a preset training strategy until the loss function converges, to obtain the cross-scale target tracking model.
6. The cross-scale target tracking method according to claim 5, wherein the feature point matching method is the SURF feature point matching algorithm; the specific sites include pedestrian streets and intersections.
7. The cross-scale target tracking method according to claim 5, wherein the preset proportion is 8:1:1; and the target tracking algorithm is the ByteTrack algorithm.
8. The cross-scale target tracking method according to claim 5, wherein the second step specifically comprises:
constructing a cross-scale panoramic sensing system, synchronously acquiring image data from the 12 groups of camera arrays by using a square-wave synchronous triggering method, and recording the global perception image captured by the i-th camera array at time t as I_Gi^t and the corresponding local perception images as I_Li^t, where i represents the index of a group of camera arrays, i ∈ {1, 2, …, 12};

for the global perception image sequence {I_Gi^t} at time t, separately extracting and matching features between adjacent images I_Gi^t and I_Gj^t in the sequence using an image-feature-based matching algorithm to obtain the transformation matrix H_ij^t of adjacent global perception images, i ∈ {1, 2, …, 11}, j = i + 1;

stitching and fusing the 12 global perception images using the transformation matrices H_ij^t of adjacent global perception images to obtain a 360-degree panoramic stitched image I_G.
9. The cross-scale target tracking method according to claim 8, wherein the third step specifically comprises:
respectively processing the 24 local perception images captured by the 12 groups of camera arrays at time t by using the cross-scale target tracking model trained in step one, to obtain the coordinate position positioning results of the targets in the local perception images;

predicting, based on a Kalman filtering algorithm, the position at time t of each target track stored at time t-1, performing similarity matrix calculation based on an IoU strategy or appearance feature vectors, and then performing matching using the Hungarian algorithm;

for each matched track, combining the matching result and the target positioning result at time t, updating the track at time t with a Kalman filtering algorithm, and then taking the updated track as the target tracking track at time t; putting unmatched tracks into the unmatched track set for matching at the next moment; and adding each unmatched positioning result to the current tracking track set as a new track, updating it according to the Kalman filtering algorithm;
according to the mapping matrices H_L1→G and H_L2→G, mapping the tracking track of the target in the local perception image back to the corresponding global perception image to obtain the tracking track of the target in the global perception image;

for the acquired target tracks of the global perception images, transforming the target tracks of the 12 global perception images using the transformation matrices H_ij^t of adjacent global perception images to obtain the specific track of the target in the 360-degree panoramic stitched image I_G, realizing cross-scale target tracking on a hundred-megapixel-level panoramic image.
10. A cross-scale target tracking device, comprising:
the trans-scale target tracking model training module is used for building a training data acquisition device, calibrating the position of a perception imaging camera of the built training data acquisition device, acquiring data by using the built training data acquisition device, and then training to obtain a trans-scale target tracking model;
the panoramic image fusion module is used for building a trans-scale panoramic sensing system, acquiring images by using the trans-scale panoramic sensing system to obtain a plurality of global sensing images and a plurality of local sensing images, and calculating to obtain a transformation matrix of adjacent images of the global sensing images; splicing and fusing a plurality of global perception images by using the transformation matrix of the adjacent images of the global perception images to obtain a 360-degree panoramic spliced image I G
The cross-scale target tracking module is used for processing a plurality of local perception images collected in the panoramic image fusion module by respectively utilizing a cross-scale target tracking model obtained by training of the cross-scale target tracking model training module to obtain a coordinate position positioning result of the target in the local perception images; calculating a tracking track of the target in the local perception image based on a Kalman filtering algorithm, and obtaining the tracking track of the target in the corresponding global perception image according to the tracking track of the target in the local perception image; transforming target tracks in a plurality of global perception images by using a transformation matrix of adjacent images of the global perception images to obtain a 360-degree panoramic mosaic image I G The specific trajectory of the target.
CN202210999023.2A 2022-08-19 2022-08-19 Cross-scale target tracking method and device Pending CN115330838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210999023.2A CN115330838A (en) 2022-08-19 2022-08-19 Cross-scale target tracking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210999023.2A CN115330838A (en) 2022-08-19 2022-08-19 Cross-scale target tracking method and device

Publications (1)

Publication Number Publication Date
CN115330838A (en) 2022-11-11

Family

ID=83925796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210999023.2A Pending CN115330838A (en) 2022-08-19 2022-08-19 Cross-scale target tracking method and device

Country Status (1)

Country Link
CN (1) CN115330838A (en)

Similar Documents

Publication Publication Date Title
CN107659774B (en) Video imaging system and video processing method based on multi-scale camera array
CN110211043B (en) Registration method based on grid optimization for panoramic image stitching
CN111750820B (en) Image positioning method and system
CN108921781B (en) Depth-based optical field splicing method
CN106023086B (en) A kind of aerial images and geodata joining method based on ORB characteristic matching
CN109903227B (en) Panoramic image splicing method based on camera geometric position relation
Sinha et al. Pan–tilt–zoom camera calibration and high-resolution mosaic generation
US8755624B2 (en) Image registration device and method thereof
JP5872818B2 (en) Positioning processing device, positioning processing method, and image processing device
CN106529538A (en) Method and device for positioning aircraft
CN112101361A (en) Target detection method, device and equipment for fisheye image and storage medium
CN109118429B (en) Method for rapidly generating intermediate wave infrared-visible light multispectral image
CN106373088A (en) Quick mosaic method for aviation images with high tilt rate and low overlapping rate
CN111242987B (en) Target tracking method and device, electronic equipment and storage medium
CN112348775B (en) Vehicle-mounted looking-around-based pavement pit detection system and method
CN110782498B (en) Rapid universal calibration method for visual sensing network
CN108734657A (en) A kind of image split-joint method with parallax processing capacity
CN113436130B (en) Intelligent sensing system and device for unstructured light field
CN105335977B (en) The localization method of camera system and target object
CN110260857A (en) Calibration method, device and the storage medium of vision map
CN109341720A (en) A kind of remote sensing camera geometric calibration method based on fixed star track
CN111553939A (en) Image registration algorithm of multi-view camera
Aeschliman et al. Tracking vehicles through shadows and occlusions in wide-area aerial video
CN114897676A (en) Unmanned aerial vehicle remote sensing multispectral image splicing method, device and medium
CN115619623A (en) Parallel fisheye camera image splicing method based on moving least square transformation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20240204

Address after: 314001 9F, No.705, Asia Pacific Road, Nanhu District, Jiaxing City, Zhejiang Province

Applicant after: ZHEJIANG YANGTZE DELTA REGION INSTITUTE OF TSINGHUA University

Country or region after: China

Address before: No.152 Huixin Road, Nanhu District, Jiaxing City, Zhejiang Province 314000

Applicant before: ZHEJIANG FUTURE TECHNOLOGY INSTITUTE (JIAXING)

Country or region before: China