CN115760923B - Passive non-visual field target real-time positioning tracking method and system - Google Patents

Passive non-visual field target real-time positioning tracking method and system

Info

Publication number
CN115760923B
CN115760923B (application CN202211570111.7A)
Authority
CN
China
Prior art keywords
frame
vector
real
image feature
position coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211570111.7A
Other languages
Chinese (zh)
Other versions
CN115760923A (en)
Inventor
李学龙
赵斌
王奕豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai AI Innovation Center
Original Assignee
Shanghai AI Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai AI Innovation Center filed Critical Shanghai AI Innovation Center
Priority to CN202211570111.7A
Publication of CN115760923A
Application granted
Publication of CN115760923B
Legal status: Active

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a passive non-visual field target real-time positioning and tracking method and system. The method comprises the following steps: acquiring, in real time with an image pickup unit, a real-time video stream that is reflected by a relay medium and contains the action track of a non-visual field target; initializing a position coding vector by setting it to an all-zero vector; inputting frames of the real-time video stream into a tracking unit frame by frame in real time to perform a tracking operation, obtaining the image feature vector contained in each frame one by one, updating the position coding vector with each image feature vector as it is obtained, and inputting the position coding vector to a decoder after each update; and, each time the decoder receives a position coding vector, decoding it to obtain the real-time coordinate information corresponding to that vector. The invention adopts a purely passive scheme, reduces layout cost, and solves the problems of high cost and difficult deployment and application caused by the harsh experimental conditions of active methods in the non-visual field tracking problem.

Description

Passive non-visual field target real-time positioning tracking method and system
Technical Field
The invention relates to the technical field of electronics, in particular to a passive non-visual field target real-time positioning and tracking method and system.
Background
The field of non-visual field (non-line-of-sight) imaging focuses on imaging, sensing and detecting invisible areas. In a typical setting, the invisible area is an area separated from the detector by a wall: the optical signal in this area cannot propagate directly to the detector, but can reach it by reflection from a relay wall, so the area may also be called an area outside the direct line of sight. In the past, mainstream non-visual field imaging techniques focused on three-dimensional scene reconstruction of invisible areas using actively emitted signals (e.g., ultrafast pulsed laser, acoustic waves, etc.) and time-of-flight information of the returned signals.
For non-visual field tracking tasks, most existing techniques use active schemes, but their deployment and application are limited by high cost and harsh experimental conditions; very few techniques use passive schemes, converting the tracking task into a regression task on positions with the help of deep neural networks, but the results are often unsatisfactory. In addition, most existing methods do not exploit the information generated by the movement of the target or the prior knowledge that the movement is continuous, so the tracking precision is not ideal and the stability is poor.
Disclosure of Invention
The present invention aims to solve one of the above problems.
The invention mainly aims to provide a passive non-visual field target real-time positioning and tracking method.
It is another object of the present invention to provide a passive non-field of view target real-time location tracking system.
In order to achieve the above purpose, the technical scheme of the invention is specifically realized as follows:
The invention provides a passive non-visual field target real-time positioning and tracking method, which comprises the following steps: acquiring, in real time with an image pickup unit, a real-time video stream that is reflected by a relay medium and contains the action track of a non-visual field target; initializing a position coding vector by setting it to an all-zero vector; inputting frames of the real-time video stream into a tracking unit frame by frame in real time to perform a tracking operation, obtaining the image feature vector contained in each frame one by one, updating the position coding vector with each image feature vector as it is obtained, and inputting the position coding vector to a decoder after each update; and, each time the decoder receives a position coding vector, decoding it to obtain the real-time coordinate information corresponding to that position coding vector.
In another aspect, the invention provides a passive non-visual field target real-time positioning and tracking system, comprising: an image pickup unit for acquiring, in real time, a real-time video stream that is reflected by a relay medium and contains the action track of a non-visual field target; an initialization unit for initializing the position coding vector by setting it to an all-zero vector; a tracking unit for receiving frames of the real-time video stream input frame by frame in real time, performing a tracking operation, obtaining the image feature vector contained in each frame one by one, updating the position coding vector with each image feature vector as it is obtained, and inputting the position coding vector to a decoder after each update; and the decoder, for decoding each received position coding vector to obtain the real-time coordinate information corresponding to it.
According to the technical scheme provided by the invention, the invention provides the passive non-visual field target real-time positioning and tracking method and the system, wherein the passive non-visual field target real-time positioning and tracking method only utilizes the camera unit to shoot real-time video in real time, updates the position coding vector contained in each frame of image in the video in real time through tracking operation, and then decodes the position coding vector into the position coordinate corresponding to the frame in real time by using the decoder, so that the aim of real-time tracking is achieved. The passive non-visual field target real-time positioning tracking method reduces layout cost by adopting a pure passive scheme, and solves the problems of difficult deployment and application caused by high cost and severe experimental conditions of an active method in the non-visual field tracking problem. In addition, by introducing a differential frame and a specially designed propagation and calibration network, the problems of non-ideal tracking precision and poor stationarity caused by neglecting motion information and motion continuity prior in the non-visual field real-time tracking problem are solved, and the tracking precision and the track stationarity are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a passive non-view target real-time positioning and tracking method provided in embodiment 1 of the present invention;
Fig. 2 is a schematic view of a scene setting provided in embodiment 1 of the present invention;
FIG. 3 is a flowchart of tracking and decoding according to embodiment 1 of the present invention;
FIG. 4 is a flow chart of the method for performing pre-heating and tracking using the propagation and calibration network according to embodiment 1 of the present invention;
FIG. 5 is a flowchart of the preheating stage, tracking stage and decoding stage according to embodiment 1 of the present invention;
Fig. 6 is a schematic structural diagram of a passive non-view target real-time positioning and tracking system according to embodiment 1 of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or quantity or position.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
Example 1
The embodiment provides a passive non-view target real-time positioning and tracking method, as shown in fig. 1, which comprises the following steps:
Step S101: a real-time video stream that is reflected by a relay medium and contains the action track of a non-visual field target is acquired in real time by an image pickup unit. Specifically, the non-visual field target generally refers to a living body (e.g., a person, an animal, etc.) or a non-living body (e.g., a vehicle, etc.) that can move freely, and is the target whose movement track needs to be tracked in this embodiment. The relay medium may be a planar or non-planar object that can reflect light, such as a relay wall, a metal plate or a plastic plate. The image capturing unit may be an ordinary consumer-grade RGB camera, as long as it can capture video in real time. Fig. 2 is a schematic view of the scene setting of this embodiment; the scene includes a walking person, an ordinary camera, a relay wall and an obstacle. When the person walks in the room, the obstacle blocks the light from the person from reaching the camera directly, so the ordinary camera can only capture the walking track by photographing the light that the person reflects onto the relay wall while walking.
Step S102: the position coding vector is initialized by setting it to an all-zero vector. Specifically, the position coding vector is a high-dimensional vector that implicitly encodes position semantic information and can be decoded by the decoder into actual position coordinates. Before entering the tracking phase, the position coding vector needs to be zeroed so that residual non-zero values do not affect the subsequent calculation. As the position coding vector is then updated, it gradually becomes a vector that truly carries position information. Step S102 may be completed before or after step S101, as long as it is completed before step S103.
Step S103: frames of the real-time video stream are input into the tracking unit frame by frame in real time to perform the tracking operation; the image feature vector contained in each frame is obtained one by one, the position coding vector is updated with each image feature vector as it is obtained, and the position coding vector is input to the decoder after each update. Specifically, in the tracking stage, each received real-time frame is converted into a position code, and the real-time position coding vector corresponding to each frame is input to the decoder for decoding to obtain the real position coordinates.
In a specific embodiment, inputting frames of the real-time video stream into the tracking unit frame by frame in real time to perform the tracking operation, obtaining the image feature vector contained in each frame one by one, and updating the position coding vector with each image feature vector as it is obtained comprises: acquiring the current frame, i.e. the frame of the real-time video stream currently being input; if the current frame is not the first frame, calculating a differential frame from the current frame and the previous frame; extracting a differential frame image feature vector from the differential frame and a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information and the current frame image feature vector contains static information; calculating and updating the position coding vector with a propagation unit according to the differential frame image feature vector; and calculating and updating the position coding vector again with a calibration unit according to the current frame image feature vector. A sketch of this single-step update is given below.
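To make the order of these operations concrete, the following is a minimal sketch of one single-step tracking update written in Python (PyTorch). The names tracking_step, propagate_cell and calibrate_cell are assumptions introduced here for illustration and do not come from the patent; each cell is assumed to pair a feature extractor with a recurrent unit, with the two cells not sharing weights.

import torch

def tracking_step(curr_frame, prev_frame, h, propagate_cell, calibrate_cell):
    # curr_frame, prev_frame: image tensors of shape (1, 3, H, W)
    # h: position coding vector of shape (1, D)
    diff_frame = curr_frame - prev_frame   # differential frame: carrier of dynamic (motion) information
    h = propagate_cell(diff_frame, h)      # propagation: update h from the dynamic information
    h = calibrate_cell(curr_frame, h)      # calibration: update h again from the static information
    return h                               # updated position coding vector, handed to the decoder

The decoder described in step S104 is then applied to the returned vector after every such update.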
In particular, the position coding vector is updated during the tracking phase with the current frame, the differential frame, and the propagation and calibration network. The frames of the tracking phase can be acquired in real time, i.e. a single-step tracking is performed as soon as the image capturing unit acquires a frame. Since the frame images acquired in the tracking stage are the ones that will be decoded later, every acquired frame must be processed. If only the first frame has been received, the differential operation cannot yet be performed, so the operation normally starts from the second frame; however, if another phase (e.g., the warm-up phase described later) precedes the tracking phase, the last frame received in that phase can serve as the previous frame when the first frame of the tracking phase is processed. The differential frame (difference frame) is the "difference image" obtained by subtracting the previous frame from each frame of the real-time video stream. The differential frame has the same data size as the current frame (raw frame), but the former reflects the motion information at that point in time while the current frame reflects the static information at that point in time. As shown in fig. 3, in the tracking stage a differential frame d_{T-1} is obtained by differencing the T-th frame f_T (current frame) and the (T-1)-th frame f_{T-1} (previous frame); propagation and calibration are then performed according to the image feature vectors of the differential frame and of the current frame, respectively, and the position code is updated. An image feature vector is a high-dimensional vector that implicitly encodes the semantic information of an image; it can be extracted from a frame image with a feature extractor, and is extracted from the current frame and the differential frame respectively. The propagation and calibration units are the basic components of the propagation and calibration network (PAC-Net), which contains two sets of submodules of identical structure that do not share weights, called the propagation unit (Propagate-Cell) and the calibration unit (Calibrate-Cell), used respectively to propagate and to calibrate the position coding vector. "Not sharing weights" means that the submodules are independent of each other and have different internal parameters, so they can serve different functions. With the propagation unit, the position coding vector is updated using the feature vector containing dynamic information extracted from the differential frame; with the calibration unit, it is updated using the feature vector containing static information extracted from the current frame. A flow chart of the tracking phase performed with the propagation and calibration network is shown in fig. 4.
In an alternative embodiment, in the tracking operation, extracting the differential frame image feature vector from the differential frame and extracting the current frame image feature vector from the current frame comprises: extracting the differential frame image feature vector from the differential frame with a first residual neural network, and extracting the current frame image feature vector from the current frame with a second residual neural network, wherein the first and second residual neural networks do not share weights. Calculating and updating the position coding vector with the propagation unit according to the differential frame image feature vector comprises: the propagation unit uses a first recurrent neural network to operate on the dynamic information contained in the differential frame image feature vector and on the current position coding vector, and updates the position coding vector with the result. Calculating and updating the position coding vector again with the calibration unit according to the current frame image feature vector comprises: the calibration unit uses a second recurrent neural network to operate on the static information contained in the current frame image feature vector and on the current position coding vector, and updates the position coding vector with the result, wherein the first and second recurrent neural networks do not share weights. Specifically, in the flow chart of the propagation and calibration network shown in fig. 4, this embodiment uses the backbone portions of two residual neural networks (ResNet-18) that do not share weights as feature extractors in the feature extraction step of the tracking phase, for extracting the feature vectors of the differential frame and of the current frame respectively; two gated recurrent units (GRU) that do not share weights are likewise used in the propagation and calibration step to propagate and calibrate the position coding vector. ResNet-18 is a convolutional neural network (CNN) and the GRU is a basic unit of a recurrent neural network (RNN); their operation can be formally described as follows:
F = CNN(I)
h_{t+1} = RNN(h_t, F)
where I denotes the frame image from which features are extracted, F denotes the image feature vector of the frame image I, h_t denotes the position coding vector before the update, and h_{t+1} denotes the position coding vector after the update.
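A minimal sketch of this CNN+GRU update, assuming a torchvision ResNet-18 backbone and an nn.GRUCell, is shown below. The class name PacCell and the hidden dimension of 128 are illustrative assumptions rather than values specified in the patent.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class PacCell(nn.Module):
    # One propagation (or calibration) cell: a feature extractor plus a recurrent unit.
    def __init__(self, hidden_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep the convolutional backbone, drop the classification head.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # output shape (B, 512, 1, 1)
        self.rnn = nn.GRUCell(input_size=512, hidden_size=hidden_dim)

    def forward(self, frame, h):
        f = self.cnn(frame).flatten(1)   # F = CNN(I), shape (B, 512)
        return self.rnn(f, h)            # h_{t+1} = RNN(h_t, F)

# The Propagate-Cell and the Calibrate-Cell share this structure but not their weights:
propagate_cell = PacCell()
calibrate_cell = PacCell()

Instantiating two separate PacCell objects reflects the requirement that the propagation and calibration submodules have identical structure but independent parameters.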
In the tracking operation of this embodiment, the motion information that is critical to the tracking task is explicitly supplemented by introducing differential frames, which serve as carriers of motion information, together with a specially designed propagation and calibration network. The propagation and calibration network alternately extracts information from the differential frame for propagation and from the current frame for calibration, and the recurrent neural network explicitly models continuous motion. This addresses the problems of non-ideal tracking precision and poor stationarity caused by ignoring motion information and the motion-continuity prior in the non-visual field real-time tracking problem, and improves tracking precision and track stationarity.
Step S104: each time the decoder receives a position coding vector, it decodes the received vector to obtain the real-time coordinate information corresponding to that position coding vector. Specifically, a multi-layer perceptron (MLP) is used as the decoder to decode the position coding vector into position coordinates. In an alternative embodiment, after the decoder finishes decoding, the action track of the non-visual field target is dynamically restored from the real-time coordinate information corresponding to each position coding vector: the position coordinates corresponding to each frame are connected in sequence to form a real-time tracking track, reconstructing the trajectory of the non-visual field target in real time.
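The decoding step can be illustrated with the following sketch of an MLP decoder; the layer widths (128 and 64) and the two-dimensional output are assumptions chosen for illustration, since the patent does not fix these sizes here.

import torch.nn as nn

decoder = nn.Sequential(
    nn.Linear(128, 64),   # from the position coding vector dimension to a hidden layer
    nn.ReLU(),
    nn.Linear(64, 2),     # to 2D position coordinates (x, y)
)

# After every update of the position coding vector h (shape (1, 128)):
# xy = decoder(h)   # real-time coordinates for the current frame
# Connecting the successive xy outputs in order yields the reconstructed track.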
According to the passive non-visual field target real-time positioning tracking method, only the camera unit is used for shooting real-time video in real time, the position coding vector contained in each frame of image in the video is updated in real time through tracking operation, and then the decoder is used for decoding the position coding vector into the position coordinate corresponding to the frame in real time, so that the purpose of real-time tracking is achieved. The passive non-visual field target real-time positioning tracking method reduces layout cost by adopting a pure passive scheme, and solves the problems of difficult deployment and application caused by high cost and severe experimental conditions of an active method in the non-visual field tracking problem. In a specific embodiment, the problems of non-ideal tracking precision and poor stability caused by neglecting motion information and motion continuity prior in the non-visual field real-time tracking problem can be solved by introducing a differential frame and a specially designed propagation and calibration network, and the tracking precision and track stability are improved.
The invention provides a technical framework that takes temporally dense, high-dimensional features as input and real-time low-dimensional reconstruction as the task objective; it can be applied not only to the non-visual field real-time tracking problem of the invention but also to other tasks. When solving the passive non-visual field real-time tracking problem, the differential frame is used as the carrier of motion information, the backbone portion of a residual neural network (ResNet-18) is used as the feature extractor, the gated recurrent unit (GRU) is used as the basic unit of the recurrent neural network, and a multi-layer perceptron (MLP) is used as the decoder; when addressing other specific tasks, different motion information carriers, feature extractors, recurrent neural network units and decoders may be chosen. Therefore, all solutions realized by the technical framework of the present invention fall within the scope of protection of the present invention.
In an alternative embodiment, a preheating (warm-up) operation may also be performed before tracking, which provides an accurate current position coding vector for the tracking operation. Specifically, before inputting frames of the real-time video stream into the tracking unit frame by frame in real time, the method further comprises performing the preheating operation. Performing the preheating operation comprises: inputting the first W frames of the real-time video stream into a preheating unit frame by frame, obtaining the image feature vector contained in each of the first W frames one by one, updating the position coding vector with the image feature vector of each of the first W frames, and obtaining the position coding vector after preheating is completed, where this vector is the position coding vector most recently updated in the preheating operation and W is a positive integer with W ≥ 1. Specifically, the preheating operation works in a similar way to the tracking operation, but its purpose is to provide an accurate position coding vector before tracking starts. Fig. 5 shows how the preheating and tracking stages are executed in this embodiment: for each frame of the real-time video stream, the preheating stage and the tracking stage each perform a single-step tracking, and the position coding vector is updated in that single step. The preheating stage and the tracking stage can adopt the same operation mode, but the two stages are independent of each other and do not share weights. The frames used for the preheating operation may be acquired all at once or in real time. The first W frames of the real-time video stream are used only for preheating and do not take part in the decoding of the subsequent tracking operation: because the position information represented by the position coding vector may not be accurate at the start, the vector is progressively calibrated during the preheating stage and finally becomes essentially close to the real position information. The number of frames required in the preheating stage, i.e. the value of W, is affected by factors such as the complexity of the tracking scene and of the room environment, so different tracking environments call for different values of W. In practice, a suitable value of W can be found in advance through training and preset in the formal application scenario; typically W may be 32 or 48, i.e. 32 or 48 frames are used for the preheating stage. Through the preheating operation, a more accurate position coding vector can be provided for the subsequent tracking stage, improving tracking accuracy. A sketch of the preheating loop is given below.
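As a sketch of this stage, under the same assumed module names as the tracking sketch above and with preheating cells whose weights are independent of the tracking cells, the preheating loop could look as follows; warm_up, warmup_propagate and warmup_calibrate are hypothetical names.

import torch

def warm_up(frames, warmup_propagate, warmup_calibrate, hidden_dim=128):
    # frames: list of the first W image tensors, each of shape (1, 3, H, W)
    h = torch.zeros(1, hidden_dim)          # all-zero initialization of the position coding vector
    for t in range(1, len(frames)):         # differencing starts from the second frame
        diff = frames[t] - frames[t - 1]
        h = warmup_propagate(diff, h)       # propagate with dynamic information
        h = warmup_calibrate(frames[t], h)  # calibrate with static information
    return h                                # warmed-up vector handed to the tracking stage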
In a specific embodiment, inputting the first W frames of the real-time video stream into the preheating unit frame by frame, obtaining the image feature vector contained in each of the first W frames one by one, updating the position coding vector with the image feature vector of each of the first W frames, and obtaining the position coding vector after preheating is completed comprises: acquiring the current frame, i.e. the frame among the first W frames of the real-time video stream currently being input; if the current frame is not the first of the first W frames, calculating a differential frame from the current frame and the previous frame; extracting a differential frame image feature vector from the differential frame and a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information and the current frame image feature vector contains static information; calculating and updating the position coding vector with a propagation unit according to the differential frame image feature vector; calculating and updating the position coding vector again with a calibration unit according to the current frame image feature vector; and, if the current frame is the tail frame of the first W frames, outputting the preheated position coding vector after the calibration unit has calculated and updated it again according to the current frame image feature vector. Specifically, in the preheating stage processing begins with the second frame (the first frame only serves as the previous frame for differencing), and the preheated position coding vector is output once the tail frame has been processed. The differential frame (difference frame) is the "difference image" obtained by subtracting the previous frame from each frame of the real-time video stream; it has the same data size as the current frame (raw frame), but the former reflects the motion information at that point in time while the current frame reflects the static information. As shown in fig. 5, a differential frame d_1 is obtained by differencing the 2nd frame f_2 (current frame) and the 1st frame f_1 (previous frame). An image feature vector is a high-dimensional vector that implicitly encodes the semantic information of an image; it can be extracted from a frame image with a feature extractor, and is extracted from the current frame and the differential frame respectively. The propagation and calibration units are the basic components of the propagation and calibration network (PAC-Net), which contains two sets of submodules of identical structure that do not share weights, called the propagation unit (Propagate-Cell) and the calibration unit (Calibrate-Cell), used respectively to propagate and to calibrate the position coding vector. "Not sharing weights" means that the submodules are independent of each other and have different internal parameters, so they can serve different functions. With the propagation unit, the position coding vector is updated using the feature vector containing dynamic information extracted from the differential frame; with the calibration unit, it is updated using the feature vector containing static information extracted from the current frame.
A flow chart of the propagation and calibration network execution employed in the warm-up phase is also shown in fig. 4.
In an alternative embodiment, in the preheating operation, extracting the differential frame image feature vector from the differential frame and extracting the current frame image feature vector from the current frame comprises: extracting the differential frame image feature vector from the differential frame with a first residual neural network, and extracting the current frame image feature vector from the current frame with a second residual neural network, wherein the first and second residual neural networks do not share weights. Calculating and updating the position coding vector with the propagation unit according to the differential frame image feature vector comprises: the propagation unit uses a first recurrent neural network to operate on the dynamic information contained in the differential frame image feature vector and on the current position coding vector, and updates the position coding vector with the result. Calculating and updating the position coding vector again with the calibration unit according to the current frame image feature vector comprises: the calibration unit uses a second recurrent neural network to operate on the static information contained in the current frame image feature vector and on the current position coding vector, and updates the position coding vector with the result, wherein the first and second recurrent neural networks do not share weights. Specifically, in the flow chart of the propagation and calibration network shown in fig. 4, this embodiment uses the backbone portions of two residual neural networks (ResNet-18) that do not share weights as feature extractors in the feature extraction step, for extracting the feature vectors of the differential frame and of the current frame respectively; two gated recurrent units (Gated Recurrent Unit, GRU) that do not share weights are used in the propagation and calibration step to propagate and calibrate the position coding vector. ResNet-18 is a convolutional neural network (Convolutional Neural Network, CNN) and the GRU is a basic unit of a recurrent neural network (Recurrent Neural Network, RNN); their operation can be formally described as follows:
F = CNN(I)
h_{t+1} = RNN(h_t, F)
where I denotes the frame image from which features are extracted, F denotes the image feature vector of the frame image I, h_t denotes the position coding vector before the update, and h_{t+1} denotes the position coding vector after the update.
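Putting the stages together, the following end-to-end sketch reuses the hypothetical warm_up and tracking_step helpers and the decoder sketched above, and therefore rests on the same assumptions: it warms up on the first W frames and then tracks and decodes every subsequent frame.

def run_pipeline(stream, W, warmup_cells, tracking_cells, decoder):
    # stream yields image tensors of shape (1, 3, H, W); returns the list of decoded 2D coordinates.
    warmup_frames, trajectory, h, prev = [], [], None, None
    for t, frame in enumerate(stream):
        if t < W:                           # preheating stage: only refine the position coding vector
            warmup_frames.append(frame)
            if t == W - 1:
                h = warm_up(warmup_frames, warmup_cells[0], warmup_cells[1])
        else:                               # tracking stage: update the vector and decode it
            h = tracking_step(frame, prev, h, tracking_cells[0], tracking_cells[1])
            trajectory.append(decoder(h))   # real-time coordinates for this frame
        prev = frame
    return trajectory                       # connecting the points in order gives the reconstructed track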
The embodiment also provides a passive non-visual field target real-time positioning and tracking system, as shown in fig. 6, which includes: an imaging unit 601, an initializing unit 602, a tracking unit 603, and a decoder 604.
The image capturing unit 601 is configured to acquire, in real time, a real-time video stream that is reflected by a relay medium and contains the action track of a non-visual field target. Specifically, the image capturing unit 601 may be an ordinary consumer-grade RGB camera, as long as it can capture video in real time. The non-visual field target is generally a living body (e.g., a person, an animal, etc.) or a non-living body (e.g., a vehicle, etc.) that can move freely, and is the target whose movement track needs to be tracked in this embodiment. The relay medium may be a planar or non-planar object that can reflect light, such as a relay wall, a metal plate or a plastic plate. Fig. 2 is a schematic view of the scene setting of this embodiment; the scene includes a walking person, an ordinary camera, a relay wall and an obstacle. When the person walks in the room, the obstacle blocks the light from the person from reaching the camera directly, so the ordinary camera can only capture the walking track by photographing the light that the person reflects onto the relay wall while walking.
The initializing unit 602 is configured to initialize the position coding vector by setting it to an all-zero vector. Specifically, the initialization unit 602 assigns the initial value of the position coding vector; as the vector is subsequently updated it becomes a vector that truly carries position information. The position coding vector is a high-dimensional vector that implicitly encodes position semantic information and can be decoded by the decoder into actual position coordinates. Before entering the tracking phase, the position coding vector needs to be zeroed so that residual non-zero values do not affect the subsequent calculation. The initialization unit 602 may complete the initialization before or after the imaging unit 601 starts imaging, as long as it is completed before the tracking unit 603 begins operating.
The tracking unit 603 is configured to receive frames of the real-time video stream input frame by frame in real time, perform the tracking operation, obtain the image feature vector contained in each frame one by one, update the position coding vector with each image feature vector as it is obtained, and input the position coding vector to the decoder after each update. Specifically, the tracking unit 603 converts each received real-time frame into a position code, and the real-time position coding vector corresponding to each frame is input to the decoder 604 for decoding to obtain the real position coordinates.
In a specific embodiment, the operation by which the tracking unit receives frames of the real-time video stream input frame by frame in real time, performs the tracking operation, obtains the image feature vector contained in each frame one by one, and updates the position coding vector with each image feature vector as it is obtained specifically comprises: acquiring the current frame, i.e. the frame of the real-time video stream currently being input; if the current frame is not the first frame, calculating a differential frame from the current frame and the previous frame; extracting a differential frame image feature vector from the differential frame and a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information and the current frame image feature vector contains static information; calculating and updating the position coding vector with a propagation unit according to the differential frame image feature vector; and calculating and updating the position coding vector again with a calibration unit according to the current frame image feature vector.
In particular, the position coding vector is updated during the tracking phase with the current frame, the differential frame, and the propagation and calibration network. The frames of the tracking phase can be acquired in real time, i.e. a single-step tracking is performed as soon as the image capturing unit acquires a frame. Since the frame images acquired in the tracking stage are the ones that will be decoded later, every acquired frame must be processed. The differential frame (difference frame) is the "difference image" obtained by subtracting the previous frame from each frame of the real-time video stream. The differential frame has the same data size as the current frame (raw frame), but the former reflects the motion information at that point in time while the current frame reflects the static information at that point in time. If only the first frame has been received, the differential operation cannot yet be performed, so the operation normally starts from the second frame; however, if another phase (e.g., the preheating phase described later) precedes the tracking phase, the last frame received in that phase can serve as the previous frame when the first frame of the tracking phase is processed. As shown in fig. 3, in the tracking stage a differential frame d_{T-1} is obtained by differencing the T-th frame f_T (current frame) and the (T-1)-th frame f_{T-1} (previous frame); propagation and calibration are then performed according to the image feature vectors of the differential frame and of the current frame, respectively, and the position code is updated. An image feature vector is a high-dimensional vector that implicitly encodes the semantic information of an image; it can be extracted from a frame image with a feature extractor, and is extracted from the current frame and the differential frame respectively. The propagation and calibration units are the basic components of the propagation and calibration network (PAC-Net), which contains two sets of submodules of identical structure that do not share weights, called the propagation unit (Propagate-Cell) and the calibration unit (Calibrate-Cell), used respectively to propagate and to calibrate the position coding vector. "Not sharing weights" means that the submodules are independent of each other and have different internal parameters, so they can serve different functions. With the propagation unit, the position coding vector is updated using the feature vector containing dynamic information extracted from the differential frame; with the calibration unit, it is updated using the feature vector containing static information extracted from the current frame. A flow chart of the tracking phase performed with the propagation and calibration network is shown in fig. 4.
In an alternative embodiment, in the tracking operation, extracting the differential frame image feature vector from the differential frame and extracting the current frame image feature vector from the current frame comprises: extracting the differential frame image feature vector from the differential frame with a first residual neural network, and extracting the current frame image feature vector from the current frame with a second residual neural network, wherein the first and second residual neural networks do not share weights. Calculating and updating the position coding vector with the propagation unit according to the differential frame image feature vector comprises: the propagation unit uses a first recurrent neural network to operate on the dynamic information contained in the differential frame image feature vector and on the current position coding vector, and updates the position coding vector with the result. Calculating and updating the position coding vector again with the calibration unit according to the current frame image feature vector comprises: the calibration unit uses a second recurrent neural network to operate on the static information contained in the current frame image feature vector and on the current position coding vector, and updates the position coding vector with the result, wherein the first and second recurrent neural networks do not share weights. Specifically, in the flow chart of the propagation and calibration network shown in fig. 4, this embodiment likewise uses the backbone portions of two residual neural networks (ResNet-18) that do not share weights as feature extractors in the feature extraction step of the tracking phase, for extracting the feature vectors of the differential frame and of the current frame respectively; two gated recurrent units (GRU) that do not share weights are also used in the propagation and calibration step to propagate and calibrate the position coding vector. ResNet-18 is a convolutional neural network (CNN) and the GRU is a basic unit of a recurrent neural network (RNN); their operation can be formally described as follows:
F = CNN(I)
h_{t+1} = RNN(h_t, F)
where I denotes the frame image from which features are extracted, F denotes the image feature vector of the frame image I, h_t denotes the position coding vector before the update, and h_{t+1} denotes the position coding vector after the update.
In the tracking unit 603 of this embodiment, the motion information that is critical to the tracking task is explicitly supplemented by introducing differential frames, which serve as carriers of motion information, together with a specially designed propagation and calibration network. The propagation and calibration network alternately extracts information from the differential frame for propagation and from the current frame for calibration, and the recurrent neural network explicitly models continuous motion. This addresses the problems of non-ideal tracking precision and poor stationarity caused by ignoring motion information and the motion-continuity prior in the non-visual field real-time tracking problem, and improves tracking precision and track stationarity.
The decoder 604 is configured to decode each received position coding vector to obtain the real-time coordinate information corresponding to it. In particular, the decoder 604 may decode the position coding vector into position coordinates using a multi-layer perceptron (MLP). In an alternative embodiment, after the decoder 604 completes decoding, the action track of the non-visual field target is dynamically restored from the real-time coordinate information corresponding to each position coding vector: the position coordinates corresponding to each frame are connected in sequence to form a real-time tracking track, reconstructing the trajectory of the non-visual field target in real time.
According to the passive non-visual field target real-time positioning tracking system provided by the embodiment, only the camera unit 601 is used for shooting real-time video in real time, the tracking unit 603 is used for updating the position coding vector contained in each frame of image in the video in real time, and then the decoder 604 is used for decoding the position coding vector into the position coordinate corresponding to the frame in real time, so that the purpose of real-time tracking is achieved. The passive non-visual field target real-time positioning tracking system reduces layout cost by adopting a pure passive scheme, and solves the problems of difficult deployment and application caused by high cost and severe experimental conditions of an active method in the non-visual field tracking problem. In a specific embodiment, a differential frame and a specially designed propagation and calibration network can be introduced into the tracking unit 603, so that the problems of non-ideal tracking precision and poor stability caused by neglecting motion information and motion continuity priori in the non-visual field real-time tracking problem are solved, and the tracking precision and track stability are improved.
In an alternative implementation, the passive non-visual field target real-time positioning and tracking system of this embodiment may further include a preheating unit for performing a preheating operation, which provides the tracking unit 603 with an accurate current position coding vector. Specifically, the preheating operation performed by the preheating unit comprises: receiving the first W frames of the real-time video stream input frame by frame, obtaining the image feature vector contained in each of the first W frames one by one, updating the position coding vector with the image feature vector of each of the first W frames, and obtaining the position coding vector after preheating is completed, where this vector is the position coding vector most recently updated in the preheating operation and W is a positive integer with W ≥ 1.
Specifically, the preheating operation works in a similar way to the tracking operation, but its purpose is to provide an accurate position coding vector before tracking starts. Fig. 5 shows how the preheating and tracking stages are executed in this embodiment: for each frame of the real-time video stream, the preheating stage and the tracking stage each perform a single-step tracking, and the position coding vector is updated in that single step. The preheating stage and the tracking stage can adopt the same operation mode, but the two stages are independent of each other and do not share weights. The frames used for the preheating operation may be acquired all at once or in real time. The first W frames of the real-time video stream are used only for preheating and do not take part in the decoding of the subsequent tracking operation: because the position information represented by the position coding vector may not be accurate at the start, the vector is progressively calibrated during the preheating stage and finally becomes essentially close to the real position information. The number of frames required in the preheating stage, i.e. the value of W, is affected by factors such as the complexity of the tracking scene and of the room environment, so different tracking environments call for different values of W. In practice, a suitable value of W can be found in advance through training and preset in the formal application scenario; typically W may be 32 or 48, i.e. 32 or 48 frames are used for the preheating stage. Through the preheating operation, a more accurate position coding vector can be provided for the subsequent tracking stage, improving tracking accuracy.
In a specific embodiment, receiving the first W frames of the real-time video stream input frame by frame, obtaining the image feature vector contained in each of the first W frames one by one, updating the position coding vector with the image feature vector of each of the first W frames, and obtaining the position coding vector after preheating is completed specifically comprises: acquiring the current frame, i.e. the frame among the first W frames of the real-time video stream currently being input; if the current frame is not the first of the first W frames, calculating a differential frame from the current frame and the previous frame; extracting a differential frame image feature vector from the differential frame and a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information and the current frame image feature vector contains static information; calculating and updating the position coding vector with a propagation unit according to the differential frame image feature vector; calculating and updating the position coding vector again with a calibration unit according to the current frame image feature vector; and, if the current frame is the tail frame of the first W frames, outputting the preheated position coding vector after the calibration unit has calculated and updated it again according to the current frame image feature vector.
Specifically, in the preheating stage processing begins with the second frame (the first frame only serves as the previous frame for differencing), and the preheated position coding vector is output once the tail frame has been processed. The differential frame (difference frame) is the "difference image" obtained by subtracting the previous frame from each frame of the real-time video stream; it has the same data size as the current frame (raw frame), but the former reflects the motion information at that point in time while the current frame reflects the static information. As shown in fig. 5, a differential frame d_1 is obtained by differencing the 2nd frame f_2 (current frame) and the 1st frame f_1 (previous frame). An image feature vector is a high-dimensional vector that implicitly encodes the semantic information of an image; it can be extracted from a frame image with a feature extractor, and is extracted from the current frame and the differential frame respectively. The propagation and calibration units are the basic components of the propagation and calibration network (PAC-Net), which contains two sets of submodules of identical structure that do not share weights, called the propagation unit (Propagate-Cell) and the calibration unit (Calibrate-Cell), used respectively to propagate and to calibrate the position coding vector. "Not sharing weights" means that the submodules are independent of each other and have different internal parameters, so they can serve different functions. With the propagation unit, the position coding vector is updated using the feature vector containing dynamic information extracted from the differential frame; with the calibration unit, it is updated using the feature vector containing static information extracted from the current frame. A flow chart of the propagation and calibration network employed in the preheating stage is likewise shown in fig. 4.
In an alternative embodiment, in the preheating operation, extracting the differential frame image feature vector from the differential frame and extracting the current frame image feature vector from the current frame comprises: extracting the differential frame image feature vector from the differential frame with a first residual neural network, and extracting the current frame image feature vector from the current frame with a second residual neural network, wherein the first and second residual neural networks do not share weights. Calculating and updating the position coding vector with the propagation unit according to the differential frame image feature vector comprises: the propagation unit uses a first recurrent neural network to operate on the dynamic information contained in the differential frame image feature vector and on the current position coding vector, and updates the position coding vector with the result. Calculating and updating the position coding vector again with the calibration unit according to the current frame image feature vector comprises: the calibration unit uses a second recurrent neural network to operate on the static information contained in the current frame image feature vector and on the current position coding vector, and updates the position coding vector with the result, wherein the first and second recurrent neural networks do not share weights. Specifically, in the flow chart of the propagation and calibration network shown in fig. 4, this embodiment uses the backbone portions of two residual neural networks (ResNet-18) that do not share weights as feature extractors in the feature extraction step, for extracting the feature vectors of the differential frame and of the current frame respectively; two gated recurrent units (Gated Recurrent Unit, GRU) that do not share weights are used in the propagation and calibration step to propagate and calibrate the position coding vector. ResNet-18 is a convolutional neural network (Convolutional Neural Network, CNN) and the GRU is a basic unit of a recurrent neural network (Recurrent Neural Network, RNN); their operation can be formally described as follows:
F = CNN(I)
h_{t+1} = RNN(h_t, F)
where I denotes the frame image from which features are extracted, F denotes the image feature vector of the frame image I, h_t denotes the position coding vector before the update, and h_{t+1} denotes the position coding vector after the update.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions, and variations may be made to the above embodiments by those skilled in the art without departing from the spirit and principles of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A passive non-visual field target real-time positioning tracking method is characterized by comprising the following steps:
acquiring real-time video streams which are reflected by a relay medium and contain non-visual field target action tracks in real time by using an image pickup unit;
Initializing a position coding vector, and setting the position coding vector as an all-zero vector;
inputting frames in the real-time video stream into a tracking unit in real time frame by frame to perform tracking operation, obtaining image feature vectors contained in each frame one by one, updating the position coding vectors by using the image feature vectors after each image feature vector contained in each frame is obtained, and inputting the position coding vectors into a decoder after each update;
After each time the decoder receives one position coding vector, the decoder decodes the received position coding vector to obtain real-time coordinate information corresponding to each position coding vector; a multi-layer perceptron is used as the decoder in the decoding step to decode the position coding vector into position coordinates; after the decoder finishes decoding, dynamically restoring the action track of the non-visual field target according to the real-time coordinate information corresponding to each position coding vector; sequentially connecting the position coordinates corresponding to each frame to form a real-time tracking track, and reconstructing the track of the non-visual field target in real time;
The non-visual field target is a target whose action track needs to be tracked, and the target comprises a freely moving organism or a freely moving non-living object;
the freely moving organism comprises a human or an animal;
the freely moving non-living object comprises a vehicle;
the relay medium is a planar or non-planar object that can reflect light.
2. The method according to claim 1, wherein inputting frames in the real-time video stream to a tracking unit frame by frame in real time to perform a tracking operation, obtaining the image feature vectors contained in each of the frames one by one, and updating the position coding vector with the image feature vectors after each image feature vector contained in a frame is obtained comprises:
Acquiring a current frame, wherein the current frame refers to a frame of the real-time video stream which is currently input;
if the current frame is not the first frame, calculating a differential frame according to the current frame and the previous frame;
Extracting a differential frame image feature vector from the differential frame, and extracting a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information, and the current frame image feature vector contains static information;
Calculating and updating the position coding vector according to the differential frame image characteristic vector by using a propagation unit;
and calculating and updating the position coding vector again according to the characteristic vector of the current frame image by using a calibration unit.
3. The method of claim 1, wherein prior to inputting frames in the real-time video stream to a tracking unit frame-by-frame in real-time, the method further comprises: performing a preheating operation;
Performing the warm-up operation includes:
Inputting the first W frames in the real-time video stream into a preheating unit frame by frame, obtaining image feature vectors contained in each frame in the first W frames one by one, updating the position coding vector by using the image feature vectors contained in each frame in the first W frames, and obtaining a position coding vector after preheating, wherein the position coding vector after preheating is the position coding vector updated last time in the preheating operation, and W is more than or equal to 1 and is a positive integer.
4. The method according to claim 3, wherein the inputting the first W frames in the real-time video stream to the preheating unit frame by frame, obtaining the image feature vector contained in each of the first W frames frame by frame, updating the position-coding vector with the image feature vector contained in each of the first W frames, and obtaining the position-coding vector after the preheating is completed includes:
Acquiring a current frame, wherein the current frame refers to a frame in the first W frames of the currently input real-time video stream;
If the current frame is not the first frame of the first W frames, calculating a differential frame according to the current frame and the previous frame;
Extracting a differential frame image feature vector from the differential frame, and extracting a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information, and the current frame image feature vector contains static information;
Calculating and updating the position coding vector according to the differential frame image characteristic vector by using a propagation unit;
Calculating and updating the position coding vector again according to the characteristic vector of the current frame image by using a calibration unit;
And if the current frame is the tail frame of the first W frames, after the calibration unit has calculated and updated the position coding vector again according to the current frame image feature vector, outputting the updated position coding vector as the position coding vector after the preheating is finished.
5. The method according to claim 2 or 4, wherein,
The extracting the differential frame image feature vector from the differential frame and the extracting the current frame image feature vector from the current frame comprises the following steps:
extracting a differential frame image feature vector from the differential frame by using a first residual neural network, and extracting a current frame image feature vector from the current frame by using a second residual neural network, wherein the first residual neural network and the second residual neural network do not share weights;
The calculating and updating the position coding vector according to the differential frame image characteristic vector by using a propagation unit comprises the following steps:
The propagation unit performs a calculation, using a first recurrent neural network, on the dynamic information contained in the differential frame image feature vector and the current position coding vector, and updates the position coding vector with the calculation result;
The calculating and updating the position coding vector again according to the characteristic vector of the current frame image by using the calibration unit comprises the following steps:
the calibration unit performs a calculation, using a second recurrent neural network, on the static information contained in the current frame image feature vector and the current position coding vector, and updates the position coding vector with the calculation result, wherein the first recurrent neural network and the second recurrent neural network do not share weights.
6. A passive non-visual field target real-time positioning tracking system, comprising:
The camera shooting unit is used for acquiring real-time video streams which are reflected by the relay medium and contain the action tracks of the non-visual targets in real time;
The initialization unit is used for initializing the position coding vector and setting the position coding vector as an all-zero vector;
The tracking unit is used for receiving frames in the real-time video stream input in real time frame by frame, executing tracking operation, obtaining image feature vectors contained in each frame one by one, updating the position coding vector by using the image feature vectors after each image feature vector contained in each frame is obtained, and inputting the position coding vector to a decoder after each update;
The decoder is used for decoding the received position coding vectors after receiving one position coding vector, so as to obtain real-time coordinate information corresponding to each position coding vector;
a multi-layer perceptron is used as the decoder in the decoding step to decode the position coding vector into position coordinates; after the decoder finishes decoding, dynamically restoring the action track of the non-visual field target according to the real-time coordinate information corresponding to each position coding vector; sequentially connecting the position coordinates corresponding to each frame to form a real-time tracking track, and reconstructing the track of the non-visual field target in real time;
The non-visual field target is a target whose action track needs to be tracked, and the target comprises a freely moving organism or a freely moving non-living object;
the freely moving organism comprises a human or an animal;
the freely moving non-living object comprises a vehicle;
the relay medium is a planar or non-planar object that can reflect light.
7. The passive non-visual field target real-time positioning tracking system according to claim 6, wherein the tracking unit receives frames in the real-time video stream input frame by frame in real time and performs a tracking operation to obtain the image feature vectors contained in each frame one by one, and the operation of updating the position coding vector with the image feature vectors after each image feature vector contained in a frame is obtained specifically comprises:
Acquiring a current frame, wherein the current frame refers to a frame of the real-time video stream which is currently input;
if the current frame is not the first frame, calculating a differential frame according to the current frame and the previous frame;
Extracting a differential frame image feature vector from the differential frame, and extracting a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information, and the current frame image feature vector contains static information;
Calculating and updating the position coding vector according to the differential frame image characteristic vector by using a propagation unit;
and calculating and updating the position coding vector again according to the characteristic vector of the current frame image by using a calibration unit.
8. The passive non-visual field target real-time positioning tracking system according to claim 6, further comprising: a preheating unit for performing a preheating operation;
the preheating unit performs the preheating operation specifically including:
And receiving the first W frames in the real-time video stream input frame by frame, obtaining image feature vectors contained in each frame in the first W frames one by one, updating the position coding vector by using the image feature vectors contained in each frame in the first W frames, and obtaining the position coding vector after preheating, wherein the position coding vector after preheating is the position coding vector updated last time in the preheating operation, and W is more than or equal to 1 and is a positive integer.
9. The passive non-visual field target real-time positioning tracking system according to claim 8, wherein receiving the first W frames in the real-time video stream input frame by frame, obtaining the image feature vectors contained in each of the first W frames one by one, updating the position coding vector by using the image feature vectors contained in each of the first W frames, and obtaining the position coding vector after preheating is completed specifically includes:
Acquiring a current frame, wherein the current frame refers to a frame in the first W frames of the currently input real-time video stream;
If the current frame is not the first frame of the first W frames, calculating a differential frame according to the current frame and the previous frame;
Extracting a differential frame image feature vector from the differential frame, and extracting a current frame image feature vector from the current frame, wherein the differential frame image feature vector contains dynamic information, and the current frame image feature vector contains static information;
Calculating and updating the position coding vector according to the differential frame image characteristic vector by using a propagation unit;
Calculating and updating the position coding vector again according to the characteristic vector of the current frame image by using a calibration unit;
And if the current frame is the tail frame of the first W frames, after the calibration unit has calculated and updated the position coding vector again according to the current frame image feature vector, outputting the updated position coding vector as the position coding vector after the preheating is finished.
10. A method for using the passive non-visual field target real-time positioning tracking system according to claim 7 or 9, characterized in that,
The extracting the differential frame image feature vector from the differential frame and the extracting the current frame image feature vector from the current frame comprises the following steps:
extracting a differential frame image feature vector from the differential frame by using a first residual neural network, and extracting a current frame image feature vector from the current frame by using a second residual neural network, wherein the first residual neural network and the second residual neural network do not share weights;
The calculating and updating the position coding vector according to the differential frame image characteristic vector by using a propagation unit comprises the following steps:
The propagation unit performs a calculation, using a first recurrent neural network, on the dynamic information contained in the differential frame image feature vector and the current position coding vector, and updates the position coding vector with the calculation result;
The calculating and updating the position coding vector again according to the characteristic vector of the current frame image by using the calibration unit comprises the following steps:
the calibration unit performs a calculation, using a second recurrent neural network, on the static information contained in the current frame image feature vector and the current position coding vector, and updates the position coding vector with the calculation result, wherein the first recurrent neural network and the second recurrent neural network do not share weights.
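For readability, the overall flow recited in claims 1 and 3 can be pictured with the following non-limiting sketch: the position coding vector starts as an all-zero vector, the first W frames drive a warm-up cell, the remaining frames drive the tracking cell, and a multi-layer perceptron decodes each updated vector into a coordinate that is appended to the reconstructed track. The PACCell type from the earlier sketch is assumed; W = 8, the layer sizes of the perceptron, the two-dimensional coordinates, and the in-place construction of an untrained decoder (in practice trained jointly with the cells) are illustrative assumptions, not features required by the claims.

```python
import torch
import torch.nn as nn

def track(frame_pairs, warmup_cell, tracking_cell, warmup_len=8, hidden_size=128):
    """Decode a trajectory from (current_frame, difference_frame) tensor pairs.

    warmup_cell and tracking_cell are two PACCell-like modules with separate
    weights; warmup_len corresponds to the W frames of the warm-up operation.
    """
    decoder = nn.Sequential(                 # multi-layer perceptron decoder (assumed sizes)
        nn.Linear(hidden_size, 64), nn.ReLU(), nn.Linear(64, 2))
    h = torch.zeros(1, hidden_size)          # position coding vector initialized to all zeros
    trajectory = []
    for t, (curr, diff) in enumerate(frame_pairs):
        cell = warmup_cell if t < warmup_len else tracking_cell
        h = cell(diff, curr, h)              # propagate with the difference frame, then calibrate
        if t >= warmup_len:                  # decode only once the warm-up is finished
            xy = decoder(h).detach().squeeze(0)
            trajectory.append(xy)            # real-time coordinate for this frame
    return torch.stack(trajectory)           # connected coordinates form the tracking track
```

Its frame_pairs argument could be fed, for example, by the frame-differencing generator sketched earlier, after converting the arrays to tensors of shape (1, 3, H, W).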
CN202211570111.7A 2022-12-08 2022-12-08 Passive non-visual field target real-time positioning tracking method and system Active CN115760923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211570111.7A CN115760923B (en) 2022-12-08 2022-12-08 Passive non-visual field target real-time positioning tracking method and system

Publications (2)

Publication Number Publication Date
CN115760923A CN115760923A (en) 2023-03-07
CN115760923B (en) 2024-05-28

Family

ID=85344495

Country Status (1)

Country Link
CN (1) CN115760923B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601964A (en) * 2015-02-06 2015-05-06 武汉大学 Non-overlap vision field trans-camera indoor pedestrian target tracking method and non-overlap vision field trans-camera indoor pedestrian target tracking system
CN110516620A (en) * 2019-08-29 2019-11-29 腾讯科技(深圳)有限公司 Method for tracking target, device, storage medium and electronic equipment
CN110515037A (en) * 2019-07-05 2019-11-29 西安邮电大学 Time-frequency multi-domain joint passive positioning method in an NLOS environment
CN113949882A (en) * 2021-09-17 2022-01-18 镕铭微电子(济南)有限公司 Video coding and decoding method and device based on convolutional neural network
WO2022217610A1 (en) * 2021-04-16 2022-10-20 Oppo广东移动通信有限公司 Residual coding method and device, video coding method and device, and storage medium
CN115359108A (en) * 2022-09-15 2022-11-18 上海人工智能创新中心 Depth prediction method and system based on defocusing under guidance of focal stack reconstruction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113364969B (en) * 2020-03-06 2023-05-12 华为技术有限公司 Imaging method of non-line-of-sight object and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Structural Design of a 9.4 T Whole-Body MRI Superconducting Magnet; Yinming Dai et al.; IEEE Transactions on Applied Superconductivity; 2012-05-05; pp. 1-4 *
A human motion sequence tracking model combining deep features; Jiang Yu et al.; Software Guide (软件导刊); 2020-12-31 (No. 01); pp. 95-100 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant