CN115002409B - Dynamic task scheduling method for video detection and tracking - Google Patents

Dynamic task scheduling method for video detection and tracking

Info

Publication number
CN115002409B
Authority
CN
China
Prior art keywords
frame
decision
slot
terminal device
edge server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210551198.7A
Other languages
Chinese (zh)
Other versions
CN115002409A (en)
Inventor
王晓飞
王义兰
刘志成
赵云凤
仇超
张程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202210551198.7A
Publication of CN115002409A
Application granted
Publication of CN115002409B
Active legal-status
Anticipated expiration legal-status

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The invention discloses a dynamic task scheduling method for video detection and tracking, which comprises the following steps: constructing a real-time target detection system comprising a plurality of terminal devices and an edge server, where each terminal device runs a target tracker and the edge server runs a target detector; formulating the joint optimization of video frame offloading decisions, channel decisions and frame interval decisions in the real-time target detection system as a Markov decision problem; in each decision slot, each terminal device sends its tracking accuracy, head-of-queue frame information and video content change rate to the edge server, and the edge server builds a joint decision model using the DDQN deep reinforcement learning algorithm; the joint optimization problem is then solved with the joint decision model so as to maximize a benefit function, and each terminal device acts according to the video frame offloading decision, channel decision and frame interval decision output by the edge server. The invention maximizes video frame detection accuracy under a delay limit.

Description

Dynamic task scheduling method for video detection and tracking
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a dynamic task scheduling method for video detection and tracking.
Background
Introducing advanced machine vision into Internet of Things terminal devices enables a wide range of autonomous deep-vision applications such as traffic monitoring, autonomous driving, UAV scene analysis and robot vision. In these applications, it is vitally important that the terminal detects targets from captured video frames. However, to achieve accurate target detection, detection models usually have complex structures and numerous parameters, placing high computation and storage demands on the terminal device itself. Running a full-scale object detection model on resource-constrained terminal devices is therefore challenging: it is often difficult to meet real-time requirements and may even cause heat dissipation problems. Running a compressed model locally can greatly reduce the workload of a deep learning (DL) model, but such techniques usually reduce model accuracy because of the fundamental tradeoff between model size and model accuracy.
With the advent of 5G networks, offloading computation-intensive object detection tasks to edge servers has become a promising solution: the edge server runs a large model to achieve accurate detection and returns the detection results to the terminal device. Recent work has adopted the Detection-Based Tracking (DBT) approach, which runs the object detector periodically on some video frames while processing the frames in between with a lightweight object tracker. DBT-based frameworks are therefore receiving increasing attention for real-time video frame detection and analysis. However, when designing offloading policies, most existing DBT-based schemes consider the scenario in which one edge server serves a single terminal device with sufficient transmission resources, and ignore the scenario in which one edge server serves multiple terminal devices and limited communication resources degrade the offloading performance of competing terminal devices. In addition, most existing DBT-based schemes track every frame when designing the terminal-side tracking strategy, ignoring the error accumulation that the delay introduced by tracking every frame brings to the detection results. Moreover, existing DBT-based schemes realize collaborative detection mainly through experimental evaluation, rarely optimize the system through theoretical modeling, and cannot concretely model and express the collaborative detection of terminal devices and the edge server.
Disclosure of Invention
Aiming at the technical problems, the invention provides a dynamic task scheduling method for video detection and tracking. In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a dynamic task scheduling method for video detection and tracking comprises the following steps:
s1, constructing a real-time target detection system comprising a plurality of terminal devices and an edge server, wherein a target tracker is arranged in the terminal devices, and a target detector is arranged in the edge server;
s2, constructing a joint optimization problem of video frame offloading decisions, channel decisions and frame interval decisions in the real-time target detection system as a Markov decision problem;
the video frame offloading decision specifies, in each decision slot, whether the head-of-queue frame of a terminal device continues to wait in the local queue of the terminal device, is immediately offloaded to the edge server for detection, or directly outputs its tracking result; the channel decision specifies whether a terminal device is allocated a channel by the edge server; and the frame interval decision, output by the edge server, specifies the number of frames between the head-of-queue frame of the next decision slot and the head-of-queue frame of the current decision slot;
s3, in each decision slot, each terminal device sends its tracking accuracy, head-of-queue frame information and video content change rate to the edge server, and the edge server builds a joint decision model using the DDQN deep reinforcement learning algorithm;
and S4, with the aim of maximizing the benefit function, solving the joint optimization problem using the joint decision model constructed in step S3, where each terminal device acts according to the video frame offloading decision, channel decision and frame interval decision output by the edge server.
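As a concrete illustration of the decision variables above, the following minimal Python sketch defines the per-slot state and action tuples exchanged between a terminal device and the edge server; the field names and container types are illustrative assumptions, not notation fixed by the invention.

```python
from typing import NamedTuple

class State(NamedTuple):
    """State S_n(t) reported by terminal device n at decision slot t."""
    frame_info: tuple      # M_n(t): (frame size, arrival time, waiting time)
    channel_gain: float    # h_n(t)
    tracking_acc: float    # p_n(t)
    content_rate: float    # v_n(t)

class Action(NamedTuple):
    """Action A_n(t) returned by the edge server for terminal device n."""
    offload: int           # a_n(t): 0 = wait, 1 = offload immediately, 2 = output tracking result
    channel: int           # C_n(t): 0 = no channel allocated, 1 = channel allocated
    frame_interval: int    # I_n(t): frames between successive head-of-queue frames (1, 2 or 3)

# Example: a device with reliable tracking keeps working locally and skips to the frame 2 ahead.
example = Action(offload=2, channel=0, frame_interval=2)
print(example)
```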
The step S2 includes the steps of:
s2.1, constructing a state space, wherein the expression of the state space is as follows:
S_n(t) = (M_n(t), h_n(t), p_n(t), v_n(t));
where M_n(t) represents the head-of-queue frame information of the local queue of terminal device n at decision slot t, h_n(t) represents the channel gain between terminal device n and the edge server, v_n(t) represents the video content change rate of terminal device n at decision slot t, S_n(t) represents the state space of terminal device n at decision slot t, and p_n(t) represents the tracking accuracy of the head-of-queue frame of terminal device n at decision slot t;
s2.2, constructing an action space, wherein the expression of the action space is as follows:
A_n(t) = (a_n(t), C_n(t), I_n(t));
where A_n(t) represents the action space of terminal device n at decision slot t, a_n(t) represents the video frame offloading decision for the head-of-queue frame of the local queue of terminal device n output by the edge server at decision slot t, i.e. whether the frame continues to wait in the local queue, is immediately offloaded to the edge server, or directly outputs its tracking result, C_n(t) represents the channel decision for terminal device n output by the edge server at decision slot t, and I_n(t) represents the number of frames between the head-of-queue frame of the next decision slot and the head-of-queue frame of the current decision slot for terminal device n output by the edge server at decision slot t, i.e. the frame interval decision;
s2.3, constructing a reward function, wherein the expression of the reward function is as follows:
where R_n(t) represents the reward function, i.e. the benefit function, of terminal device n at decision slot t, acc represents the detection accuracy or tracking accuracy of the head-of-queue frame of terminal device n at decision slot t, β represents a weight coefficient with β > 0, the processing time of the head-of-queue frame of terminal device n at decision slot t also enters the reward, α is the performance improvement factor with α > 0, and T_max represents the maximum value of the ideal range of the video frame detection delay.
In step S2.1, the head-of-queue frame information M_n(t) of the local queue of terminal device n at decision slot t is expressed as:
where s_n(t) represents the frame size of the head-of-queue frame of the local queue of terminal device n at decision slot t, the second component represents the arrival time of the head-of-queue frame of the local queue of terminal device n, and the third component represents the time the head-of-queue frame of the local queue of terminal device n has waited before processing at decision slot t.
In step S2.1, the channel gain h_n(t) between terminal device n and the edge server is the product of a random fading factor and the average channel gain, i.e. h_n(t) = γ_n(t) · h̄_n;
where γ_n(t) represents a random channel fading factor following the Rayleigh distribution, and h̄_n represents the average channel gain of terminal device n;
the average channel gain h̄_n of terminal device n is calculated as:
where A_d represents the antenna gain of the terminal device, δ represents the path loss coefficient, and d_n represents the distance from terminal device n to the edge server.
In step S2.1, the tracking accuracy p_n(t) is the intersection over union of the tracked region and the ground-truth region, i.e. p_n(t) = |G ∩ Y_n(t)| / |G ∪ Y_n(t)|;
where G represents the ground-truth position region of the target, and Y_n(t) represents the position region of the target obtained by terminal device n running the tracking algorithm at decision slot t.
In step S2.1, the video content change rate v_n(t) of terminal device n at decision slot t is calculated as:
where the two position terms denote the pixel positions of the k-th feature in the i-th frame and in the j-th frame of the local queue of terminal device n at decision slot t, respectively, m represents the number of features of the video frames in the local queue of terminal device n at decision slot t, and j − i ≥ 1.
In step S2.3, if the head-of-queue frame directly outputs its tracking result, its processing time is the sum of its tracking time and the time it has waited in the local queue before processing at decision slot t;
if the head-of-queue frame is immediately offloaded and the channel is available, its processing time comprises the time for transmitting the head-of-queue frame of terminal device n through the channel at decision slot t and T_e, the time for the edge server to perform target detection;
if the head-of-queue frame decides to wait, or decides to offload immediately but the wireless network between the terminal device and the edge server is unavailable at that moment, the frame must wait in the local queue until a channel is available and is then offloaded to the edge server; its processing time then also depends on the estimated number of slots from decision slot t to the decision slot at which transmission of the frame starts, together with the transmission time of the head-of-queue frame of terminal device n through the channel at that slot.
The step S3 includes the steps of:
s3.1, setting the total number of training rounds M, initializing the experience replay memory D and the parameter theta of the evaluation network, and assigning the parameter theta of the evaluation network to the parameter theta' of the target network;
s3.2, setting the training episode counter episode = 1;
s3.3, initializing the state space S_n(t), i.e. S_n(t) = S_n(0), where S_n(t) represents the state space of terminal device n at decision slot t;
s3.4, setting a decision time slot number T;
s3.5, t=t+1 is performed;
s3.6, selecting action A_n(t) according to probability ε, with the expression:
where A represents the action that maximizes Q(S_n(t), A_n(t); θ), and A_n(t) represents the action space of terminal device n at decision slot t;
s3.7, executing the action A_n(t) selected in step S3.6 to obtain the reward R_n(t) and the next state space S_n(t+1);
S3.8, storing the experience (S_n(t), A_n(t), R_n(t), S_n(t+1)) in the experience replay memory D;
s3.9, randomly sampling G experiences (S_n(t′), A_n(t′), R_n(t′), S_n(t′+1)) from the experience replay memory D;
S3.10, predicting the benefit from the experiences sampled in step S3.9, with the expression:
where R_n(t′) represents the reward function of terminal device n at decision slot t′, γ represents the discount factor, A′ represents the action that maximizes the Q value at decision slot t′ + 1, the second term represents the maximum benefit at decision slot t′ + 1, and S_n(t′+1) represents the state space of terminal device n at decision slot t′ + 1;
s3.11, updating the parameter theta of the evaluation network based on a gradient descent method;
s3.12, assigning the parameter theta of the evaluation network to the parameter theta' of the target network every C steps;
s3.13, judging whether t < T; if yes, returning to step S3.5, otherwise executing step S3.14;
and S3.14, performing episode = episode + 1 and judging whether episode < M; if yes, returning to step S3.3, otherwise outputting the joint decision model containing the target network.
In step S4, the expression of the maximum benefit function is:
s.t. C_1(t) + C_2(t) + ... + C_n(t) + ... + C_N(t) ≤ 1;
a_n(t) ∈ {0, 1, 2};
I_n(t) ∈ {1, 2, 3};
where a_n(t) represents the video frame offloading decision indicating whether the head-of-queue frame of the local queue of terminal device n, output by the edge server at decision slot t, continues to wait in the local queue, is immediately offloaded to the edge server, or directly outputs its tracking result: when a_n(t) = 0, the head-of-queue frame of terminal device n waits for the next decision slot; when a_n(t) = 1, the head-of-queue frame of terminal device n is immediately offloaded to the edge server; when a_n(t) = 2, terminal device n directly outputs the tracking result. C_n(t) represents the channel decision for terminal device n output by the edge server at decision slot t: when C_n(t) = 0, terminal device n is not allocated a channel in decision slot t; when C_n(t) = 1, terminal device n is allocated a channel in decision slot t. I_n(t) represents the number of frames between the head-of-queue frame of the next decision slot and the head-of-queue frame of the current decision slot for terminal device n output by the edge server at decision slot t, i.e. the frame interval decision. R_n(t) represents the reward function, i.e. the benefit function, of terminal device n at decision slot t, and N represents the total number of terminal devices.
The invention has the beneficial effects that:
the invention is based on the real-time target detection framework of DBT, mainly orients to the continuous video frame scene with delay constraint, has set up the target detection system based on terminal equipment and edge server collaborative detection of network condition and video content of dynamic change, through this system, can further analyze the characteristic based on real-time target detection under a plurality of terminal equipment scenes under DBT frame; the influence of the video content change rate is introduced, the terminal equipment selects different tracking frequencies based on the video content change rate instead of conventionally tracking each frame, and the optimization problem is formed by designing the benefit function, so that the video frame detection accuracy is maximized under the delay limit.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a schematic diagram of tracking accuracy at different frame intervals.
Fig. 3 is a diagram showing a change in average tracking accuracy when a frame interval is changed.
Fig. 4 is a graph comparing the effects of the present application with other algorithms.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
A dynamic task scheduling method for video detection and tracking, as shown in figure 1, comprises the following steps:
s1, constructing a real-time target detection system comprising a plurality of terminal devices and an edge server, wherein a target tracker is arranged in the terminal devices, a target detector is arranged in the edge server, and each terminal device is in communication connection with the edge server through a wireless network;
the set of all terminal devices is denoted by N, n= {1,..once, n..once, N }, and the set of video frames captured by the N-th terminal device is denoted by F n Representing that all video frame sets captured by the terminal equipment are represented by F, wherein F= { F 1 ,...,F n ,...,F N }. The terminal device operates a lightweight object tracker, and the edge server operates a large object detector to realize real-time detection of objects in captured video frames. However, heelThe performance of the tracking will decrease with time and the change of the video content, so before the tracking performance decreases to be too low, i.e. the tracking threshold value, a new video frame should be sent to the edge server for detection to obtain a new detection result, so as to improve the accuracy of target tracking of the terminal device.
Each terminal device maintains a local queue for buffering video frames waiting to be processed; the video frames in the local queue are processed according to the first-come-first-served principle. The system time is divided into consecutive slots, and provided that a slot is small enough, at most one frame arrives at each terminal device in each slot. In each decision slot t, only the video frame at the head of the queue of each terminal device, also called the head-of-queue frame, is considered. Since the target tracker needs to be initialized with a bounding box detected by the edge server, before starting tracking the terminal device sends the first frame to the edge server for detection and obtains the detection result, i.e. the bounding box; it then runs the target tracker on subsequent head-of-queue frames based on this bounding box. After tracking, the terminal device sends the frame information and tracking accuracy to the edge server, and the edge server makes the channel allocation, video frame offloading and tracking frequency (i.e. tracking frame interval) decisions based on the overall situation and sends them to the terminal devices; finally, each terminal device acts according to the decisions of the edge server. Since the result output by the edge server is much smaller than the frame itself, the present application ignores the time for returning the result and considers only the frame transmission process of the uplink of the whole system. If the offloading decision is local tracking, the tracking result is output directly. If the offloading decision is immediate offloading and the channel is available, the frame is offloaded to the edge server for detection, and the edge server returns the detection result to the corresponding terminal device after detection. If the offloading decision is to wait, or immediate offloading is chosen but no channel is available, the frame waits in the local queue for the next decision slot.
Due to limited wireless network resources, wireless bandwidth may become a bottleneck when terminal devices offload video frames to the edge server. The present application addresses this challenge in two ways: on the one hand, video frames with reliable tracking performance directly output their tracking results to save bandwidth; on the other hand, for video frames with lower tracking performance, because of bandwidth resource limitations and competition among terminal devices, no wireless channel may be available in decision slot t, in which case the video frame waits in the local queue of the terminal device until a channel becomes available.
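The per-slot behaviour described above can be summarised by the following hedged Python sketch of the terminal-side logic; function names such as run_tracker and offload_to_edge are placeholders for the device's tracker and uplink, not interfaces defined by the patent.

```python
def handle_decision_slot(queue, action, channel_available, run_tracker, offload_to_edge):
    """Apply the edge server's per-slot decision to the head-of-queue frame.

    queue:  buffered frames, head at index 0 (first come, first served)
    action: (a, C, I) for this device: offload decision, channel decision, frame interval
    """
    if not queue:
        return None
    a, allocated, interval = action
    leaves = a == 2 or (a == 1 and allocated and channel_available)
    if not leaves:                    # a == 0, or offload chosen but no channel: keep waiting
        return None
    head = queue.pop(0)
    if a == 2:                        # reliable tracking: output the local tracking result
        result = run_tracker(head)
    else:                             # offload to the edge server for detection
        result = offload_to_edge(head)
    del queue[:max(interval - 1, 0)]  # skip I_n(t) - 1 frames before the next head frame (illustrative)
    return result

# Example with stub tracker / uplink functions:
frames = ["f0", "f1", "f2", "f3"]
out = handle_decision_slot(frames, (2, 0, 2), channel_available=False,
                           run_tracker=lambda f: f + ":tracked",
                           offload_to_edge=lambda f: f + ":detected")
print(out, frames)
```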
S2, constructing the joint optimization problem of video frame offloading decisions, channel decisions and frame interval decisions in the real-time target detection system as an MDP (Markov Decision Problem), comprising the following steps:
s2.1, constructing a state space, wherein the expression of the state space is as follows:
S_n(t) = (M_n(t), h_n(t), p_n(t), v_n(t));
where M_n(t) represents the head-of-queue frame information of the local queue of terminal device n at decision slot t, h_n(t) represents the channel gain between terminal device n and the edge server, v_n(t) represents the video content change rate of terminal device n at decision slot t, S_n(t) represents the state space of terminal device n at decision slot t, and p_n(t) represents the tracking accuracy of the head-of-queue frame of terminal device n at decision slot t.
The head-of-queue frame information M_n(t) of the local queue of terminal device n at decision slot t is expressed as:
where s_n(t) represents the frame size of the head-of-queue frame of the local queue of terminal device n at decision slot t, the second component represents the arrival time of the head-of-queue frame of the local queue of terminal device n, and the third component represents the time the head-of-queue frame of the local queue of terminal device n has waited before processing at decision slot t.
The channel gain h_n(t) between terminal device n and the edge server follows a Rayleigh fading channel model and is the product of a random fading factor and the average channel gain, i.e. h_n(t) = γ_n(t) · h̄_n;
where γ_n(t) represents a random channel fading factor following the Rayleigh distribution, and h̄_n represents the average channel gain of terminal device n.
The average channel gain h̄_n of terminal device n follows a free-space path loss model and is calculated as:
where A_d represents the antenna gain of the terminal device, δ represents the path loss coefficient, and d_n represents the distance from terminal device n to the edge server.
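A hedged numerical sketch of this channel model follows. The path-loss expression h̄_n = A_d · d_n^(−δ) is an assumption consistent with the stated parameters (antenna gain A_d, path loss coefficient δ, distance d_n), and drawing the Rayleigh power fading factor from a unit-mean exponential distribution is a common modelling choice, not a detail fixed by the patent.

```python
import numpy as np

def average_channel_gain(antenna_gain: float, distance: float, path_loss_exp: float) -> float:
    """Assumed free-space style path loss: h_bar = A_d * d ** (-delta)."""
    return antenna_gain * distance ** (-path_loss_exp)

def channel_gain(rng: np.random.Generator, h_bar: float) -> float:
    """Rayleigh fading: instantaneous gain = random fading factor * average gain."""
    gamma = rng.exponential(1.0)   # unit-mean power fading factor (assumption)
    return gamma * h_bar

rng = np.random.default_rng(0)
h_bar = average_channel_gain(antenna_gain=4.11, distance=3.0, path_loss_exp=2.8)
print(channel_gain(rng, h_bar))
```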
Before each decision slot ends, the local queue of the terminal device is updated. The number of video frames buffered in the local queue of terminal device n at decision slot t is denoted by X_n(t); the evolution of X_n(t+1) depends on the arrival of new video frames and the departure of old video frames, and its update expression is:
where the arrival indicator is a random binary variable indicating whether a new video frame arrives at terminal device n in decision slot t, O_n(t) ∈ {0, −1} is also a random binary variable indicating whether the head-of-queue frame at decision slot t will leave the local queue of terminal device n, and X_n(t+1) represents the number of video frames buffered in the local queue of terminal device n at slot t + 1. O_n(t) = 0 means that the head-of-queue frame of the local queue of terminal device n continues to wait until the next decision slot, and O_n(t) = −1 means that at decision slot t the head-of-queue frame of the local queue of terminal device n will leave the local queue in the next decision slot, for example because the tracking result of the video frame is output directly or the frame is offloaded to the edge server for detection.
Experiments show that it takes about 10 ms for the terminal device to track a single target in a frame, and the time to track a whole frame increases proportionally with the number of targets in the frame. Therefore, to provide real-time video analysis, some frames must be skipped during tracking to catch up with the frame capture speed of the terminal device, e.g. a camera; I_n(t) is therefore used to represent the frame interval determined at decision slot t. Accordingly, the number of video frames X_n(t+1) buffered in the local queue of terminal device n at decision slot t + 1 can be updated as:
where the value of O_n(t) becomes {0, −I_n(t)}, and 0 means that the head-of-queue frame continues to wait in the local queue.
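The queue update above can be simulated with the short sketch below, where the Bernoulli arrival probability and the departure rule follow the description; variable names are illustrative.

```python
import random

def update_queue_length(x_t: int, arrival_prob: float, head_leaves: bool, frame_interval: int) -> int:
    """One-slot update of X_n(t):
    X_n(t+1) = X_n(t) + arrival_indicator + O_n(t),
    with O_n(t) in {0, -I_n(t)}: 0 if the head frame keeps waiting,
    -I_n(t) if it leaves (output locally or offloaded) together with the skipped frames."""
    arrival = 1 if random.random() < arrival_prob else 0
    departure = -frame_interval if head_leaves else 0
    return max(x_t + arrival + departure, 0)

x = 5
for slot in range(3):
    x = update_queue_length(x, arrival_prob=0.6, head_leaves=(slot % 2 == 0), frame_interval=2)
    print(slot, x)
```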
As shown in FIG. 2, the experiment measures 50 consecutive video frames, with the minimum value of I_n(t) being 1 and the maximum value being 10. It can be seen from the figure that, whatever value I_n(t) takes, the tracking accuracy decreases as the number of tracked frames increases, and the larger the value of I_n(t), the faster the tracking accuracy decreases; therefore I_n(t) cannot be increased without limit just to provide real-time processing. In this embodiment I_n(t) ∈ {1, 2, 3}: as shown in FIG. 3, when 50 consecutive frames are tracked, the values of I_n(t) that keep the average tracking accuracy at 0.5 or above are 1, 2 and 3.
For the same I_n(t), if the video content changes faster, the displacement between two tracked video frames is larger and the tracking accuracy is less reliable. Therefore, to ensure more reliable tracking accuracy of the terminal device, the determination of I_n(t) should take the influence of the video content change rate into account, and the metric used to evaluate the video content change rate must be lightweight so that its calculation does not affect the tracking operation of the real-time target detection system. The present application measures the video content change rate using the intermediate results of tracking, so that almost no additional computation is added, and uses the average moving speed of all features extracted from two adjacent tracked frames as the video content change rate. The video content change rate v_n(t) of terminal device n at decision slot t is calculated as:
where the two position terms denote the pixel positions of the k-th feature in the i-th frame and in the j-th frame of the local queue of terminal device n at decision slot t, respectively, m represents the number of features of the video frames in the local queue of terminal device n at decision slot t, and j − i ≥ 1, because some video frames are skipped during target tracking. The video content change rate is obtained by calculating the moving speed between the features of two adjacent tracked frames: a high moving speed means that the video content changes rapidly, i.e. existing objects move out quickly and new objects may appear frequently.
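A minimal sketch of this metric, assuming it is the mean per-feature pixel displacement between frames i and j divided by the frame gap j − i:

```python
import numpy as np

def content_change_rate(feat_i: np.ndarray, feat_j: np.ndarray, frame_gap: int) -> float:
    """Average moving speed of m tracked features between two tracked frames.

    feat_i, feat_j: arrays of shape (m, 2) with the pixel positions of the same
    m features in frame i and frame j of the local queue; frame_gap = j - i >= 1.
    """
    displacement = np.linalg.norm(feat_j - feat_i, axis=1)   # per-feature pixel displacement
    return float(displacement.mean() / frame_gap)

# Example: 4 features that moved roughly 5 pixels over a gap of 2 frames.
fi = np.array([[10, 10], [50, 40], [80, 20], [30, 70]], dtype=float)
fj = fi + np.array([[5, 0], [4, 3], [5, 1], [3, 4]], dtype=float)
print(content_change_rate(fi, fj, frame_gap=2))
```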
The present application tracks targets in frames based on the Lucas-Kanade method. The tracking accuracy decreases with time and with changes of the video content; meanwhile, a head-of-queue frame with reliable tracking performance is more likely to output its tracking result directly at the terminal device, thereby saving bandwidth. The tracking performance is measured by the intersection over union of the tracked result and the ground-truth result, with the corresponding expression p_n(t) = |G ∩ Y_n(t)| / |G ∪ Y_n(t)|;
where Y_n(t) represents the position region of the target obtained by terminal device n running the tracking algorithm at decision slot t, and G represents the ground-truth position region of the target.
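The intersection-over-union measure above can be computed for axis-aligned bounding boxes as in the standard sketch below; the (x1, y1, x2, y2) box format is an assumption.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: tracked box vs. ground-truth box.
print(iou((10, 10, 50, 50), (20, 20, 60, 60)))   # about 0.39
```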
S2.2, constructing an action space, wherein the expression of the action space is as follows:
A_n(t) = (a_n(t), C_n(t), I_n(t));
where A_n(t) represents the action space of terminal device n at decision slot t; a_n(t) represents the video frame offloading decision indicating whether the head-of-queue frame of the local queue of terminal device n, output by the edge server at decision slot t, continues to wait in the local queue, is immediately offloaded to the edge server, or directly outputs its tracking result: when a_n(t) = 0, the head-of-queue frame of terminal device n waits for the next decision slot, when a_n(t) = 1, the head-of-queue frame of terminal device n is immediately offloaded to the edge server, and when a_n(t) = 2, terminal device n directly outputs the tracking result; C_n(t) represents the channel decision for terminal device n output by the edge server at decision slot t: when C_n(t) = 0, terminal device n is not allocated a channel in decision slot t, and when C_n(t) = 1, terminal device n is allocated a channel in decision slot t; I_n(t) represents the number of frames between the head-of-queue frame of the next decision slot and the head-of-queue frame of the current decision slot for terminal device n output by the edge server at decision slot t, i.e. the frame interval decision.
S2.3, constructing a reward function, wherein the expression of the reward function is as follows:
where R_n(t) represents the reward function, i.e. the benefit function, of terminal device n at decision slot t; acc represents the detection accuracy or the tracking accuracy p_n(t) of the head-of-queue frame of terminal device n at decision slot t, with the detection accuracy set to 1.0; β represents a weight coefficient with β > 0, and adjusting β balances the time weight between frame processing and frame transmission; α is a performance improvement factor with α > 0, through which the importance of inference performance in the reward function is adjusted; T_max represents the maximum value of the ideal range of the video frame detection delay, i.e. the maximum delay that can be tolerated for detecting one frame while still meeting the required detection delay; and the remaining term is the processing time of the head-of-queue frame of terminal device n at decision slot t.
At decision slot t, if the head-of-queue frame directly outputs its tracking result, its processing time consists of the tracking time and the waiting time in the queue, and is calculated as:
where the first term denotes the tracking time of the head-of-queue frame of terminal device n at decision slot t.
At decision slot t, if the head-of-queue frame is immediately offloaded and the channel is available, its processing time is calculated as:
where T_e represents the time for the edge server to perform target detection, and the transmission term represents the time for transmitting the head-of-queue frame of terminal device n through the channel at decision slot t.
The time for transmitting the head-of-queue frame of terminal device n through the channel at decision slot t is the frame size divided by the transmission rate, i.e. s_n(t) / r_n(t);
where s_n(t) represents the frame size, i.e. the data amount, of the head-of-queue frame of terminal device n at decision slot t, and r_n(t) represents the transmission rate of the channel between the edge server and terminal device n at decision slot t.
Considering the path loss and Rayleigh fading of the channel, based on the Shannon theorem, when the edge server allocates a channel to terminal device n at decision slot t, the transmission rate r_n(t) between them is calculated as r_n(t) = w · log2(1 + P_n · h_n(t) / N_0);
where w represents the channel bandwidth, h_n(t) represents the channel gain of terminal device n, which varies with the decision slot t, P_n represents the transmission power of terminal device n, and N_0 represents the background noise power.
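A hedged sketch of the uplink rate and frame transmission time implied by the Shannon-capacity expression above; the numerical inputs (channel gain, frame size) are purely illustrative.

```python
import math

def transmission_rate(bandwidth_hz: float, tx_power: float, channel_gain: float, noise_power: float) -> float:
    """r_n(t) = w * log2(1 + P_n * h_n(t) / N_0), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + tx_power * channel_gain / noise_power)

def transmission_time(frame_size_bits: float, rate_bps: float) -> float:
    """Time to push the head-of-queue frame through the allocated channel: s_n(t) / r_n(t)."""
    return frame_size_bits / rate_bps

# Example using simulation-style parameters mentioned later in the text (2 MHz uplink, 0.03 W, 1e-9 noise).
rate = transmission_rate(bandwidth_hz=2e6, tx_power=0.03, channel_gain=1e-5, noise_power=1e-9)
print(rate, transmission_time(frame_size_bits=8 * 100_000, rate_bps=rate))  # ~100 kB frame
```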
To use bandwidth resources efficiently, if the wireless network is unavailable or degraded, the head-of-queue frame waits in the local queue for the next decision slot t; such frames are usually eventually transmitted to the edge server for detection rather than directly outputting their tracking results, otherwise the frame should not have been decided to wait. Thus, at decision slot t, if the head-of-queue frame decides to wait, or decides to offload immediately but the wireless network is unavailable at that moment, the frame needs to wait in the local queue until a channel is available and is then offloaded to the edge server; its processing time is calculated as:
where the first quantity denotes the decision slot at which transmission of the head-of-queue frame starts, the second denotes the time for transmitting the head-of-queue frame of terminal device n through the channel at that decision slot, and the third denotes the estimated number of slots from decision slot t to that decision slot, which is a positive integer.
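The three delay cases can be combined into one hedged helper; the slot-estimation logic for the waiting case is simplified to an externally supplied estimate, since the exact estimator is left to the scheduler in the text above.

```python
def processing_time(decision: int, wait_time: float, track_time: float,
                    transmit_time: float, detect_time: float,
                    slots_until_channel: int = 0, slot_len: float = 0.0) -> float:
    """Processing delay of the head-of-queue frame under the three cases described above.

    decision: 2 = output tracking result locally, 1 = offload now (channel available),
              0 = wait (or offload chosen but no channel), then offload once a channel is free.
    """
    if decision == 2:                      # local tracking
        return wait_time + track_time
    if decision == 1:                      # immediate offloading
        return wait_time + transmit_time + detect_time
    # waiting case: add the estimated slots until transmission can start
    return wait_time + slots_until_channel * slot_len + transmit_time + detect_time

print(processing_time(2, wait_time=0.02, track_time=0.01, transmit_time=0.0, detect_time=0.0))
print(processing_time(1, wait_time=0.02, track_time=0.0, transmit_time=0.05, detect_time=0.03))
```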
S3, in each decision slot, each terminal device sends its tracking accuracy p_n(t), head-of-queue frame information M_n(t) and video content change rate v_n(t) to the edge server, and the edge server constructs a joint decision model using the Deep Reinforcement Learning (DRL) algorithm DDQN (Double Deep Q-Network), comprising the following steps:
s3.1, setting the total number of training rounds M, initializing the experience replay memory D and the parameter theta of the evaluation network, and assigning the parameter theta of the evaluation network to the parameter theta' of the target network;
s3.2, setting the training episode counter episode = 1;
s3.3, initializing the state space S_n(t), i.e. S_n(t) = S_n(0);
S3.4, setting a decision time slot number T;
s3.5, t=t+1 is performed;
s3.6, selecting action A_n(t) according to probability ε, with the expression:
where θ represents the parameters of the evaluation network, A represents the action that maximizes Q(S_n(t), A_n(t); θ), and "random" in this expression refers to randomly selecting an action from the action space.
S3.7, executing the action A_n(t) selected in step S3.6 to obtain the reward R_n(t) and the next state space S_n(t+1);
S3.8, storing the experience (S_n(t), A_n(t), R_n(t), S_n(t+1)) in the experience replay memory D;
s3.9, randomly sampling G experiences (S_n(t′), A_n(t′), R_n(t′), S_n(t′+1)) from the experience replay memory D, where S_n(t′) represents the state space of terminal device n at decision slot t′ and A_n(t′) represents the action space of terminal device n at decision slot t′;
s3.10, predicting the benefit from the experiences sampled in step S3.9, with the expression:
where R_n(t′) represents the reward function of terminal device n at decision slot t′, γ represents the discount factor used to balance the current benefit and the long-term reward, A′ represents the action that maximizes the Q value at decision slot t′ + 1, the second term represents the maximum benefit at decision slot t′ + 1, and S_n(t′+1) represents the state space of terminal device n at decision slot t′ + 1.
S3.11, updating the parameter theta of the evaluation network based on a gradient descent method;
s3.12, assigning the parameter theta of the evaluation network to the parameter theta' of the target network every C steps, where C and T are in an integer-multiple relationship and C < T;
s3.13, judging whether t < T; if yes, returning to step S3.5, otherwise executing step S3.14;
and S3.14, performing episode = episode + 1 and judging whether episode < M; if yes, returning to step S3.3, otherwise outputting the joint decision model containing the target network.
The DDQN algorithm comprises an evaluation network with parameters θ and a target network with parameters θ′. The evaluation network updates its parameters by reducing the loss function, the target network is used to compute the target Q value, and the target network parameters are updated from the evaluation network every fixed number of steps. Meanwhile, DDQN maintains an experience replay memory D that stores past experiences; when D is full, the stored experiences are updated.
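The training procedure of steps S3.1–S3.14, including the Double-DQN idea that the evaluation network θ selects the next action while the target network θ′ scores it, can be sketched in PyTorch as follows. Network sizes, the toy random environment, and the handling of ε (used here as the probability of acting greedily) are illustrative assumptions rather than details fixed by the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 6, 18          # illustrative sizes (e.g. 3 offload x 2 channel x 3 interval)

def make_q_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

eval_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(eval_net.state_dict())            # S3.1: theta' <- theta
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-4)
replay = deque(maxlen=1000)                                   # experience replay memory D
GAMMA, EPS, BATCH, SYNC_EVERY = 0.9, 0.9, 32, 50

def select_action(state):
    """Epsilon-greedy selection (S3.6); EPS is taken as the probability of acting greedily."""
    if random.random() < EPS:
        with torch.no_grad():
            return int(eval_net(state).argmax())
    return random.randrange(N_ACTIONS)

def learn(step):
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)                      # S3.9: sample G experiences
    s, a, r, s2 = map(torch.stack, zip(*[(b[0], torch.tensor(b[1]), torch.tensor(b[2]), b[3]) for b in batch]))
    # Double-DQN target (S3.10): the evaluation net picks the action, the target net scores it.
    with torch.no_grad():
        best_a = eval_net(s2).argmax(dim=1, keepdim=True)
        y = r.float() + GAMMA * target_net(s2).gather(1, best_a).squeeze(1)
    q = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)                       # S3.11: gradient descent on the eval net
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    if step % SYNC_EVERY == 0:                                # S3.12: periodic target-network sync
        target_net.load_state_dict(eval_net.state_dict())

# Minimal random "environment" so the sketch runs end to end.
for step in range(200):
    s = torch.randn(STATE_DIM)
    a = select_action(s)
    r, s2 = random.random(), torch.randn(STATE_DIM)           # S3.7: reward and next state
    replay.append((s, a, r, s2))                              # S3.8: store the experience
    learn(step)
print("trained on", len(replay), "transitions")
```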
S4, with the aim of maximizing the benefit function, solving the joint optimization problem using the joint decision model constructed in step S3, where each terminal device acts according to the video frame offloading decision, channel decision and frame interval decision output by the edge server;
the expression of the maximum benefit function is:
s.t. C_1(t) + C_2(t) + ... + C_n(t) + ... + C_N(t) ≤ 1;
a_n(t) ∈ {0, 1, 2};
I_n(t) ∈ {1, 2, 3}.
the Jetson Nano is used as terminal equipment, a Lucas-Kanade target tracker is operated, a Jetson AGX Xavier edge server is used, a YOLOX is operated as a target detector, the time of tracking one frame by the terminal equipment and the time of detecting one frame by the edge server are actually measured, and then a simulation environment is established based on the time. The system is divided into individual time slots, provided that the time slots are small enough, so that at most only one new frame arrives at the local queue at each time slot, and the arrival rate of the frames conforms to the Bernoulli process with parameter P. The network simulation adopts a wireless channel Rayleigh fading model, wherein the gain of each terminal equipment antenna is set to be 4.11, the distance between the terminal equipment and an edge server accords with the uniform distribution of U (2.5,5.2), the power transmitted by the terminal equipment is 0.03, the background noise is 10e-10, the path loss coefficient is 2.8, and the bandwidth of an uplink is 2MHZ. The python was used to implement the pytorch1.7 based DDQN algorithm and set D to a size of 1000, total training round to 400, batch size to 32, learning rate to 0.0001, γ to 0.9, and epsilon to 0.9.
To show the superiority of the method in continuous video frame scenarios, it is compared with the Random algorithm and the Greedy algorithm, with the average reward of the system as the evaluation index. The random algorithm randomly selects a decision without considering any environmental information, and its performance is always the worst. The greedy algorithm makes the optimal decision based on the current state, but does not consider the interactions between adjacent tasks. As shown in FIG. 4, P is the arrival rate of video frames in each slot: the larger P is, the higher the video frame rate and the denser the tasks; the smaller P is, the lower the video frame rate and the sparser the tasks. It can be seen that the algorithm of the present application outperforms the random algorithm and the greedy algorithm regardless of how the value of P fluctuates.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (4)

1. A dynamic task scheduling method for video detection and tracking is characterized by comprising the following steps:
s1, constructing a real-time target detection system comprising a plurality of terminal devices and an edge server, wherein a target tracker is arranged in the terminal devices, and a target detector is arranged in the edge server;
s2, constructing a joint optimization problem of video frame offloading decisions, channel decisions and frame interval decisions in the real-time target detection system as a Markov decision problem;
the video frame offloading decision specifies, in each decision slot, whether the head-of-queue frame of a terminal device continues to wait in the local queue of the terminal device, is immediately offloaded to the edge server for detection, or directly outputs its tracking result; the channel decision specifies whether a terminal device is allocated a channel by the edge server; and the frame interval decision, output by the edge server, specifies the number of frames between the head-of-queue frame of the next decision slot and the head-of-queue frame of the current decision slot;
the step S2 includes the steps of:
s2.1, constructing a state space, wherein the expression of the state space is as follows:
S_n(t) = (M_n(t), h_n(t), p_n(t), v_n(t));
where M_n(t) represents the head-of-queue frame information of the local queue of terminal device n at decision slot t, h_n(t) represents the channel gain between terminal device n and the edge server, v_n(t) represents the video content change rate of terminal device n at decision slot t, S_n(t) represents the state space of terminal device n at decision slot t, and p_n(t) represents the tracking accuracy of the head-of-queue frame of terminal device n at decision slot t;
the head-of-queue frame information M_n(t) of the local queue of terminal device n at decision slot t is expressed as:
where s_n(t) represents the frame size of the head-of-queue frame of the local queue of terminal device n at decision slot t, the second component represents the arrival time of the head-of-queue frame of the local queue of terminal device n, and the third component represents the time the head-of-queue frame of the local queue of terminal device n has waited before processing at decision slot t;
the video content change rate v_n(t) of terminal device n at decision slot t is calculated as:
where the two position terms denote the pixel positions of the k-th feature in the i-th frame and in the j-th frame of the local queue of terminal device n at decision slot t, respectively, m represents the number of features of the video frames in the local queue of terminal device n at decision slot t, and j − i ≥ 1;
the tracking accuracy p_n(t) is calculated as:
where G represents the ground-truth position region of the target, and Y_n(t) represents the position region of the target obtained by terminal device n running the tracking algorithm at decision slot t;
s2.2, constructing an action space, wherein the expression of the action space is as follows:
A_n(t) = (a_n(t), C_n(t), I_n(t));
where A_n(t) represents the action space of terminal device n at decision slot t, a_n(t) represents the video frame offloading decision for the head-of-queue frame of the local queue of terminal device n output by the edge server at decision slot t, i.e. whether the frame continues to wait in the local queue, is immediately offloaded to the edge server, or directly outputs its tracking result, C_n(t) represents the channel decision for terminal device n output by the edge server at decision slot t, and I_n(t) represents the number of frames between the head-of-queue frame of the next decision slot and the head-of-queue frame of the current decision slot for terminal device n output by the edge server at decision slot t, i.e. the frame interval decision;
s2.3, constructing a reward function, wherein the expression of the reward function is as follows:
where R_n(t) represents the reward function, i.e. the benefit function, of terminal device n at decision slot t, acc represents the detection accuracy or tracking accuracy of the head-of-queue frame of terminal device n at decision slot t, β represents a weight coefficient with β > 0, the processing time of the head-of-queue frame of terminal device n at decision slot t also enters the reward, α is the performance improvement factor with α > 0, and T_max represents the maximum value of the ideal range of the video frame detection delay;
s3, in each decision slot, each terminal device sends its tracking accuracy, head-of-queue frame information and video content change rate to the edge server, and the edge server builds a joint decision model using the DDQN deep reinforcement learning algorithm;
s4, with the aim of maximizing the benefit function, solving the joint optimization problem using the joint decision model constructed in step S3, where the terminal device acts according to the video frame offloading decision, channel decision and frame interval decision output by the edge server;
the expression of the maximum benefit function is:
s.t. C_1(t) + C_2(t) + … + C_n(t) + … + C_N(t) ≤ 1;
a_n(t) ∈ {0, 1, 2};
I_n(t) ∈ {1, 2, 3};
where a_n(t) represents the video frame offloading decision indicating whether the head-of-queue frame of the local queue of terminal device n, output by the edge server at decision slot t, continues to wait in the local queue, is immediately offloaded to the edge server, or directly outputs its tracking result: when a_n(t) = 0, the head-of-queue frame of terminal device n waits for the next decision slot; when a_n(t) = 1, the head-of-queue frame of terminal device n is immediately offloaded to the edge server; when a_n(t) = 2, terminal device n directly outputs the tracking result. C_n(t) represents the channel decision for terminal device n output by the edge server at decision slot t: when C_n(t) = 0, terminal device n is not allocated a channel in decision slot t; when C_n(t) = 1, terminal device n is allocated a channel in decision slot t. I_n(t) represents the number of frames between the head-of-queue frame of the next decision slot and the head-of-queue frame of the current decision slot for terminal device n output by the edge server at decision slot t, i.e. the frame interval decision. R_n(t) represents the reward function, i.e. the benefit function, of terminal device n at decision slot t, and N represents the total number of terminal devices.
2. The method for dynamic task scheduling for video detection and tracking according to claim 1, wherein in step S2.1, the channel gain h_n(t) between terminal device n and the edge server is calculated as:
where γ_n(t) represents a random channel fading factor following the Rayleigh distribution, and h̄_n represents the average channel gain of terminal device n;
the average channel gain h̄_n of terminal device n is calculated as:
where A_d represents the antenna gain of the terminal device, δ represents the path loss coefficient, and d_n represents the distance from terminal device n to the edge server.
3. The method for dynamic task scheduling for video detection and tracking according to claim 1, wherein in step S2.3, if the head-of-queue frame directly outputs its tracking result, the processing time of the head-of-queue frame is calculated as:
where the first term denotes the tracking time of the head-of-queue frame of terminal device n at decision slot t, and the second term denotes the time the head-of-queue frame of the local queue of terminal device n has waited before processing at decision slot t;
if the head-of-queue frame is immediately offloaded and the channel is available, the processing time of the head-of-queue frame is calculated as:
where T_e represents the time for the edge server to perform target detection, and the transmission term represents the time for transmitting the head-of-queue frame of terminal device n through the channel at decision slot t;
if the head-of-queue frame decides to wait, or decides to offload immediately but the wireless network between the terminal device and the edge server is unavailable at that moment, the head-of-queue frame needs to wait in the local queue until a channel is available and is then offloaded to the edge server, and the processing time of the head-of-queue frame is calculated as:
where the first quantity denotes the decision slot at which transmission of the head-of-queue frame starts, the second denotes the time for transmitting the head-of-queue frame of terminal device n through the channel at that decision slot, and the third denotes the estimated number of slots from decision slot t to that decision slot.
4. The method for dynamic task scheduling for video detection and tracking according to claim 1, wherein the step S3 comprises the steps of:
s3.1, setting the total number of training rounds M, initializing the experience replay memory D and the parameter theta of the evaluation network, and assigning the parameter theta of the evaluation network to the parameter theta' of the target network;
s3.2, setting the training episode counter episode = 1;
s3.3, initializing the state space S_n(t), i.e. S_n(t) = S_n(0), where S_n(t) represents the state space of terminal device n at decision slot t;
s3.4, setting a decision time slot number T;
s3.5, t=t+1 is performed;
s3.6, selecting action A_n(t) according to probability ε, with the expression:
where A represents the action that maximizes Q(S_n(t), A_n(t); θ), and A_n(t) represents the action space of terminal device n at decision slot t;
s3.7, executing the action A_n(t) selected in step S3.6 to obtain the reward R_n(t) and the next state space S_n(t+1);
S3.8, storing the experience (S_n(t), A_n(t), R_n(t), S_n(t+1)) in the experience replay memory D;
s3.9, randomly sampling G experiences (S_n(t′), A_n(t′), R_n(t′), S_n(t′+1)) from the experience replay memory D;
S3.10, predicting the benefit from the experiences sampled in step S3.9, with the expression:
where R_n(t′) represents the reward function of terminal device n at decision slot t′, γ represents the discount factor, A′ represents the action that maximizes the Q value at decision slot t′ + 1, the second term represents the maximum benefit at decision slot t′ + 1, and S_n(t′+1) represents the state space of terminal device n at decision slot t′ + 1;
s3.11, updating the parameter theta of the evaluation network based on a gradient descent method;
s3.12, assigning the parameter theta of the evaluation network to the parameter theta' of the target network every C steps;
s3.13, judging whether t < T; if yes, returning to step S3.5, otherwise executing step S3.14;
and S3.14, performing episode = episode + 1 and judging whether episode < M; if yes, returning to step S3.3, otherwise outputting the joint decision model containing the target network.
CN202210551198.7A 2022-05-20 2022-05-20 Dynamic task scheduling method for video detection and tracking Active CN115002409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551198.7A CN115002409B (en) 2022-05-20 2022-05-20 Dynamic task scheduling method for video detection and tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210551198.7A CN115002409B (en) 2022-05-20 2022-05-20 Dynamic task scheduling method for video detection and tracking

Publications (2)

Publication Number Publication Date
CN115002409A CN115002409A (en) 2022-09-02
CN115002409B true CN115002409B (en) 2023-07-28

Family

ID=83028073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551198.7A Active CN115002409B (en) 2022-05-20 2022-05-20 Dynamic task scheduling method for video detection and tracking

Country Status (1)

Country Link
CN (1) CN115002409B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019117970A1 (en) * 2017-12-15 2019-06-20 Google Llc Adaptive object tracking policy
WO2021233053A1 (en) * 2020-05-22 2021-11-25 华为技术有限公司 Computing offloading method and communication apparatus
WO2022000838A1 (en) * 2020-07-03 2022-01-06 南京莱斯信息技术股份有限公司 Markov random field-based method for labeling remote control tower video target
WO2022057811A1 (en) * 2020-09-17 2022-03-24 浙江大学 Edge server-oriented network burst load evacuation method
CN113115072A (en) * 2021-04-09 2021-07-13 中山大学 Video target detection tracking scheduling method and system based on end cloud cooperation
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 MEC task unloading and resource allocation method based on deep reinforcement learning
CN113873022A (en) * 2021-09-23 2021-12-31 中国科学院上海微系统与信息技术研究所 Mobile edge network intelligent resource allocation method capable of dividing tasks
CN113821346A (en) * 2021-09-24 2021-12-21 天津大学 Computation uninstalling and resource management method in edge computation based on deep reinforcement learning
CN114375058A (en) * 2022-01-19 2022-04-19 上海大学 Task queue aware edge computing real-time channel allocation and task unloading method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lightweight task offloading optimization for multi-user mobile edge computing; 张文献; 杜永文; 张希权; Journal of Chinese Computer Systems (Issue 10) *

Also Published As

Publication number Publication date
CN115002409A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN107708152B (en) Task unloading method of heterogeneous cellular network
CN112261674A (en) Performance optimization method of Internet of things scene based on mobile edge calculation and block chain collaborative enabling
CN114205353B (en) Calculation unloading method based on hybrid action space reinforcement learning algorithm
CN110809291B (en) Double-layer load balancing method of mobile edge computing system based on energy acquisition equipment
CN115809147B (en) Multi-edge collaborative cache scheduling optimization method, system and model training method
CN114973673B (en) Task unloading method combining NOMA and content cache in vehicle-road cooperative system
CN113568727A (en) Mobile edge calculation task allocation method based on deep reinforcement learning
CN113760511A (en) Vehicle edge calculation task unloading method based on depth certainty strategy
Huang et al. 6G-empowered offloading for realtime applications in multi-access edge computing
CN115002409B (en) Dynamic task scheduling method for video detection and tracking
CN112559078B (en) Method and system for hierarchically unloading tasks of mobile edge computing server
CN111488208B (en) Bian Yun collaborative computing node scheduling optimization method based on variable-step-size bat algorithm
CN116367231A (en) Edge computing Internet of vehicles resource management joint optimization method based on DDPG algorithm
CN115858048A (en) Hybrid key level task oriented dynamic edge arrival unloading method
CN113452625B (en) Deep reinforcement learning-based unloading scheduling and resource allocation method
CN114968402A (en) Edge calculation task processing method and device and electronic equipment
CN114693141A (en) Transformer substation inspection method based on end edge cooperation
CN117544680B (en) Caching method, system, equipment and medium based on electric power Internet of things
CN117580105B (en) Unmanned aerial vehicle task unloading optimization method for power grid inspection
Dong et al. WebInf: Accelerating WebGPU-based In-browser DNN Inference via Adaptive Model Partitioning
CN117528658A (en) Edge collaborative caching method and system based on federal deep reinforcement learning
Luu et al. Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing
CN117560724A (en) Joint optimization method and system for participant selection and resource allocation of federal learning
Cao et al. WiEdge: Edge Computing for Audio Sensing Applications With Accurate Wireless Link Prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant