CN115002409A

CN115002409A - Dynamic task scheduling method for video detection and tracking

Info

Publication number: CN115002409A
Application number: CN202210551198.7A
Authority: CN
Inventors: 王晓飞; 王义兰; 刘志成; 赵云凤; 仇超; 张程
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2022-09-02
Anticipated expiration: 2042-05-20
Also published as: CN115002409B

Abstract

The invention discloses a dynamic task scheduling method for video detection and tracking, which comprises the following steps: constructing a real-time target detection system comprising a plurality of terminal devices and an edge server, wherein a target tracker is arranged in the terminal devices, and a target detector is arranged in the edge server; constructing a combined optimization problem of a video frame unloading decision, a channel decision and a frame interval decision in a real-time target detection system as a Markov decision problem; in each decision time slot, each terminal device sends tracking precision, head frame information and video content change rate to an edge server, and the edge server constructs a joint decision model by using a DDQN deep reinforcement learning algorithm; and with the maximum gain function as a target, solving the joint optimization problem by using a joint decision model, and executing the terminal equipment according to the video frame unloading decision, the channel decision and the frame interval decision output by the edge server. The invention achieves the maximization of the accuracy of video frame detection under the delay limit.

Description

Dynamic task scheduling method for video detection and tracking

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a dynamic task scheduling method for video detection and tracking.

Background

The advanced machine vision is introduced into the terminal equipment of the Internet of things, so that wide autonomous depth vision application such as traffic monitoring, automatic driving, unmanned aerial vehicle scene analysis, robot vision and the like can be realized. In these applications, the ability of the terminal to detect objects from captured video frames is of paramount importance. However, in order to achieve accurate target detection, the target detection model usually has a complex structure and numerous parameters, and the calculation and storage requirements of the terminal device are high. Therefore, running a full-scale target detection model on a terminal device with limited resources is a challenge, and it is often difficult to meet the requirement of real-time performance, and even heat dissipation problems are encountered. Meanwhile, if the compression model is locally run, although the workload of a Deep Learning (DL) model can be greatly reduced, due to a basic tradeoff between the model size and the model precision, the techniques often result in the reduction of the model precision.

With the advent of 5G networks, offloading the computationally intensive object detection tasks to an edge server for execution has become a promising solution. And the edge server operates the large model, so that accurate detection is realized, and finally, the detection result is transmitted back to the terminal equipment. Some recent efforts have employed a Detection Based Tracking (DBT) approach, specifically running the target detector periodically on some video frames, while processing the frames in between using a lightweight target tracker. Therefore, the DBT-based framework has received more and more attention to real-time video frame detection and analysis. However, most existing solutions based on DBT consider a scenario where one edge server serves a single terminal device and has sufficient transmission resources when designing an offloading policy, and ignore a scenario where one edge server serves multiple terminal devices and limited communication resources negatively affect the offloading performance of competing terminal devices; in addition, most existing solutions based on the DBT adopt tracking of each frame when designing a tracking strategy of the terminal device, and neglect error accumulation of delay caused by tracking of each frame on a detection result; moreover, the conventional technical scheme based on the DBT is completely based on experimental evaluation to realize cooperative detection, system optimization is less realized through theoretical modeling, and specific model encapsulation, modeling and expression cannot be performed on the cooperative detection of the terminal equipment and the edge server.

Disclosure of Invention

Aiming at the technical problems, the invention provides a dynamic task scheduling method facing video detection and tracking. In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

a dynamic task scheduling method for video detection and tracking comprises the following steps:

s1, constructing a real-time target detection system comprising a plurality of terminal devices and an edge server, wherein the terminal devices are provided with target trackers, and the edge server is provided with target detectors;

s2, constructing a combined optimization problem of video frame unloading decision, channel decision and frame interval decision in the real-time target detection system into a Markov decision problem;

the video frame unloading decision means that the team first frame of the terminal equipment is continuously waited in a local queue of the terminal equipment when a slot is formed in each decision, is immediately unloaded to an edge server for detection or directly outputs a tracking result, the channel decision means whether the terminal equipment output by the edge server is allocated to a channel, and the frame interval decision means the number of frames of an interval between the next decision slot team first frame when the slot is formed in the current decision of the terminal equipment output by the edge server and the slot next-in-line first frame when the slot is formed in the current decision;

s3, each terminal device sends the tracking precision, the head frame information and the video content change rate to an edge server in each decision time slot, and the edge server constructs a joint decision model by using a DDQN deep reinforcement learning algorithm;

and S4, with the maximum gain function as a target, solving the joint optimization problem by using the joint decision model constructed in the step S3, and executing the terminal equipment according to the video frame unloading decision, the channel decision and the frame interval decision output by the edge server.

The step S2 includes the steps of:

s2.1, constructing a state space, wherein the expression of the state space is as follows:

S _n (t)＝(M _n (t)，h _n (t)，p _n (t)，v _n (t))；

in the formula, M _n (t) queue head frame information of local queue of terminal device n at decision slot t, h _n (t) denotes the channel gain between terminal device n and edge server, v _n (t) represents the rate of change of video content of terminal device n at decision slot t, S _n (t) represents the state space of terminal device n at decision slot t, p _n (t) represents the tracking accuracy of the first frame of the team of the terminal equipment n when the slot is decided by t;

s2.2, constructing an action space, wherein the expression of the action space is as follows:

A _n (t)＝(a _n (t)，C _n (t)，I _n (t))；

in the formula, A _n (t) represents the space of motion of terminal device n at decision slot t, a _n (t) video frame unload decision representing the first frame of the local queue of the terminal device n output by the edge server in the decision slot t, i.e. whether to continue waiting in the local queue, unload to the edge server immediately or output the tracking result directly, C _n (t) represents the channel decision of the terminal device n output by the edge server in the decision slot t, I _n (t) the frame number of the interval between the first frame of the slot queue of the next decision time slot and the first frame of the slot queue of the current decision time, namely the frame interval decision, of the terminal device n output by the edge server at the decision time slot t is represented;

s2.3, constructing a reward function, wherein the expression of the reward function is as follows:

in the formula (I), the compound is shown in the specification,R _n (t) represents a reward function, namely a gain function, of the terminal device n in the decision slot t, Acc represents the detection precision or tracking precision of the first frame of the terminal device n in the decision slot t, beta represents a weight coefficient, and beta is greater than 0,

representing the processing time of the first frame of the queue in the terminal equipment n at the time of the decision slot T, alpha is a performance improvement factor and is more than 0, T _max Which represents the maximum value of the ideal range of video frame detection delay.

In step S2.1, the first frame information M of the local queue of terminal device n at time slot t is decided _n The expression of (t) is:

in the formula, s _n (t) represents the frame size of the head-of-queue frame of the local queue of terminal device n at decision slot t,

representing the arrival time of the head frame of the local queue of terminal device n,

indicating the time the head-of-line frame of the local queue of terminal device n at decision slot t has been waiting before processing.

In step S2.1, the channel gain h between the terminal device n and the edge server _n The formula for calculation of (t) is:

in the formula, gamma _n (t) represents a random channel fading factor that conforms to a rayleigh distribution,

indicating terminal devicePreparing the average channel gain of n;

average channel gain of the terminal device n

The calculation formula of (c) is:

in the formula, A _d Denotes the antenna gain of the terminal equipment, delta denotes the path loss factor, d _n Indicating the distance of the terminal device n from the edge server.

In step S2.1, the tracking accuracy p _n (t) the calculation formula:

wherein G represents the true location area of the target and Y _n And (t) represents the position area of the target detected by the tracking algorithm run by the terminal equipment n when the slot is decided by t.

In step S2.1, the video content change rate v of terminal device n at time slot t _n The formula for calculation of (t) is:

in the formula (I), the compound is shown in the specification,

the pixel position of the kth feature of the i-th frame in the local queue of terminal device n at decision slot t,

the pixel position of the kth characteristic of the jth frame in the local queue of the terminal equipment n when the slot t is decided, m represents the characteristic number of the video frame in the local queue of the terminal equipment n when the slot t is decided, and j-i is more than or equal to1。

In step S2.3, if the head frame directly outputs the tracking result, the processing time of the head frame is determined

The calculation formula of (2) is as follows:

in the formula (I), the compound is shown in the specification,

represents the tracking time of the head frame of the queue in terminal device n at decision slot t,

representing the waiting time of the head frame of the local queue of the terminal equipment n before processing when the slot t is decided;

if the head frame is unloaded immediately and the channel is available, the processing time of the head frame is determined

The calculation formula of (2) is as follows:

in the formula, T _e Indicating the time at which the edge server performs object detection,

the time of transmitting the first frame of the queue in the terminal equipment n through the channel when the slot t is decided is represented;

if the queue head frame decides to wait or decides to unload immediately but the wireless network between the terminal device and the edge server is unavailable at the moment, the queue head frame needs to continue waiting in the local queue until the channel is available and then unload to the edge server, the processing time of the queue head frame is shortened

The calculation formula of (2) is as follows:

in the formula (I), the compound is shown in the specification,

a decision slot indicating the beginning of the transmission of the head frame of the team,

presentation decision slot

The time when the head frame of the team in terminal device n is transmitted through the channel,

representing predicted time slot t to time slot t

The number of time slots.

The step S3 includes the following steps:

s3.1, setting a total training round M, initializing an experience playback memory D and a parameter theta of an evaluation network, and assigning the parameter theta of the evaluation network to a parameter theta' of a target network;

s3.2, setting the training cycle number epimode to be 1;

s3.3, to the State space S _n (t) initialization, i.e. S _n (t)＝S _n (0) Wherein S is _n (t) represents the state space of terminal device n at decision slot t;

s3.4, setting the number T of decision time slot slots;

s3.5, performing t ═ t + 1;

s3.6, selecting action A according to probability epsilon _n (t) the expression is:

wherein A represents such that Q (S) _n (t)，A _n (t); θ) action at maximum value, A _n (t) represents the action space of the terminal device n at the time of the decision slot t;

s3.7, according to the action A selected in the step S3.3 _n (t) obtaining a reward R _n (t) and the state space S of the next step _n (t+1)；

S3.8, experience (S) _n (t)，A _n (t)，R _n (t)，S _n (t +1)) is stored in the experience replay memory D;

s3.9, randomly taking out G experiences from the experience playback memory D (S) _n (t′)，A _n (t′)，R _n (t′)，S _n (t′+1))；

S3.10, predicting benefits according to the experience taken out in the step S3.9, wherein the expression is as follows:

in the formula, R _n (t ') represents the reward function of the terminal device n at the decision slot t ', γ represents the discount factor, A ' represents such that

The operation of obtaining the maximum value is performed,

represents the maximum gain in slot, S, at the time of t' +1 decision _n (t '+ 1) represents the state space of terminal device n at decision slot t' + 1;

s3.11, updating a parameter theta of the evaluation network based on a gradient descent method;

s3.12, assigning the parameter theta of the evaluation network to the parameter theta' of the target network in each step C;

s3.13, judging that T is less than T, if so, returning to the step S3.5, otherwise, executing the step S3.14;

and S3.14, executing the epsilon +1, judging that the epsilon is less than M, if so, returning to the step S3.3, otherwise, outputting a joint decision model containing the target network.

In step S4, the expression of the maximized benefit function is:

s.t.C ₁ (t)+C ₂ (t)+...+C _n (t)+...+C _N (t)≤1；

a _n (t)∈{0，1，2}；

I _n (t)∈{1，2，3}；

in the formula, a _n (t) indicating the video frame unloading decision of the head frame of the local queue of the terminal equipment n output by the edge server in the decision slot t, namely, continuously waiting in the local queue, immediately unloading to the edge server or directly outputting the tracking result when a _n When (t) is 0, the first frame of the queue representing the terminal device n waits for the next decision slot, when a is _n When (t) is 1, the method indicates that the head-of-line frame of the terminal device n is unloaded to the edge server immediately, and when a is _n When (t) is 2, it means that the terminal device n directly outputs the tracking result, C _n (t) represents the channel decision of the terminal device n output by the edge server in the decision slot t, when C _n When (t) is 0, it means that terminal device n is not allocated to a channel in slot t at decision time, and when C is equal to _n When (t) is 1, it means that terminal device n is allocated to a channel in decision slot t, I _n (t) the frame number of the interval between the first frame of the slot queue of the next decision time slot and the first frame of the slot queue of the current time slot, namely the frame interval decision, R, of the terminal device n output by the edge server in the decision time slot t _n (t) represents the reward function, i.e. gain function, of the terminal device N at the decision slot t, N representing the total number of terminal devices.

The invention has the beneficial effects that:

the DBT-based real-time target detection framework mainly aims at continuous video frame scenes with delay constraint, and establishes a target detection system based on the dynamic change network conditions and video content for the cooperative detection of terminal equipment and an edge server, and the characteristics of real-time target detection under a plurality of terminal equipment scenes based on the DBT framework can be further analyzed through the system; the influence of the video content change rate is introduced, the terminal equipment selects different tracking frequencies based on the video content change rate instead of tracking each frame traditionally, an optimization problem is formed by designing a gain function, and the video frame detection accuracy is maximized under the delay limit.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of the present invention.

Fig. 2 is a diagram illustrating tracking accuracy at different frame intervals.

Fig. 3 is a diagram illustrating the change of the average tracking accuracy when the frame interval changes.

Fig. 4 is a graph comparing the effect of the present application with other algorithms.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

A dynamic task scheduling method for video detection and tracking, as shown in fig. 1, includes the following steps:

s1, constructing a real-time target detection system comprising a plurality of terminal devices and an edge server, wherein the terminal devices are provided with target trackers, the edge server is provided with target detectors, and each terminal device is in communication connection with the edge server through a wireless network;

the set of all end devices is denoted by N, N ═ 1,. lamda, N }, and the set of video frames captured by the nth end device is denoted by F _n The representation is that all video frame sets captured by the terminal equipment are represented by F, and F is { F ═ F ₁ ，...，F _n ，...，F _N }. The terminal equipment operates a light-weight target tracker, and the edge server operates a large-scale target detector so as to realize real-time detection of targets in captured video frames. However, the tracking performance may decrease with time and the change of the video content, and therefore, before the tracking performance decreases to be too low, that is, the tracking threshold, a new video frame should be sent to the edge server for detection to obtain a new detection result, so as to improve the accuracy of target tracking of the terminal device.

Each terminal device maintains a local queue for buffering video frames waiting for processing, the video frames in the local queue wait for processing according to the principle of first-come first-served, the system time is divided into continuous time slot slots, and if the time slot slots are small enough, only one frame can reach the local queue in each time slot. In each decision time slot t, that is, a time slot in which a video frame waits in a local queue, a video frame at the head of the queue of each terminal device is considered to be also called a head frame, and since the target tracker needs to use a bounding box (bounding box) detected by the edge server for initialization. Therefore, before starting tracking, the terminal device needs to send a first frame to the edge server for detection, obtain a detection result of the first frame, namely the bounding box, then operate the target tracker on a subsequent team first frame based on the detection result for tracking, send frame information and tracking precision to the edge server by the terminal device after tracking, and make a decision of channel allocation, video frame unloading and tracking frequency, namely a tracking frame interval, based on the whole situation and send the decision to the terminal device by the edge server; and finally, the terminal equipment makes corresponding action according to the decision of the edge server. Since the data amount of the result output by the edge server is much smaller than that of the frame itself, the time for returning the result is ignored, and only the uplink frame transmission process of the whole system is considered. And if the unloading decision is local tracking, directly outputting a tracking result. If the unloading decision is immediate unloading and the channel is available, the unloading decision can be unloaded to the edge server for detection, and the detection result is returned to the corresponding terminal equipment after the edge server is detected. If the offload decision is either pending or directly offloaded but the channel is not available, then the slot needs to wait for the next decision in the local queue.

Due to the limited wireless network resources, the wireless network bandwidth may become a bottleneck for the terminal device to offload video frames to the edge server. The present application addresses this challenge in two ways: on one hand, the video frame with reliable tracking performance directly outputs the tracking result to save bandwidth; on the other hand, for a video frame with lower tracking performance, due to the limitation of bandwidth resources and competition of terminal devices, a situation that no available wireless channel exists in slot t at the time of decision, and the video frame waits in a local queue of the terminal device until the channel is available.

S2, constructing a joint optimization Problem of video frame unloading Decision, channel Decision and frame interval Decision in the real-time target detection system as an MDP Problem (Markov Decision Problem), including the following steps:

S _n (t)＝(M _n (t)，h _n (t)，p _n (t)，v _n (t))；

in the formula, M _n (t) queue head frame information of local queue of terminal device n at decision slot t, h _n (t) denotes the channel gain between terminal device n and edge server, v _n (t) represents the rate of change of video content of terminal device n at decision slot t, S _n (t) represents the state space of terminal device n at decision slot t, p _n (t) head of line of terminal device n at slot time of decision time tThe tracking accuracy of the frame.

The first frame information M of the local queue of the terminal device n at the time of the decision slot t _n The expression of (t) is:

in the formula s _n (t) represents the frame size of the head-of-queue frame of the local queue of terminal device n at decision slot t,

the arrival time of the head frame of the queue of the local queue representing terminal device n,

indicating the time the head of line frame of the local queue of terminal device n has waited before processing at decision slot t.

The channel gain h between the terminal equipment n and the edge server _n (t) conforms to the rayleigh fading channel model, and is calculated by the formula:

representing the average channel gain of terminal device n.

Average channel gain of the terminal device n

The method conforms to a free space path loss model, and the calculation formula is as follows:

Before the slot is finished in each decision, the local queue of the terminal equipment is updated, and the number of the video frame frames cached in the local queue of the terminal equipment n in the slot in t decision adopts X _n (t) represents, X _n The evolution of (t +1) depends on the arrival of a new video frame and the departure of an old video frame, and the updated expression is:

in the formula (I), the compound is shown in the specification,

is a random binary variable which indicates whether slot t will have a new video frame to reach terminal equipment n, O during decision making _n (t) E {0, -1} is also a random binary variable that indicates whether the video frame with slot t at the head of the queue will leave the local queue of terminal device n at the time of decision, X _n (t +1) represents the number of video frame frames buffered by the local queue of terminal device n at time slot t + 1. O is _n (t) — 0 indicates that at decision slot t the head-of-line frame of the local queue of terminal device n will continue to wait until the next decision slot, O _n And (t) — 1 indicates that the head-of-queue frame of the local queue of the decision slot t terminal device n will leave the local queue in the next decision slot, for example, directly output the tracking result of the video frame, or unload the video frame to an edge server for detection.

Based on experiments, it is found that it takes about 10ms for the terminal device to track a single target in a frame, and the duration of tracking the whole frame is increased in proportion to the increase of the number of targets in the frame. Therefore, in order to provide real-time video analysis processing, some frames must be skipped during tracking to keep up with the frame capture speed of the terminal device, such as a video camera, and therefore, I is used _n (t) represents a frame interval determined at the time of decision slot t. Thus, the decision slot t +1 time terminal devicen local queue buffer video frame number X _n (t +1) may be updated as:

in the formula, O _n (t) becomes {0, -I _n (t) }, 0 denotes that the head-of-line frame continues waiting in the local queue.

As shown in FIG. 2, the experiment measures a succession of 50 video frames, I _n (t) is a value of at least 1 and at most 10. It can be seen from the figure that either I _n (t) what value to take, the tracking accuracy decreases with the increase of the tracking frame number, and I _n The larger the value of (t), the faster the tracking accuracy decreases, so I cannot be increased indefinitely to provide real-time processing _n The value of (t). I of the present example _n (t) e {1, 2, 3}, and as shown in FIG. 3, in the case of continuously tracking 50 frames, the average tracking accuracy is kept at I of 0.5 or more _n The value of (t) is 1, 2, 3.

In the same I _n (t), if the video content changes faster, the displacement between two tracked video frames is larger, and the tracking accuracy is more unreliable. Therefore, to ensure more reliable tracking accuracy of the terminal equipment, I _n (t) the determination should introduce the effect of the rate of change of the video content, and the metric evaluating the rate of change of the video content must be lightweight to ensure that its calculations do not affect the tracking operation of the real-time target detection system. The method measures the change rate of the video content by using the intermediate result of tracking, so that additional calculation is hardly added, the average moving speed of all characteristics extracted from two adjacent frames is used as the change rate of the video content, and the change rate v of the video content of the terminal equipment n at the time slot t is used as the change rate of the video content _n The formula for calculation of (t) is:

in the formula (I), the compound is shown in the specification,

the pixel position of the kth characteristic of the jth frame in the local queue of the terminal equipment n when the slot t is decided is shown, m represents the characteristic number of the video frame in the local queue of the terminal equipment n when the slot t is decided, and j-i is more than or equal to 1, because some video frames are skipped when the target tracking is carried out. The video content change rate can be obtained by calculating the moving speed between two adjacent frame features, and a high moving speed means that the video content changes rapidly, i.e. the existing object moves out rapidly, and new objects may appear frequently.

According to the method, the target tracking of the frame is carried out based on the Lucas-Kanade method, the tracking precision is reduced along with the change of time and video content, and meanwhile, the first frame of the team with reliable tracking performance of the terminal equipment tends to directly output a tracking result, so that the bandwidth is saved. The following is to calculate the intersection ratio of the tracking result and the real result to measure the tracking performance, and the corresponding expression is as follows:

in the formula, Y _n And (t) represents a position area of the target detected by the tracking algorithm run by the terminal equipment n when the slot is determined by t, and G represents a real position area of the target.

A _n (t)＝(a _n (t)，C _n (t)，I _n (t))；

in the formula, A _n (t) represents the space of motion of terminal device n at decision slot t, a _n (t) video frame unloading decision of head frame of local queue of terminal equipment n output by edge server in decision slot t, namely, continuing waiting in local queue and immediately unloading to edgeThe server also directly outputs the tracking result when a _n When (t) is 0, the first frame of the queue representing the terminal device n waits for the next decision slot, when a is _n When (t) is 1, the method indicates that the head-of-line frame of the terminal device n is unloaded to the edge server immediately, and when a is _n When (t) is 2, it means that the terminal device n directly outputs the tracking result, C _n (t) represents the channel decision of the terminal device n at decision slot t output by the edge server, when C _n When (t) is 0, it means that terminal device n is not allocated to the channel in slot t at decision time, when C is equal to _n When (t) is 1, it means that terminal device n is allocated to a channel in decision slot t, I _n (t) the frame number of the interval between the first frame of the slot queue of the next decision time slot and the first frame of the slot queue of the current decision time slot, namely the frame interval decision, of the terminal device n output by the edge server at the decision time slot t.

in the formula, R _n (t) represents a reward function, namely a gain function, of the terminal device n in the decision slot t, and Acc represents the detection precision or tracking precision p of the first frame of the terminal device n in the decision slot t _n (T), the detection precision is set to 1.0, beta represents a weight coefficient, beta is more than 0, adjusting beta can balance the time weight between frame processing and frame transmission, alpha is a performance improvement factor, alpha is more than 0, the importance of adjusting inference performance in a reward function can be reflected through the factor, and T _max Represents the maximum value of the ideal range of the video frame detection delay, which refers to the maximum delay of a frame that can be detected under the condition of satisfying the required detection delay,

representing the processing time of the head frame in terminal device n at decision slot t.

When the slot t is decided, if the first frame of the queue directly outputs the tracking nodeIf so, the processing time of the first frame of the team

Including the tracking time and the waiting time in the queue, the calculation formula is as follows:

in the formula (I), the compound is shown in the specification,

the tracking time of the head frame of the team in the terminal device n at the decision slot t is shown.

At decision time slot t, if the head frame is unloaded immediately and a channel is available, then the processing time of the head frame

The calculation formula of (2) is as follows:

in the formula, T _e Indicating the time when the edge server performs the object detection,

and the time of transmission of the head frame of the queue in the terminal device n through the channel at the decision slot t is shown.

Time for transmitting first frame of queue in terminal equipment n through channel in decision time slot t

The calculation formula of (c) is:

in the formula, s _n (t) represents the frame size of the head frame of the queue in terminal device n at decision slot t, i.e.Amount of data, r _n (t) represents the transmission rate between the two in the case where the edge server allocates a terminal device n channel at the time of the slot decision t.

Considering the path loss and Rayleigh fading of the channel, based on Shannon's theorem, when the edge server allocates the n channel to the terminal device in the slot time of t decision, the transmission rate r between the two _n The formula for (t) is:

where w represents the channel bandwidth, h _n (t) denotes the channel gain of terminal device n as a function of decision slot t, P _n Representing the transmission power of terminal equipment N, N ₀ Representing the background noise power.

Since in order to efficiently utilize bandwidth resources, if a wireless network is unavailable or deteriorates, the first frame of the queue waits for the next decision slot t in the local queue, and these frames are often transmitted to the edge server for detection, rather than directly outputting the tracking result, otherwise, the frame should not be decided to wait. Therefore, in the decision slot t, if the head frame decides to wait, or decides to unload immediately but the wireless network is unavailable, the head frame needs to wait in the local queue until the channel is available and then unload to the edge server, then the processing time of the head frame is the same as the processing time of the head frame

The calculation formula of (c) is:

in the formula (I), the compound is shown in the specification,

presentation decision slot

representing predicted slot t to slot t

The number of time slots, and

is a positive integer.

S3, each terminal device will track the accuracy p in each decision slot _n (t) head of line frame information M _n (t) rate of change of video content v _n (t) sending to an edge server, wherein the edge server constructs a joint decision model by using an algorithm of Deep Reinforcement Learning (DRL) of DDQN (double Deep Q network), and the method comprises the following steps:

s3.2, setting the training cycle number epimode to be 1;

s3.3, to the State space S _n (t) initialization, i.e. S _n (t)＝S _n (0)；

S3.4, setting the number T of decision time slot slots;

s3.5, performing t ═ t + 1;

s3.6, selecting action A according to probability epsilon _n (t), the expression of which is:

in the formula, θ represents a parameter for evaluating the network, and A represents Q (S) _n (t)，A _n (t); θ) is the maximum value, and random in this equation refers to the random selection of motion from the motion space.

S3.7, according to action A selected in step S3.3 _n (t) earning a reward R _n (t) and the state space S of the next step _n (t+1)；

s3.9, randomly taking out G experiences from the experience playback memory D (S) _n (t′)，A _n (t′)，R _n (t′)，S _n (t' +1)), wherein S _n (t ') represents the state space of terminal device n at decision slot t', A _n (t ') represents the action space of the terminal device n at the time of the decision slot t';

s3.10, predicting the profit according to the experience taken out in the step S3.9, wherein the expression is as follows:

in the formula, R _n (t ') represents the reward function of the terminal device n at the decision slot t ', γ represents the discount factor for balancing the current benefit with the long-term reward, a ' represents such that

The operation of obtaining the maximum value is performed,

represents the maximum gain in slot, S, at the time of t' +1 decision _n (t '+ 1) represents the state space of terminal device n at decision slot t' + 1.

s3.12, assigning the parameter theta of the evaluation network to the parameter theta' of the target network in each step C, wherein C is integral multiple of T and is less than T;

The DDQN algorithm comprises an evaluation network with a parameter theta and a target network with a parameter theta', the evaluation network is used for updating the parameters by reducing a loss function, the target network is used for calculating a target Q value, and the target network parameters are updated by the evaluation network at regular intervals. Meanwhile, the DDQN maintains a section of experience replay memory D, stores some past experiences, and updates the stored experiences when the experience replay memory D is full.

S4, with the goal of maximizing a gain function, solving the joint optimization problem by using the joint decision model constructed in the step S3, and executing the terminal equipment according to the video frame unloading decision, the channel decision and the frame interval decision output by the edge server;

the expression of the maximized revenue function is:

s.t.C ₁ (t)+C ₂ (t)+...+C _n (t)+...+C _N (t)≤1；

a _n (t)∈{0，1，2}；

I _n (t)∈{1，2，3}。

in the following, Jetson Nano is taken as a terminal device, a Lucas-Kanade target tracker is operated, Jetson AGX Xavier is taken as an edge server, YOLOX is taken as a target detector, the time of tracking one frame by the terminal device and the time of detecting one frame by the edge server are measured really, and then a simulation environment is established based on the time. The system is divided into individual time slot slots, and the time slot slots are assumed to be small enough, so that at most one new frame arrives at the local queue at the terminal equipment in each time slot, and the arrival rate of the frames conforms to the Bernoulli process with the parameter P. The network simulation adopts a wireless channel Rayleigh fading model, wherein the gain of each terminal equipment antenna is set to be 4.11, the distance between the terminal equipment and the edge server is in accordance with the uniform distribution of U (2.5,5.2), the transmission power of the terminal equipment is 0.03, the background noise is 10e-10, the path loss coefficient is 2.8, and the bandwidth of an uplink is 2 MHZ. The DDQN algorithm based on the pytorech 1.7 was implemented using python, and the size of D was set to 1000, the total training round was set to 400, the batch size was 32, the learning rate was 0.0001, γ was set to 0.9, and ε was set to 0.9.

In order to show the superiority of the method in a continuous video frame scene, the method is compared with a Random algorithm and a Greedy algorithm, and the evaluation index is the system average reward. The random algorithm is to randomly select a decision without considering any environmental information, whose performance is always the worst. The greedy algorithm makes an optimal decision based on the current state, but does not consider the interaction between adjacent tasks. As shown in fig. 4, P is the arrival rate of the video frame in each timeslot slot, and the larger P indicates that the higher the video frame rate is, the more intensive the task is; the smaller p indicates that the video frame rate is smaller and the task is more sparse. It can be found that the algorithm of the present application is superior to the random algorithm and the greedy algorithm regardless of the fluctuation of the P value.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims

1. A dynamic task scheduling method for video detection and tracking is characterized by comprising the following steps:

and S4, solving the joint optimization problem by using the joint decision model constructed in the step S3 with the aim of maximizing a gain function, and executing the terminal equipment according to the video frame unloading decision, the channel decision and the frame interval decision output by the edge server.

2. The dynamic task scheduling method for video detection and tracking according to claim 1, wherein the step S2 comprises the following steps:

S _n (t)＝(M _n (t)，h _n (t)，p _n (t)，v _n (t))；

in the formula, M _n (t) queue head frame information of local queue of terminal device n at decision slot t, h _n (t) denotes the channel gain between terminal device n and edge server, v _n (t) represents the rate of change of video content of terminal device n at decision slot t, S _n (t) represents the state space of terminal device n at decision slot t, p _n (t) represents the tracking precision of the head frame of the team of the terminal device n when the slot is determined by t;

A _n (t)＝(a _n (t)，C _n (t)，I _n (t))；

in the formula, A _n (t) represents the space of motion of terminal device n at decision slot t, a _n (t) video frame unload decision representing the first frame of the local queue of the terminal device n output by the edge server in the decision slot t, i.e. whether to continue waiting in the local queue, unload to the edge server immediately or output the tracking result directly, C _n (t) denotes the channel decision of the terminal device n at decision slot t, I, output by the edge server _n (t) the frame number of the interval between the first frame of the slot queue of the next decision time slot and the first frame of the slot queue of the current decision time, namely the frame interval decision, of the terminal device n output by the edge server at the decision time slot t is represented;

in the formula, R _n (t) represents a reward function, namely a gain function, of the terminal device n at the time of the decision slot t, Acc represents the detection precision or tracking precision of the first frame of the terminal device n at the time of the decision slot t, β represents a weight coefficient, and β > 0,

3. The video-detection-and-tracking-oriented dynamic task scheduling method according to claim 2, wherein in step S2.1, the first frame information M of the local queue of terminal device n at decision slot t is determined _n The expression of (t) is:

4. The video detection and tracking oriented dynamic task scheduling method of claim 2, wherein in step S2.1, the channel gain h between the terminal device n and the edge server _n The formula for calculation of (t) is:

represents the average channel gain of terminal device n;

average channel gain of the terminal device n

The calculation formula of (2) is as follows:

5. The video detection and tracking oriented dynamic task scheduling method of claim 2, wherein in step S2.1, the calculation formula of the tracking accuracy pn (t):

6. The method for dynamic task scheduling for video detection and tracking according to claim 2, wherein in step S2.1, the video content change rate v of terminal device n at time slot t is _n The formula for (t) is:

in the formula (I), the compound is shown in the specification,

the pixel position of the kth feature of the i-th frame in the local queue of terminal device n at the time of decision slot t,

the pixel position of the kth characteristic of the jth frame in the local queue of the terminal equipment n at the time of the decision slot t is shown, m represents the characteristic number of the video frame in the local queue of the terminal equipment n at the time of the decision slot t, and j-i is larger than or equal to 1.

7. The video detection and tracking oriented dynamic task scheduling method of claim 2, wherein in step S2.3, if the head frame of the teamDirectly outputting the tracking result to obtain the processing time of the first frame

The calculation formula of (2) is as follows:

in the formula (I), the compound is shown in the specification,

if the head frame is unloaded immediately and the channel is available, the processing time of the head frame

The calculation formula of (2) is as follows:

The calculation formula of (2) is as follows:

in the formula (I), the compound is shown in the specification,

presentation decision slot

representing predicted time slot t to time slot t

The number of time slots.

8. The dynamic task scheduling method for video detection and tracking according to claim 1, wherein the step S3 comprises the following steps:

s3.2, setting the training cycle number epimode to be 1;

s3.4, setting the number T of decision time slot slots;

s3.5, performing t ═ t + 1;

wherein A represents such that Q (S) _n (t)，A _n (t); θ) action at maximum value, A _n (t) represents the action space of terminal device n at decision slot t;

An operation of obtaining the maximum value of the data,

represents the maximum gain in slot, S, at the time of t' +1 decision _n (t '+ 1) indicates that terminal device n is in decision slot t' +1A state space of (a);

9. The method for dynamic task scheduling for video detection and tracking as claimed in claim 1, wherein in step S4, the expression of the maximized benefit function is:

s.t.C ₁ (t)+C ₂ (t)+...+C _n (t)+...+C _N (t)≤1；

a _n (t)∈{0，1，2}；

I _n (t)∈{1，2，3}；

in the formula, a _n (t) indicating the video frame unloading decision of the head frame of the local queue of the terminal equipment n output by the edge server in the decision slot t, namely, continuously waiting in the local queue, immediately unloading to the edge server or directly outputting the tracking result when a _n When (t) is 0, the head of line frame of the terminal device n waits for the next decision slot, when a _n When (t) is 1, the method indicates that the head-of-line frame of the terminal device n is unloaded to the edge server immediately, and when a is _n When (t) is 2, it means that the terminal device n directly outputs the tracking result, C _n (t) represents the channel decision of the terminal device n output by the edge server in the decision slot t, when C _n When (t) is 0, it means that terminal device n is not allocated to a channel in slot t at decision time, and when C is equal to _n When (t) is 1, it means that terminal device n is allocated to a channel in decision slot t, I _n (t) a frame number, namely a frame interval decision, representing the interval between the first frame of the next decision time slot queue and the first frame of the current time slot queue of the terminal equipment n output by the edge server at the decision time slot t, R _n (t) represents the reward function, i.e. gain function, of the terminal device N at the decision slot t, N representing the total number of terminal devices.