CN114815755A - Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning

Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning

Info

Publication number
CN114815755A
Authority
CN
China
Prior art keywords
execution
network
time
transverse
camera
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210576950.3A
Other languages
Chinese (zh)
Inventor
胡清华
王卓航
王晓飞
赵云凤
刘志成
仇超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Application filed by Tianjin University
Priority to CN202210576950.3A
Publication of CN114815755A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 19/00 — Programme-control systems
    • G05B 19/02 — Programme-control systems, electric
    • G05B 19/418 — Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B 19/4185 — Total factory control characterised by the network communication
    • G05B 2219/00 — Program-control systems
    • G05B 2219/30 — Nc systems
    • G05B 2219/31 — From computer integrated manufacturing till monitoring
    • G05B 2219/31088 — Network communication between supervisor and cell, machine group

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning, which comprises the following steps: establishing a horizontal and vertical segmentation model based on a deep neural network by using a horizontal segmentation algorithm and a vertical segmentation algorithm; constructing a Markov decision process by using a transverse division point decision, a transverse execution node decision and a longitudinal execution node decision; the base station utilizes a DDQN algorithm to construct a division point execution equipment decision model by taking the minimized task processing time difference as a target; each monitoring terminal inputs a video stream into the decision-making model and uploads the video stream to a transverse execution node, the transverse execution node performs transverse division and execution by using a transverse division model, and the executed network parameters are sent to a longitudinal execution node; the vertical execution node vertically divides and executes the vertical execution network according to the vertical division algorithm; and the cloud receives the execution results from each longitudinal execution node, and completes the cross-camera track matching by using a track matching algorithm. The invention improves the effective utilization rate of system computing power.

Description

Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning.
Background
An Intelligent monitoring System (ISS) is an important application combining Deep Learning (DL) and the Internet of Things (IoT), and Multi-Target Multi-Camera Tracking (MTMCT) has been widely recognized as a promising solution for intelligent monitoring systems. However, terminal devices typically have little memory, tight power budgets and limited computing power, so an Artificial Intelligence (AI) model deployed on such resource-limited devices must still guarantee acceptable inference delay and accuracy. In order to fully utilize the computing power of the system and minimize its computing delay, real-time video analysis systems based on edge computing have become a new research hotspot in recent years. However, the existing system architectures lack a fine-grained treatment of the deep learning model, and the cooperation problem among camera clusters is ignored when the model is split.
The key driver behind the success of MTMCT is the explosive development of deep learning techniques, such as image recognition, target detection and target tracking, together with the Internet of Things. Real-time video analysis based on the Internet of Things can be realized by using dynamic computation offloading and resource allocation. Real-time video stream analysis architectures based on edge computing have verified the feasibility of edge computing in the vision field. However, the wide range of application scenarios leads to different architectures in edge-cloud clusters, and a fine-grained treatment of Deep Neural Network (DNN) models is lacking. In the field of DNN model partitioning, taking the cooperation between multi-layer clusters into account remains an open problem. Thus, edge real-time video analytics systems still face three challenges: 1. offloading the complex DNN model to limited edge devices; 2. offloading across the tiers or nodes of heterogeneous edge-cloud clusters; 3. relieving the computing pressure on the cloud, fully utilizing the computing power of the system, and minimizing the computing delay in multi-video-stream scenarios. Specifically:
firstly, data in the MTMCT requires a huge computing power to completely release the potential, and cloud computing is one of the solutions. However, with the proliferation of the number of surveillance cameras, it is challenging to process such huge data only through cloud computing, which faces huge transmission pressures, high latency, expensive cost, and low security.
Secondly, real-time video analysis systems based on edge computing have become a research hotspot. Common system designs focus on solving the multi-target multi-camera tracking task with a collaborative learning strategy; however, the deep neural network still runs in the cloud, which limits system performance. Existing systems adaptively balance workloads between smart cameras and partition workloads between cameras and edge clusters to optimize system performance. However, the nature of DNNs has not been considered: artificial intelligence models deployed on these resource-constrained devices must keep the number of parameters low, and therefore have to tolerate higher inference delays and lower accuracy. It is therefore meaningful to combine DNN partitioning with cross-camera video processing.
Driven by the above trends, much research has focused on splitting DNNs to reduce delay and save resources, with two basic strategies: horizontal segmentation and vertical segmentation. For the former, the horizontal division of the DNN exploits the characteristics of the DNN to design a coarse-grained, layer-level computation division strategy. By treating the DNN as a Directed Acyclic Graph (DAG), the latency-minimization problem is transformed into an equivalent minimal partitioning problem. However, horizontally split DNNs do not enable parallel execution of the model network and increase the communication cost of intermediate parameters between devices. For vertical segmentation, the feature maps of convolutional layers are partitioned according to the computing power of each node, and the outputs of all nodes are then combined at the host. Vertical partitioning applies a scalable fused tile partitioning (FTP) policy for convolutional layers to minimize memory footprint and reduce transmission between nodes. Vertical segmentation, while enabling both intra-layer and inter-layer parallel execution, requires a more detailed treatment of models that contain residual structures and attention mechanisms.
The traditional cloud computing mode cannot handle such a huge data volume and suffers from heavy transmission pressure, high latency, high cost and low security. Fig. 1 shows the process of deep-learning-based MTMCT computation. The common MTMCT system flow is divided into the following three major parts: the multiple cameras generate video stream data in real time and upload the video streams to the cloud center for algorithm execution; the MTMCT algorithm deployed in the cloud center performs Target Detection, that is, a common deep neural network such as YOLOv4 is used to detect targets such as people and bicycles in the video, where the network structure of YOLOv4 is shown in fig. 3; the MTMCT algorithm then performs Tracker Association using the Deepsort algorithm, that is, corresponding track features are generated for the targets in each video, the algorithm compares the tracks of the same camera at successive times, matches the currently generated target tracks with the target tracks detected in the past, and performs track matching across cameras, finally obtaining the tracking results of the targets within a single camera and across the videos of multiple cameras.
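For orientation, this cloud-centric MTMCT flow can be summarized in the following minimal sketch; all function names (detect_targets, update_tracks, match_across_cameras) are hypothetical placeholders standing in for a YOLOv4-style detector and Deepsort-style association, not the API of any specific library.

```python
def detect_targets(frame):
    """Stand-in for a YOLOv4-style detector; returns a list of bounding boxes."""
    return []  # a real system would run the DNN here

def update_tracks(camera_id, detections, per_camera_tracks):
    """Stand-in for Deepsort-style association of detections with past tracks."""
    per_camera_tracks.setdefault(camera_id, []).append(detections)
    return per_camera_tracks[camera_id]

def match_across_cameras(per_camera_tracks):
    """Stand-in for cross-camera trajectory matching on the accumulated tracks."""
    return per_camera_tracks

def mtmct_step(frames_by_camera, per_camera_tracks):
    # 1) per-camera target detection, 2) per-camera track association,
    # 3) cross-camera trajectory matching in the cloud center.
    for cam_id, frame in frames_by_camera.items():
        detections = detect_targets(frame)
        update_tracks(cam_id, detections, per_camera_tracks)
    return match_across_cameras(per_camera_tracks)

tracks = {}
print(mtmct_step({"cam0": None, "cam1": None}, tracks))
```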
Disclosure of Invention
Aiming at the technical problem of lightweight model inference on resource-limited ISS computing nodes, the invention provides a method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative inference, which realizes an Edge-Intelligence collaborative-inference based distributed real-time intelligent surveillance system (EI-ISS) and is suitable for performing adaptive computation offloading and resource allocation for cooperative multi-camera tracking tasks on an edge-cloud system. In order to solve the above technical problems, the technical scheme adopted by the invention is as follows:
a method for establishing an intelligent cooperative lightweight model of a distributed intelligent monitoring system comprises the following steps:
s1, constructing a video monitoring system comprising a monitoring terminal, a base station and a cloud;
s2, establishing a horizontal segmentation model based on a deep neural network by using a horizontal segmentation and vertical segmentation algorithm, wherein the output of the horizontal segmentation model comprises a horizontal execution network and a vertical execution network;
s3, constructing a transverse division point decision, a transverse execution node decision and a longitudinal execution node decision of the transverse division model into a Markov decision process;
s4, the cloud utilizes a DDQN multi-agent deep reinforcement learning algorithm and takes the minimized task processing time difference as a target function to construct a division point execution equipment decision model;
s5, each monitoring terminal respectively inputs the generated video stream into a division point execution equipment decision model, uploads the video stream to a corresponding transverse execution node according to a transverse execution node decision generated by the division point execution equipment decision model, and the transverse execution node completes transverse division and execution of the model by using the transverse division model and sends network parameters after transverse execution to a corresponding longitudinal execution node;
s6, the longitudinal execution node respectively utilizes the longitudinal division algorithm to longitudinally divide and execute the respective longitudinal execution network according to the longitudinal execution node decision generated by the division point execution equipment decision model;
and S7, the cloud receives the execution results from the longitudinal execution nodes and completes cross-camera track matching by using a track matching algorithm.
The step S2 includes the following steps:

S2.1, constructing the deep neural network model as a directed acyclic graph;

S2.2, obtaining the longest path from the input layer to the output layer of the deep neural network model by using a depth-first algorithm;

S2.3, dividing the longest path obtained in step S2.2 into a transverse execution network G_{p_h}^0 and a vertical execution network G_{p_h}^1 by using the transverse division algorithm, where the network parameters of the transverse execution network G_{p_h}^0 and the vertical execution network G_{p_h}^1 are expressed as

ω_{p_h}^n = λ_{ResUnit}^n · ω_{ResUnit} + λ_{CBL}^n · ω_{CBL} + λ_{CBM}^n · ω_{CBM}, n ∈ {0, 1},

where, when n = 0, ω_{p_h}^0 denotes the network parameters of the transverse execution network G_{p_h}^0 divided by the transverse division point p_h, and when n = 1, ω_{p_h}^1 denotes the network parameters of the vertical execution network G_{p_h}^1 divided by p_h; λ_{ResUnit}^n denotes the proportion of ResUnit network structures among the network layers assigned by the transverse division point p_h to G_{p_h}^n, and ω_{ResUnit} denotes the parameters of the ResUnit network structure; λ_{CBL}^n denotes the proportion of CBL (conv+bn+leakyRelu) structures in G_{p_h}^n, and ω_{CBL} denotes the parameters of the CBL network structure; λ_{CBM}^n denotes the proportion of CBM (conv+bn+mish) structures in G_{p_h}^n, and ω_{CBM} denotes the parameters of the CBM network structure.
The step S3 includes the following steps:

S3.1, constructing a state space, where the state of camera d_m at time t is

s_m^t = ( T_m, T_{y_m^0, y_m^1}^{tr}, T_{m, y_m^0}^{exe}, T_{m, y_m^0}^{wait}, o_m^t, p_m ),

in which T_m denotes the execution time of the detection task of camera d_m's video stream; T_{y_m^0, y_m^1}^{tr} denotes the transmission delay of camera d_m's task between the execution node y_m^0 and the execution node y_m^1; T_{m, y_m^0}^{exe} denotes the execution time of the detection task of camera d_m's video stream on the execution node y_m^0; T_{m, y_m^0}^{wait} denotes the waiting time of the detection task of camera d_m's video stream on the execution node y_m^0; o_m^t denotes the state of the base stations and the cloud under the influence of the agents other than the agent of camera d_m, including the number of tasks waiting to be executed, the task amounts, and the predicted completion time of the task currently executed in the base stations and the cloud; p_m denotes the network corresponding to camera d_m's video stream; s_m^t denotes the state of camera d_m at time t; y_m^0, y_m^1 ∈ ε ∪ {c}, where ε denotes the set of base stations and c denotes the cloud; and d_m ∈ 𝒟, where 𝒟 is the set of monitoring terminals;

S3.2, constructing an action space, whose expression is

a_m^t = ( p_m, y_m^0, y_m^1 ),

where a_m^t denotes the action of camera d_m at time t;

S3.3, constructing a reward function r_m^t(s_0^t, …, s_{M-1}^t, a_0^t, …, a_{M-1}^t), which reflects the difference between the completion times of the detection tasks generated by all cameras at time t, i.e. the difference between the maximum execution time T^max and the minimum execution time T^min of the detection tasks of the cameras' video streams; here μ denotes a penalty weight, ρ denotes the tolerance of the maximum time difference for all cameras to complete their detection tasks at time t, Z denotes the maximum time tolerance for a single camera to complete its detection task, a_0^t and a_{M-1}^t denote the actions of cameras d_0 and d_{M-1} at time t, s_0^t and s_{M-1}^t denote the states of cameras d_0 and d_{M-1} at time t, and r_m^t denotes the reward of camera d_m at time t.
In step S3.1, the execution time T_m of the detection task of camera d_m's video stream is calculated as

T_m = T_{m, y_m^0}^{tr} + T_{m, y_m^0}^{wait} + T_{m, y_m^0}^{exe} + T_{y_m^0, y_m^1}^{tr} + T_{m, y_m^1}^{exe},

where T_{m, y_m^1}^{exe} denotes the execution time of camera d_m's detection task on the vertical execution node y_m^1, and T_{m, y_m^0}^{tr} denotes the transmission delay from camera d_m to the horizontal execution node y_m^0.

The waiting time T_{m, y_m^0}^{wait} of camera d_m's detection task on the horizontal execution node y_m^0 is determined by the detection tasks of the other cameras d_{m′} that are dispatched to the same node and queued ahead of it, where T_{m′, y_m^0}^{tr} denotes the transmission delay of camera d_{m′}'s detection task to the transverse execution node y_m^0, and 𝟙[·] is an indicator function.

The transmission delay T_{m, y_m^0}^{tr} from camera d_m to the horizontal execution node y_m^0 is calculated as

T_{m, y_m^0}^{tr} = K_m / b_{y_m^0, m},

where b_{y_m^0, m} denotes the bandwidth between camera d_m and the horizontal execution node y_m^0, and K_m denotes the actual amount of video-stream data transmitted by camera d_m per second.
The execution time T_{m, y_m^0}^{exe} of camera d_m's detection task on the horizontal execution node y_m^0 is obtained from a regression fit of the execution time whose inputs are: the actual amount of video-stream data K_m transmitted by camera d_m per second; the network parameters ω_{p_m}^0 of the transverse execution network G_{p_m}^0 divided at the transverse division point p_m of the network corresponding to camera d_m's detection task; the output parameters φ_{p_m}^0 produced by G_{p_m}^0 after execution on the horizontal execution node y_m^0; and the computing power C_{y_m^0} of the horizontal execution node y_m^0. The fit uses a regression coefficient of camera d_m with respect to the execution time, a regression coefficient of the output parameters φ_{p_m}^0 with respect to the execution time, a regression coefficient of the transverse execution network G_{p_m}^0 with respect to the execution time, regression coefficients of G_{p_m}^0 on the node y_m^0, a constant term of the time-regression fit, and a regression fitting function for the computing power required to compute the task.
The transmission delay T_{y_m^0, y_m^1}^{tr} of camera d_m's detection task between the horizontal execution node y_m^0 and the vertical execution node y_m^1 is calculated as

T_{y_m^0, y_m^1}^{tr} = φ_{p_m}^0 / b_{y_m^0, y_m^1},

where b_{y_m^0, y_m^1} denotes the bandwidth between the horizontal execution node y_m^0 and the vertical execution node y_m^1, and φ_{p_m}^0 denotes the output parameters of the transverse execution network G_{p_m}^0, divided at the transverse division point p_m of the network corresponding to camera d_m's detection task, after execution on the horizontal execution node y_m^0.

In step S4, the objective function is

min_{P, Y_0, Y_1} T_{Total},

where T_{Total} represents the maximum processing time difference of the video detection tasks periodically generated by the camera cluster, P represents the set of transverse division points of the deep neural network model, Y_0 represents the set of transverse execution nodes of the transverse execution networks divided by the transverse division points in P, Y_1 represents the set of longitudinal execution nodes of the longitudinal execution networks divided by the transverse division points in P, ρ represents the tolerance of the maximum time difference for all cameras to complete their detection tasks at time t, T^max and T^min denote the maximum and minimum execution times of the detection tasks of the video streams of the cameras d_m ∈ 𝒟, and 𝒟 is the set of monitoring terminals.
The step S6 includes the following steps:

S6.1, dividing the longitudinal execution network into l × l fusion areas according to an l × l grid method;

S6.2, establishing the relationship between the network parameters, the input and the output of the longitudinal execution network by using a regression function. The execution time T_{m, y_m^1}^{exe} of camera d_m's detection task on the vertical execution node y_m^1 is obtained from a regression fit whose inputs are: the intermediate parameters φ_{p_m}^0 output by the transverse execution network G_{p_m}^0 after execution on the horizontal execution node y_m^0; the network parameters ω_{p_m}^1 of the vertical execution network G_{p_m}^1 divided at the transverse division point p_m of the network corresponding to camera d_m's detection task; the output parameters φ_{p_m}^1 of G_{p_m}^1 after execution on the vertical execution node y_m^1; and the computing power C_{y_m^1} of the vertical execution node y_m^1. The fit uses a regression coefficient relating the intermediate parameters output by G_{p_m}^0 on node y_m^0 to the execution time of the longitudinal execution network, a regression coefficient of the output parameters of G_{p_m}^1 on node y_m^1, a regression coefficient of the parameters of G_{p_m}^1, a regression-fit constant term on node y_m^1, and a regression fitting function for the computing power required to compute the task on node y_m^1;

S6.3, each vertical execution node y_m^1 performs parallel computation on the received longitudinal execution network, and the computed results are merged and output.
The invention has the beneficial effects that:
the computing power requirement, the time cost and the network parameters of the complex DNN target identification algorithm are modeled and represented by using a regression equation, the transverse segmentation and vertical segmentation methods are combined, the computing power of the base station is fully utilized, and the computing time cost of the complex DNN model is reduced. In addition, the method adopts a solution based on deep reinforcement learning to obtain an approximate optimal solution of dynamic resource allocation, accelerates the cooperative reasoning speed of the edge cloud system, improves the effective utilization rate of system computing power, and has better effect than a baseline through experiments. The model is divided into two parts by the transverse division algorithm according to the DNNs level granularity, so that the computing capacity utilization rate of the system is improved, and the higher cloud service cost is reduced. The longitudinal segmentation algorithm is based on the FTP strategy, so that the memory occupation can be minimized, the transmission between nodes can be reduced, and the parallel computation of the model can be realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flow of a deep learning based MTMCT system.
Fig. 2 is a workflow flow of a base station and cloud based MTMCT system.
Fig. 3 is a network structure diagram of YOLOv4 in the prior art.
Fig. 4 is a schematic diagram of segmentation based on the horizontal segmentation method and the vertical segmentation method.
FIG. 5 is a schematic diagram of a multi-agent reinforcement learning training system.
Fig. 6 is a flow chart of modeling according to the present application.
Fig. 7 is a graph showing the effect of the present application compared with other algorithms.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
A method for establishing a distributed real-time intelligent monitoring system based on intelligent collaborative reasoning, as shown in fig. 1 and 2, includes the following steps:
s1, constructing a video monitoring system comprising a monitoring terminal, a base station and a cloud;
The video monitoring system consists of the monitoring terminals, the base stations and the cloud. 𝒟 = {d_0, d_1, …, d_{M-1}} represents the set of monitoring terminals, i.e. cameras, each of which is an agent, and M denotes the number of monitoring terminals in 𝒟; ε = {e_0, e_1, …, e_{N-1}} represents the set of base stations, and N denotes the number of base stations in ε; c represents the cloud. Each monitoring terminal is wirelessly connected with all base stations, any two base stations are connected by wire, and all base stations are connected to the cloud by wire, so that video streams can be transmitted. Each base station and the cloud are execution nodes responsible for detecting and computing the video streams shot by the cameras so as to realize target tracking. The computing power of execution node i is denoted C_i, i ∈ ε ∪ {c}, and represents the per-bit processing speed of its CPU or GPU. b_{i,m} denotes the bandwidth (in kb/s) between camera d_m and execution node i. Camera d_m generates a video stream of x_m frames per second; because the memory of the execution nodes is limited and a larger batch size consumes more memory and computing power during image detection, it is assumed that each camera drops some frames per second, with the frame-drop rate set to F. The number of valid data frames per second of camera d_m is f_m = (1 − F)·x_m, and the actual amount of transmitted video-stream data is K_m = k·f_m, where k is the average data volume per video frame. If execution node i executes the detection and tracking task of camera d_m's video stream (hereinafter referred to as the detection task of the camera), the transmission delay of camera d_m's video stream to execution node i is expressed as T_{m,i}^{tr} = K_m / b_{i,m}.
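The quantities defined above (effective frame rate, transmitted data volume and upload delay) follow directly from the stated formulas; the sketch below simply restates them in code, and the sample numbers are illustrative assumptions rather than values from the patent.

```python
def effective_frames_per_second(x_m: float, F: float) -> float:
    """f_m = (1 - F) * x_m: frames kept per second after dropping at rate F."""
    return (1.0 - F) * x_m

def transmitted_data_per_second(f_m: float, k: float) -> float:
    """K_m = k * f_m: data volume sent per second, k = average size per frame."""
    return k * f_m

def upload_delay(K_m: float, b_im: float) -> float:
    """T_tr = K_m / b_im: transmission delay to execution node i over bandwidth b_im."""
    return K_m / b_im

# Illustrative (assumed) numbers: 30 fps, 20% frame drop, 50 kb per frame, 10_000 kb/s link.
f_m = effective_frames_per_second(30, 0.2)
K_m = transmitted_data_per_second(f_m, 50)
print(upload_delay(K_m, 10_000))  # seconds of upload delay per second of video
```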
S2, designing a horizontal segmentation model based on a deep neural network by using a horizontal segmentation algorithm, where the output of the horizontal segmentation model comprises a horizontal execution network and a vertical execution network, as shown in FIG. 4; this comprises the following steps:

S2.1, constructing the Deep Neural Network (DNN) model as a directed acyclic graph;

The directed acyclic graph is denoted G = (V, E), where V represents the set of vertices, i.e. the network structure layers of the DNN model, and E represents the set of edges; an edge from vertex v_{j′} to vertex v_j in E indicates that vertex v_j takes the output of vertex v_{j′} in the network structure of the DNN model as its input. The present application uses YOLOv4 as the DNN model of the cross-camera tracking task. As shown in FIG. 3, YOLOv4 contains a large number of residual structures, and the YOLOv4 network is mainly built from three network structures used as network units: CBL, CBM and ResUnit. The network structure of the YOLOv4 network is prior art and is not described in detail in this application.

S2.2, obtaining the longest path from the input layer to the output layer of the DNN model by using a depth-first algorithm;

S2.3, dividing the longest path obtained in step S2.2 into a transverse execution network G_{p_h}^0 and a vertical execution network G_{p_h}^1 by using the transverse division algorithm. The network parameters of the transverse execution network G_{p_h}^0 and the vertical execution network G_{p_h}^1 are expressed as

ω_{p_h}^n = λ_{ResUnit}^n · ω_{ResUnit} + λ_{CBL}^n · ω_{CBL} + λ_{CBM}^n · ω_{CBM}, n ∈ {0, 1},

where, when n = 0, ω_{p_h}^0 denotes the network parameters of the transverse execution network G_{p_h}^0 divided by the transverse division point p_h, and when n = 1, ω_{p_h}^1 denotes the network parameters of the vertical execution network G_{p_h}^1 divided by p_h; P denotes the set of transverse division points, p_h ∈ P, and |P| denotes the number of elements in P; λ_{ResUnit}^n denotes the proportion of ResUnit network structures among the network layers assigned by the transverse division point p_h to G_{p_h}^n, and ω_{ResUnit} denotes the parameters of the ResUnit network structure; λ_{CBL}^n denotes the proportion of CBL (conv+bn+leakyRelu) structures in the network layers of G_{p_h}^n, and ω_{CBL} denotes the parameters of the CBL network structure; λ_{CBM}^n denotes the proportion of CBM (conv+bn+mish) structures in the network layers of G_{p_h}^n, and ω_{CBM} denotes the parameters of the CBM network structure.
S3, constructing the transverse division point decision, the transverse execution node decision and the longitudinal execution node decision of the transverse segmentation model as a Markov decision process, comprising the following steps:

S3.1, constructing a state space, whose expression is

s_m^t = ( T_m, T_{y_m^0, y_m^1}^{tr}, T_{m, y_m^0}^{exe}, T_{m, y_m^0}^{wait}, o_m^t, p_m ),

where T_m denotes the execution time of the detection task of camera d_m's video stream; T_{y_m^0, y_m^1}^{tr} denotes the transmission delay of camera d_m's task between the horizontal execution node y_m^0 and the vertical execution node y_m^1; T_{m, y_m^0}^{exe} denotes the execution time of camera d_m's detection task on the horizontal execution node y_m^0; T_{m, y_m^0}^{wait} denotes the waiting time of camera d_m's detection task on the horizontal execution node y_m^0; o_m^t denotes the state of the base stations and the cloud under the influence of the agents other than the agent of camera d_m, including the number of tasks waiting to be executed, the sizes of the tasks, i.e. the task amounts, and the predicted completion time of the task currently executed in the base stations and the cloud; p_m denotes the network corresponding to camera d_m's video stream; s_m^t denotes the state of camera d_m at time t; y_m^0, y_m^1 ∈ ε ∪ {c}; and d_m ∈ 𝒟.

The execution time T_m of the detection task of camera d_m's video stream is calculated as

T_m = T_{m, y_m^0}^{tr} + T_{m, y_m^0}^{wait} + T_{m, y_m^0}^{exe} + T_{y_m^0, y_m^1}^{tr} + T_{m, y_m^1}^{exe},

where T_{m, y_m^1}^{exe} denotes the execution time of camera d_m's detection task on the vertical execution node y_m^1, and T_{m, y_m^0}^{tr} denotes the transmission delay from camera d_m to the horizontal execution node y_m^0.

Waiting delays may occur when multiple cameras select the same execution node for computation. All transmitted tasks are stored in the task queue of that execution node and executed in sequence according to the First-In-First-Out (FIFO) principle. The waiting time T_{m, y_m^0}^{wait} of camera d_m's detection task on the horizontal execution node y_m^0 is therefore determined by the detection tasks of the other cameras d_{m′} that are dispatched to the same node and queued ahead of it, where T_{m′, y_m^0}^{tr} denotes the transmission delay of camera d_{m′}'s detection task to the horizontal execution node y_m^0, and 𝟙[·] is an indicator function whose value is 1 when its argument is true and 0 otherwise.

The transmission delay T_{m, y_m^0}^{tr} from camera d_m to the horizontal execution node y_m^0 is calculated as

T_{m, y_m^0}^{tr} = K_m / b_{y_m^0, m},

where b_{y_m^0, m} denotes the bandwidth between camera d_m and the horizontal execution node y_m^0.
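The FIFO waiting time described above can be estimated as in the sketch below; since the exact waiting-time formula appears only as an image in the source, the queue simulation here is one straightforward FIFO interpretation and should be read as an assumption.

```python
def fifo_waiting_time(arrival, tasks_ahead):
    """Waiting time of a detection task that reaches a shared execution node at
    time `arrival`, given the (arrival_time, execution_time) pairs of the tasks
    queued ahead of it.  Tasks are served strictly in arrival order (FIFO)."""
    node_free_at = 0.0
    for a, e in sorted(tasks_ahead):       # earlier arrivals are served first
        start = max(node_free_at, a)
        node_free_at = start + e
    return max(0.0, node_free_at - arrival)

# Two earlier tasks occupy the node; our task arrives 0.3 s after the first one.
print(fifo_waiting_time(0.3, [(0.0, 0.5), (0.1, 0.4)]))  # -> 0.6
```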
A regression function is used to construct the relationship between the network parameters, the input and the output of the transverse execution network G_{p_m}^0 divided at the transverse division point p_m of the network corresponding to camera d_m's detection task. The execution time T_{m, y_m^0}^{exe} of camera d_m's detection task on the horizontal execution node y_m^0 is obtained from a regression fit whose inputs are: the actual amount of video-stream data K_m transmitted by camera d_m per second; the network parameters ω_{p_m}^0 of the transverse execution network G_{p_m}^0; the output parameters φ_{p_m}^0 produced by G_{p_m}^0 after execution on the horizontal execution node y_m^0; and the computing power C_{y_m^0} of the horizontal execution node y_m^0. The fit uses a regression coefficient of camera d_m with respect to the execution time, a regression coefficient of the output parameters φ_{p_m}^0 with respect to the execution time, a regression coefficient of the transverse execution network G_{p_m}^0 with respect to the execution time, regression coefficients of G_{p_m}^0 on the node y_m^0, a constant term of the time-regression fit, and a regression fitting function for the computing power required to compute the task.
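The regression fit of execution time against data volume, network parameters, intermediate output size and node computing power can be set up as an ordinary least-squares problem, as sketched below; the linear-in-features form and the synthetic profiling data are assumptions for illustration, not the patent's exact regression.

```python
import numpy as np

# Feature rows: [K_m, omega0 (network params), phi0 (intermediate output size), 1 / C_node]
# Target: measured execution time on the horizontal execution node (seconds, assumed).
X = np.array([
    [1200.0, 3.0e6, 2.0e5, 1 / 8.0],
    [1800.0, 3.0e6, 2.0e5, 1 / 4.0],
    [1200.0, 5.0e6, 3.5e5, 1 / 8.0],
    [2400.0, 5.0e6, 3.5e5, 1 / 2.0],
])
y = np.array([0.09, 0.17, 0.14, 0.55])          # assumed profiling measurements

A = np.hstack([X, np.ones((len(X), 1))])        # add a constant term to the fit
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_exec_time(K_m, omega0, phi0, C_node):
    """Predict T_exec for a new (data volume, params, output size, compute) tuple."""
    return float(np.dot(coef, [K_m, omega0, phi0, 1 / C_node, 1.0]))

print(predict_exec_time(1500.0, 4.0e6, 2.5e5, 6.0))
```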
The transmission delay T_{y_m^0, y_m^1}^{tr} of camera d_m's detection task between the horizontal execution node y_m^0 and the vertical execution node y_m^1 is calculated as

T_{y_m^0, y_m^1}^{tr} = φ_{p_m}^0 / b_{y_m^0, y_m^1},

where b_{y_m^0, y_m^1} denotes the bandwidth between the horizontal execution node y_m^0 and the vertical execution node y_m^1, and φ_{p_m}^0 denotes the output parameters of the transverse execution network G_{p_m}^0, divided at the transverse division point p_m of the network corresponding to camera d_m's detection task, after execution on the horizontal execution node y_m^0.
S3.2, constructing an action space, whose expression is

a_m^t = ( p_m, y_m^0, y_m^1 ),

where a_m^t denotes the action, also called the decision, corresponding to camera d_m's video stream at time t, i.e. the transverse division point decision together with the transverse and longitudinal execution node decisions.

S3.3, constructing a reward function r_m^t(s_0^t, …, s_{M-1}^t, a_0^t, …, a_{M-1}^t), where μ denotes a penalty weight, ρ denotes the tolerance of the maximum time difference for all cameras to complete their detection tasks at time t, Z denotes the maximum time tolerance for a single camera to complete its detection task, a_0^t and a_{M-1}^t denote the actions of cameras d_0 and d_{M-1} at time t, s_0^t and s_{M-1}^t denote the states of cameras d_0 and d_{M-1} at time t, r_m^t denotes the reward of camera d_m at time t, and T^max and T^min denote the maximum and minimum execution times of the detection tasks of the cameras' video streams. The reward reflects the difference between the completion times of the detection tasks generated by all cameras at time t.
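Because the reward in S3.3 is reproduced only as an image in the source, its closed form is not recoverable; the sketch below encodes one plausible shaping consistent with the stated ingredients (the max–min completion-time gap, the penalty weight μ, the cross-camera tolerance ρ and the per-camera limit Z) and is an assumption rather than the patent's exact formula.

```python
def cooperative_reward(completion_times, mu=10.0, rho=0.2, Z=1.0):
    """One plausible reward consistent with the description: penalise the spread
    of completion times across cameras at time t, with extra penalties when the
    spread exceeds rho or any single camera exceeds its time tolerance Z
    (all numeric defaults are assumed)."""
    gap = max(completion_times) - min(completion_times)
    reward = -gap
    if gap > rho:                     # cameras no longer "as synchronised as possible"
        reward -= mu
    if max(completion_times) > Z:     # a single camera exceeded its time budget
        reward -= mu
    return reward

# Three cameras finishing their detection tasks at slightly different times (seconds).
print(cooperative_reward([0.42, 0.55, 0.48]))
```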
S4, the cloud constructs a division point and execution device decision model by using a Deep Reinforcement Learning (DRL) algorithm based on the DDQN (Double Deep Q-Network), taking the minimized task processing time difference as the objective function, thereby solving the Markov decision process;
The expression of the objective function is

min_{P, Y_0, Y_1} T_{Total},

where T_{Total} represents the maximum processing time difference of the video analysis tasks periodically generated by the camera cluster, P represents the set of transverse division points of the DNN model, Y_0 represents the set of transverse execution nodes of the transverse execution networks divided by the transverse division points in P, and Y_1 represents the set of longitudinal execution nodes of the longitudinal execution networks divided by the transverse division points in P.
The method for constructing the segmentation point execution equipment decision model comprises the following steps:
a, setting the total number of training rounds and initializing the training period τ to 1;
b, continuously generating video streams by each camera at the time t, and obtaining the action of each camera according to the current DDQN network;
c, distributing the video stream to different horizontal execution nodes according to the action;
d, after each transverse execution node finishes the transverse division task, transmitting the intermediate parameters to the longitudinal execution node, and finally finishing the execution of the detection task;
for cross-camera tracking tasks, all cameras should work in concert, i.e. the simultaneously generated detection tasks should be "as synchronized as possible" within a certain threshold.
e, all the longitudinal execution nodes transmit respective target recognition results to the cloud, and the cloud finishes cross-camera multi-target tracking and obtains a final detection result;
f, the cloud simultaneously obtains the reward r_m^t of each agent under the current action and the new state s_m^{t+1}, and stores the transition (s_m^t, a_m^t, r_m^t, s_m^{t+1}) into the experience pool;
g, the cloud continuously samples transitions (s_m^t, a_m^t, r_m^t, s_m^{t+1}) from the experience pool through experience replay, trains and updates the multi-agent DDQN model, and makes the decision for the next moment t + 1;
As shown in fig. 5 and fig. 6, the DDQN has an evaluation network and a target network, and it eliminates the over-estimation problem of DQN by decoupling action selection from the evaluation of the Q value of the next action; the target network periodically copies its weights from the evaluation network. That is, the DDQN selects the best action according to the evaluation network with parameters θ_t and obtains the Q value from the target network with parameters θ_t^-. In distributed execution, the DDQN weights obtained after centralized training are shared with all agents. Each agent has a homogeneous state, action and reward space, which reduces the number of weights that must be trained.

In this application, each camera is modeled as an agent. The agent corresponding to each camera d_m ∈ 𝒟 selects an action a_m^t ∈ A according to the policy π and the current state s_m^t ∈ S, where S denotes the set of all states and A denotes the set of all actions. After the action a_m^t interacts with the environment, the agent of camera d_m obtains the reward r_m^t and transitions to the new state s_m^{t+1}, and learns the policy π through the continuing reward R so as to maximize the cumulative reward G_t. The goal of each agent is to maximize G_t, and the DDQN target value is computed as

R_{t+1} + γ · Q( S_{t+1}, argmax_a Q(S_{t+1}, a; θ_t); θ_t^- ),

where γ is the discount coefficient, R_{t+1} is the reward obtained at time t + 1, Q(S_{t+1}) denotes the Q value obtained at time t + 1, argmax_a Q(S_{t+1}, a; θ_t) denotes the action that maximizes the state-action value function, and θ_t^- denotes the parameters of the target network. Each agent observes its state s_m^t at time t. The base stations and the cloud follow the FIFO (first in, first out) principle and need to cooperate to complete the resource allocation decision of the multi-agent system, namely the transverse division point decision and the computation offloading decision, i.e. the transverse and longitudinal execution node decisions. Finding an optimal policy π* for real-time load balancing of multiple continuous video analysis tasks in a distributed scenario is challenging. The goal of the EI-ISS is to minimize the processing time difference of the video analysis tasks periodically generated by the camera cluster: it not only considers minimizing the processing time of each camera's video task, but also ensures that the time difference between cameras completing tasks generated at the same moment is minimized.
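The decoupled DDQN target described above (action selected by the evaluation network with parameters θ_t, value taken from the target network with parameters θ_t^-) can be written compactly as in the following PyTorch sketch; the small fully connected Q-network is an assumed stand-in for the agents' actual networks.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Generic Q-network (assumed architecture, not the patent's)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s):
        return self.net(s)

def ddqn_target(reward, next_state, gamma, eval_net, target_net):
    """y = r + gamma * Q_target(s', argmax_a Q_eval(s', a)): the DDQN target,
    with action selection and value estimation decoupled across the two networks."""
    with torch.no_grad():
        best_action = eval_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * next_q

# Example: a batch of 2 transitions with a 6-dimensional state and 4 discrete actions.
eval_net, target_net = QNet(6, 4), QNet(6, 4)
target_net.load_state_dict(eval_net.state_dict())   # periodic weight copy
y = ddqn_target(torch.tensor([0.5, -0.1]), torch.randn(2, 6), 0.99, eval_net, target_net)
print(y.shape)  # torch.Size([2])
```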
S5, each monitoring terminal respectively inputs the generated video stream into a dividing point executing equipment decision model, uploads the video stream to a transverse executing node according to a transverse executing node decision generated by the dividing point executing equipment decision model, the transverse executing node completes transverse division and execution of the model by using the transverse dividing model, and sends network parameters after transverse execution to a longitudinal executing node, and the method comprises the following steps:
s5.1, each monitoring terminal inputs the generated video stream into a decision model of a segmentation point execution device, and the decision model of the segmentation point execution device generates a transverse segmentation point decision, a transverse execution node decision and a longitudinal execution node decision;
s5.2, the monitoring terminals respectively transmit the respective video streams to the corresponding transverse execution nodes according to the transverse execution node decision;
s5.3, the transverse execution node utilizes the transverse division model to divide the network model corresponding to the video stream according to the transverse division point decision, and transverse execution of the transverse execution network is completed;
and S5.4, the transverse execution node transmits the network parameters after transverse execution to the longitudinal execution node corresponding to the longitudinal execution node decision.
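As a reading aid for S5, the per-camera dispatch flow can be sketched as follows; the decision_model callable and the node objects with run_transverse/run_vertical methods are hypothetical placeholders for the decision model and the node-side execution interface, which the patent does not specify.

```python
def dispatch_camera_stream(camera_id, video_chunk, decision_model, nodes):
    """Sketch of S5.1-S5.4: query the division-point/execution-device decision
    model, upload the stream to the chosen horizontal node, and have that node
    run the transverse part before forwarding intermediates to the vertical node."""
    # S5.1: decision model returns (division point, horizontal node, vertical node)
    p_m, y0, y1 = decision_model(camera_id, video_chunk)

    # S5.2: upload the video stream to the chosen horizontal execution node
    horizontal_node = nodes[y0]

    # S5.3: the horizontal node divides the model at p_m and runs the transverse part
    intermediates = horizontal_node.run_transverse(video_chunk, division_point=p_m)

    # S5.4: forward the intermediate network parameters to the vertical execution node
    return nodes[y1].run_vertical(intermediates, division_point=p_m)
```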
S6, the vertical execution nodes respectively use the vertical division algorithm to vertically divide and execute their vertical execution networks, i.e. to complete the target identification of the MTMCT, comprising the following steps:

S6.1, dividing the neural network of the longitudinal execution network into l × l fusion areas according to an l × l grid method;

S6.2, constructing the relationship between the network parameters, the input and the output of the vertical execution network G_{p_m}^1 by using a regression function. The execution time T_{m, y_m^1}^{exe} of camera d_m's detection task on the vertical execution node y_m^1 is obtained from a regression fit whose inputs are: the intermediate parameters φ_{p_m}^0 output by the transverse execution network G_{p_m}^0 after execution on the horizontal execution node y_m^0; the network parameters ω_{p_m}^1 of the vertical execution network G_{p_m}^1 divided at the transverse division point p_m of the network corresponding to camera d_m's detection task; the output parameters φ_{p_m}^1 of G_{p_m}^1 after execution on the vertical execution node y_m^1; and the computing power C_{y_m^1} of the vertical execution node y_m^1. The fit uses a regression coefficient relating the intermediate parameters output by G_{p_m}^0 on node y_m^0 to the execution time of the longitudinal network, a regression coefficient of the output parameters of G_{p_m}^1 on node y_m^1, a regression coefficient of the parameters of G_{p_m}^1, a regression-fit constant term on node y_m^1, and a regression fitting function for the computing power required to compute the task on node y_m^1.

S6.3, each vertical execution node y_m^1 performs parallel computation on the received parameters output by the horizontal execution nodes, and the computed results are merged and output by a fully connected layer.
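S6.1 and S6.3 amount to splitting the input of the vertical execution network into an l × l grid of fusion areas, computing the tiles in parallel, and merging the results; the NumPy sketch below shows the tiling and merging only. A real FTP-style partition would also add per-tile overlap determined by the receptive field, which is omitted here because the text does not spell it out.

```python
import numpy as np

def tile_feature_map(fmap, l):
    """Split an H x W x C feature map into an l x l grid of tiles (fusion areas)."""
    h_splits = np.array_split(np.arange(fmap.shape[0]), l)
    w_splits = np.array_split(np.arange(fmap.shape[1]), l)
    return [[fmap[np.ix_(hs, ws)] for ws in w_splits] for hs in h_splits]

def process_tile(tile):
    """Placeholder for running the vertical execution network on one tile."""
    return tile * 2.0          # stands in for the real per-tile inference

def merge_tiles(tiles_2d):
    """Merge the l x l grid of processed tiles back into one feature map."""
    rows = [np.concatenate(row, axis=1) for row in tiles_2d]   # stitch along width
    return np.concatenate(rows, axis=0)                        # stitch along height

fmap = np.random.rand(8, 8, 16)
tiles = tile_feature_map(fmap, l=2)
processed = [[process_tile(t) for t in row] for row in tiles]
print(merge_tiles(processed).shape)  # (8, 8, 16)
```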
And S7, the cloud receives the target identification results from each longitudinal execution node, cross-camera track matching is completed by adopting a Deepsort track matching algorithm, and finally a cross-camera multi-target tracking result is obtained.
In this application, regression and parameter fitting are performed on the computing resources of the cameras, the edge devices and the cloud, respectively, to obtain the corresponding dynamic environment representation. Then, in the online decision-making stage, the multi-agent system makes division point decisions according to the real dynamic network environment and the user requirements, and finally obtains the division result. The necessity of performing cross-camera inference with the division point and execution device decision model is set forth below:
as shown in fig. 1, a plurality of cameras are on and continuously generating a video stream. The camera requests to offload the video tracking task to the base station or the cloud node. And according to the decision result of the MADRL, the camera unloads the first part of the task to a transverse execution node, the node stores a pre-configuration file of the model, transversely divides the model and executes a transverse processing task. After that, the horizontal execution node needs to transmit the intermediate parameters to the node performing the vertical segmentation, and the vertical execution node accepts the network intermediate parameters and performs the vertical model segmentation and the parallel computation of the network according to the pre-configuration file. And uploading and storing the final target recognition result to a cloud center, and firstly performing camera internal tracking recognition on a single camera by the cloud center by using a DeepSort track matching algorithm. After the target recognition tasks are completed by the cameras, the cloud center completes track analysis and recognition among the cameras, and finally a tracking result is obtained. It can be seen that, in this process, the communication, calculation, and cache resources of the base station, that is, the edge and cloud systems, are all invoked, and therefore, in this process, the present application needs to be used to implement the joint allocation of the three-dimensional resources.
To demonstrate the superiority of the present application in a collaborative multitask flow scenario, a greedy algorithm and a random algorithm are compared with the present application, as shown in fig. 7. The epsilon-greedy algorithm makes locally optimal decisions based on the current tasks, resource availability and delay constraints; its discount factor and minimum greedy rate are 0.0001 and 0.3, respectively. The random algorithm makes random decisions and does not consider the cooperative relationship between the edge devices. The MTMCT system requires coordinated cross-camera tracking, and large differences in task processing time between cameras can result in loss and duplication of tracked objects. Therefore, based on a user-defined time threshold ρ, a metric is devised to measure the success rate of collaboration between tasks. As can be seen from the figure, the cooperation rate of the DDQN gradually increases and converges, finally meeting the cooperation time requirement set by the user, while the cooperation rate of the greedy algorithm only fluctuates around 0.4, which demonstrates the cooperative superiority of the cross-camera tracking cooperation system.
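The collaboration success-rate metric can be computed as in the short sketch below, under the assumption that a decision round counts as cooperative when the spread between the slowest and fastest camera completion times does not exceed the user-defined threshold ρ; the sample completion times are illustrative only:

# Cooperation rate: fraction of rounds whose max-min completion-time spread <= rho.
from typing import List

def cooperation_rate(rounds: List[List[float]], rho: float) -> float:
    """rounds: per-round list of task completion times, one entry per camera."""
    ok = sum(1 for times in rounds if max(times) - min(times) <= rho)
    return ok / len(rounds)

completion_times = [
    [0.42, 0.45, 0.40],   # round 1: spread 0.05 s -> cooperative for rho = 0.1
    [0.38, 0.61, 0.44],   # round 2: spread 0.23 s -> not cooperative
    [0.50, 0.52, 0.55],   # round 3: spread 0.05 s -> cooperative
]
print(cooperation_rate(completion_times, rho=0.1))   # 2 of 3 rounds cooperate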
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for establishing a distributed real-time intelligent monitoring system based on intelligent collaborative reasoning is characterized by comprising the following steps:
S1, constructing a video monitoring system comprising a monitoring terminal, a base station and a cloud;
S2, establishing a horizontal segmentation model based on a deep neural network by using a horizontal segmentation algorithm and a vertical segmentation algorithm, wherein the output of the horizontal segmentation model comprises a horizontal execution network and a vertical execution network;
S3, constructing the transverse division point decision, the transverse execution node decision and the longitudinal execution node decision of the transverse division model as a Markov decision process;
S4, the cloud constructs a division point execution equipment decision model by using the DDQN multi-agent deep reinforcement learning algorithm, taking the minimized task processing time difference as the objective function;
S5, each monitoring terminal inputs its generated video stream into the division point execution equipment decision model and uploads the video stream to the corresponding transverse execution node according to the transverse execution node decision generated by the decision model; the transverse execution node completes the transverse division and execution of the model by using the transverse division model and sends the network parameters after transverse execution to the corresponding longitudinal execution node;
S6, the longitudinal execution nodes respectively use the longitudinal division algorithm to longitudinally divide and execute their respective longitudinal execution networks according to the longitudinal execution node decision generated by the division point execution equipment decision model;
S7, the cloud receives the execution results from the longitudinal execution nodes and completes cross-camera track matching by using a track matching algorithm.
2. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 1, wherein the step S2 comprises the following steps:
S2.1, constructing the deep neural network model as a directed acyclic graph;
S2.2, obtaining the longest path from the input layer to the output layer in the deep neural network model by using a depth-first algorithm;
S2.3, dividing the longest path obtained in step S2.2 into a transverse execution network and a vertical execution network at a transverse division point p h by using the transverse division algorithm; the network parameters of the two networks are indexed by n, where n is 0 or 1: when n is 0 they are the network parameters of the transverse execution network divided at the transverse division point p h, and when n is 1 they are the network parameters of the vertical execution network divided at p h; they are expressed in terms of the number ratio of Resunit network structures that the transverse division point p h places in the divided network and the parameter ω Resunit of the Resunit network structure, the number ratio of CBL (conv+bn+leakyRelu) network structures in the divided network and the parameter ω CBL of the CBL network structure, and the number ratio of CBM (conv+bn+mish) network structures in the divided network and the parameter ω CBM of the CBM network structure.
3. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 1, wherein the step S3 comprises the following steps:
S3.1, constructing a state space; the state of camera d m at time t comprises: the execution time of the detection task of the video stream of camera d m; the transmission delay of camera d m between its two execution nodes; the execution time of the detection task of the video stream of camera d m on the execution node; the waiting time of the detection task of the video stream of camera d m on the execution node; the state of the base stations and the cloud at time t, apart from the agent of camera d m, comprising the number of tasks waiting to be executed, the task amount and the predicted completion time of the currently executing tasks in the base stations and the cloud; and p m, the network corresponding to the video stream of camera d m; here ε denotes the set of base stations, the execution nodes are drawn from the base stations and the cloud, and camera d m belongs to the set of monitoring terminals;
S3.2, constructing an action space, whose elements are the actions of camera d m at time t;
S3.3, constructing a reward function, in which μ denotes a penalty weight, ρ denotes the tolerance of the maximum time difference with which all cameras complete the detection task at time t, and Z denotes the maximum time tolerance for a single camera to complete the detection task; the reward function of camera d m at time t is defined over the actions of cameras d 0 through d M-1 at time t in the action space and their states at time t in the state space, and is expressed in terms of the maximum execution time and the shortest execution time of the detection task of the video stream of camera d m.
4. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 3, wherein in step S3.1 the execution time of the detection task of the video stream of camera d m is calculated from the execution time of the detection task of the video stream of camera d m on the vertical execution node and the transmission delay from camera d m to the horizontal execution node.
5. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 4, wherein the waiting time of the detection task of the video stream of camera d m on the horizontal execution node is calculated from the delays with which the other cameras d m' transmit the detection tasks of their video streams to the transverse execution node, together with an indicator function.
6. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 4, wherein the transmission delay from camera d m to the horizontal execution node is calculated from the bandwidth speed between camera d m and the horizontal execution node and from K m, the actual amount of video stream data transmitted by camera d m per second.
7. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 3, wherein the execution time of the detection task of the video stream of camera d m on the horizontal execution node is calculated from: the regression parameter of camera d m corresponding to the execution time; the output parameters, after execution on the horizontal execution node, of the transverse execution network divided at the transverse division point p m of the network corresponding to the detection task of the video stream of camera d m, together with their execution-time regression parameters; the execution-time regression parameter of the horizontal execution network and its regression parameters on the horizontal execution node; the constant term of the execution-time regression fit; K m, the actual amount of video stream data transmitted by camera d m per second; the computing power of the horizontal execution node and the regression fitting function of the computing power required for the calculation; and the network parameters of the transverse execution network divided at the transverse division point p m.
8. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 3, wherein the transmission delay of the detection task of the video stream of camera d m between the horizontal execution node and the vertical execution node is calculated from the bandwidth speed between the horizontal execution node and the vertical execution node and from the output parameters produced on the horizontal execution node by the transverse execution network divided at the transverse division point p m of the network corresponding to the detection task of the video stream of camera d m.
9. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 1, wherein in step S4 the objective function minimizes T Total, the maximum processing time difference of the video detection tasks periodically generated by the camera cluster, where P denotes the set of transverse division points of the deep neural network model, Y 0 denotes the set of transverse execution nodes of the transverse execution networks divided by the transverse division points in P, Y 1 denotes the set of longitudinal execution nodes of the longitudinal execution networks divided by the transverse division points in P, ρ denotes the tolerance of the maximum time difference with which all cameras complete the detection task at time t, and T Total is expressed in terms of the maximum execution time and the shortest execution time of the detection task of the video stream of each camera d m in the set of monitoring terminals.
10. The method for establishing a distributed real-time intelligent monitoring system based on intelligent cooperative reasoning according to claim 1, wherein the step S6 comprises the following steps:
S6.1, dividing the longitudinal execution network into l×l fusion areas according to an l×l grid method;
S6.2, establishing, by means of a regression function, the relation among the network parameters, the input and the output of the longitudinal execution network; the quantities entering this relation are: the regression-fit parameters relating the intermediate parameters output by the transverse execution network after execution on the transverse execution node to the execution time of the longitudinal execution network; the regression-fit parameters of the output parameters of the longitudinal execution network after execution on the longitudinal execution node; the regression-fit parameters of the parameters of the longitudinal execution network; the output parameters, after execution on the longitudinal execution node, of the longitudinal execution network divided at the transverse division point p m of the network corresponding to the detection task of the video stream of camera d m; the regression-fit constant term of the longitudinal execution network on the longitudinal execution node; the computing power of the longitudinal execution node; the regression fitting function of the computing power required for the calculation, together with its regression parameters; the network parameters of the longitudinal execution network divided at the transverse division point p m; the execution time of the detection task of the video stream of camera d m on the longitudinal execution node; and the output parameters, after execution on the transverse execution node, of the transverse execution network divided at the transverse division point p m;
S6.3, each longitudinal execution node carries out parallel computation on the received longitudinal execution network, and the computed results are merged and output.
CN202210576950.3A 2022-05-25 2022-05-25 Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning Pending CN114815755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210576950.3A CN114815755A (en) 2022-05-25 2022-05-25 Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210576950.3A CN114815755A (en) 2022-05-25 2022-05-25 Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning

Publications (1)

Publication Number Publication Date
CN114815755A true CN114815755A (en) 2022-07-29

Family

ID=82517695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210576950.3A Pending CN114815755A (en) 2022-05-25 2022-05-25 Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning

Country Status (1)

Country Link
CN (1) CN114815755A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117834643A (en) * 2024-03-05 2024-04-05 Nanjing University of Posts and Telecommunications Deep neural network collaborative reasoning method for industrial Internet of things
CN117834643B (en) * 2024-03-05 2024-05-03 Nanjing University of Posts and Telecommunications Deep neural network collaborative reasoning method for industrial Internet of things

Similar Documents

Publication Publication Date Title
Baccour et al. Pervasive AI for IoT applications: A survey on resource-efficient distributed artificial intelligence
CN109818786B (en) Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center
Zhang et al. Towards green metaverse networking: Technologies, advancements and future directions
Djigal et al. Machine and deep learning for resource allocation in multi-access edge computing: A survey
Hou et al. Distredge: Speeding up convolutional neural network inference on distributed edge devices
CN113760511B (en) Vehicle edge calculation task unloading method based on depth certainty strategy
Zhou et al. Edge computation offloading with content caching in 6G-enabled IoV
Qi et al. Vehicular edge computing via deep reinforcement learning
WO2023175335A1 (en) A time-triggered federated learning algorithm
Xiao et al. Toward collaborative occlusion-free perception in connected autonomous vehicles
CN113626104A (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN116669111A (en) Mobile edge computing task unloading method based on blockchain
CN114815755A (en) Method for establishing distributed real-time intelligent monitoring system based on intelligent cooperative reasoning
CN116263681A (en) Mobile edge computing task unloading method, device, equipment and storage medium
Cui et al. Multi-Agent Reinforcement Learning Based Cooperative Multitype Task Offloading Strategy for Internet of Vehicles in B5G/6G Network
Peng et al. Dynamic visual SLAM and MEC technologies for B5G: a comprehensive review
Henna et al. Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies
Ju et al. eDeepSave: Saving DNN inference using early exit during handovers in mobile edge environment
Wu et al. Adaptive client and communication optimizations in Federated Learning
CN115208892B (en) Vehicle-road collaborative online task scheduling method and system based on dynamic resource demand
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Huai et al. Towards deep learning on resource-constrained robots: A crowdsourcing approach with model partition
CN114916013A (en) Method, system and medium for optimizing unloading time delay of edge task based on vehicle track prediction
Li et al. ESMO: Joint frame scheduling and model caching for edge video analytics
CN114022731A (en) Federal learning node selection method based on DRL

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination