CN116260990B - AI asynchronous detection and real-time rendering method and system for multipath video streams - Google Patents

AI asynchronous detection and real-time rendering method and system for multipath video streams

Info

Publication number
CN116260990B
CN116260990B (application CN202310549343.2A)
Authority
CN
China
Prior art keywords
detection
frame
video
module
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310549343.2A
Other languages
Chinese (zh)
Other versions
CN116260990A (en)
Inventor
宋艳枝
金晨曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Gauss Intelligent Technology Co ltd
Original Assignee
Hefei Gauss Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Gauss Intelligent Technology Co ltd filed Critical Hefei Gauss Intelligent Technology Co ltd
Priority to CN202310549343.2A priority Critical patent/CN116260990B/en
Publication of CN116260990A publication Critical patent/CN116260990A/en
Application granted granted Critical
Publication of CN116260990B publication Critical patent/CN116260990B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/439Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using cascaded computational arrangements for performing a single operation, e.g. filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23412Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234309Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of multipath video stream processing and solves the technical problems of low rendering performance, significant resource occupation and delay in serial synchronous processing in the prior art. It provides an AI asynchronous detection and real-time rendering method and system for multipath video streams, the method comprising the following steps: S1, multiple camera streams are split into frames by a video decoding module to obtain each frame image, and the latest frame image is recorded as the Nth frame image; S2, the AI detection engine module determines the frame-extraction interval and the corresponding detection algorithm model according to its own performance evaluation, and at fixed time intervals sends the Nth frame image obtained from S1 into the detection algorithm model for the relevant detection and identification. The invention adopts a parallel asynchronous rendering scheme that greatly reduces the rendering delay of real-time video streams; when processing video data of the same order of magnitude and loading the same algorithm model, it has a lower resource occupancy rate and greatly improves resource utilization efficiency.

Description

AI asynchronous detection and real-time rendering method and system for multipath video streams
Technical Field
The invention relates to the technical field of multipath video stream processing, in particular to an AI asynchronous detection and real-time rendering method and system for multipath video streams.
Background
Most existing methods for AI detection and rendering of real-time video streams work as follows: the video stream is obtained from a camera and decoded and frame-extracted through OpenCV/FFmpeg; the obtained video frames are converted into images in a corresponding format (such as RGB); each frame image is sent into an AI model for detection, classification and identification of the relevant targets; the detection results output by the AI model are drawn on each frame image through OpenCV; and finally each frame image is video-encoded through FFmpeg to output a video stream in H264/H265 format.
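For illustration, the following Python sketch shows this conventional serial pipeline with OpenCV; the stream URL, output parameters and the detector are hypothetical stand-ins, not the code of any particular prior-art system:

```python
import cv2  # OpenCV, as used in the conventional pipeline described above

class StubDetector:                      # stand-in for a real AI model
    def detect(self, frame):
        return [(100, 100, 300, 300)]    # fixed dummy detection box

model = StubDetector()
cap = cv2.VideoCapture("rtsp://camera/stream")          # hypothetical camera URL
writer = cv2.VideoWriter("out.mp4",
                         cv2.VideoWriter_fourcc(*"avc1"),
                         25.0, (1280, 720))              # assumed 25 FPS, 720p

while True:
    ok, frame = cap.read()              # video decoding and frame extraction
    if not ok:
        break
    boxes = model.detect(frame)         # the whole loop blocks here, so the
                                        # output FPS is capped by inference speed
    for (x1, y1, x2, y2) in boxes:
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)  # draw results
    writer.write(frame)                 # encode each rendered frame

cap.release()
writer.release()
```

Every stage in this loop waits for the previous one, which is exactly the serial-synchronous bottleneck criticized below.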
However, the above AI detection and rendering have certain drawbacks, mainly:
1. each step of the scheme runs serially and synchronously, so the output FPS depends on the inference speed of the AI model and rendering performance is low;
2. one AI model can only serve one video stream, so multipath video rendering capability is lacking; if multiple video streams need to be rendered, multiple AI models must be loaded and resource occupation increases significantly;
3. the real-time video frames acquired from the camera spend a lot of time in steps such as image transcoding, AI detection and video encoding, producing a large delay, so that the real-time picture seen by the rendering terminal deviates greatly in time from the picture at the current moment in the real scene.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an AI asynchronous detection and real-time rendering method and system for multipath video streams, which solve the technical problems of low rendering performance, significant resource occupation and delay in serial synchronous processing in the prior art.
In order to solve the technical problems, the invention provides the following technical scheme: an AI asynchronous detection and real-time rendering method of multipath video stream comprises the following steps:
S1, multiple camera streams are split into frames by a video decoding module to obtain each frame image, and the latest frame image is recorded as the Nth frame image;
S2, the AI detection engine module determines the frame-extraction interval and the corresponding detection algorithm model according to its own performance evaluation, sends the Nth frame image obtained from S1 into the detection algorithm model for the relevant detection and identification at fixed time intervals, and pushes the detection and identification result into the corresponding result queue to obtain the detection result of the Nth frame image;
S3, the inter-frame prediction module predicts the position and state of the detection frame in the Nth frame image by a Kalman filtering algorithm, based on the Nth frame image and the (N-1)th frame image whose detection result has already been output, according to the historical position and displacement speed of the detection frame, so as to obtain the Nth frame image data;
S4, the image drawing and rendering module draws the final video frame according to the information of the Nth frame image data and the detection result of the Nth frame image;
S5, the final video frame is encoded and pushed through the video encoding module and the real-time streaming module, so that the user terminal can see the real-time picture after AI rendering.
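To make the asynchronous structure of steps S1-S5 concrete, the following toy skeleton (a sketch, not the patent's implementation; the frame payloads and detection results are dummy stand-ins) decouples decoding and detection into independent threads that communicate through shared state and a result queue, so the render path never waits for the model:

```python
import queue
import threading
import time

latest = {}               # stream_id -> (frame_no, image): newest decoded frame
results = queue.Queue()   # (stream_id, frame_no, boxes) produced by the detector

def decode_loop(stream_id):
    """S1: split the stream into frames; only the latest frame is kept."""
    n = 0
    while True:
        n += 1
        latest[stream_id] = (n, f"frame-{n}")   # stand-in for a decoded image
        time.sleep(1 / 25)                      # 25 FPS source

def detect_loop(stream_id):
    """S2: sample the latest frame at the engine's own frame-extraction interval."""
    while True:
        n, image = latest.get(stream_id, (0, None))
        if image is not None:
            results.put((stream_id, n, [(10, 10, 50, 50)]))  # dummy detection
        time.sleep(0.2)                         # detector-paced, slower than 25 FPS

threading.Thread(target=decode_loop, args=(0,), daemon=True).start()
threading.Thread(target=detect_loop, args=(0,), daemon=True).start()

# S3/S4 (sketched): the renderer reads the newest frame every tick, folds in any
# finished detections (or a Kalman prediction of them) and never blocks on the model.
for _ in range(5):
    time.sleep(1 / 25)
    frame = latest.get(0)
    while not results.empty():
        _, n_det, boxes = results.get()         # consume completed detections
    # draw the (predicted or detected) boxes on `frame`, then encode and push
```

The key design choice is that the decoder overwrites a single "latest frame" slot instead of queueing every frame, so the detector always works on the freshest image and stale frames are never processed.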
Further, in step S2, the AI detection engine module determines the frame-extraction interval and the corresponding detection algorithm model according to its own performance evaluation; the specific process includes the following steps:
S21, the AI detection engine module automatically reads the output frame rate of the multipath video streams, recording the output frame rate as r and the number of video streams as N;
S22, based on the loaded detection algorithm model, the AI detection engine module judges whether the processing frame rate FPS of the current detection algorithm model for the current input streams is enough to cope with the N video streams, i.e. judges whether the processing frame rate FPS of the current detection algorithm model is greater than or equal to N×r;
if yes, it judges whether the processing frame rate FPS of the current detection algorithm model is greater than or equal to 2×N×r;
if the processing frame rate FPS is greater than or equal to 2×N×r, half of the currently loaded detection algorithm models are unloaded and the related threads are stopped, reducing the resource occupancy rate of the AI detection engine module;
if the processing frame rate FPS is less than 2×N×r, the current detection algorithm model is retained and the process ends;
if not (the processing frame rate FPS is less than N×r), step S23 is entered;
S23, judging whether the remaining resources of the AI detection engine module can load a new detection algorithm model;
if yes, a new thread is started and a new detection algorithm model is loaded to jointly process the N video streams;
if not, the output frame rate r of each video stream is reduced, and the frame-extraction interval of the current detection algorithm model for each video stream is adjusted until the input of the N video streams can be handled.
Further, in step S3, the position and state of the detection frame in the Nth frame image are predicted; the specific process includes the following steps:
S31, initializing the state vector X and the state covariance matrix P;
S32, defining the state transition matrix F, the observation matrix H, the observation noise covariance matrix R and the process noise covariance matrix Q;
S33, predicting the position of the target detection frame: X' = F·X;
S34, predicting the state covariance matrix: P' = F·P·F^T + Q;
S35, the state update obtains the corrected state vector X and state covariance matrix P.
Further, in step S31,
the state vector X represents the state of the target, including position and velocity information; at the beginning, its initial value is set according to the detection result of the first frame;
the expression of the state vector X is:
X = [x1, y1, x2, y2, vx1, vy1, vx2, vy2]^T
the state covariance matrix P represents the uncertainty of the state estimate and may initially be defined as the identity matrix;
the expression of the state covariance matrix P is:
P = I (the 8×8 identity matrix)
In the above, (x1, y1) are the coordinates of the upper-left corner of the detection frame, (x2, y2) are the coordinates of the lower-right corner of the detection frame, and vx1, vy1, vx2, vy2 are the velocities of the upper-left and lower-right corner coordinates, respectively.
Further, in step S35, the state update obtains the corrected state vector X and state covariance matrix P; the specific process includes the following steps:
S351, a new observed value Z is obtained from the target detection algorithm: the upper-left corner coordinates (x1, y1) and lower-right corner coordinates (x2, y2) of the target detection frame in the new frame;
S352, the observation residual Y is calculated according to the predicted state vector X' and the observation matrix H; the observation residual Y represents the difference between the predicted value and the actual observed value, and the predicted state vector X' is the position predicted in step S33;
S353, the Kalman gain K is calculated according to the predicted state covariance matrix P', the observation matrix H and the observation noise covariance matrix R; the Kalman gain K is a weight matrix used to balance the uncertainty of the predicted value and the observed value;
S354, the predicted state vector X' is corrected using the Kalman gain K and the observation residual Y, obtaining the updated state vector X;
S355, the predicted state covariance matrix P' is corrected using the Kalman gain K and the observation matrix H, obtaining the updated state covariance matrix P;
S36, repeating steps S33-S35.
The technical scheme also provides a system for realizing the AI asynchronous detection and real-time rendering method, which comprises the following steps:
the video decoding module decodes the RTSP/RTMP stream by using a video processing tool FFmpeg to obtain frames of images;
the AI detection engine module, which dynamically evaluates the load of the current AI detection engine module and dynamically loads different detection algorithm models according to the number of accessed video streams, the frame rate of the original video streams and the category of the configured detection algorithm model;
the inter-frame prediction module, which predicts the position and state of the Nth frame image according to the detection result of the (N-1)th frame image based on a Kalman filtering algorithm to obtain a prediction result, and corrects the prediction result according to the detection result of the Nth frame image to generate the Nth frame image data;
the image drawing and rendering module, which uses an FFmpeg filter module to render the coordinate information detected by the detection algorithm model onto the video frame in the form of a rectangular frame or mask area;
the video coding module is used for coding the final video frame;
and the real-time streaming module is used for streaming the final video frame.
Further, the detection algorithm models are packaged into a dynamic library, and the AI detection engine module uses dlopen/dlclose to dynamically load/unload the relevant detection algorithm models at runtime; the detection algorithm models include a target detection model and an instance segmentation model.
Further, the target detection model is used to detect a certain target in real time and includes the YOLO series and Faster RCNN; the instance segmentation model is used to segment the boundary of a certain target object and includes Yolact and Mask RCNN.
Further, the AI detection engine module includes:
image coding and decoding: for converting the video frames of H264/H265 into RGB images;
resource monitor: the system is used for monitoring the utilization rate of the system disk, the memory, the CPU and the GPU resources;
algorithm scheduler: the detection algorithm model is used for dynamically loading or unloading related detection algorithm models according to system resource occupation, system performance and corresponding scenes;
reasoning engine: the detection algorithm model is used for running and outputting;
the algorithm module: used for image preprocessing, AI detection and post-processing.
By means of the above technical scheme, the invention provides an AI asynchronous detection and real-time rendering method and system for multipath video streams, which have at least the following beneficial effects:
1. the invention adopts a parallel asynchronous rendering scheme that greatly reduces the rendering delay of real-time video streams; when processing video data of the same order of magnitude and loading the same algorithm model, it has a lower resource occupancy rate and greatly improves resource utilization efficiency.
2. the invention disassembles the traditional serial processing flow into multiple modules such as image encoding and decoding, AI detection engine, inter-frame prediction, and image drawing and rendering, making full use of the computing resources and parallel processing capability of each module; on the premise that the output FPS remains unchanged, video delay is greatly reduced and system resource occupation is lowered.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of the AI asynchronous detection and real-time rendering method of the present invention;
FIG. 2 is a network structure diagram of a conventional object detection model of the present invention;
FIG. 3 is a network block diagram of a common example segmentation model of the present invention;
FIG. 4 is a schematic diagram of a Kalman filtering algorithm of the present invention;
FIG. 5 is a diagram showing an example of predicting the position and state of an N-th frame image detection frame by using a Kalman filtering algorithm according to the present invention;
FIG. 6 is a block diagram of an AI detection engine module of the invention;
fig. 7 is a block diagram of the AI asynchronous detection and real-time rendering system of the present invention.
In the figure: 10. video decoding module; 20. AI detection engine module; 30. inter-frame prediction module; 40. image drawing and rendering module; 50. video encoding module; 60. real-time streaming module.
Detailed Description
In order that the above objects, features and advantages of the present invention may become more readily apparent, a more particular description of the invention is given below with reference to the accompanying drawings and the detailed description, so that how the technical means are applied to solve the technical problems and achieve the technical effects can be fully understood and implemented.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in implementing an embodiment method may be performed by a program to instruct related hardware and, thus, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Referring to fig. 1-7, a specific implementation of this embodiment is shown. This embodiment adopts a parallel asynchronous rendering scheme that greatly reduces the rendering delay of real-time video streams; when processing video data of the same order of magnitude and loading the same algorithm model, its resource occupancy rate is lower and resource utilization efficiency is greatly improved.
Referring to fig. 1, the present embodiment provides an AI asynchronous detection and real-time rendering method for multi-path video streams, which includes the following steps:
s1, the multiple cameras are subjected to frame disassembly through a video decoding module to obtain each frame image, the latest frame image is recorded as an N frame image, and it is noted that videos shot by the multiple cameras are multiple video streams in the embodiment, and the video decoding module is used for carrying out frame disassembly on the multiple video streams to obtain each frame image.
S2, the AI detection engine module determines the frame-extraction interval and the corresponding detection algorithm model according to its own performance evaluation, sends the Nth frame image obtained from S1 into the detection algorithm model for the relevant detection and identification at fixed time intervals, and pushes the detection and identification result into the corresponding result queue to obtain the detection result of the Nth frame image.
In step S2, the number and types of the detection algorithm models are not limited; they merely serve as the various related models for realizing target detection and identification in this embodiment. All detection algorithm models are encapsulated into a dynamic library and integrated in the AI detection engine module. The AI detection engine module can load different detection algorithm models according to the number of accessed video streams, the frame rate of the original video streams and the load (CPU utilization, memory utilization, GPU utilization, etc.). The detection algorithm models include a target detection model and an instance segmentation model: the target detection model is used to detect a certain target in real time and includes the YOLO series and Faster RCNN; the instance segmentation model is used to segment the boundary of a certain target object and includes Yolact and Mask RCNN.
In order to fully disclose the object detection model in the prior art, please refer to fig. 2 and fig. 3, which are respectively a network structure diagram of a common object detection model and a network structure diagram of an example segmentation model.
In this embodiment, in order to clearly and completely describe the implementation of step S2, how the AI detection engine module determines the frame-extraction interval and the corresponding detection algorithm model according to its own performance evaluation is described; the specific process includes the following steps:
S21, the AI detection engine module automatically reads the output frame rate of the multipath video streams, recording the output frame rate as r and the number of video streams as N;
S22, based on the loaded detection algorithm model, the AI detection engine module judges whether the processing frame rate FPS of the current detection algorithm model for the current input streams is enough to cope with the N video streams, i.e. judges whether the processing frame rate FPS of the current detection algorithm model is greater than or equal to N×r;
if yes, it judges whether the processing frame rate FPS of the current detection algorithm model is greater than or equal to 2×N×r;
if the processing frame rate FPS is greater than or equal to 2×N×r, half of the currently loaded detection algorithm models are unloaded and the related threads are stopped, reducing the resource occupancy rate of the AI detection engine module;
if the processing frame rate FPS is less than 2×N×r, the current detection algorithm model is retained and the process ends;
if not (the processing frame rate FPS is less than N×r), step S23 is entered;
S23, judging whether the remaining resources of the AI detection engine module can load a new detection algorithm model;
if yes, a new thread is started and a new detection algorithm model is loaded to jointly process the N video streams;
if not, the output frame rate r of each video stream is reduced, and the frame-extraction interval of the current detection algorithm model for each video stream is adjusted until the input of the N video streams can be handled.
Specifically, if the processing frame rate FPS of the current detection algorithm model is insufficient to support the N video streams, it is judged whether the resources left to the AI detection engine module can load a new detection algorithm model; the remaining resources refer to CPU utilization, memory and video memory. If a new detection algorithm model can be loaded, a new thread is started and the new detection algorithm model is loaded to jointly process the N video streams. If the AI detection engine module is at its performance bottleneck and cannot load a new detection algorithm model, the output frame rate r of each video stream is reduced, and the frame-extraction interval of the current detection algorithm model for each video stream is adjusted until the input of the N video streams can be handled.
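The following sketch condenses the S21-S23 evaluation into code; `DetectionEnginePool`, its per-model FPS figure and its model limit are illustrative assumptions standing in for the patent's resource monitor, not its actual implementation:

```python
class DetectionEnginePool:
    """Hypothetical model pool; fps_per_model and max_models stand in for the
    measured inference speed and the CPU/memory/GPU headroom checks."""

    def __init__(self, fps_per_model: float, max_models: int):
        self.fps_per_model = fps_per_model
        self.max_models = max_models
        self.loaded = 1                  # one detection model loaded initially

    def total_fps(self) -> float:
        return self.loaded * self.fps_per_model

    def rebalance(self, n_streams: int, frame_rate: float) -> float:
        """S21-S23: returns the (possibly reduced) per-stream frame rate r."""
        required = n_streams * frame_rate                  # N * r
        if self.total_fps() >= required:                   # S22: enough capacity
            if self.total_fps() >= 2 * required:           # over-provisioned
                self.loaded = max(1, self.loaded // 2)     # unload half, stop threads
            return frame_rate                              # keep current models
        if self.loaded < self.max_models:                  # S23: headroom remains
            self.loaded += 1                               # new thread + new model
            return self.rebalance(n_streams, frame_rate)
        # bottleneck: lower r (i.e. widen the frame-extraction interval) to fit
        return self.total_fps() / n_streams

pool = DetectionEnginePool(fps_per_model=60.0, max_models=2)
print(pool.rebalance(n_streams=8, frame_rate=25.0))  # 8 streams @ 25 FPS -> 15.0
```

In the example run, one model (60 FPS) cannot cover 8×25 = 200 frames per second, a second model is loaded, and since 120 FPS is still insufficient the per-stream rate is reduced to 15 FPS, mirroring the fallback described above.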
S3, the inter-frame prediction module predicts the position and state of the detection frame in the Nth frame image by a Kalman filtering algorithm combined with historical detection data, based on the Nth frame image and the (N-1)th frame image whose detection result has already been output, according to the historical position and displacement speed of the detection frame, so as to obtain the Nth frame image data.
referring to fig. 4, a schematic diagram of a kalman filter algorithm is shown, and for the purpose of describing step S3, the kalman filter algorithm is a mathematical algorithm for a dynamic system for estimating state variables. The kalman filtering algorithm can be used for predicting the position and the speed of an object in the next frame image, and the basic idea is to consider the position and the speed of the object as state variables and predict the position and the state of the object in the next frame image by using the position of a detection frame in the previous frame image as prior information.
Referring to fig. 5, which shows an example of predicting the position and state of the detection frame in the Nth frame image by the Kalman filtering algorithm; in step S3, the specific process includes the following steps:
s31, initializing a state vector X and a state covariance matrix P;
the state vector X represents the state of the target, including position and velocity information; at the beginning, its initial value is set according to the detection result of the first frame;
the expression of the state vector X is:
X = [x1, y1, x2, y2, vx1, vy1, vx2, vy2]^T
the state covariance matrix P represents the uncertainty of the state estimate and may initially be defined as the identity matrix;
the expression of the state covariance matrix P is:
P = I (the 8×8 identity matrix)
In the above, (x1, y1) are the coordinates of the upper-left corner of the detection frame, (x2, y2) are the coordinates of the lower-right corner of the detection frame, and vx1, vy1, vx2, vy2 are the velocities of the upper-left and lower-right corner coordinates, respectively.
S32, defining a state transition matrix F, an observation matrix H, an observation noise covariance matrix R and a process noise covariance matrix Q;
State transition matrix F: describes how the state variables change within a time interval Δt; in this embodiment, the position coordinates change with the velocities, e.g. x1(t+Δt) = x1(t) + vx1·Δt, while the velocities remain constant.
Observation matrix H: maps the state vector to the observation space; the observations only include position coordinates, so the observation matrix H retains only the position information in the state vector.
Observation noise covariance matrix R: describes the uncertainty of the observation noise. In this embodiment the observation noise at each coordinate is assumed to be independent, so R is a diagonal matrix.
Process noise covariance matrix Q: describes the uncertainty of the process noise:
Q = σ²·G·G^T
In the above formula, σ is the standard deviation (0.01 in this embodiment) and G^T denotes the transpose of G; G is a matrix similar to the state transition matrix F but containing only the velocity terms, and the process noise covariance matrix Q is obtained by computing the G matrix, which in this embodiment is defined in terms of the time interval Δt.
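For concreteness, the matrices above can be assembled as follows; this is a sketch assuming the 8-dimensional corner-plus-velocity state, a frame interval Δt of 1/25 s and a standard constant-velocity form for G (the patent's exact G is not reproduced here):

```python
import numpy as np

dt = 1.0 / 25.0        # assumed time interval between frames (25 FPS source)
sigma = 0.01           # process-noise standard deviation, per the embodiment

# State: [x1, y1, x2, y2, vx1, vy1, vx2, vy2] -- box corners and their velocities
F = np.eye(8)
F[:4, 4:] = dt * np.eye(4)            # positions advance by velocity * dt

H = np.zeros((4, 8))
H[:, :4] = np.eye(4)                  # observations contain only the positions

R = np.eye(4)                         # independent per-coordinate observation noise

G = np.zeros((8, 4))                  # velocity-only noise input, as described above
G[:4, :] = 0.5 * dt ** 2 * np.eye(4)  # noise integrates into the positions...
G[4:, :] = dt * np.eye(4)             # ...and perturbs the velocities
Q = sigma ** 2 * (G @ G.T)            # Q = sigma^2 * G * G^T

X = np.zeros((8, 1))                  # initialized from the first frame's detection
P = np.eye(8)                         # identity initial state covariance
```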
S33, predicting the position of the target detection frame:
X' = F·X
In this embodiment, the state vector X contains position and velocity information, expressed as X = [x1, y1, x2, y2, vx1, vy1, vx2, vy2]^T at time t. The state transition matrix F already takes the influence of velocity on position into account, so the state X' calculated in this way contains both predicted position and velocity information. To obtain the predicted coordinates, only the position components of X' are taken: the predicted upper-left corner coordinates are (x1', y1') and the predicted lower-right corner coordinates are (x2', y2').
S34, predicting the state covariance matrix:
P' = F·P·F^T + Q
In the above formula, F is the state transition matrix of the system, which describes the evolution of the system state from the previous moment to the current moment; P is the state covariance matrix; Q is the process noise covariance matrix, which describes the noise of the control input and the noise during state evolution; F^T denotes the transpose of F. With this formula the predicted covariance matrix P' is obtained for the subsequent state update step.
S35, the state update obtains a corrected state vector X and a state covariance matrix P.
In step S35, the state update obtains the corrected state vector X and the state covariance matrix P, and the specific process includes the following steps:
S351, a new observed value Z is obtained from the target detection algorithm: the upper-left corner coordinates (x1, y1) and lower-right corner coordinates (x2, y2) of the target detection frame in the new frame;
the new observed value Z is given by:
Z = [x1, y1, x2, y2]^T
In the above formula, ^T denotes the matrix transpose.
S352, the observation residual Y is calculated according to the predicted state vector X' and the observation matrix H:
Y = Z − H·X'
The observation residual Y represents the difference between the predicted value and the actual observed value; the predicted state vector X' is the position of the target detection frame predicted in step S33, and H·X' is the predicted position of the target detection frame in the observation space.
S353, the Kalman gain K is calculated according to the predicted state covariance matrix P', the observation matrix H and the observation noise covariance matrix R:
K = P'·H^T·(H·P'·H^T + R)^(-1)
In the above formula, H^T denotes the transpose of the observation matrix. The Kalman gain K is a weight matrix used to balance the uncertainty of the predicted value and the observed value.
S354, the predicted state vector X' is corrected using the Kalman gain K and the observation residual Y, obtaining the updated state vector X; the correction formula is:
X = X' + K·Y
S355, the predicted state covariance matrix P' is corrected using the Kalman gain K and the observation matrix H, obtaining the updated state covariance matrix P; the correction formula is:
P = (I − K·H)·P'
In the above formula, I denotes the identity matrix, i.e. a square matrix in which all the elements on the diagonal are 1 and the remaining elements are 0; in this embodiment, I is an 8×8 identity matrix. In the state covariance matrix update formula, (I − K·H) combines the Kalman gain K and the observation matrix H and is multiplied by the predicted state covariance matrix P' to obtain the updated state covariance matrix P. The meaning of the formula is to correct the predicted state covariance matrix P' with the error weighting (the Kalman gain K) computed by the Kalman filtering algorithm, so as to obtain a more accurate state estimate.
S36, repeating the steps S33-S35.
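Putting steps S33-S35 together, a single predict/correct cycle looks like the sketch below, reusing the matrices F, H, Q, R defined in S32; Z is the detector's [x1, y1, x2, y2]^T box for the new frame (a minimal NumPy illustration, not the patent's code):

```python
import numpy as np

def kalman_step(X, P, Z, F, H, Q, R):
    """One S33-S35 cycle: predict the box, then correct it with a new detection."""
    # S33/S34: prediction
    X_pred = F @ X                        # X' = F X  (predicted state)
    P_pred = F @ P @ F.T + Q              # P' = F P F^T + Q
    # S35: update with the new observation
    Y = Z - H @ X_pred                    # observation residual
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    X_new = X_pred + K @ Y                # corrected state vector
    P_new = (np.eye(X.shape[0]) - K @ H) @ P_pred   # corrected covariance
    return X_new, P_new

# Between detections, only the prediction half runs: the first four entries of
# F @ X give the box drawn on frames whose AI result has not yet arrived.
```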
Because the AI detection engine module needs a certain amount of time for detection and identification, its output result lags behind. The inter-frame prediction module is therefore required to predict the position of the detection frame in the Nth frame image, based on the Nth frame image and the (N-1)th frame image whose detection result has already been output, combined with the historical detection data, according to the historical position and displacement speed of the detection frame, so as to obtain the Nth frame image data.
And S4, the image drawing and rendering module draws a final video frame by utilizing the information of the Nth frame image data and the detection result of the Nth frame image.
The image drawing and rendering module in this embodiment abandons the OpenCV image drawing scheme adopted in most schemes and directly uses the FFmpeg filter module to draw the detection result on the video frame. The advantage of this is that the time consumed by conversion between FFmpeg frames and the OpenCV image format is avoided, the encoding and decoding time is saved, and efficiency is improved.
More specifically, a filter is created using FFmpeg's AVFilterGraph and is used to render the coordinate information detected by the detection algorithm model onto the video frame in the form of a rectangular frame or mask. For example, when a rectangular frame needs to be drawn, the avfilter_init_str function is used to set the upper-left corner coordinates (x and y), the width and height (width and height) and the color (color) of the rectangular frame, and the AI detection engine module can dynamically adjust the parameters of the rectangular frame according to the detection result of each frame, thereby achieving dynamic drawing and rendering.
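As an illustration of the same drawing operation, the sketch below drives FFmpeg's drawbox filter from Python via the command line; in the patent the filter graph is built in-process with AVFilterGraph, so the stream URLs here are hypothetical and the CLI form only mirrors the filter parameters (x, y, w, h, color):

```python
import subprocess

# Box parameters taken from a detection/prediction result (illustrative values)
x, y, w, h = 120, 80, 200, 160
vf = f"drawbox=x={x}:y={y}:w={w}:h={h}:color=red@0.8:t=3"  # drawbox filter spec

subprocess.run([
    "ffmpeg", "-y",
    "-i", "rtsp://camera/stream",             # hypothetical input stream
    "-vf", vf,                                # draw the rectangle on every frame
    "-c:v", "libx264",
    "-f", "flv", "rtmp://server/live/out",    # hypothetical push-stream target
], check=True)
```

Rendering inside the filter graph keeps the frame in FFmpeg's native pixel format end to end, which is exactly the conversion cost the embodiment avoids.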
S5, the final video frame is encoded and pushed through the video encoding module and the real-time streaming module, so that the user terminal can see the real-time picture after AI rendering.
For the scene of multipath-video combined AI detection and rendering, this embodiment innovatively proposes an asynchronous parallel optimization scheme: the traditional serial processing flow is disassembled into multiple modules such as image encoding and decoding, AI detection engine, inter-frame prediction, and image drawing and rendering; the computing resources and parallel processing capability of each module are fully utilized, and on the premise that the output FPS remains unchanged, video delay is greatly reduced and system resource occupation is lowered.
The system for the AI asynchronous detection and real-time rendering method of this embodiment corresponds to the method of the foregoing embodiment. Since the two correspond, the implementation described for the AI asynchronous detection and real-time rendering method also applies to the system of this embodiment and is not described in detail again here.
Referring to fig. 7, which shows a block diagram of the AI asynchronous detection and real-time rendering system of this embodiment: the system is composed of the video decoding module 10, the AI detection engine module 20, the inter-frame prediction module 30, the image drawing and rendering module 40, the video encoding module 50 and the real-time streaming module 60; the communication connections between the modules, on the premise that each realizes its own function, are shown in fig. 7.
The video decoding module 10 decodes the RTSP/RTMP stream using the video processing tool FFmpeg to obtain frames of images.
The AI detection engine module 20 automatically and dynamically loads different detection algorithm models according to the number of accessed video streams, the frame rate of the original video streams and the category of the configured detection algorithm model, based on a dynamic evaluation of the load (CPU utilization, memory utilization, GPU utilization, etc.) of the current AI detection engine module 20.
All detection algorithm models are packaged into a dynamic library, and the AI detection engine module 20 can use dlopen/dlclose to dynamically load/unload the relevant detection algorithm models at runtime. The detection algorithm models include a target detection model and an instance segmentation model: the target detection model is used to detect a certain target in real time and includes the YOLO series and Faster RCNN; the instance segmentation model is used to segment the boundary of a certain target object and includes Yolact and Mask RCNN.
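For illustration, the ctypes sketch below mimics this runtime plug-in loading from Python (ctypes wraps dlopen on Linux); the library name and the detect() signature are hypothetical assumptions, and a C/C++ engine would pair dlopen with an explicit dlclose when unloading:

```python
import ctypes

# Load a detection-model plugin at runtime (dlopen under the hood on Linux).
lib = ctypes.CDLL("./libyolo_detector.so")       # hypothetical plugin library

# Declare the assumed exported function: int detect(const char *data, int len)
lib.detect.argtypes = [ctypes.c_char_p, ctypes.c_int]
lib.detect.restype = ctypes.c_int

rc = lib.detect(b"frame-bytes", 11)              # run one inference (illustrative)

# "Unloading": drop the handle; a C/C++ engine would call dlclose(handle) here
# to stop the related threads and release the model's memory.
del lib
```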
Referring to fig. 6, which shows a block diagram of the AI detection engine module: the AI detection engine module can be regarded as an algorithm scheduler, responsible for loading/unloading algorithms, delivering video frames into the algorithms for the relevant AI detection and output, and passing the results to the inter-frame prediction module for video frame drawing.
The AI detection engine module 20 includes:
image coding and decoding: for converting the video frames of H264/H265 into RGB images;
resource monitor: the system is used for monitoring the utilization rate of the system disk, the memory, the CPU and the GPU resources;
algorithm scheduler: the detection algorithm model is used for dynamically loading or unloading related detection algorithm models according to system resource occupation, system performance and corresponding scenes;
reasoning engine: the detection algorithm model is used for running and outputting;
the algorithm module: used for image preprocessing, AI detection and post-processing.
The AI detection engine module 20 of this embodiment is not tied to any specific detection algorithm model or structure; it already supports commonly used target detection models, such as Faster RCNN/YOLO, and also supports commonly used instance segmentation algorithms, such as Mask RCNN and Yolact, which take 2D/3D images as input and output pixel-based coordinates.
The inter-frame prediction module 30 predicts the position and state of the nth frame image based on the detection result of the nth frame image based on the kalman filter algorithm to obtain a prediction result, and corrects the prediction result according to the detection result of the nth frame image to generate the nth frame image data.
The image drawing and rendering module 40 renders the coordinate information detected by the detection algorithm model to the video frame in a rectangular frame or mask area manner by using a ffmpeg filter module.
The image drawing and rendering module 40 creates a filter using FFmpeg's AVFilterGraph; the filter is used to render the coordinate information detected by the detection algorithm model onto the video frame in the form of a rectangular frame or mask area. For example, when a rectangular frame needs to be drawn, the avfilter_init_str function is used to set the upper-left corner coordinates x and y, the width and height, and the color of the rectangular frame, and the AI detection engine module can dynamically adjust the parameters of the rectangular frame according to the detection result of each frame, achieving dynamic drawing and rendering.
The video encoding module 50 is used for encoding the final video frame;
the real-time push module 60 is used for pushing the final video frame.
In this embodiment, the image drawing and rendering module 40 and the AI detection engine module 20 are separated into different threads that operate independently, so that the serial synchronous processing flow is changed into an asynchronous parallel processing mode and rendering performance is greatly improved.
The AI detection engine module 20 can be connected to multiple different video streams and dynamically loads/unloads the relevant detection algorithm models according to the load state, so that the FPS of each video push stream is satisfied as far as possible and resource utilization efficiency is improved.
The AI detection engine module 20 draws the detection result directly into the video stream by means of a filter, saving the time consumed by steps such as image transcoding, AI detection and video encoding, and greatly reducing the delay of the real-time video stream.
The result detected by the AI detection engine module 20 is used as the prediction target: a correlation filter predicts the future detection result from the historical detection data, and the predicted target is corrected with the AI detection result of the next frame. This avoids the delay caused by video rendering having to wait for the AI detection result to be output, realizes inter-frame prediction, and improves the real-time performance of video rendering.
It should be noted that, in the system provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the system and method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the system and method embodiments are detailed in the method embodiments, which are not repeated herein.
The following performance comparison data were obtained by running the YOLOv5s model on an RTX 3080 Ti graphics card with the same 8 RTSP video streams (FPS: 25):
According to the comparison data, the parallel asynchronous rendering scheme provided by this embodiment greatly reduces the rendering delay of the real-time video stream; when processing video data of the same order of magnitude and loading the same algorithm model, the resource occupancy rate is lower and resource utilization efficiency is greatly improved.
The foregoing has described the invention in detail. Specific examples are used herein to explain the principles and embodiments of the invention, and the description of the above embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the invention, make changes in the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the invention.

Claims (8)

1. An AI asynchronous detection and real-time rendering method for multipath video streams, characterized by comprising the following steps:
S1, multiple camera streams are split into frames by a video decoding module to obtain each frame image, and the latest frame image is recorded as the Nth frame image;
S2, the AI detection engine module determines the frame-extraction interval and the corresponding detection algorithm model according to its own performance evaluation, sends the Nth frame image obtained from S1 into the detection algorithm model for the relevant detection and identification at fixed time intervals, and pushes the detection and identification result into the corresponding result queue to obtain the detection result of the Nth frame image;
in step S2, the AI detection engine module determines the frame-extraction interval and the corresponding detection algorithm model according to its own performance evaluation; the specific process includes the following steps:
S21, the AI detection engine module automatically reads the output frame rate of the multipath video streams, recording the output frame rate as r and the number of video streams as N;
S22, based on the loaded detection algorithm model, the AI detection engine module judges whether the processing frame rate FPS of the current detection algorithm model for the current input streams is enough to cope with the N video streams, i.e. judges whether the processing frame rate FPS of the current detection algorithm model is greater than or equal to N×r;
if yes, it judges whether the processing frame rate FPS of the current detection algorithm model is greater than or equal to 2×N×r;
if the processing frame rate FPS is greater than or equal to 2×N×r, half of the currently loaded detection algorithm models are unloaded and the related threads are stopped, reducing the resource occupancy rate of the AI detection engine module;
if the processing frame rate FPS is less than 2×N×r, the current detection algorithm model is retained and the process ends;
if not (the processing frame rate FPS is less than N×r), step S23 is entered;
S23, judging whether the remaining resources of the AI detection engine module can load a new detection algorithm model;
if yes, a new thread is started and a new detection algorithm model is loaded to jointly process the N video streams;
if not, the output frame rate r of each video stream is reduced, and the frame-extraction interval of the current detection algorithm model for each video stream is adjusted until the input of the N video streams can be handled;
S3, the inter-frame prediction module predicts the position and state of the detection frame in the Nth frame image by a Kalman filtering algorithm, based on the Nth frame image and the (N-1)th frame image whose detection result has already been output, according to the historical position and displacement speed of the detection frame, so as to obtain the Nth frame image data;
S4, the image drawing and rendering module draws the final video frame according to the information of the Nth frame image data and the detection result of the Nth frame image;
S5, the final video frame is encoded and pushed through the video encoding module and the real-time streaming module, so that the user terminal can see the real-time picture after AI rendering.
2. The AI asynchronous detection and real-time rendering method of claim 1, wherein: in step S3, the position and state of the detection frame in the Nth frame image are predicted; the specific process includes the following steps:
S31, initializing the state vector X and the state covariance matrix P;
S32, defining the state transition matrix F, the observation matrix H, the observation noise covariance matrix R and the process noise covariance matrix Q;
S33, predicting the position of the target detection frame: X' = F·X;
S34, predicting the state covariance matrix: P' = F·P·F^T + Q;
S35, the state update obtains the corrected state vector X and state covariance matrix P.
3. The AI asynchronous detection and real-time rendering method of claim 2, wherein: in step S31,
the state vector X represents the state of the target, including position and velocity information; at the beginning, its initial value is set according to the detection result of the first frame;
the expression of the state vector X is:
X = [x1, y1, x2, y2, vx1, vy1, vx2, vy2]^T
the state covariance matrix P represents the uncertainty of the state estimate and may initially be defined as the identity matrix;
the expression of the state covariance matrix P is:
P = I (the 8×8 identity matrix)
In the above, (x1, y1) are the coordinates of the upper-left corner of the detection frame, (x2, y2) are the coordinates of the lower-right corner of the detection frame, and vx1, vy1, vx2, vy2 are the velocities of the upper-left and lower-right corner coordinates, respectively.
4. The AI asynchronous detection and real-time rendering method of claim 2, wherein: in step S35, the state update obtains the corrected state vector X and state covariance matrix P; the specific process includes the following steps:
S351, a new observed value Z is obtained from the target detection algorithm: the upper-left corner coordinates (x1, y1) and lower-right corner coordinates (x2, y2) of the target detection frame in the new frame;
S352, the observation residual Y is calculated according to the predicted state vector X' and the observation matrix H; the observation residual Y represents the difference between the predicted value and the actual observed value, and the predicted state vector X' is the position predicted in step S33;
S353, the Kalman gain K is calculated according to the predicted state covariance matrix P', the observation matrix H and the observation noise covariance matrix R; the Kalman gain K is a weight matrix used to balance the uncertainty of the predicted value and the observed value;
S354, the predicted state vector X' is corrected using the Kalman gain K and the observation residual Y, obtaining the updated state vector X;
S355, the predicted state covariance matrix P' is corrected using the Kalman gain K and the observation matrix H, obtaining the updated state covariance matrix P;
S36, repeating steps S33-S35.
5. A system for implementing the AI asynchronous detection and real-time rendering method of any of the preceding claims 1-4, characterized in that the system comprises:
a video decoding module (10), wherein the video decoding module (10) decodes the RTSP/RTMP stream by using a video processing tool FFmpeg to obtain frames of images;
the AI detection engine module (20), which dynamically evaluates the load of the current AI detection engine module (20) and dynamically loads different detection algorithm models according to the number of accessed video streams, the frame rate of the original video streams and the category of the configured detection algorithm model;
the inter-frame prediction module (30), which predicts the position and state of the Nth frame image according to the detection result of the (N-1)th frame image based on a Kalman filtering algorithm to obtain a prediction result, and corrects the prediction result according to the detection result of the Nth frame image to generate the Nth frame image data;
the image drawing and rendering module (40), which uses an FFmpeg filter module to render the coordinate information detected by the detection algorithm model onto the video frame in the form of a rectangular frame or mask area;
a video encoding module (50), the video encoding module (50) being configured to encode a final video frame;
and the real-time streaming module (60) is used for streaming the final video frames.
6. The system according to claim 5, wherein: the AI detection engine module (20) uses dlopen/dlclose to dynamically load/unload the relevant detection algorithm models at runtime, and the detection algorithm models include a target detection model and an instance segmentation model.
7. The system according to claim 6, wherein: the target detection model is used to detect a certain target in real time and includes the YOLO series and Faster RCNN; the instance segmentation model is used to segment the boundary of a certain target object and includes Yolact and Mask RCNN.
8. The system according to claim 5, wherein: the AI detection engine module (20) includes:
image encoding and decoding: for converting H.264/H.265 video frames into RGB images;
resource monitor: for monitoring the utilization of the system disk, memory, CPU and GPU resources;
algorithm scheduler: for dynamically loading or unloading the relevant detection algorithm models according to system resource occupation, system performance and the corresponding scene;
inference engine: for running the detection algorithm models and outputting their results;
algorithm module: for image preprocessing, AI detection and post-processing.
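As a rough illustration of the algorithm scheduler above, the sketch below chooses a detection model from monitored resource utilization and stream count; the thresholds, model names and data structure are assumptions for illustration only.

#include <string>

struct ResourceSnapshot {
    double gpu_util;  // 0.0-1.0, reported by the resource monitor
    double mem_util;  // 0.0-1.0
};

// Pick a model category: a lighter model under pressure, a heavier,
// more accurate one when resources allow.
std::string pick_model(const ResourceSnapshot& r, int num_streams) {
    if (r.gpu_util > 0.85 || r.mem_util > 0.90 || num_streams > 16)
        return "light_yolo_variant";   // keeps all streams real-time
    return "mask_rcnn";                // better boundaries at higher cost
}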
CN202310549343.2A 2023-05-16 2023-05-16 AI asynchronous detection and real-time rendering method and system for multipath video streams Active CN116260990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310549343.2A CN116260990B (en) 2023-05-16 2023-05-16 AI asynchronous detection and real-time rendering method and system for multipath video streams

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310549343.2A CN116260990B (en) 2023-05-16 2023-05-16 AI asynchronous detection and real-time rendering method and system for multipath video streams

Publications (2)

Publication Number Publication Date
CN116260990A (en) 2023-06-13
CN116260990B (en) 2023-07-28

Family

ID=86686575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310549343.2A Active CN116260990B (en) 2023-05-16 2023-05-16 AI asynchronous detection and real-time rendering method and system for multipath video streams

Country Status (1)

Country Link
CN (1) CN116260990B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116761018B (en) * 2023-08-18 2023-10-17 湖南马栏山视频先进技术研究院有限公司 Real-time rendering system based on cloud platform
CN117196999B (en) * 2023-11-06 2024-03-12 浙江芯劢微电子股份有限公司 Self-adaptive video stream image edge enhancement method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933724A (en) * 2016-05-23 2016-09-07 福建星网视易信息系统有限公司 Video producing method, device and system
CN114882400A (en) * 2022-04-28 2022-08-09 华北水利水电大学 Aggregate detection and classification method based on AI intelligent machine vision technology

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100571836B1 (en) * 2004-03-05 2006-04-17 삼성전자주식회사 Method and apparatus for detecting video shot change of a motion picture
CN102821323B (en) * 2012-08-01 2014-12-17 成都理想境界科技有限公司 Video playing method, video playing system and mobile terminal based on augmented reality technique
JP6373056B2 (en) * 2014-05-14 2018-08-15 キヤノン株式会社 IMAGING DEVICE, IMAGING DEVICE CONTROL METHOD, AND IMAGE PROCESSING METHOD
CN105792002B (en) * 2014-12-18 2019-07-02 广州市动景计算机科技有限公司 Video Rendering method and device
US20210344991A1 (en) * 2016-10-13 2021-11-04 Skreens Entertainment Technologies, Inc. Systems, methods, apparatus for the integration of mobile applications and an interactive content layer on a display
US11402909B2 (en) * 2017-04-26 2022-08-02 Cognixion Brain computer interface for augmented reality
WO2019164518A1 (en) * 2018-02-25 2019-08-29 Nokia Solutions And Networks Oy Method and system for automated dynamic network slice deployment using artificial intelligence
CN111833861A (en) * 2019-04-19 2020-10-27 微软技术许可有限责任公司 Artificial intelligence based event evaluation report generation
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration
CN110852283A (en) * 2019-11-14 2020-02-28 南京工程学院 Helmet wearing detection and tracking method based on improved YOLOv3
CN110929683B (en) * 2019-12-09 2021-01-22 北京赋乐科技有限公司 Video public opinion monitoring method and system based on artificial intelligence
US11508156B2 (en) * 2019-12-27 2022-11-22 Magna Electronics Inc. Vehicular vision system with enhanced range for pedestrian detection
CN111541911A (en) * 2020-04-21 2020-08-14 腾讯科技(深圳)有限公司 Video detection method and device, storage medium and electronic device
CN111508578A (en) * 2020-05-19 2020-08-07 中国电子科技集团公司第三十八研究所 Brain wave checking device and method based on artificial intelligence
CN111882656A (en) * 2020-06-19 2020-11-03 深圳宏芯宇电子股份有限公司 Graph processing method, equipment and storage medium based on artificial intelligence
WO2022022368A1 (en) * 2020-07-28 2022-02-03 宁波环视信息科技有限公司 Deep-learning-based apparatus and method for monitoring behavioral norms in jail
CN113221706B (en) * 2021-04-30 2024-03-22 西安聚全网络科技有限公司 AI analysis method and system for multi-process-based multi-path video stream
CN113536915A (en) * 2021-06-09 2021-10-22 苏州数智源信息技术有限公司 Multi-node target tracking method based on visible light camera
CN113538873A (en) * 2021-07-28 2021-10-22 东莞全芯物联科技有限公司 AI position of sitting corrects camera based on image recognition technology
CN115942105A (en) * 2021-08-09 2023-04-07 华为技术有限公司 Scheduling operation method and device of AI model in camera and camera
WO2023038898A1 (en) * 2021-09-07 2023-03-16 Vizio, Inc. Methods and systems for detecting content within media streams
CN114155284A (en) * 2021-12-15 2022-03-08 天翼物联科技有限公司 Pedestrian tracking method, device, equipment and medium based on multi-target pedestrian scene
CN115131697A (en) * 2022-05-06 2022-09-30 腾讯科技(深圳)有限公司 Video detection method, device, equipment and storage medium
CN114926781A (en) * 2022-05-27 2022-08-19 北京邮电大学 Multi-user time-space domain abnormal behavior positioning method and system supporting real-time monitoring scene
CN115205753A (en) * 2022-07-22 2022-10-18 上海交通大学 Lightweight video action understanding method and system based on computer vision
US20230071470A1 (en) * 2022-11-15 2023-03-09 Arvind Radhakrishnen Method and system for real-time health monitoring and activity detection of users
CN115984675B (en) * 2022-12-01 2023-10-13 扬州万方科技股份有限公司 System and method for realizing multipath video decoding and AI intelligent analysis

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933724A (en) * 2016-05-23 2016-09-07 福建星网视易信息系统有限公司 Video producing method, device and system
CN114882400A (en) * 2022-04-28 2022-08-09 华北水利水电大学 Aggregate detection and classification method based on AI intelligent machine vision technology

Also Published As

Publication number Publication date
CN116260990A (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN116260990B (en) AI asynchronous detection and real-time rendering method and system for multipath video streams
US20200322619A1 (en) Systems and Methods of Encoding Multiple Video Streams for Adaptive Bitrate Streaming
CN111147867B (en) Multifunctional video coding CU partition rapid decision-making method and storage medium
US9350990B2 (en) Systems and methods of encoding multiple video streams with adaptive quantization for adaptive bitrate streaming
US8154553B2 (en) Centralized streaming game server
US8264493B2 (en) Method and system for optimized streaming game server
WO2014190308A1 (en) Systems and methods of encoding multiple video streams with adaptive quantization for adaptive bitrate streaming
EP2364190A2 (en) Centralized streaming game server
US11330263B1 (en) Machine learning based coded size estimation in rate control of video encoding
JP2013532926A (en) Method and system for encoding video frames using multiple processors
US11470327B2 (en) Scene aware video content encoding
US10536696B2 (en) Image encoding device and image encoding method
CN107277519B (en) A kind of method and electronic equipment of the frame type judging video frame
US20100104010A1 (en) Real-time rate-control method for video encoder chip
US20220408097A1 (en) Adaptively encoding video frames using content and network analysis
KR20120010790A (en) Method for detecting scene change and apparatus therof
Lu et al. Dynamic offloading on a hybrid edge–cloud architecture for multiobject tracking
CN106658024A (en) Fast video coding method
JP2023546513A (en) Data encoding method, device, and computer program
CN110659571B (en) Streaming video face detection acceleration method based on frame buffer queue
US20220217378A1 (en) Computer Software Module, a Device and a Method for Accelerating Inference for Compressed Videos
CN112188212A (en) Method and device for intelligent transcoding of high-definition monitoring video
Huang et al. EdgeBooster: Edge-assisted real-time image segmentation for the mobile web in WoT
CN116797442A (en) Video processing method, device, computer equipment and storage medium
CN113660487B (en) Parameter determination method and device for distributing corresponding bit number for frame image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant