CN113436165A - Video image detection system based on artificial intelligence and detection method thereof - Google Patents
- Publication number
- CN113436165A CN113436165A CN202110701735.7A CN202110701735A CN113436165A CN 113436165 A CN113436165 A CN 113436165A CN 202110701735 A CN202110701735 A CN 202110701735A CN 113436165 A CN113436165 A CN 113436165A
- Authority
- CN
- China
- Prior art keywords
- target
- data
- video
- monitoring
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a video image detection system based on artificial intelligence and a detection method thereof. The system comprises an intelligent video monitoring and analysis platform arranged at the back end and monitoring cameras arranged at the front end, the monitoring cameras being connected with the platform. Each monitoring camera is provided with a people-flow statistics template for reading the people-flow data of entries into the target site, and with a body-temperature detection sensor connected to that template; the template carries an alarm module that fires when someone enters without a temperature measurement. The platform accesses the online video-stream data of every target site and comprises an input end, a central processing unit and an output end; an independent connection module is added between the central processing unit and the final output end, and the central processing unit is connected with a modeling unit.
Description
Technical Field
The invention relates to the field of information technology detection, in particular to a video image detection system based on artificial intelligence and a detection method thereof.
Background
The existing power system analyzes the video data of each target site centrally; the analysis covers four main categories of data: time-compliance data, dress-code data (colour and clothing), daily-work (timeliness) data and business-environment (work-order) data. The applicant therefore proposes a solution: use an artificial-intelligence scheme to perform intelligent patrol inspection, so that problems are discovered and corrected in time.
Disclosure of Invention
The invention aims to provide a video image detection system based on artificial intelligence and a detection method thereof so as to solve the problems in the background technology.
In order to achieve the above purpose, the invention adopts the following technical scheme: a video image detection system based on artificial intelligence comprises an intelligent video monitoring and analysis platform arranged at the back end and monitoring cameras arranged at the front end, the monitoring cameras being connected with the platform. Each monitoring camera is provided with a people-flow statistics template for reading the people-flow data of entries into the target site, and with a body-temperature detection sensor connected to that template; the template carries an alarm module that fires when someone enters without a temperature measurement. The platform accesses the online video-stream data of every target site and comprises an input end, a central processing unit and an output end; an independent connection module is added between the central processing unit and the final output end, and the central processing unit is connected with a modeling unit.
Furthermore, the system also comprises a timer and a target-execution statistics unit arranged in the target area; both are connected, through the front-end monitoring camera, to the back-end intelligent video monitoring and analysis platform.
Furthermore, an online report generation module and a display unit are arranged on the video intelligent monitoring and analyzing platform;
the monitoring camera is connected by a Multi-attribute Classification neural network (Multi-task Classification), 14 types of data extraction with subdivided attributes are carried out on the divided human body targets, and the 14 types of data comprise target age prejudgment, angles, sexes, bags, hats, whether articles are carried in front of the body, trousers, bags, shoes, jacket styles, glasses, masks, jacket colors and lower dress color data.
Still further, the work of the target-execution statistics unit comprises: analysing service-duration data from the unit's statistics; analysing passenger-flow data for the new-retail area; and compiling statistics on specific customer groups. The unit also carries an alarm module that fires when the specific-customer analysis is not executed.
Still further, the work of the target-execution statistics unit also comprises identifying non-compliant data.
Further, the input end comprises Mosaic data-enhancement, cmBN and SAT self-adversarial-training connection ends. The central processing unit uses CSPDarknet53 as the feature-extraction backbone, the Mish activation function, and DropBlock regularization. The independent connection module adopts an SPP module and an FPN+PAN structure; in the SPP module, max pooling with kernels k = {1×1, 5×5, 9×9, 13×13} is applied, and the resulting feature maps of different scales are merged with a Concat operation.
Still further, the intelligent video monitoring and analysis platform also comprises an independent prediction-box module, in which the NMS used to filter prediction boxes is replaced by DIoU-NMS.
A video image detection method based on artificial intelligence comprises the following steps:
1) monitoring and analysis rely on the intelligent video monitoring and analysis platform: several artificial-intelligence algorithms for target detection, recognition and tracking analyse and process real-time video streams or offline video files, extract the key information of massive video, and classify and aggregate it, so that video can be searched and pushed intelligently;
2) the accessed video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded;
3) a multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented target, subdividing a person's attributes into at least 14 data categories;
the body-temperature detection sensor arranged on the monitoring camera measures and records the temperature at the entrance of the target site.
Further, a video image detection method based on artificial intelligence comprises the following steps:
1) accessing data
Accessing online video stream data of each target through a video intelligent monitoring and analyzing platform;
2) target detection
This step comprises two processes, target-box prediction and target-type classification; taking the detection of a target object as an example, detection consists of two parts, recognizing the target and predicting its position;
the YOLO algorithm, a one-stage algorithm, is selected as the target detection algorithm;
3) object segmentation
Using threshold-based, region-based and edge-based segmentation methods, the video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded;
4) attribute extraction
A multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented target, subdividing a person's attributes into at least 14 categories: estimated age, viewing angle, gender, bag type, hat, whether an item is carried, trousers, backpack, shoes, jacket style, glasses, mask, jacket colour and lower-garment colour data;
5) target data tracking
A Kalman-GRU method identifies implicit motion and morphological feature data to associate each detected object with a predicted target position; a GRU network provides direct end-to-end association, realizing target tracking.
Furthermore, objects entering a branch are detected and tracked through the accessed monitoring-camera picture, and statistics on customers entering the branch are recorded; through the accessed monitoring-camera picture, detection, feature extraction and region analysis are applied to objects entering the branch to distinguish staff from customers, and the statistics are analysed sequentially by the target-execution statistics unit;
model construction of a modeling unit on a video intelligent monitoring analysis platform: the method comprises the following steps:
1) and a target detection model:
the target detection comprises two processes of target frame prediction and target type classification, in each frame of a video, firstly, all targets which accord with the characteristics of the target are found out through a target detector, the predicted positions of the targets are generally marked by a frame (bounding box), a confidence coefficient is predicted for each possible target frame by using a classification model, and finally, a final detection result is generated according to the confidence coefficient and the frame position information;
the target detection algorithm includes three parts: detecting window selection, feature design and classifier design;
2) multi-target tracking (MOT) model: implicit motion and morphological features are used simultaneously to associate detected objects with predicted target positions in the tracking data.
The invention has the following technical effects. The intelligent video monitoring and analysis platform takes in the monitored video stream of the target area and performs feature labelling, real-time monitoring, intelligent analysis, early warning and online reporting. First, video monitoring quantifies on-site conditions at each target site in real time and presents a panorama of target-site data; second, it analyses service-compliance data and gives full control of service-quality and effect data; third, it analyses epidemic-prevention and safety-measure data, supporting overall control and emergency handling of the site's safety situation; fourth, it analyses people-flow data of the new-retail area, informing decisions for efficient site operation. The system applies several artificial-intelligence methods for target detection, recognition and tracking to real-time video streams or offline video files, extracts the key information of massive video, and classifies and aggregates it for intelligent search and push. In real target-site scenes the cameras are mounted in many different ways and the illumination varies widely, so the feature-extraction and classification capability of the detection algorithm must be addressed.
After comparative testing, the YOLO one-stage algorithm was selected as the target detection algorithm, and the structure of its convolutional neural network was adapted, greatly strengthening the robustness and generalization ability of the whole network. Through monitoring of actual application scenes, the invention effectively records how data and alarms are generated, providing effective data support for managing the target site and thereby improving its service quality.
Drawings
FIG. 1 is a schematic diagram of the system configuration of the present invention.
Detailed Description
Building on the existing power system's analysis of target-site deficiencies, the system implements people-flow statistics, entry-without-temperature-measurement alarms, service-timeout alarms and new-retail execution statistics, helping to analyse how each target site performs and, especially during an epidemic, to better protect the health and safety of the people served. By subdividing the operational-monitoring, special-data recognition and detection requirements of target sites, the sites run more efficiently and all of them can be managed comprehensively.
Specifically, target supervision and detection are performed by the front-end monitoring camera 2:
1) monitoring the execution time at the target site and issuing time-based early warnings;
2) detecting and recognizing the dress of staff at the target site, the recognized data including clothing and shoe data;
3) detecting whether staff at the target site are on duty or off duty;
4) counting the people flow at the target site;
In addition, the system's data-connection interface provides an API data interface.
The intelligent video monitoring and analysis platform 1 of the invention takes in the monitored video stream of the target area and performs feature labelling, real-time monitoring, intelligent analysis, early warning and alarming, and online reporting. First, video monitoring quantifies on-site conditions at each target site in real time and presents a panorama of target-site data; second, it analyses service-compliance data and gives full control of service-quality and effect data; third, it analyses epidemic-prevention and safety-measure data, supporting overall control and emergency handling of the site's safety situation; fourth, it analyses people-flow data of the new-retail area, informing decisions for efficient site operation.
Monitoring and analysis rely on the intelligent video monitoring and analysis platform 1, which applies several artificial-intelligence algorithms for target detection, recognition and tracking to real-time video streams or offline video files, extracts the key information of massive video, and classifies and aggregates it for intelligent search and push. Using threshold-based, region-based and edge-based segmentation methods, the video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded. A multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented human target, subdividing a person's attributes into at least 14 data categories: estimated age, viewing angle, gender, bag type, hat, whether an item is carried in front of the body, trousers, backpack, shoes, jacket style, glasses, mask, jacket colour and lower-garment colour.
The people-flow statistics template 2a: objects entering the branch are detected and tracked (algorithmically) through the picture of the accessed monitoring camera 2, and statistics on customers entering the branch are recorded. Detection, feature extraction and region analysis are applied to objects entering the branch through the camera picture to distinguish staff from customers. The service-duration data counted by the target-execution statistics unit 4 are analysed: objects entering the branch are detected through the camera picture, the handling duration is derived from the customer's dwell time in the new-retail counter area, and the thresholds are set according to the State Grid power-supply service standard; the threshold differs by data type, not exceeding 5 minutes and 20 minutes respectively. The body-temperature detection sensor 3 on the monitoring camera 2 measures and records temperatures at the entrance of the target site: the body temperature of everyone entering is detected, and the entry-without-temperature-measurement alarm module 2b judges whether a measurement was taken. The target-execution statistics unit 4 then analyses the passenger-flow data of the new-retail area: the new-retail area of the target site, including the new-retail display-counter area, is analysed by time period, counter area, people flow and dwell time, and corresponding data or reports are produced.
The target-execution statistics unit 4 then analyses specific customer data: analysis data on elderly people and children at the target site are extracted, and an alarm is raised and recorded when an elderly person or child appears in a non-business area or stays at the target site longer than a set threshold. The unit 4 then identifies non-compliant data: non-compliant staff behaviour is recorded as abnormal-execution data for secondary confirmation by back-office staff, and the non-compliance information is aggregated across time periods, types and locations.
The specific implementation process of the system of the invention comprises the following steps:
1) accessing data
And accessing online video stream data of each target through a video intelligent monitoring and analyzing platform (1).
2) Target detection
This step comprises two processes, target-box prediction and target-type classification; taking the detection of a target object as an example, detection consists of recognizing the object and predicting its position.
3) Object segmentation
Using threshold-based, region-based and edge-based segmentation methods, the video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded.
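As an illustration of the threshold-based branch of this segmentation step, here is a minimal automatic threshold in Python with NumPy. The patent does not specify which thresholding rule is used, so Otsu's between-class-variance criterion and the synthetic frame below are assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the grey level that maximises between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # background mean
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1   # foreground mean
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# synthetic frame: dark background (~30) with one bright 20x20 object (~200)
frame = np.full((64, 64), 30, dtype=np.uint8)
frame[20:40, 20:40] = 200
t = otsu_threshold(frame)
mask = frame > t          # binary segmentation mask of the bright object
```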
4) Attribute extraction
The multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented target. A person's attributes are subdivided into at least 14 categories: estimated age, viewing angle, gender, bag type, hat, whether an item is carried, trousers, backpack, shoes, jacket style, glasses, mask, jacket colour and lower-garment colour data.
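The shared-backbone, many-heads layout that multi-task classification implies can be sketched as below. The per-head class counts, feature dimension and random weights are hypothetical stand-ins; the patent specifies only the 14 attribute categories, not the network sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class counts for the 14 attribute heads (age, angle, gender,
# bag type, hat, carried item, trousers, backpack, shoes, jacket style,
# glasses, mask, jacket colour, lower-garment colour).
HEAD_CLASSES = [6, 4, 2, 4, 2, 2, 3, 2, 4, 5, 2, 2, 8, 8]
FEAT_DIM = 32

# One linear classification head per attribute, all on the same shared feature.
heads = [(rng.standard_normal((FEAT_DIM, c)), np.zeros(c)) for c in HEAD_CLASSES]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_attributes(feature):
    """Run all 14 heads on one shared per-person feature vector."""
    return [softmax(feature @ W + b) for W, b in heads]

feature = rng.standard_normal(FEAT_DIM)   # stand-in for a CNN embedding
probs = classify_attributes(feature)      # 14 probability vectors, one per attribute
```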
5) Target data tracking
A Kalman-GRU method identifies implicit motion and morphological feature data to associate each detected object with a predicted target position; a GRU network provides direct end-to-end association, realizing target tracking.
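The Kalman half of the Kalman-GRU step can be sketched with a constant-velocity filter over a position-velocity state. The GRU association network is not reproduced here, and the motion model, noise matrices and noiseless detections are illustrative assumptions only.

```python
import numpy as np

dt = 1.0
# constant-velocity motion model over state [x, y, vx, vy]
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only position is observed
Q = np.eye(4) * 1e-2                        # process-noise covariance (assumed)
R = np.eye(2) * 1e-1                        # measurement-noise covariance (assumed)

def predict(x, P):
    """Project the state and its covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with a detection z = [x, y]."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0, 1.0, 0.5]), np.eye(4)
for step in range(1, 6):                    # target moves with velocity (1, 0.5)
    x, P = predict(x, P)
    z = np.array([1.0 * step, 0.5 * step])  # noiseless detections for the sketch
    x, P = update(x, P, z)
```

In a full tracker the predicted position would be matched against new detections (the association the GRU learns end-to-end) before each update.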
The system applies several artificial-intelligence methods for target detection, recognition and tracking to real-time video streams or offline video files, extracts the key information of massive video, and classifies and aggregates it for intelligent search and push. In real target-site scenes the cameras are mounted in many different ways and the illumination varies widely, so the feature-extraction and classification capability of the detection algorithm must be addressed. After comparative testing, the YOLO one-stage algorithm was selected as the target detection algorithm, and the structure of its convolutional neural network was adapted, greatly strengthening the robustness and generalization ability of the whole network.
In particular:
(1) Input end 11: comprises the Mosaic data-enhancement, cmBN and SAT self-adversarial-training connection ends. Mosaic data enhancement builds on the CutMix idea: four pictures are spliced together through random scaling, random cropping and random arrangement. This greatly enriches the detection dataset; random scaling in particular adds many small targets, making the network more robust.
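A minimal sketch of the Mosaic idea follows: four images spliced around a random centre point into one training sample. This is a simplified assumption of the pipeline; a real implementation also rescales and clips the bounding-box labels, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(42)

def mosaic(imgs, out_size=128):
    """Splice four images into one sample around a random centre point.

    Each source image is resized by nearest-neighbour indexing to fill
    its quadrant, so small targets appear when a quadrant is small.
    """
    cy = rng.integers(out_size // 4, 3 * out_size // 4)   # random split point
    cx = rng.integers(out_size // 4, 3 * out_size // 4)
    out = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    quads = [(0, cy, 0, cx), (0, cy, cx, out_size),
             (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(imgs, quads):
        h, w = y1 - y0, x1 - x0
        ys = np.arange(h) * img.shape[0] // h   # nearest-neighbour row indices
        xs = np.arange(w) * img.shape[1] // w   # nearest-neighbour col indices
        out[y0:y1, x0:x1] = img[ys][:, xs]
    return out

# four flat-colour stand-in images so the quadrants are easy to verify
imgs = [np.full((64, 64, 3), c, dtype=np.uint8) for c in (50, 100, 150, 200)]
sample = mosaic(imgs)
```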
(2) Central processing unit 12: CSPDarknet53 is adopted as the feature-extraction backbone, improving the network's feature-extraction capability. The Leaky-ReLU activation function is replaced with the Mish activation function, and DropBlock is adopted for regularization. Mish further increases the feature-extraction capability, while DropBlock, a more robust replacement for the earlier Dropout, relieves overfitting during training.
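The Mish activation named above has a closed form, x·tanh(softplus(x)); a NumPy sketch (with a numerically stable softplus) is:

```python
import numpy as np

def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)

def mish(x):
    """Mish activation: x * tanh(softplus(x)).

    Smooth and non-monotonic, with a small negative dip near x ~ -1,
    unlike the hard kink of Leaky ReLU it replaces here."""
    return x * np.tanh(softplus(x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = mish(x)
```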
(3) Independent connection module 14 (Neck layer): a Neck layer is added between the central processing unit 12 and the final output end 13; it adopts an SPP module and an FPN+PAN structure to improve the network's feature-extraction capability. In the SPP module, max pooling with kernels k = {1×1, 5×5, 9×9, 13×13} is applied, after which the feature maps of different scales undergo a Concat operation.
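The SPP operation just described, stride-1 max pooling at the four kernel sizes followed by a channel-wise Concat, can be sketched in NumPy as below; the 13×13×8 feature map is an arbitrary stand-in shape.

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pooling with 'same' padding on an (H, W, C) feature map."""
    if k == 1:
        return x                              # 1x1 max pool is the identity
    p = k // 2
    padded = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x, kernels=(1, 5, 9, 13)):
    """SPP block: pool at several scales, then Concat along the channel axis.

    Spatial size is preserved; channels are multiplied by len(kernels)."""
    return np.concatenate([maxpool_same(x, k) for k in kernels], axis=2)

feat = np.random.default_rng(1).standard_normal((13, 13, 8))
out = spp(feat)   # shape (13, 13, 32): four pooled copies concatenated
```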
(4) Independent prediction-box module 18 (prediction layer): the main improvements at the output end 13 are the CIoU_Loss function for the target box during training, and replacing the NMS used for prediction-box filtering with DIoU-NMS. NMS screens the prediction boxes; ordinary target-detection algorithms generally use plain NMS, whereas this system's detection uses DIoU-NMS, which can still detect an important target overlapped in the middle of others.
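DIoU-NMS suppresses a box only when both its IoU with the kept box is high and their centres are close, which is why centrally overlapped targets survive. A NumPy sketch, with illustrative boxes and a 0.5 threshold chosen for the example:

```python
import numpy as np

def diou(box, boxes):
    """DIoU = IoU - (centre distance)^2 / (enclosing-box diagonal)^2."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter)
    # squared distance between box centres
    d2 = ((box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2) ** 2 \
       + ((box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    cw = np.maximum(box[2], boxes[:, 2]) - np.minimum(box[0], boxes[:, 0])
    ch = np.maximum(box[3], boxes[:, 3]) - np.minimum(box[1], boxes[:, 1])
    return iou - d2 / (cw ** 2 + ch ** 2)

def diou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS that suppresses by DIoU rather than plain IoU."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[diou(boxes[i], boxes[rest]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep = diou_nms(boxes, scores)   # the near-duplicate of box 0 is suppressed
```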
Through the above target detection algorithm, the head and shoulders of objects in the image can be located and classified.
On this basis, the numbers of customers and staff on site in the video are monitored and counted in real time, and the position information provides an auxiliary basis for judging whether the working objects comply with the operating specification.
The data mining process of the invention comprises the following steps:
The whole process of model construction is summarized as follows:
1) determining data requirements;
2) data preparation (including data acquisition, data quality check, data exploration, data cleaning, data preprocessing and data-table generation);
3) completing the different alarm analyses based on the various monitoring data of the targets in the data table;
4) selecting training samples and building the target-site service prediction model based on the different alarm analysis results;
5) model tuning and verification.
Regarding the data quality for the model: in terms of reliability, the video data come from the target monitoring system (monitoring camera 2); although some abnormal data are collected, subsequent data mining is not affected. In terms of integrity, the video data suffer from interruptions, but these can be remedied in subsequent data processing.
Preprocessing of data: data cleaning is used to find and correct errors in the data. To meet the requirements of the target algorithm, the raw data are transformed; based on an understanding of the data and the data-requirement analysis, feature-variable construction and aggregation methods are used for the transformation.
Data exploration: the monitoring video streams of the test-point target sites are accessed into the video intelligent monitoring and analyzing platform 1, which completes feature labeling, real-time monitoring, intelligent analysis, early warning and alarming, and online reporting. First, the on-site data of each target site are quantified in real time through video monitoring, and a development panorama of the target-site data is displayed. Second, the service-standard data of the target sites are analyzed through video monitoring, so that customer-service quality and effect data are comprehensively controlled. Third, the epidemic-prevention and safety-measure data of the target areas are analyzed, supporting comprehensive control and emergency disposal of the areas' safety situation. Fourth, the people-flow heat data of the new retail zones of the target areas are analyzed, supporting efficient operation and decision-making for the target areas.
Data distribution analysis and data analysis conclusion:
A normality check is performed on the data in the video data table after irrelevant fields are deleted. The null hypothesis of the test is that the sample follows a normal distribution; whether the object distribution conforms to the specified region is judged, and if the test result violates the set rule, the null hypothesis is rejected, i.e., the data distribution is not normal.
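The text does not name a specific normality test; a Jarque-Bera style check on sample skewness and kurtosis is one simple, stdlib-only possibility, sketched here for illustration:

```python
def jarque_bera(data):
    """Jarque-Bera statistic: large values reject the normality hypothesis.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is sample skewness and K
    sample kurtosis; under normality JB is approximately chi^2(2), so
    JB > 5.99 rejects the null hypothesis at the 5% level.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    s = m3 / m2 ** 1.5   # skewness
    k = m4 / m2 ** 2     # kurtosis
    return n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)

def looks_normal(data, crit=5.99):
    """True if the normality null hypothesis is NOT rejected at ~5%."""
    return jarque_bera(data) <= crit
```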
Model construction
1. A target detection model:
the target detection comprises two processes, target-box prediction and target-type classification; taking object detection as an example, it comprises two parts, object identification and object position prediction. In each video frame, a target detector first finds all targets that match the object characteristics, with the predicted object positions generally marked by bounding boxes; a classification model then predicts a confidence for each candidate target box, and the final detection result is generated from the confidences and the box position information. Object detection is essentially the localization of multiple objects in a picture.
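The "predict boxes, then score and assemble" flow above can be sketched minimally; the `Detection` type, the class label and the confidence threshold are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    box: tuple      # (x1, y1, x2, y2) in pixels
    label: str      # predicted class, e.g. "head_shoulder"
    score: float    # classifier confidence in [0, 1]

def finalize(candidates: List[Detection], conf_thresh=0.4) -> List[Detection]:
    """Drop low-confidence candidates and sort the rest by score,
    mirroring the 'predict boxes, then score them' two-step flow."""
    kept = [d for d in candidates if d.score >= conf_thresh]
    return sorted(kept, key=lambda d: d.score, reverse=True)
```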
The target detection algorithm includes three parts: detection window selection, feature design, and classifier design. [ Here, the applicant has extended: since the Adaboost-based face detection method was proposed by Viola and Jones in 2001, target detection algorithms have gone from the traditional framework of hand-designed features plus a shallow classifier to end-to-end frameworks based on big data and deep neural networks, and the technology has gradually matured. The traditional approach was in fact the industry mainstream before 2013, with research focused on optimizing hand-crafted features. After the deep-learning-based AlexNet won the 2012 image classification championship by a huge margin, academia and industry gradually shifted to deep learning for detection, because a deep neural network extracts features automatically and improves classification recall and precision markedly over traditional methods. The latest target detection algorithms are essentially all deep-learning based; starting from the earliest R-CNN, which initially recorded 49.6% accuracy, performance has since nearly doubled, while traditional methods such as SVM+HOG reach only about 31.5%, far below deep learning. Over the development and evolution of deep learning, within the two detection paradigms, Region Proposal Based and Non-Region Proposal Based, the appearance of the following models attracted wide attention in the industry and became milestones:
Region Proposal Based:
- R-CNN and SPP-Net were introduced in 2014
- Fast R-CNN and Faster R-CNN were introduced in 2015
- R-FCN was introduced in 2016
Non-Region Proposal Based:
- YOLO and SSD were introduced in 2015
- YOLO9000 appeared in 2016
It should be noted that the latest model is not necessarily the first choice in practical applications: some models gain a few percentage points of accuracy over their predecessors while consuming far more computing resources, so in practice an appropriate model must be chosen by weighing cost against performance. ]
Target detection is the first and key step of the whole video analysis application. Massive basic data are collected for the target detection algorithm across multiple dimensions (different scenes, angles, lighting conditions and seasons), and the algorithm is continuously iterated and optimized to achieve high generalization capability. The solidified algorithm model can reach a level far beyond the industry average in a new scene without any adaptive optimization; in the final scene, after corresponding adaptive optimization, the accuracy of target detection can be improved further.
2. Target tracking algorithm:
The main objective of multi-object tracking (MOT) is to automatically track objects of interest in a video and to recover the motion trajectory of each tracked object by exploiting the spatial, temporal and visual feature information in the video data. MOT is well suited to handling complex scenes with large numbers of targets and has great potential in camera surveillance, behavior analysis, autonomous driving/navigation, and smart-city construction. [ Here, the applicant has extended: early tracking techniques focused mainly on single-object tracking, which generally used conventional computer-vision feature engineering and then built classification models on those hand-crafted features. With the rise of end-to-end learning in recent years, related research has shifted to deep learning; for example, Discriminative Correlation Filters (DCFs) have been embedded into deep neural networks as a computational module, and the Efficient Convolution Operators (ECO) method reached state-of-the-art on the 2017 visual tracking benchmarks, with its accelerated version ECO-HC, based on hand-crafted features, running at 60 fps on a single GPU.
Currently, the mainstream frameworks in the MOT field follow the tracking-by-detection paradigm: detection boxes in different frames are linked by various data-association methods. The general flow is that a detector estimates the true position of each object in every frame, and the detection results of multiple frames are then combined to dynamically create or delete the motion trajectories of multiple targets. Within this paradigm many tracking strategies have emerged, among which tracking based on Kalman prediction is the most popular.
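The Kalman-prediction step at the heart of this flow can be sketched with a constant-velocity filter for a single coordinate; the noise values `q` and `r` are illustrative, and a real tracker runs one such filter per box parameter (cx, cy, w, h):

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of a track."""
    def __init__(self, x0, q=1e-2, r=1.0):
        self.x, self.v = x0, 0.0            # state: position, velocity
        self.p = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise

    def predict(self):
        # x' = x + v;  P' = F P F^T + Q  with F = [[1, 1], [0, 1]]
        self.x += self.v
        p = self.p
        self.p = [[p[0][0] + p[0][1] + p[1][0] + p[1][1] + self.q,
                   p[0][1] + p[1][1]],
                  [p[1][0] + p[1][1], p[1][1] + self.q]]
        return self.x

    def update(self, z):
        # measurement is the position only (H = [1, 0])
        s = self.p[0][0] + self.r
        k0, k1 = self.p[0][0] / s, self.p[1][0] / s   # Kalman gain
        y = z - self.x                                # innovation
        self.x += k0 * y
        self.v += k1 * y
        p = self.p
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
```

After a few predict/update cycles on a steadily moving target, the filter's prediction tracks the target's next position closely, which is what makes it a cheap basis for frame-to-frame association.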
From early single-target tracking to today's multi-target tracking, from offline to online, from computation-intensive to real-time, and from traditional computer-vision methods to deep learning, tracking technology has developed greatly, although limitations remain: faced with extremely complex scenes containing heavy occlusion and frequently crossing moving targets, present-day MOT is still far from satisfactory. The problem is hard partly because the relevant features are complex and diverse; for example, the appearance of a detected object can vary greatly from one image location to another, which makes it difficult to associate detections with predicted target positions, especially under large variations in lighting, scale or occlusion. ]
Within the Kalman-prediction framework, the system of the present invention designs a new GRU-based data-association method, called the Kalman-GRU method: the detected object is associated with the predicted target position using implicit motion and morphological features simultaneously, i.e., a direct end-to-end association is achieved through a GRU network, rather than explicitly weighting motion parameters and morphological features as before. Experimental results show that this implicit data association outperforms the previous explicit data association without introducing extra computation cost.
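The patent gives no network details for Kalman-GRU, so the following is only a minimal sketch under the assumption that each track's per-frame motion and appearance features are fed through a GRU whose final hidden state is scored against each candidate detection; all weights here are random placeholders that would be learned in practice.

```python
import numpy as np

def gru_cell(x, h, params):
    """One GRU step over input x and hidden state h."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = 1.0 / (1.0 + np.exp(-(Wz @ x + Uz @ h)))   # update gate
    r = 1.0 / (1.0 + np.exp(-(Wr @ x + Ur @ h)))   # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))        # candidate state
    return (1.0 - z) * h + z * h_cand

def association_score(track_feats, det_feat, params, w_out):
    """Summarise a track's per-frame motion/appearance features with the
    GRU, then score a candidate detection against the final hidden state
    via a linear readout (end-to-end in a real, trained system)."""
    h = np.zeros(params[1].shape[0])
    for f in track_feats:
        h = gru_cell(f, h, params)
    return float(w_out @ np.concatenate([h, det_feat]))
```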
Through actual monitoring of the application scenes, the invention effectively records the data-generation process and the alarm-generation process, providing effective data support for the management of the target sites and thereby improving their service quality.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, substitutions and improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A video image detection system based on artificial intelligence, comprising a video intelligent monitoring and analyzing platform (1) arranged at the rear end and a monitoring camera (2) arranged at the front end, the monitoring camera (2) being connected with the video intelligent monitoring and analyzing platform (1); the monitoring camera (2) is provided with a people-flow data statistics template (2a) that reads the people-flow data of persons entering the target site, and with a body temperature detection sensor (3); the people-flow data statistics template (2a) is connected with the body temperature detection sensor (3), and an alarm module (2b) for entering persons whose temperature has not been measured is arranged on the people-flow data statistics template (2a); characterized in that: the video intelligent monitoring and analyzing platform (1) accesses the online video stream data of each target site; the video intelligent monitoring and analyzing platform (1) comprises an input end (11), a central processing unit (12) and an output end (13); a separate connection module (14) is added between the central processing unit (12) and the final output end (13), and the central processing unit (12) is connected with a modeling unit (15).
2. The artificial intelligence based video image detection system according to claim 1, further comprising a timer (5) and a target execution statistical unit (4) which are disposed at a target site, wherein the timer (5), the monitoring camera (2) at the front end of the target execution statistical unit (4) and the video intelligent monitoring analysis platform (1) at the rear end are respectively connected.
3. The artificial intelligence based video image detection system according to claim 1 or 2, wherein an online report generation module (16) and a presentation unit (17) are arranged on the video intelligent monitoring and analysis platform (1);
the monitoring camera (2) is connected with a multi-attribute classification neural network (Multi-task Classification), which extracts 14 categories of subdivided attribute data from each segmented human target, the 14 categories comprising target age prediction, angle, gender, bag type, hat, whether articles are carried in front of the body, trousers, bags, shoes, jacket style, glasses, mask, jacket color and lower-garment color data.
4. The artificial intelligence based video image detection system according to claim 2, wherein the execution of the target execution statistical unit (4) comprises: analysis of service execution duration data counted by the target execution statistical unit (4); analysis of the passenger-flow data of the new retail area based on the target execution statistical unit (4); and analysis of specific-customer data based on the target execution statistical unit (4); a specific-customer-data non-execution alarm module (4a) being arranged on the target execution statistical unit (4).
5. An artificial intelligence based video image detection system according to claim 4, wherein the execution of the target execution statistics unit (4) further comprises: and performing the identification of the non-specification data by the statistical unit (4) based on the target.
6. An artificial intelligence based video image detection system according to claim 1, characterized in that said input (11): the method comprises a Mosaic data enhancement, cmBN and SAT self-confrontation training connecting end; the central processing unit (12): based on the feature extraction network, CSPDarknet53 is adopted, a Mish activation function is used, and Dropblock is adopted in a regularization mode; the individual connection module (14) adopts an SPP module and an FPN + PAN structure, and in the SPP module, the maximum pooling mode of k ═ {1 × 1,5 × 5,9 × 9,13 × 13} is used, and then Concat operation is carried out on feature maps with different scales.
7. The system according to claim 1 or 6, wherein the video intelligent monitoring and analyzing platform (1) further comprises a single prediction box module (18) for changing the nms of the prediction box filtering to DIOU _ nms.
8. A video image detection method based on artificial intelligence is characterized by comprising the following steps:
1) the monitoring analysis relies on the video intelligent monitoring and analyzing platform (1); artificial-intelligence analysis algorithms for target detection, recognition and target tracking are adopted to analyze and process real-time video streams or offline video files, extract the key information of massive videos, and classify and aggregate that information so that it can be searched and pushed intelligently;
2) the method comprises the steps of performing image description and analysis after an accessed video stream is sliced, and further performing analysis, identification, tracking, understanding and compression coding on the image;
3) performing 14 types of attribute-subdivided data extraction on the segmented object by using a Multi-attribute Classification neural network (Multi-task Classification), and subdividing the attributes of the person into at least 14 types of data;
the body temperature of persons at the entrance of the target site is measured and identified by a body temperature detection sensor (3) arranged on the monitoring camera (2).
9. A video image detection method based on artificial intelligence is characterized by comprising the following steps:
1) accessing data
Accessing online video stream data of each target through a video intelligent monitoring and analyzing platform (1);
2) target detection
The method comprises two processes of target frame prediction and target type classification, wherein the target detection comprises two parts of target identification and target position prediction by taking target object detection as an example;
selecting a Yolo algorithm in an One-stage algorithm for the target detection algorithm;
3) object segmentation
Carrying out image description and analysis after the video stream is cut into slices by utilizing a threshold-based segmentation method, a region-based segmentation method and an edge-based segmentation method, and further carrying out analysis, identification, tracking, understanding and compression coding on the images;
4) attribute extraction
a multi-attribute classification neural network (Multi-task Classification) extracts 14 categories of subdivided attribute data from the segmented object; the attributes of the subject are subdivided into at least 14 categories, including age prediction, angle, gender, bag type, hat, whether articles are carried in front of the body, trousers, bags, shoes, jacket style, glasses, mask, jacket color, and lower-garment color data;
5) target data tracking
the Kalman-GRU method is applied to associate the detected object with the predicted target position by identifying implicit motion and morphological feature data, and direct end-to-end association through a GRU network realizes target tracking.
10. The video image detection method based on artificial intelligence according to claim 8 or 9, characterized in that objects entering the network site are detected and tracked through the images of the monitoring camera (2), and statistics of clients entering the site are recorded; detection, feature extraction and area analysis are performed on objects entering the site through the accessed pictures of the monitoring camera (2) to identify working objects or clients, and the statistical data are analyzed in turn based on the target execution statistical unit (4);
model construction of a modeling unit (15) on a video intelligent monitoring analysis platform (1): the method comprises the following steps:
1) and a target detection model:
the target detection comprises two processes of target frame prediction and target type classification, in each frame of a video, firstly, all targets which accord with the characteristics of the target are found out through a target detector, the predicted positions of the targets are generally marked by a frame (bounding box), a confidence coefficient is predicted for each possible target frame by using a classification model, and finally, a final detection result is generated according to the confidence coefficient and the frame position information;
the target detection algorithm includes three parts: detecting window selection, feature design and classifier design;
2) and associating the detected object with the predicted target location in multi-target tracking (MOT) data by utilizing implicit motion and morphological features simultaneously.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110701735.7A CN113436165A (en) | 2021-06-23 | 2021-06-23 | Video image detection system based on artificial intelligence and detection method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113436165A true CN113436165A (en) | 2021-09-24 |
Family
ID=77755197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110701735.7A Pending CN113436165A (en) | 2021-06-23 | 2021-06-23 | Video image detection system based on artificial intelligence and detection method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113436165A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115575400A (en) * | 2022-09-30 | 2023-01-06 | 招商局重庆公路工程检测中心有限公司 | Heaven and earth integrated highway detection system and method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109819208A (en) * | 2019-01-02 | 2019-05-28 | 江苏警官学院 | A kind of dense population security monitoring management method based on artificial intelligence dynamic monitoring |
TWM603165U (en) * | 2020-05-05 | 2020-10-21 | 李永裕 | Intelligent entrance guard management system |
CN111832400A (en) * | 2020-06-04 | 2020-10-27 | 北京航空航天大学 | Mask wearing condition monitoring system and method based on probabilistic neural network |
CN112001353A (en) * | 2020-09-03 | 2020-11-27 | 杭州云栖智慧视通科技有限公司 | Pedestrian re-identification method based on multi-task joint supervised learning |
CN112399144A (en) * | 2020-11-05 | 2021-02-23 | 上海明略人工智能(集团)有限公司 | Thermal imaging monitoring early warning method and device and thermal imaging monitoring management system |
Non-Patent Citations (2)
Title |
---|
XU, Zirui et al.: "Vehicle Detection and Traffic Flow Statistics Research Based on YOLOv4", Modern Information Technology, vol. 4, no. 15, 10 August 2020 (2020-08-10), pages 98 - 103 *
HAN, Xiaowei et al.: "Passenger Flow Statistics Method Based on Deep Learning", Computer Systems &amp; Applications, vol. 29, no. 4, 15 April 2020 (2020-04-15), pages 25 - 31 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108053427B (en) | Improved multi-target tracking method, system and device based on KCF and Kalman | |
Cheng et al. | Vision-based monitoring of site safety compliance based on worker re-identification and personal protective equipment classification | |
Hsiao et al. | Occlusion reasoning for object detection under arbitrary viewpoint | |
CN106128022B (en) | A kind of wisdom gold eyeball identification violent action alarm method | |
US20170261264A1 (en) | Fault diagnosis device based on common information and special information of running video information for electric-arc furnace and method thereof | |
CN102163290B (en) | Method for modeling abnormal events in multi-visual angle video monitoring based on temporal-spatial correlation information | |
CN106571014A (en) | Method for identifying abnormal motion in video and system thereof | |
CN104915655A (en) | Multi-path monitor video management method and device | |
US20150339831A1 (en) | Multi-mode video event indexing | |
CN113553979B (en) | Safety clothing detection method and system based on improved YOLO V5 | |
CN103279737B (en) | A kind of behavioral value method of fighting based on space-time interest points | |
CN103902966B (en) | Video interactive affair analytical method and device based on sequence space-time cube feature | |
CN109948455B (en) | Detection method and device for left-behind object | |
Patil et al. | Fggan: A cascaded unpaired learning for background estimation and foreground segmentation | |
CN111738218B (en) | Human body abnormal behavior recognition system and method | |
Ferryman et al. | Performance evaluation of crowd image analysis using the PETS2009 dataset | |
Gnouma et al. | Abnormal events’ detection in crowded scenes | |
CN107657232A (en) | A kind of pedestrian's intelligent identification Method and its system | |
CN112347909A (en) | Retail store entrance and exit passenger flow statistical method | |
CN108830204B (en) | Method for detecting abnormality in target-oriented surveillance video | |
CN117541994A (en) | Abnormal behavior detection model and detection method in dense multi-person scene | |
CN115169673A (en) | Intelligent campus epidemic risk monitoring and early warning system and method | |
Fan et al. | Video anomaly detection using CycleGan based on skeleton features | |
CN113436165A (en) | Video image detection system based on artificial intelligence and detection method thereof | |
Altowairqi et al. | A Review of the Recent Progress on Crowd Anomaly Detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||