CN113436165A - Video image detection system based on artificial intelligence and detection method thereof - Google Patents
- Publication number
- CN113436165A CN113436165A CN202110701735.7A CN202110701735A CN113436165A CN 113436165 A CN113436165 A CN 113436165A CN 202110701735 A CN202110701735 A CN 202110701735A CN 113436165 A CN113436165 A CN 113436165A
- Authority
- CN
- China
- Prior art keywords
- target
- data
- video
- monitoring
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a video image detection system based on artificial intelligence and a detection method thereof. The system comprises an intelligent video monitoring and analysis platform arranged at the back end and monitoring cameras arranged at the front end, the monitoring cameras being connected with the platform. Each monitoring camera is provided with a people-flow statistics template for reading the people-flow data of entries into the target site, and with a body-temperature detection sensor connected to that template; the template carries an alarm module that fires when someone enters without a temperature measurement. The platform accesses the online video-stream data of every target site and comprises an input end, a central processing unit and an output end; an independent connection module is added between the central processing unit and the final output end, and the central processing unit is connected with a modeling unit.
Description
Technical Field
The invention relates to the field of information technology detection, in particular to a video image detection system based on artificial intelligence and a detection method thereof.
Background
The existing power system analyzes the video data of each target site centrally; the analysis covers four main categories of data: time-compliance data, dress-code data (colour and clothing), daily-work (timeliness) data and business-environment (work-order) data. The applicant therefore proposes a solution: use an artificial-intelligence scheme to perform intelligent patrol inspection, so that problems are discovered and corrected in time.
Disclosure of Invention
The invention aims to provide a video image detection system based on artificial intelligence and a detection method thereof so as to solve the problems in the background technology.
In order to achieve the above purpose, the invention adopts the following technical scheme: a video image detection system based on artificial intelligence comprises an intelligent video monitoring and analysis platform arranged at the back end and monitoring cameras arranged at the front end, the monitoring cameras being connected with the platform. Each monitoring camera is provided with a people-flow statistics template for reading the people-flow data of entries into the target site, and with a body-temperature detection sensor connected to that template; the template carries an alarm module that fires when someone enters without a temperature measurement. The platform accesses the online video-stream data of every target site and comprises an input end, a central processing unit and an output end; an independent connection module is added between the central processing unit and the final output end, and the central processing unit is connected with a modeling unit.
Furthermore, the system also comprises a timer and a target-execution statistics unit arranged in the target area; both are connected, through the front-end monitoring camera, to the back-end intelligent video monitoring and analysis platform.
Furthermore, an online report generation module and a display unit are arranged on the video intelligent monitoring and analyzing platform;
the monitoring camera is connected by a Multi-attribute Classification neural network (Multi-task Classification), 14 types of data extraction with subdivided attributes are carried out on the divided human body targets, and the 14 types of data comprise target age prejudgment, angles, sexes, bags, hats, whether articles are carried in front of the body, trousers, bags, shoes, jacket styles, glasses, masks, jacket colors and lower dress color data.
Still further, the work of the target-execution statistics unit comprises: analysing service-duration data from the unit's statistics; analysing passenger-flow data for the new-retail area; and compiling statistics on specific customer groups. The unit also carries an alarm module that fires when the specific-customer analysis is not executed.
Still further, the work of the target-execution statistics unit also comprises identifying non-compliant data.
Further, the input end comprises Mosaic data-enhancement, cmBN and SAT self-adversarial-training connection ends. The central processing unit uses CSPDarknet53 as the feature-extraction backbone, the Mish activation function, and DropBlock regularization. The independent connection module adopts an SPP module and an FPN+PAN structure; in the SPP module, max pooling with kernels k = {1×1, 5×5, 9×9, 13×13} is applied, and the resulting feature maps of different scales are merged with a Concat operation.
Still further, the intelligent video monitoring and analysis platform also comprises an independent prediction-box module, in which the NMS used to filter prediction boxes is replaced by DIoU-NMS.
A video image detection method based on artificial intelligence comprises the following steps:
1) monitoring and analysis rely on the intelligent video monitoring and analysis platform: several artificial-intelligence algorithms for target detection, recognition and tracking analyse and process real-time video streams or offline video files, extract the key information of massive video, and classify and aggregate it, so that video can be searched and pushed intelligently;
2) the accessed video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded;
3) a multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented target, subdividing a person's attributes into at least 14 data categories;
the body-temperature detection sensor arranged on the monitoring camera measures and records the temperature at the entrance of the target site.
Further, a video image detection method based on artificial intelligence comprises the following steps:
1) accessing data
Accessing online video stream data of each target through a video intelligent monitoring and analyzing platform;
2) target detection
This step comprises two processes, target-box prediction and target-type classification; taking the detection of a target object as an example, detection consists of two parts, recognizing the target and predicting its position;
the YOLO algorithm, a one-stage algorithm, is selected as the target detection algorithm;
3) object segmentation
Using threshold-based, region-based and edge-based segmentation methods, the video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded;
4) attribute extraction
A multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented target, subdividing a person's attributes into at least 14 categories: estimated age, viewing angle, gender, bag type, hat, whether an item is carried, trousers, backpack, shoes, jacket style, glasses, mask, jacket colour and lower-garment colour data;
5) target data tracking
A Kalman-GRU method identifies implicit motion and morphological feature data to associate each detected object with a predicted target position; a GRU network provides direct end-to-end association, realizing target tracking.
Furthermore, objects entering a branch are detected and tracked through the accessed monitoring-camera picture, and statistics on customers entering the branch are recorded; through the accessed monitoring-camera picture, detection, feature extraction and region analysis are applied to objects entering the branch to distinguish staff from customers, and the statistics are analysed sequentially by the target-execution statistics unit;
model construction of a modeling unit on a video intelligent monitoring analysis platform: the method comprises the following steps:
1) and a target detection model:
the target detection comprises two processes of target frame prediction and target type classification, in each frame of a video, firstly, all targets which accord with the characteristics of the target are found out through a target detector, the predicted positions of the targets are generally marked by a frame (bounding box), a confidence coefficient is predicted for each possible target frame by using a classification model, and finally, a final detection result is generated according to the confidence coefficient and the frame position information;
the target detection algorithm includes three parts: detecting window selection, feature design and classifier design;
2) multi-target tracking (MOT) model: implicit motion and morphological features are used simultaneously to associate detected objects with predicted target positions in the tracking data.
The invention has the following technical effects. The intelligent video monitoring and analysis platform takes in the monitored video stream of the target area and performs feature labelling, real-time monitoring, intelligent analysis, early warning and online reporting. First, video monitoring quantifies on-site conditions at each target site in real time and presents a panorama of target-site data; second, it analyses service-compliance data and gives full control of service-quality and effect data; third, it analyses epidemic-prevention and safety-measure data, supporting overall control and emergency handling of the site's safety situation; fourth, it analyses people-flow data of the new-retail area, informing decisions for efficient site operation. The system applies several artificial-intelligence methods for target detection, recognition and tracking to real-time video streams or offline video files, extracts the key information of massive video, and classifies and aggregates it for intelligent search and push. In real target-site scenes the cameras are mounted in many different ways and the illumination varies widely, so the feature-extraction and classification capability of the detection algorithm must be addressed.
After comparative testing, the YOLO one-stage algorithm was selected as the target detection algorithm, and the structure of its convolutional neural network was adapted, greatly strengthening the robustness and generalization ability of the whole network. Through monitoring of actual application scenes, the invention effectively records how data and alarms are generated, providing effective data support for managing the target site and thereby improving its service quality.
Drawings
FIG. 1 is a schematic diagram of the system configuration of the present invention.
Detailed Description
Building on the existing power system's analysis of target-site deficiencies, the system implements people-flow statistics, entry-without-temperature-measurement alarms, service-timeout alarms and new-retail execution statistics, helping to analyse how each target site performs and, especially during an epidemic, to better protect the health and safety of the people served. By subdividing the operational-monitoring, special-data recognition and detection requirements of target sites, the sites run more efficiently and all of them can be managed comprehensively.
Specifically, target supervision and detection are performed by the front-end monitoring camera 2:
1) monitoring the execution time at the target site and issuing time-based early warnings;
2) detecting and recognizing the dress of staff at the target site, the recognized data including clothing and shoe data;
3) detecting whether staff at the target site are on duty or off duty;
4) counting the people flow at the target site;
In addition, the system's data-connection interface provides an API data interface.
The intelligent video monitoring and analysis platform 1 of the invention takes in the monitored video stream of the target area and performs feature labelling, real-time monitoring, intelligent analysis, early warning and alarming, and online reporting. First, video monitoring quantifies on-site conditions at each target site in real time and presents a panorama of target-site data; second, it analyses service-compliance data and gives full control of service-quality and effect data; third, it analyses epidemic-prevention and safety-measure data, supporting overall control and emergency handling of the site's safety situation; fourth, it analyses people-flow data of the new-retail area, informing decisions for efficient site operation.
Monitoring and analysis rely on the intelligent video monitoring and analysis platform 1, which applies several artificial-intelligence algorithms for target detection, recognition and tracking to real-time video streams or offline video files, extracts the key information of massive video, and classifies and aggregates it for intelligent search and push. Using threshold-based, region-based and edge-based segmentation methods, the video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded. A multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented human target, subdividing a person's attributes into at least 14 data categories: estimated age, viewing angle, gender, bag type, hat, whether an item is carried in front of the body, trousers, backpack, shoes, jacket style, glasses, mask, jacket colour and lower-garment colour.
The people-flow statistics template 2a: objects entering the branch are detected and tracked (algorithmically) through the picture of the accessed monitoring camera 2, and statistics on customers entering the branch are recorded. Detection, feature extraction and region analysis are applied to objects entering the branch through the camera picture to distinguish staff from customers. The service-duration data counted by the target-execution statistics unit 4 are analysed: objects entering the branch are detected through the camera picture, the handling duration is derived from the customer's dwell time in the new-retail counter area, and the thresholds are set according to the State Grid power-supply service standard; the threshold differs by data type, not exceeding 5 minutes and 20 minutes respectively. The body-temperature detection sensor 3 on the monitoring camera 2 measures and records temperatures at the entrance of the target site: the body temperature of everyone entering is detected, and the entry-without-temperature-measurement alarm module 2b judges whether a measurement was taken. The target-execution statistics unit 4 then analyses the passenger-flow data of the new-retail area: the new-retail area of the target site, including the new-retail display-counter area, is analysed by time period, counter area, people flow and dwell time, and corresponding data or reports are produced.
The target-execution statistics unit 4 then analyses specific customer data: analysis data on elderly people and children at the target site are extracted, and an alarm is raised and recorded when an elderly person or child appears in a non-business area or stays at the target site longer than a set threshold. The unit 4 then identifies non-compliant data: non-compliant staff behaviour is recorded as abnormal-execution data for secondary confirmation by back-office staff, and the non-compliance information is aggregated across time periods, types and locations.
The specific implementation process of the system of the invention comprises the following steps:
1) accessing data
And accessing online video stream data of each target through a video intelligent monitoring and analyzing platform (1).
2) Target detection
This step comprises two processes, target-box prediction and target-type classification; taking the detection of a target object as an example, detection consists of recognizing the object and predicting its position.
3) Object segmentation
Using threshold-based, region-based and edge-based segmentation methods, the video stream is sliced, the resulting images are described and analysed, and the images are further analysed, recognized, tracked, understood and compression-coded.
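As an illustration of the threshold-based branch of this segmentation step, here is a minimal automatic threshold in Python with NumPy. The patent does not specify which thresholding rule is used, so Otsu's between-class-variance criterion and the synthetic frame below are assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the grey level that maximises between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # background mean
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1   # foreground mean
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# synthetic frame: dark background (~30) with one bright 20x20 object (~200)
frame = np.full((64, 64), 30, dtype=np.uint8)
frame[20:40, 20:40] = 200
t = otsu_threshold(frame)
mask = frame > t          # binary segmentation mask of the bright object
```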
4) Attribute extraction
The multi-attribute classification neural network (multi-task classification) extracts 14 categories of fine-grained attribute data from each segmented target. A person's attributes are subdivided into at least 14 categories: estimated age, viewing angle, gender, bag type, hat, whether an item is carried, trousers, backpack, shoes, jacket style, glasses, mask, jacket colour and lower-garment colour data.
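The shared-backbone, many-heads layout that multi-task classification implies can be sketched as below. The per-head class counts, feature dimension and random weights are hypothetical stand-ins; the patent specifies only the 14 attribute categories, not the network sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class counts for the 14 attribute heads (age, angle, gender,
# bag type, hat, carried item, trousers, backpack, shoes, jacket style,
# glasses, mask, jacket colour, lower-garment colour).
HEAD_CLASSES = [6, 4, 2, 4, 2, 2, 3, 2, 4, 5, 2, 2, 8, 8]
FEAT_DIM = 32

# One linear classification head per attribute, all on the same shared feature.
heads = [(rng.standard_normal((FEAT_DIM, c)), np.zeros(c)) for c in HEAD_CLASSES]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_attributes(feature):
    """Run all 14 heads on one shared per-person feature vector."""
    return [softmax(feature @ W + b) for W, b in heads]

feature = rng.standard_normal(FEAT_DIM)   # stand-in for a CNN embedding
probs = classify_attributes(feature)      # 14 probability vectors, one per attribute
```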
5) Target data tracking
A Kalman-GRU method identifies implicit motion and morphological feature data to associate each detected object with a predicted target position; a GRU network provides direct end-to-end association, realizing target tracking.
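The Kalman half of the Kalman-GRU step can be sketched with a constant-velocity filter over a position-velocity state. The GRU association network is not reproduced here, and the motion model, noise matrices and noiseless detections are illustrative assumptions only.

```python
import numpy as np

dt = 1.0
# constant-velocity motion model over state [x, y, vx, vy]
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only position is observed
Q = np.eye(4) * 1e-2                        # process-noise covariance (assumed)
R = np.eye(2) * 1e-1                        # measurement-noise covariance (assumed)

def predict(x, P):
    """Project the state and its covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with a detection z = [x, y]."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0, 1.0, 0.5]), np.eye(4)
for step in range(1, 6):                    # target moves with velocity (1, 0.5)
    x, P = predict(x, P)
    z = np.array([1.0 * step, 0.5 * step])  # noiseless detections for the sketch
    x, P = update(x, P, z)
```

In a full tracker the predicted position would be matched against new detections (the association the GRU learns end-to-end) before each update.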
The system applies several artificial-intelligence methods for target detection, recognition and tracking to real-time video streams or offline video files, extracts the key information of massive video, and classifies and aggregates it for intelligent search and push. In real target-site scenes the cameras are mounted in many different ways and the illumination varies widely, so the feature-extraction and classification capability of the detection algorithm must be addressed. After comparative testing, the YOLO one-stage algorithm was selected as the target detection algorithm, and the structure of its convolutional neural network was adapted, greatly strengthening the robustness and generalization ability of the whole network.
In particular:
(1) Input end 11: comprises the Mosaic data-enhancement, cmBN and SAT self-adversarial-training connection ends. Mosaic data enhancement builds on the CutMix idea: four pictures are spliced together through random scaling, random cropping and random arrangement. This greatly enriches the detection dataset; random scaling in particular adds many small targets, making the network more robust.
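A minimal sketch of the Mosaic idea follows: four images spliced around a random centre point into one training sample. This is a simplified assumption of the pipeline; a real implementation also rescales and clips the bounding-box labels, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(42)

def mosaic(imgs, out_size=128):
    """Splice four images into one sample around a random centre point.

    Each source image is resized by nearest-neighbour indexing to fill
    its quadrant, so small targets appear when a quadrant is small.
    """
    cy = rng.integers(out_size // 4, 3 * out_size // 4)   # random split point
    cx = rng.integers(out_size // 4, 3 * out_size // 4)
    out = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    quads = [(0, cy, 0, cx), (0, cy, cx, out_size),
             (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(imgs, quads):
        h, w = y1 - y0, x1 - x0
        ys = np.arange(h) * img.shape[0] // h   # nearest-neighbour row indices
        xs = np.arange(w) * img.shape[1] // w   # nearest-neighbour col indices
        out[y0:y1, x0:x1] = img[ys][:, xs]
    return out

# four flat-colour stand-in images so the quadrants are easy to verify
imgs = [np.full((64, 64, 3), c, dtype=np.uint8) for c in (50, 100, 150, 200)]
sample = mosaic(imgs)
```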
(2) Central processing unit 12: CSPDarknet53 is adopted as the feature-extraction backbone, improving the network's feature-extraction capability. The Leaky-ReLU activation function is replaced with the Mish activation function, and DropBlock is adopted for regularization. Mish further increases the feature-extraction capability, while DropBlock, a more robust replacement for the earlier Dropout, relieves overfitting during training.
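The Mish activation named above has a closed form, x·tanh(softplus(x)); a NumPy sketch (with a numerically stable softplus) is:

```python
import numpy as np

def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)

def mish(x):
    """Mish activation: x * tanh(softplus(x)).

    Smooth and non-monotonic, with a small negative dip near x ~ -1,
    unlike the hard kink of Leaky ReLU it replaces here."""
    return x * np.tanh(softplus(x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = mish(x)
```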
(3) Independent connection module 14 (Neck layer): a Neck layer is added between the central processing unit 12 and the final output end 13; it adopts an SPP module and an FPN+PAN structure to improve the network's feature-extraction capability. In the SPP module, max pooling with kernels k = {1×1, 5×5, 9×9, 13×13} is applied, after which the feature maps of different scales undergo a Concat operation.
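The SPP operation just described, stride-1 max pooling at the four kernel sizes followed by a channel-wise Concat, can be sketched in NumPy as below; the 13×13×8 feature map is an arbitrary stand-in shape.

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pooling with 'same' padding on an (H, W, C) feature map."""
    if k == 1:
        return x                              # 1x1 max pool is the identity
    p = k // 2
    padded = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x, kernels=(1, 5, 9, 13)):
    """SPP block: pool at several scales, then Concat along the channel axis.

    Spatial size is preserved; channels are multiplied by len(kernels)."""
    return np.concatenate([maxpool_same(x, k) for k in kernels], axis=2)

feat = np.random.default_rng(1).standard_normal((13, 13, 8))
out = spp(feat)   # shape (13, 13, 32): four pooled copies concatenated
```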
(4) Independent prediction-box module 18 (prediction layer): the main improvements at the output end 13 are the CIoU_Loss function for the target box during training, and replacing the NMS used for prediction-box filtering with DIoU-NMS. NMS screens the prediction boxes; ordinary target-detection algorithms generally use plain NMS, whereas this system's detection uses DIoU-NMS, which can still detect an important target overlapped in the middle of others.
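DIoU-NMS suppresses a box only when both its IoU with the kept box is high and their centres are close, which is why centrally overlapped targets survive. A NumPy sketch, with illustrative boxes and a 0.5 threshold chosen for the example:

```python
import numpy as np

def diou(box, boxes):
    """DIoU = IoU - (centre distance)^2 / (enclosing-box diagonal)^2."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter)
    # squared distance between box centres
    d2 = ((box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2) ** 2 \
       + ((box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    cw = np.maximum(box[2], boxes[:, 2]) - np.minimum(box[0], boxes[:, 0])
    ch = np.maximum(box[3], boxes[:, 3]) - np.minimum(box[1], boxes[:, 1])
    return iou - d2 / (cw ** 2 + ch ** 2)

def diou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS that suppresses by DIoU rather than plain IoU."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[diou(boxes[i], boxes[rest]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep = diou_nms(boxes, scores)   # the near-duplicate of box 0 is suppressed
```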
Through the above target detection algorithm, the head and shoulders of objects in the image can be located and classified.
On this basis, the numbers of customers and staff on site in the video are monitored and counted in real time, and the position information provides an auxiliary basis for judging whether the working objects comply with the operating specification.
The data mining process of the invention comprises the following steps:
The whole process of model construction is summarized as follows:
1) determining data requirements;
2) data preparation (including data acquisition, data quality check, data exploration, data cleaning, data preprocessing and data-table generation);
3) completing the different alarm analyses based on the various monitoring data of the targets in the data table;
4) selecting training samples and building the target-site service prediction model based on the different alarm analysis results;
5) model tuning and verification.
Regarding the data quality for the model: in terms of reliability, the video data come from the target monitoring system (monitoring camera 2); although some abnormal data are collected, subsequent data mining is not affected. In terms of integrity, the video data suffer from interruptions, but these can be remedied in subsequent data processing.
Preprocessing of data: data cleaning is used to find and correct errors in the data. To meet the requirements of the target algorithm, the raw data are transformed; based on an understanding of the data and the data-requirement analysis, feature-variable construction and aggregation methods are used for the transformation.
Data exploration: the monitoring video streams of the test-point target sites are accessed into the video intelligent monitoring and analyzing platform 1, which completes feature labeling, real-time monitoring, intelligent analysis, early warning and alarming, and online reporting. First, the on-site data of each target site are quantified in real time through video monitoring, and a development panorama of the target-site data is displayed. Second, the service-standard data of the target sites are analyzed through video monitoring, so that customer-service quality and effect data are comprehensively controlled. Third, the epidemic-prevention and safety-measure data of the target areas are analyzed, supporting comprehensive control and emergency disposal of the areas' safety situation. Fourth, the people-flow heat data of the new retail zones of the target areas are analyzed, supporting efficient operation and decision-making for the target areas.
Data distribution analysis and data analysis conclusion:
A normality check is performed on the data in the video data table after irrelevant fields are deleted. The null hypothesis of the test is that the sample follows a normal distribution; whether the object distribution conforms to the specified region is judged, and if the test result violates the set rule, the null hypothesis is rejected, i.e., the data distribution is not normal.
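The text does not name a specific normality test; a Jarque-Bera style check on sample skewness and kurtosis is one simple, stdlib-only possibility, sketched here for illustration:

```python
def jarque_bera(data):
    """Jarque-Bera statistic: large values reject the normality hypothesis.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is sample skewness and K
    sample kurtosis; under normality JB is approximately chi^2(2), so
    JB > 5.99 rejects the null hypothesis at the 5% level.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    s = m3 / m2 ** 1.5   # skewness
    k = m4 / m2 ** 2     # kurtosis
    return n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)

def looks_normal(data, crit=5.99):
    """True if the normality null hypothesis is NOT rejected at ~5%."""
    return jarque_bera(data) <= crit
```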
Model construction
1. A target detection model:
the target detection comprises two processes, target-box prediction and target-type classification; taking object detection as an example, it comprises two parts, object identification and object position prediction. In each video frame, a target detector first finds all targets that match the object characteristics, with the predicted object positions generally marked by bounding boxes; a classification model then predicts a confidence for each candidate target box, and the final detection result is generated from the confidences and the box position information. Object detection is essentially the localization of multiple objects in a picture.
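The "predict boxes, then score and assemble" flow above can be sketched minimally; the `Detection` type, the class label and the confidence threshold are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    box: tuple      # (x1, y1, x2, y2) in pixels
    label: str      # predicted class, e.g. "head_shoulder"
    score: float    # classifier confidence in [0, 1]

def finalize(candidates: List[Detection], conf_thresh=0.4) -> List[Detection]:
    """Drop low-confidence candidates and sort the rest by score,
    mirroring the 'predict boxes, then score them' two-step flow."""
    kept = [d for d in candidates if d.score >= conf_thresh]
    return sorted(kept, key=lambda d: d.score, reverse=True)
```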
The target detection algorithm includes three parts: detection window selection, feature design, and classifier design. [ Here, the applicant has extended: since the Adaboost-based face detection method was proposed by Viola and Jones in 2001, target detection algorithms have gone from the traditional framework of hand-designed features plus a shallow classifier to end-to-end frameworks based on big data and deep neural networks, and the technology has gradually matured. The traditional approach was in fact the industry mainstream before 2013, with research focused on optimizing hand-crafted features. After the deep-learning-based AlexNet won the 2012 image classification championship by a huge margin, academia and industry gradually shifted to deep learning for detection, because a deep neural network extracts features automatically and improves classification recall and precision markedly over traditional methods. The latest target detection algorithms are essentially all deep-learning based; starting from the earliest R-CNN, which initially recorded 49.6% accuracy, performance has since nearly doubled, while traditional methods such as SVM+HOG reach only about 31.5%, far below deep learning. Over the development and evolution of deep learning, within the two detection paradigms, Region Proposal Based and Non-Region Proposal Based, the appearance of the following models attracted wide attention in the industry and became milestones:
Region Proposal Based:
- R-CNN and SPP-Net were introduced in 2014
- Fast R-CNN and Faster R-CNN were introduced in 2015
- R-FCN was introduced in 2016
Non-Region Proposal Based:
- YOLO and SSD were introduced in 2015
- YOLO9000 appeared in 2016
It should be noted that the latest model is not necessarily the first choice in practical applications: some models gain a few percentage points of accuracy over their predecessors while consuming far more computing resources, so in practice an appropriate model must be chosen by weighing cost against performance. ]
Target detection is the first and key step of the whole video analysis application. Massive basic data are collected for the target detection algorithm across multiple dimensions (different scenes, angles, lighting conditions and seasons), and the algorithm is continuously iterated and optimized to achieve high generalization capability. The solidified algorithm model can reach a level far beyond the industry average in a new scene without any adaptive optimization; in the final scene, after corresponding adaptive optimization, the accuracy of target detection can be improved further.
2. Target tracking algorithm:
The main objective of multi-object tracking (MOT) is to automatically track objects of interest in a video and to recover the motion trajectory of each tracked object by exploiting the spatial, temporal and visual feature information in the video data. MOT is well suited to handling complex scenes with large numbers of targets and has great potential in camera surveillance, behavior analysis, autonomous driving/navigation, and smart-city construction. [ Here, the applicant has extended: early tracking techniques focused mainly on single-object tracking, which generally used conventional computer-vision feature engineering and then built classification models on those hand-crafted features. With the rise of end-to-end learning in recent years, related research has shifted to deep learning; for example, Discriminative Correlation Filters (DCFs) have been embedded into deep neural networks as a computational module, and the Efficient Convolution Operators (ECO) method reached state-of-the-art on the 2017 visual tracking benchmarks, with its accelerated version ECO-HC, based on hand-crafted features, running at 60 fps on a single GPU.
Currently, the mainstream frameworks in the MOT field follow the tracking-by-detection paradigm: detection boxes in different frames are linked by various data-association methods. The general flow is that a detector estimates the true position of each object in every frame, and the detection results of multiple frames are then combined to dynamically create or delete the motion trajectories of multiple targets. Within this paradigm many tracking strategies have emerged, among which tracking based on Kalman prediction is the most popular.
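The Kalman-prediction step at the heart of this flow can be sketched with a constant-velocity filter for a single coordinate; the noise values `q` and `r` are illustrative, and a real tracker runs one such filter per box parameter (cx, cy, w, h):

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of a track."""
    def __init__(self, x0, q=1e-2, r=1.0):
        self.x, self.v = x0, 0.0            # state: position, velocity
        self.p = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise

    def predict(self):
        # x' = x + v;  P' = F P F^T + Q  with F = [[1, 1], [0, 1]]
        self.x += self.v
        p = self.p
        self.p = [[p[0][0] + p[0][1] + p[1][0] + p[1][1] + self.q,
                   p[0][1] + p[1][1]],
                  [p[1][0] + p[1][1], p[1][1] + self.q]]
        return self.x

    def update(self, z):
        # measurement is the position only (H = [1, 0])
        s = self.p[0][0] + self.r
        k0, k1 = self.p[0][0] / s, self.p[1][0] / s   # Kalman gain
        y = z - self.x                                # innovation
        self.x += k0 * y
        self.v += k1 * y
        p = self.p
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
```

After a few predict/update cycles on a steadily moving target, the filter's prediction tracks the target's next position closely, which is what makes it a cheap basis for frame-to-frame association.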
From early single-target tracking to today's multi-target tracking, from offline to online, from computation-intensive to real-time, and from traditional computer-vision methods to deep learning, tracking technology has developed greatly, although limitations remain: faced with extremely complex scenes containing heavy occlusion and frequently crossing moving targets, present-day MOT is still far from satisfactory. The problem is hard partly because the relevant features are complex and diverse; for example, the appearance of a detected object can vary greatly from one image location to another, which makes it difficult to associate detections with predicted target positions, especially under large variations in lighting, scale or occlusion. ]
Within the Kalman-prediction framework, the system of the present invention designs a new GRU-based data-association method, called the Kalman-GRU method: the detected object is associated with the predicted target position using implicit motion and morphological features simultaneously, i.e., a direct end-to-end association is achieved through a GRU network, rather than explicitly weighting motion parameters and morphological features as before. Experimental results show that this implicit data association outperforms the previous explicit data association without introducing extra computation cost.
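The patent gives no network details for Kalman-GRU, so the following is only a minimal sketch under the assumption that each track's per-frame motion and appearance features are fed through a GRU whose final hidden state is scored against each candidate detection; all weights here are random placeholders that would be learned in practice.

```python
import numpy as np

def gru_cell(x, h, params):
    """One GRU step over input x and hidden state h."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = 1.0 / (1.0 + np.exp(-(Wz @ x + Uz @ h)))   # update gate
    r = 1.0 / (1.0 + np.exp(-(Wr @ x + Ur @ h)))   # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))        # candidate state
    return (1.0 - z) * h + z * h_cand

def association_score(track_feats, det_feat, params, w_out):
    """Summarise a track's per-frame motion/appearance features with the
    GRU, then score a candidate detection against the final hidden state
    via a linear readout (end-to-end in a real, trained system)."""
    h = np.zeros(params[1].shape[0])
    for f in track_feats:
        h = gru_cell(f, h, params)
    return float(w_out @ np.concatenate([h, det_feat]))
```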
Through actual monitoring of the application scenes, the invention effectively records the data-generation process and the alarm-generation process, providing effective data support for the management of the target sites and thereby improving their service quality.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, substitutions and improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A video image detection system based on artificial intelligence, comprising a video intelligent monitoring and analyzing platform (1) arranged at the rear end and a monitoring camera (2) arranged at the front end, the monitoring camera (2) being connected with the video intelligent monitoring and analyzing platform (1); the monitoring camera (2) is provided with a people-flow data statistics template (2a) that reads the people-flow data of persons entering the target site, and with a body temperature detection sensor (3); the people-flow data statistics template (2a) is connected with the body temperature detection sensor (3), and an alarm module (2b) for entering persons whose temperature has not been measured is arranged on the people-flow data statistics template (2a); characterized in that: the video intelligent monitoring and analyzing platform (1) accesses the online video stream data of each target site; the video intelligent monitoring and analyzing platform (1) comprises an input end (11), a central processing unit (12) and an output end (13); a separate connection module (14) is added between the central processing unit (12) and the final output end (13), and the central processing unit (12) is connected with a modeling unit (15).
2. The artificial intelligence based video image detection system according to claim 1, further comprising a timer (5) and a target execution statistical unit (4) which are disposed at a target site, wherein the timer (5), the monitoring camera (2) at the front end of the target execution statistical unit (4) and the video intelligent monitoring analysis platform (1) at the rear end are respectively connected.
3. The artificial intelligence based video image detection system according to claim 1 or 2, wherein an online report generation module (16) and a presentation unit (17) are arranged on the video intelligent monitoring and analysis platform (1);
the monitoring camera (2) is connected with a multi-attribute classification neural network (Multi-task Classification), which extracts 14 categories of subdivided attribute data from each segmented human target, the 14 categories comprising target age prediction, angle, gender, bag type, hat, whether articles are carried in front of the body, trousers, bags, shoes, jacket style, glasses, mask, jacket color and lower-garment color data.
4. The artificial intelligence based video image detection system according to claim 2, wherein the execution of the target execution statistical unit (4) comprises: analysis of service execution duration data counted by the target execution statistical unit (4); analysis of the passenger-flow data of the new retail area based on the target execution statistical unit (4); and analysis of specific-customer data based on the target execution statistical unit (4); a specific-customer-data non-execution alarm module (4a) being arranged on the target execution statistical unit (4).
5. An artificial intelligence based video image detection system according to claim 4, wherein the execution of the target execution statistics unit (4) further comprises: and performing the identification of the non-specification data by the statistical unit (4) based on the target.
6. An artificial intelligence based video image detection system according to claim 1, characterized in that said input (11): the method comprises a Mosaic data enhancement, cmBN and SAT self-confrontation training connecting end; the central processing unit (12): based on the feature extraction network, CSPDarknet53 is adopted, a Mish activation function is used, and Dropblock is adopted in a regularization mode; the individual connection module (14) adopts an SPP module and an FPN + PAN structure, and in the SPP module, the maximum pooling mode of k ═ {1 × 1,5 × 5,9 × 9,13 × 13} is used, and then Concat operation is carried out on feature maps with different scales.
7. The system according to claim 1 or 6, wherein the video intelligent monitoring and analyzing platform (1) further comprises a single prediction box module (18) for changing the nms of the prediction box filtering to DIOU _ nms.
8. A video image detection method based on artificial intelligence is characterized by comprising the following steps:
1) the monitoring analysis relies on the video intelligent monitoring and analyzing platform (1); artificial-intelligence analysis algorithms for target detection, recognition and target tracking are adopted to analyze and process real-time video streams or offline video files, extract the key information of massive videos, and classify and aggregate that information so that it can be searched and pushed intelligently;
2) the method comprises the steps of performing image description and analysis after an accessed video stream is sliced, and further performing analysis, identification, tracking, understanding and compression coding on the image;
3) performing 14 types of attribute-subdivided data extraction on the segmented object by using a Multi-attribute Classification neural network (Multi-task Classification), and subdividing the attributes of the person into at least 14 types of data;
the body temperature of persons at the entrance of the target site is measured and identified by a body temperature detection sensor (3) arranged on the monitoring camera (2).
9. A video image detection method based on artificial intelligence is characterized by comprising the following steps:
1) accessing data
Accessing online video stream data of each target through a video intelligent monitoring and analyzing platform (1);
2) target detection
The method comprises two processes of target frame prediction and target type classification, wherein the target detection comprises two parts of target identification and target position prediction by taking target object detection as an example;
selecting a Yolo algorithm in an One-stage algorithm for the target detection algorithm;
3) object segmentation
Carrying out image description and analysis after the video stream is cut into slices by utilizing a threshold-based segmentation method, a region-based segmentation method and an edge-based segmentation method, and further carrying out analysis, identification, tracking, understanding and compression coding on the images;
4) attribute extraction
a multi-attribute classification neural network (Multi-task Classification) extracts 14 categories of subdivided attribute data from the segmented object; the attributes of the subject are subdivided into at least 14 categories, including age prediction, angle, gender, bag type, hat, whether articles are carried in front of the body, trousers, bags, shoes, jacket style, glasses, mask, jacket color, and lower-garment color data;
5) target data tracking
the Kalman-GRU method is applied to associate the detected object with the predicted target position by identifying implicit motion and morphological feature data, and direct end-to-end association through a GRU network realizes target tracking.
10. The video image detection method based on artificial intelligence according to claim 8 or 9, characterized in that objects entering the network site are detected and tracked through the images of the monitoring camera (2), and statistics of clients entering the site are recorded; detection, feature extraction and area analysis are performed on objects entering the site through the accessed pictures of the monitoring camera (2) to identify working objects or clients, and the statistical data are analyzed in turn based on the target execution statistical unit (4);
model construction of a modeling unit (15) on a video intelligent monitoring analysis platform (1): the method comprises the following steps:
1) and a target detection model:
the target detection comprises two processes of target frame prediction and target type classification, in each frame of a video, firstly, all targets which accord with the characteristics of the target are found out through a target detector, the predicted positions of the targets are generally marked by a frame (bounding box), a confidence coefficient is predicted for each possible target frame by using a classification model, and finally, a final detection result is generated according to the confidence coefficient and the frame position information;
the target detection algorithm includes three parts: detecting window selection, feature design and classifier design;
2) and associating the detected object with the predicted target location in multi-target tracking (MOT) data by utilizing implicit motion and morphological features simultaneously.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110701735.7A CN113436165A (en) | 2021-06-23 | 2021-06-23 | Video image detection system based on artificial intelligence and detection method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113436165A true CN113436165A (en) | 2021-09-24 |
Family
ID=77755197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110701735.7A Pending CN113436165A (en) | 2021-06-23 | 2021-06-23 | Video image detection system based on artificial intelligence and detection method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113436165A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115575400A (en) * | 2022-09-30 | 2023-01-06 | 招商局重庆公路工程检测中心有限公司 | Heaven and earth integrated highway detection system and method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109819208A (en) * | 2019-01-02 | 2019-05-28 | 江苏警官学院 | A kind of dense population security monitoring management method based on artificial intelligence dynamic monitoring |
TWM603165U (en) * | 2020-05-05 | 2020-10-21 | 李永裕 | Intelligent entrance guard management system |
CN111832400A (en) * | 2020-06-04 | 2020-10-27 | 北京航空航天大学 | Mask wearing condition monitoring system and method based on probabilistic neural network |
CN112001353A (en) * | 2020-09-03 | 2020-11-27 | 杭州云栖智慧视通科技有限公司 | Pedestrian re-identification method based on multi-task joint supervised learning |
CN112399144A (en) * | 2020-11-05 | 2021-02-23 | 上海明略人工智能(集团)有限公司 | Thermal imaging monitoring early warning method and device and thermal imaging monitoring management system |
Non-Patent Citations (2)
Title |
---|
XU, Zirui et al.: "Vehicle Detection and Traffic Flow Statistics Research Based on YOLOv4", Modern Information Technology, vol. 4, no. 15, 10 August 2020 (2020-08-10), pages 98 - 103 *
HAN, Xiaowei et al.: "Passenger Flow Statistics Method Based on Deep Learning", Computer Systems &amp; Applications, vol. 29, no. 4, 15 April 2020 (2020-04-15), pages 25 - 31 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108053427B (en) | Improved multi-target tracking method, system and device based on KCF and Kalman | |
Cheng et al. | Vision-based monitoring of site safety compliance based on worker re-identification and personal protective equipment classification | |
Hsiao et al. | Occlusion reasoning for object detection under arbitrary viewpoint | |
CN106128022B (en) | A kind of wisdom gold eyeball identification violent action alarm method | |
US20170261264A1 (en) | Fault diagnosis device based on common information and special information of running video information for electric-arc furnace and method thereof | |
CN102163290B (en) | Method for modeling abnormal events in multi-visual angle video monitoring based on temporal-spatial correlation information | |
CN106571014A (en) | Method for identifying abnormal motion in video and system thereof | |
CN104915655A (en) | Multi-path monitor video management method and device | |
US20150339831A1 (en) | Multi-mode video event indexing | |
CN113553979B (en) | Safety clothing detection method and system based on improved YOLO V5 | |
CN103279737B (en) | A kind of behavioral value method of fighting based on space-time interest points | |
CN103902966B (en) | Video interactive affair analytical method and device based on sequence space-time cube feature | |
CN109948455B (en) | Detection method and device for left-behind object | |
Patil et al. | Fggan: A cascaded unpaired learning for background estimation and foreground segmentation | |
CN111738218B (en) | Human body abnormal behavior recognition system and method | |
Ferryman et al. | Performance evaluation of crowd image analysis using the PETS2009 dataset | |
Gnouma et al. | Abnormal events’ detection in crowded scenes | |
CN107657232A (en) | A kind of pedestrian's intelligent identification Method and its system | |
CN112347909A (en) | Retail store entrance and exit passenger flow statistical method | |
CN108830204B (en) | Method for detecting abnormality in target-oriented surveillance video | |
CN117541994A (en) | Abnormal behavior detection model and detection method in dense multi-person scene | |
CN115169673A (en) | Intelligent campus epidemic risk monitoring and early warning system and method | |
Fan et al. | Video anomaly detection using CycleGan based on skeleton features | |
CN113436165A (en) | Video image detection system based on artificial intelligence and detection method thereof | |
Altowairqi et al. | A Review of the Recent Progress on Crowd Anomaly Detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||