CN116958791A - Camera polling calling alarming method in machine vision based on deep learning - Google Patents


Info

Publication number
CN116958791A
Authority
CN
China
Prior art keywords
camera
model
image data
deep learning
polling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310955634.1A
Other languages
Chinese (zh)
Inventor
陈大龙
陈兴林
范道啸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Howso Technology Co ltd
Original Assignee
Nanjing Howso Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Howso Technology Co ltd filed Critical Nanjing Howso Technology Co ltd
Priority to CN202310955634.1A priority Critical patent/CN116958791A/en
Publication of CN116958791A publication Critical patent/CN116958791A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/96 Management of image or video recognition tasks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18 Status alarms
    • G08B21/24 Reminder alarms, e.g. anti-loss alarms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Neurology (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a camera polling calling alarm method in machine vision based on deep learning, which comprises the following steps: S1: initializing nodes and installing algorithms; S2: configuring a camera for each algorithm in turn according to the camera information, and sending a request to acquire a video stream; S3: receiving and processing the video stream to obtain preprocessed image data; S4: processing the preprocessed image data through a target detection and recognition model to extract target information; S5: judging whether to trigger an alarm according to the detection results of the target detection and recognition model; S6: if the alarm is triggered, sending an alarm notification according to a preset notification mechanism. Through the application of a deep learning algorithm, the polling order of the cameras is dynamically adjusted by utilizing the correlation and spatial locality among the cameras, so that the performance and efficiency of the alarm system are improved; this not only reduces resource consumption and improves the real-time performance of alarms, but also enhances their accuracy and reliability.

Description

Camera polling calling alarming method in machine vision based on deep learning
Technical Field
The invention relates to the technical field of deep learning, in particular to a camera polling calling alarming method in machine vision based on deep learning.
Background
Amid the rapid development of machine vision technology, cameras play an important role as one of the most common and widely used sensing devices. With the continuous development of smart cities, smart homes, industrial automation and other fields, both the number of cameras and their application prospects keep growing. Traditional machine vision methods are mainly based on image processing and pattern recognition techniques. Image processing techniques include image enhancement, filtering, edge detection and the like, and are used for preprocessing an image and extracting features. Pattern recognition techniques include feature extraction, feature matching, classification and recognition, etc., and are used for extracting meaningful information from images and performing object recognition. In the security field, machine vision technology plays an important role. For example, video monitoring systems use machine vision technology to realize real-time analysis of and alarms on the monitoring picture, and can identify abnormal behaviors, perform face recognition, license plate recognition and the like. In addition, machine vision can also be used to aid police investigations and crime detection, providing critical image and video evidence. Despite significant advances in the security field, machine vision technology still has some limitations and challenges. First, object detection and tracking in complex scenes remains difficult, especially under varying illumination, occlusion, and complex backgrounds. Second, the processing and storage of large-scale data is also a challenge, requiring efficient algorithms and system support. In addition, machine vision techniques need to take privacy and security concerns into account to ensure the legal use and protection of personal information.
Camera polling call refers to calling a plurality of cameras in turn in a certain order so as to acquire image or video data. Its purpose is to realize full coverage and continuous monitoring of the monitored area and provide timely visual information for analysis and decision-making. The polling call method makes full use of the resources of a plurality of cameras, so that the monitoring system can cover a wider monitoring range. However, in a large-scale camera network, efficient calling and utilization of camera resources becomes a critical issue. Existing alarm systems often acquire image data by continuously polling all cameras and then perform alarm analysis. However, this polling method has some limitations, such as excessive resource consumption and high delay.
Chinese patent document CN110097787A discloses a ship collision early-warning monitoring system and method based on a monitoring navigation mark lamp; the system comprises a base, a folding support rod, a solar panel, a shell, a transparent organic glass protective cover, a navigation mark lamp, a camera component, a main control board, an acceleration sensor, a storage battery, a GPS positioning module, a light sensor, a charging control board, a network module and a background server. The method comprises the steps of constructing a neural network model to identify the ship category and the position of the ship in an image; the main control board controls the camera component to poll a plurality of directions of the channel for image acquisition; the ship type is identified and a fuzzy distance judgment is made based on the classification; and the acceleration sensor generates three-level alarms after judging a ship impact event and transmits them back to the background server. Although this technical scheme adopts a camera component to poll a plurality of directions of the channel for image acquisition, it does not combine a deep learning algorithm to adjust the polling order.
When implementing a camera polling call method based on deep learning, several research problems and challenges arise. First, how to schedule efficiently according to the correlation between cameras and the image content is a key issue. Meanwhile, real-time performance is a non-negligible factor in a machine vision alarm system, and maintaining system performance under real-time requirements is a challenge. In addition, the complexity and parameter tuning of the deep learning algorithm are also issues to be addressed.
The existing polling call methods each have advantages and disadvantages, and choosing a suitable method requires evaluation against the specific application scene and requirements. The time-based method is simple and easy to implement, and is suitable for scenes that change infrequently; the space-based method is highly adaptive and can realize continuous tracking of a target; the content-based method is more intelligent and can perform adaptive calling according to changes in the image content. However, each method presents certain limitations and challenges. Time-based methods may not be flexible in coping with target position changes; space-based methods place higher requirements on target position information and camera layout; content-based approaches present challenges in algorithm complexity and real-time performance.
In practical application, the scene characteristics, system resources and performance requirements need to be considered comprehensively to select a suitable polling method. In addition, several methods can be combined for joint calling, so that their advantages are fully utilized and their respective limitations are offset. The traditional camera polling call method is mainly designed for a single camera, whereas practical applications often involve a network formed by a plurality of cameras. Therefore, there is a need to develop a camera polling call method for machine vision alarms that achieves high efficiency in a multi-source collaborative network.
Disclosure of Invention
The invention aims to solve the technical problem of providing a camera polling calling alarming method in machine vision based on deep learning, which utilizes the correlation and spatial locality among cameras to dynamically adjust the polling sequence of the cameras and improves the performance and efficiency of an alarming system.
In order to solve the technical problems, the invention adopts the following technical scheme: the camera polling call alarming method based on the deep learning in the machine vision comprises the following steps:
S1: initializing nodes (polling nodes) and installing algorithms;
S2: according to the camera information, configuring a camera for each algorithm in turn, and sending a request to acquire a video stream;
S3: receiving and processing the video stream to obtain preprocessed image data;
S4: processing the preprocessed image data through a target detection and recognition model to extract target information;
S5: judging whether to trigger an alarm according to the detection results of the target detection and recognition model;
S6: if the alarm is triggered, sending an alarm notification according to a preset notification mechanism.
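The polling of steps S1 to S6 can be illustrated with a minimal sketch. This is not the claimed method: a real system would derive the priority scores from the learned correlation and spatial locality among cameras, whereas here the scores, function name and camera ids are illustrative assumptions.

```python
def poll_cameras(camera_ids, priority):
    """Yield camera ids round by round, highest priority first (cf. S2).

    `priority` maps camera id -> relevance score; in the claimed method
    such scores would come from the deep learning model, here they are
    supplied by hand for illustration only.
    """
    while True:
        # Re-sort every round so the order adapts as scores change.
        for cam in sorted(camera_ids, key=lambda c: -priority.get(c, 0)):
            yield cam

# Hypothetical camera ids and scores:
order = poll_cameras(["cam1", "cam2", "cam3"], {"cam2": 5, "cam1": 1})
first_round = [next(order) for _ in range(3)]
```

With these scores, `cam2` is visited first in every round; updating the `priority` mapping between rounds changes the order on the next pass, which is the dynamic-adjustment behaviour the method aims at.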
By adopting this technical scheme, the resource scheduling problem in traditional alarm systems is addressed: through the application of a deep learning algorithm, the polling order of the cameras is dynamically adjusted by utilizing the correlation and spatial locality among the cameras, so that the performance and efficiency of the alarm system are improved. This not only reduces resource consumption and improves the real-time performance of alarms, but also enhances their accuracy and reliability. The method is therefore of great significance for improving the performance of machine vision alarm systems.
It should be noted that in step S1, each installed algorithm is an intelligent program that performs target detection and correctly generates the expected alarm, and includes existing scene algorithms and atomic algorithms. The atomic algorithms include identification of missing safety helmets, identification of missing protective gear, vehicle intrusion detection, face occlusion detection, open-fire detection and the like; the scene algorithms include personnel pulling inspection, personnel gathering detection, sleep detection, fall detection, locomotive number identification, etc. Both the scene algorithms and the atomic algorithms can be existing ones, and a person skilled in the art can also select them according to the actual situation.
Preferably, in step S2, the camera is selected and configured according to its resolution and image quality, field-of-view range and network connection mode, and the configuration of the camera includes setting the camera's parameters.
Preferably, in step S3, real-time video data is obtained from the camera and preprocessed; the specific steps are as follows:
S31, video stream receiving: acquiring the real-time video stream transmitted by the camera through a network or an interface;
S32, data decoding: decoding the received video stream to obtain original image data and annotating its quality;
S33, image preprocessing: preprocessing the decoded original image data to obtain preprocessed image data, so as to improve the accuracy and efficiency of subsequent algorithms.
Preferably, the preprocessing of the image data in step S33 specifically includes:
S331: scaling the image data so that all images have the same size;
S332: clipping the image data;
S333: performing data enhancement operations on the image data, including random rotation, flipping, and translation.
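Substeps S331 to S333 can be sketched as follows, assuming images are numpy arrays; a production system would typically use a library such as OpenCV or Pillow, and the target size and the choice of flipping as the enhancement are assumptions made only for illustration.

```python
import numpy as np

def preprocess(image, size=(224, 224)):
    """Centre-crop (S332), nearest-neighbour scale (S331), flip (S333)."""
    h, w = image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    cropped = image[top:top + side, left:left + side]   # S332: clipping
    rows = np.arange(size[0]) * side // size[0]         # nearest-neighbour
    cols = np.arange(size[1]) * side // size[1]         # index maps
    scaled = cropped[rows][:, cols]                     # S331: uniform size
    flipped = scaled[:, ::-1]                           # S333: one enhancement
    return flipped

img = np.arange(12, dtype=np.uint8).reshape(3, 4)       # toy 3x4 "image"
out = preprocess(img, size=(2, 2))
```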
Preferably, the training step of the target detection and recognition model in step S4 is as follows:
S41 data set division: dividing the selected data set into a training set, a verification set and a test set;
S42, model training: constructing and training a target detection and recognition model by using the image data and quality annotation information of the training set together with a deep learning framework; then adopting an optimization algorithm and a loss function, and setting hyperparameters, so as to minimize the training error and improve model performance;
S43, model verification: during training, the data of the verification set is used to monitor the performance and generalization ability of the target detection and recognition model, and the model is adjusted and optimized by evaluating its performance on the verification set, so as to improve its effect in practical application;
S44, model testing: the image data of the test set is used to evaluate the trained target detection and recognition model; inference is run on the model and evaluation indexes are calculated, so as to measure the accuracy and generalization ability of the target detection and recognition model on unseen data.
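The data set division of S41 can be sketched in a few lines. The 70/15/15 ratios and the fixed random seed are illustrative assumptions; the method only requires three disjoint subsets.

```python
import random

def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle and split samples into train/val/test subsets (cf. S41).

    `ratios` and `seed` are assumed values, not prescribed by the method;
    a fixed seed keeps the split reproducible across experiments.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(100))
```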
By adopting the technical scheme, a target detection and recognition model based on deep learning is constructed and trained, the polling sequence of the cameras is dynamically adjusted by learning the correlation and the image content among the cameras, and the performance and the effect of the method are verified by experimental design and evaluation; the camera resource can be effectively utilized, and the performance and efficiency of the machine vision alarm system are improved.
Preferably, the target detection and recognition models constructed and trained with the deep learning framework in step S42 include four models: the region-based convolutional neural network (R-CNN), the fast region-based convolutional neural network (Fast R-CNN), the faster region-based convolutional neural network (Faster R-CNN), and the single-shot multibox detector (SSD).
Preferably, in step S4, a target detection and recognition algorithm is used to extract key target information from the image data. The deep learning framework used in step S42 is TensorFlow, PyTorch or Keras; the image classification algorithm used is AlexNet, a VGG network, a residual network (ResNet) or an Inception network. TensorFlow is an open-source deep learning framework developed by the Google Brain team. It represents computational tasks as dataflow graphs, where nodes represent mathematical operations and edges represent data flows. TensorFlow provides rich APIs and tools that facilitate building complex neural network models. It supports distributed computing and GPU acceleration, and therefore performs well on large-scale data and complex models. PyTorch is a deep learning framework developed by Facebook's AI Research lab. Unlike TensorFlow, PyTorch uses a dynamic graph approach, i.e., computations are performed directly as the model is defined, making debugging and modifying the model more flexible and intuitive. The PyTorch API is simple and easy to use, and provides rich tensor operations and automatic differentiation mechanisms, which makes rapid prototyping and experimentation convenient for researchers and developers. Keras is a high-level neural network API, initially developed by François Chollet and now integrated into TensorFlow. Keras is simple and easy to use, providing a concise API and a modular framework that let users quickly build deep learning models. Keras is suitable for beginners and rapid prototyping, while also accommodating complex models and advanced requirements.
Preferably, the conditions for triggering the alarm in step S5 include: the number of detected targets exceeding a preset threshold, a target being detected within a preset area, and a specified target behavior or action being detected.
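The trigger conditions of S5 can be sketched as a single predicate. The threshold value, the rectangle format of the preset area, and the representation of detections as centre points are all assumptions for illustration (behavior/action recognition is omitted here).

```python
def should_alarm(detections, max_count=5, restricted_zone=None):
    """Decide whether to raise an alarm (cf. S5).

    `detections` is a list of (x, y) target centres; `max_count` and the
    (x0, y0, x1, y1) zone format are illustrative assumptions.
    """
    if len(detections) > max_count:            # target count over threshold
        return True
    if restricted_zone:
        x0, y0, x1, y1 = restricted_zone
        for x, y in detections:                # target inside preset area
            if x0 <= x <= x1 and y0 <= y <= y1:
                return True
    return False

crowded = should_alarm([(1, 1)] * 6)                               # too many targets
intruder = should_alarm([(5, 5)], restricted_zone=(0, 0, 10, 10))  # inside zone
```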
Preferably, the means of alarm notification in step S6 include: sound or flashing light, short message or telephone notification, and a real-time monitoring interface.
Preferably, the evaluation indexes in step S44 include the accuracy, recall, precision, and the harmonic mean of precision and recall (F1 score) of the target detection and recognition model.
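The indexes of S44 reduce to plain arithmetic once the counts of true positives (tp), false positives (fp) and false negatives (fn) are known; in a real evaluation these counts would come from IoU-matched detections. A minimal sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1), cf. step S44.

    The tp/fp/fn counts are assumed given; matching predictions to
    ground truth (e.g. by IoU) is outside this sketch.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
```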
Compared with the prior art, the invention has the following beneficial effects: the method constructs and trains a target detection and recognition model based on deep learning, dynamically adjusts the polling order of the cameras by learning the correlation and image content among the cameras, and verifies the performance and effect of the method through experimental design and evaluation; camera resources can be used effectively, and the performance and efficiency of the machine vision alarm system are improved; in addition, the effectiveness of the method has been demonstrated by comparison experiments. The beneficial effects specifically include the following:
(1) The correlation and image content among cameras are modeled and learned through a deep learning algorithm, so that the polling order of the cameras can be dynamically adjusted; limited resources can thus be used effectively, and the performance and efficiency of the alarm system are improved;
(2) An experimental environment comprising a camera network simulator, a deep learning model training and inference environment, and other components is built to support the experiments and result evaluation;
(3) Comprehensive experimental evaluation is carried out on different scenes and data sets; by comparing the performance of the deep-learning-based camera polling call method with that of traditional methods, the advantages of the method in alarm accuracy, real-time performance, resource utilization and other aspects are assessed.
Drawings
Fig. 1 is a schematic flow diagram of the camera polling calling alarm method in machine vision based on deep learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention.
Example: a camera polling calling alarm method in machine vision based on deep learning, wherein polling call is a method of systematically checking and acquiring camera data that realizes comprehensive monitoring and data acquisition by sequentially accessing different cameras; as shown in fig. 1, the method specifically comprises the following steps:
S1: initializing nodes (polling nodes) and installing algorithms;
S2: according to the camera information, configuring a camera for each algorithm in turn, and sending a request to acquire a video stream;
In the camera polling call method, suitable camera equipment needs to be selected; therefore, in step S2, the camera is selected and configured according to its resolution, image quality, field-of-view range and network connection mode, and the configuration of the camera includes setting the camera's parameters;
wherein, resolution and image quality: selecting proper resolution and image quality according to actual requirements;
visual field range: selecting a visual field range of the camera according to the size and the layout of the monitoring area;
network connection: selecting a camera supporting network connection so as to perform remote access and data transmission;
the camera configuration includes setting parameters of the camera such as image brightness, contrast, frame rate, etc. to obtain optimal image quality and performance;
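The camera parameters of step S2 can be grouped into a simple configuration record. Every field name, default value, and the sample stream address below are illustrative assumptions, not values prescribed by the method:

```python
from dataclasses import dataclass

@dataclass
class CameraConfig:
    """Parameters set during camera configuration (cf. step S2)."""
    camera_id: str
    stream_url: str        # network connection for remote access
    resolution: tuple      # e.g. (1920, 1080), per monitoring needs
    brightness: int = 50   # assumed 0-100 scale
    contrast: int = 50     # assumed 0-100 scale
    frame_rate: int = 25   # frames per second

# Hypothetical camera using a documentation-range address:
cfg = CameraConfig("gate-1", "rtsp://192.0.2.10/stream", (1920, 1080))
```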
S3: receiving and processing the video stream to obtain preprocessed image data;
In step S3, real-time video data is obtained from the camera and preprocessed; the specific steps are as follows:
S31, video stream receiving: acquiring the real-time video stream transmitted by the camera through a network or other interfaces;
S32, data decoding: decoding the received video stream to obtain original image data and annotating its quality;
S33, image preprocessing: preprocessing the decoded original image data to obtain preprocessed image data, so as to improve the accuracy and efficiency of subsequent algorithms;
The specific steps of preprocessing the image data in step S33 are as follows:
S331: scaling the image data so that all images have the same size;
S332: clipping the image data;
S333: performing data enhancement operations on the image data, including random rotation, flipping, and translation;
S4: processing the preprocessed image data through a target detection and recognition model to extract target information; the data is preprocessed to ensure data quality and consistency.
The training steps of the target detection and recognition model in the step S4 are as follows:
S41, data set division: dividing the selected data set into a training set, a verification set and a test set;
The selection of the data set mainly needs to take the following points into consideration:
Data diversity: the data set should comprise images of various scenes, covering different camera settings and illumination conditions, so that the robustness and adaptability of the camera polling call method can be fully examined;
Annotation quality: the data set should provide accurate target annotation information, including object positions and categories, which is very important for target detection and recognition experiments;
Wide application: the data set should have been widely applied and verified in the field of machine vision alarms; many related research papers also use such data sets for experiments, and choosing one allows better comparison with existing research;
Through these data preprocessing steps, the quality and consistency of the experimental data set are ensured, providing a reliable basis for subsequent experiments and model training; these preprocessing steps help obtain accurate experimental results and ensure the repeatability and comparability of the experiments;
S42, model training: constructing and training a target detection and recognition model by using the image data and quality annotation information of the training set together with a deep learning framework; then adopting an optimization algorithm and a loss function, and setting hyperparameters, so as to minimize the training error and improve model performance. Candidate methods include those based on traditional machine learning, such as Haar features with cascade classifiers, and those based on deep learning, such as the convolutional neural network (CNN), the region-based convolutional neural network (R-CNN), YOLO (You Only Look Once), etc.; these algorithms realize detection and recognition of targets by performing steps such as feature extraction, target localization and classification on images;
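The interplay of the loss function, optimization algorithm and hyperparameters in S42 can be shown with a toy gradient-descent loop in pure Python; the linear model, mean-squared-error loss, learning rate and epoch count are illustrative assumptions standing in for what a deep learning framework does at scale:

```python
def train(xs, ys, lr=0.01, epochs=500):
    """Fit y = w*x by minimising mean squared error with gradient descent.

    `lr` (learning rate) and `epochs` are the hyperparameters of S42;
    a real framework would optimise millions of weights the same way.
    """
    w = 0.0
    for _ in range(epochs):
        # dL/dw for the loss L = mean((w*x - y)^2)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad          # one optimizer step
    return w

w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With these toy data (y = 2x), the training error shrinks each epoch and `w` converges towards 2.0; a too-large `lr` would instead make it diverge, which is why hyperparameter setting matters in S42.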
The target detection and recognition models constructed and trained with the deep learning framework in step S42 include four models: the region-based convolutional neural network (R-CNN), the fast region-based convolutional neural network (Fast R-CNN), the faster region-based convolutional neural network (Faster R-CNN), and the single-shot multibox detector (SSD);
In step S4, a target detection and recognition algorithm is adopted to extract key target information from the image data; the deep learning framework used in step S42 is TensorFlow, PyTorch or Keras; the image classification algorithm used is AlexNet, a VGG network, a residual network (ResNet) or an Inception network;
TensorFlow: TensorFlow is an open-source deep learning framework developed by the Google Brain team; it provides rich tools and libraries that support the construction and training of various deep learning models, and has good scalability and cross-platform support. It represents computational tasks as dataflow graphs, where nodes represent mathematical operations and edges represent data flows; TensorFlow provides rich APIs and tools that facilitate building complex neural network models. It supports distributed computing and GPU acceleration, so it performs well on large-scale data and complex models;
PyTorch: PyTorch is a deep learning framework developed by Facebook's AI Research lab; unlike TensorFlow, PyTorch adopts a dynamic graph approach, i.e., computations are executed directly as the model is defined, so debugging and modifying the model is more flexible and intuitive; it is widely used in research and development; the dynamic graph mechanism makes the definition and debugging of models more flexible; the PyTorch API is simple and easy to use, and provides rich tensor operations and automatic differentiation mechanisms, which makes rapid prototyping and experimentation convenient for researchers and developers;
Keras: Keras is a high-level neural network API, initially developed by François Chollet and now integrated into TensorFlow. Keras is simple and easy to use, providing a concise API and a modular framework that let users quickly build deep learning models. Keras is suitable for beginners and rapid prototyping, while also accommodating complex models and advanced requirements. It can be used as a front end for TensorFlow, Theano or CNTK; it provides a concise and intuitive interface, making it convenient for users to quickly build and train deep learning models. These deep learning training frameworks provide rich functions and tools that can simplify model development and training, while supporting efficient inference and deployment on different hardware platforms;
In the camera polling call method, the application of the deep learning algorithm is of great significance. Through a deep learning algorithm, target detection and recognition on camera images can be realized, providing accurate target information for subsequent polling calls. Meanwhile, using a deep learning training framework can accelerate model training and improve model performance;
the image classification algorithm is AlexNet, VGG, ResNet, or Inception; object detection is an important task in machine vision alarms, used to identify specific objects in images or videos. Through network structures such as convolutional neural networks (CNNs), deep learning can extract rich feature representations from images and achieve efficient, accurate target detection; the CNN is a commonly used deep learning model, particularly suitable for processing image data; the CNN structure comprises an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer; the convolutional layers extract image features through convolution operations, the pooling layers reduce the dimensionality of the feature maps, and the fully connected layers handle classification or regression tasks. Classical deep learning object detection models such as Faster R-CNN, YOLO, and SSD have achieved excellent performance in practice and are widely used in machine vision alarm systems;
Image classification is the task of assigning images to different categories, used in machine vision alarms to determine whether a specific object or event exists in an image; through the training of deep neural networks, deep learning can automatically learn discriminative features from images and achieve accurate image classification; classical deep learning image classification models such as AlexNet, VGGNet, and ResNet have become benchmark models for image classification tasks and are widely used in the field of machine vision alarms; the deployment and real-time performance of the deep learning model are also issues to consider in machine vision alarms. To realize real-time alarming and processing, model compression and acceleration techniques such as pruning, quantization, and hardware acceleration can be adopted to reduce the model's computational complexity and improve its runtime efficiency;
AlexNet was a breakthrough for deep learning on image classification tasks, adopting a structure of multiple convolutional and pooling layers;
VGG (Visual Geometry Group) is an image classification network with a deep structure that extracts features by stacking multiple convolutional and pooling layers;
ResNet (Residual Network) uses residual connections to solve the degradation problem of deep networks, enabling the training of very deep networks;
The Inception series of networks uses multiple convolution kernels of different sizes together with pooling layers, improving classification performance through feature fusion;
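The convolution, pooling, and fully connected structure described above can be sketched in PyTorch (the framework used in the experiments below); the layer sizes and the 4-class output here are illustrative assumptions, not values given in this description:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN matching the structure described above:
    input -> convolution -> pooling -> fully connected -> output."""
    def __init__(self, num_classes=4):  # num_classes is an assumed value
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer: feature extraction
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling layer: halves the feature-map size
        )
        # after pooling a 32x32 input: 16 channels of 16x16 feature maps
        self.classifier = nn.Linear(16 * 16 * 16, num_classes)  # fully connected layer: classification

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SimpleCNN()
out = model(torch.randn(1, 3, 32, 32))  # one 32x32 RGB frame
print(tuple(out.shape))                 # (1, 4): one score per class
```

Deeper models such as VGG or ResNet stack many more such convolution/pooling stages, but the layer roles are the same as in this sketch.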
s43, model verification: during training, the validation set is used to monitor the performance and generalization ability of the target detection and recognition model; by evaluating its performance on the validation set, the model can be adjusted and optimized to improve its effect in practical applications; besides data challenges, optimization of the deep learning model is also a critical issue; deep neural networks typically have a large number of parameters, making the training process prone to overfitting; to overcome this, regularization techniques such as L1 and L2 regularization may be employed to limit model complexity; in addition, the choice of optimization algorithm is very important, and common optimization algorithms include stochastic gradient descent (SGD), Adam, Adagrad, and the like; these algorithms update and adjust the parameters according to the model's loss function to improve its convergence speed and performance;
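As an illustration of the regularization and optimizer choices mentioned above, the following PyTorch sketch applies L2 regularization through the optimizer's weight_decay parameter; the tiny linear model and synthetic batch are stand-ins for the real detection model and data, not part of the method itself:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)           # stand-in for the detection/recognition model
criterion = nn.CrossEntropyLoss()  # loss function driving parameter updates

# Adam with weight_decay adds an L2 penalty on the weights;
# torch.optim.SGD or torch.optim.Adagrad could be swapped in here.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)

x = torch.randn(32, 10)            # synthetic training batch
y = torch.randint(0, 2, (32,))
losses = []
for _ in range(20):                # a few illustrative update steps
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(losses[0], losses[-1])       # the loss should trend downward
```

In practice the validation-set loss, not the training loss, is what guides the adjustment and early stopping described in step S43.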
s44, model test: the image data of the test set are used to evaluate the obtained target detection and recognition model, running inference and computing evaluation indexes to measure the model's accuracy and generalization ability on unseen data; the evaluation indexes in step S44 include the precision, recall, accuracy, and F1 score (the harmonic mean of precision and recall) of the target detection and recognition model;
S5: judging whether to trigger an alarm according to the detection results of the target detection and recognition model;
the alarm triggering conditions in step S5 include: the number of detected targets exceeding a preset threshold, a target being detected inside a preset area, and a specific target behavior or action being detected;
wherein, the target number: triggering an alarm when the target number exceeding the preset threshold is detected;
target position: triggering an alarm when detecting that the target is in a preset area;
target behavior: triggering an alarm when a specific target behavior or action is detected;
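The three triggering conditions above can be combined in one small rule check; the detection record fields ("center", "action"), the threshold, and the zone coordinates are hypothetical, chosen only for illustration:

```python
def should_alarm(detections, count_threshold=3, zone=(100, 100, 400, 400),
                 flagged_actions=("loiter", "intrude")):
    """Return True if any of the three S5 trigger conditions holds.

    detections: list of dicts with 'center' (x, y) and optional 'action'
    (a hypothetical schema for the detector's output)."""
    # condition 1: number of detected targets exceeds the preset threshold
    if len(detections) > count_threshold:
        return True
    x1, y1, x2, y2 = zone
    for d in detections:
        x, y = d["center"]
        # condition 2: a target lies inside the preset area
        if x1 <= x <= x2 and y1 <= y <= y2:
            return True
        # condition 3: a specific target behavior or action is detected
        if d.get("action") in flagged_actions:
            return True
    return False

print(should_alarm([{"center": (200, 200)}]))  # True: inside the preset zone
```

A real system would feed this check with the per-frame output of the S4 detection model and pass a positive result on to the S6 notification mechanism.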
s6: if the alarm is triggered, sending an alarm notification according to a preset notification mechanism.
The means of alarm notification in said step S6 include: sound or flashing light, SMS or telephone notification, and a real-time monitoring interface;
wherein, sound or flashing light: generating sound or flashing lights in the monitored area to attract attention;
short message or telephone notification: notifying the related personnel of the alarm information through a short message or a telephone;
real-time monitoring interface: the alarm information is displayed in visual or acoustic form on the monitoring interface.
Specific examples: when experiments are carried out with this technical solution, a stable and consistent experimental environment must be ensured.
The following is the relevant information of the experimental setup:
hardware equipment: a computer with a high-performance graphics card was used as the experimental platform to support the training and inference tasks of the deep learning model;
operating system: Windows was selected as a stable operating system for the experimental environment, ensuring compatibility with the deep learning framework;
deep learning framework: PyTorch, a widely used and powerful deep learning framework, was selected as the experimental tool; the latest stable version was used, with the related dependency libraries installed as required;
(II) experiments were performed according to the following procedure:
(1) Data set partitioning: first, we divide the selected dataset into a training set, a validation set and a test set; typically, we use most of the data for training and validation, leaving a small portion as a test set for evaluating the performance of the final model.
(2) Model training: using the image data of the training set and the related annotation information, we construct and train a target detection and recognition model using a deep learning framework; we use appropriate optimization algorithms and loss functions and set appropriate hyper-parameters to minimize training errors and improve the performance of the model.
(3) Model verification: during the training process, we use the data of the validation set to monitor the performance and generalization ability of the model; by evaluating the performance of the model on the verification set, the model can be adjusted and optimized to improve the effect of the model in practical application;
(4) Model test: after training and validation is completed, we use the image data of the test set to evaluate the performance of the final model; the model is inferred and an evaluation index is calculated to measure the accuracy and generalization capability of the model on unseen data;
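The dataset split in step (1) can be sketched as follows; the 70/15/15 ratios are an assumption, since the text only says that most of the data goes to training and validation and a small portion is held out for testing:

```python
import random

def split_dataset(samples, train=0.7, val=0.15, seed=42):
    """Shuffle and split into train / validation / test subsets.

    The ratios are illustrative defaults; the held-out test set is used
    only for the final evaluation in step (4)."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Fixing the random seed matters for the "stable and consistent experimental environment" requirement stated above: it makes the split, and hence the reported metrics, reproducible across runs.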
(III) evaluation index
The following evaluation index was used to measure the performance of the model:
(1) Precision: the proportion of samples predicted as positive by the model that are actually positive;
(2) Recall: the proportion of actual positive samples that are correctly predicted as positive by the model;
(3) Accuracy: the proportion of correctly classified samples out of the total number of samples;
(4) F1 score: a comprehensive measure of precision and recall;
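The four indexes can be computed directly from confusion-matrix counts; the helper below writes out the standard definitions (the function name and example counts are illustrative):

```python
def metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)                          # predicted positives that are truly positive
    recall = tp / (tp + fn)                             # actual positives recovered by the model
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # correctly classified share of all samples
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return precision, recall, accuracy, f1

print(metrics(tp=88, fp=12, fn=14, tn=86))  # illustrative counts, not the paper's data
```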
(IV) analysis and discussion of Experimental results
(1) Experimental results
Table 1 Experimental evaluation results of four different models

Model    Precision  Recall  Accuracy  F1 score
Model A  0.85       0.82    0.89      0.83
Model B  0.82       0.79    0.87      0.80
Model C  0.88       0.86    0.91      0.87
Model D  0.83       0.80    0.88      0.81
As can be seen from Table 1, four different models (model A, model B, model C, and model D) were evaluated experimentally, each reported numerically on four indexes: precision, recall, accuracy, and F1 score;
from the experimental results, differences between the models can be observed across the indexes. Model C achieved the best performance in precision and recall, reaching high scores of 0.88 and 0.86, while model B was slightly inferior in precision and F1 score.
(2) Model performance analysis
1) Precision: precision measures how many of the samples the model identifies as positive are truly positive. From the experimental results, model C achieved the highest precision score (0.88), indicating that it can accurately identify most positive-class targets. Model A and model D also exhibited good precision scores of 0.85 and 0.83, respectively.
2) Recall: recall measures the ratio between the number of positive samples the model correctly identifies and the number of actual positives. In our experiment, model C achieved the highest recall score (0.86), indicating that it effectively captures most positive-class targets. Model A and model D also exhibited high recall of 0.82 and 0.80, respectively.
3) Accuracy: accuracy measures the proportion of samples the model classifies correctly overall. According to the experimental results, model C excels in accuracy, reaching a high score of 0.91. Model A and model D also have relatively high accuracy scores of 0.89 and 0.88, respectively.
4) F1 score: the F1 score jointly considers precision and recall and measures the model's balance between positive and negative predictions. The experimental results show that model C achieved the highest F1 score (0.87), indicating a good balance of precision and recall. Model A and model D also exhibited relatively high F1 scores of 0.83 and 0.81, respectively.
Overall, the experimental results show the superiority of model C on all evaluation indexes, indicating its potential and application value in machine vision alarms. Model A and model D also performed relatively well, though slightly behind model C on some metrics. Through analysis of the experimental results, we can further explore each model's advantages, limitations, and room for improvement, providing valuable references for future research and applications.
(3) Interpretation of results
From the analysis of experimental data we can draw the following conclusions:
(1) R-CNN (model A): although R-CNN performs well in target localization and classification accuracy, its slow speed, caused by the complex pipeline of candidate-region extraction and separate per-region classification, limits its feasibility in real-time applications.
(2) Fast R-CNN (model B): by introducing RoI pooling, Fast R-CNN merges feature extraction and classification into a single forward pass, improving efficiency. Compared with R-CNN, Fast R-CNN improves target localization and classification accuracy and processes faster.
(3) Faster R-CNN (model C): Faster R-CNN further improves speed and accuracy by introducing a region proposal network (RPN). By rapidly generating candidate regions, it avoids the costly selective-search process. The experimental data show that Faster R-CNN is superior to R-CNN and Fast R-CNN in target localization and classification accuracy, and has a faster detection speed.
(4) SSD (model D): SSD adopts a single-shot multi-box approach, detecting targets of different sizes by applying convolution filters on feature maps of different scales. The experimental data show that SSD performs well in multi-target detection, can detect multiple targets in a short time, and is advantageous in real-time applications.
In summary, all four models have their own advantages and applicability in machine vision alarms. R-CNN and Fast R-CNN excel in accuracy and suit scenes with high requirements on precise target localization. Faster R-CNN balances speed and accuracy and suits multi-target detection tasks. SSD has marked advantages in speed and multi-target detection and is particularly suitable for real-time application scenarios.
Application case analysis:
1. actual scene introduction and demand analysis:
Bank of Communications (hereinafter "the bank") is a large bank with numerous branches and ATMs. Because of its large business scale and customer volume, security and risk prevention are important tasks of bank management. The bank wishes to enhance the monitoring and security of its branches and ATMs by introducing machine vision technology, in particular the camera polling call method. Specific requirements include real-time monitoring of abnormal behavior, real-time alert notifications, and data analysis and optimization.
2. Application case of camera polling calling method in machine vision alarm
In the bank's deployment, the camera polling call method is applied in the following aspects:
1. abnormal behavior detection:
by installing cameras at the branch office and ATM, the system can monitor the behavior of personnel in real time. The camera polling call method can identify potential abnormal behaviors such as illegal entry, potential theft, abnormal operation of the ATM and the like. Once the system detects abnormal behavior, an alarm notification is triggered immediately, and the security team can take measures in time to process.
2. Real-time alert notification:
the camera polling calling method can detect abnormal behaviors in real time and trigger an alarm. Upon finding an abnormal behavior, the system will automatically send a real-time alert notification to the security team. After the security team receives the alarm, the security team can immediately take corresponding actions, such as sending security personnel to the site to verify the condition, so as to ensure the security of the branch office and the ATM.
3. Data analysis and optimization:
the camera polling call method not only supports real-time monitoring and alarming, but also collects a large amount of data for analysis and optimization. Through analysis of these data, the bank can discover underlying patterns and trends in security risks and take corresponding precautions. In addition, data analysis can be used to optimize camera layout and configuration to improve monitoring effectiveness and safety.
3. Case analysis and result display
Through the application of the camera polling call method, the bank has achieved the following notable results:
1. timely detection of abnormal behavior:
with the camera polling call method, the bank can promptly discover and identify various abnormal behaviors, including illegal entry and theft. This helps prevent potential security threats and safeguard customer funds and banking facilities.
2. Real-time alarms and responses:
once the system detects abnormal behavior, the camera polling call method can trigger alarm notification immediately. The security team can respond quickly and take appropriate action, such as sending security personnel to the field to verify the condition, and coordinating the cooperation with the relevant departments.
3. Data analysis and optimization:
the bank uses the data collected by the camera polling call method for further analysis. Through pattern and trend analysis of abnormal behavior, the bank can identify the key factors of security risk and formulate corresponding improvement measures. In addition, optimizing camera layout and configuration based on data analysis can improve monitoring effectiveness and coverage, further enhancing security.
In general, the camera polling method has had a marked effect in the bank's application. By monitoring abnormal behavior, applying real-time alarm and response mechanisms, and analyzing and optimizing the collected data, the bank successfully improved the security of its branches and ATMs and now manages and prevents security risks more effectively. This provides an important guarantee for the bank's operations and customer trust, and a useful reference for other banks and financial institutions.
4. Advantages of camera polling call
(1) Advantages of camera polling call over traditional call methods
1. Performance improvement:
the camera polling call method exhibits performance improvements in several respects. The method can reduce average delay, improve frame rate and reduce CPU occupancy rate, thereby realizing smoother image processing and real-time application.
2. Hardware resource optimization utilization:
the polling call method can fully utilize hardware resources. By processing the data of the cameras in parallel, the utilization rate of hardware equipment can be improved, and therefore more efficient image processing is achieved.
3. Memory usage optimization:
compared with the traditional calling mode, the polling calling method can optimize memory occupation. Through reasonable data processing and resource management strategies, the memory consumption is reduced, and the overall stability and reliability of the system are improved.
4. Concurrent support:
the poll call method supports processing data streams for multiple cameras. The method can process the images of a plurality of cameras simultaneously, provides higher concurrency, and is suitable for application scenes needing to process the data of a plurality of cameras simultaneously.
5. Scalability:
the polling call method has higher expansibility. It can easily accommodate different numbers and types of cameras and support dynamic addition or removal of camera devices, thereby providing greater flexibility and scalability.
6. Multipath camera support:
compared with the traditional calling mode, the polling calling method can better support multiple paths of cameras. The method can process the data streams of a plurality of cameras and provide efficient data management and processing strategies to ensure that each camera can be properly processed.
7. High-definition support:
the polling call method can better support high-definition image processing. It can handle image data of higher resolution and larger size, meeting application scenarios with higher requirements on image definition.
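A minimal round-robin poller illustrating the rotation, dynamic add/remove, and multi-camera support described above; the StubCamera class stands in for a real video-stream wrapper, since the description does not specify the camera interface:

```python
from collections import deque

class CameraPoller:
    """Round-robin polling over multiple camera sources."""
    def __init__(self, cameras):
        self._queue = deque(cameras)

    def add(self, camera):
        """Dynamically add a camera device to the rotation."""
        self._queue.append(camera)

    def remove(self, camera):
        """Dynamically remove a camera device from the rotation."""
        self._queue.remove(camera)

    def poll(self):
        """Read one frame from the next camera in turn, then rotate."""
        cam = self._queue[0]
        self._queue.rotate(-1)
        return cam.name, cam.read()

class StubCamera:
    """Stand-in for a real stream wrapper; read() would return a decoded frame."""
    def __init__(self, name):
        self.name = name
        self.frames = 0

    def read(self):
        self.frames += 1
        return f"{self.name}-frame-{self.frames}"

poller = CameraPoller([StubCamera("cam1"), StubCamera("cam2")])
print([poller.poll()[0] for _ in range(4)])  # ['cam1', 'cam2', 'cam1', 'cam2']
```

Because each camera is visited in turn rather than held open by a dedicated worker, the scheme scales to many sources; a production variant would wrap real video streams and likely poll several queues in parallel to exploit the concurrency advantage described above.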
(2) Hardware performance improvement of camera polling calls over the traditional calling method
The following table compares the performance of the traditional camera calling method and the polling calling method on the same hardware equipment;
Table 2 Performance of the traditional camera calling method versus the polling calling method on the same hardware

Method              Average delay  Frame rate  CPU  Memory  GPU  Video memory  Disk I/O   Max cameras  Definition
Traditional method  50 ms          30 fps      80%  32 GB   60%  16 GB         100 MB/s   40           2K
Polling method      20 ms          60 fps      40%  8 GB    30%  8 GB          200 MB/s   400          4K

Table 2 above compares the performance indexes of the traditional camera calling method and the polling calling method on the same hardware device;
1. meaning of each column
method: the calling method used;
average delay: the average time delay for reading image data from the camera and processing it;
frame rate: the number of frames processed per second;
CPU occupancy: the average CPU occupancy during processing;
memory occupancy (GB): the memory required during processing;
graphics card occupancy (%): the average GPU occupancy during processing;
video memory occupancy (GB): the amount of video memory occupied during processing;
hard disk occupancy (MB/s): the hard disk read/write speed required during processing;
maximum number of supported cameras: the number of cameras supported simultaneously;
definition support: the highest image definition supported;
2. performance comparison
From the table we can observe that:
1. memory occupation: compared with the traditional camera calling method, the polling calling method has lower memory occupation, and reduces the memory occupation from 32GB to 8GB;
2. average delay: compared with the traditional camera calling method, the polling calling method has larger performance improvement, and the average delay is reduced from 50 milliseconds to 20 milliseconds, namely the response time of image processing is faster;
3. Frame rate: the frame rate of the polling call method is higher than that of the traditional camera call method, and the polling call method is improved from 30 frames/second to 60 frames/second, namely the number of images processed per second is increased;
4. CPU occupancy: at the same performance level, the polling call method has a relatively low CPU occupancy, reduced from 80% to 40%, meaning CPU resources are used more efficiently;
5. display card occupation: under the same performance, the polling calling method has relatively low occupancy rate of the display card, and reduces from 60% to 30%;
6. Video memory occupancy: compared with the traditional camera calling method, the polling call method has lower video memory occupancy, reduced from 16 GB to 8 GB;
7. Hard disk occupancy: the polling call method demands a higher hard disk read/write speed, increased from 100 MB/s to 200 MB/s;
8. maximum number of supported cameras: compared with the traditional camera calling method, the polling calling method has the advantages that the number of supported cameras is increased from 40 paths to 400 paths;
9. definition support: compared with the traditional camera invoking method, the polling invoking method can support higher image definition and is improved from 2K to 4K;
(3) Performance comparison summary
To sum up, according to the data in the table:
1. compared with the traditional camera call method, the polling call method has obvious performance improvement in the aspects of average delay, frame rate, CPU occupancy rate and the like. These performance enhancements can have a positive impact on applications such as real-time image processing;
2. through reasonable experimental design and method, the polling call method shows marked improvement over the traditional camera calling method in CPU occupancy, memory occupancy, video memory occupancy, hard disk occupancy, maximum number of supported cameras, definition support, and other aspects. Over the whole experiment, the polling call method roughly doubled the performance of the traditional calling method while reducing memory usage to a quarter;
3. this indicates that the camera polling call method can provide significantly better performance and higher image processing capability under higher hardware configuration. It can effectively cope with high-strength loads and improve performance by fully utilizing hardware resources. Therefore, under the scene that a large amount of image data needs to be processed and the high definition requirement is met, more excellent performance can be realized by adopting the polling call method;
in general, the polling call method shows obvious advantages in aspects of hardware equipment performance improvement, CPU occupation, memory occupation, video card occupation, hard disk occupation, maximum support of camera road number, definition support and the like. Under a higher hardware configuration, it can provide more excellent performance and more efficient image processing capability;
5. Discussion and hope
(1) Interpretation and interpretation of experimental results
From the analysis and discussion of experimental results, we can draw the following conclusions and explanations:
1. in the application of the camera polling method, four different deep learning models were compared: R-CNN, Fast R-CNN, Faster R-CNN, and SSD. The experimental results show that these models achieve good performance in target detection and recognition; they can accurately detect abnormal behavior and generate the corresponding alarm notifications;
2. for evaluation, the indexes of precision, recall, accuracy, and F1 score were used to assess model performance. The results show slight differences between models on different indexes, but overall they all meet the bank's security requirements;
3. experimental results also show that the performance of the model is affected by the quality and scale of the data set. The accuracy and the robustness of the model can be improved by a large-scale data set and image data with high quality;
(2) Limitations and direction of improvement of the method
Although the camera polling call method achieves good results in machine vision alarms, there are some limitations and improved directions:
1. Diversity of data sets:
the data set used in the current experiment mainly covers the scenes of traffic banking branch institutions and ATM machines, but the application in other banking facilities or different environments has not been widely explored; future studies may consider expanding the diversity of data sets to more fully evaluate the performance and applicability of the model;
2. real-time performance and efficiency:
the camera polling call method needs to process a large amount of video data in real time, which provides challenges for computing resources and algorithm efficiency; future research may explore more efficient model designs and optimization algorithms to achieve a balance of real-time and efficiency;
(3) Potential direction of future research
Based on the above discussion and experimental results, future studies may be developed in the following directions:
1. integration of non-visual sensors:
besides the camera data, other sensor data such as sound, temperature, pressure and the like can be considered to be integrated, so that the comprehensiveness and accuracy of the safety monitoring system are further improved;
2. reinforcement learning and adaptive algorithm:
the reinforcement learning and self-adaptive algorithm is introduced, so that the system can learn and optimize from real-time data, and the capability of detecting and early warning abnormal behaviors is improved; the method can enable the system to have higher intelligence and self-adaption, and can adapt to the changes of different scenes and environments;
3. Fusion of multimodal data:
combining images, videos, texts and other sensor data, and carrying out fusion and analysis on the multi-mode data so as to provide more comprehensive safety monitoring and early warning capability; for example, suspicious transactions and fraudulent activity may be more accurately identified in combination with video images and transaction text data;
4. privacy protection and data security:
privacy protection and data security are important considerations in the camera polling call method. Future research may explore privacy protection techniques and data encryption methods to protect personal privacy and security of sensitive information;
in short, machine vision alarms based on the camera polling method are of great significance in the bank's application. Future research can address the diversity, real-time performance, and efficiency of data sets, the integration of non-visual sensors, reinforcement learning and adaptive algorithms, multimodal data fusion, and privacy protection and data security, so as to further improve the performance and reliability of the system and meet ever-changing security monitoring requirements;
6. Conclusion
(1) Summary
This method, machine vision alarming based on the camera polling method, has been discussed and analyzed in depth in the bank's application. Through data set selection, experimental setup, and algorithm design, a deep-learning-based camera polling call method was successfully realized, and its application in machine vision alarms was evaluated and analyzed;
(2) Contribution and application value
The main contributions of this method are the following:
1. the machine vision alarm framework based on the camera polling method is provided, and the automatic target detection and alarm functions are realized by combining a deep learning algorithm, so that the safety monitoring efficiency and accuracy are improved;
2. the feasibility and effectiveness of the method were verified in a real application case at a bank, demonstrating its application potential in the field of security monitoring;
3. experimental data and result analysis are provided, and valuable references and guidance are provided for related research and practical application;
the application value of the study is mainly expressed in the following aspects:
1. improving the efficiency and accuracy of security monitoring: the machine vision alarm system based on the camera polling method can monitor and identify security events in real time, provide timely early warning and response, and help the bank handle security problems promptly;
2. reducing labor cost and workload: the automatic alarm system reduces the workload of security personnel, improves the working efficiency and reduces the labor cost;
3. providing a reference for other industries and fields: the application of the camera polling method at the bank can provide useful experience and insight for security monitoring in other industries and fields; this work mainly focuses on the application of a camera polling method in machine vision alarms, but a real security monitoring system may also involve the acquisition and analysis of other sensor data, such as sound, temperature, and vibration. Future research can explore how to integrate and fuse multiple sensor data to improve the comprehensiveness and accuracy of security monitoring.
In conclusion, the method obtains a certain result and application value by carrying out deep research and analysis on the application of the camera polling method in machine vision alarm.
The foregoing description covers only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A camera polling call alarm method in machine vision based on deep learning, characterized by comprising the following steps:
S1: initializing nodes and installing algorithms;
S2: according to the camera information, configuring a camera for each algorithm in turn and sending a request to acquire its video stream;
S3: receiving and processing the video stream to obtain preprocessed image data;
S4: processing the preprocessed image data through a target detection and recognition model to extract target information;
S5: judging whether to trigger an alarm according to the detection results of the target detection and recognition model;
S6: if an alarm is triggered, sending an alarm notification according to a preset notification mechanism.
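The S1–S6 cycle above can be sketched as a minimal polling loop. This is a hypothetical illustration, not the patented implementation: the camera records, the `detect` stand-in for the detection model, and the alarm threshold are all invented for the example.

```python
# Hypothetical sketch of the S1-S6 polling cycle; every helper and data
# structure here is a placeholder, not code from the patent.

def detect(frame):
    # Stand-in for the target detection and recognition model (S4):
    # returns the target labels of interest found in the frame.
    return [obj for obj in frame.get("objects", []) if obj == "person"]

def poll_cameras(cameras, threshold=2):
    """Poll each camera in turn (S2), run detection (S4),
    and collect alarm events (S5-S6)."""
    alarms = []
    for cam in cameras:                    # S2: configure/poll cameras in turn
        frame = cam["latest_frame"]        # S3: stand-in for receive + preprocess
        targets = detect(frame)            # S4: extract target information
        if len(targets) >= threshold:      # S5: judge the alarm condition
            alarms.append((cam["id"], len(targets)))  # S6: queue notification
    return alarms

cams = [
    {"id": "cam-1", "latest_frame": {"objects": ["person", "person", "car"]}},
    {"id": "cam-2", "latest_frame": {"objects": ["car"]}},
]
print(poll_cameras(cams))  # -> [('cam-1', 2)]
```

In a real system the `latest_frame` lookup would be replaced by decoding an RTSP or similar stream, and S6 would dispatch to the configured notification channels.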
2. The camera polling call alarm method in machine vision based on deep learning according to claim 1, wherein in step S2 the camera is selected and configured according to its resolution and image quality, field-of-view range, and network connection mode, and configuring the camera includes setting the camera's parameters.
3. The camera polling call alarm method in machine vision based on deep learning according to claim 1, wherein step S3 acquires real-time video data from the camera and performs preprocessing, comprising the following specific steps:
S31 video stream reception: acquiring the real-time video stream transmitted by the camera through a network or interface;
S32 data decoding: decoding the received video stream to obtain original image data and annotating its quality;
S33 image preprocessing: preprocessing the decoded original image data to obtain preprocessed image data, so as to improve the accuracy and efficiency of subsequent algorithms.
4. The camera polling call alarm method in machine vision based on deep learning according to claim 3, wherein the preprocessing of the image data in step S33 comprises the following specific steps:
S331: scaling the image data to a uniform size;
S332: cropping the image data;
S333: applying data enhancement operations to the image data, including random rotation, flipping, and translation.
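Steps S331–S333 can be illustrated with toy versions operating on a nested-list "image". A production system would use a library such as OpenCV; the helper names and the nearest-neighbour resize below are illustrative choices, not the patented code.

```python
# Toy versions of steps S331-S333 on a 2D-list "image"; all helpers
# are hypothetical illustrations of the preprocessing claim.

def scale(img, h, w):
    # S331: nearest-neighbour resize to a uniform (h, w) size.
    src_h, src_w = len(img), len(img[0])
    return [[img[r * src_h // h][c * src_w // w] for c in range(w)]
            for r in range(h)]

def crop(img, top, left, h, w):
    # S332: cut out an (h, w) region starting at (top, left).
    return [row[left:left + w] for row in img[top:top + h]]

def hflip(img):
    # S333: one enhancement op (horizontal flip); random rotation
    # and translation would follow the same pattern.
    return [row[::-1] for row in img]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(scale(img, 2, 2))       # -> [[1, 3], [9, 11]]
print(crop(img, 1, 1, 2, 2))  # -> [[6, 7], [10, 11]]
```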
5. The camera polling call alarm method in machine vision based on deep learning according to claim 3, wherein the training of the target detection and recognition model in step S4 comprises:
S41 dataset division: dividing the selected dataset into a training set, a validation set, and a test set;
S42 model training: constructing and training the target detection and recognition model with a deep learning framework, using the image data and quality annotation information of the training set; then adopting an optimization algorithm and a loss function, and setting hyperparameters;
S43 model validation: during training, monitoring the performance and generalization capability of the model with the validation set data, and adjusting and optimizing the model by evaluating its performance on the validation set, so as to improve its effect in practical applications;
S44 model testing: evaluating the trained model on the image data of the test set, running inference, and calculating evaluation indexes to measure the accuracy and generalization capability of the model on unseen data.
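The dataset division in step S41 can be sketched as a seeded shuffle-and-slice; the 70/15/15 ratios below are a common convention assumed for the example, not values specified by the patent.

```python
# Illustrative S41 dataset split; ratios and seed are assumptions.
import random

def split_dataset(samples, train=0.7, val=0.15, seed=0):
    # Shuffle deterministically, then slice into train / validation / test.
    rng = random.Random(seed)
    items = list(samples)
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # -> 70 15 15
```

The training set then feeds S42, the validation set drives the adjustment loop of S43, and the held-out test set is touched only once, in S44.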
6. The camera polling call alarm method in machine vision based on deep learning according to claim 5, wherein the deep learning framework for constructing and training the target detection and recognition model in step S42 includes four models: the region-based convolutional neural network (R-CNN), Fast R-CNN, Faster R-CNN, and the single-shot multibox detector (SSD).
7. The camera polling call alarm method in machine vision based on deep learning according to claim 5, wherein step S4 extracts key target information from the image data using a target detection and recognition algorithm; the deep learning framework used in step S42 is TensorFlow, PyTorch, or scikit-learn; and the image classification algorithm used is AlexNet, a VGG network, a residual network (ResNet), or an Inception network.
8. The camera polling call alarm method in machine vision based on deep learning according to claim 5, wherein the conditions for triggering an alarm in step S5 include: the number of detected targets exceeding a preset threshold, a target being detected within a predetermined area, and a specific target behavior or action being detected.
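The three trigger conditions of claim 8 can be written as simple predicates over a detection list. This is a generic sketch under assumed data shapes — detections as `(label, x, y)` tuples and the zone as an axis-aligned box — not the patented rule engine.

```python
# Hypothetical alarm-trigger rules mirroring claim 8; detections are
# (label, x, y) tuples and zone is an (x0, y0, x1, y1) box.

def count_trigger(detections, threshold):
    # Rule 1: number of detected targets exceeds a preset threshold.
    return len(detections) > threshold

def zone_trigger(detections, zone):
    # Rule 2: any target falls inside the predetermined area.
    x0, y0, x1, y1 = zone
    return any(x0 <= x <= x1 and y0 <= y <= y1 for _, x, y in detections)

def behavior_trigger(detections, banned=("climbing", "loitering")):
    # Rule 3: a specific target behavior or action label is detected.
    return any(label in banned for label, _, _ in detections)

dets = [("person", 5, 5), ("loitering", 20, 20)]
print(count_trigger(dets, 1))               # -> True
print(zone_trigger(dets, (0, 0, 10, 10)))   # -> True
print(behavior_trigger(dets))               # -> True
```

In step S5 these predicates would be OR-combined: any one firing is enough to hand control to the S6 notification mechanism.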
9. The camera polling call alarm method in machine vision based on deep learning according to claim 5, wherein the means of alarm notification in step S6 include: sound or flashing light, short message or telephone notification, and a real-time monitoring interface.
10. The camera polling call alarm method in machine vision based on deep learning according to claim 5, wherein the evaluation indexes in step S44 include the precision, recall, and accuracy of the target detection and recognition model, and the F1 score combining precision and recall.
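The evaluation indexes of claim 10 follow the standard confusion-matrix definitions; the sketch below states those generic formulas and is not code from the patent.

```python
# Standard detection metrics from confusion counts:
# tp/fp/fn/tn = true/false positives and negatives.

def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)                 # fraction of alarms that were real
    recall = tp / (tp + fn)                    # fraction of real events caught
    accuracy = (tp + tn) / (tp + fp + fn + tn) # overall fraction correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, accuracy, f1

p, r, a, f1 = metrics(tp=8, fp=2, fn=2, tn=8)
print(p, r, a)  # -> 0.8 0.8 0.8
```

For a monitoring system, recall is usually weighted most heavily: a missed event is costlier than a spurious alarm.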
CN202310955634.1A 2023-07-31 2023-07-31 Camera polling calling alarming method in machine vision based on deep learning Pending CN116958791A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310955634.1A CN116958791A (en) 2023-07-31 2023-07-31 Camera polling calling alarming method in machine vision based on deep learning


Publications (1)

Publication Number Publication Date
CN116958791A true CN116958791A (en) 2023-10-27

Family

ID=88452749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310955634.1A Pending CN116958791A (en) 2023-07-31 2023-07-31 Camera polling calling alarming method in machine vision based on deep learning

Country Status (1)

Country Link
CN (1) CN116958791A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117749995A (en) * 2023-12-22 2024-03-22 深圳市智安天下科技有限公司 Video monitoring method and system based on multi-scene recognition and voice interaction
CN117749995B (en) * 2023-12-22 2024-07-30 深圳市智安天下科技有限公司 Video monitoring method and system based on multi-scene recognition and voice interaction

Similar Documents

Publication Publication Date Title
Chen et al. A survey on an emerging area: Deep learning for smart city data
CN111626350B (en) Target detection model training method, target detection method and device
William et al. Crime analysis using computer vision approach with machine learning
Al Boni et al. Area-specific crime prediction models
Cao et al. EFFNet: Enhanced feature foreground network for video smoke source prediction and detection
KR102028930B1 (en) method of providing categorized video processing for moving objects based on AI learning using moving information of objects
CN116958791A (en) Camera polling calling alarming method in machine vision based on deep learning
Bhuiyan et al. Video analytics using deep learning for crowd analysis: a review
CN109831648A (en) Antitheft long-distance monitoring method, device, equipment and storage medium
de Venâncio et al. A hybrid method for fire detection based on spatial and temporal patterns
CN115294528A (en) Pedestrian safety monitoring method and device
Yu et al. A deep encoder-decoder network for anomaly detection in driving trajectory behavior under spatio-temporal context
Wang et al. Learning precise feature via self-attention and self-cooperation YOLOX for smoke detection
Xu et al. A novel computer vision‐based approach for monitoring safety harness use in construction
Arshad et al. Anomalous situations recognition in surveillance images using deep learning
CN109784525A (en) Method for early warning and device based on day vacant lot integration data
Nesen et al. Knowledge graphs for semantic-aware anomaly detection in video
Momynkulov et al. Fast Detection and Classification of Dangerous Urban Sounds Using Deep Learning
Samaila et al. Video Anomaly Detection: A Systematic Review of Issues and Prospects
Narayanan et al. Real-time video surveillance system for detecting malicious actions and weapons in public spaces
Banerjee et al. Multimodal behavior analysis in computer-enabled laboratories using nonverbal cues
Zhang et al. Image‐based fall detection in bus compartment scene
Ganga et al. Object detection and crowd analysis using deep learning techniques: Comprehensive review and future directions
CN202217340U (en) Intelligent monitor device used for POS machine
Niu et al. Small target flame detection algorithm based on improved YOLOv7

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination