CN116343330A - Abnormal behavior identification method for infrared-visible light image fusion - Google Patents

Abnormal behavior identification method for infrared-visible light image fusion

Info

Publication number
CN116343330A
Authority
CN
China
Prior art keywords
image
visible light
infrared
data
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310211094.6A
Other languages
Chinese (zh)
Inventor
常荣
唐立军
党军朋
张毅
韩兆武
杨扬
易亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuxi Power Supply Bureau of Yunnan Power Grid Co Ltd
Original Assignee
Yuxi Power Supply Bureau of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yuxi Power Supply Bureau of Yunnan Power Grid Co Ltd filed Critical Yuxi Power Supply Bureau of Yunnan Power Grid Co Ltd
Priority to CN202310211094.6A priority Critical patent/CN116343330A/en
Publication of CN116343330A publication Critical patent/CN116343330A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition processing, in particular to an abnormal behavior recognition method for infrared-visible light image fusion. First, image enhancement processing is performed on the infrared-visible light images; the fused images are classified and labeled as training samples, a target detection model is constructed, and the feature vectors corresponding to the fused image information are input into the target detection model to obtain a recognition result. The infrared and visible light fusion video stream is then input into a 3D neural network, and features are computed over the time and space dimensions of the video data. A 3D convolutional neural network extracts features from the data related to the human joint points, and abnormal behaviors are detected from the posture information obtained by extracting the human skeleton and the target position information obtained by view-angle transformation. The design of the invention improves training speed and reduces training time; the method adapts well to the data and, in particular, achieves better results when little calibrated data is available.

Description

Abnormal behavior identification method for infrared-visible light image fusion
Technical Field
The invention relates to the technical field of image recognition processing, in particular to an abnormal behavior recognition method for infrared-visible light image fusion.
Background
The infrared-visible light system combines visible light and infrared technologies to realize all-day, all-weather monitoring. Monitoring transmission is realized through various transmission means such as networks, wireless transmission or optical cables, so that higher-level departments can grasp the on-site situation intuitively and in real time, and can remotely operate front-end cameras from thousands of kilometres away for focused observation. The system can also be used in the military, public security, fire protection, oil fields, forest fire prevention, traffic management, the power grid industry and other important places that require all-day, all-weather monitoring. However, in existing systems, when facing harsh environments such as fog, insufficient illumination and severe weather, the monitored video image is severely disturbed and affected, so that the final imaging quality drops, the target recognition rate decreases, and the monitoring system may even fail to work, which affects operational stability. Therefore, research on multi-feature infrared-visible light multi-source image enhancement technology, which provides a better monitoring video effect for remote monitoring personnel, is currently an important subject in the industry.
The main purpose of image enhancement is to solve the problems of complex background and low illumination by using a convolutional neural network, and mainly extracts image feature points to repeatedly perform feature enhancement through convolution, so that the required target difference features are maximized, the recognition accuracy is improved, and a better monitoring video effect is provided for remote monitoring personnel.
Human behavior recognition and deep learning theory are research hotspots in the field of intelligent video analysis. They have received wide attention in academic and engineering circles in recent years and form the theoretical foundation of intelligent video analysis and understanding, video monitoring, human-computer interaction and related fields. In recent years, deep learning algorithms, which have attracted broad attention, have been successfully applied in various fields such as speech recognition and pattern recognition. Deep learning theory has achieved remarkable results in still image feature extraction and has gradually been generalized to video behavior recognition with time series. How to further improve the accuracy of human behavior recognition in video images in low-light environments is the technical problem to be solved by the invention.
Therefore, traditional behavior recognition methods and human behavior recognition methods based on deep learning are analyzed and summarized. Traditional methods place high requirements on the environment or shooting conditions of the video, and their feature extraction methods are designed manually and a priori. Behavior recognition methods based on deep learning do not need manually designed feature extraction like traditional methods; training and learning can be performed directly on video data to obtain the most effective characterization. In view of this, we propose an abnormal behavior recognition method for infrared-visible image fusion.
Disclosure of Invention
The invention aims to provide an abnormal behavior identification method for infrared-visible light image fusion, which is used for solving the problems in the background technology.
In order to solve the above technical problems, one of the purposes of the present invention is to provide an abnormal behavior identification method for infrared-visible light image fusion, which comprises the following steps:
s1, carrying out image enhancement processing on infrared-visible light images;
s2, inputting the visible light and infrared images after image enhancement into the generator of a Fusion-GAN network, wherein the generator of the Fusion-GAN captures the data distribution and the discriminator estimates the probability that a sample comes from the training data rather than from the generator; the generator and the discriminator are then trained adversarially, and the discriminator of the Fusion-GAN takes the fused image and the visible light image as input in order to distinguish them; the convolutions of the generator and the discriminator are replaced by depthwise separable convolutions and processed with a MobileNet-v3 architecture, which reduces the amount of calculation, and the fused image is output; the output fused image is input into the discriminator to independently adjust the fused image information and obtain a result; during the adversarial learning of the generator and the discriminator, the fused image is continuously optimized, and after the loss function reaches balance, the image with the best effect is retained;
s3, classifying and labeling the targets in the fused images, carrying out normalization processing according to the category coordinate information, inputting them together with the fused images into a YOLOv5 network, performing HLV color transformation on the fused images, and splicing the images with Mosaic data enhancement to serve as training samples; an improved feature pyramid model named AF-FPN is used, in which an adaptive attention module (AAM) and a feature enhancement module (FEM) reduce information loss and enhance the representation capability of the feature pyramid during feature map generation, so that the detection performance of the YOLOv5 network on multi-scale targets is improved while real-time detection is maintained; a target detection model is constructed, and the feature vectors corresponding to the fused image information are input into the target detection model to obtain a recognition result;
s4, after the Fusion of the infrared and visible light images of the improved Fusion-GAN network is completed, inputting an infrared and visible light Fusion video stream into a 3D neural network, and performing feature calculation on the time dimension and the space dimension of video data;
s5, dividing the input video into two independent data streams: a low resolution data stream and an original resolution data stream; the two data streams alternately comprise a convolution layer, a regular layer and an extraction layer, and the two data streams are finally combined into two full-connection layers for subsequent feature recognition;
and S6, performing feature extraction on the related data of the human body joint point by using a 3D convolutional neural network, and detecting abnormal behaviors according to the posture information obtained by extracting the human body skeleton and the target position information obtained by visual angle transformation.
As a further improvement of the present technical solution, before the S1 infrared-visible light images are all subjected to image enhancement processing, the method further includes a step of creating a data set:
continuously acquiring image data through a camera, transmitting the acquired image data to the processing end, extracting it frame by frame, and obtaining image information; the input images are subjected to simple translation, scaling, color change, cropping and Gaussian blur, which does not affect the category of the image and effectively alleviates the problems of insufficient samples and poor sample quality;
the data are trained with the improved YOLOv5 method; the number of training epochs is set to 500, since the model loss tends to stabilise once training reaches about 450 epochs; candidate initial learning rates of 0.001, 0.0005, 0.0001 and 0.00001 are compared, and the value at which the model converges fastest is chosen as the initial learning rate; the training momentum is chosen as 0.9, and the training batch size (batch_size) is set to 2;
in order to make the model converge faster and more accurately, a cross entropy function is adopted as the loss function; a snapshot of the current state is taken every 200 iterations, training is repeated after modifying the Batch-Size several times, and the final convergence accuracy is optimal when the Batch-Size is set to 50.
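For reference, a minimal sketch of such an augmentation pipeline and training setup is given below; the library choice (torchvision/PyTorch) and all concrete parameter values are illustrative assumptions and are not taken from this disclosure.

```python
# Hedged sketch of the data-set augmentation described above: simple translation,
# scaling, colour change, cropping and Gaussian blur. Sizes and magnitudes are
# assumed values for illustration only.
import torchvision.transforms as T

augment = T.Compose([
    T.Resize((640, 640)),                              # assumed working resolution
    T.RandomAffine(degrees=0,
                   translate=(0.1, 0.1),               # simple translation
                   scale=(0.8, 1.2)),                  # scaling
    T.ColorJitter(brightness=0.3, contrast=0.3,
                  saturation=0.3),                     # colour change
    T.RandomCrop((608, 608)),                          # cropping
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),   # Gaussian blur
    T.ToTensor(),
])

# Training setup mirroring the text (momentum 0.9, batch size 2, cross-entropy
# loss); the optimizer choice (SGD) is an assumption.
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# criterion = torch.nn.CrossEntropyLoss()
```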
As a further improvement of the present technical solution, the image enhancement processing in S1 further includes:
after the image feature points are extracted, feature enhancement is repeatedly carried out through an algorithm and training of a deep convolutional neural network, so that the required target difference features are maximized, the recognition accuracy is improved, and a Norm normalization layer is added after a convolutional layer to improve the distinction between a main body and other parts;
wherein the deep convolutional layer neural network comprises 5 convolutional layers (conv), 3 pooling layers (pool), 2 LRN layers (norm), 2 random sampling layers (drop), 3 full connectivity layers (fc), and 1 softmax classification regression layer; the convolution layer (conv) and the pooling layer (pool) alternate, the pooling layer (pool) being max-pooling.
As a further improvement of the present technical solution, the convolution layer parameters are as follows: the blob shapes of conv1, conv2, conv3, conv4 and conv5 are [1, 96, 55, 55], [1, 256, 27, 27], [1, 384, 13, 13] and [1, 256, 13, 13], and the strides are 4, 2, 1 and 1, respectively;
the pooling layer parameters are: pool1: [1, 96, 27, 27], pool2: [1, 256, 13, 13], pool5: [1, 256, 6, 6];
the calculation formula of the convolution is:
$$x_j^l = \mathrm{ReLU}\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big) \quad (1);$$

in formula (1), $M_j$ is the set of input feature maps, $x_j^l$ is the j-th output of the current layer $l$, $k_{ij}^l$ is the convolution kernel that convolves the input feature map $x_i^{l-1}$, $b_j^l$ is the bias, and ReLU denotes the activation function;
the calculation formula of the output dimension of the convolution layer is as follows:
$$N_2 = (N_1 - F_1 + 2P)/S + 1 \quad (2);$$

in formula (2), the input picture size is $N_1 \times N_1$, the convolution kernel size is $F_1 \times F_1$, the stride is $S$, and $P$ is the number of padding pixels, i.e. the expansion width; the output picture size is $N_2 \times N_2$;
The output dimension calculation formula of pool pooling layer is as follows:
$$N_3 = (N_1 - F_2)/S + 1 \quad (3);$$

in formula (3), the kernel size of the pooling layer is $F_2$.
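As a quick check of formulas (2) and (3), a small Python helper is sketched below; the 227×227 crop size used in the example is an assumption (the stated 11×11 kernel with stride 4 then reproduces the 55×55 conv1 blob and the 27×27 pool1 blob).

```python
# Output-size helpers implementing formula (2) for convolution layers and
# formula (3) for pooling layers.
def conv_output_size(n1: int, f1: int, p: int, s: int) -> int:
    """N2 = (N1 - F1 + 2P) / S + 1."""
    return (n1 - f1 + 2 * p) // s + 1

def pool_output_size(n1: int, f2: int, s: int) -> int:
    """N3 = (N1 - F2) / S + 1."""
    return (n1 - f2) // s + 1

# Example consistent with the blob shapes listed above (assumed 227x227 crop):
assert conv_output_size(227, 11, 0, 4) == 55   # conv1: 11x11 kernel, stride 4
assert pool_output_size(55, 3, 2) == 27        # pool1: 3x3 max-pooling, stride 2
```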
As a further improvement of the present technical solution, in S2, the loss function set by the generator is:
$$L_G = \frac{1}{HW}\left(\left\|I_f - I_r\right\|_F^2 + \zeta\left\|\nabla I_f - \nabla I_v\right\|_F^2\right) \quad (4);$$

in formula (4), H and W denote the height and width of the input image, $I_f$, $I_r$ and $I_v$ denote the fused, infrared and visible light images, $\|\cdot\|_F$ denotes the matrix (Frobenius) norm, $\nabla$ denotes the gradient operator, and $\zeta$ is a positive parameter controlling the trade-off between the two terms;
the loss function set by the discriminator is as follows:
$$L_D = \mathbb{E}\left[\left(D_{\theta_D}(I_v) - b\right)^2\right] + \mathbb{E}\left[\left(D_{\theta_D}(I_f) - a\right)^2\right] \quad (5);$$

in formula (5), a and b denote the labels of the fused image $I_f$ and the visible light image $I_v$, respectively, and $D_{\theta_D}(I_f)$ and $D_{\theta_D}(I_v)$ are the classification results of the two images.
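A hedged PyTorch sketch of losses (4) and (5) is given below for single-channel images; the gradient operator is approximated with a Laplacian kernel, and the reduction and weighting choices are assumptions rather than the exact implementation of this disclosure.

```python
import torch
import torch.nn.functional as F

def _gradient(img):
    # Laplacian-style gradient operator (an assumption; any image gradient
    # fits the nabla operator described in formula (4)). img: (N, 1, H, W).
    kernel = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                          device=img.device).view(1, 1, 3, 3)
    return F.conv2d(img, kernel, padding=1)

def generator_loss(fused, infrared, visible, zeta=5.0):
    """Formula (4): intensity term plus zeta-weighted gradient term over HW."""
    h, w = fused.shape[-2:]
    intensity = torch.norm(fused - infrared, p='fro') ** 2
    grad_diff = torch.norm(_gradient(fused) - _gradient(visible), p='fro') ** 2
    return (intensity + zeta * grad_diff) / (h * w)

def discriminator_loss(d_fused, d_visible, a=0.0, b=1.0):
    """Formula (5): push D(I_v) toward label b and D(I_f) toward label a."""
    return torch.mean((d_visible - b) ** 2) + torch.mean((d_fused - a) ** 2)
```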
As a further improvement of the present technical solution, the S3 further includes:
the labeling categories include wearing a safety helmet, not wearing a safety helmet, wearing a reflective garment and not wearing a reflective garment;
performing HLV color transformation on the fused images, and splicing the images by adopting Mosaic data enhancement to serve as training samples; the learning rate is set to 0.001, the batch size to 16, and a gradient descent method is used to optimize the loss function; the model is evaluated with precision, recall and F1 score, computed from the categories calibrated for the model and the categories detected by the algorithm, which are divided into: true positives TP (True Positive), false positives FP (False Positive), true negatives TN (True Negative) and false negatives FN (False Negative);
the accuracy, recall and F1-score formulas are as follows:
$$P = \frac{TP}{TP + FP} \quad (6);$$

$$R = \frac{TP}{TP + FN} \quad (7);$$

$$F1 = \frac{2PR}{P + R} \quad (8);$$
in formula (8), P and R are the precision (Precision) and recall (Recall) calculated in formulas (6) and (7), respectively;
and testing the trained model, and inputting the feature vector corresponding to the fused image information into the target detection model to obtain a final recognition result.
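A minimal implementation of the evaluation metrics (6)-(8) from the per-class TP/FP/FN counts might look as follows; the example counts are made up for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    p = tp / (tp + fp) if tp + fp else 0.0       # formula (6)
    r = tp / (tp + fn) if tp + fn else 0.0       # formula (7)
    f1 = 2 * p * r / (p + r) if p + r else 0.0   # formula (8)
    return p, r, f1

# e.g. 90 correct helmet detections, 5 false alarms, 10 misses:
print(precision_recall_f1(90, 5, 10))  # -> (0.947..., 0.9, 0.923...)
```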
As a further improvement of the present technical solution, in S4, feature calculation is performed in a time dimension and a space dimension of the video data;
the first layer of the convolutional neural network consists of hard-coded convolution kernels, covering gray data, gradients in the x and y directions and optical flow in the x and y directions; the network comprises 3 convolution layers, 2 downsampling layers and 1 fully connected layer;
and 3DCNN is used in the video block with the fixed length, and a multi-resolution convolutional neural network is used for extracting video features.
As a further improvement of the present technical solution, the S4 further includes: unsupervised behavior recognition using an automatic encoder, where an AutoEncoder is used to learn a function $h_{W,b}(z)$ such that $h_{W,b}(z) \approx z$; an identity-like mapping is thus obtained, so that the output of the model is almost equal to the input;
expanding independent subspace analysis to three-dimensional video data, and modeling a video block by using an unsupervised learning algorithm; firstly, an ISA algorithm is used on a small input block, then a learned network and an input image of a larger block are convolved, and responses obtained in the convolution process are combined together to serve as input of a next layer; the resulting description method is applied to video data.
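A small PyTorch sketch of the auto-encoder objective described above (learn $h_{W,b}(z) \approx z$) is shown below; the layer sizes and optimizer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=4096, code_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, in_dim)

    def forward(self, z):
        return self.decoder(self.encoder(z))   # h_{W,b}(z)

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
z = torch.randn(8, 4096)                       # stand-in for flattened video features
loss = nn.functional.mse_loss(model(z), z)     # drive the output toward the input
loss.backward()
optimizer.step()
```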
As a further improvement of the present technical solution, the S5 further includes: the static frame data stream uses single frame data, the dynamic data stream between frames uses optical flow data, and each data stream uses a deep convolutional neural network for feature extraction.
As a further improvement of the present technical solution, the S6 further includes:
performing posture estimation on the human body in the fused video by using the 3DCNN network structure to obtain the skeleton points of the human body; a plurality of key skeleton points of the human body are output in real time through the 3DCNN network structure; the coordinates of the skeleton points of the plurality of parts in the image are recorded as $(x_i, y_i)$, where the subscript i denotes the joint point of the i-th part;
using $D_{body}$ to represent the length of the human torso, where $x_1, x_8, x_{11}, y_1, y_8, y_{11}$ denote the coordinates of the neck and the left and right waist skeleton points, respectively; the feature points obtained from the fused image through the 3DCNN are input into an SVM network for classification into the unsafe behaviors of falling, climbing or pushing, and the final recognition result is obtained.
The third object of the present invention is to provide an abnormal behavior recognition platform device, which includes a processor, a memory, and a computer program stored in the memory and running on the processor, wherein the processor is used for implementing the steps of the abnormal behavior recognition method of the infrared-visible light image fusion when executing the computer program.
A fourth object of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described abnormal behavior recognition method of infrared-visible light image fusion.
Compared with the prior art, the invention has the beneficial effects that:
1. In the abnormal behavior recognition method for infrared-visible light image fusion, for image enhancement the CaffeNet network model is fine-tuned and the preprocessing step of traditional image recognition algorithms is removed; parameters are greatly reduced through the sparse connections and weight sharing of the convolution layers, which improves training speed and shortens training time; the algorithm is highly extensible, and the accuracy of image recognition can be improved by increasing the categories and number of sample pictures and by further optimizing the fine-tuned network structure model, so as to meet the accuracy requirements of image content recognition and extraction under low illumination;
2. In the abnormal behavior recognition method based on infrared-visible light image fusion, the deep network can learn features from the data without supervision; when there are enough training samples, the features learned through the deep network often carry certain semantic information and are better suited to recognizing targets and behaviors; the method adapts well to the data and, in particular, achieves better results when little calibrated data is available.
Drawings
FIG. 1 is an exemplary overall process flow diagram of the present invention;
FIG. 2 is a diagram of an exemplary ReLU function in the invention;
FIG. 3 is a schematic diagram of an exemplary deep convolutional neural network of the present invention;
FIG. 4 is a graph of exemplary test results in accordance with the present invention;
FIG. 5 is a diagram of an exemplary 3DCNN architecture in the present invention;
FIG. 6 is a block diagram of an exemplary multi-resolution convolutional neural network of the present invention;
FIG. 7 is a block diagram of an exemplary back propagation algorithm in accordance with the present invention;
FIG. 8 is a diagram of an exemplary ISA-3D architecture in accordance with the present invention;
FIG. 9 is a schematic diagram of an exemplary acquisition of skeletal points of a human body in accordance with the present invention;
FIG. 10 is a table of results of an exemplary automatic encoder of the present invention on the KTH, UCF Sports and Hollywood2 databases;
fig. 11 is a block diagram of an exemplary electronic computer platform according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1 to 10, the present embodiment provides an abnormal behavior recognition method for infrared-visible light image fusion, which includes the following steps:
s1, carrying out image enhancement processing on infrared-visible light images;
s2, inputting the visible light and infrared images after image enhancement into the generator of a Fusion-GAN network, wherein the generator of the Fusion-GAN captures the data distribution and the discriminator estimates the probability that a sample comes from the training data rather than from the generator; the generator and the discriminator are then trained adversarially, and the discriminator of the Fusion-GAN takes the fused image and the visible light image as input in order to distinguish them; the convolutions of the generator and the discriminator are replaced by depthwise separable convolutions and processed with a MobileNet-v3 architecture, which reduces the amount of calculation, and the fused image is output; the output fused image is input into the discriminator to independently adjust the fused image information and obtain a result; during the adversarial learning of the generator and the discriminator, the fused image is continuously optimized, and after the loss function reaches balance, the image with the best effect is retained;
s3, classifying and labeling the targets in the fused images, carrying out normalization processing according to the category coordinate information, inputting them together with the fused images into a YOLOv5 network, performing HLV color transformation on the fused images, and splicing the images with Mosaic data enhancement as training samples; an improved feature pyramid model named AF-FPN is proposed, in which an adaptive attention module (AAM) and a feature enhancement module (FEM) reduce information loss and enhance the representation capability during feature map generation, so that the detection performance of the YOLOv5 network on multi-scale targets is improved while real-time detection is maintained; a target detection model is constructed, and the feature vectors corresponding to the fused image information are input into the target detection model to obtain a recognition result;
s4, after the Fusion of the infrared and visible light images of the improved Fusion-GAN network is completed, inputting an infrared and visible light Fusion video stream into a 3D neural network, and performing feature calculation on the time dimension and the space dimension of video data;
s5, dividing the input video into two independent data streams: a low resolution data stream and an original resolution data stream; the two data streams alternately comprise a convolution layer, a regular layer and an extraction layer, and the two data streams are finally combined into two full-connection layers for subsequent feature recognition;
and S6, performing feature extraction on the related data of the human body joint point by using a 3D convolutional neural network, and detecting abnormal behaviors according to the posture information obtained by extracting the human body skeleton and the target position information obtained by visual angle transformation.
In this embodiment, the multi-feature infrared-visible multi-source image enhancement process is as follows:
After extracting the image feature points, feature enhancement is repeatedly performed through the algorithm and training of a deep convolutional neural network. Before model training, this embodiment uses Python crawler technology to collect 993 pictures of 10 classes, which are divided into a test set of 200 pictures and a training set of 793 pictures. The convolutional neural network can take images directly as input without complex preprocessing operations; because of hardware limitations, this embodiment only unifies the resolution of the images, transforming them to 256×256, randomly extracts 20 images from each of the 10 classes as the test set, and places the remaining images in the training set. The mean-computation file provided by Caffe is used to subtract the mean from the images before training; this calculation reduces the similarity between image data and thus greatly improves training accuracy and speed.
The present invention, for image enhancement: in the convolutional layers, the size of the convolution kernel affects how the image features are abstracted. In general, a larger convolution kernel gives a better abstraction effect but requires more training parameters, whereas several stacked smaller kernels extract finer features with fewer parameters at the cost of needing more layers to achieve the same effect. In the structure of this embodiment, the first convolution layer uses an 11×11 kernel; although such a large kernel achieves a good abstraction effect, the processing is relatively coarse, so a Norm normalization layer is added after Conv to improve the distinction between the main body and the other parts.
Typically, the convolution layer and the ReLU layer occur in pairs. The expression of the canonical ReLU activation function is y = max(0, x): when the input x > 0, the output is x itself; when the input is less than or equal to 0, the output is 0. In convolutional neural networks it is customary to replace earlier activation functions such as tanh and sigmoid with the ReLU excitation function. As shown in fig. 2, the derivative of the ReLU function is constant for x > 0, whereas the derivatives of the tanh and sigmoid functions are not; the ReLU function therefore avoids the shrinking derivatives of tanh and sigmoid as they approach their saturation ends, which slow convergence through BP back-propagation of the error when training the neural network. ReLU has the advantages of fast convergence and simple gradient computation, exhibits sparsity after training, reduces data redundancy and enhances the expression of region-specific features.
The pooling layer is also called a spatial downsampling layer, and in the convolutional neural network, the pooling layer generally obtains new features after integrating feature points in a small neighborhood by using pooling after image convolution after the convolutional layer. Typically, convolution and pooling exist in the form of Conv-Pool, reducing the redundancy of information caused after convolution. The pool layer is also called a downsampling layer, so that the purpose of reducing the dimension can be achieved, the dimension of the feature vector output by the previous convolution layer is reduced, and the overfitting can be reduced.
This embodiment adopts max-pooling to reduce image noise and to mitigate the overfitting caused by the convolution output being too sensitive to input errors.
The max-pooling adopted in this embodiment first ensures that the position and rotation of a feature do not matter: a valid feature obtained after convolution can be extracted regardless of where it appears, which is a desirable property. In addition, max-pooling greatly reduces the number of model parameters in this embodiment, and for the norm layer following the pool layer the number of neurons is greatly reduced.
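The effect of the ReLU activation and 2×2 max-pooling described above can be illustrated with a few lines of numpy; the feature-map values are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)          # y = max(0, x)

def max_pool_2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([[-1.0,  2.0,  0.5, -3.0],
                        [ 4.0, -0.5,  1.0,  2.0],
                        [-2.0,  3.0, -1.0,  0.0],
                        [ 1.0,  0.0,  2.5, -4.0]])
print(max_pool_2x2(relu(feature_map)))   # [[4.  2. ] [3.  2.5]]
```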
The invention fine-tunes the CaffeNet network model and generates an optimal recognition network model for the data set. Some advantages of the deep convolutional neural network can be summarized as follows: the preprocessing step of traditional image recognition algorithms is eliminated, parameters are greatly reduced through the sparse connections and weight sharing of the convolution layers, training speed is improved and training time is shortened. The drawback of this embodiment is that the hardware environment is limited and the samples are few, but the algorithm is highly extensible; the accuracy of image recognition can be improved by increasing the categories and number of sample pictures and by further optimizing the fine-tuned network structure model of this embodiment, so as to meet the accuracy requirements of image content recognition and extraction under low illumination.
Further, the algorithm and training based on the deep convolutional neural network are as follows:
as shown in fig. 3, the deep convolutional layer neural network in this embodiment comprises 5 convolutional layers (conv), 3 pooling layers (pool), 2 LRN layers (norm), 2 random sampling layers (drop), 3 full connectivity layers (fc) and 1 softmax classification regression layer; the convolution layer (conv) and the pooling layer (pool) alternate, the pooling layer (pool) being max-pooling.
Wherein the convolution layer parameters are respectively: the blob types of conv1, conv2, conv3, conv4 and conv5 are respectively [1, 96, 55, 55], [1, 256, 27, 27], [1, 384, 13, 13] and [1, 256, 13, 13], and the steps are respectively 4, 2, 1 and 1;
the pool layer parameters were: pool1: [1, 96, 27, 27], pool2: [1, 256, 13, 13], pool5: [1, 256,6,6]; the calculation formula of the convolution is as follows:
$$x_j^l = \mathrm{ReLU}\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big) \quad (1);$$

in formula (1), $M_j$ is the set of input feature maps, $x_j^l$ is the j-th output of the current layer $l$, $k_{ij}^l$ is the convolution kernel that convolves the input feature map $x_i^{l-1}$, $b_j^l$ is the bias, and ReLU denotes the activation function;
the calculation formula of the output dimension of the convolution layer is as follows:
$$N_2 = (N_1 - F_1 + 2P)/S + 1 \quad (2);$$

in formula (2), the input picture size is $N_1 \times N_1$, the convolution kernel size is $F_1 \times F_1$, the stride is $S$, and $P$ is the number of padding pixels, i.e. the expansion width; the output picture size is $N_2 \times N_2$;
The output dimension calculation formula of pool pooling layer is as follows:
$$N_3 = (N_1 - F_2)/S + 1 \quad (3);$$

in formula (3), the kernel size of the pooling layer is $F_2$.
In this embodiment, the specific network parameters are set for our own data set. Fig. 4 shows 1000 iterations; every 50 iterations, the trained network is tested on the test set and the loss value and accuracy are output. Every 200 iterations, a snapshot of the current state is taken. After modifying the Batch-Size several times and retraining, the final convergence accuracy is optimal when the Batch-Size is set to 50, and the average recognition rate of the model on the images reaches its highest value of 92.50%.
Analysis shows that too small a Batch-Size causes excessive oscillation of the recognition rate. Adjusting the Batch-Size can improve recognition accuracy because, with a smaller data set, the descent direction becomes more accurate, training oscillation is reduced, CPU utilization is improved, and large matrix multiplications are computed more efficiently. Because the final convergence accuracy falls into different local extrema, the optimum in final convergence accuracy is reached once the batch_size increases to a certain value.
In this embodiment, the abnormal behavior recognition process of the infrared-visible light image fusion is as follows:
Step 1, inputting the visible light and infrared images after image enhancement into the generator of the Fusion-GAN network, replacing the convolutions of the generator and the discriminator with depthwise separable convolutions and processing them with a MobileNet-v3 architecture, which reduces the amount of calculation, and outputting the fused image; the output fused image is then input into the discriminator to independently adjust the fused image information and obtain a result;
wherein, the loss function of the setting generator is:
$$L_G = \frac{1}{HW}\left(\left\|I_f - I_r\right\|_F^2 + \zeta\left\|\nabla I_f - \nabla I_v\right\|_F^2\right) \quad (4);$$

in formula (4), H and W denote the height and width of the input image, $I_f$, $I_r$ and $I_v$ denote the fused, infrared and visible light images, $\|\cdot\|_F$ denotes the matrix (Frobenius) norm, $\nabla$ denotes the gradient operator, and $\zeta$ is a positive parameter controlling the trade-off between the two terms;
the loss function set by the discriminator is as follows:

$$L_D = \mathbb{E}\left[\left(D_{\theta_D}(I_v) - b\right)^2\right] + \mathbb{E}\left[\left(D_{\theta_D}(I_f) - a\right)^2\right] \quad (5);$$

in formula (5), a and b denote the labels of the fused image $I_f$ and the visible light image $I_v$, respectively, and $D_{\theta_D}(I_f)$ and $D_{\theta_D}(I_v)$ are the classification results of the two images;
and in the process of countermeasure learning of the generator and the discriminator, the fusion image is continuously optimized, and after the loss function reaches balance, the image with the best effect is reserved.
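Step 1 replaces the standard convolutions of the generator and discriminator with depthwise separable convolutions in the spirit of MobileNet-v3; a minimal PyTorch sketch of one such block is given below, with channel counts, kernel size and activation as assumptions.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # pointwise: 1x1 convolution mixes channels, keeping the cost low
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.Hardswish()    # activation used in MobileNet-v3

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))
```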
Step 2, annotating the fused images with the LabelImg annotation software; the annotation categories include wearing a safety helmet, not wearing a safety helmet, wearing a reflective garment, not wearing a reflective garment, etc., and the annotations are stored in xml format; the category coordinate information in the xml files is normalized and stored in txt files holding the category coordinates; the txt files and the fused images are input into the YOLOv5 network, HLV color transformation is performed on the fused images, and Mosaic data enhancement is used to splice the images as training samples; an improved feature pyramid model named AF-FPN is proposed, in which an adaptive attention module (AAM) and a feature enhancement module (FEM) reduce information loss and enhance the representation capability of the feature pyramid during feature map generation, so that the detection performance of the YOLOv5 network on multi-scale targets is improved while real-time detection is maintained, and the target detection model is constructed;
the learning rate is set to 0.001, the batch size to 16, and a gradient descent method is used to optimize the loss function; the model is evaluated with precision, recall and F1 score, computed from the categories calibrated for the model and the categories detected by the algorithm, which are divided into the following 4 categories: true positives (True Positive, TP), false positives (False Positive, FP), true negatives (True Negative, TN) and false negatives (False Negative, FN); the Precision, Recall and F1-Score formulas are as follows:
$$P = \frac{TP}{TP + FP} \quad (6);$$

$$R = \frac{TP}{TP + FN} \quad (7);$$

$$F1 = \frac{2PR}{P + R} \quad (8);$$
in formula (8), P and R are the precision (Precision) and recall (Recall) calculated in formulas (6) and (7), respectively;
and testing the trained model, and inputting the feature vector corresponding to the fused image information into the target detection model to obtain a final recognition result.
Step 3, after the Fusion-GAN network has fused the infrared and visible light images, the infrared and visible light fusion video stream is input into the 3D neural network; as shown in fig. 5, 3DCNN extends the traditional CNN with temporal information, and feature calculation is performed in the time and space dimensions of the video data;
the first layer of the convolutional neural network consists of hard-coded convolution kernels covering gray data, gradients in the x and y directions and optical flow in the x and y directions; the network comprises 3 convolution layers, 2 downsampling layers and 1 fully connected layer. 3DCNN is applied within fixed-length video blocks, and a multi-resolution convolutional neural network is used to extract video features. The input video is split into two independent data streams: a low-resolution data stream and an original-resolution data stream; both streams alternate convolution layers, regularization layers and extraction layers, and the two streams are finally merged into two fully connected layers for subsequent feature recognition, as shown in the structure diagram of fig. 6. A convolutional neural network with two data streams is likewise used for video behavior recognition: the video is separated into a static frame data stream and an inter-frame dynamic data stream, where the static frame stream uses single-frame data, the inter-frame dynamic stream uses optical flow data, and each stream uses a deep convolutional neural network for feature extraction. Finally, an SVM is used to recognize the action from the obtained features. Only the data related to the joint points of the human posture are used for feature extraction by the deep convolutional network; a statistical method then converts the whole video into one feature vector, and an SVM is used to train and recognize the final classification model.
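A hedged sketch of the two-stream idea in this step (a low-resolution stream plus an original-resolution stream, merged into fully connected layers) is shown below; the layer widths and the downsampling factor are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def stream():
    # alternating convolution / normalisation / pooling, as described above
    return nn.Sequential(
        nn.Conv2d(3, 32, 5, stride=2), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class TwoStreamNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.low_res = stream()      # downsampled copy of the frame
        self.full_res = stream()     # original-resolution frame
        self.fc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                nn.Linear(64, num_classes))   # two FC layers

    def forward(self, frame):
        low = F.interpolate(frame, scale_factor=0.5, mode='bilinear',
                            align_corners=False)
        feats = torch.cat([self.low_res(low), self.full_res(frame)], dim=1)
        return self.fc(feats)
```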
Performing feature extraction on the related data of the human body joint point by using a 3D convolutional neural network, and detecting abnormal behaviors according to gesture information obtained by extracting a human body skeleton and target position information obtained by visual angle transformation;
the design 3DCNN consists of 8 convolutional layers, 5 pooled layers and 2 fully-connected layers, including a softmax function, the input size of the network is 3 x 16 x 112, the size of the convolutional kernel is set to 3 x 3, the step length is 1 multiplied by 1, the input fusion video stream is subjected to convolution calculation, after calculation, the characteristic image is pooled, the size of a pooling kernel is 2 multiplied by 2, the step length is 2 multiplied by 2, and 4098 output is performed in total. Setting the training learning rate as 0.001, training times as 100 batches, and stopping training when the loss function is minimum to obtain the optimal model.
And estimating the posture of the human body in the fused video by using a 3DCNN network structure to obtain skeleton points of the human body. As shown in fig. 9, 18 key skeletal points of eyes, arms, knees, etc. of a human body are output in real time through a 3DCNN network structure.
The coordinates of the skeleton points of the 18 parts in the image are recorded as $(x_i, y_i)$, where the subscript i denotes the joint point of the i-th part; $D_{body}$ is used to represent the length of the human torso, where $x_1, x_8, x_{11}, y_1, y_8, y_{11}$ denote the coordinates of the neck and the left and right waist skeleton points, respectively. The feature points obtained from the fused image through the 3DCNN are input into an SVM network for classification into unsafe behaviors such as falling, climbing and pushing, and the final recognition result is obtained.
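The final classification step can be sketched as follows: the 18 skeleton points are normalised by the torso length $D_{body}$ (computed here from the neck and the left/right waist points, assumed to be indices 1, 8 and 11 as in the text) and fed to an SVM. The exact index convention and feature layout are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def torso_length(pts):                           # pts: (18, 2) array of (x_i, y_i)
    neck, l_waist, r_waist = pts[1], pts[8], pts[11]
    return np.linalg.norm(neck - (l_waist + r_waist) / 2.0)   # D_body

def skeleton_feature(pts):
    d_body = torso_length(pts)
    return ((pts - pts[1]) / d_body).ravel()     # neck-centred, scale-normalised

# Features and labels come from the 3DCNN pose estimates of the fused video.
clf = SVC(kernel='rbf')
# clf.fit(train_features, train_labels)          # labels: falling / climbing / pushing
# pred = clf.predict([skeleton_feature(detected_points)])
```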
In addition, in this step an automatic encoder can also be used for unsupervised behavior recognition. The automatic encoder is an unsupervised learning algorithm that uses a back-propagation algorithm to make the target value equal to the input value, as shown in fig. 7. An AutoEncoder learns a function $h_{W,b}(z)$ such that $h_{W,b}(z) \approx z$; an identity-like mapping is obtained so that the output of the model is almost equal to the input. Independent subspace analysis is extended to three-dimensional video data, and an unsupervised learning algorithm is used to model the video blocks. The method first applies the ISA algorithm to small input blocks, then convolves the learned network with the input image of larger blocks, and combines the responses obtained during convolution as the input of the next layer, as shown in fig. 8. The resulting description method was applied to video data and tested on three well-known behavior recognition libraries; table 1 in fig. 10 shows the results on the KTH, UCF Sports and Hollywood2 databases. The ISA algorithm achieves better performance on the Hollywood2 data set with its complex environment, nearly 10% higher than the spatio-temporal interest point algorithm.
In addition, the invention recognizes abnormal human behaviors: for the problem of behavior recognition of target persons under low-illumination conditions, infrared and visible light image fusion is combined with behavior recognition; a 3D convolutional neural network extracts features from the data related to human joint points, abnormal behaviors are detected from the posture information obtained by extracting the human skeleton and the target position information obtained by view-angle transformation, and a human motion feature model library of violation behaviors is formed; after the model library is established, any action in the on-site construction video that matches the model library is a violation action. Based on infrared-visible light image fusion, the following violation detection is to be realized under low illumination: climbing detection, personnel identification, area intrusion detection, safety belt detection, insulator detection, safety helmet detection, etc. The target value of recognition precision is ≥ 95%, the recall target value is ≥ 90%, and the speed (FPS) target value is 30. Since the deep network can learn features from data without supervision, in a way that matches the mechanism of human perception, the features learned by the deep network often carry certain semantic information when there are enough training samples and are better suited to recognizing targets and behaviors. The method adapts well to the data and, in particular, achieves better results when little calibrated data is available; convolutional neural networks have achieved excellent results in image recognition.
As shown in fig. 11, the present embodiment also provides an abnormal behavior recognition platform apparatus, which includes a processor, a memory, and a computer program stored in the memory and running on the processor.
The processor comprises one or more than one processing core, the processor is connected with the memory through a bus, the memory is used for storing program instructions, and the processor realizes the steps of the abnormal behavior identification method for infrared-visible light image fusion when executing the program instructions in the memory.
Alternatively, the memory may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
In addition, the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the steps of the abnormal behavior identification method for infrared-visible light image fusion when being executed by a processor.
Optionally, the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the method for identifying abnormal behavior of infrared-visible light image fusion of the above aspects.
It will be appreciated by those of ordinary skill in the art that the processes for implementing all or part of the steps of the above embodiments may be implemented by hardware, or may be implemented by a program for instructing the relevant hardware, and the program may be stored in a computer readable storage medium, where the above storage medium may be a read-only memory, a magnetic disk or optical disk, etc.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. The abnormal behavior identification method for infrared-visible light image fusion is characterized by comprising the following steps of:
s1, carrying out image enhancement processing on infrared-visible light images;
s2, inputting the visible light and infrared images after image enhancement into the generator of a Fusion-GAN network, replacing the convolutions of the generator and the discriminator with depthwise separable convolutions and processing them with a MobileNet-v3 architecture, which reduces the amount of calculation, and outputting the fused image; inputting the output fused image into the discriminator to independently adjust the fused image information and obtain a result; during the adversarial learning of the generator and the discriminator, the fused image is continuously optimized, and after the loss function reaches balance, the image with the best effect is retained;
s3, classifying and labeling the targets in the fused images, carrying out normalization processing according to the category coordinate information, inputting them together with the fused images into a YOLOv5 network, performing HLV color transformation on the fused images, and splicing the images with Mosaic data enhancement to serve as training samples; an improved feature pyramid model named AF-FPN is used, in which the adaptive attention module and the feature enhancement module reduce information loss and enhance the representation capability of the feature pyramid during feature map generation, so that the detection performance of the YOLOv5 network on multi-scale targets is improved while real-time detection is maintained; a target detection model is constructed, and the feature vectors corresponding to the fused image information are input into the target detection model to obtain a recognition result;
s4, after the Fusion of the infrared and visible light images of the improved Fusion-GAN network is completed, inputting an infrared and visible light Fusion video stream into a 3D neural network, and performing feature calculation on the time dimension and the space dimension of video data;
s5, dividing the input video into two independent data streams: a low resolution data stream and an original resolution data stream; the two data streams alternately comprise a convolution layer, a regular layer and an extraction layer, and the two data streams are finally combined into two full-connection layers for subsequent feature recognition;
and S6, performing feature extraction on the related data of the human body joint point by using a 3D convolutional neural network, and detecting abnormal behaviors according to the posture information obtained by extracting the human body skeleton and the target position information obtained by visual angle transformation.
2. The method for identifying abnormal behavior of infrared-visible light image fusion according to claim 1, further comprising the step of creating a data set before the S1 infrared-visible light images are each subjected to image enhancement processing:
continuously acquiring the acquired image data through a camera, transmitting the acquired image data to a processing end, extracting frame by frame, and acquiring image information; performing simple translation, scaling, color change, clipping and Gaussian blur on an input image;
training the data by adopting an improved YOLOv5 method, setting the number of training epochs to 500 and the initial learning rate to 0.001; the training momentum is chosen as 0.9, and the training batch size (batch_size) is set to 2;
the loss function adopts a cross entropy function to promote the model to be converged more rapidly and accurately.
3. The abnormal behavior recognition method of infrared-visible light image fusion according to claim 1, wherein the image enhancement processing in S1 further comprises:
after the image feature points are extracted, feature enhancement is repeatedly carried out through an algorithm and training of a deep convolutional neural network, so that the required target difference features are maximized, the recognition accuracy is improved, and a Norm normalization layer is added after a convolutional layer to improve the distinction between a main body and other parts;
wherein the deep convolutional layer neural network comprises 5 convolutional layers, 3 pooling layers, 2 LRN layers, 2 random sampling layers, 3 fully connected layers, and 1 softmax classification regression layer; the convolution layer and the pooling layer alternate, and the pooling layer is max-pooling.
4. The method for identifying abnormal behavior of infrared-visible light image fusion according to claim 3, wherein the convolution layer parameters are respectively: the blob types of conv1, conv2, conv3, conv4 and conv5 are respectively [1, 96, 55, 55], [1, 256, 27, 27], [1, 384, 13, 13] and [1, 256, 13, 13], and the steps are respectively 4, 2, 1 and 1;
the pooling layer parameters are: pool1: [1, 96, 27, 27], pool2: [1, 256, 13, 13], pool5: [1, 256, 6, 6];
the calculation formula of the convolution is:
$$x_j^l = \mathrm{ReLU}\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big) \quad (1);$$

in formula (1), $M_j$ is the set of input feature maps, $x_j^l$ is the j-th output of the current layer $l$, $k_{ij}^l$ is the convolution kernel that convolves the input feature map $x_i^{l-1}$, $b_j^l$ is the bias, and ReLU denotes the activation function;
the calculation formula of the output dimension of the convolution layer is as follows:
$$N_2 = (N_1 - F_1 + 2P)/S + 1 \quad (2);$$

in formula (2), the input picture size is $N_1 \times N_1$, the convolution kernel size is $F_1 \times F_1$, the stride is $S$, and $P$ is the number of padding pixels, i.e. the expansion width; the output picture size is $N_2 \times N_2$;
the output dimension calculation formula of the pooling layer is as follows:
N_3 = (N_1 - F_2)/S + 1   (3);
in formula (3), the kernel size of the pooling layer is F_2.
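The output-size formulas (2) and (3) can be illustrated with a small helper; the example numbers are chosen to match the conv1/pool1 blob shapes recited above.

def conv_out_size(n1: int, f1: int, p: int, s: int) -> int:
    """Formula (2): N2 = (N1 - F1 + 2P) / S + 1."""
    return (n1 - f1 + 2 * p) // s + 1

def pool_out_size(n1: int, f2: int, s: int) -> int:
    """Formula (3): N3 = (N1 - F2) / S + 1."""
    return (n1 - f2) // s + 1

# Example: conv_out_size(227, 11, 0, 4) -> 55, pool_out_size(55, 3, 2) -> 27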
5. The method for identifying abnormal behavior of infrared-visible light image fusion according to claim 1, wherein in S2, the loss function set by the generator is:
L_G = (1/(HW)) ( ||I_v - I_r||_F^2 + ζ ||∇I_v - ∇I_f||_F^2 )   (4);
in the formula (4), H and W represent the height and width of the input image respectively, I_v, I_f and I_r denote the fused, visible light and infrared images, ||·||_F represents the matrix norm, ∇ represents the gradient operator, and ζ is a positive parameter controlling the trade-off between the two terms;
the loss function set by the discriminator is as follows:
L_D = (D_θD(I_v) - a)^2 + (D_θD(I_f) - b)^2   (5);
in the formula (5), a and b respectively represent the labels of the fused image I_v and the visible light image I_f, and D_θD(I_v) and D_θD(I_f) are the classification results of the discriminator for the two images.
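A hedged sketch of losses (4) and (5) follows; the exact loss in the patent is reconstructed only approximately here, assuming a FusionGAN-style content term (intensity difference plus a ζ-weighted gradient difference) and a least-squares discriminator term. Tensor names and the default ζ value are placeholders.

import torch

def gradient(img):
    # simple finite-difference gradient operator
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def generator_content_loss(I_fused, I_ir, I_vis, zeta=1.2):
    # formula (4): intensity term against the infrared image, gradient term against the visible image
    h, w = I_fused.shape[-2:]
    intensity = torch.norm(I_fused - I_ir, p="fro") ** 2
    gx_f, gy_f = gradient(I_fused)
    gx_v, gy_v = gradient(I_vis)
    grad = torch.norm(gx_f - gx_v, p="fro") ** 2 + torch.norm(gy_f - gy_v, p="fro") ** 2
    return (intensity + zeta * grad) / (h * w)

def discriminator_loss(d_fused, d_vis, a=0.0, b=1.0):
    # formula (5): push the discriminator's score on the fused image towards label a
    # and its score on the visible image towards label b
    return torch.mean((d_fused - a) ** 2) + torch.mean((d_vis - b) ** 2)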
6. The abnormal behavior recognition method of infrared-visible light image fusion according to claim 1, wherein S3 further comprises:
the labeling categories comprise wearing a safety helmet, not wearing a safety helmet, wearing a reflective garment and not wearing a reflective garment;
performing HLV color transformation on the fused images, and splicing the images with Mosaic data enhancement to serve as training samples; setting the learning rate to 0.001 and the batch size to 16, and optimizing the loss function by a gradient descent method; the model is evaluated with accuracy, recall and F1 score, computed according to the calibrated category and the category detected by the algorithm, and the results are divided into: true positives TP, false positives FP, true negatives TN and false negatives FN;
the accuracy, recall and F1-score formulas are as follows:
Precision = TP/(TP + FP)   (6)
Recall = TP/(TP + FN)   (7)
F1 = 2 × P × R/(P + R)   (8)
in the formula (8), P and R are the precision Precision and the recall Recall calculated by the formulas (6) and (7) respectively;
and testing the trained model, and inputting the feature vector corresponding to the fused image information into the target detection model to obtain a final recognition result.
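Formulas (6)-(8) follow directly from the confusion-matrix counts; a straightforward implementation might look like the following (example numbers are illustrative only).

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(tp: int, fp: int, fn: int) -> float:
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

# Example: f1_score(tp=90, fp=10, fn=20) ≈ 0.857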
7. The method for identifying abnormal behavior of infrared-visible light image fusion according to claim 1, wherein in S4, feature computation is performed in a time dimension and a space dimension of video data;
the first layer of the convolutional neural network is a hard-wired layer of fixed convolution kernels that produces grayscale data, gradients in the x and y directions and optical flow in the x and y directions; the network further comprises 3 convolutional layers, 2 downsampling layers and 1 fully connected layer;
and a 3DCNN is applied to fixed-length video blocks, and a multi-resolution convolutional neural network is used to extract video features.
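A minimal sketch of such a fixed-length 3D convolutional feature extractor is given below, assuming five hard-wired input channels (grayscale, x/y gradients, x/y optical flow); channel counts and the 16-frame clip length are assumptions.

import torch
import torch.nn as nn

class C3DFeatureNet(nn.Module):
    def __init__(self, hardwired_channels=5, num_features=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(hardwired_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                                   # downsampling layer 1
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                                   # downsampling layer 2
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(128, num_features)                 # single fully connected layer

    def forward(self, clip):                                   # clip: (B, 5, T, H, W)
        return self.fc(self.features(clip).flatten(1))

# Example: feats = C3DFeatureNet()(torch.randn(1, 5, 16, 112, 112))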
8. The method for identifying abnormal behavior according to claim 7, wherein S4 further comprises: performing unsupervised behavior recognition using an auto-encoder, using the AutoEncoder to learn a function h_{W,b}(z) such that h_{W,b}(z) ≈ z, i.e. an approximate identity function is obtained, so that the output of the model is almost equal to the input;
expanding independent subspace analysis (ISA) to three-dimensional video data, and modeling the video blocks with an unsupervised learning algorithm; first, the ISA algorithm is applied to small input blocks, then the learned network is convolved with larger input blocks, and the responses obtained by the convolution are combined as the input of the next layer; the resulting description method is applied to the video data.
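A small sketch of the auto-encoder stage described above (learning h_{W,b}(z) ≈ z) follows; the hidden size, the Adam optimizer and the MSE reconstruction loss are assumptions.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=1024, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, z):
        return self.decoder(self.encoder(z))    # h_{W,b}(z) ≈ z after training

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
z = torch.randn(8, 1024)                        # flattened video blocks
loss = nn.functional.mse_loss(model(z), z)      # drive the output towards the input
loss.backward()
optimizer.step()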
9. The method for identifying abnormal behavior of infrared-visible light image fusion according to claim 1, wherein S5 further comprises: the static frame data stream uses single frame data, the dynamic data stream between frames uses optical flow data, and each data stream uses a deep convolutional neural network for feature extraction.
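For illustration, the two data streams of this claim could be prepared as follows, assuming OpenCV's Farneback dense optical flow; the frame arrays and parameter values are placeholders.

import cv2
import numpy as np

def build_stream_inputs(prev_frame: np.ndarray, frame: np.ndarray):
    # static stream: the single grayscale frame itself
    static_input = frame.astype(np.float32) / 255.0
    # dynamic stream: dense optical flow between consecutive frames (2 channels: dx, dy)
    flow = cv2.calcOpticalFlowFarneback(prev_frame, frame, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return static_input, flow

# Each returned array would then be fed to its own deep convolutional network
# for feature extraction, as described in the claim.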
10. The method for identifying abnormal behavior of infrared-visible light image fusion according to claim 1, wherein S6 further comprises:
performing posture estimation on the human body in the fused video by using a 3DCNN network structure to obtain the skeleton points of the human body; outputting a plurality of key skeleton points of the human body in real time through the 3DCNN network structure; the coordinates of the skeleton points of the plurality of parts in the image are respectively recorded as (x_i, y_i), where the subscript i denotes the joint point of the i-th part;
using D_body to represent the length of the human torso, where (x_1, y_1), (x_8, y_8) and (x_11, y_11) respectively represent the coordinates of the neck, left waist and right waist skeleton points; and inputting the feature points obtained from the fused image through the 3DCNN into an SVM network for classification into falling, climbing or charging unsafe behaviors, so as to obtain the final recognition result.
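A hedged sketch of the D_body computation and the SVM classification step follows; the keypoint indexing (1 = neck, 8/11 = left/right waist) mirrors the coordinates named in the claim, while the feature layout and SVM kernel are assumptions.

import numpy as np
from sklearn.svm import SVC

def torso_length(kpts: np.ndarray) -> float:
    """kpts: (N, 2) array of (x_i, y_i) skeleton coordinates."""
    neck = kpts[1]
    waist_mid = (kpts[8] + kpts[11]) / 2.0           # midpoint of the left and right waist points
    return float(np.linalg.norm(neck - waist_mid))   # D_body

# features: per-frame vectors built from the 3DCNN outputs plus D_body; labels:
# 0 = falling, 1 = climbing, 2 = charging (the unsafe behaviors named in the claim)
clf = SVC(kernel="rbf")
# clf.fit(train_features, train_labels)
# prediction = clf.predict(test_features)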
CN202310211094.6A 2023-03-07 2023-03-07 Abnormal behavior identification method for infrared-visible light image fusion Pending CN116343330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310211094.6A CN116343330A (en) 2023-03-07 2023-03-07 Abnormal behavior identification method for infrared-visible light image fusion

Publications (1)

Publication Number Publication Date
CN116343330A true CN116343330A (en) 2023-06-27

Family

ID=86878303

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116634110A (en) * 2023-07-24 2023-08-22 清华大学 Night intelligent culture monitoring system based on semantic coding and decoding
CN116863286A (en) * 2023-07-24 2023-10-10 中国海洋大学 Double-flow target detection method and model building method thereof
CN116634110B (en) * 2023-07-24 2023-10-13 清华大学 Night intelligent culture monitoring system based on semantic coding and decoding
CN116863286B (en) * 2023-07-24 2024-02-02 中国海洋大学 Double-flow target detection method and model building method thereof
CN116881830A (en) * 2023-07-26 2023-10-13 中国信息通信研究院 Self-adaptive detection method and system based on artificial intelligence
CN116704267A (en) * 2023-08-01 2023-09-05 成都斐正能达科技有限责任公司 Deep learning 3D printing defect detection method based on improved YOLOX algorithm
CN116704267B (en) * 2023-08-01 2023-10-27 成都斐正能达科技有限责任公司 Deep learning 3D printing defect detection method based on improved YOLOX algorithm
CN116994295A (en) * 2023-09-27 2023-11-03 华侨大学 Wild animal category identification method based on gray sample self-adaptive selection gate
CN116994295B (en) * 2023-09-27 2024-02-02 华侨大学 Wild animal category identification method based on gray sample self-adaptive selection gate
CN117893413A (en) * 2024-03-15 2024-04-16 博创联动科技股份有限公司 Vehicle-mounted terminal man-machine interaction method based on image enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination