CN116958876A - Video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision - Google Patents

Video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision

Info

Publication number
CN116958876A
CN116958876A
Authority
CN
China
Prior art keywords
target
model
video
multispectral
binocular
Prior art date
Legal status
Granted
Application number
CN202310940861.7A
Other languages
Chinese (zh)
Other versions
CN116958876B (en)
Inventor
陈燕
刘攀博
李祖贺
王凤琴
杨永双
王丽萍
Current Assignee
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202310940861.7A priority Critical patent/CN116958876B/en
Publication of CN116958876A publication Critical patent/CN116958876A/en
Application granted granted Critical
Publication of CN116958876B publication Critical patent/CN116958876B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V 10/464: Salient features, e.g. scale invariant feature transforms [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G06V 10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V 10/763: Pattern recognition or machine learning using clustering with non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G06V 10/764: Pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06T 2207/10016: Image acquisition modality: video; image sequence


Abstract

The application discloses a method and a system for detecting abnormal video behaviors based on multispectral binocular stereoscopic vision. The method performs visual calibration of binocular video under a multispectral platform based on a spatial coordinate transformation relation and an image registration model, models the image background from the calibration result, and segments the foreground targets. For each segmented target, the saliency features of each spectral dimension in the global and local space-time domains are determined by establishing a target apparent texture model, a motion saliency model and a depth saliency model. All saliency features are then combined at different scales into a multi-scale, multi-modal feature fusion model, through which abnormal target behaviors are detected. The application improves both the efficiency and the accuracy of target anomaly detection.

Description

Video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision
Technical Field
The application belongs to the technical field of machine vision, and particularly relates to a method and a system for detecting abnormal video behaviors based on multispectral binocular stereoscopic vision.
Background
Multispectral imaging offers high spatial resolution and a wide spectral range, covering the bands from visible to infrared light, so detection and analysis can make full use of both its spatial and its spectral information.
Existing spectral-imaging-based abnormal target detection methods mainly include Fourier-transform-based methods and learning-based methods. They must mine high-resolution spatial data, yet the available data volume is very limited; residual components such as background and noise inevitably remain, making abnormal target detection inaccurate; moreover, the acquired spectral information is strongly affected by external factors, and the spectral data are highly redundant.
Therefore, a new method for detecting abnormal behavior of video is needed to solve the above technical problems.
Disclosure of Invention
In view of the defects of the prior art, the application aims to provide a video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision, which can solve the problems of inaccurate abnormal target detection and high spectral data redundancy.
In order to achieve the above purpose, the application adopts the following technical scheme:
the application provides a video abnormal behavior detection method based on multispectral binocular stereoscopic vision, which comprises the following steps:
based on a space coordinate transformation relation and an image registration model, performing visual calibration on binocular videos under a multispectral platform, and realizing image background modeling according to a calibration result to segment a foreground target;
for a segmented target, determining various spectral dimension saliency features of the target in global and local space-time domains by establishing a target apparent texture model, a motion saliency model and a depth saliency model;
all the salient features are formed into a multi-scale multi-mode feature fusion model under different scales, and abnormal behaviors of the target are detected through the fusion model;
wherein performing binocular visual calibration under the multispectral platform further includes:
receiving a video sequence and an initial set of parameters corresponding to each camera of the binocular camera;
generating external parameters based on tracking features in the video sequence, and generating internal parameters using position and calibration parameters corresponding to an image acquisition system coupled to the binocular camera;
combining the external parameters with internal parameters to determine an internal-external parameter set for each camera and each time instance of the video sequence;
wherein the external parameters are generated by:
determining a feature correspondence between a first downsampled frame sequence and a second downsampled frame sequence of the video sequence based on tracking features within a plurality of overlapping blocks of the first downsampled frame sequence and the second downsampled frame sequence;
and generating the external parameters by using the characteristic correspondence.
Preferably, the visual calibration of binocular vision under a multispectral platform based on the spatial coordinate transformation relationship and the image registration model further comprises:
based on a spatial coordinate system transformation model and an image registration algorithm, calibrating the internal and external parameters of the multispectral sensors using a checkerboard model and a calibration wand (ball-bar) model, realizing the optimized solution of the internal and external parameters between the binocular cameras under maximum likelihood estimation theory, and providing scene information for the subsequent background modeling and foreground target segmentation of the multispectral binocular video scene;
adopting background modeling based on a Gaussian mixture model in the space-time domain of the spectral and depth dimensions to obtain a parameterized model, and adopting background modeling based on a kernel density estimation model to obtain a non-parameterized model;
and combining the parameterized model with the non-parameterized model to obtain a global optimization solving result.
Preferably, said determining respective spectral dimension saliency features of said object in global and local spatio-temporal domains further comprises:
from the perspective of a scene behavior model, obtaining behavior saliency feature vector representation of an image in global and local space-time domains based on pixel level;
by establishing a spectrum saliency distribution model in a time-space domain scene, extracting a target area which is consistent and sensitive to multi-source information by utilizing motion independence, persistence and interruption of a target in a time domain spectrum dimension and color, texture and structural characteristics of the target in the space domain spectrum dimension, thereby improving accurate interpretation of the target area in the scene;
global and local scenes are described by building a target apparent texture model, a motion saliency model and a depth saliency model.
Preferably, establishing the target apparent texture model further comprises:
establishing the association of the target's apparent states between global and local scenes in the combined spectrum-depth space-time domain, building a shape-context-aware texture model of the target, analyzing in the spatial dimension the differences between the target's apparent state and those of other targets or groups, and analyzing in the temporal dimension the difference between the target's current and past apparent states.
Preferably, the establishing a motion saliency model further includes:
establishing a motion saliency model based on an optical flow field in the combined spectrum-depth space-time domain, and achieving consistency of the target's internal motion according to the direction and behavior characteristics of the target in the scene and the interference resistance of the spectral image in three-dimensional space.
Preferably, establishing the depth saliency model further includes:
and establishing a target depth saliency model under a combined spectrum-depth time-space domain, and obtaining the description of the target structure and deformation by calculating the depth change difference between the neighborhood frames of the target in a depth dimension space.
Preferably, the forming all the salient features into a multi-scale multi-mode feature fusion model under different scales, detecting the abnormal behavior of the target through the fusion model, further includes:
establishing a fusion model based on shape context information, optical flow field motion information and conditional probability depth information in a spectrum-depth space-time domain, and calculating differences between target motion and apparent states in a multi-scale scene range and prior scene target states;
constructing multi-scale feature vector representation from a low layer to a high layer, establishing a joint optimization method based on pixel-level features and behavior structure-level features, and separating abnormal behaviors from dominant behaviors;
and establishing a dynamic background online-updating, behavior online-learning and online-detection mechanism, optimizing the model with a bag-of-visual-words framework, and replacing the traditional single unordered feature words with a feature layering algorithm to realize online perception of abnormal behavior events.
Preferably, constructing the multi-scale feature vector representation from the low layer to the high layer and establishing a joint optimization method based on pixel-level features and behavior-structure-level features to separate abnormal behaviors from dominant behaviors further comprises:
in the training stage, first extracting the local targets in the training videos, and then computing the spectral appearance feature set and the single-scale feature set of each target; clustering the two types of features respectively to obtain a spectral appearance feature vocabulary and a single-scale feature vocabulary; based on the two vocabularies, counting the occurrences of each visual word in a training video to obtain its spectral appearance feature histogram and single-scale feature histogram, concatenating the two histogram vectors, and assigning a behavior class identifier to form the clustering histogram of the training video; computing the clustering histograms of all training videos, inputting them to a Bayesian classifier for training, and determining the action classifier model;
in the test stage, computing the targets and the two types of feature sets of the test video by the same procedure, projecting the feature sets into the vocabulary space with a K-nearest-neighbor algorithm, counting the occurrences of each visual word to obtain the word-frequency histogram of the test video, and inputting it to the trained Bayesian classifier for abnormal behavior recognition.
In another aspect, the present application provides a system for detecting abnormal video behavior based on multispectral binocular stereoscopic vision, including:
the target region extraction module is used for carrying out visual calibration on binocular videos under a multispectral platform based on a space coordinate transformation relation and an image registration model, realizing image background modeling according to a calibration result and segmenting a foreground target;
the feature model building module is used for determining the significance features of each spectrum dimension of the target in the global and local space-time domains by building a target apparent texture model, a motion significance model and a depth significance model aiming at the segmented target;
the abnormal behavior detection module is used for forming a multi-scale multi-mode feature fusion model of all the significant features under different scales, and detecting the abnormal behavior of the target through the fusion model;
wherein performing binocular visual calibration under the multispectral platform further includes:
receiving a video sequence and an initial set of parameters corresponding to each camera of the binocular camera;
generating external parameters based on tracking features in the video sequence, and generating internal parameters using position and calibration parameters corresponding to an image acquisition system coupled to the binocular camera;
combining the external parameters with internal parameters to determine an internal-external parameter set for each camera and each time instance of the video sequence;
wherein the external parameters are generated by:
determining a feature correspondence between a first downsampled frame sequence and a second downsampled frame sequence of the video sequence based on tracking features within a plurality of overlapping blocks of the first downsampled frame sequence and the second downsampled frame sequence;
and generating the external parameters by using the characteristic correspondence.
In yet another aspect, the present application provides an electronic device comprising a processor and a memory, the memory storing a plurality of instructions, the processor being configured to read the instructions and perform the method of the first aspect.
In yet another aspect, the application provides a computer readable storage medium storing a plurality of instructions readable by a processor and performing the method of the first aspect.
Compared with the prior art, the application has the advantages that: firstly, performing visual calibration on binocular videos under a multispectral platform to realize image background modeling and target extraction; establishing an apparent texture model, a motion significance model and a depth significance model of the target to determine the significance characteristics of each spectrum dimension of the target in the global and local space-time domains; the abnormal behavior of the target is detected by utilizing the multi-scale multi-mode feature fusion model, so that the target abnormality detection efficiency is higher, and the detection accuracy of the abnormal target is improved. The result of the comparative simulation experiment with other anomaly detection algorithms on different data sets shows that the method provided by the application has better comprehensive performance on different data sets and good anomaly detection capability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for detecting abnormal video behaviors based on multi-spectrum binocular stereoscopic vision according to the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application combines image saliency detection with video moving-target extraction, tracking and recognition: when detecting typical scene behavior states, the multispectral binocular video undergoes, in sequence, dynamic background modeling, saliency feature extraction and fusion, and multilayer behavior optimization solving, so as to detect abnormal target behaviors in the video. Embodiments of the present application are described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, the application provides a method for detecting abnormal video behaviors based on multispectral binocular stereo vision, which comprises the following steps:
and 101, performing visual calibration on binocular videos under a multispectral platform based on a space coordinate transformation relation and an image registration model, and realizing image background modeling and foreground target segmentation according to a calibration result.
In image analysis based on video streams, estimation and removal of the background in a scene are important preconditions for detecting a moving target, and the step mainly utilizes the depth detection and target high-discrimination capability of a binocular multispectral platform to provide multidimensional information output for background modeling and foreground target segmentation. The specific process is as follows:
step 111: based on a space coordinate system transformation model and an image registration algorithm, the chessboard model and the spherical stick model are utilized to calibrate internal and external parameters of the multispectral sensor, so that the optimization solution of the internal and external parameters between binocular cameras under the maximum likelihood estimation theory is realized, and accurate and stable scene information is provided for background modeling and foreground object segmentation of a later multispectral binocular video scene.
For calibration of the internal and external parameters of the binocular camera, in a preferred embodiment, a video sequence and an initial set of parameters corresponding to each camera of the binocular camera are first received. External parameters are generated based on tracking features in the video sequence, internal parameters are generated using position and calibration parameters corresponding to an image acquisition system coupled to the binocular camera, and the external parameters are combined with the internal parameters to determine an internal-external parameter set for each camera and each time instance of the video sequence. The external parameters are generated as follows: a feature correspondence between a first and a second downsampled frame sequence of the video sequence is determined based on tracking features within a plurality of overlapping blocks of the two sequences, and the external parameters are generated from this feature correspondence.
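As an illustration of this calibration step, the following is a minimal OpenCV sketch of checkerboard-based intrinsic calibration followed by joint stereo (extrinsic) refinement. The pattern size, square size and the image_pairs iterable are assumed placeholders, and cv2.stereoCalibrate's reprojection-error minimization stands in for the maximum-likelihood optimization described above.

```python
import cv2
import numpy as np

PATTERN = (9, 6)    # assumed inner-corner layout of the checkerboard
SQUARE = 0.025      # assumed square size in meters

def calibrate_stereo(image_pairs):
    # 3D corner grid on the board plane (z = 0).
    objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

    obj_pts, left_pts, right_pts = [], [], []
    for left_img, right_img in image_pairs:   # caller-supplied grayscale pairs
        ok_l, c_l = cv2.findChessboardCorners(left_img, PATTERN)
        ok_r, c_r = cv2.findChessboardCorners(right_img, PATTERN)
        if ok_l and ok_r:
            obj_pts.append(objp)
            left_pts.append(c_l)
            right_pts.append(c_r)

    size = left_img.shape[1], left_img.shape[0]
    # Per-camera intrinsics first, then jointly refined stereo extrinsics (R, T).
    _, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
    _, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
    _, K1, d1, K2, d2, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, left_pts, right_pts, K1, d1, K2, d2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return K1, d1, K2, d2, R, T
```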
Step 112: background modeling based on a Gaussian mixture model is adopted in the space-time domain of the spectral and depth dimensions to obtain a parameterized model, and background modeling based on a kernel density estimation model is adopted to obtain a non-parameterized model;
and combining the parameterized model with the non-parameterized model to obtain a global optimization solving result.
Preferably, the binocular stereoscopic image acquisition system comprises at least two image sensors, each acquiring a different reflected intensity within a multispectral range of the light reflected by the object. Target feature information is acquired by the image sensors and comprises the reflected intensity ranges of a video target in a plurality of spectral ranges. Images located within a preset distance from the binocular stereoscopic image acquisition system are acquired, the distance being determined from the disparity (view-angle difference) of the image data acquired by the two image sensors. A video target is then extracted from the acquired images, namely one whose reflected intensity lies within the spectral ranges contained in the target feature information.
The target feature information further includes an index range determined from the reflected intensities in the plurality of spectral ranges; after acquiring the images within the preset distance, an index of each image is computed from the reflected intensities in those spectral ranges, and the extracted image must have an index lying within the index range contained in the target feature information.
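A hedged sketch of this gating by spectral intensity range and stereo distance follows; the band stack, the per-band ranges, and the pinhole range model Z = f·B/d are illustrative assumptions, not values from the patent.

```python
import numpy as np

def extract_object_mask(bands, disparity, intensity_ranges,
                        focal_px, baseline_m, max_dist_m):
    # Stereo range from disparity under a pinhole model: Z = f * B / d.
    depth = focal_px * baseline_m / np.maximum(disparity, 1e-6)
    near = depth <= max_dist_m
    # Keep pixels whose reflected intensity falls inside every band's range.
    spectral_ok = np.ones(bands.shape[:2], bool)
    for b, (lo, hi) in enumerate(intensity_ranges):  # bands: (H, W, B) stack
        spectral_ok &= (bands[..., b] >= lo) & (bands[..., b] <= hi)
    return near & spectral_ok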
As a further embodiment, a binocular stereoscopic image acquisition system is mounted on a mobile device, and when target feature information is acquired through an image sensor, geographic location calibration information of images is synchronously stored to determine the geographic location calibration information of the video target based on the geographic location calibration information of images and a distance from the binocular stereoscopic image acquisition system to the video target.
Step 102, aiming at the segmented target, determining the significance characteristics of each spectrum dimension of the target in the global and local space-time domains by establishing a target apparent texture model, a motion significance model and a depth significance model.
From the perspective of the scene behavior model, a behavior saliency feature vector representation of the image in the global and local space-time domains based on pixel level is obtained. By establishing a spectrum saliency distribution model in a time-space domain scene, a target area which is consistent and sensitive to multi-source information is extracted by utilizing motion independence, persistence and interruption of a target in a time domain spectrum dimension and color, texture and structural characteristics of the target in the space domain spectrum dimension, so that accurate interpretation of the target area in the scene is improved. Global and local scenes are described by building a target apparent texture model, a motion saliency model and a depth saliency model. The specific process of the step is as follows:
step 121: firstly, establishing association of apparent states of targets between global and local scenes under a time-space domain combining spectrum-depth, establishing a target shape context perception texture model, and analyzing differences between the apparent states of the targets and the apparent states of the targets or other groups in a space dimension; analyzing the difference between the current apparent state and the past apparent state of the target in a time dimension;
step 122: secondly, establishing a motion significance model based on an optical flow field in a combined spectrum-depth space-time domain, and achieving the consistency of the internal motion of the target according to the direction and behavior characteristics of the target in the scene and the anti-interference performance of the spectrum image in the three-dimensional space;
step 123: then establishing a target depth saliency model under a combined spectrum-depth time-space domain, and obtaining descriptions of a target structure and deformation by calculating depth change differences between neighborhood frames of the target in a depth dimension space;
by modeling the three behaviors, a target area with time consistency, space consistency and spectrum consistency expression can be provided for later abnormal behavior perception.
In a further embodiment, the depth saliency model is constructed by:
dividing regions based on the distance and color similarity between pixels in the left-view and right-view images of an input stereoscopic image;
creating a disparity map of the divided regions based on the disparities obtained from the pixel differences between the left-view and right-view images;
calculating depth saliency by comparing the color differences of the divided regions with the contrast averages of the disparity map, and calculating a saliency based on prior knowledge of the divided regions, the prior knowledge being composed of different image features: the frequency, color and size of the divided regions, and their position and disparity.
Based on the depth saliency and the a priori knowledge-based saliency, a salient region of the image is extracted.
According to a preferred embodiment, the depth saliency $C_f(x)$ at location $x$ is defined as

$$C_f(x) = -2\,c(x,d) + c(x,d-1) + c(x,d+1)$$

where

$$c(x,d) = \sum_{ch \in \{R,G,B\}} \left[\, ch_L(x) - ch_R(x-d) \,\right]$$

Here $d$ is the disparity, and $ch_L$ and $ch_R$ are the left-view and right-view images normalized in channel $ch$, respectively.
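The definition above can be sketched directly in numpy; the wrap-around border handling of np.roll and the channel-normalized float inputs are simplifying assumptions.

```python
import numpy as np

def c_cost(left, right, d):
    # Sum of per-channel differences between the left image and the right
    # image shifted by disparity d (np.roll wraps at the image border).
    shifted = np.roll(right, d, axis=1)
    return (left - shifted).sum(axis=2)

def depth_saliency(left, right, d):
    # Cf(x) = -2 c(x, d) + c(x, d-1) + c(x, d+1), evaluated per pixel.
    return (-2.0 * c_cost(left, right, d)
            + c_cost(left, right, d - 1)
            + c_cost(left, right, d + 1))
```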
To achieve a consistent viewing angle, the stereoscopic image of the generated scene is displayed from the left-view and right-view stereoscopic vision tensors, combined with a binocular display technique.

First, the stereoscopic vision tensor $P$ of the current scene is acquired. Then, from the camera tracking data $R_x$, $R_y$ and $R_z$, the tracking matrices of the binocular camera about the $x$, $y$ and $z$ axes are obtained, and the tracking tensor $A_{til}$ of the camera is their product:

$$A_{til} = R_x \times R_y \times R_z$$

The transformation of the stereoscopic vision tensor involves the binocular offset tensor $B_{oft}$ and the viewing-angle (tracking) tensor $A_{til}$. When generating the disparity image, the left and right cameras are offset by $-oft/2$ and $+oft/2$ respectively, giving the binocular offset tensors $B_{oftL}$ and $B_{oftR}$ of the left and right cameras. The transformed stereoscopic vision tensor $P'$ is

$$P' = A_{til} \times B_{oft} \times P$$

so that the transformed left-view stereoscopic vision tensor is $P'_L = A_{til} \cdot B_{oftL} \cdot P$ and, correspondingly, the transformed right-view stereoscopic vision tensor is $P'_R = A_{til} \cdot B_{oftR} \cdot P$.

Applying the two transformed stereoscopic vision tensors, a binocular stereoscopic image of the scene is generated, thereby achieving viewing-angle consistency.
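A small numpy sketch of this tensor chain in homogeneous coordinates follows; the rotation conventions and the translation-only form of the binocular offset tensors are assumptions where the text leaves details open.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

def transformed_views(P, rx, ry, rz, oft):
    # Tracking tensor A_til composed from the per-axis tracking angles.
    A_til = rot_x(rx) @ rot_y(ry) @ rot_z(rz)
    # Binocular offset tensors: left and right views shifted by -/+ oft/2.
    B_l, B_r = np.eye(4), np.eye(4)
    B_l[0, 3], B_r[0, 3] = -oft / 2.0, oft / 2.0
    return A_til @ B_l @ P, A_til @ B_r @ P   # P'_L, P'_R for 4xN points P
```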
Step 103: forming a multi-scale, multi-modal feature fusion model from all the saliency features at different scales, and detecting the abnormal behaviors of the target through the fusion model.
The texture model describes the appearance-change characteristics of a moving target in the scene, the motion model describes its direction and speed changes over a period of time, and the depth model expresses its motion in three-dimensional space more accurately. However, a single model alone, or a naive combination of different models, is not sufficient to describe the actually occurring abnormal behavior accurately. Furthermore, an abnormal target is usually treated as a small entity in a local scene, yet a scene-wide change may alter the anomaly judgment for the target. The main process of this part is therefore:
step 131: establishing a fusion model based on shape context information, optical flow field motion information and conditional probability depth information in a spectrum-depth space-time domain, and calculating differences between target motion and apparent states in a multi-scale scene range and prior scene target states;
step 132: constructing multi-scale feature vector representation from a low layer to a high layer, establishing a joint optimization method based on pixel-level features and behavior structure-level features, and separating abnormal behaviors from dominant behaviors;
the specific process further comprises the following steps:
in the training stage, firstly, a local target in a training video is extracted, and then, a spectrum apparent characteristic set and a single-scale characteristic set of the target are calculated. And respectively gathering the two types of features to obtain a spectrum apparent feature word bag and a single-scale feature word bag. Based on two types of word bags, counting the occurrence times of each visual word in the training video, obtaining a spectrum apparent characteristic histogram and a single-scale characteristic histogram of the training video, cascading the two histogram vectors, and distributing behavior category identifiers as clustering histograms of the training video. And calculating cluster histograms of all training videos, inputting the cluster histograms into a Bayesian classifier for training, and determining an action classifier model.
In the test stage, calculating a target and two types of feature sets in the test video according to the process, projecting the feature sets into a word bag space by adopting a K nearest neighbor algorithm, counting the occurrence times of each visual word, obtaining a word bag frequency histogram of the test video, and inputting a trained Bayesian classifier for abnormal behavior recognition.
Actions of the same class have similar spectral appearance feature sets and single-scale feature sets, so action prototypes are constructed through a visual bag-of-words model, the behaviors in a video are described by these prototypes, and classification and recognition are performed with a Bayesian classifier.
The visual bag-of-words model is based on K-means clustering. Clustering the normalized single-scale feature set and the spectral appearance feature set separately yields the multi-scale visual vocabulary $L = (l_1, l_2, \ldots, l_m)$ and the spectral appearance feature vocabulary $F = (q_1, q_2, \ldots, q_n)$, where $m$ and $n$ are the numbers of cluster centers, $l_i$ is a single-scale visual word and $q_i$ is a spectral appearance feature word. Serially fusing the two vocabularies gives the reinforced vocabulary, of size $(m+n)$.

For each video to be analyzed, the spectral appearance feature descriptors of its targets are computed first. The distance between each of the two types of feature descriptors and every word in the corresponding vocabulary is computed with a K-nearest-neighbor algorithm, and each descriptor is assigned to the class of its nearest word. The behavior in the video is then treated as a text jointly formed by the words of the two vocabularies, and the occurrences of each visual word are counted to obtain the word-frequency histogram $H = (h_1, h_2, \ldots, h_{m+n})$ representing the behavior, where $h_i$ is the frequency of occurrence of the $i$-th spatiotemporal word in the video.
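The vocabulary construction and word-frequency histogram above can be sketched as follows; scikit-learn's KMeans and a 1-nearest-neighbor assignment are illustrative stand-ins for the clustering and K-nearest-neighbor projection described in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def build_vocab(features, k):
    # K-means cluster centers serve as the visual words.
    return KMeans(n_clusters=k, n_init=10).fit(features).cluster_centers_

def bow_histogram(descriptors, vocab):
    # Assign each descriptor to its nearest word, then count word frequencies.
    nn = NearestNeighbors(n_neighbors=1).fit(vocab)
    _, idx = nn.kneighbors(descriptors)
    hist = np.bincount(idx.ravel(), minlength=len(vocab)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Usage (shapes illustrative): vocabularies of sizes m and n, concatenated
# into the reinforced histogram H of length m + n, as in the text above.
# vocab_scale = build_vocab(scale_feats, m); vocab_spec = build_vocab(spec_feats, n)
# H = np.concatenate([bow_histogram(ds, vocab_scale), bow_histogram(dq, vocab_spec)])
```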
In a further preferred embodiment, the spectral appearance feature descriptor is determined by the following procedure:

The complete appearance feature is denoted $P$ and each spectral appearance feature $P_i$. Assuming $n$ candidate appearance features exist in a single frame image, an evaluation function is constructed to evaluate the accuracy of each candidate, and the candidate for which the sum of the appearance loss function and the model-probability loss function is smallest is selected as the optimal global appearance feature. The loss function $C(I_t, P)$ of each global model sample in the current frame $I_t$ is expressed as

$$C(I_t, P) = \sum_{i \in \varepsilon} \Pi(I_t, P_i) + \sum_{i, j \in \varepsilon} \delta_{i,j}(P_i, P_j)$$

where $\varepsilon$ is the set of spectra, $\Pi(I_t, P_i)$ is the appearance model of the spectral appearance feature $P_i$ in the current image, and $\delta_{i,j}(P_i, P_j)$ is the spectral potential function of the model. The appearance model $\Pi(I_t, P_i)$ comprises the contour at the spectral edges in the image and the optical flow in the current spectral region:

$$\Pi(I_t, P_i) = \Pi_c(I_t \mid P_i) + \Pi_f(I_t, I_{t+1} \mid P_i)$$

where $\Pi_c(I_t \mid P_i)$ is the contour factor and $\Pi_f(I_t, I_{t+1} \mid P_i)$ is the optical flow factor.
For the contour factor $\Pi_c(I_t \mid P_i)$, image samples containing different appearance features are collected and the true appearance feature locations in them annotated; the contour along each spectral appearance feature $P_i$ is then detected to obtain a feature vector $h_i(I \mid P_i)$. A support vector machine is trained for each spectral appearance feature $P_i$, after which the contour factor $\Pi_c(I_t \mid P_i)$ is computed.
For the optical flow factor $\Pi_f(I_t, I_{t+1} \mid P_i)$, two adjacent frames $I_t$ and $I_{t+1}$ are acquired, with corresponding optical flow map $U_t$. For each pixel $(x, y)$, $U_t(x, y)$ gives the displacement of the corresponding pixel from $I_t$ to $I_{t+1}$ along the horizontal and vertical axes. The optical flow factor of spectrum $P_i$ is

$$\Pi_f(I_t, I_{t+1} \mid P_i) = \frac{1}{N} \sum_{(x, y) \in R(i)} U_t(x, y)$$

where $R(i)$ denotes the region of the spectral appearance feature $P_i$ and $N$ is the number of $(x, y)$ pixel pairs in $R(i)$.
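Under the region-averaging reading above, the optical flow factor might be computed as below; dense Farneback flow is an assumed substitute for whatever flow estimator the patent intends, and its parameters are commonly used defaults.

```python
import cv2
import numpy as np

def optical_flow_factor(gray_t, gray_t1, region_mask):
    # Dense flow U_t from frame I_t to I_t+1.
    flow = cv2.calcOpticalFlowFarneback(
        gray_t, gray_t1, None, pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Average flow vector over the N pixels of region R(i).
    u = flow[..., 0][region_mask]
    v = flow[..., 1][region_mask]
    return float(u.mean()), float(v.mean())
```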
In the classification stage, for a dataset containing $n$ behavior classes, the word-frequency histogram $H$ of each video and its class $c_i$ are input to a Bayesian classifier for training, obtaining a prototype of each behavior. In the anomaly detection stage, the spectral appearance feature descriptors of the video under test are computed and mapped into the visual vocabulary space to obtain its word-frequency histogram $H'$; $H'$ is input to the trained Bayesian classifier, and the output $c_i$ is the behavior class of the video under test.
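A minimal sketch of these two stages follows, with scikit-learn's MultinomialNB standing in for the unspecified Bayesian classifier; train_histograms, train_labels and test_histogram are assumed placeholders.

```python
from sklearn.naive_bayes import MultinomialNB

def train_and_detect(train_histograms, train_labels, test_histogram):
    # Fit per-class word-frequency distributions, then classify H'.
    clf = MultinomialNB()
    clf.fit(train_histograms, train_labels)
    return clf.predict(test_histogram.reshape(1, -1))[0]
```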
Step 133: establishing a dynamic background online-updating, behavior online-learning and online-detection mechanism, optimizing the model with a bag-of-visual-words framework, and replacing the traditional single unordered feature words with a feature layering algorithm to realize online perception of abnormal behavior events.
Specifically, in optimizing the model with the visual bag of words framework, a first convolutional neural network is first trained by: receiving original data comprising normal actions, abnormal actions and specified labels, and outputting a clustering feature space and a clustering decision space, wherein the clustering feature space and the clustering decision space are used for classifying the normal actions of the original data into the specified labels; locating a discriminating clustering feature in the raw data spatially processed by the clustering feature of the first convolutional neural network and mapping the discriminating clustering feature onto the raw data as a spatial probability tag;
randomly extracting the blocks of the original data processed through the clustering feature space, sequencing the randomly extracted blocks, and packaging visual word features according to the sequence after sequencing;
training a second convolutional neural network by: receiving the original data and the space probability label, and outputting a general feature space and a general decision space for classifying abnormal actions of the original data into the space probability label;
receiving the original label, the specified label and the space probability label, and outputting a cluster and a general feature and a decision space of a main combination of the specified label and the space probability label for classifying both normal actions and abnormal actions of the original data into the main combination through the first convolutional neural network; receiving unlabeled data; classifying the unlabeled data into a specified label and a spatial probability label of the master combination; mapping the primary combined specified label and the space probability label to the secondary combined specified label and the space probability error classification label;
and receiving the unlabeled data, a secondary error classification label and a penalty matrix, and outputting a secondary combined decision and feature space for classifying both normal actions and abnormal actions of the unlabeled data into the secondary error classification label according to the penalty matrix through the second convolutional neural network.
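The two-network scheme above is only loosely specified; the following heavily hedged PyTorch sketch captures one plausible reading, with the architectures, the K-means pseudo-labeling of the clustering feature space, and the omitted penalty-matrix reweighting all being assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

n_classes, n_pseudo = 5, 8           # assumed label counts
frames = torch.randn(32, 3, 64, 64)  # stand-in batch of raw data

def small_cnn(n_out):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        nn.Flatten(), nn.Linear(16 * 16, n_out))

net1 = small_cnn(n_classes)   # clustering feature / decision space
net2 = small_cnn(n_pseudo)    # general feature / decision space

# Stage 1 (omitted): supervised training of net1 on labeled normal/abnormal data.
# Stage 2: discriminative cluster features become spatial-probability pseudo-labels.
with torch.no_grad():
    feats = net1[:-1](frames)         # penultimate feature space of net1
pseudo = KMeans(n_clusters=n_pseudo, n_init=10).fit_predict(feats.numpy())

# Stage 3: net2 learns to classify the raw data into the pseudo-labels; a
# penalty matrix would reweight this loss in the patent's final step.
loss = nn.CrossEntropyLoss()(net2(frames), torch.as_tensor(pseudo, dtype=torch.long))
loss.backward()
```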
Compared with the prior art, the method has the advantages that binocular vision is firstly calibrated visually under a multispectral platform, and image background modeling and target extraction are realized; establishing an apparent texture model, a motion significance model and a depth significance model of the target to determine the significance characteristics of each spectrum dimension of the target in the global and local space-time domains; the abnormal behavior of the target is detected by utilizing the multi-scale multi-mode feature fusion model, so that the target abnormality detection efficiency is higher, and the detection accuracy of the abnormal target is improved. The result of the comparative simulation experiment with other anomaly detection algorithms on different data sets shows that the method provided by the application has better comprehensive performance on different data sets and good anomaly detection capability.
The application also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the steps of the video abnormal behavior detection method based on multispectral binocular stereoscopic vision are realized when the processor executes the computer program. The functions implemented by the functional modules in this embodiment are the same as those described above with reference to the system and method, and are not described here again.
The application also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the video anomaly detection method based on multispectral binocular stereoscopic vision as described above.
It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on a plurality of computer-usable storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the present application, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. The method for detecting the abnormal video behaviors based on the multispectral binocular stereoscopic vision is characterized by comprising the following steps of:
based on a space coordinate transformation relation and an image registration model, performing visual calibration on binocular videos under a multispectral platform, and realizing image background modeling according to a calibration result to segment a foreground target;
for a segmented target, determining various spectral dimension saliency features of the target in global and local space-time domains by establishing a target apparent texture model, a motion saliency model and a depth saliency model;
all the salient features are formed into a multi-scale multi-mode feature fusion model under different scales, and abnormal behaviors of the target are detected through the fusion model;
wherein performing binocular visual calibration under the multispectral platform further includes:
receiving a video sequence and an initial set of parameters corresponding to each camera of the binocular camera;
generating external parameters based on tracking features in the video sequence, and generating internal parameters using position and calibration parameters corresponding to an image acquisition system coupled to the binocular camera;
combining the external parameters with internal parameters to determine an internal-external parameter set for each camera and each time instance of the video sequence;
wherein the external parameters are generated by:
determining a feature correspondence between a first downsampled frame sequence and a second downsampled frame sequence of the video sequence based on tracking features within a plurality of overlapping blocks of the first downsampled frame sequence and the second downsampled frame sequence;
and generating the external parameters by using the characteristic correspondence.
2. The method for detecting abnormal video behaviors based on multispectral binocular stereoscopic vision according to claim 1, wherein performing visual calibration on binocular video under the multispectral platform based on the spatial coordinate transformation relation and the image registration model further comprises:
based on a spatial coordinate system transformation model and an image registration algorithm, calibrating the internal and external parameters of the multispectral sensors using a checkerboard model and a calibration wand (ball-bar) model, realizing the optimized solution of the internal and external parameters between the binocular cameras under maximum likelihood estimation theory, and providing scene information for the subsequent background modeling and foreground target segmentation of the multispectral binocular video scene;
adopting background modeling based on a Gaussian mixture model in the space-time domain of the spectral and depth dimensions to obtain a parameterized model, and adopting background modeling based on a kernel density estimation model to obtain a non-parameterized model;
and combining the parameterized model with the non-parameterized model to obtain a global optimization solving result.
3. The method for detecting video anomalies based on multi-spectral binocular stereoscopic vision according to claim 1, wherein the determining the respective spectral dimension saliency features of the target in both global and local spatiotemporal domains further comprises:
from the perspective of a scene behavior model, obtaining behavior saliency feature vector representation of an image in global and local space-time domains based on pixel level;
by establishing a spectrum saliency distribution model in a time-space domain scene, extracting a target area which is consistent and sensitive to multi-source information by utilizing motion independence, persistence and interruption of a target in a time domain spectrum dimension and color, texture and structural characteristics of the target in the space domain spectrum dimension, thereby improving accurate interpretation of the target area in the scene;
global and local scenes are described by building a target apparent texture model, a motion saliency model and a depth saliency model.
4. The method for detecting abnormal video behavior based on multi-spectral binocular stereoscopic vision according to claim 3, wherein the establishing a target apparent texture model further comprises:
establishing the association of the target's apparent states between global and local scenes in the combined spectrum-depth space-time domain, building a shape-context-aware texture model of the target, analyzing in the spatial dimension the differences between the target's apparent state and those of other targets or groups, and analyzing in the temporal dimension the difference between the target's current and past apparent states.
5. The method for detecting abnormal video behavior based on multispectral binocular stereoscopic vision according to claim 3, wherein the establishing a motion significance model further comprises:
and establishing a motion significance model based on an optical flow field in a combined spectrum-depth space-time domain, and achieving the consistency of the internal motion of the target according to the direction and behavior characteristics of the target in the scene and the anti-interference performance of the spectrum image in the three-dimensional space.
6. The method for detecting abnormal video behavior based on multispectral binocular stereoscopic vision according to claim 3, wherein the establishing a depth saliency model further comprises:
and establishing a target depth saliency model under a combined spectrum-depth time-space domain, and obtaining the description of the target structure and deformation by calculating the depth change difference between the neighborhood frames of the target in a depth dimension space.
7. The method for detecting abnormal video behavior based on multispectral binocular stereoscopic vision according to claim 1, wherein the forming all the salient features into a multiscale and multimodality feature fusion model under different scales, detecting the abnormal behavior of the target through the fusion model, further comprises:
establishing a fusion model based on shape context information, optical flow field motion information and conditional probability depth information in a spectrum-depth space-time domain, and calculating differences between target motion and apparent states in a multi-scale scene range and prior scene target states;
constructing multi-scale feature vector representation from a low layer to a high layer, establishing a joint optimization method based on pixel-level features and behavior structure-level features, and separating abnormal behaviors from dominant behaviors;
and establishing a dynamic background online-updating, behavior online-learning and online-detection mechanism, optimizing the model with a bag-of-visual-words framework, and replacing the traditional single unordered feature words with a feature layering algorithm to realize online perception of abnormal behavior events.
8. The method for detecting abnormal video behavior based on multispectral binocular stereoscopic vision according to claim 7, wherein constructing the low-to-high-layer multi-scale feature vector representation, establishing the joint optimization method based on pixel-level and behavior-structure-level features, and separating abnormal behaviors from dominant behaviors further comprises:
in the training stage, first extracting the local targets in each training video and computing each target's spectral-appearance feature set and single-scale feature set; clustering the two feature types separately to obtain a spectral-appearance bag of words and a single-scale bag of words; based on the two bags of words, counting the occurrences of each visual word in the training video to obtain its spectral-appearance histogram and single-scale histogram, concatenating the two histogram vectors, and assigning a behavior class label to form the training video's clustering histogram; and computing the clustering histograms of all training videos and feeding them to Bayesian classifier training to determine the action classifier model;
in the test stage, computing the targets and the two feature sets of the test video by the same procedure, projecting the feature sets into the bag-of-words space with a K-nearest-neighbor algorithm, counting the occurrences of each visual word to obtain the test video's word-frequency histogram, and feeding it to the trained Bayesian classifier for abnormal behavior recognition.
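This pipeline maps naturally onto standard tooling; the toy sketch below uses synthetic descriptors and scikit-learn to cluster each descriptor type into a vocabulary, build the concatenated word-frequency histograms, and train a naive Bayes classifier. `KMeans.predict` performs the nearest-word assignment, i.e. the claimed K-nearest-neighbor projection with K = 1; all sizes and the Gaussian likelihood are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def build_vocabulary(local_descriptors, k=16):
    """Cluster local descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(local_descriptors)

def word_histogram(vocab, local_descriptors):
    """Nearest-word assignment (K-NN with K = 1), then word frequencies."""
    words = vocab.predict(local_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float64)
    return hist / hist.sum()

# Toy stand-ins for the two descriptor types of ten training videos.
videos_appearance = [rng.normal(size=(50, 8)) for _ in range(10)]    # spectral appearance
videos_singlescale = [rng.normal(size=(50, 12)) for _ in range(10)]  # single scale
labels = rng.integers(0, 2, size=10)                                 # behavior class ids

vocab_app = build_vocabulary(np.vstack(videos_appearance))
vocab_scl = build_vocabulary(np.vstack(videos_singlescale))

# One concatenated histogram per video, as the claim describes.
X = np.stack([np.hstack([word_histogram(vocab_app, a),
                         word_histogram(vocab_scl, s)])
              for a, s in zip(videos_appearance, videos_singlescale)])
classifier = GaussianNB().fit(X, labels)   # the claim's Bayesian classifier

# A test video follows the same path: extract targets, compute both
# histograms, concatenate, then classify.
print(classifier.predict(X[:3]))
```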
9. A video abnormal behavior detection system based on multispectral binocular stereoscopic vision, comprising:
the target region extraction module is used for performing visual calibration of the binocular videos on the multispectral platform based on the spatial coordinate transformation relation and the image registration model, modeling the image background according to the calibration result, and segmenting the foreground targets;
the feature model building module is used for determining, for the segmented targets, the saliency features of each spectral dimension of the target in the global and local spatio-temporal domains by building a target apparent-texture model, a motion saliency model and a depth saliency model;
the abnormal behavior detection module is used for combining all the salient features into a multi-scale, multi-modal feature fusion model across different scales and detecting the target's abnormal behavior through the fusion model;
wherein performing binocular visual calibration on the multispectral platform further comprises:
receiving a video sequence and an initial parameter set corresponding to each camera of the binocular pair;
generating external parameters based on tracking features in the video sequence, and generating internal parameters using position and calibration parameters of an image acquisition system coupled to the binocular camera;
combining the external parameters with the internal parameters to determine an intrinsic-extrinsic parameter set for each camera at each time instance of the video sequence;
wherein the external parameters are generated by:
determining feature correspondences between a first downsampled frame sequence and a second downsampled frame sequence of the video sequence by tracking features within a plurality of overlapping blocks of the two downsampled sequences;
and generating the external parameters from the feature correspondences.
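A hedged sketch of this extrinsic-generation step with OpenCV primitives: features are tracked between downsampled views and the relative pose is recovered through an essential matrix. The 0.5 downsample factor, feature counts and RANSAC settings are assumptions, and the claim's block-wise overlapping tracking is collapsed to whole-frame tracking for brevity.

```python
import cv2
import numpy as np

def extrinsics_from_tracks(frame_a, frame_b, K, scale=0.5):
    """frame_a/frame_b: grayscale uint8 views; K: 3x3 intrinsic matrix.
    Track features between downsampled views, then recover relative pose."""
    small_a = cv2.resize(frame_a, None, fx=scale, fy=scale)
    small_b = cv2.resize(frame_b, None, fx=scale, fy=scale)
    pts_a = cv2.goodFeaturesToTrack(small_a, maxCorners=500,
                                    qualityLevel=0.01, minDistance=7)
    pts_b, status, _err = cv2.calcOpticalFlowPyrLK(small_a, small_b, pts_a, None)
    good_a = pts_a[status.ravel() == 1]
    good_b = pts_b[status.ravel() == 1]
    K_small = K.astype(np.float64).copy()
    K_small[:2] *= scale              # intrinsics at the reduced resolution
    E, inliers = cv2.findEssentialMat(good_a, good_b, K_small,
                                      method=cv2.RANSAC, prob=0.999,
                                      threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good_a, good_b, K_small, mask=inliers)
    return R, t                       # rotation and unit-scale translation
```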
10. A terminal device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1-8 when executing the computer program.
11. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-8.
CN202310940861.7A 2023-07-28 2023-07-28 Video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision Active CN116958876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310940861.7A CN116958876B (en) 2023-07-28 2023-07-28 Video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision

Publications (2)

Publication Number Publication Date
CN116958876A (en) 2023-10-27
CN116958876B (en) 2024-06-14

Family

ID=88447335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310940861.7A Active CN116958876B (en) 2023-07-28 2023-07-28 Video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision

Country Status (1)

Country Link
CN (1) CN116958876B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729848A (en) * 2013-12-28 2014-04-16 北京工业大学 Hyperspectral remote sensing image small target detection method based on spectrum saliency
WO2015161816A1 (en) * 2014-04-25 2015-10-29 Tencent Technology (Shenzhen) Company Limited Three-dimensional facial recognition method and system
WO2015196281A1 (en) * 2014-06-24 2015-12-30 Sportlogiq Inc. System and method for visual event description and event analysis
CA3032487A1 (en) * 2016-08-03 2018-02-08 Jiangsu University Saliency-based method for extracting road target from night vision infrared image
CN106709447A (en) * 2016-12-21 2017-05-24 华南理工大学 Abnormal behavior detection method in video based on target positioning and characteristic fusion
CN110111338A (en) * 2019-04-24 2019-08-09 广东技术师范大学 A kind of visual tracking method based on the segmentation of super-pixel time and space significance
CN111126195A (en) * 2019-12-10 2020-05-08 郑州轻工业大学 Abnormal behavior analysis method based on scene attribute driving and time-space domain significance
CN112651940A (en) * 2020-12-25 2021-04-13 郑州轻工业大学 Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
CN114913442A (en) * 2021-01-29 2022-08-16 中移(苏州)软件技术有限公司 Abnormal behavior detection method and device and computer storage medium
CN114627339A (en) * 2021-11-09 2022-06-14 昆明物理研究所 Intelligent recognition and tracking method for border crossing personnel in dense jungle area and storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Jun Yu: "Abnormal behavior recognition based on feature fusion C3D network", Electronic Imaging, 31 December 2022 (2022-12-31) *
Yu Donghang; Zhang Baoming; Guo Haitao; Zhao Chuan; Xu Junfeng: "Ship detection in remote sensing images combining saliency features and convolutional neural networks", Journal of Image and Graphics, no. 12, 16 December 2018 (2018-12-16) *
Li Yibai: "Fusion of infrared and visible images from different viewpoints based on saliency detection", Laser & Infrared, 30 April 2021 (2021-04-30) *
Li Qingwu; Zhou Yaqin; Ma Yunpeng; Xing Jun; Xu Jinxin: "Salient object detection method based on binocular vision", Acta Optica Sinica, no. 03, 13 November 2017 (2017-11-13) *
Luo Fanbo; Wang Ping; Liang Siyuan; Xu Guifei; Wang Wei: "Crowd abnormal behavior recognition based on deep learning and sparse optical flow", Computer Engineering, no. 04, 31 December 2018 (2018-12-31) *
Hu Chunhai; Wan Xin; Li Yongxiao; Liu Bin; Zhao Xing: "Visual-saliency-driven video segmentation algorithm for moving fish", Journal of Yanshan University, no. 01, 31 January 2017 (2017-01-31) *
Gao Zhiyong; Tang Wenfeng; He Liangjie: "Moving object detection under a moving camera based on motion saliency", Journal of Computer Applications, no. 06, 10 June 2016 (2016-06-10) *

Also Published As

Publication number Publication date
CN116958876B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
CN111797653B (en) Image labeling method and device based on high-dimensional image
US8620026B2 (en) Video-based detection of multiple object types under varying poses
CN109800794B (en) Cross-camera re-identification fusion method and system for appearance similar targets
US9008439B2 (en) Image processing method and system
An et al. Scene learning for cloud detection on remote-sensing images
CN110334708A (en) Difference automatic calibrating method, system, device in cross-module state target detection
CN111241989A (en) Image recognition method and device and electronic equipment
CN111192293A (en) Moving target pose tracking method and device
CN106156778A (en) The apparatus and method of the known object in the visual field identifying three-dimensional machine vision system
Shahab et al. How salient is scene text?
CN112215925A (en) Self-adaptive follow-up tracking multi-camera video splicing method for coal mining machine
CN107622280B (en) Modularized processing mode image saliency detection method based on scene classification
CN108073940B (en) Method for detecting 3D target example object in unstructured environment
CN110910497B (en) Method and system for realizing augmented reality map
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
CN117576029A (en) Binocular vision-based part defect detection and evaluation method and device
CN108491857A (en) A kind of multiple-camera target matching method of ken overlapping
Gao Performance evaluation of automatic object detection with post-processing schemes under enhanced measures in wide-area aerial imagery
CN117557784B (en) Target detection method, target detection device, electronic equipment and storage medium
CN117315210A (en) Image blurring method based on stereoscopic imaging and related device
CN116958876B (en) Video abnormal behavior detection method and system based on multispectral binocular stereoscopic vision
Dilawari et al. Toward generating human-centered video annotations
CN116912670A (en) Deep sea fish identification method based on improved YOLO model
Liu Research on intelligent visual image feature region acquisition algorithm in Internet of Things framework

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant