CN112653886B - Monitoring video stream forgery detection method and positioning method based on wireless signals - Google Patents

Publication number
CN112653886B
Authority
CN
China
Prior art keywords
human body
paf
jhm
network
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011471045.9A
Other languages
Chinese (zh)
Other versions
CN112653886A (en)
Inventor
王巍
黄勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00: Diagnosis, testing or measuring for television systems or their details
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

The invention discloses a wireless-signal-based method for detecting forgery in a surveillance video stream and a corresponding method for locating the forgery, and belongs to the field of video stream forgery detection. The detection method comprises: collecting wireless signals of the video monitoring area in real time at equal time intervals, so that each surveillance video frame corresponds to the same number of wireless signals; extracting joint heat map (JHM) and part affinity field (PAF) tensors from the surveillance video frame and from the wireless signals, respectively; and, if the JHM of the surveillance video frame differs greatly from the JHM of the wireless signals, judging that the current surveillance video frame is forged. When the surveillance camera and the wireless signal receiver are in the same space, the visual signal and the wireless signal sense human semantic information simultaneously, so human semantic information can be extracted from the wireless signal. Because the JHM tensors extracted from the video frame and from the wireless signals are similar when no forgery attack is present and dissimilar otherwise, comparing them frame by frame yields a forgery decision that is both real-time and fine-grained.

Description

Monitoring video stream forgery detection method and positioning method based on wireless signals
Technical Field
The invention belongs to the field of monitoring video stream forgery detection, and particularly relates to a monitoring video stream forgery detection method and a positioning method based on wireless signals.
Background
With the growing demand for safety in daily life, video surveillance systems are used ever more widely indoors and outdoors, for example for crime prevention in banks and customer monitoring in retail stores. Because of their rapidly growing popularity and real-world importance, video surveillance systems have inevitably become attractive targets for network attacks. Recent research has shown that a malicious attacker can infiltrate a surveillance system by exploiting a security vulnerability of the surveillance camera or by hijacking its Ethernet cable, then tamper with the real video stream content and thereby conceal illegal activities in the monitored area without leaving any perceptible clues. Under such threats, it is important for a surveillance system to raise an alert on an ongoing network attack quickly and to track potential intruders in real time.
To address camera replay attacks, Nitya Lakshmanan et al. proposed a video forgery detection method in 2019 in "SurFi: Detecting Surveillance Camera Looping Attacks with Wi-Fi Channel State Information". The main idea is to extract time-frequency features (the start and stop times and the dominant frequency components of an event) from the visual signal and from the WiFi CSI signal in the same space, and to compare the similarity of the two sets of features to judge whether the current video stream matches the current wireless signal. A mismatch is taken to indicate an attack. However, this approach relies on event-level features in the visual and wireless signals, which causes the following problems: first, detection is slow, because an original signal sequence of 10-30 s is required as one detection unit; second, accuracy is low, since the system accuracy is below 40% if only the data corresponding to one event is used; third, it assumes that only one person is in the region of interest and therefore cannot handle several people within the coverage of the camera; fourth, even if an attack is detected, the system cannot determine where the intruder is in the monitored area or what the intruder is doing. It therefore cannot meet the real-time, high-precision and fine-grained requirements of intrusion detection for surveillance video streams.
Cao Zhe et al. proposed a human pose estimation method in 2016. Its main idea is a fast association algorithm for human body joint points based on the PAF tensor, a form of human semantic information, from which the 2D pose of a human body is estimated from a single picture. This scheme could be applied directly: recover the human poses separately from the visual JHM and PAF tensors and from the wireless JHM and PAF tensors, then compare the two sets of poses one by one to locate the intruder. However, this direct approach runs the joint association algorithm twice for each picture, so its computational complexity is high.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides a surveillance video stream forgery detection method and positioning method based on wireless signals, aiming to solve the problem that existing video surveillance systems cannot simultaneously satisfy the requirements of real-time performance and fine-grained detection capability when detecting video stream forgery attacks.
To achieve the above object, according to a first aspect of the present invention, there is provided a surveillance video stream forgery detection method based on a wireless signal, the method including:
S1, collecting wireless signals of a video monitoring area at equal time intervals in real time, so that each surveillance video frame corresponds to the same number of wireless signals;
S2, extracting a human body joint heat map (JHM) and a part affinity field (PAF) tensor from the surveillance video frame and from the wireless signals, respectively;
S3, if the difference between the human body joint heat map of the surveillance video frame and the human body joint heat map of the wireless signals is large, judging that the current surveillance video frame is forged.
Preferably, in step S2, a pre-trained OpenPose network is used to extract the human body joint heat map and the PAF tensor from the surveillance video frame, and a pre-trained RF2Pose network is used to extract the human body joint heat map and the PAF tensor from the wireless signals;
the RF2Pose network includes:
a wireless signal transformation module, used for converting all wireless signals corresponding to each surveillance video frame into intermediate features with the same length and width dimensions as the surveillance video frame, and outputting the intermediate features to the human body joint heat map generation module and the PAF tensor generation module;
a human body joint heat map generation module, used for generating the human body joint heat map from the intermediate features;
a PAF tensor generation module, used for generating the PAF tensor from the intermediate features;
the training samples of the RF2Pose network are wireless signals, and the labels are the corresponding human body joint heat maps and PAF tensors output by the OpenPose network.
Beneficial effects: the invention extracts human semantic information from the wireless signals corresponding to each video frame through a deep neural network. Because wireless signals are complex and variable, the relationship between the wireless signals and the human body is difficult to model with traditional methods, whereas deep learning has a very strong feature-mapping capability. A deep neural network can therefore effectively recover human semantic information from the wireless signals corresponding to each video frame, which enables frame-by-frame forgery detection. In addition, the RF2Pose network is trained with a cross-modal training method: because manually labelling wireless signals takes a great deal of time and labour, the outputs of the trained OpenPose network are used as the labels of the corresponding wireless signals, which saves substantial labour cost.
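By way of illustration only, the cross-modal labelling idea can be sketched in a few lines of Python. Everything in the sketch is a stand-in chosen for brevity and is not taken from the patent: `teacher_jhm` plays the role of the pre-trained visual network, the "student" is a single linear layer rather than RF2Pose, and the toy data assume a linear relation between the wireless features and the visual labels.

```python
import random

def teacher_jhm(frame):
    """Stand-in for the pre-trained visual network: the 'heat map' is the frame itself."""
    return list(frame)

def student_forward(weights, rf):
    """Hypothetical student: one linear layer mapping wireless features to a heat map."""
    return [sum(w * x for w, x in zip(row, rf)) for row in weights]

def train_student(pairs, n_out, n_in, lr=0.05, epochs=200, seed=0):
    """Fit the student to the teacher's outputs with plain stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    for _ in range(epochs):
        for frame, rf in pairs:
            label = teacher_jhm(frame)            # label produced by the teacher, not by hand
            pred = student_forward(weights, rf)
            for i in range(n_out):
                err = pred[i] - label[i]
                for k in range(n_in):
                    weights[i][k] -= lr * 2 * err * rf[k]   # gradient of the squared error
    return weights

# Toy synchronized (video frame, wireless) pairs; rf = [f0 + f1, f0 - f1] is an assumption.
pairs = [([1.0, 0.0], [1.0, 1.0]), ([0.0, 1.0], [1.0, -1.0]),
         ([1.0, 1.0], [2.0, 0.0]), ([0.5, 0.2], [0.7, 0.3])]
weights = train_student(pairs, n_out=2, n_in=2)
print(student_forward(weights, [1.5, 0.5]))
```

The point of the sketch is only that no manual annotation of the wireless data appears anywhere: every label is the teacher's output for the synchronized video frame.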
Preferably, the wireless signal transformation module comprises a deconvolution network, a convolution network and a residual network connected in series in sequence; the human body joint heat map generation module comprises a decoding network and a full convolution network connected in series in sequence, wherein the decoding network is a repeated structure of one deconvolution layer followed by two convolution layers; the PAF tensor generation module comprises a decoding network and a full convolution network connected in series in sequence, wherein the decoding network is a repeated structure of one deconvolution layer followed by two convolution layers.
Beneficial effects: the invention designs a novel deep neural network structure comprising a wireless signal transformation module, a human body joint heat map generation module and a PAF tensor generation module, so that a human body joint heat map and a PAF tensor can be extracted from complex and variable wireless signals. Because the spatial resolution of the wireless signal is low, the wireless signal transformation module first up-samples the input wireless signal with a deconvolution network to give it a higher spatial resolution, and then passes it through a convolution network and a residual network to obtain intermediate features that relate the human body information to the spatial information of the video frame. Because the human semantic information in the intermediate features is coarse and its mapping to the human body joint heat map needs to be made more explicit, the human body joint heat map generation module first refines the intermediate features step by step with a decoding network built from a repeated structure of one deconvolution layer and two convolution layers, and then converts the refined intermediate feature map into the human body joint heat map with a full convolution network. Similarly, because the human semantic information in the intermediate features is coarse and its mapping to the PAF tensor needs to be made more explicit, the PAF tensor generation module first refines the intermediate features step by step with a decoding network of the same repeated structure, and then converts the refined intermediate feature map into the PAF tensor with a full convolution network.
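To make the up-sampling role of the deconvolution layers concrete, the following sketch computes how the spatial size evolves through a decoder built from one deconvolution layer followed by two convolution layers. The size formulas are the standard ones for dilation 1; the kernel sizes, strides, paddings and the input size of 30 are assumptions chosen for illustration and are not specified in the patent.

```python
def deconv_out(size, kernel, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def conv_out(size, kernel, stride=1, padding=0):
    """Output length of an ordinary convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def decoder_sizes(size, stages):
    """Sizes after each stage of one deconvolution layer and two convolution layers."""
    sizes = []
    for _ in range(stages):
        size = deconv_out(size, kernel=4, stride=2, padding=1)  # doubles the resolution
        size = conv_out(size, kernel=3, padding=1)              # keeps the resolution
        size = conv_out(size, kernel=3, padding=1)              # keeps the resolution
        sizes.append(size)
    return sizes

print(decoder_sizes(30, 3))  # [60, 120, 240]
```

With these assumed hyperparameters each stage doubles the resolution, which is one way a low-resolution wireless feature map could be brought up to the size of a video frame.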
Preferably, the loss function of the RF2Pose network in the training phase is as follows:
[Formula images BDA0002833805750000041 to BDA0002833805750000045]
where $\mathcal{L}_{\mathrm{JHM}}$ and $\mathcal{L}_{\mathrm{PAF}}$ respectively denote the loss functions for the JHM and PAF tensors; $\mathbf{J}^{V}_{y}$ and $\mathbf{P}^{V}_{y}$ respectively denote the JHM and PAF tensors obtained by inputting the $y$-th training sample video frame into the OpenPose network; $\mathbf{J}^{R}_{y}$ and $\mathbf{P}^{R}_{y}$ respectively denote the JHM and PAF tensors obtained by inputting the $y$-th training sample wireless signal into the RF2Pose network; $Y$ denotes the number of training samples, $J$ the number of connected joint points into which the human body is abstracted, $C$ the number of human body parts, and $h$ and $w$ the pixel coordinates; $k^{\mathrm{JHM}}_{h,w}$ and $k^{\mathrm{PAF}}_{h,w}$ respectively denote the pixel-wise weight factors of the JHM and PAF tensors; $\lambda_1$, $\lambda_2$, $\beta_1$ and $\beta_2$ are coefficients that balance the influence of $\mathcal{L}_{\mathrm{JHM}}$ and $\mathcal{L}_{\mathrm{PAF}}$ in the overall objective function; $J^{V}_{j,h,w}$ denotes the confidence of the $j$-th joint point at pixel $(h, w)$ of the visual JHM; $J^{R}_{j,h,w}$ denotes the confidence of the $j$-th joint point at pixel $(h, w)$ of the wireless JHM; $P^{V}_{c,h,w}$ denotes the position and direction information of the $c$-th human body part at pixel $(h, w)$ of the visual PAF; and $P^{R}_{c,h,w}$ denotes the position and direction information of the $c$-th human body part at pixel $(h, w)$ of the wireless PAF.
Beneficial effects: the invention proposes a new loss function. Because the background often occupies most of the pixels in a video frame, most elements of the JHM (PAF) tensor are equal to zero, while the conventional L2 loss function tends to reduce the overall error across all elements, which leaves a large gap between the visual JHM (PAF) tensor and the wireless JHM (PAF) tensor. The invention therefore sets the pixel-wise weight factor to be linearly related to the absolute value of the $(h, w)$-th element, which increases the weight of the non-zero elements so that the RF2Pose network pays more attention to the regions where a human body is present and less to the background, greatly reducing the difference between the visual JHM (PAF) and wireless JHM (PAF) tensors. In addition, considering that the JHM and PAF tensors indicate different human semantic information and have different numerical scales, the invention sets the coefficients $\lambda_1$, $\lambda_2$, $\beta_1$ and $\beta_2$ to balance the influence of the JHM loss and the PAF loss in the overall objective function. Through this training process, the deep neural network can extract more effective PAF information from the wireless signals, and the JHM tensor recovered from the wireless signals is more accurate.
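The effect of a pixel-wise weight that grows linearly with the absolute value of the label can be illustrated with a small sketch. The exact formulas of the patent are contained in images that are not reproduced in this text, so the form `slope * |label| + offset`, the parameter names and the numeric values below are assumptions made for illustration only.

```python
def pixel_weights(label, slope, offset):
    """Weight of each pixel, linear in the absolute value of the label (assumed form)."""
    return [[slope * abs(v) + offset for v in row] for row in label]

def weighted_l2(label, pred, slope, offset):
    """Weighted squared error between a label map and a predicted map of the same shape."""
    k = pixel_weights(label, slope, offset)
    return sum(k[h][w] * (label[h][w] - pred[h][w]) ** 2
               for h in range(len(label)) for w in range(len(label[0])))

label = [[0.0, 0.0], [0.0, 1.0]]             # one "human" pixel, three background pixels
background_error = [[0.5, 0.0], [0.0, 1.0]]  # prediction is wrong on a background pixel
human_error = [[0.0, 0.0], [0.0, 0.5]]       # prediction is wrong on the human pixel

print(weighted_l2(label, background_error, slope=9.0, offset=1.0))  # 0.25
print(weighted_l2(label, human_error, slope=9.0, offset=1.0))       # 2.5
```

Both predictions are off by the same amount, 0.5, yet the error on the human pixel costs ten times as much, which is the behaviour the weighting is meant to produce. With `slope=0` the function reduces to an ordinary L2 loss.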
Preferably, in step S3, the difference between the human body joint heat map of the surveillance video frame and that of the wireless signals is judged by setting a threshold on, or constructing a binary classifier over, the similarity or difference of the two human body joint heat maps.
Beneficial effects: the invention uses the similarity or difference of the human body joint heat maps as the basis for judgment. If no attack is present, the similarity is high and the difference is small; otherwise, the similarity is low and the difference is large. Using the similarity or difference therefore reduces the redundant information in the two human body joint heat maps and allows simple and fast forgery detection.
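A threshold-based decision of this kind might look as follows. The sketch assumes that the difference is measured as the mean squared difference over all elements and that the threshold is 0.05; the patent names neither a specific metric nor a threshold value, so both are hypothetical.

```python
def jhm_difference(visual, wireless):
    """Mean squared difference between two JHM tensors shaped [joint][row][column]."""
    total, count = 0.0, 0
    for v_map, r_map in zip(visual, wireless):
        for v_row, r_row in zip(v_map, r_map):
            for v, r in zip(v_row, r_row):
                total += (v - r) ** 2
                count += 1
    return total / count

def is_forged(visual, wireless, threshold):
    """Flag the frame when the two heat maps disagree by more than the threshold."""
    return jhm_difference(visual, wireless) > threshold

wireless = [[[0.0, 0.9], [0.0, 0.0]]]   # the wireless signal senses a person
genuine = [[[0.0, 1.0], [0.0, 0.0]]]    # the live frame shows the same person
replayed = [[[0.0, 0.0], [0.0, 0.0]]]   # the replayed frame shows an empty room

print(is_forged(genuine, wireless, threshold=0.05))   # False
print(is_forged(replayed, wireless, threshold=0.05))  # True
```

In practice the threshold would have to be calibrated on attack-free data, or replaced by a trained binary classifier as the text allows.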
To achieve the above object, according to a second aspect of the present invention, there is provided a surveillance video stream forgery positioning method based on a wireless signal, the method comprising:
(T1) performing forgery detection on the surveillance video stream using the detection method according to the first aspect;
(T2) for each surveillance video frame detected as forged, calculating the absolute value of the difference between the human body joint heat map of the current surveillance video frame and the human body joint heat map of the wireless signals, and calculating the sum of the two PAF tensors;
(T3) selecting the abnormal human body joint point set corresponding to the current surveillance video frame based on the absolute value of the difference of the human body joint heat maps;
(T4) performing a joint point association operation on the PAF sum tensor and the abnormal human body joint point set of the current surveillance video frame to obtain the association state between the abnormal human body joint points, thereby determining the position of the forged human body object in the current surveillance video frame.
Preferably, in step (T3), the abnormal human body joint point set corresponding to the current surveillance video frame is selected by non-maximum suppression.
Beneficial effects: the abnormal human body joint points corresponding to the current surveillance video frame are selected by non-maximum suppression. The elements of the JHM difference tensor can be positive or negative, and non-maximum suppression is meaningful only for positive values, which is why the absolute value of the difference is taken first.
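As an illustration of steps (T2) and (T3) for a single joint type, the sketch below takes the absolute difference of two heat maps and keeps the local maxima. The 3x3 neighbourhood, the strict comparison and the `min_value` cut-off are assumptions; the patent only states that non-maximum suppression is applied.

```python
def abnormal_joint_peaks(visual, wireless, min_value):
    """Return (row, column) local maxima of |visual - wireless| for one joint's heat map."""
    rows, cols = len(visual), len(visual[0])
    diff = [[abs(visual[h][w] - wireless[h][w]) for w in range(cols)]
            for h in range(rows)]
    peaks = []
    for h in range(rows):
        for w in range(cols):
            value = diff[h][w]
            if value < min_value:
                continue  # too small to be a joint candidate
            neighbours = [diff[y][x]
                          for y in range(max(0, h - 1), min(rows, h + 2))
                          for x in range(max(0, w - 1), min(cols, w + 2))
                          if (y, x) != (h, w)]
            if all(value > n for n in neighbours):  # strict local maximum
                peaks.append((h, w))
    return peaks

visual = [[0.0] * 4 for _ in range(4)]
wireless = [[0.0] * 4 for _ in range(4)]
visual[1][1], visual[1][2] = 0.9, 0.4   # joint visible only in the video
wireless[3][3] = 0.8                    # joint sensed only by the wireless signal

print(abnormal_joint_peaks(visual, wireless, min_value=0.3))  # [(1, 1), (3, 3)]
```

The peak at (3, 3) comes from a negative difference (video minus wireless), so it would be missed if the suppression were applied to the signed difference.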
To achieve the above object, according to a third aspect of the present invention, there is provided a computer readable storage medium storing one or more first programs, the one or more first programs being executed by one or more processors to implement the steps of the wireless signal based surveillance video stream forgery detection method according to the first aspect; or, the computer readable storage medium stores one or more second programs, which are executed by one or more processors to implement the steps of the method for monitoring video stream forgery location based on wireless signals according to the second aspect.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) The JHM tensor extracted from the surveillance video frame is compared with the JHM tensor extracted from the wireless signals in the same space to judge whether the video frame is forged. Because the dielectric constant of the human body is large, wireless electromagnetic signals are strongly reflected when they encounter a human body. Therefore, when the surveillance camera and the wireless signal receiver are in the same space, the visual signal and the wireless signal sense human semantic information simultaneously, and human semantic information can be extracted from the widely available wireless signals. Because the JHM tensors extracted from the video frame and from the wireless signals of the same moment are similar when no forgery attack is present and dissimilar when one is present, the JHM tensors of the surveillance video frame and of the corresponding wireless signals are compared frame by frame to make the forgery decision, which satisfies the requirements of real-time performance and fine-grained detection capability at the same time.
(2) In the positioning stage, non-maximum suppression is applied to the absolute value of the difference tensor between the visual and wireless JHMs to select a set of candidate abnormal human body joint points. The visual PAF tensor is then summed with the wireless PAF tensor to obtain a PAF sum tensor. Finally, a human body joint point association algorithm locates the position and pose of the abnormal human target in the picture. Selecting the abnormal human body joint point set directly from the absolute value of the JHM difference tensor avoids selecting separate joint point sets from the visual and wireless JHM tensors; summing the visual and wireless PAF tensors has little effect on the subsequent joint point association and reduces the amount of computation; and the joint point association algorithm needs to be run only once, which reduces the computational overhead.
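The summation of the two PAF tensors and a single association pass can be sketched as follows. The scoring function is a simplified version of the line-integral score commonly used in PAF-based association (it samples the field along the segment between two candidate joints and averages the alignment with the segment direction). The field layout, the number of samples and the function names are assumptions made for this illustration.

```python
import math

def paf_sum(visual, wireless):
    """Element-wise sum of two PAF fields shaped [row][column] of (vx, vy) vectors."""
    return [[(a[0] + b[0], a[1] + b[1]) for a, b in zip(v_row, r_row)]
            for v_row, r_row in zip(visual, wireless)]

def association_score(paf, p, q, samples=5):
    """Average alignment of the field with the segment from p to q, given as (row, column)."""
    dy, dx = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length        # unit direction of the candidate limb
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        h = round(p[0] + t * dy)             # nearest pixel on the segment
        w = round(p[1] + t * dx)
        vx, vy = paf[h][w]
        total += vx * ux + vy * uy
    return total / samples

zero = (0.0, 0.0)
visual = [[zero] * 5 for _ in range(3)]      # the forged frame shows no limb
wireless = [[zero] * 5 for _ in range(3)]
wireless[1] = [(1.0, 0.0)] * 5               # the wireless PAF has a horizontal limb in row 1

field = paf_sum(visual, wireless)
print(association_score(field, (1, 0), (1, 4)))  # 1.0
print(association_score(field, (0, 0), (2, 0)))  # 0.0
```

Because the score is computed once on the summed field, the association step runs a single time per frame instead of once per modality.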
Drawings
Fig. 1 is a schematic view of a video frame forgery attack scene in a video monitoring system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for detecting and locating a surveillance video stream forgery attack based on a wireless signal according to an embodiment of the present invention;
Fig. 3(a) is a schematic diagram of CSI resampling provided by an embodiment of the present invention;
Fig. 3(b) is a schematic diagram of an outlier distribution provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of human semantic feature extraction based on video frames and wireless signals according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of video frame forgery detection and localization based on human semantic features according to an embodiment of the present invention;
Fig. 6 shows a forgery positioning result under an inter-frame forgery attack, based on human semantic features, according to an embodiment of the present invention;
Fig. 7 shows a forgery positioning result under an intra-frame forgery attack, based on human semantic features, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Because of their rapidly growing popularity and real-world importance, video surveillance systems have inevitably become attractive targets for network attacks. As shown in Fig. 1, a malicious attacker can infiltrate the surveillance system by hijacking its cable, then tamper with the real video stream content and thereby conceal illegal activities in the monitored area. Under such threats, it is important for a video surveillance system to detect an ongoing forgery attack quickly and to track potential intruders in real time. However, none of the existing methods can simultaneously meet the real-time and fine-grained requirements of surveillance video stream forgery detection.
Therefore, the present invention provides a method for detecting and locating a surveillance video stream forgery attack based on wireless signals, as shown in Fig. 2. The present embodiment takes a WiFi signal as an example to verify the feasibility of the invention. Today, many areas covered by surveillance cameras, such as shops and homes, are also covered by WiFi hotspots. Because the human body has a high reflection coefficient for WiFi signals, the synchronized surveillance video signal and the received WiFi signal in these areas carry consistent human semantic information. Once an attacker forges the surveillance video stream, this cross-modal correspondence is broken. The correspondence can therefore be used to detect and locate video frame forgery in real time. As shown in Fig. 2, the system includes three functional modules: multi-modal signal processing, human semantic information extraction, and forgery detection and localization.
In the multi-modal signal processing module, the real-time video stream from the surveillance camera is first decompressed and denoted as
$$\mathbb{I} = (I_1, \dots, I_m, \dots, I_M)$$
where $M$ is the number of video frames contained in a GOP (group of pictures) of the video stream and $I_m$ is a decompressed picture. Meanwhile, the system receives a CSI amplitude data stream from a WiFi receiver in the same area, denoted as
$$\mathbb{A} = (A_1, \dots, A_n, \dots, A_N)$$
where $A_n$ is a CSI amplitude matrix comprising all amplitude features of the $n$-th CSI sampling point.
In the wireless signal processing stage, the received wireless signal measurement value is resampled at equal time intervals based on the timestamp of the video stream, and after resampling, the information correspondence between the vision and the wireless signal is more accurate.
As shown in fig. 3(a), due to a random access mechanism and the existence of a packet loss phenomenon, a time interval between two adjacent original CSI sampling points is not fixed. Therefore, the number of CSI sampling points in unit time is not fixed, the periodicity of CSI data is reduced, and accurate correspondence between visual information and wireless information is not facilitated.
To solve this problem, and considering that the timestamps of the video stream are more accurate, the system resamples the CSI measurements in the CSI data stream based on the timestamps of the video stream and of the CSI data stream. Specifically, let $t_{m-1}$ and $t_m$ be the timestamps of two consecutive video frames $I_{m-1}$ and $I_m$. Because the sampling rate $F_w$ of the WiFi signal is usually greater than the frame rate $F_I$ of the video stream, the system resamples $F \le F_w / F_I$ CSI sampling points for video frame $I_m$, with a fixed time interval of $(t_m - t_{m-1})/F$ between adjacent sampling points. Linear interpolation, which has low computational complexity, can be used for this purpose.
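The resampling step can be sketched with plain linear interpolation. The sketch treats each CSI sample as a single scalar amplitude (in practice each sample is an amplitude matrix, which would be interpolated element by element) and places the resampled points at the ends of the equal sub-intervals; both choices, like the function name, are assumptions.

```python
from bisect import bisect_right

def resample_csi(times, values, t_prev, t_cur, count):
    """Linearly interpolate `count` equally spaced samples in the interval (t_prev, t_cur]."""
    step = (t_cur - t_prev) / count          # fixed spacing between resampled points
    out = []
    for i in range(1, count + 1):
        t = t_prev + i * step
        k = bisect_right(times, t)           # index of the first original sample after t
        if k == 0:
            out.append(values[0])            # before the first sample: hold the first value
        elif k == len(times):
            out.append(values[-1])           # at or after the last sample: hold the last value
        else:
            t0, t1 = times[k - 1], times[k]
            a = (t - t0) / (t1 - t0)
            out.append(values[k - 1] + a * (values[k] - values[k - 1]))
    return out

# Irregular arrival times, as caused by random access and packet loss.
times = [0.0, 0.3, 1.0]
values = [0.0, 3.0, 10.0]
print(resample_csi(times, values, t_prev=0.0, t_cur=1.0, count=4))
```

After this step every video frame is paired with the same number of wireless samples, regardless of how irregularly the original packets arrived.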
In addition, because of interference from environmental noise, the system removes outliers from the resampled points of the CSI data stream. An outlier differs greatly from the surrounding values and therefore degrades the validity of the resampled values. Resampling followed by outlier removal allows the deep neural network to extract the JHM and PAF tensors from the wireless signals more reliably.
To this end, the present embodiment uses Hampel filtering to remove outliers from the resampled CSI values. After Hampel filtering, outliers in the CSI are effectively detected and rejected, as shown in Fig. 3(b). Alternatively, outliers can be replaced with the neighborhood mean. Through the above operations, a set $R_m$ containing $F$ CSI amplitude features is obtained for video frame $I_m$ and is called an RF frame.
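A minimal Hampel filter over a one-dimensional amplitude sequence could look like this. The window half-width of 3 and the threshold of 3 scaled median absolute deviations are common defaults rather than values given in the patent, and outliers are replaced here by the window median.

```python
from statistics import median

def hampel(values, half_window=3, n_sigmas=3.0):
    """Replace points that deviate strongly from the local median with that median."""
    scale = 1.4826  # makes the median absolute deviation comparable to a standard deviation
    cleaned = list(values)
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = scale * median(abs(v - med) for v in window)
        if abs(values[i] - med) > n_sigmas * mad:
            cleaned[i] = med                 # outlier: substitute the local median
    return cleaned

amplitudes = [1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1, 0.9, 1.0]
print(hampel(amplitudes))  # the spike of 9.0 is replaced by 1.0
```

Because the filter relies on medians rather than means, a single large spike does not inflate the threshold that is used to detect it.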
In the human body semantic information extraction module, the system respectively extracts the semantic information from the video frame ImAnd RF frame RmCorresponding human body semantic information is extracted. The system represents semantic information of a human body as JHM features (Joint Heat Maps) and PAF features (Part Affinity Fields). In particular, the human body is abstracted into J connectedAnd (4) a joint point. JHM represents different articulation points in ImConfidence at different locations. For a certain video frame
Figure BDA0002833805750000101
JHM is a three-dimensional tensor
Figure BDA0002833805750000102
And is marked as
Figure BDA0002833805750000103
Wherein the content of the first and second substances,
Figure BDA0002833805750000104
is the confidence of the jth joint point in image space. In addition, PAF indicates spatial information of C human body parts (one part is determined by two connected joint points). In particular, the PAF can be expressed as a 4-dimensional tensor
Figure BDA0002833805750000105
Is recorded as
Figure BDA0002833805750000106
Wherein the content of the first and second substances,
Figure BDA0002833805750000107
representing the position and orientation information of the c-th body part in image space. As shown in FIG. 4, the system utilizes an OpenPose neural network and an RF2Pose neural network to respectively derive the frequency frame ImAnd RF frame RmExtracting the corresponding JHM and PAF tensors. In particular, for video frame ImThe openpos neural network firstly utilizes an fpn (feature Pyramid network) network to extract multi-scale visual features from the video frames. Then, the openpos neural network extracts JHM and PAF features by using a two-stage CNN network (convolutional neural network). The above two steps can be described as
Figure BDA0002833805750000108
Wherein the content of the first and second substances,
Figure BDA0002833805750000109
parameters representing an openpos neural network. For RF frame RmThe RF2pos neural network includes a CSI converter, a JHM generator, and a PAF generator. R to be input by CSI convertermAnd converting the intermediate features into intermediate features containing human body semantic information, extracting a JHM tensor from the intermediate features by a JHM generator, and extracting a PAF tensor from the intermediate features by a PAF generator. Thus, given input RmThe output of the RF2Pose neural network can be recorded as
Figure BDA0002833805750000111
where
Figure BDA0002833805750000112
denotes the parameters of the RF2Pose neural network. For
Figure BDA0002833805750000113
the OpenPose network can be trained on existing public datasets. For
Figure BDA0002833805750000114
no corresponding public dataset exists, so a cross-modal learning method is needed. Specifically, in the training phase, given a training set {(I_y, R_y)}, y = 1, ..., Y, the video frames {I_y} are first fed into the OpenPose neural network to obtain the corresponding set of visual semantic features
Figure BDA0002833805750000115
Subsequently, {R_y}, y = 1, ..., Y, are used as the input of the RF2Pose neural network
Figure BDA0002833805750000116
and
Figure BDA0002833805750000117
are used as the corresponding training labels. Therefore, the training objective of
Figure BDA0002833805750000118
is to minimize the difference between its output and
Figure BDA0002833805750000119
which gives the objective
Figure BDA00028338057500001120
where
Figure BDA00028338057500001110
and
Figure BDA00028338057500001111
are the loss functions corresponding to the JHM and the PAF, respectively, which can be further expressed as
Figure BDA00028338057500001112
Figure BDA00028338057500001113
In the above two formulas,
Figure BDA00028338057500001114
and
Figure BDA00028338057500001115
are the pixel-wise weight factors of the JHM and PAF tensors, respectively. Since most element values in the JHM and PAF tensors are close to zero,
Figure BDA00028338057500001116
and
Figure BDA00028338057500001117
are set to be linearly related to the absolute value of the (h, w)-th element, so that non-zero elements receive greater weight. In addition, because the JHM and PAF tensors carry different human-body semantic information and have different numerical scales, this embodiment sets different coefficients λ1, λ2, β1 and β2 to balance the influence of
Figure BDA00028338057500001118
and
Figure BDA00028338057500001119
in the overall objective function. This gives
Figure BDA0002833805750000121
Figure BDA0002833805750000122
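The weighting scheme described above can be illustrated with a minimal, dependency-free sketch. It is not the formula of this embodiment, whose exact expressions are not reproduced here: the linear weight form `scale * |label| + offset`, the squared-error term, and the names `weighted_l2_loss`, `rf2pose_loss`, `scale` and `offset` are all assumptions chosen for illustration. A four-dimensional PAF tensor is assumed to be flattened into channels before being passed in.

```python
def weighted_l2_loss(label, pred, scale, offset):
    """Pixel-weighted squared error between a label map and a predicted map.

    label, pred: nested lists indexed [channel][h][w].
    Each element is weighted by scale * |label| + offset (assumed linear form),
    so the few non-zero label elements contribute more than the near-zero ones.
    """
    total = 0.0
    for label_ch, pred_ch in zip(label, pred):
        for label_row, pred_row in zip(label_ch, pred_ch):
            for target, output in zip(label_row, pred_row):
                weight = scale * abs(target) + offset
                total += weight * (target - output) ** 2
    return total


def rf2pose_loss(jhm_label, jhm_pred, paf_label, paf_pred,
                 jhm_coeffs=(1.0, 1.0), paf_coeffs=(1.0, 1.0)):
    """Sum of the JHM and PAF terms, each with its own (scale, offset) pair.

    The labels are the visual-branch outputs and the predictions are the
    wireless-branch outputs, as in the cross-modal training described above.
    """
    jhm_term = weighted_l2_loss(jhm_label, jhm_pred, *jhm_coeffs)
    paf_term = weighted_l2_loss(paf_label, paf_pred, *paf_coeffs)
    return jhm_term + paf_term
```

With `scale=2` and `offset=1`, an error of 0.5 at a label value of 1.0 costs three times as much as the same error at a label value of 0.0, which is the intended emphasis on non-zero elements.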
In the forgery detection and localization module, the system uses the obtained JHM tensors
Figure BDA0002833805750000123
and
Figure BDA0002833805750000124
and PAF tensors
Figure BDA0002833805750000125
and
Figure BDA0002833805750000126
to perform forgery detection and localization on the video frame I_m. In the forgery detection stage, the two JHM tensors
Figure BDA00028338057500001218
and
Figure BDA0002833805750000127
are subtracted to obtain the JHM difference tensor D_m:
Figure BDA0002833805750000128
This difference operation retains the forgery information while removing redundant, irrelevant information. The reason is that if the video frame I_m is not forged,
Figure BDA0002833805750000129
and
Figure BDA00028338057500001210
should be very similar, so their difference should be small; conversely, if the frame is forged, they should differ markedly. The JHM difference tensor is then fed into a binary classification network to estimate the probability that I_m has been attacked. Specifically, the binary classification network is denoted
Figure BDA00028338057500001211
Its output, a probability vector
Figure BDA00028338057500001212
Figure BDA00028338057500001213
is obtained by
Figure BDA00028338057500001214
The system then makes a decision based on the values of
Figure BDA00028338057500001215
and
Figure BDA00028338057500001216
as follows:
Figure BDA00028338057500001217
In this embodiment, a frame is judged to be forged when its forgery probability exceeds the threshold of 0.5. Once a forgery attack is detected, the video surveillance system can respond promptly, for example by raising an early warning, and can further locate the abnormal human target in the forged region so as to track the intruder's trajectory and behavior accurately.
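The detection flow described above (difference of the two JHM tensors, a probability estimate, then a 0.5 threshold) can be sketched as follows. The embodiment uses a trained binary classification network for the probability estimate; the function `toy_forgery_probability` below is only a hypothetical stand-in based on the mean absolute difference, and the names `jhm_difference`, `is_forged` and `sharpness` are likewise assumptions made for illustration.

```python
import math


def jhm_difference(jhm_video, jhm_rf):
    """Element-wise difference of two JHM tensors indexed [joint][h][w]."""
    return [
        [[v - r for v, r in zip(v_row, r_row)]
         for v_row, r_row in zip(v_map, r_map)]
        for v_map, r_map in zip(jhm_video, jhm_rf)
    ]


def toy_forgery_probability(diff, sharpness=10.0):
    """Hypothetical stand-in for the trained binary classification network.

    Maps the mean absolute difference into [0, 1): identical heatmaps give 0,
    and larger disagreement between the two modalities gives values near 1.
    """
    values = [abs(x) for joint_map in diff for row in joint_map for x in row]
    mean_abs = sum(values) / len(values)
    return 1.0 - math.exp(-sharpness * mean_abs)


def is_forged(prob_forged, threshold=0.5):
    """Decision rule: a frame is judged forged when the probability exceeds 0.5."""
    return prob_forged > threshold


# A person visible to the wireless branch but erased from the video frame.
jhm_rf = [[[0.0, 1.0], [0.0, 0.0]]]
jhm_video = [[[0.0, 0.0], [0.0, 0.0]]]
print(is_forged(toy_forgery_probability(jhm_difference(jhm_video, jhm_rf))))
```

In a real deployment the stand-in would be replaced by the trained classifier; only the subtraction and the thresholding follow directly from the text.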
As shown in FIG. 5, the present invention provides a localization method comprising: performing forgery detection on the surveillance video stream using the above detection method; for each surveillance video frame detected as forged, calculating the absolute value of the difference between the human joint heatmap extracted from the current surveillance video frame and the human joint heatmap extracted from the wireless signal, and calculating the sum of the two corresponding PAF tensors; selecting the set of abnormal human joint points for the current surveillance video frame based on the absolute difference of the heatmaps; and performing a joint-point association operation on the PAF sum and the abnormal joint point set to obtain the association state between the abnormal joint points, thereby determining the position of the forged human object in the current surveillance video frame.
To do this, candidate human joint points are first selected from the JHM difference tensor D_m. Because D_m is the difference between
Figure BDA0002833805750000131
and
Figure BDA0002833805750000132
its elements may be positive or negative. The system therefore applies non-maximum suppression to the absolute value of each element of D_m and selects a set of candidate joint points:
Figure BDA0002833805750000133
where N_j denotes the number of candidates for the j-th joint point, and
Figure BDA0002833805750000134
is the n-th candidate point for the j-th joint point. Then, based on the two PAF tensors
Figure BDA0002833805750000135
and
Figure BDA0002833805750000136
the candidate joint points in
Figure BDA0002833805750000137
are associated in order to identify the abnormal human target. To this end, the system first sums
Figure BDA0002833805750000138
and
Figure BDA0002833805750000139
to obtain the PAF sum tensor:
Figure BDA00028338057500001310
Then, based on
Figure BDA00028338057500001311
and
Figure BDA00028338057500001312
the region and posture of the abnormal human target are estimated. Specifically, a human posture can be characterized as a set of associated joint points, and for
Figure BDA00028338057500001313
the optimal association
Figure BDA00028338057500001314
can be obtained as follows:
Figure BDA00028338057500001315
where
Figure BDA00028338057500001316
denotes the association function. In addition, each element of
Figure BDA00028338057500001317
is a binary variable
Figure BDA00028338057500001318
that indicates the association state between the k1-th joint point and the k2-th joint point. Based on these operations, the system obtains the position and state of the abnormal human target, thereby localizing the forged region.
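The two localization steps above, candidate selection by non-maximum suppression and joint association via the PAF sum, can be sketched in a few lines. The association function of this embodiment is not reproduced here; the sketch instead assumes an OpenPose-style score (the PAF sampled along the segment between two candidates and projected onto the segment direction) and a greedy one-to-one matching. The names `local_maxima`, `association_score`, `greedy_association`, `min_value` and `samples` are hypothetical, and the PAF sum for one body part is assumed to be given as two maps holding its vertical and horizontal components.

```python
import math


def local_maxima(diff_map, min_value=0.1):
    """Non-maximum suppression on |D| for one joint: keep strict 3x3 peaks."""
    height, width = len(diff_map), len(diff_map[0])
    peaks = []
    for h in range(height):
        for w in range(width):
            value = abs(diff_map[h][w])
            if value < min_value:
                continue
            neighbours = [abs(diff_map[y][x])
                          for y in range(max(0, h - 1), min(height, h + 2))
                          for x in range(max(0, w - 1), min(width, w + 2))
                          if (y, x) != (h, w)]
            if all(value > n for n in neighbours):
                peaks.append((h, w))
    return peaks


def association_score(paf_h, paf_w, start, end, samples=5):
    """Mean projection of the PAF onto the start->end direction (samples >= 2)."""
    dh, dw = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dh, dw)
    if length == 0:
        return 0.0
    unit_h, unit_w = dh / length, dw / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        h = round(start[0] + t * dh)
        w = round(start[1] + t * dw)
        total += paf_h[h][w] * unit_h + paf_w[h][w] * unit_w
    return total / samples


def greedy_association(starts, ends, paf_h, paf_w, min_score=0.0):
    """Pair candidates of two joint types, best score first, one-to-one."""
    scored = sorted(((association_score(paf_h, paf_w, s, e), s, e)
                     for s in starts for e in ends), reverse=True)
    used_starts, used_ends, pairs = set(), set(), []
    for score, s, e in scored:
        if score <= min_score or s in used_starts or e in used_ends:
            continue
        pairs.append((s, e))
        used_starts.add(s)
        used_ends.add(e)
    return pairs
```

Applying `local_maxima` to each joint channel of the difference tensor and `greedy_association` to each body part yields linked joint pairs, whose coordinates mark the region of the abnormal human target. A globally optimal matching could replace the greedy step where several targets are close together.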
FIG. 6 shows the forgery localization result under an inter-frame attack, where the upper half shows the original video frame, the lower half shows the forged video frame, and the localization result (the associated human joint points) is superimposed on the forged frame. The original video frame in FIG. 6 is replaced by a video frame that does not contain a human object. The method provided by the invention can effectively locate all forged human targets under an inter-frame attack.
FIG. 7 shows the forgery localization result under an intra-frame attack, where the upper half shows the original video frame, the lower half shows the forged video frame, and the localization result (the associated human joint points) is superimposed on the forged frame. The left human target in the original video frame in FIG. 7 is replaced by the background. The method provided by the invention can effectively locate all forged human targets under an intra-frame attack.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A surveillance video stream forgery detection method based on wireless signals, characterized by comprising the following steps:
S1, collecting wireless signals of the video surveillance area in real time at equal time intervals, so that each surveillance video frame corresponds to the same number of wireless signals;
S2, extracting the human joint heatmap (JHM) and the PAF tensor from the surveillance video frames and from the wireless signals through deep neural networks;
S3, if the difference between the human joint heatmap extracted from the surveillance video frame and the human joint heatmap extracted from the wireless signals is large, judging that the current surveillance video frame is forged.
2. The detection method as claimed in claim 1, wherein in step S2, a pre-trained OpenPose network is used to extract the human joint heatmap (JHM) and the PAF tensor from the surveillance video frames, and a pre-trained RF2Pose network is used to extract the human joint heatmap (JHM) and the PAF tensor from the wireless signals;
the RF2Pose network includes:
the wireless signal conversion module is used for converting all wireless signals corresponding to each monitoring video frame into intermediate characteristics with the same length and width dimensions as those of the monitoring video frame and outputting the intermediate characteristics to a heatmap generation module and a PAF tensor generation module of human body joint points;
the heatmap generation module of the human body joint point is used for generating JHM of the human body joint point according to the intermediate features;
the PAF tensor generation module is used for generating the PAF tensor according to the intermediate features;
the training sample of the RF2Pose network is a wireless signal, and the label is JHM and PAF tensors which are output by the OpenPose network and correspond to human body joint points.
3. The detection method according to claim 2, wherein the wireless signal transformation module comprises a deconvolution network, a convolution network and a residual error network which are connected in series in sequence; the heatmap generation module of the human body joint point comprises a decoding network and a full convolution network which are sequentially connected in series, wherein the decoding network is a repeated structure of one layer of deconvolution layer and two layers of convolution layers; the PAF tensor generation module comprises a decoding network and a full convolution network which are sequentially connected in series, wherein the decoding network is a repeated structure of one layer of deconvolution layer and two layers of convolution layers.
4. The detection method according to claim 2 or 3, wherein the Loss function Loss of the RF2Pose network in the training phase is as follows:
Figure FDA0003305609900000021
Figure FDA0003305609900000022
Figure FDA0003305609900000023
Figure FDA0003305609900000024
Figure FDA0003305609900000025
where
Figure FDA0003305609900000026
denotes the parameters of the RF2Pose neural network,
Figure FDA0003305609900000027
and
Figure FDA0003305609900000028
denote the loss functions for the JHM and the PAF, respectively,
Figure FDA0003305609900000029
denote the JHM and PAF tensors obtained by feeding the y-th training-sample video frame into the OpenPose network,
Figure FDA00033056099000000210
denote the JHM and PAF tensors obtained by feeding the y-th training-sample wireless signal into the RF2Pose network, Y denotes the number of training samples, J denotes the number of joint points into which the human body is abstracted, C denotes the number of human body parts, h and w denote pixel coordinates,
Figure FDA00033056099000000211
and
Figure FDA00033056099000000212
denote the pixel-wise weight factors of the JHM and PAF tensors, respectively, λ1, λ2, β1 and β2 denote the coefficients that balance the influence of
Figure FDA0003305609900000031
and
Figure FDA0003305609900000032
in the overall objective function,
Figure FDA0003305609900000033
denotes the confidence of the j-th joint point at pixel (h, w) of the visual JHM,
Figure FDA0003305609900000034
denotes the confidence of the j-th joint point at pixel (h, w) of the wireless JHM,
Figure FDA0003305609900000035
denotes the position and orientation information of the c-th human body part at pixel (h, w) of the visual PAF, and
Figure FDA0003305609900000036
denotes the position and orientation information of the c-th human body part at pixel (h, w) of the wireless PAF.
5. The detection method according to claim 1, wherein in step S3, based on the similarity or difference of the human joint heatmaps, a threshold is set or a binary classifier is constructed to judge the difference between the human joint heatmap extracted from the surveillance video frame and the human joint heatmap extracted from the wireless signal.
6. A monitoring video stream forgery location method based on wireless signals is characterized by comprising the following steps:
(T1) performing a forgery detection of the surveillance video stream using the detection method according to any one of claims 1 to 5;
(T2) for the surveillance video frame whose detection result is falsification, calculating a difference absolute value between a heatmap of a human body joint extracted from the current surveillance video frame and a heatmap of a human body joint extracted from the wireless signal, and calculating a sum of a PAF tensor of the human body joint extracted from the current surveillance video frame and a PAF tensor of the human body joint extracted from the wireless signal;
(T3) selecting an abnormal human body joint point set corresponding to the current monitoring video frame based on the difference absolute value of the heatmap of the human body joint points;
(T4) performing a joint-point association operation on the PAF tensor sum calculated for the current surveillance video frame and the abnormal human joint point set to obtain the association state between the abnormal human joint points, thereby determining the position of the forged human object in the current surveillance video frame.
7. The method according to claim 6, wherein in step (T3), the abnormal human joint set corresponding to the current surveillance video frame is selected by a non-maximum suppression method.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more first programs, which are executed by one or more processors to implement the steps of the wireless signal-based surveillance video stream forgery detection method according to any one of claims 1 to 5; or, the computer readable storage medium stores one or more second programs, which are executed by one or more processors to implement the steps of the wireless signal based surveillance video stream forgery location method according to claim 6 or 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011471045.9A CN112653886B (en) 2020-12-14 2020-12-14 Monitoring video stream forgery detection method and positioning method based on wireless signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011471045.9A CN112653886B (en) 2020-12-14 2020-12-14 Monitoring video stream forgery detection method and positioning method based on wireless signals

Publications (2)

Publication Number Publication Date
CN112653886A CN112653886A (en) 2021-04-13
CN112653886B true CN112653886B (en) 2021-12-03

Family

ID=75354109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011471045.9A Active CN112653886B (en) 2020-12-14 2020-12-14 Monitoring video stream forgery detection method and positioning method based on wireless signals

Country Status (1)

Country Link
CN (1) CN112653886B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115357755B (en) * 2022-08-10 2023-04-07 北京百度网讯科技有限公司 Video generation method, video display method and device
CN115412726B (en) * 2022-09-02 2024-03-01 北京瑞莱智慧科技有限公司 Video authenticity detection method, device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8990587B1 (en) * 2005-12-19 2015-03-24 Rpx Clearinghouse Llc Method and apparatus for secure transport and storage of surveillance video
CN110084781A (en) * 2019-03-22 2019-08-02 西安电子科技大学 The passive evidence collecting method and system of monitor video tampering detection based on characteristic point
CN111652875A (en) * 2020-06-05 2020-09-11 西安电子科技大学 Video counterfeiting detection method, system, storage medium and video monitoring terminal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6868174B2 (en) * 2000-11-29 2005-03-15 Xerox Corporation Anti-counterfeit detection for low end products
CN111967427A (en) * 2020-08-28 2020-11-20 广东工业大学 Fake face video identification method, system and readable storage medium

Also Published As

Publication number Publication date
CN112653886A (en) 2021-04-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant