CN116994390A - Security monitoring system and method based on Internet of things

Info

Publication number: CN116994390A
Application number: CN202310881030.7A
Authority: CN (China)
Prior art keywords: reaction state; feature; training; reaction; neural network
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 余炳南
Current assignee: Zhangzhou Nolan Information Technology Co ltd
Original assignee: Zhangzhou Nolan Information Technology Co ltd
Application filed by Zhangzhou Nolan Information Technology Co ltd

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00 Burglar, theft or intruder alarms
    • G08B13/18 Actuation by interference with heat, light, or radiation of shorter wavelength; actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189 Actuation using passive radiation detection systems
    • G08B13/194 Actuation using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196 Actuation using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B13/19602 Image analysis to detect motion of the intruder, e.g. by frame subtraction
    • G08B13/19606 Discriminating between target movement or movement in an area of interest and other non-significative movements, e.g. target movements induced by camera shake or movements of pets, falling leaves, rotating fan
    • G08B13/19613 Recognition of a predetermined image pattern or behaviour pattern indicating theft or intrusion
    • G08B3/00 Audible signalling systems; Audible personal calling systems
    • G08B3/10 Audible signalling systems using electric transmission; using electromagnetic transmission

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electromagnetism (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a security monitoring system and a security monitoring method based on the Internet of things. After a camera detects that an unauthenticated person is present indoors, the system automatically plays a preset scene simulation sound, analyzes the person's behavioral reaction after hearing the sound, judges whether that reaction is abnormal, and raises an alarm if it is. In this way, security monitoring can be realized intelligently, the recognition accuracy for intruders is improved, and the false alarm rate is reduced, thereby preventing unauthorized persons from entering the room and safeguarding property in the home.

Description

Security monitoring system and method based on Internet of things
Technical Field
The application relates to the field of intelligent monitoring, in particular to a security monitoring system and a security monitoring method based on the Internet of things.
Background
In recent years, as living standards have risen, people have attached ever greater importance to the safety of household property. However, conventional mechanical home-defense measures carry many hidden hazards and limitations: traditional burglar-proof doors and windows fall short in overall appearance, emergency escape, and other respects, and can no longer satisfy the pursuit of a higher quality of life.
Currently, many households have installed monitoring systems, typically network-connected cameras deployed indoors to monitor the indoor environment. However, such systems provide only simple video surveillance: the footage must be viewed and analyzed manually to determine whether an intrusion has occurred, which is inconvenient. To address this problem, some households have begun to adopt intelligent security monitoring systems that verify the identity of indoor personnel through face recognition and trigger an alarm whenever verification fails. In real life, however, visits from friends, relatives, and other guests are unavoidable; unless each visitor's identity is recorded in the monitoring system in advance, false alarms are easily triggered, causing inconvenience to the user.
Therefore, an optimized security monitoring system based on the internet of things is desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the application provide a security monitoring system and method based on the Internet of things: after a camera detects that an unauthenticated person is present indoors, a preset scene simulation sound is played automatically, the person's behavioral reaction after hearing the sound is analyzed, it is judged whether that reaction is abnormal, and an alarm is raised if it is. In this way, security monitoring can be realized intelligently, the recognition accuracy for intruders is improved, and the false alarm rate is reduced, thereby preventing unauthorized persons from entering the room and safeguarding property in the home.
According to one aspect of the present application, there is provided a security monitoring system based on the internet of things, comprising:
the monitoring video acquisition module is used for acquiring, through a camera, a reaction state monitoring video of a monitored person after the person hears a scene simulation sound;
the video semantic analysis module is used for performing semantic feature extraction on the reaction state monitoring video to obtain reaction state semantic understanding features; and
the personnel reaction state detection module is used for determining whether the reaction state of the monitored person is normal based on the reaction state semantic understanding features.
According to another aspect of the present application, there is provided a security monitoring method based on the internet of things, including:
the method comprises the steps that a camera is used for collecting a reaction state monitoring video of a monitored person after hearing scene simulation sound;
extracting semantic features of the reaction state monitoring video to obtain reaction state semantic understanding features; and
and determining whether the reaction state of the monitored person is normal or not based on the reaction state semantic understanding characteristics.
Compared with the prior art, the security monitoring system and method based on the Internet of things provided by the application automatically play a preset scene simulation sound after the camera detects that an unauthenticated person is present indoors, analyze the person's behavioral reaction after hearing the sound, judge whether that reaction is abnormal, and raise an alarm if it is. In this way, security monitoring can be realized intelligently, the recognition accuracy for intruders is improved, and the false alarm rate is reduced, thereby preventing unauthorized persons from entering the room and safeguarding property in the home.
Drawings
The above and other objects, features, and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the accompanying drawings. The drawings are included to provide a further understanding of the embodiments of the application; they are incorporated in and constitute a part of this specification, serve to illustrate the application together with its embodiments, and do not limit the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a block diagram of a security monitoring system based on the Internet of things according to an embodiment of the application;
fig. 2 is a system architecture diagram of a security monitoring system based on the internet of things according to an embodiment of the present application;
FIG. 3 is a block diagram of a training phase of a security monitoring system based on the Internet of things according to an embodiment of the application;
fig. 4 is a block diagram of a video semantic analysis module in a security monitoring system based on the internet of things according to an embodiment of the present application;
FIG. 5 is a block diagram of a reaction state semantic association coding unit in a security monitoring system based on the Internet of things according to an embodiment of the application;
FIG. 6 is a flow chart of a security monitoring method based on the Internet of things according to an embodiment of the application;
fig. 7 is a schematic diagram of an application scenario of a security monitoring system based on the internet of things according to an embodiment of the present application.
Detailed Description
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
As used in the specification and in the claims, the terms "a," "an," and/or "the" do not denote the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in the present application to describe the operations performed by a system according to embodiments of the present application. It should be understood that these operations are not necessarily performed precisely in the order shown; rather, the various steps may be processed in reverse order or simultaneously, as desired, and other operations may be added to or removed from these processes.
Currently, many households have installed monitoring systems, typically network-connected cameras deployed indoors to monitor the indoor environment. However, such systems provide only simple video surveillance: the footage must be viewed and analyzed manually to determine whether an intrusion has occurred, which is inconvenient. To address this problem, some households have begun to adopt intelligent security monitoring systems that verify the identity of indoor personnel through face recognition and trigger an alarm whenever verification fails. In real life, however, visits from friends, relatives, and other guests are unavoidable; unless each visitor's identity is recorded in the monitoring system in advance, false alarms are easily triggered, causing inconvenience to the user. Therefore, an optimized security monitoring system based on the Internet of things is desired.
In the technical scheme of the present application, a security monitoring system based on the Internet of things is provided. Fig. 1 is a block diagram of the security monitoring system based on the Internet of things according to an embodiment of the present application. Fig. 2 is a system architecture diagram of the security monitoring system based on the Internet of things according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the security monitoring system 300 based on the Internet of things according to the embodiment of the application includes: the monitoring video acquisition module 310, configured to acquire, through a camera, a reaction state monitoring video of a monitored person after the person hears a scene simulation sound; the video semantic analysis module 320, configured to perform semantic feature extraction on the reaction state monitoring video to obtain reaction state semantic understanding features; and the personnel reaction state detection module 330, configured to determine whether the reaction state of the monitored person is normal based on the reaction state semantic understanding features.
In particular, the monitoring video acquisition module 310 is configured to acquire, through a camera, a monitoring video of the reaction state of a monitored person after the person hears a scene simulation sound. Here, scene simulation sound refers to generated or reproduced sound effects that create the auditory experience of a specific environment or scene. Such sounds may be used to create a particular atmosphere, augment a virtual-reality experience, provide a relaxing or focusing background sound, and so on. It should be noted that, when selecting and installing the camera, a camera capable of recording sound should be chosen, so that the reaction state of the monitored person after hearing the scene simulation sound can be recorded and different scene simulation sounds can be distinguished; the stability of the camera should also be ensured during installation.
Accordingly, in one possible implementation, the reaction state monitoring video of the monitored person after hearing the scene simulation sound may be collected through the camera as follows: install and configure the equipment according to its specification, ensuring that the camera works normally and is connected to a monitoring system or computer; place the camera at a suitable position so that the reaction state of the monitored person can be captured, and adjust parameters such as angle and focal length so that the picture is clearly visible; configure the system according to its operation guidelines, setting parameters such as recording duration, storage location, and recording quality to meet the user's needs; play the scene sound to be simulated using a suitable device or software, ensuring that the monitored person can hear it; start the camera or monitoring system and begin recording the reaction state of the monitored person, ensuring that activities and expressions can be captured; observe the monitoring video and record the reaction state of the monitored person after hearing the scene simulation sound, paying attention to changes in expression, posture, speech, and the like; finally, store the monitoring video in a secure location and analyze it as required, for example by playing back the video to analyze the reaction state of the monitored person and obtain relevant information and insight. A minimal sketch of this acquisition flow is given below.
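As a minimal illustration of this acquisition step, the following sketch records a fixed-length reaction clip with OpenCV while a scene simulation sound plays in the background. The device index, file paths, clip length, and the playsound dependency are illustrative assumptions, not part of the disclosed system.

```python
# Illustrative sketch only: capture a short reaction-state clip while a
# scene simulation sound plays. Device index, paths, and the playsound
# dependency are assumptions, not part of the patent disclosure.
import cv2
from threading import Thread
from playsound import playsound  # assumed helper for audio playback

def record_reaction_clip(out_path="reaction.mp4", seconds=10, fps=20.0):
    cap = cv2.VideoCapture(0)  # default camera; adjust index as needed
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    # Play the preset scene simulation sound in the background so the
    # camera records the monitored person's reaction to it.
    Thread(target=playsound, args=("scene_simulation.wav",), daemon=True).start()

    for _ in range(int(seconds * fps)):
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame)

    cap.release()
    writer.release()

if __name__ == "__main__":
    record_reaction_clip()
```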
In particular, the video semantic analysis module 320 is configured to perform semantic feature extraction on the reaction state monitoring video to obtain reaction state semantic understanding features. In one specific example of the present application, as shown in fig. 4, the video semantic analysis module 320 includes: the monitoring video segmentation unit 321, configured to segment the reaction state monitoring video into a plurality of reaction state monitoring video segments; the reaction state feature extraction unit 322, configured to perform feature extraction on the plurality of reaction state monitoring video segments respectively through a reaction state time-sequence feature extractor based on the first deep neural network model, so as to obtain a plurality of reaction state time-sequence feature maps; and the reaction state semantic association coding unit 323, configured to perform correlation analysis on the plurality of reaction state time-sequence feature maps to obtain a reaction state semantic understanding feature vector as the reaction state semantic understanding features.
Correspondingly, the monitoring video segmentation unit 321 is configured to segment the reaction state monitoring video into a plurality of reaction state monitoring video segments. It should be appreciated that segmenting the reaction state monitoring video in this way allows the behavior pattern and dynamic changes of the unauthenticated person to be understood more fully, so that the person's behavioral reaction can be analyzed more accurately and more reliable security measures can be provided. Specifically, by dividing the monitoring video into a plurality of segments, each segment can be analyzed independently, making it easier to observe and identify abnormal behavior of unauthenticated persons. This improves the accuracy and sensitivity of the system to such behavior, so that it can judge more effectively whether an alarm needs to be raised.
Accordingly, in one possible implementation, the reaction state monitoring video may be segmented into a plurality of reaction state monitoring video segments as follows: import the reaction state monitoring video into a computer or server for subsequent processing; preprocess the imported video, including removing noise and adjusting brightness and contrast, to improve the effect of subsequent processing; perform face detection on each frame using a face detection algorithm, such as a deep-learning-based face detection model, which helps determine whether a face is present in the video and locate its position; extract key frames from the frames in which a face is detected, a key frame being the frame most representative of the video content, usually one with large changes in facial expression or action; extract features from each key frame, for which various computer vision algorithms and deep learning models may be used, covering facial expression features, motion features, and the like; encode the extracted features for subsequent analysis and comparison, common encoding methods including vectorization, dimensionality reduction, and normalization; analyze the encoded features, using clustering algorithms, classification algorithms, or other machine learning methods to examine the relationships and patterns among them; segment the video into a plurality of reaction state monitoring video segments according to the results of the feature analysis, based on specific criteria or rules such as time intervals or feature similarity; and save or output the segmented video clips to a designated location for further analysis, insight, and decision-making. A simple fixed-window variant of this segmentation is sketched below.
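For instance, fixed-length segmentation of the monitoring video can be sketched as follows; the two-second window length is an assumed parameter, and content-aware segmentation (by face detection or feature similarity, as described above) could replace it.

```python
# Illustrative sketch: split the reaction-state monitoring video into
# fixed-length segments. The 2-second window is an assumed parameter.
import cv2

def split_video(path, segment_seconds=2.0):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    frames_per_segment = int(fps * segment_seconds)

    segments, current = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        current.append(frame)
        if len(current) == frames_per_segment:
            segments.append(current)
            current = []
    if current:               # keep the trailing partial segment
        segments.append(current)
    cap.release()
    return segments           # list of per-segment frame lists
```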
Correspondingly, the reaction state feature extraction unit 322 is configured to perform feature extraction on the plurality of reaction state monitoring video segments through a reaction state time-sequence feature extractor based on the first deep neural network model, so as to obtain a plurality of reaction state time-sequence feature maps. In one specific example of the present application, the first deep neural network model is a three-dimensional convolutional neural network model. It should be understood that each reaction state monitoring video segment carries time-sequence dynamic-change feature information about the behavioral reaction state of the unauthenticated person; that is, the behavioral reaction state exhibits time-sequence correlation characteristics in the time dimension. Therefore, in the technical scheme of the application, the plurality of reaction state monitoring video segments are separately encoded by the reaction state time-sequence feature extractor based on the three-dimensional convolutional neural network model, so as to extract the time-sequence dynamic correlation feature information of the behavioral reaction state of the unauthenticated person in each segment along the time dimension, thereby obtaining the plurality of reaction state time-sequence feature maps. This helps extract the detailed action-semantic feature information of the person's behavioral reaction in each local segment of the monitoring video, and can improve the accuracy and sensitivity of behavior recognition, thereby improving the accuracy of reaction state detection for the monitored person.
According to an embodiment of the present application, passing the plurality of reaction state monitoring video segments through the reaction state time-sequence feature extractor based on the three-dimensional convolutional neural network model to obtain the plurality of reaction state time-sequence feature maps includes performing, in the forward pass of each layer of the extractor, the following operations on the input data: performing convolution processing on the input data to obtain a convolution feature map; pooling the convolution feature map based on a local feature matrix to obtain a pooled feature map; and performing nonlinear activation on the pooled feature map to obtain an activation feature map. The output of the last layer of the extractor is the plurality of reaction state time-sequence feature maps, and the input of the first layer of the extractor is the plurality of reaction state monitoring video segments.
Notably, a three-dimensional convolutional neural network (3D Convolutional Neural Network, 3D CNN) is a deep learning model for processing video, time series, or three-dimensional data. Compared with a traditional two-dimensional convolutional neural network (2D CNN), a 3D CNN takes the time dimension into account in the convolution operation and can better capture time-sequence features in the data. 3D CNNs have achieved remarkable results in video classification, action recognition, medical image analysis, and other fields, and are widely used in practical scenarios.
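A minimal PyTorch sketch of such a 3D-CNN extractor, following the convolution, pooling, and nonlinear activation layer structure described above, might look as follows; the channel counts, kernel sizes, and input resolution are illustrative assumptions rather than values from the disclosure.

```python
# Illustrative sketch of the 3D-CNN reaction-state time-sequence feature
# extractor; layer sizes are assumptions, not values from the disclosure.
import torch
import torch.nn as nn

class ReactionState3DCNN(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            # Each block mirrors the per-layer forward pass described above:
            # convolution -> pooling -> nonlinear activation.
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.MaxPool3d(kernel_size=(2, 2, 2)),
            nn.ReLU(inplace=True),
        )

    def forward(self, clip):
        # clip: (batch, channels, time, height, width) for one video segment
        return self.features(clip)  # one reaction-state time-sequence feature map

# Usage: one feature map per monitoring video segment.
extractor = ReactionState3DCNN()
segment = torch.randn(1, 3, 16, 112, 112)   # dummy 16-frame segment
feature_map = extractor(segment)
```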
It should be noted that, in other specific examples of the present application, the plurality of reaction state time-sequence feature maps may also be obtained from the plurality of reaction state monitoring video segments in other ways, for example: preprocess each reaction state monitoring video segment, including video format conversion and resolution adjustment, to ensure that the segments have consistent format and quality; perform face detection on each video frame using a face detection algorithm, such as a deep-learning-based face detection model, to identify and locate faces and thereby accurately locate the monitored object; extract key frames from each video segment using a key frame extraction algorithm, such as a method based on image quality assessment or key frame density analysis, since key frames are representative frames that can stand in for the content of the whole segment; extract features from each key frame, for example with a pre-trained convolutional neural network model such as VGGNet or ResNet, which can extract high-level semantic features from images; encode the features of each key frame into a lower-dimensional representation using a method such as principal component analysis (PCA) or local binary patterns (LBP); and analyze the encoded features to understand the reaction state of the monitored person, using machine learning or deep learning models such as support vector machines (SVM), recurrent neural networks (RNN), or three-dimensional convolutional neural networks (3D CNN) for classification or regression analysis.
Correspondingly, the reaction state semantic association coding unit 323 is configured to perform correlation analysis on the plurality of reaction state time-sequence feature maps to obtain a reaction state semantic understanding feature vector as the reaction state semantic understanding features. In one specific example of the present application, as shown in fig. 5, the reaction state semantic association coding unit 323 includes: the feature map expanding subunit 3231, configured to expand the plurality of reaction state time-sequence feature maps into reaction state time-sequence feature vectors respectively, so as to obtain a plurality of reaction state time-sequence feature vectors; and the reaction state global correlation semantic understanding subunit 3232, configured to perform global correlation semantic understanding on the plurality of reaction state time-sequence feature vectors through a reaction state semantic comprehender based on the second deep neural network model, so as to obtain the reaction state semantic understanding feature vector.
The feature map expanding subunit 3231 is configured to expand the plurality of reaction state time-sequence feature maps into reaction state time-sequence feature vectors respectively, so as to obtain a plurality of reaction state time-sequence feature vectors. It is considered that the time-sequence change feature information concerning the detailed behavioral-reaction semantics of the unauthenticated person in each reaction state monitoring video segment has a global correlation across the whole reaction state monitoring video. Therefore, in order to understand the behavior pattern and dynamic changes of the unauthenticated person more comprehensively, the technical scheme of the present application further expands the plurality of reaction state time-sequence feature maps into reaction state time-sequence feature vectors to obtain the plurality of reaction state time-sequence feature vectors.
Accordingly, in one possible implementation, the plurality of reaction state time-sequence feature maps may be expanded into reaction state time-sequence feature vectors as follows. Each feature map represents the time-sequence features of one reaction state monitoring video segment and may be a two-dimensional matrix in which each element represents a feature at a certain point in time. Each feature map is then expanded or compressed through the following steps: divide the time axis of each feature map into fixed-length time windows, defined by a window size and an overlap ratio (for example, each window may represent one second of features); for each time window, expand the features in the feature map into a vector, for instance by a flattening operation that concatenates the rows or columns of the feature map, so that each time window corresponds to one feature vector; if the dimensionality of the feature map is high, compress the feature vector with a dimensionality-reduction method such as principal component analysis (PCA) to reduce the dimension while retaining the most important features; and extract the time-sequence feature vectors from every reaction state time-sequence feature map to obtain the plurality of reaction state time-sequence feature vectors, each of which reflects the reaction state of the monitored person within its time window.
The reaction state global correlation semantic understanding subunit 3232 is configured to perform global correlation semantic understanding on the plurality of reaction state time-sequence feature vectors through a reaction state semantic comprehender based on the second deep neural network model, so as to obtain the reaction state semantic understanding feature vector. In one specific example of the present application, the second deep neural network model is a recurrent neural network model. That is, the plurality of reaction state time-sequence feature vectors are encoded by the reaction state semantic comprehender based on the recurrent neural network model, so as to extract, based on globally correlated feature distribution information, the time-sequence change features of the reaction state of the unauthenticated person after hearing the scene simulation sound across the video segments of the reaction state monitoring video, thereby obtaining the reaction state semantic understanding feature vector.
Notably, a recurrent neural network (Recurrent Neural Network, RNN) is a neural network model for processing sequence data. Unlike conventional feed-forward neural networks, an RNN has recurrent connections that enable information to be transferred and persisted within the network. A key feature of an RNN is that it can handle input sequences of arbitrary length and capture time-dependent relationships in the sequence. At each time step, the RNN receives an input and a hidden state and outputs a predicted value and an updated hidden state; the hidden state can be seen as the network's memory of previous inputs.
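Under these descriptions, the feature-map expansion and the RNN-based global correlation step could be sketched as below; the use of an LSTM cell and the hidden size are assumptions, since the disclosure only specifies a recurrent neural network model.

```python
# Illustrative sketch: flatten each reaction-state feature map into a
# vector, then run an RNN (here an LSTM, an assumed choice) over the
# per-segment vectors to obtain the semantic understanding feature vector.
import torch
import torch.nn as nn

class ReactionStateComprehender(nn.Module):
    def __init__(self, input_dim, hidden_dim=256):
        super().__init__()
        # input_dim must equal the flattened size of one feature map.
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, feature_maps):
        # feature_maps: list of per-segment tensors of identical shape
        vectors = [fm.flatten(start_dim=1) for fm in feature_maps]  # expand maps
        sequence = torch.stack(vectors, dim=1)    # (batch, segments, features)
        outputs, (h_n, _) = self.rnn(sequence)
        return h_n[-1]  # final hidden state as the semantic understanding vector
```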
It should be noted that, in other specific examples of the present application, the global correlation semantic understanding of the plurality of reaction state time-sequence feature vectors may also be performed in other ways, for example: collect time-sequence feature vector data covering multiple reaction states, where the feature vectors of each reaction state can be extracted from the monitoring video through preprocessing, face detection, and similar steps; select an appropriate deep neural network model, such as a three-dimensional convolutional neural network (3D CNN), which can effectively capture temporal and spatial features; feed the time-sequence feature vector data into the model and extract features through forward propagation, letting the model learn the semantic features of the input data; globally correlate the feature vectors of the reaction states, for example by concatenating or averaging multiple feature vectors, to obtain the reaction state semantic understanding feature vector; and convert the resulting semantic understanding features into an appropriate representation, such as a vector or matrix.
It should be noted that, in other specific examples of the present application, semantic feature extraction may also be performed on the reaction state monitoring video in other ways, for example: convert the monitoring video into a format suitable for processing, such as the common MP4 or AVI formats, ensuring good quality and clear visibility; divide the video into appropriate time periods or scenes as needed, for example into multiple segments according to different reaction states or events, for better analysis and understanding; detect faces in the video using a face detection algorithm, which helps locate the facial features of the monitored person for subsequent feature extraction and analysis; extract key frames, i.e., the important frames representing the video content, to reduce the computational load of subsequent steps; extract semantic features from the video using computer vision and deep learning techniques, for example a pre-trained convolutional neural network (CNN) or recurrent neural network (RNN) that captures facial expression, action posture, gaze direction, and similar features; encode the extracted semantic features using techniques such as principal component analysis (PCA) or locality-sensitive hashing (LSH) to reduce feature dimensionality and improve expressiveness and computational efficiency; analyze and understand the encoded features with machine learning algorithms such as clustering, classification, and time-series analysis, for example to identify different reaction states, moods, or behavior patterns; and extract semantic understanding features of the reaction state from the results of the analysis as needed, which may include the understanding and expression of mood, intent, attitude, and the like.
In particular, the personnel reaction state detection module 330 is configured to determine whether the reaction state of the monitored person is normal based on the reaction state semantic understanding features. In one specific example of the present application, the personnel reaction state detection module 330 passes the reaction state semantic understanding feature vector through a classifier to obtain a classification result, the classification result being used to indicate whether the reaction state of the monitored person is normal. That is, the classification processes the globally correlated time-sequence feature information of the monitored person's reaction state after hearing the scene simulation sound, thereby detecting the reaction state and raising an alarm when an abnormal reaction is detected. In this way, security monitoring can be realized intelligently, the recognition accuracy for intruders is improved, and the false alarm rate is reduced.
According to an embodiment of the present application, passing the reaction state semantic understanding feature vector through the classifier to obtain the classification result, the classification result being used to indicate whether the reaction state of the monitored person is normal, includes: performing full-connection encoding on the reaction state semantic understanding feature vector using a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and passing the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
A classifier (Classifier) refers to a machine learning model or algorithm used to classify input data into different categories or labels. A classifier is part of supervised learning; it performs classification tasks by learning a mapping from input data to output categories.
A fully connected layer (Fully Connected Layer) is a type of layer commonly found in neural networks. In a fully connected layer, each neuron is connected to all neurons of the previous layer, and each connection has a weight. This means that each neuron receives inputs from all neurons of the previous layer, combines these inputs by weight, and passes the result to the next layer. Fully connected layers are typically used in the last layers of a neural network to classify or regress the features extracted by earlier layers, and their output can be passed through an activation function for nonlinear transformation to increase the expressive power of the network.
The Softmax classification function is a commonly used activation function for multi-class problems. It converts each element of the input vector into a probability value between 0 and 1 such that the probabilities sum to 1. The Softmax function is commonly used at the output layer of a neural network and is particularly suited for multi-class problems, because it maps the network output into a probability distribution over the classes.
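A sketch of this classifier head, with fully connected encoding followed by Softmax, is given below; the layer widths and the binary normal/abnormal label space are illustrative assumptions.

```python
# Illustrative classifier sketch: fully connected encoding followed by
# Softmax. Layer widths and the two-class label space are assumptions.
import torch
import torch.nn as nn

class ReactionStateClassifier(nn.Module):
    def __init__(self, feature_dim=256, num_classes=2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feature_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),  # encoded classification feature vector
        )

    def forward(self, semantic_vector):
        logits = self.fc(semantic_vector)
        # Probability distribution over {normal, abnormal} reaction states.
        return torch.softmax(logits, dim=-1)
```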
It should be noted that, in other specific examples of the present application, whether the reaction state of the monitored person is normal may also be determined based on the reaction state semantic understanding features in other ways, for example: acquire video data of the monitored person, ensuring video quality and clarity; preprocess the video, including denoising, de-shaking, and segmentation, to improve the effect of subsequent processing; extract the face region of the monitored person using a face detection algorithm such as Haar cascades or MTCNN; extract key frames representing the video content, using methods based on inter-frame difference, key-point detection, or the like; extract semantic features from the key frames using a deep learning model such as a three-dimensional convolutional neural network (3D CNN), which can capture temporal and spatial features in video; encode the extracted semantic features into a lower-dimensional representation using methods such as principal component analysis (PCA) or an autoencoder; analyze the encoded features with clustering, classification, regression, or similar methods to determine the reaction state of the monitored person; and judge whether the reaction state is normal according to the results of the feature analysis, for example by setting thresholds or criteria on the number or degree of abnormal features.
It should be appreciated that the reaction state time-sequence feature extractor based on the three-dimensional convolutional neural network model, the reaction state semantic comprehender based on the recurrent neural network model, and the classifier need to be trained before inference is performed with the above neural network models. That is, the security monitoring system based on the Internet of things of the present application further includes a training stage for training the reaction state time-sequence feature extractor based on the three-dimensional convolutional neural network model, the reaction state semantic comprehender based on the recurrent neural network model, and the classifier.
Fig. 3 is a block diagram of the training phase of the security monitoring system based on the Internet of things according to an embodiment of the present application. As shown in fig. 3, the security monitoring system 300 based on the Internet of things according to the embodiment of the present application includes a training phase 400, comprising: the training data acquisition unit 410, configured to acquire training data, the training data including a training reaction state monitoring video of a monitored person after hearing a scene simulation sound, together with a true value of whether the reaction state of the monitored person is normal; the training video segmentation unit 420, configured to segment the training reaction state monitoring video into a plurality of training reaction state monitoring video segments; the training local reaction state time-sequence feature extraction unit 430, configured to pass the plurality of training reaction state monitoring video segments respectively through the reaction state time-sequence feature extractor based on the three-dimensional convolutional neural network model to obtain a plurality of training reaction state time-sequence feature maps; the training feature map expanding unit 440, configured to expand the plurality of training reaction state time-sequence feature maps into training reaction state time-sequence feature vectors respectively, so as to obtain a plurality of training reaction state time-sequence feature vectors; the training reaction state global correlation encoding unit 450, configured to pass the plurality of training reaction state time-sequence feature vectors through the reaction state semantic comprehender based on the recurrent neural network model to obtain a training reaction state semantic understanding feature vector; the classification loss unit 460, configured to pass the training reaction state semantic understanding feature vector through the classifier to obtain a classification loss function value; the self-correlation unit 470, configured to calculate the vector multiplication of the training reaction state semantic understanding feature vector with its own transposed vector to obtain a correlation feature matrix; the manifold convex decomposition consistency loss unit 480, configured to calculate the manifold convex decomposition consistency factor of the correlation feature matrix to obtain a manifold convex decomposition consistency loss function value; and the model training unit 490, configured to train the reaction state time-sequence feature extractor based on the three-dimensional convolutional neural network model, the reaction state semantic comprehender based on the recurrent neural network model, and the classifier, using a weighted sum of the classification loss function value and the manifold convex decomposition consistency loss function value as the loss function value and propagating gradients through gradient-descent back-propagation.
In particular, in the technical scheme of the present application, it is considered that each reaction state time-sequence feature vector expresses the spatio-temporally fused image semantic features of its corresponding reaction state monitoring video segment, so that when the reaction state semantic understanding feature vector is obtained through the reaction state semantic comprehender based on the recurrent neural network model, context-correlated features at the granularity of whole feature vectors can be extracted; nevertheless, it is still desirable for the reaction state semantic understanding feature vector to express finer and more global correlated features. Therefore, the applicant of the present application first calculates the position-wise association of the reaction state semantic understanding feature vector with its own transposed vector to obtain a correlation feature matrix M. Considering that M expresses position-wise associations at the granularity of individual feature values of the semantic understanding feature vector, keeping the manifold expression of M in the high-dimensional feature space consistent between the full-space correlation dimension and the feature-value-wise correlation dimension can improve the global correlated expression effect of M. On this basis, the manifold convex decomposition consistency factor of the correlation feature matrix M is introduced as a loss function. In this factor, m_{i,j} denotes the feature value at the (i,j)-th position of M; μ and d denote, respectively, the mean vector and the diagonal vector of the row vectors of M; ‖·‖₂ denotes the norm of a vector; ‖·‖_F denotes the Frobenius norm of a matrix; L is the length of the feature vector; λ₁, λ₂, and λ₃ are weight hyperparameters; ReLU(·) denotes the ReLU function; and 𝓛 denotes the manifold convex decomposition consistency loss function value, computed as a λ-weighted combination of ReLU-gated norm terms over μ, d, and the m_{i,j}, normalized against ‖M‖_F and L. That is, considering that the row (or column) dimension of M expresses the correlation of each feature value of the reaction state semantic understanding feature vector with the feature vector as a whole, while the diagonal dimension expresses the self-correlation of each feature value, the manifold convex decomposition consistency factor targets the distribution correlations of M in the sub-dimensions represented by the row direction and the diagonal direction: it geometrically convex-decomposes the feature manifold represented by M so as to flatten a set of finite convex polytopes of the manifold in different dimensions, and constrains the geometric convex decomposition in the form of sub-dimension-associated shape weights, so as to promote the consistency of the convex geometric representation of the feature manifold in the resolvable dimensions represented by the rows and the diagonal. In this way, the manifold representation of M within the high-dimensional feature space remains consistent across the spatially correlated dimensions, and when gradients propagate back through M during model training, the self-correlation-based construction of M is encouraged to yield a feature-vector representation with finer and more global correlated features. Therefore, after an unauthenticated person is detected indoors, the preset scene simulation sound is played automatically to analyze the person's behavioral reaction after hearing the sound, whether the reaction is abnormal is judged, and an alarm is raised when an abnormal reaction is detected, so that security monitoring is realized intelligently, the recognition accuracy for intruders is improved, the false alarm rate is reduced, unauthorized persons are prevented from entering the room, and property in the home is safeguarded.
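A training-loop sketch consistent with this stage is given below. Since the exact form of the manifold convex decomposition consistency factor is given only symbolically in the source, the manifold_consistency_loss body here is a stand-in that merely combines the named ingredients (mean vector, diagonal vector, vector norms, Frobenius norm, ReLU, and the λ weights) and should not be read as the patented formula.

```python
# Illustrative training sketch. The consistency term below is a stand-in
# built from the ingredients named in the text (mean vector, diagonal
# vector, norms, ReLU, lambda weights); the disclosed formula itself is
# not reproduced here.
import torch
import torch.nn.functional as F

def manifold_consistency_loss(v, l1=1.0, l2=1.0, l3=0.1):
    # v: (batch, L) training reaction-state semantic understanding vectors
    M = v.unsqueeze(2) @ v.unsqueeze(1)       # correlation matrix v v^T, (batch, L, L)
    mu = M.mean(dim=1)                        # mean vector of the row vectors
    d = torch.diagonal(M, dim1=1, dim2=2)     # diagonal vector
    fro = torch.linalg.norm(M, dim=(1, 2))    # Frobenius norm of M
    L = v.size(1)
    # Stand-in combination of the named terms (NOT the disclosed formula):
    return (l1 * F.relu(torch.norm(mu - d, dim=1) - fro / L)
            + l2 * torch.norm(mu, dim=1) / L
            + l3 * torch.norm(d, dim=1) / L).mean()

def train_step(extractor, comprehender, classifier, optimizer, clips, label):
    feature_maps = [extractor(c) for c in clips]
    v = comprehender(feature_maps)
    probs = classifier(v)                     # Softmax probabilities
    loss = (F.nll_loss(torch.log(probs + 1e-8), label)
            + manifold_consistency_loss(v))   # weighted sum of the two losses
    optimizer.zero_grad()
    loss.backward()   # gradients propagate back through M via v
    optimizer.step()
    return loss.item()
```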
As described above, the security monitoring system 300 based on the Internet of things according to the embodiment of the present application can be implemented in various wireless terminals, for example a server running a security monitoring algorithm based on the Internet of things. In one possible implementation, the security monitoring system 300 can be integrated into a wireless terminal as a software module and/or a hardware module. For example, it may be a software module in the operating system of the wireless terminal, or an application developed for the wireless terminal; of course, it may equally be one of the many hardware modules of the wireless terminal.
Alternatively, in another example, the security monitoring system 300 based on the Internet of things and the wireless terminal may be separate devices, in which case the security monitoring system 300 may be connected to the wireless terminal through a wired and/or wireless network and transmit interaction information in an agreed data format.
Further, a security monitoring method based on the Internet of things is also provided.
Fig. 6 is a flowchart of the security monitoring method based on the Internet of things according to an embodiment of the present application. As shown in fig. 6, the security monitoring method based on the Internet of things according to an embodiment of the present application includes: S110, collecting, through a camera, a reaction state monitoring video of a monitored person after the person hears a scene simulation sound; S120, performing semantic feature extraction on the reaction state monitoring video to obtain reaction state semantic understanding features; and S130, determining whether the reaction state of the monitored person is normal based on the reaction state semantic understanding features. An end-to-end sketch of these steps follows.
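Read together, steps S110 to S130 can be strung into a single inference pipeline. The sketch below reuses the illustrative components defined earlier (split_video, the extractor, comprehender, and classifier); frames_to_tensor and raise_alarm are hypothetical helpers, and the abnormal-class index and threshold are assumptions.

```python
# Illustrative end-to-end pipeline for steps S110-S130, reusing the
# sketch components above; frames_to_tensor and raise_alarm are
# hypothetical helpers assumed for illustration.
import torch

def monitor_once(video_path, extractor, comprehender, classifier):
    segments = split_video(video_path)                  # S110 output, segmented
    clips = [frames_to_tensor(frames) for frames in segments]
    with torch.no_grad():
        feature_maps = [extractor(c) for c in clips]    # S120: per-segment features
        v = comprehender(feature_maps)                  # S120: semantic understanding
        probs = classifier(v)                           # S130: normal vs abnormal
    if probs[0, 1] > 0.5:    # assumed index 1 = abnormal reaction
        raise_alarm()        # hypothetical alarm hook
    return probs
```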
In summary, the security monitoring method based on the Internet of things according to the embodiment of the present application has been explained. It automatically plays a preset scene simulation sound after the camera detects that an unauthenticated person is present indoors, analyzes the person's behavioral reaction after hearing the sound, judges whether the reaction is abnormal, and raises an alarm if it is. In this way, security monitoring can be realized intelligently, the recognition accuracy for intruders is improved, and the false alarm rate is reduced, thereby preventing unauthorized persons from entering the room and safeguarding property in the home.
Fig. 7 is a schematic diagram of an application scenario of the security monitoring system based on the Internet of things according to an embodiment of the present application. As shown in fig. 7, in this application scenario, the reaction state monitoring video of the monitored person after hearing the scene simulation sound is acquired through a camera (e.g., C as illustrated in fig. 7). The monitoring video is then input to a server (e.g., S as illustrated in fig. 7) on which a security monitoring algorithm based on the Internet of things is deployed; the server processes the input monitoring video with this algorithm to generate a classification result indicating whether the reaction state of the monitored person is normal.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A security monitoring system based on the Internet of things, characterized by comprising:
the monitoring video acquisition module, which is used for acquiring, through a camera, a reaction state monitoring video of a monitored person after the person hears a scene simulation sound;
the video semantic analysis module, which is used for performing semantic feature extraction on the reaction state monitoring video to obtain reaction state semantic understanding features; and
the personnel reaction state detection module, which is used for determining whether the reaction state of the monitored person is normal based on the reaction state semantic understanding features.
2. The security monitoring system based on the Internet of Things of claim 1, wherein the video semantic analysis module comprises:
a monitoring video segmentation unit, used for performing video segmentation on the reaction state monitoring video to obtain a plurality of reaction state monitoring video segments;
a reaction state feature extraction unit, used for performing feature extraction on the plurality of reaction state monitoring video segments respectively through a reaction state timing feature extractor based on a first deep neural network model to obtain a plurality of reaction state timing feature maps; and
a reaction state semantic association encoding unit, used for performing association analysis on the plurality of reaction state timing feature maps to obtain a reaction state semantic understanding feature vector as the reaction state semantic understanding feature.
3. The security monitoring system based on the Internet of Things of claim 2, wherein the first deep neural network model is a three-dimensional convolutional neural network model.
4. The security monitoring system based on the Internet of Things of claim 3, wherein the reaction state semantic association encoding unit comprises:
a feature map expanding subunit, used for expanding the plurality of reaction state timing feature maps into reaction state timing feature vectors respectively to obtain a plurality of reaction state timing feature vectors; and
a reaction state global association semantic understanding subunit, used for performing global association semantic understanding on the plurality of reaction state timing feature vectors through a reaction state semantic comprehender based on a second deep neural network model to obtain the reaction state semantic understanding feature vector.
5. The security monitoring system based on the Internet of Things of claim 4, wherein the second deep neural network model is a recurrent neural network model.
6. The security monitoring system based on the Internet of Things of claim 5, wherein the person reaction state detection module is configured to: pass the reaction state semantic understanding feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the reaction state of the monitored person is normal.
7. The security monitoring system based on the Internet of Things of claim 6, further comprising a training module for training the reaction state timing feature extractor based on the three-dimensional convolutional neural network model, the reaction state semantic comprehender based on the recurrent neural network model, and the classifier.
8. The security monitoring system based on the Internet of Things of claim 7, wherein the training module comprises:
a training data acquisition unit, used for acquiring training data, wherein the training data comprises a training reaction state monitoring video of a monitored person after hearing a scene simulation sound, and a true value of whether the reaction state of the monitored person is normal;
a training video segmentation unit, used for performing video segmentation on the training reaction state monitoring video to obtain a plurality of training reaction state monitoring video segments;
a training local reaction state timing feature extraction unit, used for passing the plurality of training reaction state monitoring video segments respectively through the reaction state timing feature extractor based on the three-dimensional convolutional neural network model to obtain a plurality of training reaction state timing feature maps;
a training feature map expanding unit, used for expanding the plurality of training reaction state timing feature maps into training reaction state timing feature vectors respectively to obtain a plurality of training reaction state timing feature vectors;
a training reaction state global association encoding unit, used for passing the plurality of training reaction state timing feature vectors through the reaction state semantic comprehender based on the recurrent neural network model to obtain a training reaction state semantic understanding feature vector;
a classification loss unit, used for passing the training reaction state semantic understanding feature vector through the classifier to obtain a classification loss function value;
a self-correlation unit, used for calculating the vector multiplication of the training reaction state semantic understanding feature vector and its transposed vector to obtain a correlation feature matrix;
a manifold convex decomposition consistency loss unit, used for calculating a manifold convex decomposition consistency factor of the correlation feature matrix to obtain a manifold convex decomposition consistency loss function value; and
a model training unit, used for training the reaction state timing feature extractor based on the three-dimensional convolutional neural network model, the reaction state semantic comprehender based on the recurrent neural network model, and the classifier by back propagation of gradient descent, with a weighted sum of the classification loss function value and the manifold convex decomposition consistency loss function value as the loss function value.
9. The security monitoring system based on the Internet of Things of claim 8, wherein the manifold convex decomposition consistency loss unit is configured to: calculate a manifold convex decomposition consistency factor of the correlation feature matrix according to the following loss formula to obtain the manifold convex decomposition consistency loss function value;

$$\mathcal{L}_{mcc}=\log\!\left[\frac{1}{L^{2}}\sum_{i=1}^{L}\sum_{j=1}^{L}\mathrm{ReLU}\!\left(\alpha\,m_{i,j}-\beta\,\frac{\lVert\mu-\sigma\rVert_{2}}{\gamma\,\lVert M\rVert_{F}}\right)\right]$$

wherein $m_{i,j}$ represents the feature value at the $(i,j)$-th position of the correlation feature matrix $M$, $\mu$ and $\sigma$ are respectively the mean vector and the diagonal vector of the row vectors of the correlation feature matrix $M$, $\lVert\cdot\rVert_{2}$ represents the norm of a vector, $\lVert\cdot\rVert_{F}$ represents the Frobenius norm of a matrix, $L$ is the length of the feature vector, $\alpha$, $\beta$ and $\gamma$ are weight hyperparameters, $\mathrm{ReLU}(\cdot)$ represents the ReLU activation function, and $\mathcal{L}_{mcc}$ represents the manifold convex decomposition consistency loss function value.
10. A security monitoring method based on the Internet of Things, characterized by comprising the following steps: collecting, through a camera, a reaction state monitoring video of a monitored person after hearing a scene simulation sound; extracting semantic features of the reaction state monitoring video to obtain reaction state semantic understanding features; and determining whether the reaction state of the monitored person is normal based on the reaction state semantic understanding features.
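Read together, claims 2 to 9 describe one concrete pipeline: segment the video, pass each segment through a three-dimensional convolutional feature extractor, flatten the per-segment feature maps, fuse the segment vectors with a recurrent semantic comprehender, classify, and train with a classification loss plus a manifold convex decomposition consistency term computed on the autocorrelation matrix of the semantic vector. The PyTorch sketch below is a hedged illustration of that pipeline: all layer sizes, the loss weight, and the ReLU-based reading of the reconstructed formula in claim 9 are assumptions, not the patented implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ReactionStateExtractor(nn.Module):
    """Claim 3: reaction state timing feature extractor (3D CNN); sizes assumed."""
    def __init__(self, out_channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, out_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((2, 4, 4)),
        )

    def forward(self, clip):      # clip: (B, 3, T, H, W)
        return self.net(clip)     # (B, out_channels, 2, 4, 4) timing feature map

class ReactionSemanticComprehender(nn.Module):
    """Claim 5: global association over segment vectors via a recurrent model."""
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)

    def forward(self, seq):       # seq: (B, num_segments, in_dim)
        _, h = self.rnn(seq)
        return h[-1]              # (B, hidden) semantic understanding feature vector

def manifold_consistency_loss(v, alpha=1.0, beta=1.0, gamma=1.0, eps=1e-8):
    """Claim 9, under the reconstructed reading above; v: (B, L)."""
    M = v.unsqueeze(2) @ v.unsqueeze(1)           # claim 8: autocorrelation matrix
    mu = M.mean(dim=1)                            # mean vector of the row vectors
    sigma = torch.diagonal(M, dim1=1, dim2=2)     # diagonal vector
    fro = torch.linalg.matrix_norm(M, ord="fro")  # Frobenius norm, shape (B,)
    L = v.shape[-1]
    margin = beta * (mu - sigma).norm(dim=1) / (gamma * fro + eps)  # (B,)
    inner = alpha * M - margin.view(-1, 1, 1)                       # (B, L, L)
    return torch.log(F.relu(inner).sum(dim=(1, 2)) / (L * L) + eps).mean()

def train_step(extractor, comprehender, classifier, optimizer,
               segments, label, weight=0.1):
    """Claim 8: weighted sum of classification and consistency losses, backprop."""
    feats = [extractor(s).flatten(1) for s in segments]  # claims 2 and 4: flatten
    v = comprehender(torch.stack(feats, dim=1))          # semantic feature vector
    logits = classifier(v)
    loss = F.cross_entropy(logits, label) + weight * manifold_consistency_loss(v)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

As a usage note under these assumptions: segments is a list of clips shaped (B, 3, T, H, W); each flattened segment vector from the assumed extractor has dimension 32 x 2 x 4 x 4 = 1024, so the comprehender would be constructed as ReactionSemanticComprehender(in_dim=1024) and the classifier as nn.Linear(128, 2).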
CN202310881030.7A 2023-07-18 2023-07-18 Security monitoring system and method based on Internet of things Pending CN116994390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310881030.7A CN116994390A (en) 2023-07-18 2023-07-18 Security monitoring system and method based on Internet of things

Publications (1)

Publication Number Publication Date
CN116994390A true CN116994390A (en) 2023-11-03

Family

ID=88525831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310881030.7A Pending CN116994390A (en) 2023-07-18 2023-07-18 Security monitoring system and method based on Internet of things

Country Status (1)

Country Link
CN (1) CN116994390A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117388893A (en) * 2023-12-11 2024-01-12 深圳市移联通信技术有限责任公司 Multi-device positioning system based on GPS
CN117388893B (en) * 2023-12-11 2024-03-12 深圳市移联通信技术有限责任公司 Multi-device positioning system based on GPS
CN117392615A (en) * 2023-12-12 2024-01-12 南昌理工学院 Anomaly identification method and system based on monitoring video
CN117392615B (en) * 2023-12-12 2024-03-15 南昌理工学院 Anomaly identification method and system based on monitoring video
CN117423197A (en) * 2023-12-18 2024-01-19 广东省华视安技术有限公司 Intelligent security software monitoring method and system
CN117423197B (en) * 2023-12-18 2024-02-20 广东省华视安技术有限公司 Intelligent security software monitoring method and system

Similar Documents

Publication Publication Date Title
Yu et al. An online one class support vector machine-based person-specific fall detection system for monitoring an elderly individual in a room environment
Vishnu et al. Human fall detection in surveillance videos using fall motion vector modeling
Zhang et al. MoWLD: a robust motion image descriptor for violence detection
Zhou et al. Activity analysis, summarization, and visualization for indoor human activity monitoring
CN116994390A (en) Security monitoring system and method based on Internet of things
Kumar et al. The p-destre: A fully annotated dataset for pedestrian detection, tracking, and short/long-term re-identification from aerial devices
Charfi et al. Optimized spatio-temporal descriptors for real-time fall detection: comparison of support vector machine and Adaboost-based classification
Wu et al. Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes
US10366595B2 (en) Surveillance method and system based on human behavior recognition
JP2016072964A (en) System and method for subject re-identification
Tay et al. A robust abnormal behavior detection method using convolutional neural network
JP2004192637A (en) Face detection
Gul et al. Multi-view gait recognition system using spatio-temporal features and deep learning
Amiri et al. Non-intrusive human activity monitoring in a smart home environment
Huang et al. Deepfake mnist+: a deepfake facial animation dataset
Elbasi Reliable abnormal event detection from IoT surveillance systems
Aldhamari et al. Abnormal behavior detection using sparse representations through sequential generalization of k-means
Jang et al. Detection of dangerous situations using deep learning model with relational inference
Vinavatani et al. AI for detection of missing person
US20210352207A1 (en) Method for adapting the quality and/or frame rate of a live video stream based upon pose
Sarcar et al. Detecting violent arm movements using cnn-lstm
Srivastava et al. Real life violence detection in surveillance videos using spatiotemporal features
Yadav et al. Human Illegal Activity Recognition Based on Deep Learning Techniques
Yasin et al. Anomaly Prediction over Human Crowded Scenes via Associate-Based Data Mining and K-Ary Tree Hashing
Drosou et al. Event-based unobtrusive authentication using multi-view image sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination