CN111582006A - Video analysis method and device

Info

Publication number
CN111582006A
Authority
CN
China
Prior art keywords
classification information
face
module
intercepted
neural network
Prior art date
Legal status
Pending
Application number
CN201910121021.1A
Other languages
Chinese (zh)
Inventor
范慧慧
王天宇
高在伟
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910121021.1A priority Critical patent/CN111582006A/en
Priority to PCT/CN2020/074895 priority patent/WO2020168960A1/en
Publication of CN111582006A publication Critical patent/CN111582006A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation


Abstract

Embodiments of the invention provide a video analysis method and apparatus. The method comprises: detecting a monitoring target in the collected video stream; intercepting a video image containing the monitoring target; and identifying the intercepted video image to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, the video image containing the monitoring target is intercepted, and only the intercepted video image is classified and identified, so that the amount of calculation is reduced.

Description

Video analysis method and device
Technical Field
The invention relates to the technical field of monitoring, in particular to a video analysis method and a video analysis device.
Background
In related schemes, monitoring equipment is usually arranged in an area that needs to be monitored; the monitoring equipment collects a video stream, and whether any person or vehicle has illegally intruded into the area is judged by analyzing the video stream. Because the video stream is analyzed as a whole in such schemes, the amount of calculation is large.
Disclosure of Invention
An embodiment of the invention provides a video analysis method and device to reduce the amount of calculation.
To achieve the above object, an embodiment of the present invention provides a video analysis method, including:
detecting a monitoring target in the collected video stream;
intercepting a video image containing the monitoring target;
and identifying the intercepted video image to obtain the classification information of each monitoring target.
Optionally, the detecting a monitoring target in the acquired video stream includes: detecting a moving object in the acquired video stream;
the intercepting of the video image containing the monitoring target comprises:
intercepting one or more frames of video images containing the moving target.
Optionally, the obtaining of the classification information of each monitoring target by identifying the intercepted video image includes:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
Optionally, the detecting a monitoring target in the acquired video stream includes: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting of the video image containing the monitoring target comprises:
intercepting a face area in an image containing a face according to the recognition result;
the obtaining of the classification information of each monitoring target by identifying the intercepted video image comprises:
matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
Optionally, the obtaining of the classification information of the face region by matching the intercepted face region with the face data stored in the face database includes:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
Optionally, after the classification information of each monitoring target is obtained by identifying the intercepted video image, the method further includes:
judging whether the classification information meets a preset alarm condition;
and if so, outputting alarm information.
Optionally, after obtaining the classification information of each moving object output by the first neural network model, the method further includes:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the moving target is personnel; or the classification information of the moving target is a vehicle.
Optionally, after the obtaining of the classification information of the face region, the method further includes:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the face region is: face data successfully matched with the modeling data exists or does not exist in the face database.
In order to achieve the above object, an embodiment of the present invention further provides a video analysis apparatus, including:
the detection module is used for detecting a monitoring target in the acquired video stream;
the intercepting module is used for intercepting a video image containing the monitoring target;
and the classification module is used for identifying the intercepted video image to obtain the classification information of each monitoring target.
Optionally, the detection module is specifically configured to: detecting a moving object in the acquired video stream;
the intercepting module is specifically configured to: intercepting one or more frames of video images containing the moving target.
Optionally, the classification module is specifically configured to:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
Optionally, the detection module is specifically configured to: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting module is specifically configured to: intercepting a face area in an image containing a face according to the recognition result;
the classification module is specifically configured to: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
Optionally, the classification module is specifically configured to:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
Optionally, the apparatus further comprises:
the first judgment module is used for judging whether the classification information meets the preset alarm condition; if so, triggering the first alarm module;
the first alarm module is used for outputting alarm information.
Optionally, the apparatus further comprises:
the second judgment module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle; if so, triggering the second alarm module;
and the second alarm module is used for outputting alarm information.
Optionally, the apparatus further comprises:
the third judging module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the face region is that face data successfully matched with the modeling data exists or does not exist in the face database; if so, triggering the third alarm module;
and the third alarm module is used for outputting alarm information.
In the embodiments of the invention, a monitoring target is detected in the collected video stream; a video image containing the monitoring target is intercepted; and the intercepted video image is identified to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, the video image containing the monitoring target is intercepted, and only the intercepted video image is classified and identified, so that the amount of calculation is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a first flowchart of a video analysis method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a video analysis method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of interaction between a monitoring point and an NVR according to an embodiment of the present invention;
fig. 4 is a third flowchart illustrating a video analysis method according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an interaction between another monitoring point and an NVR according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video analysis apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video analysis system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the foregoing technical problems, embodiments of the present invention provide a video analysis method and apparatus. The method and apparatus may be applied to a camera such as an IPC (IP Camera), to an NVR (Network Video Recorder), to other electronic devices, or to a video analysis system; this is not specifically limited. The video analysis method provided by an embodiment of the present invention is first described in detail below.
Fig. 1 is a first flowchart of a video analysis method according to an embodiment of the present invention, including:
S101: detecting a monitoring target in the acquired video stream.
For example, in one embodiment, the monitored target may be a moving target; in this case, S101 may include: moving objects are detected in the captured video stream. For example, an algorithm such as a frame difference method, a background subtraction algorithm, or an optical flow method may be used to detect a moving object in a video stream.
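As a concrete illustration (the patent does not fix any particular detection algorithm or parameters), the coarse moving-target detection step could be sketched as follows using OpenCV's MOG2 background subtractor; the binarization threshold and minimum contour area are illustrative assumptions:

```python
import cv2

def detect_moving_targets(video_path, min_area=500):
    """Yield frames that contain moving targets, with their bounding boxes.

    A minimal sketch of the coarse detection step; the threshold and
    min_area values are illustrative assumptions, not taken from the patent.
    """
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)  # foreground (motion) mask
        mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)[1]
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) >= min_area]  # drop tiny blobs
        if boxes:
            yield frame, boxes  # a frame containing moving targets
    cap.release()
```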
In another embodiment, the monitoring target may be a human face; in this case, S101 may include: and carrying out face recognition in the collected video stream to obtain a recognition result. For example, a face recognition algorithm may be used to identify a face in a video stream.
S102: intercepting a video image containing the monitoring target.
In the foregoing embodiment in which the monitoring target is a moving target, S102 may include: intercepting one or more frames of video images containing the moving target. The multiple frames of video images may form a short video clip, for example a clip covering several seconds before and after a key frame.
In another embodiment, the monitoring target is a face; in this case, S102 may include: intercepting a face region in the image containing the face according to the recognition result. Alternatively, one or more frames of video images including the face region may be cut out from the video stream according to the recognition result.
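As an illustrative sketch only (the patent does not specify which face recognition algorithm is used), the face-region interception could be implemented with an off-the-shelf detector such as an OpenCV Haar cascade; the cascade file and detection parameters below are assumptions:

```python
import cv2

# Haar-cascade face detector shipped with OpenCV; it stands in for the
# patent's unspecified face recognition algorithm.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def intercept_face_regions(frame):
    """Return one cropped image per detected face region in the frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [frame[y:y + h, x:x + w] for (x, y, w, h) in faces]
```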
S103: identifying the intercepted video image to obtain the classification information of each monitoring target.
In the foregoing embodiment in which the monitoring target is a moving target, S103 may include: inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying the moving targets in the video image by using the first neural network model to obtain the classification information of each moving target output by the first neural network model.
For example, the moving object may be a person, a vehicle, or the like. The first neural network model is a model for classifying the moving target. The process of training the first neural network model may include: acquiring a sample image to be trained, wherein the sample image can comprise moving targets such as people or vehicles; adding labels to various moving objects in the sample image, wherein the labels are the types of the moving objects, such as vehicles, personnel and the like; and inputting the sample image into a neural network with a preset structure, carrying out iterative adjustment on the neural network by taking the label as supervision, and obtaining a trained first neural network model when an iteration ending condition is met.
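The patent does not fix a network structure, so the following is only a minimal training sketch under assumed choices (a ResNet-18 backbone, an illustrative three-class label set, and standard SGD hyperparameters):

```python
import torch
import torch.nn as nn
from torchvision import models

CLASSES = ["personnel", "vehicle", "other"]  # illustrative moving-target labels

model = models.resnet18(num_classes=len(CLASSES))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_epoch(loader):
    """One pass over labeled sample images; the labels supervise the
    iterative adjustment of the network, as described above."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```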
The video image captured in S102 is input into the first neural network model, and the first neural network model can output the classification information of each moving object in the video image, where the classification information is the category of the moving object, such as vehicle, person, and the like.
For example, some scenes have a high security level and require perimeter protection, that is, judging whether people or vehicles enter the scene. By applying this embodiment, on one hand, the classification information of each moving target is obtained; if the classification information is personnel or vehicle, related personnel can be promptly reminded to perform subsequent processing, realizing effective perimeter precaution. On the other hand, moving-target detection, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images in the video stream is intercepted, and only that small part is finely identified, namely the classification information of the moving targets is identified by using the first neural network model. Compared with a scheme that analyzes the entire video stream, this scheme reduces the amount of calculation.
In the other embodiment, in which the monitoring target is a face, S103 may include: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
For example, some scenarios only allow authorized persons to enter, and stranger (unauthorized person) identification schemes need to be executed for the scenarios, and the present embodiment may be adopted in such cases. For example, the face data of authorized persons may be stored in the face database, and the face region captured in S102 is matched with the face database, that is, whether a person in the video stream is an authorized person is determined. The classification information of the face region may be: the face data successfully matched with the face area exists or does not exist in the face database; alternatively, the classification information of the face region may also be: authorized or unauthorized persons (strangers).
As another example, some scenarios may require identifying a designated person, such as an attendance checking scenario, or a VIP (important person) identification scenario, and the present embodiment may also be used in these scenarios. For example, the face data of the designated person may be stored in the face database, and the face region captured in S102 is matched with the face database, that is, whether the person in the video stream is the designated person is determined. The classification information of the face region may be: the face data successfully matched with the face area exists or does not exist in the face database; alternatively, the classification information of the face region may also be: designated person or non-designated person.
In one case, S103 may include: inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model; and obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
The second neural network model may be a face modeling model, which may convert the face image into modeling data, i.e., structural data. In this case, the face database stores the modeling data (structure data) converted by the second neural network model. And matching the modeling data obtained after the conversion of the face region intercepted in the step S102 with the modeling data in the face database, wherein if the matching is successful, the person corresponding to the face region is an authorized person or a designated person, and if the matching is unsuccessful, the person corresponding to the face region is an unauthorized person (stranger) or a non-designated person.
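As a minimal sketch of this matching step (the similarity measure and the 0.6 threshold are illustrative assumptions; the patent only requires that the modeling data be compared with the stored face data):

```python
import numpy as np

def match_face(modeling_data, face_db, threshold=0.6):
    """Return the matched person id, or None if no face data in the
    database matches the modeling data (i.e., a stranger)."""
    query = modeling_data / np.linalg.norm(modeling_data)
    for person_id, stored in face_db.items():
        stored = stored / np.linalg.norm(stored)
        if float(np.dot(query, stored)) >= threshold:  # cosine similarity
            return person_id
    return None
```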
By applying this embodiment, on one hand, the classification information of the face region is obtained; whether the person is an authorized person or a designated person can be judged according to the classification information, and related personnel can be promptly reminded to perform subsequent processing according to the judgment result, so that effective stranger alarming or designated-person identification can be realized. On the other hand, face recognition, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images (or image regions) in the video stream is intercepted, and only the intercepted part is finely recognized, namely face matching is performed, which reduces the amount of calculation compared with analyzing the entire video stream.
As an embodiment, after S103, the method may further include: judging whether the classification information meets a preset alarm condition; and if so, outputting alarm information.
In one embodiment, the monitoring target is a moving target, in which case the preset alarm condition may include: the classification information of the moving target is personnel; or the classification information of the moving target is a vehicle.
As described above, if perimeter precaution is required, that is, whether a person or a vehicle enters a scene is determined, the present embodiment may be adopted to determine whether the classification information of the moving object is a person or a vehicle, and if the determination result is yes, alarm information is output.
In another embodiment, the monitored target is a human face, and in this case, the preset alarm condition may include: the classification information of the face region is as follows: and the face data successfully matched with the modeling data exists or does not exist in the face database.
As described above, if a stranger (unauthorized person) identification scheme needs to be executed, the present embodiment may be adopted to determine whether face data successfully matched with the modeling data corresponding to the face region exists in the face database, if so, it indicates that the person corresponding to the face region is an authorized person, and if not, it indicates that the person corresponding to the face region is a stranger (unauthorized person), and output alarm information.
If the designated person needs to be identified, the embodiment can be adopted to judge whether the face database has face data successfully matched with the modeling data corresponding to the face area, if so, the person corresponding to the face area is indicated as the designated person, and alarm information is output.
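A minimal sketch of the alarm decisions for the cases described above; the condition sets come from the text, while the function names are illustrative:

```python
PERIMETER_ALARM_CLASSES = {"personnel", "vehicle"}

def perimeter_alarm(target_class):
    # Alarm when the moving target is classified as personnel or vehicle.
    return target_class in PERIMETER_ALARM_CLASSES

def stranger_alarm(matched_person_id):
    # Alarm when no matching face data exists in the face database.
    return matched_person_id is None

def designated_person_alarm(matched_person_id):
    # Alarm when matching face data does exist in the face database.
    return matched_person_id is not None
```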
In one embodiment, S101 and S102 may be performed by the IPC, which then sends the captured video image to the NVR, which performs the subsequent steps.
With the embodiment of the invention shown in Fig. 1, a monitoring target is detected in the collected video stream; a video image containing the monitoring target is intercepted; and the intercepted video image is identified to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, only the intercepted video images are classified and identified, so that the amount of calculation is reduced.
Fig. 2 is a second flowchart of a video analysis method according to an embodiment of the present invention, including:
S201: detecting a moving target in the collected video stream.
S202: intercepting one or more frames of video images containing the moving target.
S203: inputting the intercepted video image into a first neural network model obtained through pre-training, and classifying the moving targets in the video image by using the first neural network model to obtain the classification information of each moving target output by the first neural network model.
S204: judging whether the classification information meets a preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle. If yes, S205 is executed.
S205: outputting alarm information.
For example, in a scene where perimeter precaution is required, the embodiment of Fig. 2 of the present invention may be applied to judge whether a person or a vehicle enters the scene, and to raise an alarm if so.
By applying the embodiment of the invention shown in Fig. 2, on one hand, the video image is identified by using the first neural network model to obtain the classification information of each moving target; if the classification information is personnel or vehicle, related personnel can be promptly reminded to perform subsequent processing, realizing effective perimeter precaution. On the other hand, moving-target detection, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images in the video stream is intercepted, and only that small part is finely identified, namely the classification information of the moving targets is identified by using the first neural network model.
In some related schemes, an infrared detector is used for emitting infrared laser, the infrared laser forms a monitoring area, and when a person breaks into the monitoring area, the waveform of the infrared laser changes, so that whether the person breaks into the monitoring area can be judged based on the waveform of the infrared laser. However, in this scheme, the monitoring area formed by the infrared laser emitted by one infrared detector is limited, and if the area to be monitored is large, a plurality of infrared detectors need to be arranged, which is high in cost.
By adopting this embodiment, monitoring is carried out according to the images collected by the image collection equipment; multiple infrared detectors do not need to be arranged, so the monitoring cost is reduced.
An embodiment applied to a perimeter protection scene is described below with reference to Fig. 3:
A monitoring point (which may be an IPC) collects a video stream, performs moving-target detection on the video stream, intercepts one or more frames of video images containing moving targets according to the detection result, and sends the intercepted video images to the NVR.
The NVR receives a video image sent by the monitoring point, inputs the video image into a first neural network model obtained through pre-training, and classifies moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model. The classification information may be, but is not limited to, a person, a vehicle, an object, and the like.
Assume that the preset alarm condition is: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle. If the classification information of a moving target output by the first neural network model is vehicle or personnel, alarm information is output.
In this embodiment, alarm information is output only when the classification information meets the preset alarm condition, so false alarms caused by swaying vegetation, pet interference and lighting changes can be reduced, and the alarm accuracy is improved.
Fig. 4 is a third flowchart of a video analysis method according to an embodiment of the present invention, including:
S401: carrying out face recognition in the collected video stream to obtain a recognition result.
S402: intercepting a face region in the image containing the face according to the recognition result.
S403: inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model.
S404: obtaining the classification information of the face region by matching the modeling data with face data stored in a face database; the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
For example, a face image of an authorized person may be collected in advance, the face image may be converted into modeling data by using the second neural network model, and the modeling data obtained by the conversion may be stored in the face database as face data.
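A minimal sketch of this enrollment step; `face_model` stands for the second neural network model and is a hypothetical embedding function, not an API defined by the patent:

```python
face_db = {}  # person id -> modeling data (stored as face data)

def enroll_authorized_person(person_id, face_image, face_model):
    """Convert an authorized person's face image into modeling data with
    the second neural network model and store it in the face database."""
    face_db[person_id] = face_model(face_image)
```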
S405: judging whether the classification information meets a preset alarm condition, the preset alarm condition comprising: the classification information of the face region indicates that face data successfully matched with the modeling data exists or does not exist in the face database. If yes, go to S406.
S406: outputting alarm information.
For example, in a scene where a stranger (unauthorized person) needs to be identified, the embodiment of Fig. 4 of the present invention may be applied: the face data of authorized persons is stored in the face database, and the modeling data converted from the intercepted face region is matched with the face database, that is, whether the person in the video stream is an authorized person is judged. If the person in the video stream is judged to be a stranger (unauthorized person), an alarm is raised.
As another example, if a designated person needs to be identified, the embodiment of Fig. 4 of the present invention may also be applied: the face data of the designated person is stored in the face database, and the modeling data converted from the intercepted face region is matched with the face database, that is, whether the person in the video stream is the designated person is judged. If the person in the video stream is judged to be the designated person, an alarm is raised.
By applying the embodiment shown in Fig. 4 of the invention, on one hand, the classification information of the face region is obtained; whether the person is an authorized person or a designated person can be judged according to the classification information, and related personnel can be promptly reminded to perform subsequent processing according to the judgment result, so that effective stranger alarming or designated-person identification can be realized. On the other hand, face recognition, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images (or image regions) in the video stream is intercepted, and only the intercepted part is finely recognized, namely face matching is performed, which reduces the amount of calculation compared with analyzing the entire video stream.
An embodiment applied to a stranger alarm scenario is described below with reference to Fig. 5:
A monitoring point (which may be an IPC) collects a video stream, performs face recognition on the video stream, and, according to the recognition result, intercepts one or more frames of face images containing faces or intercepts the face regions in the images; the intercepted face images or face regions are then sent to the NVR. For convenience of description, the intercepted face image or face region is collectively referred to as a face image below.
The NVR receives the face image sent by the monitoring point, inputs the face image into a second neural network model obtained through pre-training, converts the face image into modeling data by using the second neural network model, and matches the modeling data obtained by the conversion with the face data stored in the face database. If the matching is successful, the person corresponding to the face region is an authorized person; if the matching is unsuccessful, the person corresponding to the face region is a stranger, and alarm information is output.
Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a video analysis apparatus, as shown in Fig. 6, including:
the detection module 601 is configured to detect a monitoring target in the acquired video stream;
an intercepting module 602, configured to intercept a video image including the monitoring target;
the classification module 603 is configured to obtain classification information of each monitoring target by identifying the captured video image.
As an embodiment, the detection module 601 is specifically configured to: detecting a moving object in the acquired video stream;
the intercepting module 602 is specifically configured to: intercepting one or more frames of video images containing the moving target.
As an embodiment, the classification module 603 is specifically configured to:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
As an embodiment, the detection module 601 is specifically configured to: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting module 602 is specifically configured to: intercepting a face region in an image containing a face according to the recognition result;
the classification module 603 is specifically configured to: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
As an embodiment, the classification module 603 is specifically configured to:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
As an embodiment, the apparatus further comprises: a first judging module and a first alarming module (not shown in the figure), wherein,
the first judgment module is used for judging whether the classification information meets the preset alarm condition; if so, triggering the first alarm module;
the first alarm module is used for outputting alarm information.
As an embodiment, the apparatus further comprises: a second judging module and a second alarm module (not shown in the figure), wherein,
the second judgment module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle; if so, triggering the second alarm module;
and the second alarm module is used for outputting alarm information.
As an embodiment, the apparatus further comprises: a third judging module and a third alarm module (not shown in the figure), wherein,
the third judging module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the face region is that face data successfully matched with the modeling data exists or does not exist in the face database; if so, triggering the third alarm module;
and the third alarm module is used for outputting alarm information.
In the embodiments of the invention, a monitoring target is detected in the collected video stream; a video image containing the monitoring target is intercepted; and the intercepted video image is identified to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, the video image containing the monitoring target is intercepted, and only the intercepted video image is classified and identified, so that the amount of calculation is reduced.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 7, including a processor 701 and a memory 702,
a memory 702 for storing a computer program;
the processor 701 is configured to implement any of the video analysis methods described above when executing the program stored in the memory 702.
The memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk memory. As an embodiment, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any one of the video analysis methods described above.
An embodiment of the present invention further provides a video analysis system, as shown in Fig. 8, including: a monitoring point and a processing device, wherein,
the monitoring point is used for detecting a monitoring target in the acquired video stream; intercepting a video image containing the monitoring target; sending the intercepted video image to the processing device;
and the processing equipment is used for receiving the video images and identifying the received video images to obtain the classification information of each monitoring target.
For example, the monitoring point may be an IPC, and the processing device may be an NVR; this is not specifically limited.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, the device embodiment, the computer-readable storage medium embodiment, and the system embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A method of video analysis, comprising:
detecting a monitoring target in the collected video stream;
intercepting a video image containing the monitoring target;
and identifying the intercepted video image to obtain the classification information of each monitoring target.
2. The method of claim 1, wherein detecting a surveillance target in the captured video stream comprises: detecting a moving object in the acquired video stream;
the intercepting of the video image containing the monitoring target comprises:
intercepting one or more frames of video images containing the moving target.
3. The method according to claim 2, wherein the obtaining of the classification information of each monitoring target by identifying the intercepted video image comprises:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
4. The method of claim 1, wherein detecting a surveillance target in the captured video stream comprises: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting of the video image containing the monitoring target comprises:
intercepting a face area in an image containing a face according to the recognition result;
the obtaining of the classification information of each monitoring target by identifying the intercepted video image comprises:
matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
5. The method of claim 4, wherein the obtaining the classification information of the face region by matching the intercepted face region with face data stored in a face database comprises:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
6. The method according to claim 1, wherein after the obtaining of the classification information of each monitoring target by identifying the intercepted video image, the method further comprises:
judging whether the classification information meets a preset alarm condition;
and if so, outputting alarm information.
7. The method of claim 3, further comprising, after said obtaining classification information for each moving object output by the first neural network model:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the moving target is personnel; or the classification information of the moving target is a vehicle.
8. The method according to claim 5, further comprising, after the obtaining the classification information of the face region:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the face region is: face data successfully matched with the modeling data exists or does not exist in the face database.
9. A video analysis apparatus, comprising:
the detection module is used for detecting a monitoring target in the acquired video stream;
the intercepting module is used for intercepting a video image containing the monitoring target;
and the classification module is used for identifying the intercepted video image to obtain the classification information of each monitoring target.
10. The apparatus according to claim 9, wherein the detection module is specifically configured to: detecting a moving object in the acquired video stream;
the intercepting module is specifically configured to: intercepting one or more frames of video images containing the moving target.
11. The apparatus according to claim 10, wherein the classification module is specifically configured to:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
12. The apparatus according to claim 9, wherein the detection module is specifically configured to: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting module is specifically configured to: intercepting a face area in an image containing a face according to the recognition result;
the classification module is specifically configured to: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
13. The apparatus according to claim 12, wherein the classification module is specifically configured to:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
14. The apparatus of claim 9, further comprising:
the first judgment module is used for judging whether the classification information meets the preset alarm condition; if so, triggering the first alarm module;
the first alarm module is used for outputting alarm information.
15. The apparatus of claim 11, further comprising:
the second judgment module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle; if so, triggering the second alarm module;
and the second alarm module is used for outputting alarm information.
16. The apparatus of claim 13, further comprising:
the third judging module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the face region is that face data successfully matched with the modeling data exists or does not exist in the face database; if so, triggering the third alarm module;
and the third alarm module is used for outputting alarm information.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910121021.1A CN111582006A (en) 2019-02-19 2019-02-19 Video analysis method and device
PCT/CN2020/074895 WO2020168960A1 (en) 2019-02-19 2020-02-12 Video analysis method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910121021.1A CN111582006A (en) 2019-02-19 2019-02-19 Video analysis method and device

Publications (1)

Publication Number Publication Date
CN111582006A 2020-08-25

Family

ID=72112900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910121021.1A Pending CN111582006A (en) 2019-02-19 2019-02-19 Video analysis method and device

Country Status (2)

Country Link
CN (1) CN111582006A (en)
WO (1) WO2020168960A1 (en)


Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148896A (en) * 2020-09-10 2020-12-29 京东数字科技控股股份有限公司 Data processing method and device for terminal media monitoring and broadcasting
CN112329517B (en) * 2020-09-17 2022-11-29 中国南方电网有限责任公司超高压输电公司南宁监控中心 Transformer substation disconnecting link confirmation video image analysis method and system
CN112464030A (en) * 2020-11-25 2021-03-09 浙江大华技术股份有限公司 Suspicious person determination method and device
CN112653874B (en) * 2020-12-01 2022-09-20 杭州勋誉科技有限公司 Storage device and intelligent video monitoring system
CN112818757A (en) * 2021-01-13 2021-05-18 上海应用技术大学 Gas station safety detection early warning method and system
CN114821844A (en) * 2021-01-28 2022-07-29 深圳云天励飞技术股份有限公司 Attendance checking method and device based on face recognition, electronic equipment and storage medium
CN112989934A (en) * 2021-02-05 2021-06-18 方战领 Video analysis method, device and system
CN113112754B (en) * 2021-03-02 2022-10-11 深圳市哈威飞行科技有限公司 Drowning alarm method, drowning alarm device, drowning alarm platform, drowning alarm system and computer readable storage medium
CN113139679A (en) * 2021-04-06 2021-07-20 青岛以萨数据技术有限公司 Urban road rescue early warning method, system and equipment based on neural network
CN113177459A (en) * 2021-04-25 2021-07-27 云赛智联股份有限公司 Intelligent video analysis method and system for intelligent airport service
CN113824926A (en) * 2021-08-17 2021-12-21 衢州光明电力投资集团有限公司赋腾科技分公司 Portable video analysis device and method
CN113888827A (en) * 2021-10-14 2022-01-04 深圳市巨龙创视科技有限公司 Camera control method and system
CN114821934A (en) * 2021-12-31 2022-07-29 北京无线电计量测试研究所 Garden perimeter security control system and method
CN114639061A (en) * 2022-04-02 2022-06-17 山东博昂信息科技有限公司 Vehicle detection method, system and storage medium
CN114821957A (en) * 2022-05-13 2022-07-29 湖南工商大学 AI video analysis system and method
CN115278361B (en) * 2022-07-20 2023-08-01 重庆长安汽车股份有限公司 Driving video data extraction method, system, medium and electronic equipment


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131022B2 (en) * 2004-08-31 2012-03-06 Panasonic Corporation Surveillance recorder and its method
CN101854516B (en) * 2009-04-02 2014-03-05 北京中星微电子有限公司 Video monitoring system, video monitoring server and video monitoring method
CN103268680B (en) * 2013-05-29 2016-02-24 北京航空航天大学 A kind of family intelligent monitoring burglary-resisting system
CN106372576A (en) * 2016-08-23 2017-02-01 南京邮电大学 Deep learning-based intelligent indoor intrusion detection method and system
CN109002744A (en) * 2017-06-06 2018-12-14 中兴通讯股份有限公司 Image-recognizing method, device and video monitoring equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184388A (en) * 2011-05-16 2011-09-14 苏州两江科技有限公司 Face and vehicle adaptive rapid detection system and detection method
WO2017024963A1 (en) * 2015-08-11 2017-02-16 阿里巴巴集团控股有限公司 Image recognition method, measure learning method and image source recognition method and device
CN206164722U (en) * 2016-09-21 2017-05-10 深圳市泛海三江科技发展有限公司 Discuss super electronic monitoring system based on face identification
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN108122246A (en) * 2017-12-07 2018-06-05 中国石油大学(华东) Video monitoring intelligent identifying system
CN108596140A (en) * 2018-05-08 2018-09-28 青岛海信移动通信技术股份有限公司 A kind of mobile terminal face identification method and system
CN109241349A (en) * 2018-08-14 2019-01-18 中国电子科技集团公司第三十八研究所 A kind of monitor video multiple target classification retrieving method and system based on deep learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101154A (en) * 2020-09-02 2020-12-18 腾讯科技(深圳)有限公司 Video classification method and device, computer equipment and storage medium
CN112101154B (en) * 2020-09-02 2023-12-15 腾讯科技(深圳)有限公司 Video classification method, apparatus, computer device and storage medium
CN112183353A (en) * 2020-09-28 2021-01-05 腾讯科技(深圳)有限公司 Image data processing method and device and related equipment
CN112183353B (en) * 2020-09-28 2022-09-20 腾讯科技(深圳)有限公司 Image data processing method and device and related equipment

Also Published As

Publication number Publication date
WO2020168960A1 (en) 2020-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination