CN111582006A - Video analysis method and device

Info

Publication number
CN111582006A
Authority
CN
China
Prior art keywords
classification information
face
module
intercepted
neural network
Prior art date
Legal status
Pending
Application number
CN201910121021.1A
Other languages
Chinese (zh)
Inventor
范慧慧
王天宇
高在伟
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910121021.1A priority Critical patent/CN111582006A/en
Priority to PCT/CN2020/074895 priority patent/WO2020168960A1/en
Publication of CN111582006A publication Critical patent/CN111582006A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation


Abstract

Embodiments of the invention provide a video analysis method and apparatus. The method comprises: detecting a monitoring target in the collected video stream; intercepting a video image containing the monitoring target; and identifying the intercepted video image to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, the video image containing the monitoring target is intercepted, and only the intercepted video image is classified and identified, so that the amount of calculation is reduced.

Description

Video analysis method and device
Technical Field
The invention relates to the technical field of monitoring, in particular to a video analysis method and a video analysis device.
Background
In related schemes, monitoring equipment is usually arranged in an area that needs to be monitored; the monitoring equipment collects a video stream, and whether any person or vehicle has illegally intruded into the area is judged by analyzing the video stream. Because the video stream is analyzed as a whole in such schemes, the amount of calculation is large.
Disclosure of Invention
An embodiment of the invention provides a video analysis method and device to reduce the amount of calculation.
To achieve the above object, an embodiment of the present invention provides a video analysis method, including:
detecting a monitoring target in the collected video stream;
intercepting a video image containing the monitoring target;
and identifying the intercepted video image to obtain the classification information of each monitoring target.
Optionally, the detecting a monitoring target in the acquired video stream includes: detecting a moving object in the acquired video stream;
the intercepting of the video image containing the monitoring target comprises:
intercepting one or more frames of video images containing the moving target.
Optionally, the obtaining of the classification information of each monitoring target by identifying the intercepted video image includes:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
Optionally, the detecting a monitoring target in the acquired video stream includes: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting of the video image containing the monitoring target comprises:
intercepting a face area in an image containing a face according to the recognition result;
the obtaining of the classification information of each monitoring target by identifying the intercepted video image comprises:
matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
Optionally, the obtaining of the classification information of the face region by matching the intercepted face region with the face data stored in the face database includes:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
Optionally, after the classification information of each monitoring target is obtained by identifying the intercepted video image, the method further includes:
judging whether the classification information meets a preset alarm condition;
and if so, outputting alarm information.
Optionally, after obtaining the classification information of each moving object output by the first neural network model, the method further includes:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the moving target is personnel; or the classification information of the moving target is a vehicle.
Optionally, after the obtaining of the classification information of the face region, the method further includes:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the face region is: face data successfully matched with the modeling data exists or does not exist in the face database.
In order to achieve the above object, an embodiment of the present invention further provides a video analysis apparatus, including:
the detection module is used for detecting a monitoring target in the acquired video stream;
the intercepting module is used for intercepting a video image containing the monitoring target;
and the classification module is used for identifying the intercepted video image to obtain the classification information of each monitoring target.
Optionally, the detection module is specifically configured to: detecting a moving object in the acquired video stream;
the intercepting module is specifically configured to: intercepting one or more frames of video images containing the moving target.
Optionally, the classification module is specifically configured to:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
Optionally, the detection module is specifically configured to: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting module is specifically configured to: intercepting a face area in an image containing a face according to the recognition result;
the classification module is specifically configured to: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
Optionally, the classification module is specifically configured to:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
Optionally, the apparatus further comprises:
the first judgment module is used for judging whether the classification information meets the preset alarm condition; if so, triggering the first alarm module;
the first alarm module is used for outputting alarm information.
Optionally, the apparatus further comprises:
the second judgment module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle; if so, triggering the second alarm module;
and the second alarm module is used for outputting alarm information.
Optionally, the apparatus further comprises:
the third judging module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the face region is that face data successfully matched with the modeling data exists or does not exist in the face database; if so, triggering the third alarm module;
and the third alarm module is used for outputting alarm information.
In the embodiments of the invention, a monitoring target is detected in the collected video stream; a video image containing the monitoring target is intercepted; and the intercepted video image is identified to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, the video image containing the monitoring target is intercepted, and only the intercepted video image is classified and identified, so that the amount of calculation is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a first flowchart of a video analysis method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a video analysis method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of interaction between a monitoring point and an NVR according to an embodiment of the present invention;
fig. 4 is a third flowchart illustrating a video analysis method according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an interaction between another monitoring point and an NVR according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video analysis apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video analysis system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the foregoing technical problems, embodiments of the present invention provide a video analysis method and apparatus. The method and apparatus may be applied to a camera such as an IPC (IP Camera), to an NVR (Network Video Recorder), to other electronic devices, or to a video analysis system; this is not specifically limited. The video analysis method provided by an embodiment of the present invention is first described in detail below.
Fig. 1 is a first flowchart of a video analysis method according to an embodiment of the present invention, including:
S101: detecting a monitoring target in the acquired video stream.
For example, in one embodiment, the monitored target may be a moving target; in this case, S101 may include: moving objects are detected in the captured video stream. For example, an algorithm such as a frame difference method, a background subtraction algorithm, or an optical flow method may be used to detect a moving object in a video stream.
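As a concrete illustration (the patent does not fix any particular detection algorithm or parameters), the coarse moving-target detection step could be sketched as follows using OpenCV's MOG2 background subtractor; the binarization threshold and minimum contour area are illustrative assumptions:

```python
import cv2

def detect_moving_targets(video_path, min_area=500):
    """Yield frames that contain moving targets, with their bounding boxes.

    A minimal sketch of the coarse detection step; the threshold and
    min_area values are illustrative assumptions, not taken from the patent.
    """
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)  # foreground (motion) mask
        mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)[1]
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) >= min_area]  # drop tiny blobs
        if boxes:
            yield frame, boxes  # a frame containing moving targets
    cap.release()
```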
In another embodiment, the monitoring target may be a human face; in this case, S101 may include: and carrying out face recognition in the collected video stream to obtain a recognition result. For example, a face recognition algorithm may be used to identify a face in a video stream.
S102: intercepting a video image containing the monitoring target.
In the foregoing embodiment in which the monitoring target is a moving target, S102 may include: intercepting one or more frames of video images containing the moving target. The multiple frames of video images may form a short video clip, for example a clip covering several seconds before and after a key frame.
In another embodiment, the monitoring target is a face; in this case, S102 may include: intercepting a face region in the image containing the face according to the recognition result. Alternatively, one or more frames of video images including the face region may be cut out from the video stream according to the recognition result.
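As an illustrative sketch only (the patent does not specify which face recognition algorithm is used), the face-region interception could be implemented with an off-the-shelf detector such as an OpenCV Haar cascade; the cascade file and detection parameters below are assumptions:

```python
import cv2

# Haar-cascade face detector shipped with OpenCV; it stands in for the
# patent's unspecified face recognition algorithm.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def intercept_face_regions(frame):
    """Return one cropped image per detected face region in the frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [frame[y:y + h, x:x + w] for (x, y, w, h) in faces]
```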
S103: identifying the intercepted video image to obtain the classification information of each monitoring target.
In the foregoing embodiment in which the monitoring target is a moving target, S103 may include: inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying the moving targets in the video image by using the first neural network model to obtain the classification information of each moving target output by the first neural network model.
For example, the moving object may be a person, a vehicle, or the like. The first neural network model is a model for classifying the moving target. The process of training the first neural network model may include: acquiring a sample image to be trained, wherein the sample image can comprise moving targets such as people or vehicles; adding labels to various moving objects in the sample image, wherein the labels are the types of the moving objects, such as vehicles, personnel and the like; and inputting the sample image into a neural network with a preset structure, carrying out iterative adjustment on the neural network by taking the label as supervision, and obtaining a trained first neural network model when an iteration ending condition is met.
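The patent does not fix a network structure, so the following is only a minimal training sketch under assumed choices (a ResNet-18 backbone, an illustrative three-class label set, and standard SGD hyperparameters):

```python
import torch
import torch.nn as nn
from torchvision import models

CLASSES = ["personnel", "vehicle", "other"]  # illustrative moving-target labels

model = models.resnet18(num_classes=len(CLASSES))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_epoch(loader):
    """One pass over labeled sample images; the labels supervise the
    iterative adjustment of the network, as described above."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```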
The video image captured in S102 is input into the first neural network model, and the first neural network model can output the classification information of each moving object in the video image, where the classification information is the category of the moving object, such as vehicle, person, and the like.
For example, some scenes have a high security level and require perimeter protection, that is, judging whether people or vehicles enter the scene. By applying this embodiment, on one hand, the classification information of each moving target is obtained; if the classification information is personnel or vehicle, related personnel can be promptly reminded to perform subsequent processing, realizing effective perimeter precaution. On the other hand, moving-target detection, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images in the video stream is intercepted, and only that small part is finely identified, namely the classification information of the moving targets is identified by using the first neural network model. Compared with a scheme that analyzes the entire video stream, this scheme reduces the amount of calculation.
In the other embodiment, in which the monitoring target is a face, S103 may include: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
For example, some scenarios only allow authorized persons to enter, and stranger (unauthorized person) identification schemes need to be executed for the scenarios, and the present embodiment may be adopted in such cases. For example, the face data of authorized persons may be stored in the face database, and the face region captured in S102 is matched with the face database, that is, whether a person in the video stream is an authorized person is determined. The classification information of the face region may be: the face data successfully matched with the face area exists or does not exist in the face database; alternatively, the classification information of the face region may also be: authorized or unauthorized persons (strangers).
As another example, some scenarios may require identifying a designated person, such as an attendance checking scenario, or a VIP (important person) identification scenario, and the present embodiment may also be used in these scenarios. For example, the face data of the designated person may be stored in the face database, and the face region captured in S102 is matched with the face database, that is, whether the person in the video stream is the designated person is determined. The classification information of the face region may be: the face data successfully matched with the face area exists or does not exist in the face database; alternatively, the classification information of the face region may also be: designated person or non-designated person.
In one case, S103 may include: inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model; and obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
The second neural network model may be a face modeling model, which may convert the face image into modeling data, i.e., structural data. In this case, the face database stores the modeling data (structure data) converted by the second neural network model. And matching the modeling data obtained after the conversion of the face region intercepted in the step S102 with the modeling data in the face database, wherein if the matching is successful, the person corresponding to the face region is an authorized person or a designated person, and if the matching is unsuccessful, the person corresponding to the face region is an unauthorized person (stranger) or a non-designated person.
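As a minimal sketch of this matching step (the similarity measure and the 0.6 threshold are illustrative assumptions; the patent only requires that the modeling data be compared with the stored face data):

```python
import numpy as np

def match_face(modeling_data, face_db, threshold=0.6):
    """Return the matched person id, or None if no face data in the
    database matches the modeling data (i.e., a stranger)."""
    query = modeling_data / np.linalg.norm(modeling_data)
    for person_id, stored in face_db.items():
        stored = stored / np.linalg.norm(stored)
        if float(np.dot(query, stored)) >= threshold:  # cosine similarity
            return person_id
    return None
```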
By applying this embodiment, on one hand, the classification information of the face region is obtained; whether the person is an authorized person or a designated person can be judged according to the classification information, and related personnel can be promptly reminded to perform subsequent processing according to the judgment result, so that effective stranger alarming or designated-person identification can be realized. On the other hand, face recognition, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images (or image regions) in the video stream is intercepted, and only the intercepted part is finely recognized, namely face matching is performed, which reduces the amount of calculation compared with analyzing the entire video stream.
As an embodiment, after S103, the method may further include: judging whether the classification information meets a preset alarm condition; and if so, outputting alarm information.
In one embodiment, the monitoring target is a moving target, in which case the preset alarm condition may include: the classification information of the moving target is personnel; or the classification information of the moving target is a vehicle.
As described above, if perimeter precaution is required, that is, whether a person or a vehicle enters a scene is determined, the present embodiment may be adopted to determine whether the classification information of the moving object is a person or a vehicle, and if the determination result is yes, alarm information is output.
In another embodiment, the monitored target is a human face, and in this case, the preset alarm condition may include: the classification information of the face region is as follows: and the face data successfully matched with the modeling data exists or does not exist in the face database.
As described above, if a stranger (unauthorized person) identification scheme needs to be executed, the present embodiment may be adopted to determine whether face data successfully matched with the modeling data corresponding to the face region exists in the face database, if so, it indicates that the person corresponding to the face region is an authorized person, and if not, it indicates that the person corresponding to the face region is a stranger (unauthorized person), and output alarm information.
If the designated person needs to be identified, the embodiment can be adopted to judge whether the face database has face data successfully matched with the modeling data corresponding to the face area, if so, the person corresponding to the face area is indicated as the designated person, and alarm information is output.
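A minimal sketch of the alarm decisions for the cases described above; the condition sets come from the text, while the function names are illustrative:

```python
PERIMETER_ALARM_CLASSES = {"personnel", "vehicle"}

def perimeter_alarm(target_class):
    # Alarm when the moving target is classified as personnel or vehicle.
    return target_class in PERIMETER_ALARM_CLASSES

def stranger_alarm(matched_person_id):
    # Alarm when no matching face data exists in the face database.
    return matched_person_id is None

def designated_person_alarm(matched_person_id):
    # Alarm when matching face data does exist in the face database.
    return matched_person_id is not None
```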
In one embodiment, S101 and S102 may be performed by the IPC, which then sends the captured video image to the NVR, which performs the subsequent steps.
With the embodiment of the invention shown in Fig. 1, a monitoring target is detected in the collected video stream; a video image containing the monitoring target is intercepted; and the intercepted video image is identified to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, only the intercepted video images are classified and identified, so that the amount of calculation is reduced.
Fig. 2 is a second flowchart of a video analysis method according to an embodiment of the present invention, including:
S201: detecting a moving target in the collected video stream.
S202: intercepting one or more frames of video images containing the moving target.
S203: inputting the intercepted video image into a first neural network model obtained through pre-training, and classifying the moving targets in the video image by using the first neural network model to obtain the classification information of each moving target output by the first neural network model.
S204: judging whether the classification information meets a preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle. If yes, S205 is executed.
S205: outputting alarm information.
For example, in a scene where perimeter precaution is required, the embodiment of Fig. 2 of the present invention may be applied to judge whether a person or a vehicle enters the scene, and to raise an alarm if so.
By applying the embodiment of the invention shown in Fig. 2, on one hand, the video image is identified by using the first neural network model to obtain the classification information of each moving target; if the classification information is personnel or vehicle, related personnel can be promptly reminded to perform subsequent processing, realizing effective perimeter precaution. On the other hand, moving-target detection, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images in the video stream is intercepted, and only that small part is finely identified, namely the classification information of the moving targets is identified by using the first neural network model.
In some related schemes, an infrared detector is used for emitting infrared laser, the infrared laser forms a monitoring area, and when a person breaks into the monitoring area, the waveform of the infrared laser changes, so that whether the person breaks into the monitoring area can be judged based on the waveform of the infrared laser. However, in this scheme, the monitoring area formed by the infrared laser emitted by one infrared detector is limited, and if the area to be monitored is large, a plurality of infrared detectors need to be arranged, which is high in cost.
By adopting this embodiment, monitoring is carried out according to the images collected by the image collection equipment; multiple infrared detectors do not need to be arranged, so the monitoring cost is reduced.
An embodiment applied to a perimeter protection scene is described below with reference to Fig. 3:
A monitoring point (which may be an IPC) collects a video stream, performs moving-target detection on the video stream, intercepts one or more frames of video images containing moving targets according to the detection result, and sends the intercepted video images to the NVR.
The NVR receives a video image sent by the monitoring point, inputs the video image into a first neural network model obtained through pre-training, and classifies moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model. The classification information may be, but is not limited to, a person, a vehicle, an object, and the like.
Assume that the preset alarm condition is: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle. If the classification information of a moving target output by the first neural network model is vehicle or personnel, alarm information is output.
In this embodiment, alarm information is output only when the classification information meets the preset alarm condition, so false alarms caused by swaying vegetation, pet interference and lighting changes can be reduced, and the alarm accuracy is improved.
Fig. 4 is a third flowchart of a video analysis method according to an embodiment of the present invention, including:
S401: carrying out face recognition in the collected video stream to obtain a recognition result.
S402: intercepting a face region in the image containing the face according to the recognition result.
S403: inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model.
S404: obtaining the classification information of the face region by matching the modeling data with face data stored in a face database; the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
For example, a face image of an authorized person may be collected in advance, the face image may be converted into modeling data by using the second neural network model, and the modeling data obtained by the conversion may be stored in the face database as face data.
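A minimal sketch of this enrollment step; `face_model` stands for the second neural network model and is a hypothetical embedding function, not an API defined by the patent:

```python
face_db = {}  # person id -> modeling data (stored as face data)

def enroll_authorized_person(person_id, face_image, face_model):
    """Convert an authorized person's face image into modeling data with
    the second neural network model and store it in the face database."""
    face_db[person_id] = face_model(face_image)
```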
S405: judging whether the classification information meets a preset alarm condition, the preset alarm condition comprising: the classification information of the face region indicates that face data successfully matched with the modeling data exists or does not exist in the face database. If yes, go to S406.
S406: outputting alarm information.
For example, in a scene where a stranger (unauthorized person) needs to be identified, the embodiment of Fig. 4 of the present invention may be applied: the face data of authorized persons is stored in the face database, and the modeling data converted from the intercepted face region is matched with the face database, that is, whether the person in the video stream is an authorized person is judged. If the person in the video stream is judged to be a stranger (unauthorized person), an alarm is raised.
As another example, if a designated person needs to be identified, the embodiment of Fig. 4 of the present invention may also be applied: the face data of the designated person is stored in the face database, and the modeling data converted from the intercepted face region is matched with the face database, that is, whether the person in the video stream is the designated person is judged. If the person in the video stream is judged to be the designated person, an alarm is raised.
By applying the embodiment shown in Fig. 4 of the invention, on one hand, the classification information of the face region is obtained; whether the person is an authorized person or a designated person can be judged according to the classification information, and related personnel can be promptly reminded to perform subsequent processing according to the judgment result, so that effective stranger alarming or designated-person identification can be realized. On the other hand, face recognition, which can be understood as a coarse detection algorithm with a small amount of calculation, is first performed on the video stream; then a small part of the video images (or image regions) in the video stream is intercepted, and only the intercepted part is finely recognized, namely face matching is performed, which reduces the amount of calculation compared with analyzing the entire video stream.
An embodiment applied to a stranger alarm scenario is described below with reference to Fig. 5:
A monitoring point (which may be an IPC) collects a video stream, performs face recognition on the video stream, and, according to the recognition result, intercepts one or more frames of face images containing faces or intercepts the face regions in the images; the intercepted face images or face regions are then sent to the NVR. For convenience of description, the intercepted face image or face region is collectively referred to as a face image below.
The NVR receives the face image sent by the monitoring point, inputs the face image into a second neural network model obtained through pre-training, converts the face image into modeling data by using the second neural network model, and matches the modeling data obtained by the conversion with the face data stored in the face database. If the matching is successful, the person corresponding to the face region is an authorized person; if the matching is unsuccessful, the person corresponding to the face region is a stranger, and alarm information is output.
Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a video analysis apparatus, as shown in Fig. 6, including:
the detection module 601 is configured to detect a monitoring target in the acquired video stream;
an intercepting module 602, configured to intercept a video image including the monitoring target;
the classification module 603 is configured to obtain classification information of each monitoring target by identifying the captured video image.
As an embodiment, the detection module 601 is specifically configured to: detecting a moving object in the acquired video stream;
the intercepting module 602 is specifically configured to: intercepting one or more frames of video images containing the moving target.
As an embodiment, the classification module 603 is specifically configured to:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
As an embodiment, the detection module 601 is specifically configured to: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting module 602 is specifically configured to: intercepting a face region in an image containing a face according to the recognition result;
the classification module 603 is specifically configured to: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
As an embodiment, the classification module 603 is specifically configured to:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
As an embodiment, the apparatus further comprises: a first judging module and a first alarming module (not shown in the figure), wherein,
the first judgment module is used for judging whether the classification information meets the preset alarm condition; if so, triggering the first alarm module;
the first alarm module is used for outputting alarm information.
As an embodiment, the apparatus further comprises: a second judging module and a second alarm module (not shown in the figure), wherein,
the second judgment module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle; if so, triggering the second alarm module;
and the second alarm module is used for outputting alarm information.
As an embodiment, the apparatus further comprises: a third judging module and a third alarm module (not shown in the figure), wherein,
the third judging module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the face region is that face data successfully matched with the modeling data exists or does not exist in the face database; if so, triggering the third alarm module;
and the third alarm module is used for outputting alarm information.
In the embodiments of the invention, a monitoring target is detected in the collected video stream; a video image containing the monitoring target is intercepted; and the intercepted video image is identified to obtain the classification information of each monitoring target. In this scheme, the entire video stream is not classified and identified; instead, the video image containing the monitoring target is intercepted, and only the intercepted video image is classified and identified, so that the amount of calculation is reduced.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 7, including a processor 701 and a memory 702,
a memory 702 for storing a computer program;
the processor 701 is configured to implement any of the video analysis methods described above when executing the program stored in the memory 702.
The memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk memory. As an embodiment, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any one of the video analysis methods described above.
An embodiment of the present invention further provides a video analysis system, as shown in Fig. 8, including: a monitoring point and a processing device, wherein,
the monitoring point is used for detecting a monitoring target in the acquired video stream; intercepting a video image containing the monitoring target; sending the intercepted video image to the processing device;
and the processing equipment is used for receiving the video images and identifying the received video images to obtain the classification information of each monitoring target.
For example, the monitoring point may be an IPC, and the processing device may be an NVR; this is not specifically limited.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, the device embodiment, the computer-readable storage medium embodiment, and the system embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A method of video analysis, comprising:
detecting a monitoring target in the collected video stream;
intercepting a video image containing the monitoring target;
and identifying the intercepted video image to obtain the classification information of each monitoring target.
2. The method of claim 1, wherein detecting a surveillance target in the captured video stream comprises: detecting a moving object in the acquired video stream;
the intercepting of the video image containing the monitoring target comprises:
intercepting one or more frames of video images containing the moving target.
3. The method according to claim 2, wherein the obtaining of the classification information of each monitoring target by identifying the intercepted video image comprises:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
4. The method of claim 1, wherein detecting a surveillance target in the captured video stream comprises: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting of the video image containing the monitoring target comprises:
intercepting a face area in an image containing a face according to the recognition result;
the obtaining of the classification information of each monitoring target by identifying the intercepted video image comprises:
matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
5. The method of claim 4, wherein the obtaining the classification information of the face region by matching the intercepted face region with face data stored in a face database comprises:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
6. The method according to claim 1, wherein after the obtaining of the classification information of each monitoring target by identifying the intercepted video image, the method further comprises:
judging whether the classification information meets a preset alarm condition;
and if so, outputting alarm information.
7. The method of claim 3, further comprising, after said obtaining classification information for each moving object output by the first neural network model:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the moving target is personnel; or the classification information of the moving target is a vehicle.
8. The method according to claim 5, further comprising, after the obtaining the classification information of the face region:
judging whether the classification information meets a preset alarm condition; if so, outputting alarm information;
the preset alarm condition comprises:
the classification information of the face region is: face data successfully matched with the modeling data exists or does not exist in the face database.
9. A video analysis apparatus, comprising:
the detection module is used for detecting a monitoring target in the acquired video stream;
the intercepting module is used for intercepting a video image containing the monitoring target;
and the classification module is used for identifying the intercepted video image to obtain the classification information of each monitoring target.
10. The apparatus according to claim 9, wherein the detection module is specifically configured to: detecting a moving object in the acquired video stream;
the intercepting module is specifically configured to: intercepting one or more frames of video images containing the moving target.
11. The apparatus according to claim 10, wherein the classification module is specifically configured to:
inputting the intercepted video image into a first neural network model obtained by pre-training, and classifying moving targets in the video image by using the first neural network model to obtain classification information of each moving target output by the first neural network model.
12. The apparatus according to claim 9, wherein the detection module is specifically configured to: carrying out face recognition in the collected video stream to obtain a recognition result;
the intercepting module is specifically configured to: intercepting a face area in an image containing a face according to the recognition result;
the classification module is specifically configured to: matching the intercepted face region with face data stored in a face database to obtain the classification information of the face region.
13. The apparatus according to claim 12, wherein the classification module is specifically configured to:
inputting the intercepted face region into a second neural network model obtained by pre-training, and converting the face region into modeling data by using the second neural network model;
obtaining the classification information of the face region by matching the modeling data with face data stored in a face database, wherein the classification information comprises: face data successfully matched with the modeling data exists or does not exist in the face database.
14. The apparatus of claim 9, further comprising:
the first judgment module is used for judging whether the classification information meets the preset alarm condition; if so, triggering the first alarm module;
the first alarm module is used for outputting alarm information.
15. The apparatus of claim 11, further comprising:
the second judgment module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the moving target is personnel, or the classification information of the moving target is a vehicle; if so, triggering the second alarm module;
and the second alarm module is used for outputting alarm information.
16. The apparatus of claim 13, further comprising:
the third judging module is used for judging whether the classification information meets the preset alarm condition, the preset alarm condition comprising: the classification information of the face region is that face data successfully matched with the modeling data exists or does not exist in the face database; if so, triggering the third alarm module;
and the third alarm module is used for outputting alarm information.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910121021.1A CN111582006A (en) 2019-02-19 2019-02-19 Video analysis method and device
PCT/CN2020/074895 WO2020168960A1 (en) 2019-02-19 2020-02-12 Video analysis method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910121021.1A CN111582006A (en) 2019-02-19 2019-02-19 Video analysis method and device

Publications (1)

Publication Number Publication Date
CN111582006A 2020-08-25

Family

ID=72112900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910121021.1A Pending CN111582006A (en) 2019-02-19 2019-02-19 Video analysis method and device

Country Status (2)

Country Link
CN (1) CN111582006A (en)
WO (1) WO2020168960A1 (en)


Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148896A (en) * 2020-09-10 2020-12-29 京东数字科技控股股份有限公司 Data processing method and device for terminal media monitoring and broadcasting
CN112329517B (en) * 2020-09-17 2022-11-29 中国南方电网有限责任公司超高压输电公司南宁监控中心 Transformer substation disconnecting link confirmation video image analysis method and system
CN112464030A (en) * 2020-11-25 2021-03-09 浙江大华技术股份有限公司 Suspicious person determination method and device
CN112653874B (en) * 2020-12-01 2022-09-20 杭州勋誉科技有限公司 Storage device and intelligent video monitoring system
CN112818757A (en) * 2021-01-13 2021-05-18 上海应用技术大学 Gas station safety detection early warning method and system
CN114821844A (en) * 2021-01-28 2022-07-29 深圳云天励飞技术股份有限公司 Attendance checking method and device based on face recognition, electronic equipment and storage medium
CN112989934A (en) * 2021-02-05 2021-06-18 方战领 Video analysis method, device and system
CN113112754B (en) * 2021-03-02 2022-10-11 深圳市哈威飞行科技有限公司 Drowning alarm method, drowning alarm device, drowning alarm platform, drowning alarm system and computer readable storage medium
CN113139679A (en) * 2021-04-06 2021-07-20 青岛以萨数据技术有限公司 Urban road rescue early warning method, system and equipment based on neural network
CN113177459A (en) * 2021-04-25 2021-07-27 云赛智联股份有限公司 Intelligent video analysis method and system for intelligent airport service
CN113824926A (en) * 2021-08-17 2021-12-21 衢州光明电力投资集团有限公司赋腾科技分公司 Portable video analysis device and method
CN113888827A (en) * 2021-10-14 2022-01-04 深圳市巨龙创视科技有限公司 Camera control method and system
CN114821934A (en) * 2021-12-31 2022-07-29 北京无线电计量测试研究所 Garden perimeter security control system and method
CN114639061A (en) * 2022-04-02 2022-06-17 山东博昂信息科技有限公司 Vehicle detection method, system and storage medium
CN114821957A (en) * 2022-05-13 2022-07-29 湖南工商大学 AI video analysis system and method
CN115278361B (en) * 2022-07-20 2023-08-01 重庆长安汽车股份有限公司 Driving video data extraction method, system, medium and electronic equipment


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131022B2 (en) * 2004-08-31 2012-03-06 Panasonic Corporation Surveillance recorder and its method
CN101854516B (en) * 2009-04-02 2014-03-05 北京中星微电子有限公司 Video monitoring system, video monitoring server and video monitoring method
CN103268680B (en) * 2013-05-29 2016-02-24 北京航空航天大学 A kind of family intelligent monitoring burglary-resisting system
CN106372576A (en) * 2016-08-23 2017-02-01 南京邮电大学 Deep learning-based intelligent indoor intrusion detection method and system
CN109002744A (en) * 2017-06-06 2018-12-14 中兴通讯股份有限公司 Image-recognizing method, device and video monitoring equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184388A (en) * 2011-05-16 2011-09-14 苏州两江科技有限公司 Face and vehicle adaptive rapid detection system and detection method
WO2017024963A1 (en) * 2015-08-11 2017-02-16 阿里巴巴集团控股有限公司 Image recognition method, measure learning method and image source recognition method and device
CN206164722U (en) * 2016-09-21 2017-05-10 深圳市泛海三江科技发展有限公司 Discuss super electronic monitoring system based on face identification
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN108122246A (en) * 2017-12-07 2018-06-05 中国石油大学(华东) Video monitoring intelligent identifying system
CN108596140A (en) * 2018-05-08 2018-09-28 青岛海信移动通信技术股份有限公司 A kind of mobile terminal face identification method and system
CN109241349A (en) * 2018-08-14 2019-01-18 中国电子科技集团公司第三十八研究所 A kind of monitor video multiple target classification retrieving method and system based on deep learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101154A (en) * 2020-09-02 2020-12-18 腾讯科技(深圳)有限公司 Video classification method and device, computer equipment and storage medium
CN112101154B (en) * 2020-09-02 2023-12-15 腾讯科技(深圳)有限公司 Video classification method, apparatus, computer device and storage medium
CN112183353A (en) * 2020-09-28 2021-01-05 腾讯科技(深圳)有限公司 Image data processing method and device and related equipment
CN112183353B (en) * 2020-09-28 2022-09-20 腾讯科技(深圳)有限公司 Image data processing method and device and related equipment

Also Published As

Publication number Publication date
WO2020168960A1 (en) 2020-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination