WO2020168960A1 - Video analysis method and apparatus - Google Patents


Info

Publication number
WO2020168960A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
video image
classification information
information
intercepted
Application number
PCT/CN2020/074895
Other languages
French (fr)
Chinese (zh)
Inventor
范慧慧 (Fan Huihui)
王天宇 (Wang Tianyu)
高在伟 (Gao Zaiwei)
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2020168960A1


Classifications

    • G: PHYSICS; G06: COMPUTING; CALCULATING OR COUNTING
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F 18/00: Pattern recognition
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42: Higher-level, semantic classification or understanding of sport video content
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/168: Feature extraction; Face representation

Definitions

  • This application relates to the field of surveillance technology, and in particular to a video analysis method and device.
  • A monitoring device is installed in a scene that needs to be monitored. The monitoring device collects a video stream of the scene and analyzes it to determine whether persons or vehicles have illegally broken into the scene.
  • In the related art, the video stream is analyzed in its entirety, that is, the target in every frame of video image of the video stream is accurately classified and recognized, which requires a large amount of calculation.
  • the purpose of the embodiments of the present application is to provide a video analysis method and device to reduce the amount of calculation.
  • an embodiment of the present application provides a video analysis method, including:
  • the detecting the monitoring target in the collected video stream includes: detecting the moving target in the collected video stream;
  • the intercepting the video image containing the monitoring target from the video stream includes:
  • Intercepting one or more frames of video images containing the moving target from the video stream.
  • The classification and recognition of the monitoring target in the intercepted video image to obtain the classification information of the monitoring target includes:
  • Inputting the intercepted video image into a pre-trained first neural network model, and using the first neural network model to classify the moving target in the video image to obtain the classification information of the moving target output by the model.
  • the detecting the monitoring target in the collected video stream includes: performing face recognition in the collected video stream to obtain a recognition result;
  • the intercepting the video image containing the monitoring target from the video stream includes:
  • the classifying and identifying the monitoring target in the intercepted video image to obtain the classification information of the monitoring target includes:
  • the intercepted video image is matched with the face data stored in the face database to obtain the classification information of the face.
  • the matching the captured video image with the face data stored in the face database to obtain the classification information of the face includes:
  • the modeling data is matched with the face data stored in the face database to obtain the classification information of the face.
  • The classification information of the face is the first tag information or the second tag information. The first tag information indicates that the face database contains face data that successfully matches the modeling data, and the second tag information indicates that the face database contains no face data that successfully matches the modeling data.
  • the method further includes:
  • the method further includes:
  • the preset alarm condition includes: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle.
  • the method further includes:
  • the preset alarm condition includes: the classification information of the human face is the first tag information, or the classification information of the human face is the second tag information.
  • an embodiment of the present application also provides a video analysis device, including:
  • The detection module is used to detect the monitoring target in the collected video stream;
  • The interception module is used to intercept a video image containing the monitoring target from the video stream;
  • The classification module is used to classify and recognize the monitoring target in the intercepted video image to obtain the classification information of the monitoring target.
  • the detection module is specifically configured to: detect a moving target in the collected video stream;
  • the interception module is specifically configured to intercept one or more frames of video images containing the moving target from the video stream.
  • The classification module is specifically used for: inputting the intercepted video image into a pre-trained first neural network model, and using the first neural network model to classify the moving target in the video image to obtain the classification information of the moving target output by the model.
  • the detection module is specifically configured to: perform face recognition in the collected video stream to obtain a recognition result
  • the interception module is specifically configured to: according to the recognition result, intercept a face area from a video image containing a face in the video stream as the intercepted video image;
  • the classification module is specifically configured to match the intercepted video image with the face data stored in the face database to obtain the classification information of the face.
  • the classification module is specifically used for:
  • the modeling data is matched with the face data stored in the face database to obtain the classification information of the face.
  • The classification information of the face is the first tag information or the second tag information. The first tag information indicates that the face database contains face data that successfully matches the modeling data, and the second tag information indicates that the face database contains no face data that successfully matches the modeling data.
  • the device further includes:
  • the first judgment module is used to judge whether the classification information of the monitoring target meets the preset alarm condition; if it does, trigger the first alarm module;
  • the first alarm module is used to output alarm information.
  • the device further includes:
  • The second judgment module is used to judge whether the classification information of the moving target meets a preset alarm condition, where the preset alarm condition includes: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle; if the condition is met, the second alarm module is triggered;
  • the second alarm module is used to output alarm information.
  • the device further includes:
  • The third judgment module is used to judge whether the classification information of the face meets a preset alarm condition, where the preset alarm condition includes: the classification information of the face is the first tag information, or the classification information of the face is the second tag information; if the condition is met, the third alarm module is triggered;
  • the third alarm module is used to output alarm information.
  • an embodiment of the present application also provides an electronic device, including a processor and a memory;
  • the memory is used to store computer programs
  • the processor is configured to execute a program stored in the memory to implement any of the steps of the video analysis method described above.
  • an embodiment of the present application further provides a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the steps of any of the foregoing video analysis methods are implemented.
  • an embodiment of the present application further provides a computer program, which, when executed by a processor, implements the steps of any of the foregoing video analysis methods.
  • In the embodiments of the present application, the monitoring target is detected in the collected video stream; the video image containing the monitoring target is intercepted from the video stream; and the monitoring target in the intercepted video image is classified and recognized to obtain the classification information of the monitoring target. It can be seen that, in the solution provided by the embodiments of the present application, the monitoring targets in all video images of the video stream are not accurately classified and recognized; instead, the video images containing the monitoring target are intercepted, and only the monitoring targets in the intercepted video images are accurately classified and recognized, reducing the amount of calculation.
  • FIG. 1 is a schematic diagram of the first flow of a video analysis method provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of a second flow of a video analysis method provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of the interaction between a monitoring point and an NVR provided by an embodiment of the application
  • FIG. 4 is a schematic diagram of a third process of a video analysis method provided by an embodiment of the application.
  • FIG. 5 is another schematic diagram of the interaction between the monitoring point and the NVR provided by the embodiment of the application.
  • FIG. 6 is a schematic structural diagram of a video analysis device provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of a structure of an electronic device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of a video analysis system provided by an embodiment of the application.
  • embodiments of the present application provide a video analysis method and device.
  • The method and device can be applied to a camera, such as an IPC (IP Camera, i.e., a network camera), to an NVR (Network Video Recorder), to other electronic equipment, or to a video analysis system; this is not specifically limited.
  • FIG. 1 is a schematic diagram of the first flow of a video analysis method provided by an embodiment of this application, including:
  • the monitoring target may be a person, a vehicle, an object, an animal, etc.
  • the captured video stream includes multiple frames of video images. Based on this, the foregoing detection of the monitoring target in the collected video stream may specifically be: detecting the monitoring target in each frame of video image of the video stream. Since the video stream includes multiple frames of video images, the monitoring target may be detected in the multiple frames of video images included in the video stream.
  • the monitoring target may be a moving target.
  • S101 may include: detecting a moving target in the collected video stream.
  • The frame difference method, the background subtraction algorithm, or the optical flow method can be used to detect the moving target in the video stream.
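  • As an illustrative sketch (not part of the application), the frame difference method can be reduced to thresholding the mean absolute difference between two consecutive grayscale frames; the 4×4 frames, threshold value, and function name below are hypothetical:

```python
def frame_difference_motion(prev_frame, curr_frame, threshold=15.0):
    """Detect motion via the mean absolute pixel difference between two
    grayscale frames (2D lists of 0-255 ints). The threshold is a
    hypothetical tuning value, not taken from the application."""
    total, count = 0, 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += abs(p - c)
            count += 1
    return (total / count) > threshold

# A static scene versus one where a bright "target" enters:
static = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in static]
moved[1][1] = moved[1][2] = 255  # simulated moving object

print(frame_difference_motion(static, static))  # False: no change
print(frame_difference_motion(static, moved))   # True: motion detected
```

This deliberately cheap check matches the role described below for the "rough detection algorithm": only frames where it fires go on to the expensive classification step.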
  • S101 may include: detecting the moving target in each frame of video image of the video stream.
  • the monitoring target may be a human face.
  • S101 may include: performing face recognition in the collected video stream to obtain the recognition result.
  • a face recognition algorithm can be used to recognize faces in a video stream and obtain the recognition results.
  • S101 may include: performing face recognition in each frame of video image in the video stream to obtain the recognition result of each frame of video image.
  • the recognition result of each frame of video image may include a first recognition result or a second recognition result, where the first recognition result indicates that the video image includes a human face, and the second recognition result indicates that the video image does not include a human face.
  • the recognition result of a frame of video image includes the first recognition result
  • the recognition result of the frame of video image may also include position information of the face region in the video image.
  • the face area is the area where the face in the video image is located.
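  • As a hedged sketch of the per-frame recognition result described above (the class and field names are illustrative, not from the application), the first/second recognition result and the optional face-region position can be modeled as:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RecognitionResult:
    """Per-frame face recognition result: contains_face=True corresponds to
    the first recognition result, False to the second; face_region carries
    the position information of the face area, if any."""
    contains_face: bool
    face_region: Optional[Tuple[int, int, int, int]] = None  # (x, y, w, h)

hit = RecognitionResult(contains_face=True, face_region=(40, 20, 64, 64))
miss = RecognitionResult(contains_face=False)
print(hit.face_region)   # region later used to crop the face area in S102
print(miss.face_region)  # None: no face, nothing to intercept
```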
  • S102 Intercept a video image containing the monitoring target.
  • the video image is an image in a video stream.
  • the foregoing S102 may specifically be: intercepting a video image containing a monitoring target from a video stream.
  • the monitoring target is detected in multiple frames of video images, it is possible to intercept multiple frames of video images.
  • the monitoring target is a moving target.
  • S102 may include: intercepting one or more frames of video images containing the moving target from the video stream.
  • Specifically, S102 may include: if one frame of video image in the video stream contains a moving target, intercepting that video image from the video stream; if multiple frames of video images in the video stream all contain moving targets, intercepting those multiple frames of video images from the video stream.
  • Alternatively, it may be determined whether a key frame of the video stream contains a moving target. If it does, multiple frames of video images within a few seconds before and after the key frame are intercepted and formed into a short video. In this way, detection does not need to be run on all video images of the video stream, which further reduces the amount of calculation.
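  • A minimal sketch of intercepting the frames within a few seconds before and after a key frame, as described above (the timestamps, window width, and function name are hypothetical):

```python
def clip_around_key_frame(frame_times, key_time, window_s=2.0):
    """Return the indices of frames whose timestamps fall within
    +/- window_s seconds of the key frame, forming the short video.
    window_s is an assumed value; the application only says 'a few seconds'."""
    return [i for i, t in enumerate(frame_times)
            if abs(t - key_time) <= window_s]

# Toy stream: one frame every 0.5 s, key frame at t = 2.0 s.
times = [i * 0.5 for i in range(10)]  # 0.0, 0.5, ..., 4.5
print(clip_around_key_frame(times, key_time=2.0))  # indices near t = 2.0
```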
  • the monitoring target is a human face.
  • S102 may include: according to the above-mentioned recognition result, intercepting a face area from a video image containing a face in the video stream as the intercepted video image. Or, according to the recognition result, one or more frames of video images containing the face area may be intercepted in the video stream.
  • S102 may include: according to the above recognition result, if it is determined that a frame of the video image of the video stream contains a human face, intercepting the human face area from the video image as the intercepted video image.
  • S102 may further include: according to the above recognition result, if it is determined that one frame of the video image of the video stream contains a human face, intercepting the video image from the video stream.
  • S103 Obtain classification information of the monitored target by recognizing the intercepted video image.
  • the surveillance target in the intercepted video image is classified and identified to obtain the classification information of the surveillance target.
  • the monitoring target is a moving target.
  • Specifically, S103 may include: inputting the intercepted video image into the first neural network model obtained by pre-training, and using the first neural network model to classify the moving target in the video image to obtain the classification information of the moving target output by the first neural network model.
  • If the monitoring target is a moving target and multiple frames of video images are intercepted, the intercepted video images can be input into the pre-trained first neural network model, and the first neural network model is used to classify the moving target in each video image to obtain the classification information of the moving target contained in each video image output by the model.
  • the moving target can be people, vehicles, and so on.
  • the first neural network model is a model for classifying moving targets.
  • The process of training the first neural network model may include: obtaining sample images, which may contain moving targets such as people or vehicles; adding a label to each sample image indicating the type of its moving target, such as vehicle or person; inputting the sample images into a neural network of a preset structure and iteratively adjusting the parameters of the network with the labels as supervision; and completing training when the iteration end condition is met.
  • the first neural network model can be a deep neural network or a convolutional neural network.
  • the video image intercepted in S102 is input to the first neural network model, and the first neural network model outputs classification information of the moving target in the video image.
  • the classification information of the moving target is the type of the moving target, such as vehicles, people and so on.
  • The first neural network model is used to recognize the moving target in the video image and obtain its classification information. If the classification information is a person or a vehicle, the relevant personnel can be reminded in time for subsequent processing, achieving effective perimeter protection.
  • the moving target detection algorithm is a rough detection algorithm.
  • The calculation complexity of the moving target detection algorithm is low, so the amount of calculation is small; after the moving target is detected, only a small part of the video images in the video stream is intercepted, and accurate classification and recognition are performed only on the moving targets in that small part of the video images.
  • the first neural network model is used to identify the classification information of the moving targets.
  • In other words, accurate classification and recognition of moving targets is not performed on all video images of the video stream; compared with doing so, the amount of calculation is reduced.
  • the monitoring target is a human face.
  • S103 may include: matching the intercepted video image with the face data stored in the face database to obtain the classification information of the face contained in the intercepted video image.
  • the face data may be a face image or feature information extracted from a face image. The embodiment of the present application does not limit this.
  • If multiple video images are intercepted, each intercepted video image is matched with the face data stored in the face database to obtain the classification information of the face contained in each intercepted video image.
  • the face data of authorized persons can be stored in the face database, and the video image intercepted in S102 can be matched with the face data stored in the face database to determine whether the person in the video stream is an authorized person.
  • the classification information of the face contained in the video image may be the first tag information or the second tag information.
  • The first tag information indicates that the face database contains face data that successfully matches the intercepted video image, and the second tag information indicates that the face database contains no face data that successfully matches the intercepted video image.
  • the human face classification information contained in the intercepted video image may also be an authorized person or an unauthorized person. In this case, the person corresponding to the intercepted video image is an authorized person or an unauthorized person.
  • the face data of a designated person can be stored in the face database, and the video image intercepted in S102 can be matched with the face data stored in the face database to determine whether the person in the video stream is the designated person.
  • the classification information of the face contained in the video image may be the first tag information or the second tag information.
  • The first tag information indicates that the face database contains face data that successfully matches the intercepted video image, and the second tag information indicates that the face database contains no face data that successfully matches the intercepted video image.
  • the human face classification information contained in the intercepted video image may also be a designated person or a non-designated person. In this case, the person corresponding to the intercepted video image is a designated person or a non-designated person.
  • Specifically, S103 may include: inputting the intercepted video image into a second neural network model obtained by pre-training, using the second neural network model to convert the intercepted video image into modeling data, and matching the modeling data with the face data stored in the face database to obtain the classification information of the face contained in the intercepted video image.
  • the human face classification information contained in the intercepted video image is the first tag information or the second tag information.
  • the aforementioned modeling data is the data output after the second neural network model processes the intercepted video image.
  • If multiple video images are intercepted, they can be input into the pre-trained second neural network model, which converts each intercepted video image into its corresponding modeling data; the modeling data corresponding to each intercepted video image is then matched with the face data stored in the face database to obtain the classification information of the face contained in each intercepted video image.
  • The second neural network model may be a face modeling model, which converts a face image into modeling data; the modeling data is a kind of structured data.
  • the process of training to obtain the second neural network model may include: obtaining a sample face image and labels of objects in the sample face image; inputting the sample face image into a neural network with a preset structure to The label is supervised, and the parameters of the neural network are adjusted iteratively; when the iteration end condition is met, the second neural network model that has been trained is obtained.
  • the neural network with a preset structure can be a deep neural network or a convolutional neural network.
  • The network layer that outputs the modeling data may be specified first.
  • the face data stored in the face database is modeling data obtained after the sample face image is transformed by the second neural network model. That is, the face data stored in the face database is data output by the specified network layer of the second neural network model after the sample face image is input to the second neural network model.
  • The modeling data obtained by converting the video image intercepted in S102 is matched with the face data in the face database. If the matching succeeds, the person corresponding to the intercepted video image is an authorized person or a designated person; if the matching fails, the person corresponding to the intercepted video image is an unauthorized person or an unspecified person.
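  • As an illustrative sketch of matching modeling data against the face database (the application does not specify a distance measure; cosine similarity, the threshold, the tag labels, and all names below are assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_face(modeling_data, face_db, threshold=0.9):
    """Return the first tag (match found) or second tag (no match).
    face_db maps person IDs to stored modeling data; the threshold is
    a hypothetical tuning value."""
    for person_id, stored in face_db.items():
        if cosine_similarity(modeling_data, stored) >= threshold:
            return ("first_tag", person_id)   # face data matched in database
    return ("second_tag", None)               # no match: stranger/unspecified

db = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.0]}
probe = [0.98, 0.02, 0.21]  # modeling data of an intercepted face
print(match_face(probe, db))
```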
  • The second neural network model is used to obtain the classification information of the face contained in the intercepted video image. Based on the classification information, it is determined whether the person is an authorized person or a designated person, and relevant personnel are promptly reminded according to the judgment result for follow-up processing, which enables effective stranger alarms or identification of designated personnel.
  • face recognition is performed on the video stream first.
  • the face recognition algorithm is a rough detection algorithm.
  • The calculation complexity of the face recognition algorithm is low, so the amount of calculation is small; after a face is detected, only a small part of the video images or image areas in the video stream is intercepted, and accurate classification and recognition are performed only on the faces in those intercepted images or areas to determine the classification information of the faces.
  • In other words, accurate classification and recognition of faces is not performed on all video images of the video stream; compared with doing so, the amount of calculation is reduced.
  • After S103, the method may further include: judging whether the classification information of the monitoring target meets the preset alarm condition; if so, outputting alarm information; if not, no processing is required.
  • the monitoring target is a moving target.
  • the preset alarm condition may include: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle.
  • Taking as an example a preset alarm condition in which the classification information of the moving target is a person and/or the classification information of the moving target is a vehicle: it is judged whether the classification information of the moving target is a person or a vehicle, and if the judgment result is a person or a vehicle, alarm information is output.
  • the monitoring target is a human face.
  • the preset alarm condition may include: the classification information of the human face is the first tag information, or the classification information of the human face is the second tag information.
  • In one case, the face data of authorized persons is stored in the face database, and it is judged whether the face database contains face data that successfully matches the modeling data corresponding to the intercepted video image. If such data exists, the person corresponding to the intercepted video image is an authorized person, the classification information of the face contained in the intercepted video image is an authorized person, and no processing is required. If it does not exist, the person corresponding to the intercepted video image is a stranger, the classification information of the face contained in the intercepted video image is a stranger, and alarm information is output.
  • In another case, the face data of a designated person is stored in the face database, and it is determined whether the face database contains face data that successfully matches the modeling data corresponding to the intercepted video image. If it exists, the person corresponding to the intercepted video image is the designated person, the classification information of the face contained in the intercepted video image is the designated person, and alarm information is output. If it does not exist, the person corresponding to the intercepted video image is an unspecified person, the classification information of the face contained in the intercepted video image is an unspecified person, and no processing is required.
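  • The two preset alarm conditions above can be sketched as follows (the tag strings and mode names are hypothetical labels, not from the application):

```python
def moving_target_alarm(classification):
    """Preset alarm condition for moving targets: the classification
    information is a person, and/or it is a vehicle."""
    return classification in ("person", "vehicle")

def face_alarm(tag, database_mode):
    """Alarm on the second tag (no match: stranger) when the database
    stores authorized persons, or on the first tag (match: designated
    person found) when it stores designated persons."""
    if database_mode == "authorized":
        return tag == "second_tag"    # stranger alarm
    if database_mode == "designated":
        return tag == "first_tag"     # designated-person alarm
    raise ValueError(f"unknown database mode: {database_mode}")

print(moving_target_alarm("vehicle"))            # True
print(face_alarm("second_tag", "authorized"))    # True: stranger detected
print(face_alarm("second_tag", "designated"))    # False: unspecified person
```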
  • In one implementation, S101 and S102 may be executed by an IPC, which then sends the intercepted video image to an NVR, and the NVR executes the subsequent steps.
  • In summary, the monitoring target is detected in the collected video stream; the video image containing the monitoring target is intercepted from the video stream; and the monitoring target in the intercepted video image is classified and recognized to obtain its classification information. It can be seen that, in the solution provided by the embodiment of the present application, the monitoring targets in all video images of the video stream are not accurately classified and recognized; instead, the video images containing the monitoring target are intercepted, and only the monitoring targets in the intercepted video images are accurately classified and recognized, reducing the amount of calculation.
  • FIG. 2 is a schematic diagram of the second flow of the video analysis method provided by an embodiment of the application, including:
  • S202 Intercept one or more frames of video images containing the moving target from the video stream.
  • S203 Input the intercepted video image into the pre-trained first neural network model, and use the first neural network model to classify the moving target in the intercepted video image to obtain the moving target output by the first neural network model Classification information.
  • S204 Determine whether the classification information meets the preset alarm condition; where the preset alarm condition includes: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle. If it matches, execute S205.
  • the classification information in the above step S204 is the classification information of the moving target.
  • If the classification information of the moving target is a person or a vehicle, it is determined in S204 that the classification information meets the preset alarm condition, and S205 is executed to output alarm information. If the classification information of the moving target is neither a person nor a vehicle, it is determined that the preset alarm condition is not met, and no processing is performed.
  • the embodiment shown in FIG. 2 of the present application can be used to determine whether a person or vehicle enters the scene, and an alarm is issued if the determination result is yes.
  • the first neural network model is used to identify the moving target in the video image to obtain the classification information of the moving target. If the classification information is a person or a vehicle, the relevant personnel can be reminded for subsequent processing in time. Achieve effective perimeter prevention.
  • the moving target detection algorithm can be understood as a rough detection algorithm.
  • the computational complexity of the moving target detection algorithm is low and the amount of calculation is small; after a moving target is detected, only a small portion of the video images in the video stream is intercepted, and accurate classification and recognition of the moving target is performed only on this small portion.
  • the first neural network model is used to identify the classification information of the moving target.
  • in other words, the solution provided in the embodiment of this application does not perform accurate classification and recognition of moving targets in all video images of the video stream; compared with doing so, the amount of calculation is reduced.
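The two-stage idea described above (a cheap motion detector on every frame, the expensive classifier only on intercepted frames) can be sketched roughly as follows. The frame format, difference threshold, and `classify()` stub are illustrative assumptions, not the application's actual algorithms.

```python
def frame_diff(prev, curr):
    """Mean absolute pixel difference between two grayscale frames."""
    total = sum(abs(p - c)
                for row_p, row_c in zip(prev, curr)
                for p, c in zip(row_p, row_c))
    return total / (len(curr) * len(curr[0]))

def classify(frame):
    # Stand-in for the expensive first neural network model.
    return "person"

def analyze(stream, threshold=10.0):
    """Run the cheap detector on every frame; classify only intercepted ones."""
    intercepted = []
    prev = stream[0]
    for curr in stream[1:]:
        if frame_diff(prev, curr) > threshold:  # rough motion detection
            intercepted.append((curr, classify(curr)))
        prev = curr
    return intercepted

static = [[0] * 4 for _ in range(4)]   # toy 4x4 grayscale frames
changed = [[50] * 4 for _ in range(4)]
results = analyze([static, static, changed, static])
print(len(results))  # only the frames around the change reach the classifier
```

The saving is exactly the point made in the text: `classify` runs on two frames here instead of all four.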
  • in a related solution, infrared detectors are used to emit infrared lasers, and the area covered by the infrared lasers forms a monitoring area.
  • when someone breaks in, the waveform of the infrared laser changes, so whether someone has broken into the monitoring area can be judged from the waveform of the infrared laser.
  • in this infrared-laser-based monitoring solution, since the infrared laser emitted by one infrared detector covers a limited area, multiple infrared detectors need to be installed if the monitoring area is large, and the monitoring cost is high.
  • in the solution of this application, the monitoring area is monitored based on the images collected by an image acquisition device.
  • the field of view of one image acquisition device is relatively large, so a single image acquisition device can monitor a larger area, and its cost is lower than that of multiple infrared detectors, which reduces monitoring costs.
  • the following describes an implementation manner in which the video analysis method provided in an embodiment of the present application is applied in a perimeter defense scenario in conjunction with FIG. 3.
  • the monitoring point in Figure 3 can be an IPC (IP camera).
  • the monitoring point collects the video stream and performs moving target detection on it. According to the detection result, one or more frames of video images containing the moving target are intercepted from the video stream, and the intercepted video images are sent to the NVR (network video recorder).
  • the NVR receives the video image sent by the monitoring point, inputs it into the pre-trained first neural network model, and uses the first neural network model to classify the moving target in the video image, obtaining the classification information of the moving target output by the model.
  • the classification information of the moving target can be persons, vehicles, objects, etc., and is not specifically limited.
  • the preset alarm condition is: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle. If the classification information of the moving target output by the first neural network model is a vehicle or a person, the NVR outputs alarm information.
  • outputting the alarm information in this way can reduce false alarms caused by disturbances, pet interference, and light changes, and improve the alarm accuracy.
  • FIG. 4 is a schematic diagram of a third process of a video analysis method provided by an embodiment of this application, including:
  • S401 Perform face recognition in the collected video stream to obtain a recognition result.
  • Step S402 may specifically be: according to the recognition result, intercepting a face area in a video image containing a face in the video stream as the intercepted video image.
  • the captured video image can be considered as a face image.
  • S403 Input the intercepted face area into a second neural network model obtained through pre-training, and use the second neural network model to convert the face area into modeling data.
  • Step S403 may specifically include: inputting the intercepted video image into a second neural network model obtained in advance, and using the second neural network model to convert the intercepted video image into modeling data.
  • S404 Obtain classification information of the face region by matching the modeling data with the face data stored in the face database.
  • Step S404 may specifically be: matching the modeling data with the face data stored in the face database to obtain the classification information of the face contained in the intercepted video image.
  • the classification information of the human face contained in the intercepted video image is the first tag information or the second tag information.
  • the face image of an authorized person or a designated person can be collected in advance, the face image can be converted into modeling data using the second neural network model, and the converted modeling data can be stored as face data in the face database in. Then, the second neural network model is used to convert the intercepted video image into modeling data, and the modeling data is matched with the face data stored in the face database to obtain the classification information of the face area.
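A minimal sketch of this enroll-then-match flow, assuming a toy modeling-data vector and cosine-similarity matching. The threshold, vectors, and tag-name strings are illustrative assumptions, not taken from this application.

```python
import math

def cosine(a, b):
    """Cosine similarity between two modeling-data vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Enrollment: modeling data of authorized persons stored in the face database.
face_db = {"authorized_person": [0.9, 0.1, 0.3]}

def classify_face(modeling_data, threshold=0.95):
    """Return first tag info on a successful match, second tag info otherwise."""
    for enrolled in face_db.values():
        if cosine(modeling_data, enrolled) >= threshold:
            return "first_tag"   # database contains a matching face
    return "second_tag"          # no matching face: stranger

print(classify_face([0.9, 0.1, 0.3]))   # enrolled vector: first_tag
print(classify_face([-0.3, 0.9, 0.1]))  # dissimilar vector: second_tag
```

In practice the vectors would come from the second neural network model, and the similarity metric and threshold would be tuned to that model.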
  • S405 Determine whether the classification information meets the preset alarm condition, where the preset alarm condition includes: the classification information of the face contained in the intercepted video image is the first tag information, or the classification information of the face contained in the intercepted video image is the second tag information. If the condition is met, execute S406.
  • the classification information in the above step S405 is the classification information of the human face contained in the intercepted video image.
  • in one case, the preset alarm condition includes that the classification information of the face contained in the intercepted video image is the first tag information. If the classification information obtained in step S404 is the first tag information, step S406 is executed to output the alarm information; if it is the second tag information, the preset alarm condition is not met and no processing is required.
  • in another case, the preset alarm condition includes that the classification information of the face contained in the intercepted video image is the second tag information. If the classification information obtained in step S404 is the second tag information, step S406 is executed to output the alarm information; if it is the first tag information, the preset alarm condition is not met and no processing is required.
  • the embodiment in Figure 4 of this application can be applied as follows: the face data of authorized persons is stored in a face database, and the modeling data obtained by converting the intercepted video image is matched against the face data stored in the database to determine whether the person in the video stream is an authorized person. If the person is determined to be a stranger, an alarm is issued.
  • the second neural network model is used to obtain the classification information of the face contained in the intercepted video image; according to this classification information, it is determined whether the person is an authorized or designated person, and based on the result the relevant personnel are promptly reminded for follow-up processing, which enables effective stranger alarms or identification of designated personnel.
  • face recognition is performed on the video stream first.
  • the face recognition algorithm can be understood as a rough detection algorithm: its computational complexity is low and the amount of calculation is small. After a face is detected, only a small portion of the video images, or only the face regions within those images, is intercepted from the video stream, and accurate classification and recognition, that is, face matching, is performed only on this small portion. This solution does not perform accurate classification and recognition of faces in all video images of the video stream; compared with doing so, the amount of calculation is reduced.
  • the following describes an implementation manner in which the video analysis method provided by an embodiment of the present application is applied to a stranger alarm scenario with reference to FIG. 5.
  • the monitoring point in Figure 5 can be an IPC.
  • the monitoring point collects the video stream, performs face recognition on it, and, according to the recognition result, intercepts one or more frames of video images containing faces from the video stream, or intercepts the face areas in those video images; the intercepted video images or face areas are then sent to the NVR.
  • the intercepted video images or face regions are collectively referred to as face images.
  • the NVR receives the face image sent by the monitoring point, inputs it into the pre-trained second neural network model, and uses the second neural network model to convert the face image into modeling data; the converted modeling data is then matched with the face data stored in the face database. If the matching is successful, the person corresponding to the face image is an authorized person, and the classification information of the face contained in the face image is "authorized person". If the matching is unsuccessful, the person corresponding to the face image is a stranger, the classification information of the face is "stranger", and an alarm message is output.
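The NVR-side decision just described can be sketched as follows. The `embed()` stub, the tolerance-based matcher, and the database contents are illustrative assumptions standing in for the second neural network model and the face database.

```python
# Toy face database: name -> enrolled modeling data (assumed 2-d vectors).
FACE_DB = [("employee_1", (1.0, 0.0)), ("employee_2", (0.0, 1.0))]

def embed(face_image):
    # Placeholder for the second neural network model: image -> modeling data.
    return face_image

def match(modeling_data, tol=0.1):
    """Return the enrolled name whose data is within tolerance, else None."""
    for name, enrolled in FACE_DB:
        if all(abs(a - b) <= tol for a, b in zip(modeling_data, enrolled)):
            return name
    return None

def nvr_handle(face_image):
    """Match succeeds -> authorized person; fails -> stranger plus alarm."""
    name = match(embed(face_image))
    if name is None:
        return ("stranger", "ALARM: stranger detected")
    return ("authorized: " + name, None)

print(nvr_handle((1.0, 0.05)))  # close to employee_1: authorized, no alarm
print(nvr_handle((0.5, 0.5)))   # matches nobody: stranger alarm
```

Only the final decision logic matters here; a real system would use learned embeddings and a calibrated distance threshold rather than element-wise tolerance.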
  • an embodiment of the present application also provides a video analysis device, as shown in FIG. 6, including:
  • the detection module 601 is used to detect a monitoring target in the collected video stream;
  • the interception module 602 is used to intercept the video image containing the monitoring target from the video stream;
  • the classification module 603 is used to classify and recognize the monitoring target in the intercepted video image to obtain the classification information of the monitoring target.
  • the detection module 601 is specifically configured to: detect a moving target in the collected video stream;
  • the interception module 602 is specifically configured to intercept one or more frames of video images containing the moving target from the video stream.
  • the classification module 603 is specifically configured to: input the intercepted video image into the pre-trained first neural network model, and use the first neural network model to classify the moving target in the intercepted video image to obtain the classification information of the moving target output by the first neural network model.
  • the detection module 601 is specifically configured to: perform face recognition in the collected video stream to obtain a recognition result;
  • the interception module 602 is specifically used for: intercepting the face area in the video image containing the face in the video stream according to the recognition result as the intercepted video image;
  • the classification module 603 is specifically configured to match the captured video image with the face data stored in the face database to obtain the classification information of the face.
  • in this case, the classification module 603 is specifically configured to: input the intercepted video image into the pre-trained second neural network model, and use the second neural network model to convert the intercepted video image into modeling data;
  • the modeling data is then matched with the face data stored in the face database to obtain the classification information of the face.
  • the classification information of the face is the first tag information or the second tag information; the first tag information indicates that the face database contains face data that successfully matches the modeling data, and the second tag information indicates that the face database contains no face data that successfully matches the modeling data.
  • the above-mentioned video analysis device may further include: a first judgment module and a first alarm module (not shown in the figure), wherein:
  • the first judgment module is used to judge whether the classification information of the monitoring target meets the preset alarm condition; if it meets, the first alarm module is triggered;
  • the first alarm module is used to output alarm information.
  • the above-mentioned video analysis device may further include: a second judgment module and a second alarm module (not shown in the figure), wherein:
  • the second judgment module is used to judge whether the classification information of the moving target meets the preset alarm conditions;
  • the preset alarm conditions include: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle; if the condition is met, trigger the second alarm module;
  • the second alarm module is used to output alarm information.
  • the above-mentioned video analysis device may further include: a third judgment module and a third alarm module (not shown in the figure), wherein:
  • the third judgment module is used to judge whether the classification information of the face meets the preset alarm condition;
  • the preset alarm condition includes: the classification information of the face is the first tag information, or the classification information of the face is the second tag information; if the condition is met, trigger the third alarm module;
  • the third alarm module is used to output alarm information.
  • in the embodiments of this application, the monitoring target is detected in the collected video stream; the video image containing the monitoring target is intercepted from the video stream; and the monitoring target in the intercepted video image is classified and identified to obtain its classification information. It can be seen that the solution provided by the embodiments of this application does not perform accurate classification and recognition on the monitoring targets in all video images of the video stream; instead, the video image containing the monitoring target is intercepted, and accurate classification and recognition is performed only on the monitoring target in that image, which reduces the amount of calculation.
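The module decomposition above (detection module 601, interception module 602, classification module 603) can be sketched as a simple composition, with toy stand-ins for each module. All stubs are illustrative assumptions, not the application's implementation.

```python
class VideoAnalysisDevice:
    """Illustrative composition of the three modules in FIG. 6."""

    def __init__(self, detect, intercept, classify):
        self.detect = detect        # detection module 601
        self.intercept = intercept  # interception module 602
        self.classify = classify    # classification module 603

    def analyze(self, stream):
        hits = self.detect(stream)             # where targets appear
        frames = self.intercept(stream, hits)  # intercept only those frames
        return [self.classify(f) for f in frames]

device = VideoAnalysisDevice(
    detect=lambda s: [i for i, frame in enumerate(s) if frame],  # toy detector
    intercept=lambda s, idx: [s[i] for i in idx],
    classify=lambda frame: "person",  # stand-in for the neural network model
)
print(device.analyze([0, 1, 0, 1]))  # only 2 of 4 frames are classified
```

The composition makes the cost argument concrete: the classifier is invoked once per intercepted frame, not once per frame of the stream.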
  • An embodiment of the present application also provides an electronic device, as shown in FIG. 7, including a processor 701 and a memory 702,
  • the memory 702 is used to store a computer program;
  • the processor 701 is configured to implement any of the above-mentioned video analysis methods when executing a program stored in the memory 702.
  • the memory mentioned in the above electronic device may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
  • the memory may also be at least one storage device located far away from the foregoing processor.
  • the aforementioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the embodiments of the present application also provide a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, any of the above-mentioned video analysis methods is implemented.
  • the embodiments of the present application also provide a computer program, which implements any of the above-mentioned video analysis methods when the computer program is executed by a processor.
  • An embodiment of the present application also provides a video analysis system, as shown in FIG. 8, including: a monitoring point and processing equipment, where:
  • the monitoring point is used to detect the monitoring target in the collected video stream; intercept the video image containing the monitoring target from the video stream; send the intercepted video image to the processing device;
  • the processing device is used to receive the video image, identify the monitoring target in the received video image, and obtain the classification information of the monitoring target.
  • the monitoring point may be an IPC, and the processing device may be an NVR; this is not specifically limited.
  • in the embodiments of this application, the monitoring target is not accurately identified in all video images of the video stream; instead, the video image containing the monitoring target is intercepted, and only the intercepted video image is accurately identified, which reduces the amount of calculation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a video analysis method and apparatus. The method comprises: detecting a monitored target in a collected video stream (S101); capturing a video image containing the monitored target (S102); and performing recognition on the captured video image to obtain classification information of each monitored target (S103). It can be seen that the method involves capturing a video image containing a monitored target and performing classification and recognition only on the captured video image, instead of performing classification and recognition on the whole video stream, thereby reducing the amount of calculations.

Description

Video analysis method and device

This application claims priority to the Chinese patent application No. 201910121021.1, filed with the Chinese Patent Office on February 19, 2019 and entitled "Video analysis method and device", the entire contents of which are incorporated herein by reference.
Technical field

This application relates to the field of surveillance technology, and in particular to a video analysis method and device.
Background

In a related solution, a monitoring device is installed in a scene that needs to be monitored; the monitoring device collects a video stream of the scene and analyzes it to determine whether any person or vehicle has illegally entered the scene. This solution analyzes the video stream as a whole, that is, it performs accurate classification and recognition on the target in every frame of the video stream, which requires a large amount of calculation.
Summary of the invention

The purpose of the embodiments of this application is to provide a video analysis method and device so as to reduce the amount of calculation.
To achieve the above objective, an embodiment of this application provides a video analysis method, including:

detecting a monitoring target in a collected video stream;

intercepting, from the video stream, a video image containing the monitoring target;

performing classification and recognition on the monitoring target in the intercepted video image to obtain classification information of the monitoring target.
Optionally, detecting the monitoring target in the collected video stream includes: detecting a moving target in the collected video stream;

and intercepting the video image containing the monitoring target from the video stream includes:

intercepting, from the video stream, one or more frames of video images containing the moving target.
Optionally, performing classification and recognition on the monitoring target in the intercepted video image to obtain the classification information of the monitoring target includes:

inputting the intercepted video image into a pre-trained first neural network model, and using the first neural network model to classify the moving target in the intercepted video image to obtain the classification information of the moving target output by the first neural network model.
Optionally, detecting the monitoring target in the collected video stream includes: performing face recognition in the collected video stream to obtain a recognition result;

intercepting the video image containing the monitoring target from the video stream includes:

intercepting, according to the recognition result, a face region from a video image containing a face in the video stream as the intercepted video image;

and performing classification and recognition on the monitoring target in the intercepted video image to obtain the classification information of the monitoring target includes:

matching the intercepted video image with face data stored in a face database to obtain classification information of the face.
Optionally, matching the intercepted video image with the face data stored in the face database to obtain the classification information of the face includes:

inputting the intercepted video image into a pre-trained second neural network model, and using the second neural network model to convert the intercepted video image into modeling data;

matching the modeling data with the face data stored in the face database to obtain the classification information of the face, where the classification information of the face is first tag information or second tag information, the first tag information indicating that the face database contains face data that successfully matches the modeling data, and the second tag information indicating that the face database contains no face data that successfully matches the modeling data.
Optionally, after performing classification and recognition on the monitoring target in the intercepted video image to obtain the classification information of the monitoring target, the method further includes:

determining whether the classification information of the monitoring target meets a preset alarm condition;

and if so, outputting alarm information.
Optionally, after obtaining the classification information of the moving target output by the first neural network model, the method further includes:

determining whether the classification information of the moving target meets a preset alarm condition, and if so, outputting alarm information;

where the preset alarm condition includes: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle.
Optionally, after obtaining the classification information of the face, the method further includes:

determining whether the classification information of the face meets a preset alarm condition, and if so, outputting alarm information;

where the preset alarm condition includes: the classification information of the face is the first tag information, or the classification information of the face is the second tag information.
To achieve the above objective, an embodiment of this application also provides a video analysis device, including:

a detection module, configured to detect a monitoring target in a collected video stream;

an interception module, configured to intercept, from the video stream, a video image containing the monitoring target;

a classification module, configured to perform classification and recognition on the monitoring target in the intercepted video image to obtain classification information of the monitoring target.
Optionally, the detection module is specifically configured to detect a moving target in the collected video stream;

and the interception module is specifically configured to intercept, from the video stream, one or more frames of video images containing the moving target.
Optionally, the classification module is specifically configured to:

input the intercepted video image into a pre-trained first neural network model, and use the first neural network model to classify the moving target in the intercepted video image to obtain the classification information of the moving target output by the first neural network model.
Optionally, the detection module is specifically configured to perform face recognition in the collected video stream to obtain a recognition result;

the interception module is specifically configured to intercept, according to the recognition result, a face region from a video image containing a face in the video stream as the intercepted video image;

and the classification module is specifically configured to match the intercepted video image with face data stored in a face database to obtain classification information of the face.
Optionally, the classification module is specifically configured to:

input the intercepted video image into a pre-trained second neural network model, and use the second neural network model to convert the intercepted video image into modeling data;

match the modeling data with the face data stored in the face database to obtain the classification information of the face, where the classification information of the face is first tag information or second tag information, the first tag information indicating that the face database contains face data that successfully matches the modeling data, and the second tag information indicating that the face database contains no face data that successfully matches the modeling data.
Optionally, the device further includes:

a first judgment module, configured to determine whether the classification information of the monitoring target meets a preset alarm condition and, if so, to trigger a first alarm module;

the first alarm module, configured to output alarm information.
Optionally, the device further includes:

a second judgment module, configured to determine whether the classification information of the moving target meets a preset alarm condition, the preset alarm condition including that the classification information of the moving target is a person and/or that the classification information of the moving target is a vehicle, and, if so, to trigger a second alarm module;

the second alarm module, configured to output alarm information.
Optionally, the device further includes:

a third judgment module, configured to determine whether the classification information of the face meets a preset alarm condition, the preset alarm condition including that the classification information of the face is the first tag information or that the classification information of the face is the second tag information, and, if so, to trigger a third alarm module;

the third alarm module, configured to output alarm information.
To achieve the above objective, an embodiment of this application also provides an electronic device, including a processor and a memory;

the memory is configured to store a computer program;

the processor is configured to execute the program stored in the memory to implement the steps of any of the video analysis methods described above.
To achieve the above objective, an embodiment of this application also provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the steps of any of the video analysis methods described above are implemented.
To achieve the above objective, an embodiment of this application also provides a computer program which, when executed by a processor, implements the steps of any of the video analysis methods described above.
In the embodiments of this application, a monitoring target is detected in a collected video stream; a video image containing the monitoring target is intercepted from the video stream; and classification and recognition are performed on the monitoring target in the intercepted video image to obtain classification information of the monitoring target. It can be seen that in the solution provided by the embodiments of this application, accurate classification and recognition is not performed on the monitoring target in every video image of the video stream; instead, a video image containing the monitoring target is intercepted, and accurate classification and recognition is performed only on the monitoring target in the intercepted video image, which reduces the amount of calculation.
附图说明Description of the drawings
为了更清楚地说明本申请实施例或相关技术中的技术方案，下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to describe the technical solutions in the embodiments of the present application or in the related art more clearly, the accompanying drawings required for describing the embodiments or the related art are briefly introduced below. Obviously, the accompanying drawings in the following description show merely some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from these drawings without creative effort.
图1为本申请实施例提供的视频分析方法的第一种流程示意图;FIG. 1 is a schematic diagram of the first flow of a video analysis method provided by an embodiment of this application;
图2为本申请实施例提供的视频分析方法的第二种流程示意图;FIG. 2 is a schematic diagram of a second flow of a video analysis method provided by an embodiment of this application;
图3为本申请实施例提供的监控点与NVR交互的一种示意图;FIG. 3 is a schematic diagram of the interaction between a monitoring point and an NVR provided by an embodiment of the application;
图4为本申请实施例提供的视频分析方法的第三种流程示意图;FIG. 4 is a schematic diagram of a third process of a video analysis method provided by an embodiment of the application;
图5为本申请实施例提供的监控点与NVR交互的另一种示意图;FIG. 5 is another schematic diagram of the interaction between the monitoring point and the NVR provided by the embodiment of the application;
图6为本申请实施例提供的视频分析装置的一种结构示意图;FIG. 6 is a schematic structural diagram of a video analysis device provided by an embodiment of the application;
图7为本申请实施例提供的电子设备的一种结构示意图;FIG. 7 is a schematic diagram of a structure of an electronic device provided by an embodiment of the application;
图8为本申请实施例提供的视频分析系统的一种结构示意图。FIG. 8 is a schematic structural diagram of a video analysis system provided by an embodiment of the application.
具体实施方式Detailed Description
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
为了解决上述技术问题，本申请实施例提供了一种视频分析方法及装置，该方法及装置可以应用于摄像机，如IPC(IP Camera，网络摄像机)，或者可以应用于NVR(Network Video Recorder，网络硬盘录像机)，或者可以应用于其他电子设备，或者可以应用于视频分析系统，具体不做限定。下面首先对本申请实施例提供的视频分析方法进行详细介绍。To solve the foregoing technical problems, embodiments of the present application provide a video analysis method and apparatus. The method and apparatus may be applied to a camera such as an IPC (IP Camera), to an NVR (Network Video Recorder), to other electronic devices, or to a video analysis system; this is not specifically limited herein. The video analysis method provided by the embodiments of the present application is first described in detail below.
图1为本申请实施例提供的视频分析方法的第一种流程示意图,包括:FIG. 1 is a schematic diagram of the first flow of a video analysis method provided by an embodiment of this application, including:
S101:在采集的视频流中检测监控目标。S101: Detect a monitoring target in the captured video stream.
本申请实施例中,监控目标可以为人员、车辆、物体、动物等。采集的视频流包括多帧视频图像。基于此,上述在采集的视频流中检测监控目标,具体可以为:在视频流的每帧视频图像中检测监控目标。由于视频流包括多帧视频图像,因此,可能在视频流包括的多帧视频图像中检测到监控目标。In the embodiment of the present application, the monitoring target may be a person, a vehicle, an object, an animal, etc. The captured video stream includes multiple frames of video images. Based on this, the foregoing detection of the monitoring target in the collected video stream may specifically be: detecting the monitoring target in each frame of video image of the video stream. Since the video stream includes multiple frames of video images, the monitoring target may be detected in the multiple frames of video images included in the video stream.
一种实施方式中,监控目标可以为运动目标。这种情况下,S101可以包括:在采集的视频流中检测运动目标。比如,可以采用帧差法、或者背景减除算法、或者光流法等算法,检测视频流中的运动目标。In one implementation, the monitoring target may be a moving target. In this case, S101 may include: detecting a moving target in the collected video stream. For example, the frame difference method, or the background subtraction algorithm, or the optical flow method can be used to detect the moving target in the video stream.
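The frame-difference idea named above can be illustrated with a minimal sketch. This is not part of the application text: frames are modeled as 2D lists of grayscale intensities, and the threshold values are hypothetical.

```python
# Illustrative frame-difference motion detection (assumed thresholds).
# A frame is a 2D list of grayscale pixel values (0-255).

def detect_motion(prev_frame, curr_frame, pixel_thresh=30, count_thresh=5):
    """Return True if enough pixels changed between two consecutive frames."""
    changed = 0
    for row_prev, row_curr in zip(prev_frame, curr_frame):
        for p, c in zip(row_prev, row_curr):
            if abs(c - p) > pixel_thresh:   # per-pixel intensity difference
                changed += 1
    return changed >= count_thresh          # crude "moving target present" flag
```

Background subtraction and optical flow follow the same pattern at this level: a cheap per-frame decision that marks frames as candidates for the later, more expensive classification step.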
具体的,若监控目标为运动目标,则S101可以包括:在视频流的每帧视频图像中检测运动目标。Specifically, if the monitoring target is a moving target, S101 may include: detecting the moving target in each frame of video image of the video stream.
另一种实施方式中,监控目标可以为人脸。这种情况下,S101可以包括:在采集的视频流中进行人脸识别,得到识别结果。比如,可以利用人脸识别算法,识别视频流中的人脸,得到识别结果。In another embodiment, the monitoring target may be a human face. In this case, S101 may include: performing face recognition in the collected video stream to obtain the recognition result. For example, a face recognition algorithm can be used to recognize faces in a video stream and obtain the recognition results.
具体的，若监控目标为人脸，则S101可以包括：在视频流的每帧视频图像中进行人脸识别，得到每帧视频图像的识别结果。每帧视频图像的识别结果可以包括第一识别结果或第二识别结果，其中，第一识别结果指示视频图像包括人脸，第二识别结果指示视频图像不包括人脸。若一帧视频图像的识别结果包括第一识别结果，则该帧视频图像的识别结果还可以包括视频图像中人脸区域的位置信息。其中，人脸区域即为该视频图像中人脸所在区域。Specifically, if the monitoring target is a human face, S101 may include: performing face recognition on each frame of video image in the video stream to obtain a recognition result of each frame of video image. The recognition result of each frame of video image may include a first recognition result or a second recognition result, where the first recognition result indicates that the video image includes a human face, and the second recognition result indicates that the video image does not include a human face. If the recognition result of a frame of video image includes the first recognition result, the recognition result of that frame may further include position information of the face region in the video image, where the face region is the region in which the face is located in the video image.
S102:截取包含该监控目标的视频图像。S102: Intercept a video image containing the monitored target.
本申请实施例中,视频图像为视频流中的图像。基于此,上述S102具体可以为:从视频流中截取包含监控目标的视频图像。这里,若在多帧视频图像中检测到监控目标,则可能截取到多帧视频图像。In this embodiment of the application, the video image is an image in a video stream. Based on this, the foregoing S102 may specifically be: intercepting a video image containing a monitoring target from a video stream. Here, if the monitoring target is detected in multiple frames of video images, it is possible to intercept multiple frames of video images.
一种实施方式中,监控目标为运动目标。这种情况下,S102可以包括:从视频流中截取包含运动目标的一帧或多帧视频图像。In one embodiment, the monitoring target is a moving target. In this case, S102 may include: intercepting one or more frames of video images containing the moving target from the video stream.
具体的，当监控目标为运动目标时，S102可以包括：若视频流中的一帧视频图像中包含运动目标，则从视频流中截取该视频图像；若在视频流中的多帧视频图像中均包含运动目标，则从视频流中截取这多帧视频图像。Specifically, when the monitoring target is a moving target, S102 may include: if one frame of video image in the video stream contains the moving target, intercepting that video image from the video stream; and if multiple frames of video images in the video stream all contain the moving target, intercepting those multiple frames of video images from the video stream.
一个实施例中，可以检测视频流的关键帧中是否包含运动目标。若包含，则截取该关键帧前后几秒内的多帧视频图像，由这多帧视频图像组成小视频。这样，不必检测视频流的所有视频图像，进一步减少了计算量。In one embodiment, it can be detected whether a key frame of the video stream contains a moving target. If it does, multiple frames of video images within a few seconds before and after the key frame are intercepted, and these frames form a short video. In this way, it is not necessary to run detection on every video image of the video stream, which further reduces the amount of computation.
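The key-frame clipping described in this embodiment can be sketched as follows. This is an illustrative stand-in, not the application's implementation: the window size is hypothetical, and `has_motion` stands for any cheap motion check such as the frame-difference sketch above.

```python
# Illustrative key-frame clipping: motion is checked only on key frames;
# when a key frame contains a moving target, the frames around it are
# collected into a short clip. `window` (in frames) is an assumed value.

def clip_around_keyframes(frames, key_indices, has_motion, window=2):
    """Collect indices of frames within `window` of any motion key frame."""
    clipped = set()
    for k in key_indices:
        if has_motion(frames[k]):                 # cheap check on key frames only
            lo, hi = max(0, k - window), min(len(frames) - 1, k + window)
            clipped.update(range(lo, hi + 1))     # frames before and after
    return sorted(clipped)
```

Only the frames returned here are passed on to the expensive classification step; every other frame is skipped entirely.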
另一种实施方式中,监控目标为人脸。这种情况下,S102可以包括:根据上述识别结果,在视频流中包含人脸的视频图像中截取人脸区域,作为所截取的视频图像。或者,也可以根据该识别结果,在视频流中截取包含人脸区域的一帧或多帧视频图像。In another embodiment, the monitoring target is a human face. In this case, S102 may include: according to the above-mentioned recognition result, intercepting a face area from a video image containing a face in the video stream as the intercepted video image. Or, according to the recognition result, one or more frames of video images containing the face area may be intercepted in the video stream.
具体的,当监控目标为人脸时,则S102可以包括:根据上述识别结果,若确定视频流的一帧视频图像中包含人脸,则从该视频图像中截取人脸区域,作为所截取的视频图像。当监控目标为人脸时,则S102还可以包括:根据上述识别结果,若确定视频流的一帧视频图像中包含人脸,则从视频流中截取该视频图像。Specifically, when the monitoring target is a human face, S102 may include: according to the above recognition result, if it is determined that a frame of the video image of the video stream contains a human face, intercepting the human face area from the video image as the intercepted video image. When the monitoring target is a human face, S102 may further include: according to the above recognition result, if it is determined that one frame of the video image of the video stream contains a human face, intercepting the video image from the video stream.
S103:通过对所截取的视频图像进行识别,得到监控目标的分类信息。S103: Obtain classification information of the monitored target by recognizing the intercepted video image.
本申请实施例中，在从视频流中截取到视频图像后，对所截取的视频图像中监控目标进行分类识别，得到监控目标的分类信息。In the embodiments of the present application, after video images are intercepted from the video stream, the monitoring target in the intercepted video images is classified and recognized to obtain the classification information of the monitoring target.
一种实施方式中，监控目标为运动目标。这种情况下，S103可以包括：将所截取的视频图像输入至预先训练得到的第一神经网络模型中，利用第一神经网络模型对视频图像中的运动目标进行分类，得到第一神经网络模型输出的运动目标的分类信息。In one implementation, the monitoring target is a moving target. In this case, S103 may include: inputting the intercepted video image into a pre-trained first neural network model, and classifying the moving target in the video image by using the first neural network model to obtain classification information of the moving target output by the first neural network model.
当监控目标为运动目标时，若所截取的视频图像为多个，将所截取的视频图像分别输入至预先训练得到的第一神经网络模型中，利用第一神经网络模型对每个视频图像中的运动目标进行分类，得到第一神经网络模型输出的每个视频图像包含的运动目标的分类信息。When the monitoring target is a moving target, if there are multiple intercepted video images, the intercepted video images are separately input into the pre-trained first neural network model, and the first neural network model classifies the moving target in each video image to obtain the classification information of the moving target contained in each video image output by the first neural network model.
举例来说，运动目标可以为人员、车辆等。该第一神经网络模型为一种对运动目标进行分类的模型。训练得到该第一神经网络模型的过程可以包括：获取样本图像，该样本图像中可以包括人员、或者车辆等运动目标；基于样本图像中的运动目标添加标签，该标签用于指示运动目标的种类，如车辆、人员等等；将样本图像输入至预设结构的神经网络中，以该标签为监督，对该神经网络的参数进行迭代调整；当满足迭代结束条件时，便得到了训练完成的第一神经网络模型。预设结构的神经网络可以为深度神经网络，也可以为卷积神经网络。For example, the moving target may be a person, a vehicle, or the like. The first neural network model is a model for classifying moving targets. The process of training the first neural network model may include: obtaining sample images, which may include moving targets such as persons or vehicles; adding labels based on the moving targets in the sample images, where a label indicates the type of the moving target, such as a vehicle or a person; inputting the sample images into a neural network with a preset structure, and iteratively adjusting the parameters of the neural network with the labels as supervision; when an iteration end condition is met, the trained first neural network model is obtained. The neural network with the preset structure may be a deep neural network or a convolutional neural network.
将S102中截取的视频图像输入该第一神经网络模型,该第一神经网络模型输出视频图像中运动目标的分类信息。运动目标的分类信息为运动目标的种类,如车辆、人员等等。The video image intercepted in S102 is input to the first neural network model, and the first neural network model outputs classification information of the moving target in the video image. The classification information of the moving target is the type of the moving target, such as vehicles, people and so on.
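The classification step above can be sketched as follows. The class set and the scoring function are hypothetical stand-ins (not the application's actual model): the trained first neural network model is treated here as a black box that returns one score per class, and the top-scoring class is the classification information.

```python
# Illustrative classification step. `score_fn` stands in for the trained
# first neural network model; CLASSES is an assumed label set.

CLASSES = ["person", "vehicle", "other"]

def classify_moving_target(image, score_fn):
    """Run the model on one intercepted image and return the top class."""
    scores = score_fn(image)                        # e.g. softmax output
    best = max(range(len(CLASSES)), key=lambda i: scores[i])
    return CLASSES[best]                            # classification information
```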
举例来说，一些场景对安全性的要求较高，需要对这些场景进行周界防范，也就是判断是否有人员或者车辆进入场景。应用本申请实施例提供的方案，一方面，利用第一神经网络模型识别视频图像中的运动目标，得到运动目标的分类信息，若分类信息为人员或者车辆，可以及时提醒相关人员进行后续处理，实现了有效的周界防范。另一方面，先对视频流进行运动目标检测，运动目标检测算法为一种粗糙的检测算法，运动目标检测算法的运算复杂度较低，计算量较小；在检测到运动目标后，截取视频流中的小部分视频图像，仅对这小部分视频图像中的运动目标进行精确的分类识别，具体的，利用第一神经网络模型识别运动目标的分类信息，本申请实施例提供的方案中，未对视频流的所有视频图像中的运动目标进行精确的分类识别，相比于对视频流的所有视频图像中的运动目标进行精确的分类识别，减少了计算量。For example, some scenes have high security requirements and need perimeter protection, that is, determining whether a person or vehicle enters the scene. With the solution provided by the embodiments of the present application, on the one hand, the first neural network model is used to recognize the moving target in the video image and obtain its classification information; if the classification information is a person or a vehicle, relevant personnel can be promptly reminded to perform subsequent processing, achieving effective perimeter protection. On the other hand, moving-target detection is performed on the video stream first; the moving-target detection algorithm is a coarse detection algorithm with low computational complexity and a small amount of computation. After a moving target is detected, only a small portion of the video images in the video stream is intercepted, and accurate classification and recognition is performed only on the moving targets in this small portion, specifically by using the first neural network model to obtain their classification information. In the solution provided by the embodiments of the present application, accurate classification and recognition is not performed on the moving targets in all video images of the video stream, which reduces the amount of computation compared with doing so for all video images.
另一种实施方式中,监控目标为人脸。这种情况下,S103可以包括:将所截取的视频图像与人脸数据库中存储的人脸数据进行匹配,得到所截取的视频图像包含的人脸的分类信息。其中,人脸数据可以为人脸图像,也可以为从人脸图像中提取的特征信息。本申请实施例对此不进行限定。In another embodiment, the monitoring target is a human face. In this case, S103 may include: matching the intercepted video image with the face data stored in the face database to obtain the classification information of the face contained in the intercepted video image. Among them, the face data may be a face image or feature information extracted from a face image. The embodiment of the present application does not limit this.
当监控目标为人脸时，若所截取的视频图像为多个，将所截取的视频图像分别与人脸数据库中存储的人脸数据进行匹配，得到所截取的每一视频图像包含的人脸的分类信息。When the monitoring target is a human face, if there are multiple intercepted video images, the intercepted video images are separately matched against the face data stored in the face database to obtain the classification information of the face contained in each intercepted video image.
举例来说，一些场景只允许授权人员进入，需要对这些场景进行陌生人(即非授权人员)识别。这种情况下可以采用本申请实施例提供的方案。比如，可以在人脸数据库中存储授权人员的人脸数据，将S102中所截取的视频图像与人脸数据库中存储的人脸数据进行匹配，以判断视频流中的人员是否为授权人员。视频图像包含的人脸的分类信息可以为第一标签信息或第二标签信息，第一标签信息指示人脸数据库中存在与所截取的视频图像匹配成功的人脸数据，第二标签信息指示人脸数据库中不存在与所截取的视频图像匹配成功的人脸数据。所截取的视频图像包含的人脸的分类信息也可以为授权人员或者非授权人员，此时，所截取的视频图像对应的人员为授权人员或者非授权人员。For example, some scenes allow only authorized personnel to enter, and strangers (that is, unauthorized personnel) need to be identified in these scenes. In this case, the solution provided by the embodiments of the present application can be used. For example, the face data of authorized personnel can be stored in the face database, and the video image intercepted in S102 can be matched against the face data stored in the face database to determine whether the person in the video stream is an authorized person. The classification information of the face contained in the video image may be first tag information or second tag information, where the first tag information indicates that the face database contains face data that successfully matches the intercepted video image, and the second tag information indicates that the face database contains no face data that successfully matches the intercepted video image. The classification information of the face contained in the intercepted video image may also be "authorized person" or "unauthorized person"; in this case, the person corresponding to the intercepted video image is an authorized person or an unauthorized person.
再举一例，一些场景中需要对指定人员进行识别，比如考勤场景，或者VIP(Very Important Person，重要人物)识别场景。这些场景中也可以采用本申请实施例提供的技术方案。比如，可以在人脸数据库中存储指定人员的人脸数据，将S102中所截取的视频图像与人脸数据库中存储的人脸数据进行匹配，以判断视频流中的人员是否为指定人员。视频图像包含的人脸的分类信息可以为第一标签信息或第二标签信息，第一标签信息指示人脸数据库中存在与所截取的视频图像匹配成功的人脸数据，第二标签信息指示人脸数据库中不存在与所截取的视频图像匹配成功的人脸数据。所截取的视频图像包含的人脸的分类信息也可以为指定人员或者非指定人员，此时，所截取的视频图像对应的人员为指定人员或者非指定人员。To give another example, some scenes require recognition of designated persons, such as attendance scenes or VIP (Very Important Person) recognition scenes. The technical solution provided by the embodiments of the present application can also be used in these scenes. For example, the face data of designated persons can be stored in the face database, and the video image intercepted in S102 can be matched against the face data stored in the face database to determine whether the person in the video stream is a designated person. The classification information of the face contained in the video image may be first tag information or second tag information, where the first tag information indicates that the face database contains face data that successfully matches the intercepted video image, and the second tag information indicates that the face database contains no such face data. The classification information of the face contained in the intercepted video image may also be "designated person" or "non-designated person"; in this case, the person corresponding to the intercepted video image is a designated person or a non-designated person.
当所截取的视频图像为人脸区域时，一种实施方式中，S103可以包括：将所截取的视频图像输入至预先训练得到的第二神经网络模型中，利用第二神经网络模型将所截取的视频图像转化为建模数据；将建模数据与人脸数据库中存储的人脸数据进行匹配，得到所截取的视频图像包含的人脸的分类信息。所截取的视频图像包含的人脸的分类信息为第一标签信息或第二标签信息。上述建模数据为第二神经网络模型对所截取的视频图像处理后所输出的数据。When the intercepted video image is a face region, in one implementation, S103 may include: inputting the intercepted video image into a pre-trained second neural network model, and converting the intercepted video image into modeling data by using the second neural network model; and matching the modeling data against the face data stored in the face database to obtain the classification information of the face contained in the intercepted video image. The classification information of the face contained in the intercepted video image is the first tag information or the second tag information. The modeling data is the data output by the second neural network model after processing the intercepted video image.
当所截取的视频图像为人脸区域时，若所截取的视频图像为多个，可以将所截取的视频图像分别输入至预先训练得到的第二神经网络模型中，利用第二神经网络模型将所截取的每一视频图像转化为该视频图像对应的建模数据；将所截取的每一视频图像对应的建模数据与人脸数据库中存储的人脸数据进行匹配，得到所截取的每一视频图像包含的人脸的分类信息。When the intercepted video image is a face region, if there are multiple intercepted video images, the intercepted video images can be separately input into the pre-trained second neural network model, the second neural network model converts each intercepted video image into modeling data corresponding to that video image, and the modeling data corresponding to each intercepted video image is matched against the face data stored in the face database to obtain the classification information of the face contained in each intercepted video image.
第二神经网络模型可以为一种人脸建模模型,该第二神经网络模型可以将人脸图像转化为建模数据,建模数据是一种结构体数据。The second neural network model may be a face modeling model, and the second neural network model may convert a face image into modeling data, and the modeling data is a kind of structure data.
本申请实施例中，训练得到该第二神经网络模型的过程可以包括：获取样本人脸图像以及样本人脸图像中对象的标签；将样本人脸图像输入至预设结构的神经网络中，以标签为监督，对该神经网络的参数进行迭代调整；当满足迭代结束条件时，便得到了训练完成的第二神经网络模型。预设结构的神经网络可以为深度神经网络，也可以为卷积神经网络。本申请实施例中可以先指定输出建模数据的网络层。In the embodiment of the present application, the process of training the second neural network model may include: obtaining sample face images and labels of the objects in the sample face images; inputting the sample face images into a neural network with a preset structure, and iteratively adjusting the parameters of the neural network with the labels as supervision; when an iteration end condition is met, the trained second neural network model is obtained. The neural network with the preset structure may be a deep neural network or a convolutional neural network. In the embodiment of the present application, the network layer whose output serves as the modeling data may be specified in advance.
本申请实施例中，人脸数据库中存储的人脸数据为样本人脸图像经过第二神经网络模型转化后得到的建模数据。也就是，人脸数据库中存储的人脸数据为样本人脸图像输入第二神经网络模型后第二神经网络模型的指定网络层输出的数据。将S102中截取的视频图像转化后得到的建模数据与人脸数据库中的人脸数据进行匹配，如果匹配成功，则表示所截取的视频图像对应的人员为授权人员或指定人员；如果匹配不成功，则表示所截取的视频图像对应的人员为非授权人员或非指定人员。In the embodiment of the present application, the face data stored in the face database is modeling data obtained after sample face images are converted by the second neural network model. That is, the face data stored in the face database is the data output by the specified network layer of the second neural network model after a sample face image is input into the second neural network model. The modeling data obtained by converting the video image intercepted in S102 is matched against the face data in the face database. If the matching succeeds, the person corresponding to the intercepted video image is an authorized person or a designated person; if the matching fails, the person corresponding to the intercepted video image is an unauthorized person or a non-designated person.
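The matching step above can be sketched as follows. The details are assumptions for illustration only: the application does not specify the similarity measure or threshold, so this sketch treats the modeling data as an embedding vector and uses cosine similarity with a hypothetical threshold, with the two outcomes standing in for the first and second tag information.

```python
# Illustrative face-database matching. Embeddings stand in for the
# "modeling data"; the cosine measure and 0.8 threshold are assumptions.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_face(embedding, face_db, threshold=0.8):
    """Return 'authorized' if any stored embedding matches, else 'stranger'."""
    for stored in face_db:
        if cosine(embedding, stored) >= threshold:
            return "authorized"   # first tag information: a match exists
    return "stranger"             # second tag information: no match
```

In practice, the stored vectors would be the specified network layer's outputs for each enrolled sample face image, computed once at enrollment time.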
应用本申请实施例提供的方案，一方面，利用第二神经网络模型，得到所截取的视频图像包含的人脸的分类信息，根据该分类信息，判断人员是否为授权人员或指定人员，并根据判断结果及时提醒相关人员进行后续处理，这样能够实现有效的陌生人报警、或者指定人员识别。另一方面，先对视频流进行人脸识别，人脸识别算法为一种粗糙的检测算法，人脸识别算法的运算复杂度较低，计算量较小；在检测到运动目标后，截取视频流中的小部分视频图像或图像区域，仅对所截取的小部分视频图像或图像区域中的人脸进行精确的分类识别，确定人脸的分类信息，本申请实施例提供的方案中，未对视频流的所有视频图像中的人脸进行精确的分类识别，相比于对视频流的所有视频图像中的人脸进行精确的分类识别，减少了计算量。With the solution provided by the embodiments of the present application, on the one hand, the second neural network model is used to obtain the classification information of the face contained in the intercepted video image; based on this classification information, it is determined whether the person is an authorized or designated person, and relevant personnel are promptly reminded to perform subsequent processing according to the determination result, which enables effective stranger alarms or designated-person recognition. On the other hand, face recognition is performed on the video stream first; the face recognition algorithm is a coarse detection algorithm with low computational complexity and a small amount of computation. After the target is detected, only a small portion of the video images or image regions in the video stream is intercepted, and accurate classification and recognition is performed only on the faces in the intercepted portion to determine the classification information of the faces. In the solution provided by the embodiments of the present application, accurate classification and recognition is not performed on the faces in all video images of the video stream, which reduces the amount of computation compared with doing so for all video images.
作为一种实施方式,在S103之后,还可以包括:判断监控目标的分类信息是否符合预设报警条件;如果符合,输出报警信息。如果不符合,则可以不做任何处理。As an implementation manner, after S103, it may further include: judging whether the classification information of the monitoring target meets the preset alarm condition; if so, outputting the alarm information. If it does not meet the requirements, no treatment is required.
一种实施方式中,监控目标为运动目标。这种情况下,预设报警条件可以包括:运动目标的分类信息为人员,和/或,运动目标的分类信息为车辆。In one embodiment, the monitoring target is a moving target. In this case, the preset alarm condition may include: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle.
如上所述，如果需要进行周界防范，也就是判断是否有人员或者车辆进入场景，可以采用本申请实施例提供的方案。以预设报警条件包括运动目标的分类信息为人员和运动目标的分类信息为车辆为例，判断运动目标的分类信息是否为人员或者车辆，如果判断结果为运动目标的分类信息是人员，或者判断结果为运动目标的分类信息是车辆，则输出报警信息。As described above, if perimeter protection is required, that is, determining whether a person or vehicle enters the scene, the solution provided by the embodiments of the present application can be used. Taking as an example a preset alarm condition that includes both "the classification information of the moving target is a person" and "the classification information of the moving target is a vehicle", it is determined whether the classification information of the moving target is a person or a vehicle; if the classification information is determined to be a person, or determined to be a vehicle, alarm information is output.
另一种实施方式中,监控目标为人脸。这种情况下,预设报警条件可以包括:人脸的分类信息为第一标签信息,或人脸的分类信息为第二标签信息。In another embodiment, the monitoring target is a human face. In this case, the preset alarm condition may include: the classification information of the human face is the first tag information, or the classification information of the human face is the second tag information.
如上所述，如果需要对陌生人进行识别，可以采用本申请实施例提供的方案：人脸数据库中存储授权人员的人脸数据，判断人脸数据库中是否存在与所截取的视频图像对应的建模数据匹配成功的人脸数据。如果存在，表示所截取的视频图像对应的人员为授权人员，所截取的视频图像包含的人脸的分类信息为授权人员，可以不做任何处理。如果不存在，表示所截取的视频图像对应的人员为陌生人，所截取的视频图像包含的人脸的分类信息为陌生人，输出报警信息。As described above, if strangers need to be identified, the solution provided by the embodiments of the present application can be used: the face database stores the face data of authorized personnel, and it is determined whether the face database contains face data that successfully matches the modeling data corresponding to the intercepted video image. If it does, the person corresponding to the intercepted video image is an authorized person, the classification information of the face contained in the intercepted video image is "authorized person", and no processing is required. If it does not, the person corresponding to the intercepted video image is a stranger, the classification information of the face is "stranger", and alarm information is output.
如果需要对指定人员进行识别，也可以采用本申请实施例提供的方案：人脸数据库中存储指定人员的人脸数据，判断人脸数据库中是否存在与所截取的视频图像对应的建模数据匹配成功的人脸数据。如果存在，表示所截取的视频图像对应的人员为指定人员，所截取的视频图像包含的人脸的分类信息为指定人员，输出报警信息。如果不存在，表示所截取的视频图像对应的人员为非指定人员，所截取的视频图像包含的人脸的分类信息为非指定人员，可以不做任何处理。If designated persons need to be identified, the solution provided by the embodiments of the present application can also be used: the face database stores the face data of designated persons, and it is determined whether the face database contains face data that successfully matches the modeling data corresponding to the intercepted video image. If it does, the person corresponding to the intercepted video image is a designated person, the classification information of the face contained in the intercepted video image is "designated person", and alarm information is output. If it does not, the person corresponding to the intercepted video image is a non-designated person, the classification information of the face is "non-designated person", and no processing is required.
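The alarm decision described in the preceding paragraphs can be sketched in a few lines. The condition set below is an illustrative assumption covering the examples in the text (person/vehicle for perimeter protection, stranger for unauthorized-person alarms, designated person for designated-person recognition), not an exhaustive list from the application.

```python
# Illustrative preset-alarm-condition check. The class set is assumed;
# classes not in the set produce no alarm (no processing is performed).

ALARM_CLASSES = {"person", "vehicle", "stranger", "designated person"}

def maybe_alarm(classification):
    """Return alarm information when a preset condition is met, else None."""
    if classification in ALARM_CLASSES:
        return f"ALARM: {classification} detected"
    return None
```

Which classes trigger an alarm would be configured per deployment: a perimeter-protection site alarms on person/vehicle, while an access-controlled site alarms on stranger.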
一种实施方式中，S101和S102可以由IPC执行，然后IPC将截取的视频图像发送至NVR，由NVR执行后续步骤。In an implementation manner, S101 and S102 may be executed by an IPC; the IPC then sends the intercepted video images to an NVR, and the NVR executes the subsequent steps.
应用本申请图1所示实施例，在采集的视频流中检测监控目标；从视频流中截取包含监控目标的视频图像；对所截取的视频图像中的监控目标进行分类识别，得到监控目标的分类信息。可见，本申请实施例提供的方案中，并不是对视频流的所有视频图像中的监控目标进行精确的分类识别，而是截取包含监控目标的视频图像，仅对截取的视频图像中的监控目标进行精确的分类识别，减少了计算量。With the embodiment shown in FIG. 1 of the present application, a monitoring target is detected in a captured video stream; video images containing the monitoring target are intercepted from the video stream; and the monitoring target in the intercepted video images is classified and recognized to obtain classification information of the monitoring target. It can be seen that in the solution provided by the embodiments of the present application, accurate classification and recognition is not performed on the monitoring targets in all video images of the video stream; instead, video images containing the monitoring target are intercepted, and accurate classification and recognition is performed only on the monitoring target in the intercepted video images, which reduces the amount of computation.
基于图1所示实施例,本申请实施例还提供了一种视频分析方法。参考图2,图2为本申请实施例提供的视频分析方法的第二种流程示意图,包括:Based on the embodiment shown in FIG. 1, the embodiment of the present application also provides a video analysis method. Referring to FIG. 2, FIG. 2 is a schematic diagram of the second flow of the video analysis method provided by an embodiment of the application, including:
S201:在采集的视频流中检测运动目标。S201: Detect a moving target in the collected video stream.
S202:截取包含该运动目标的一帧或多帧视频图像。S202: Capture one or more frames of video images containing the moving target.
S203:将所截取的视频图像输入至预先训练得到的第一神经网络模型中，利用第一神经网络模型对所截取的视频图像中的运动目标进行分类，得到第一神经网络模型输出的运动目标的分类信息。S203: Input the intercepted video images into the pre-trained first neural network model, and classify the moving targets in the intercepted video images by using the first neural network model to obtain the classification information of the moving targets output by the first neural network model.
S204:判断该分类信息是否符合预设报警条件;其中,预设报警条件包括:运动目标的分类信息为人员,和/或运动目标的分类信息为车辆。如果符合,执行S205。S204: Determine whether the classification information meets the preset alarm condition; where the preset alarm condition includes: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle. If it matches, execute S205.
上述步骤S204中的分类信息即为运动目标的分类信息。The classification information in the above step S204 is the classification information of the moving target.
S205:输出报警信息。S205: Output alarm information.
若步骤S204中得到运动目标的分类信息为人员，或运动目标的分类信息为车辆，则判定运动目标的分类信息符合预设报警条件，执行步骤S205，输出报警信息。若步骤S204中得到运动目标的分类信息既不是人员，也不是车辆，则判定运动目标的分类信息不符合预设报警条件，可以不进行任何处理。If the classification information of the moving target obtained in step S204 is a person, or the classification information of the moving target is a vehicle, it is determined that the classification information meets the preset alarm condition, and step S205 is executed to output alarm information. If the classification information of the moving target obtained in step S204 is neither a person nor a vehicle, it is determined that the classification information does not meet the preset alarm condition, and no processing may be performed.
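The S201–S205 flow above can be condensed into one illustrative driver function. The `detect`, `clip`, and `classify` callables are hypothetical stand-ins for the steps described in the text; they are not part of the application.

```python
# Illustrative end-to-end sketch of the S201-S205 perimeter flow.

def perimeter_pipeline(frames, detect, clip, classify,
                       alarm_classes=("person", "vehicle")):
    """Return (frame index, label) pairs that triggered an alarm."""
    alarms = []
    for idx in detect(frames):               # S201: detect moving targets
        for image in clip(frames, idx):      # S202: intercept images containing them
            label = classify(image)          # S203: first-model classification
            if label in alarm_classes:       # S204: preset alarm condition
                alarms.append((idx, label))  # S205: output alarm information
    return alarms
```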
举例来说,在需要进行周界防范的场景中,可以应用本申请图2所示实施例,判断是否有人员或者车辆进入场景,并在判断结果为是的情况下进行报警。For example, in a scenario where perimeter prevention is required, the embodiment shown in FIG. 2 of the present application can be used to determine whether a person or vehicle enters the scene, and an alarm is issued if the determination result is yes.
应用本申请图2实施例，第一方面，利用第一神经网络模型识别视频图像中的运动目标，得到运动目标的分类信息，若分类信息为人员或者车辆，可以及时提醒相关人员进行后续处理，实现了有效的周界防范。另一方面，先对视频流进行运动目标检测，运动目标检测算法可以理解为一种粗糙的检测算法，运动目标检测算法的运算复杂度较低，计算量较小；在检测到运动目标后，截取视频流中的小部分视频图像，仅对这小部分视频图像中的运动目标进行精确的分类识别，具体的，利用第一神经网络模型识别运动目标的分类信息，本申请实施例提供的方案中，未对视频流的所有视频图像中的运动目标进行精确的分类识别，相比于对视频流的所有视频图像中的运动目标进行精确的分类识别，减少了计算量。With the embodiment of FIG. 2 of the present application, in a first aspect, the first neural network model is used to recognize the moving target in the video image and obtain its classification information; if the classification information is a person or a vehicle, relevant personnel can be promptly reminded to perform subsequent processing, achieving effective perimeter protection. In another aspect, moving-target detection is performed on the video stream first; the moving-target detection algorithm can be understood as a coarse detection algorithm with low computational complexity and a small amount of computation. After a moving target is detected, only a small portion of the video images in the video stream is intercepted, and accurate classification and recognition is performed only on the moving targets in this small portion, specifically by using the first neural network model to obtain their classification information. In the solution provided by the embodiments of the present application, accurate classification and recognition is not performed on the moving targets in all video images of the video stream, which reduces the amount of computation compared with doing so for all video images.
In some related solutions, an infrared detector emits infrared laser light, and the area covered by the infrared laser forms the monitored area. When someone enters the monitored area, the waveform of the infrared laser changes, so whether someone has entered the monitored area can be determined based on the waveform of the infrared laser. However, in such an infrared-laser-based monitoring solution, the area covered by the infrared laser emitted by a single infrared detector is limited; if the monitored area is large, multiple infrared detectors must be installed, resulting in a high monitoring cost.
With the technical solution provided by the embodiments of the present application, the monitored area is monitored based on images collected by an image acquisition device. A single image acquisition device has a relatively large field of view, so one image acquisition device is sufficient to monitor a large area, and the cost of one image acquisition device is lower than the cost of multiple infrared detectors, which reduces the monitoring cost.
The following describes, with reference to FIG. 3, an implementation in which the video analysis method provided by an embodiment of the present application is applied to a perimeter protection scenario. The monitoring point in FIG. 3 may be an IPC.
The monitoring point collects a video stream, performs moving-target detection on the video stream, intercepts one or more frames of video images containing the moving target from the video stream according to the detection result, and sends the intercepted video images to the NVR.
The NVR receives the video images sent by the monitoring point, inputs the video images into a pre-trained first neural network model, and uses the first neural network model to classify the moving targets in the video images to obtain the classification information of the moving targets output by the first neural network model. The classification information of a moving target may be person, vehicle, object, and so on, which is not specifically limited.
Assume that the preset alarm condition is: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle. If the classification information of the moving target output by the first neural network model is a vehicle or a person, the NVR outputs alarm information.
In the technical solution provided by the embodiments of the present application, alarm information is output only when the classification information of the moving target meets the preset alarm condition, which can reduce false alarms caused by wind-blown vegetation, pet interference, or lighting changes, and improves alarm accuracy.
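The preset alarm condition can be expressed as a simple membership test over a configurable set of alarm-worthy classes. The sketch below is an illustration only; the class names and message format are assumptions, not part of the patented scheme.

```python
# Preset alarm condition: classification is person, and/or classification is vehicle.
ALARM_CLASSES = {"person", "vehicle"}  # assumed, configurable set

def meets_alarm_condition(classification: str) -> bool:
    """Return True only for classifications covered by the preset condition."""
    return classification in ALARM_CLASSES

def handle_classification(classification: str):
    """Output alarm information only when the preset condition is met."""
    if meets_alarm_condition(classification):
        return f"ALARM: {classification} detected"
    # e.g. wind-blown vegetation, pets, lighting changes: no alarm is raised
    return None
```

Classes outside the set (for example a pet or a moving shadow) produce no output, which is exactly how the scheme suppresses false alarms.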
Based on the embodiment shown in FIG. 1, an embodiment of the present application further provides a video analysis method. Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third video analysis method provided by an embodiment of the present application, including:
S401: Perform face recognition on the collected video stream to obtain a recognition result.

S402: According to the recognition result, intercept a face region from an image containing a face.

Step S402 may specifically be: according to the recognition result, intercepting a face region from a video image containing a face in the video stream, as the intercepted video image. Here, the intercepted video image can be regarded as a face image.

S403: Input the intercepted face region into a pre-trained second neural network model, and use the second neural network model to convert the face region into modeling data.

Step S403 may specifically be: inputting the intercepted video image into the pre-trained second neural network model, and using the second neural network model to convert the intercepted video image into modeling data.

S404: Match the modeling data against the face data stored in a face database to obtain classification information of the face region.
Step S404 may specifically be: matching the modeling data against the face data stored in the face database to obtain the classification information of the face contained in the intercepted video image, where the classification information of the face contained in the intercepted video image is first tag information or second tag information.

For example, face images of authorized or designated persons can be collected in advance, converted into modeling data using the second neural network model, and the resulting modeling data can be stored in the face database as face data. Afterwards, the second neural network model is used to convert the intercepted video image into modeling data, and this modeling data is matched against the face data stored in the face database to obtain the classification information of the face region.
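The conversion-and-matching step can be sketched as follows, treating the "modeling data" as a normalized feature vector and performing matching by cosine similarity against the stored vectors. Both the stand-in feature extractor (`to_modeling_data` uses a normalized pixel vector rather than a learned embedding) and the similarity threshold are assumptions made for this sketch.

```python
import numpy as np

MATCH_THRESHOLD = 0.8  # assumed similarity threshold for a successful match

def to_modeling_data(face_image: np.ndarray) -> np.ndarray:
    """Stand-in for the second neural network model: convert a face image
    into modeling data (here, an L2-normalized flattened pixel vector)."""
    v = face_image.astype(np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-12)

def classify_face(face_image: np.ndarray, face_database) -> str:
    """Match the modeling data against the face data stored in the database.

    Returns 'first_tag' when matching face data exists in the database,
    and 'second_tag' when no stored face data matches."""
    query = to_modeling_data(face_image)
    for stored in face_database:
        if float(query @ stored) >= MATCH_THRESHOLD:
            return "first_tag"   # matching face data exists
    return "second_tag"          # no matching face data
```

Enrollment and query use the same conversion function, so a face stored in the database matches itself with similarity 1.0, while unrelated images fall below the threshold.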
S405: Determine whether the classification information meets a preset alarm condition, where the preset alarm condition includes: the classification information of the face contained in the intercepted video image is the first tag information; or the classification information of the face contained in the intercepted video image is the second tag information. If the condition is met, execute S406.

The classification information in step S405 is the classification information of the face contained in the intercepted video image.

S406: Output alarm information.
When the preset alarm condition includes that the classification information of the face contained in the intercepted video image is the first tag information: if the classification information obtained in step S405 is the first tag information, it is determined that the classification information meets the preset alarm condition, and step S406 is executed to output alarm information; if the classification information obtained in step S405 is the second tag information, it is determined that the classification information does not meet the preset alarm condition, and no processing needs to be performed.

When the preset alarm condition includes that the classification information of the face contained in the intercepted video image is the second tag information: if the classification information obtained in step S405 is the second tag information, it is determined that the classification information meets the preset alarm condition, and step S406 is executed to output alarm information; if the classification information obtained in step S405 is the first tag information, it is determined that the classification information does not meet the preset alarm condition, and no processing needs to be performed.
For example, when strangers need to be identified, the embodiment shown in FIG. 4 of the present application can be applied: the face data of authorized persons is stored in the face database, and the modeling data obtained by converting the intercepted video image is matched against the face data stored in the face database to determine whether the person in the video stream is an authorized person. If the person in the video stream is determined to be a stranger, an alarm is issued.

As another example, when a designated person needs to be identified, the embodiment shown in FIG. 4 of the present application can be applied: the face data of the designated person is stored in the face database, and the modeling data obtained by converting the intercepted video image is matched against the face data stored in the face database to determine whether the person in the video stream is the designated person. If the person in the video stream is determined to be the designated person, an alarm is issued.
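The two examples above differ only in which tag triggers the alarm: with an authorized-person database, the second tag (a failed match, i.e. a stranger) alarms; with a designated-person database, the first tag (a successful match) alarms. A sketch of that decision follows; the mode and tag names are invented for illustration.

```python
def should_alarm(face_tag: str, mode: str) -> bool:
    """Decide whether to output alarm information for a face classification.

    mode 'stranger_alarm': the database holds authorized persons, so an
    alarm is raised on a failed match (second tag, i.e. a stranger).
    mode 'designated_person': the database holds designated persons, so an
    alarm is raised on a successful match (first tag)."""
    if mode == "stranger_alarm":
        return face_tag == "second_tag"
    if mode == "designated_person":
        return face_tag == "first_tag"
    raise ValueError(f"unknown mode: {mode}")
```

The matching machinery is identical in both scenarios; only the alarm predicate and the contents of the database change.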
By applying the embodiment shown in FIG. 4 of the present application, on the one hand, the second neural network model is used to obtain the classification information of the face contained in the intercepted video image; based on this classification information, it is determined whether the person is an authorized or designated person, and relevant personnel are promptly notified for follow-up handling according to the determination result, so that effective stranger alarming or designated-person recognition can be achieved. On the other hand, face recognition is first performed on the video stream; the face recognition algorithm is a coarse detection algorithm with low computational complexity and a small amount of calculation. Only a small portion of video images or image regions is intercepted from the video stream, and accurate classification and recognition, that is, face matching, is performed only on the faces in the intercepted small portion of video images or image regions. This solution does not perform accurate classification and recognition on the faces in all video images of the video stream, which reduces the amount of calculation compared with performing accurate classification and recognition on the faces in all video images of the video stream.
The following describes, with reference to FIG. 5, an implementation in which the video analysis method provided by an embodiment of the present application is applied to a stranger alarm scenario. The monitoring point in FIG. 5 may be an IPC.
The monitoring point collects a video stream, performs face recognition on the video stream, and, according to the recognition result, intercepts one or more frames of video images containing a face from the video stream, or intercepts the face region from a video image containing a face in the video stream; the intercepted video image or face region is then sent to the NVR. For convenience of description, the intercepted video images or face regions are collectively referred to as face images.

The NVR receives the face image sent by the monitoring point, inputs the face image into the pre-trained second neural network model, uses the second neural network model to convert the face image into modeling data, and matches the resulting modeling data against the face data stored in the face database. If the matching succeeds, the person corresponding to the face image is an authorized person, and the classification information of the face contained in the face image is authorized person. If the matching fails, the person corresponding to the face image is a stranger, the classification information of the face contained in the face image is stranger, and alarm information is output.
Corresponding to the foregoing method embodiments, an embodiment of the present application further provides a video analysis apparatus, as shown in FIG. 6, including:

a detection module 601, configured to detect a monitoring target in a collected video stream;

an interception module 602, configured to intercept a video image containing the monitoring target from the video stream; and

a classification module 603, configured to classify and recognize the monitoring target in the intercepted video image to obtain classification information of the monitoring target.
In an implementation, the detection module 601 is specifically configured to detect a moving target in the collected video stream;

and the interception module 602 is specifically configured to intercept one or more frames of video images containing the moving target from the video stream.
In an implementation, the classification module 603 is specifically configured to:

input the intercepted video image into a pre-trained first neural network model, and use the first neural network model to classify the moving target in the intercepted video image to obtain the classification information of the moving target output by the first neural network model.
In an implementation, the detection module 601 is specifically configured to perform face recognition on the collected video stream to obtain a recognition result;

the interception module 602 is specifically configured to intercept, according to the recognition result, a face region from a video image containing a face in the video stream, as the intercepted video image; and

the classification module 603 is specifically configured to match the intercepted video image against the face data stored in a face database to obtain classification information of the face.
In an implementation, the classification module 603 is specifically configured to:

input the intercepted video image into a pre-trained second neural network model, and use the second neural network model to convert the intercepted video image into modeling data; and

match the modeling data against the face data stored in the face database to obtain the classification information of the face, where the classification information of the face is first tag information or second tag information, the first tag information indicating that face data successfully matching the modeling data exists in the face database, and the second tag information indicating that no face data successfully matching the modeling data exists in the face database.
In an implementation, the video analysis apparatus may further include a first judgment module and a first alarm module (not shown in the figure), where:

the first judgment module is configured to determine whether the classification information of the monitoring target meets a preset alarm condition and, if so, to trigger the first alarm module; and

the first alarm module is configured to output alarm information.
In an implementation, the video analysis apparatus may further include a second judgment module and a second alarm module (not shown in the figure), where:

the second judgment module is configured to determine whether the classification information of the moving target meets a preset alarm condition and, if so, to trigger the second alarm module, the preset alarm condition including: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle; and

the second alarm module is configured to output alarm information.
In an implementation, the video analysis apparatus may further include a third judgment module and a third alarm module (not shown in the figure), where:

the third judgment module is configured to determine whether the classification information of the face meets a preset alarm condition and, if so, to trigger the third alarm module, the preset alarm condition including: the classification information of the face is the first tag information, or the classification information of the face is the second tag information; and

the third alarm module is configured to output alarm information.
In the embodiments of the present application, a monitoring target is detected in a collected video stream; a video image containing the monitoring target is intercepted from the video stream; and the monitoring target in the intercepted video image is classified and recognized to obtain classification information of the monitoring target. It can be seen that in the solution provided by the embodiments of the present application, accurate classification and recognition is not performed on the monitoring targets in all video images of the video stream; instead, video images containing the monitoring target are intercepted, and accurate classification and recognition is performed only on the monitoring targets in the intercepted video images, which reduces the amount of calculation.
An embodiment of the present application further provides an electronic device, as shown in FIG. 7, including a processor 701 and a memory 702, where:

the memory 702 is configured to store a computer program; and

the processor 701 is configured to implement any one of the foregoing video analysis methods when executing the program stored in the memory 702.
The memory mentioned in the above electronic device may include a random access memory (RAM), and may also include a non-volatile memory (NVM), for example, at least one disk memory. In an implementation, the memory may also be at least one storage apparatus located far away from the foregoing processor.
The foregoing processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements any one of the foregoing video analysis methods.
An embodiment of the present application further provides a computer program, which, when executed by a processor, implements any one of the foregoing video analysis methods.
An embodiment of the present application further provides a video analysis system, as shown in FIG. 8, including a monitoring point and a processing device, where:

the monitoring point is configured to detect a monitoring target in a collected video stream, intercept a video image containing the monitoring target from the video stream, and send the intercepted video image to the processing device; and

the processing device is configured to receive the video image and recognize the monitoring target in the received video image to obtain classification information of the monitoring target.

For example, the monitoring point may be an IPC and the processing device may be an NVR, which is not specifically limited.

In this solution, the monitoring target is not accurately recognized in all video images of the video stream; instead, video images containing the monitoring target are intercepted, and accurate recognition is performed only on the intercepted video images, which reduces the amount of calculation.
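The division of labor between the monitoring point and the processing device can be illustrated with an in-process queue standing in for the network link between IPC and NVR. The queue-based handoff, the end-of-stream marker, and the function names are all assumptions made for this sketch.

```python
from queue import Queue

def monitoring_point(frames, channel: Queue, has_target) -> int:
    """IPC side: run cheap detection and forward only frames containing a target."""
    sent = 0
    for frame in frames:
        if has_target(frame):      # coarse detection at the edge
            channel.put(frame)     # only intercepted frames cross the link
            sent += 1
    channel.put(None)              # end-of-stream marker (assumption)
    return sent

def processing_device(channel: Queue, classify) -> list:
    """NVR side: run the accurate classifier only on the frames it receives."""
    results = []
    while (frame := channel.get()) is not None:
        results.append(classify(frame))
    return results
```

The expensive classifier on the processing device sees only the intercepted frames, so both network traffic and NVR-side computation scale with the number of detections rather than with the full frame rate of the stream.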
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or device. Without further limitation, an element preceded by the phrase "including a ..." does not preclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, for the video analysis apparatus embodiment, the electronic device embodiment, the computer-readable storage medium embodiment, the computer program embodiment, and the video analysis system embodiment, since they are basically similar to the video analysis method embodiments, the description is relatively simple, and for relevant parts, reference may be made to the partial description of the video analysis method embodiments.
The above are only preferred embodiments of the present application and are not intended to limit the protection scope of the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (19)

  1. A video analysis method, comprising:
    detecting a monitoring target in a collected video stream;
    intercepting a video image containing the monitoring target from the video stream; and
    classifying and recognizing the monitoring target in the intercepted video image to obtain classification information of the monitoring target.
  2. The method according to claim 1, wherein
    the detecting a monitoring target in a collected video stream comprises:
    detecting a moving target in the collected video stream; and
    the intercepting a video image containing the monitoring target from the video stream comprises:
    intercepting one or more frames of video images containing the moving target from the video stream.
  3. The method according to claim 2, wherein the classifying and recognizing the monitoring target in the intercepted video image to obtain classification information of the monitoring target comprises:
    inputting the intercepted video image into a pre-trained first neural network model, and using the first neural network model to classify the moving target in the intercepted video image to obtain the classification information of the moving target output by the first neural network model.
  4. The method according to claim 1, wherein
    the detecting a monitoring target in a collected video stream comprises:
    performing face recognition on the collected video stream to obtain a recognition result;
    the intercepting a video image containing the monitoring target from the video stream comprises:
    intercepting, according to the recognition result, a face region from a video image containing a face in the video stream, as the intercepted video image; and
    the classifying and recognizing the monitoring target in the intercepted video image to obtain classification information of the monitoring target comprises:
    matching the intercepted video image against face data stored in a face database to obtain classification information of the face.
  5. The method according to claim 4, wherein the matching the intercepted video image against face data stored in a face database to obtain classification information of the face comprises:
    inputting the intercepted video image into a pre-trained second neural network model, and using the second neural network model to convert the intercepted video image into modeling data; and
    matching the modeling data against the face data stored in the face database to obtain the classification information of the face, wherein the classification information of the face is first tag information or second tag information, the first tag information indicates that face data successfully matching the modeling data exists in the face database, and the second tag information indicates that no face data successfully matching the modeling data exists in the face database.
  6. The method according to claim 1, further comprising, after the classifying and recognizing the monitoring target in the intercepted video image to obtain classification information of the monitoring target:
    determining whether the classification information of the monitoring target meets a preset alarm condition; and
    if so, outputting alarm information.
  7. The method according to claim 3, further comprising, after obtaining the classification information of the moving target output by the first neural network model:
    determining whether the classification information of the moving target meets a preset alarm condition, and if so, outputting alarm information;
    wherein the preset alarm condition comprises: the classification information of the moving target is a person, and/or the classification information of the moving target is a vehicle.
  8. The method according to claim 5, further comprising, after obtaining the classification information of the face:
    determining whether the classification information of the face meets a preset alarm condition, and if so, outputting alarm information;
    wherein the preset alarm condition comprises: the classification information of the face is the first tag information, or the classification information of the face is the second tag information.
  9. A video analysis apparatus, comprising:
    a detection module, configured to detect a monitoring target in a collected video stream;
    an interception module, configured to intercept a video image containing the monitoring target from the video stream; and
    a classification module, configured to classify and recognize the monitoring target in the intercepted video image to obtain classification information of the monitoring target.
  10. The apparatus according to claim 9, wherein the detection module is specifically configured to detect a moving target in the captured video stream;
    and the interception module is specifically configured to intercept, from the video stream, one or more frames of video images containing the moving target.
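One simple way to realise the detect-then-intercept pairing of claim 10 (the claims leave the detection method open, so this is an illustrative assumption, not the patented method) is frame differencing: a frame is taken to contain a moving target when enough pixels change relative to the previous frame, and those frames are intercepted. The thresholds are arbitrary; frames are modelled as 2-D lists of grey values in [0, 1]:

```python
def intercept_moving_frames(frames, pixel_delta=0.2, min_changed=2):
    """Return the frames (after the first) whose pixel change versus the
    previous frame suggests a moving target. Threshold values are
    illustrative, not taken from the patent."""
    kept = []
    for prev, cur in zip(frames, frames[1:]):
        # Count pixels that changed by more than pixel_delta.
        changed = sum(
            1
            for row_p, row_c in zip(prev, cur)
            for a, b in zip(row_p, row_c)
            if abs(a - b) > pixel_delta
        )
        if changed >= min_changed:
            kept.append(cur)  # "intercept" the frame containing motion
    return kept
```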
  11. The apparatus according to claim 10, wherein the classification module is specifically configured to:
    input the intercepted video image into a pre-trained first neural network model, and classify the moving target in the intercepted video image using the first neural network model, to obtain the classification information of the moving target output by the first neural network model.
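Claim 11's classification step can be sketched with a stand-in for the pre-trained first neural network model. The real model's architecture and weights are outside this text, so the stub below just returns fake per-class scores from mean pixel intensity; only the surrounding control flow (image in, argmax of scores out) reflects the claim:

```python
CLASSES = ["person", "vehicle", "other"]  # assumed label set, for illustration

def stub_first_model(image):
    """Stand-in for the pre-trained first neural network model: returns
    per-class scores from a trivial mean-intensity heuristic (fake)."""
    mean = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return [mean, 1.0 - mean, 0.1]

def classify_moving_target(image, model=stub_first_model):
    """Feed the intercepted video image to the model and return the
    classification information (the highest-scoring class label)."""
    scores = model(image)
    return CLASSES[scores.index(max(scores))]
```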
  12. The apparatus according to claim 9, wherein the detection module is specifically configured to perform face recognition on the captured video stream to obtain a recognition result;
    the interception module is specifically configured to intercept, according to the recognition result, a face region from a video image containing a face in the video stream, as the intercepted video image; and
    the classification module is specifically configured to match the intercepted video image against face data stored in a face database to obtain classification information of the face.
  13. The apparatus according to claim 12, wherein the classification module is specifically configured to:
    input the intercepted video image into a pre-trained second neural network model, and convert the intercepted video image into modeling data using the second neural network model; and
    match the modeling data against face data stored in a face database to obtain the classification information of the face, the classification information of the face being first tag information or second tag information, wherein the first tag information indicates that the face database contains face data successfully matching the modeling data, and the second tag information indicates that the face database contains no face data successfully matching the modeling data.
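Claim 13's matching step can be sketched as follows. The second neural network model is assumed to map the intercepted face image to modeling data (an embedding vector); matching then compares that vector against the face database and returns first tag information on a successful match, second tag information otherwise. The embedding stub and cosine-similarity threshold are illustrative assumptions, not the patent's model:

```python
import math

def stub_second_model(face_image):
    """Stand-in for the second neural network model: produce modeling
    data as a unit-normalised pixel vector (not a real face embedding)."""
    flat = [p for row in face_image for p in row]
    norm = math.sqrt(sum(p * p for p in flat)) or 1.0
    return [p / norm for p in flat]

def match_face(face_image, face_db, threshold=0.9, model=stub_second_model):
    """Return 'tag_1' (a stored face matches the modeling data) or
    'tag_2' (no stored face matches)."""
    query = model(face_image)
    for stored in face_db:
        similarity = sum(a * b for a, b in zip(query, stored))  # cosine
        if similarity >= threshold:
            return "tag_1"
    return "tag_2"
```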
  14. The apparatus according to claim 9, further comprising:
    a first judgment module, configured to judge whether the classification information of the monitoring target meets a preset alarm condition, and if so, to trigger a first alarm module; and
    the first alarm module, configured to output alarm information.
  15. The apparatus according to claim 11, further comprising:
    a second judgment module, configured to judge whether the classification information of the moving target meets a preset alarm condition, the preset alarm condition comprising: the classification information of the moving target being a person, and/or the classification information of the moving target being a vehicle; and if so, to trigger a second alarm module; and
    the second alarm module, configured to output alarm information.
  16. The apparatus according to claim 13, further comprising:
    a third judgment module, configured to judge whether the classification information of the face meets a preset alarm condition, the preset alarm condition comprising: the classification information of the face being the first tag information, or the classification information of the face being the second tag information; and if so, to trigger a third alarm module; and
    the third alarm module, configured to output alarm information.
  17. An electronic device, comprising a processor and a memory;
    the memory being configured to store a computer program; and
    the processor being configured to execute the program stored in the memory to implement the method steps of any one of claims 1-8.
  18. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method steps of any one of claims 1-8.
  19. A computer program which, when executed by a processor, implements the method steps of any one of claims 1-8.
PCT/CN2020/074895 2019-02-19 2020-02-12 Video analysis method and apparatus WO2020168960A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910121021.1A CN111582006A (en) 2019-02-19 2019-02-19 Video analysis method and device
CN201910121021.1 2019-02-19

Publications (1)

Publication Number Publication Date
WO2020168960A1 true WO2020168960A1 (en) 2020-08-27

Family

ID=72112900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/074895 WO2020168960A1 (en) 2019-02-19 2020-02-12 Video analysis method and apparatus

Country Status (2)

Country Link
CN (1) CN111582006A (en)
WO (1) WO2020168960A1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148896A (en) * 2020-09-10 2020-12-29 京东数字科技控股股份有限公司 Data processing method and device for terminal media monitoring and broadcasting
CN112329517A (en) * 2020-09-17 2021-02-05 中国南方电网有限责任公司超高压输电公司南宁监控中心 Transformer substation disconnecting link confirmation video image analysis method and system
CN112464030A (en) * 2020-11-25 2021-03-09 浙江大华技术股份有限公司 Suspicious person determination method and device
CN112653874A (en) * 2020-12-01 2021-04-13 杭州勋誉科技有限公司 Storage device and intelligent video monitoring system
CN112818757A (en) * 2021-01-13 2021-05-18 上海应用技术大学 Gas station safety detection early warning method and system
CN112989934A (en) * 2021-02-05 2021-06-18 方战领 Video analysis method, device and system
CN113112754A (en) * 2021-03-02 2021-07-13 深圳市哈威飞行科技有限公司 Drowning alarm method, drowning alarm device, drowning alarm platform, drowning alarm system and computer readable storage medium
CN113139679A (en) * 2021-04-06 2021-07-20 青岛以萨数据技术有限公司 Urban road rescue early warning method, system and equipment based on neural network
CN113177459A (en) * 2021-04-25 2021-07-27 云赛智联股份有限公司 Intelligent video analysis method and system for intelligent airport service
CN113824926A (en) * 2021-08-17 2021-12-21 衢州光明电力投资集团有限公司赋腾科技分公司 Portable video analysis device and method
CN113888827A (en) * 2021-10-14 2022-01-04 深圳市巨龙创视科技有限公司 Camera control method and system
CN114630104A (en) * 2020-12-10 2022-06-14 北京市博汇科技股份有限公司 Artificial intelligence multi-model alarm processing method and device for broadcast television
CN114639061A (en) * 2022-04-02 2022-06-17 山东博昂信息科技有限公司 Vehicle detection method, system and storage medium
CN114821844A (en) * 2021-01-28 2022-07-29 深圳云天励飞技术股份有限公司 Attendance checking method and device based on face recognition, electronic equipment and storage medium
CN114821957A (en) * 2022-05-13 2022-07-29 湖南工商大学 AI video analysis system and method
CN114821934A (en) * 2021-12-31 2022-07-29 北京无线电计量测试研究所 Garden perimeter security control system and method
CN115278361A (en) * 2022-07-20 2022-11-01 重庆长安汽车股份有限公司 Driving video data extraction method, system, medium and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101154B (en) * 2020-09-02 2023-12-15 腾讯科技(深圳)有限公司 Video classification method, apparatus, computer device and storage medium
CN112183353B (en) * 2020-09-28 2022-09-20 腾讯科技(深圳)有限公司 Image data processing method and device and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070269082A1 (en) * 2004-08-31 2007-11-22 Matsushita Electric Industrial Co., Ltd. Surveillance Recorder and Its Method
CN101854516A (en) * 2009-04-02 2010-10-06 北京中星微电子有限公司 Video monitoring system, video monitoring server and video monitoring method
CN103268680A (en) * 2013-05-29 2013-08-28 北京航空航天大学 Intelligent monitoring and anti-theft system for family
CN106372576A (en) * 2016-08-23 2017-02-01 南京邮电大学 Deep learning-based intelligent indoor intrusion detection method and system
CN109002744A (en) * 2017-06-06 2018-12-14 中兴通讯股份有限公司 Image-recognizing method, device and video monitoring equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184388A (en) * 2011-05-16 2011-09-14 苏州两江科技有限公司 Face and vehicle adaptive rapid detection system and detection method
CN106446754A (en) * 2015-08-11 2017-02-22 阿里巴巴集团控股有限公司 Image identification method, metric learning method, image source identification method and devices
CN206164722U (en) * 2016-09-21 2017-05-10 深圳市泛海三江科技发展有限公司 Discuss super electronic monitoring system based on face identification
CN106845385A (en) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 The method and apparatus of video frequency object tracking
CN108122246A (en) * 2017-12-07 2018-06-05 中国石油大学(华东) Video monitoring intelligent identifying system
CN108596140A (en) * 2018-05-08 2018-09-28 青岛海信移动通信技术股份有限公司 A kind of mobile terminal face identification method and system
CN109241349B (en) * 2018-08-14 2022-03-25 中国电子科技集团公司第三十八研究所 Monitoring video multi-target classification retrieval method and system based on deep learning



Also Published As

Publication number Publication date
CN111582006A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
WO2020168960A1 (en) Video analysis method and apparatus
TWI749113B (en) Methods, systems and computer program products for generating alerts in a video surveillance system
CN110674761B (en) Regional behavior early warning method and system
US20210364356A1 (en) System and method for using artificial intelligence to enable elevated temperature detection of persons using commodity-based thermal cameras
CN111814510A (en) Detection method and device for remnant body
CN110717357B (en) Early warning method and device, electronic equipment and storage medium
WO2020167155A1 (en) Method and system for detecting troubling events during interaction with a self-service device
CN112132048A (en) Community patrol analysis method and system based on computer vision
Moorthy et al. CNN based smart surveillance system: a smart IoT application post covid-19 era
US20220335724A1 (en) Processing apparatus, processing method, and non-transitory storage medium
KR102142315B1 (en) ATM security system based on image analyses and the method thereof
Dekkati et al. AI and Machine Learning for Remote Suspicious Action Detection and Recognition
Varghese et al. Video anomaly detection in confined areas
Khodadin et al. An intelligent camera surveillance system with effective notification features
WO2023124451A1 (en) Alarm event generating method and apparatus, device, and storage medium
El Gemayel et al. Automated face detection and control system using computer vision based video analytics to avoid the spreading of Covid-19
US11676439B2 (en) Face authentication system and face authentication method
Dirgantara et al. Design of Face Recognition Security System on Public Spaces
Saxena et al. Robust Home Alone Security System Using PIR Sensor and Face Recognition
Nandhini et al. IoT Based Smart Home Security System with Face Recognition and Weapon Detection Using Computer Vision
CN113095110B (en) Method, device, medium and electronic equipment for dynamically warehousing face data
Chua et al. Hierarchical audio-visual surveillance for passenger elevators
KR102332699B1 (en) Event processing system for detecting changes in spatial environment conditions using image model-based AI algorithms
KR20180001705A (en) A visitor detection method with face tracking using Haar-Like-Feature in the M2M environment
US20230058106A1 (en) Multi-camera system to perform movement pattern anomaly detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20759192

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20759192

Country of ref document: EP

Kind code of ref document: A1
