WO2019076187A1 - Video masking area selection method, apparatus, electronic device and system - Google Patents

Video masking area selection method, apparatus, electronic device and system

Info

Publication number
WO2019076187A1
WO2019076187A1 (application PCT/CN2018/108222, CN2018108222W)
Authority
WO
WIPO (PCT)
Prior art keywords
target
specified
video
specified target
preset
Prior art date
Application number
PCT/CN2018/108222
Other languages
English (en)
French (fr)
Inventor
陆海先
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co., Ltd. (杭州海康威视数字技术股份有限公司)
Priority to EP18868676.0A (publication EP3700180A4)
Priority to US16/756,094 (publication US11321945B2)
Publication of WO2019076187A1

Classifications

    • G08B13/19686 User interfaces masking personal details for privacy, e.g. blurring faces, vehicle license plates
    • G06F21/6254 Protecting personal data by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G06N3/04 Neural networks — architecture, e.g. interconnection topology
    • G06T7/215 Motion-based segmentation
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/269 Analysis of motion using gradient-based methods
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes
    • G06V20/42 Semantic understanding of sport video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06T2207/10016 Video; image sequence
    • G06T2207/10024 Color image
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30241 Trajectory

Definitions

  • the present application relates to the field of computer vision technology, and in particular, to a video masking area selection method, apparatus, electronic device and system.
  • Video desensitization can be divided into manual and automatic methods. Manual video desensitization is highly accurate, but the masking area must be calibrated manually, frame by frame, which involves a heavy workload and low efficiency. Faced with the massive amount of video on the network, processing each video by manual desensitization has become impossible.
  • In the related automatic video desensitization method, when the video masking area is selected, a moving target is detected in one image frame, and target tracking is then used in subsequent image frames to predict the range of the moving target; that range is used as the masking area. With this selection method, the masked area may be selected inaccurately because the predicted motion trajectory is inaccurate.
  • The purpose of the embodiments of the present application is to provide a video masking area selection method, apparatus, electronic device, and system, so as to improve the accuracy of masking area selection.
  • the specific technical solutions are as follows:
  • the embodiment of the present application provides a video masking area selection method, where the method includes:
  • acquiring a video to be detected; determining, by using a preset target detection algorithm, each specified target set in the to-be-detected video; determining, by using a preset recognition algorithm, whether the specified target in each specified target set is a sensitive target; and, when the specified target in any specified target set is a sensitive target, using the specified target set as the occlusion area of the video to be detected.
  • Optionally, after the specified target set is used as the occlusion area of the to-be-detected video when the specified target in any specified target set is a sensitive target, the method further includes: masking the masked area in each video frame of the to-be-detected video.
  • Optionally, determining, by using a preset target detection algorithm, each specified target set in the to-be-detected video includes:
  • the regions corresponding to the specified target are associated in a time sequence to obtain a specified target trajectory
  • a specified target trajectory of each specified target is used as each of the specified target sets in the to-be-detected video.
  • Optionally, detecting, by using the preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the to-be-detected video includes:
  • the region corresponding to the specified target is determined by a bounding box regression algorithm based on all of the pixel regions matching the same specified target.
  • Optionally, for each of the specified targets, associating the regions corresponding to the same specified target in time sequence to obtain the specified target trajectory includes:
  • the regional feature set belonging to the same specified target is respectively determined by a preset multi-target tracking algorithm
  • the regions corresponding to each region feature set are respectively associated to obtain the specified target track.
  • the determining, by using a preset identification algorithm, whether the specified target in each of the specified target sets is a sensitive target includes:
  • when the target recognition result does not meet the preset determination rule, it is determined that the specified target corresponding to the target recognition result is not a sensitive target.
  • Optionally, identifying, by using the recognition algorithm, the specified target in each frame of the specified target video frames to obtain a target recognition result includes:
  • the quantitative relationship between the sensitive feature and the target feature is used as the target recognition result.
  • the embodiment of the present application provides a video masking area selecting apparatus, where the apparatus includes:
  • a to-be-detected video acquisition module, configured to acquire a video to be detected;
  • a specified set determining module, configured to determine, by using a preset target detection algorithm, each specified target set in the to-be-detected video, where any specified target set is a set of pixels of a specified target in each video frame of the to-be-detected video;
  • a sensitive target determining module configured to determine, by using a preset identification algorithm, whether the specified target in each of the specified target sets is a sensitive target
  • the occlusion area selection module is configured to use the specified target set as the occlusion area of the to-be-detected video when the specified target in any specified target set is a sensitive target.
  • the video masking area selection device further includes:
  • a masking module configured to mask the masked area in each video frame of the to-be-detected video.
  • the specified set determining module includes:
  • a detection target sub-module configured to detect, by using a preset target detection algorithm, an area corresponding to all specified targets in each frame of the video frame of the to-be-detected video;
  • a target association sub-module configured to specify a target for each of the specified targets, and associate the regions corresponding to the specified target in a time sequence to obtain a specified target trajectory;
  • a target set sub-module configured to use a specified target trajectory of each specified target as each specified target set in the to-be-detected video.
  • the detection target sub-module includes:
  • a region dividing unit configured to divide each frame of the video frame in the to-be-detected video into a preset number of regions to obtain a plurality of pixel regions
  • a first feature acquiring unit configured to respectively extract features of each of the pixel regions by using a pre-trained convolutional neural network
  • a target matching unit configured to determine, according to a feature of each of the pixel regions, whether each pixel region matches any of the specified targets by using a preset classifier
  • a region determining unit configured to determine, when there is a pixel region matching any of the specified targets, the region corresponding to the specified target by using a bounding box regression algorithm, based on all pixel regions matching the same specified target.
  • the target association submodule includes:
  • a second feature acquiring unit configured to extract features of regions corresponding to all specified targets in each frame of the video frame of the to-be-detected video, to obtain a region feature
  • a set determining unit configured to determine, according to all the regional features, a set of regional features belonging to the same specified target by using a preset multi-target tracking algorithm
  • the target trajectory determining unit is configured to respectively associate regions corresponding to each region feature set according to a time sequence, to obtain each of the specified target trajectories.
  • the sensitive target determining module includes:
  • a video frame selection sub-module configured to select, according to a preset video frame extraction method, a specified target video frame of a preset number of frames in a video frame of the specified target set for each of the specified target sets;
  • a first determining sub-module configured to respectively identify a specified target in the specified target video frame of each frame by using a preset identification algorithm, to obtain a target recognition result
  • a second determining sub-module configured to: when the target recognition result meets a preset determination rule, determine that the specified target corresponding to the target recognition result is a sensitive target; or, when the target recognition result does not meet the preset determination rule, determine that the specified target corresponding to the target recognition result is not a sensitive target.
  • the first determining submodule includes:
  • a third feature acquiring unit configured to extract a feature of the specified target in the specified target video frame per frame, to obtain a target feature
  • a sensitive feature identifying unit configured to identify a sensitive feature in the target feature by using a preset target classification algorithm or a recognition technology
  • the recognition result determining unit is configured to use the quantity relationship between the sensitive feature and the target feature as the target recognition result.
  • an embodiment of the present application provides an electronic device, including a processor and a memory;
  • the memory is configured to store a computer program
  • the processor when executing the program stored on the memory, implements the following steps:
  • acquiring a video to be detected; determining, by using a preset target detection algorithm, each specified target set in the to-be-detected video; determining, by using a preset recognition algorithm, whether the specified target in each specified target set is a sensitive target; and, when the specified target in any specified target set is a sensitive target, using the specified target set as the occlusion area of the video to be detected.
  • the processor is further configured to:
  • the determining, by using a preset target detection algorithm, each specified target set in the to-be-detected video includes:
  • the regions corresponding to the specified target are associated in a time sequence to obtain a specified target trajectory
  • a specified target trajectory of each specified target is used as each specified target set in the to-be-detected video.
  • Optionally, detecting, by using the preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the to-be-detected video includes:
  • the region corresponding to the specified target is determined by the bounding box regression algorithm according to all the pixel regions matching the same specified target.
  • Optionally, associating the regions corresponding to the same specified target in time sequence to obtain the specified target trajectory includes:
  • the regional feature set belonging to the same specified target is respectively determined by a preset multi-target tracking algorithm
  • the regions corresponding to each region feature set are respectively associated to obtain each of the specified target trajectories.
  • the determining, by using a preset identification algorithm, whether the specified target in each of the specified target sets is a sensitive target includes:
  • when the target recognition result does not meet the preset determination rule, it is determined that the specified target corresponding to the target recognition result is not a sensitive target.
  • Optionally, identifying, by using the recognition algorithm, the specified target in each frame of the specified target video frames to obtain a target recognition result includes:
  • the quantitative relationship between the sensitive feature and the target feature is used as the target recognition result.
  • The embodiment of the present application provides a computer readable storage medium storing a computer program; when the computer program is executed by a processor, the method steps of any one of the above aspects are implemented.
  • the embodiment of the present application provides a video masking area selection system, where the system includes a video collection device and a video processor;
  • the video capture device is configured to collect a video to be detected
  • the video processor is configured to perform the method steps of any of the above first aspects.
  • The video masking area selection method, apparatus, electronic device, and system provided by the embodiments of the present application acquire the video to be detected; determine each specified target set in the to-be-detected video by using a preset target detection algorithm; determine, by using a preset recognition algorithm, whether the specified target in each specified target set is a sensitive target; and, when the specified target in any specified target set is a sensitive target, use the specified target set as the masked area of the to-be-detected video.
  • In this way, the accuracy of masked area selection can be improved: the sensitivity of the specified target is determined again by the recognition algorithm, which reduces mis-extraction of sensitive targets and thereby improves the accuracy of the masked area selection.
  • FIG. 1 is a schematic flowchart of a video masking area selection method according to an embodiment of the present application
  • FIG. 2 is another schematic flowchart of a video masking area selection method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a video masking area selecting apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
  • Video desensitization is a technology for hiding video content information. It processes sensitive or private information in video content, such as faces, by applying blur or mosaic, so as to mask the sensitive and private information.
  • In the related method, target tracking is used in subsequent image frames to predict the range of the target to be occluded, and that range is taken as the occlusion area. This has the following disadvantages: 1. The method is based on moving target detection; when the target to be detected is relatively stationary or its relative motion speed is small, the determined occlusion area may be incomplete. 2. The detected target information is not further classified, so there may be false shielding; for example, the area that needs to be shielded in the video is a human body wearing a police uniform, but all human bodies are shielded instead.
  • an embodiment of the present application provides a video masking area selection method, where the method includes:
  • the video masking area selection method in the embodiment of the present application can be implemented by a desensitization system.
  • the desensitization system is any system capable of performing the video masking area selection method of the embodiment of the present application. E.g:
  • The desensitization system can be an electronic device including a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface are connected and communicate with each other through the bus; the memory stores executable program code; and the processor, by reading the executable program code stored in the memory, runs a program corresponding to the executable program code, so as to perform the video masking area selection method.
  • the desensitization system can also be an application for performing a video masking area selection method at runtime.
  • the desensitization system can also be a storage medium for storing executable code for executing a video masking area selection method.
  • the desensitization system can obtain the video to be detected from the storage device, and can also obtain the video to be detected in real time through the image acquisition device.
  • the desensitization system has a built-in video capture device or a peripheral video capture device; the desensitization system instantly captures the video to be detected through a built-in video capture device or a peripheral video capture device.
  • The desensitization system acquires the to-be-detected video through the video capture device, which improves the immediacy of acquiring the to-be-detected video and makes it convenient for the user to process live video, providing a good user experience.
  • S102 Determine, by using a preset target detection algorithm, each specified target set in the to-be-detected video, where any specified target set is a set of pixels of the specified target in each video frame of the to-be-detected video.
  • The desensitization system uses a preset target detection algorithm to detect all specified targets in each video frame of the video to be detected, and associates all regions corresponding to the same specified target (regions in a video frame are composed of pixels) to obtain each specified target set.
  • The preset target detection algorithm is any algorithm capable of identifying a specified target, such as Boosting, RCNN (Regions with Convolutional Neural Network features), FRCNN (Fast Region Convolutional Neural Network), Faster RCNN (Faster Region Convolutional Neural Network), SSD (Single Shot MultiBox Detector), and so on.
  • the specified target is a preset target that needs to be obscured, for example, the license plate of the car in the video, the ID card appearing in the video, and the bank card.
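As an illustration only (not the patent's implementation), the per-frame detection loop of this step can be sketched as follows; `detect_all_frames` and `detect` are hypothetical names, and `detect` stands in for any of the detectors named above (RCNN, Faster RCNN, SSD, etc.), passed in as a function:

```python
def detect_all_frames(frames, detect):
    """Run a target detector on every frame of the video to be detected.

    `frames` is a sequence of decoded video frames; `detect` is any
    detector returning a list of bounding boxes (x1, y1, x2, y2) for the
    specified targets found in one frame.
    Returns a dict mapping frame index -> list of detected boxes.
    """
    detections = {}
    for idx, frame in enumerate(frames):
        detections[idx] = detect(frame)
    return detections
```

The resulting per-frame boxes are the raw material that the later association step links into per-target trajectories.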
  • S103 Determine, by using a preset recognition algorithm, whether the specified target in each specified target set is a sensitive target.
  • The preset recognition algorithm is any target classification or recognition technique capable of identifying sensitive targets, such as SIFT (Scale-Invariant Feature Transform), Dense SIFT, Color SIFT, HOG (Histogram of Oriented Gradients), and so on.
  • Sensitive targets are preset targets that need to be obscured, such as police car license plates, ID cards of minors, or bank cards with designated card numbers. Further classifying or identifying the extracted specified targets prevents non-sensitive targets that resemble sensitive targets from being mistakenly extracted, and thus reduces mis-extraction of sensitive targets.
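The claims describe using the "quantity relationship between the sensitive feature and the target feature" as the recognition result, judged against a preset determination rule. A minimal sketch of one such rule follows; the function name and the 0.5 ratio threshold are illustrative assumptions, not values from the patent:

```python
def is_sensitive(target_features, sensitive_features, ratio_threshold=0.5):
    """Decide sensitivity from the quantity relationship between the
    sensitive features and all features extracted from a specified target.

    `target_features` is the full feature list of the specified target;
    `sensitive_features` is the subset recognised as sensitive. The
    determination rule here is simply a minimum ratio (an assumption).
    """
    if not target_features:
        return False  # nothing extracted: cannot be judged sensitive
    ratio = len(sensitive_features) / len(target_features)
    return ratio >= ratio_threshold
```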
  • The desensitization system uses the specified target set corresponding to the sensitive target, that is, the specified target set whose specified target is a sensitive target, as the occlusion area of the video to be detected.
  • In the embodiment of the present application, the target detection algorithm detects the specified target in each video frame, and the pixels of the specified target are used as the specified target set; the motion target trajectory does not need to be predicted, so the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately.
  • the target is classified or identified by the recognition algorithm, and the sensitivity of the specified target is determined again, which can reduce the mis-extraction of the sensitive target, and can improve the accuracy of the selection of the masked area.
  • the automatic extraction of the occlusion area is realized, and a large amount of work due to manual calibration of the occlusion area is avoided, and the efficiency of the occlusion area extraction is high.
  • the method further includes:
  • the desensitization system adopts a method such as adding mosaic or adding gray to mask the pixels corresponding to the masked area in each frame of the video frame.
  • pixels in the occlusion area in each video frame of the video to be detected are masked, and desensitization of the video to be detected is achieved.
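A minimal NumPy sketch of the masking step described above (mosaic or gray fill); the gray value 128 and the 8-pixel mosaic block size are illustrative assumptions, not values from the patent:

```python
import numpy as np

def mask_region(frame, box, mode="gray"):
    """Mask the pixels of one occlusion area in a video frame.

    `frame` is an H x W x 3 uint8 array, modified in place; `box` is
    (x1, y1, x2, y2). "gray" fills with a constant gray value; "mosaic"
    pixelates by replacing each 8x8 block with its mean colour.
    """
    x1, y1, x2, y2 = box
    region = frame[y1:y2, x1:x2]
    if mode == "gray":
        region[...] = 128
    else:  # mosaic
        b = 8
        for y in range(0, region.shape[0], b):
            for x in range(0, region.shape[1], b):
                block = region[y:y + b, x:x + b]
                block[...] = block.mean(axis=(0, 1), dtype=int)
    return frame
```

Applying this to every frame's occlusion areas yields the desensitized video.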
  • determining, by using a preset target detection algorithm, each specified target set in the to-be-detected video including:
  • Step 1: Detect, by using the preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the to-be-detected video.
  • The desensitization system uses a preset target detection algorithm to detect each video frame in the to-be-detected video, determining all specified targets in each frame and the region corresponding to each specified target.
  • Step 2 For each specified target in all the specified targets, the regions corresponding to the specified target are associated in a time sequence to obtain a specified target trajectory.
  • After the desensitization system detects the region corresponding to each specified target in each video frame, it also needs to identify which regions in different video frames belong to the same specified target.
  • Desensitization systems utilize preset target tracking algorithms, such as MeanShift-based target tracking algorithms, TLD (Tracking-Learning-Detection), the IVT (Incremental Visual Tracking) algorithm, or the MIL (Multiple Instance Learning) algorithm, to determine the regions belonging to the same specified target; alternatively, desensitization systems use preset target classification or recognition algorithms to determine the regions belonging to the same specified target.
  • the desensitization system associates the regions corresponding to the same specified target in each frame of the video frame according to the time sequence, and obtains the specified target trajectory with the same number of the specified target.
  • Step 3: Use the specified target trajectory of each specified target as each specified target set in the video to be detected.
  • The desensitization system takes the specified target trajectory of a specified target as the specified target set of that target. For a stationary target, its trajectory can be a collection of unchanging position information: if the specified target is stationary, the position of the specified target recognized in each video frame is unchanged, and therefore the pixels at that position in each video frame can be used as the specified target set of the specified target.
  • In the embodiment of the present application, the target detection algorithm determines the region corresponding to the specified target, and the regions corresponding to the same specified target are associated in time sequence to obtain the specified target set. Compared with moving target detection or sensitive region segmentation methods, this determines more accurately the region corresponding to a specified target that is stationary or nearly stationary relative to the background, and the obtained target set is more accurate.
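The time-sequence association step can be illustrated with a simple greedy IoU matcher. The patent names MeanShift, TLD, IVT, and MIL trackers; the stand-in below (hypothetical names, assumed 0.3 overlap threshold) only shows how per-frame regions are linked in time order into trajectories:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def link_trajectories(detections, iou_min=0.3):
    """Associate per-frame regions in time order into trajectories.

    `detections` maps frame index -> list of boxes. Each trajectory is
    a list of (frame_idx, box) pairs: a detection is appended to the
    trajectory whose previous-frame box overlaps it most, or starts a
    new trajectory if nothing overlaps enough.
    """
    trajectories = []
    for idx in sorted(detections):
        for box in detections[idx]:
            best, best_iou = None, iou_min
            for traj in trajectories:
                last_idx, last_box = traj[-1]
                if last_idx == idx - 1 and iou(last_box, box) >= best_iou:
                    best, best_iou = traj, iou(last_box, box)
            if best is not None:
                best.append((idx, box))
            else:
                trajectories.append([(idx, box)])
    return trajectories
```

Each resulting trajectory plays the role of one specified target set.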
  • the preset target detection algorithm separately detects an area corresponding to all the specified targets in each frame of the video frame to be detected, including:
  • Step 1 Split each frame of the video frame in the video to be detected into a preset number of regions to obtain a plurality of pixel regions.
  • the desensitization system divides each frame of video into a predetermined number of regions that can overlap.
  • the preset number is set according to the required recognition accuracy and the resolution of the video frame, and the preset number is positively correlated with the required recognition accuracy and the resolution of the video frame.
  • the preset number can be 100, 500, 1000 or more.
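Step 1 (dividing a frame into a preset number of possibly overlapping regions) can be sketched as a sliding window; the function name, window size, and stride below are illustrative assumptions, not values from the patent:

```python
def split_into_regions(height, width, win=64, stride=32):
    """Divide one video frame into overlapping pixel regions.

    Returns a list of (x1, y1, x2, y2) windows covering the frame; a
    stride smaller than the window size makes adjacent regions overlap.
    """
    regions = []
    for y in range(0, max(height - win, 0) + 1, stride):
        for x in range(0, max(width - win, 0) + 1, stride):
            regions.append((x, y, x + win, y + win))
    return regions
```

The preset number of regions then falls out of the frame resolution and the chosen window/stride, matching the positive correlation with resolution described above.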
  • Step 2: Extract the features of each pixel region separately by using a pre-trained convolutional neural network.
  • a pre-trained convolutional neural network can be established through supervised learning: a neural network for extracting the region corresponding to the specified target is built, and multiple sets of regions corresponding to specified targets are input for supervised learning, to determine the identification features of the region corresponding to the specified target.
  • alternatively, an SVM (Support Vector Machine) may be trained as the classifier: image features of regions corresponding to specified targets are used as eigenvalues, an input vector is determined from the eigenvalues and their rates of change, and training is performed with a linear kernel function and an RBF (Radial Basis Function) kernel respectively, selecting the function that performs better on the test set.
  • step 3 according to the characteristics of each pixel area, it is determined by the preset classifier whether each pixel area matches any specified target.
  • Step 4 When there is a pixel region matching any of the specified targets, the region corresponding to the specified target is determined by the bounding box regression algorithm according to all the pixel regions matching the same specified target.
  • when there is a pixel region matching a specified target, the desensitization system combines, for each video frame, all pixel regions in that frame that match the same specified target, and determines an initial corresponding region.
  • the desensitization system then uses the bounding box regression algorithm to correct the initial corresponding region and finally obtains the region corresponding to the specified target.
  • in this implementation, a specific method for determining the region corresponding to a specified target is given.
  • an area corresponding to each specified target may also be determined by an algorithm such as Boosting, FRCNN, FasterRCNN, or SSD.
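  • The combination part of step four — gathering all pixel regions matched to the same specified target into an initial corresponding region — can be sketched as a box union; the bounding box regression that refines this initial region needs a trained model and is omitted here. The function name and box format are illustrative assumptions:

```python
def merge_matched_regions(boxes):
    """Union of all pixel regions matched to the same specified target.
    Boxes are (top, left, bottom, right); the merged box is the initial
    corresponding region that bounding box regression would then correct."""
    tops, lefts, bottoms, rights = zip(*boxes)
    return (min(tops), min(lefts), max(bottoms), max(rights))

# three overlapping 64x64 windows matched to one specified target
matched = [(0, 0, 64, 64), (32, 32, 96, 96), (32, 0, 96, 64)]
initial_region = merge_matched_regions(matched)  # (0, 0, 96, 96)
```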
  • associating the regions corresponding to the same specified target in time sequence to obtain the specified target trajectory includes:
  • Step 1 Extract features of regions corresponding to all specified targets in each frame of the video frame of the to-be-detected video to obtain region features.
  • the region feature is any feature that identifies the specified target.
  • the desensitization system extracts the regional features of the specified target through a preset multi-target tracking algorithm.
  • step 2 according to all the regional features, the regional feature set belonging to the same specified target is respectively determined by a preset multi-target tracking algorithm.
  • the multi-target tracking algorithm may be a TLD (Tracking-Learning-Detection), an ITV (Incremental visual tracking) algorithm, or a MIL (Multi-instance Learning) algorithm.
  • the desensitization system utilizes a multi-target tracking algorithm to determine the characteristics of each region belonging to the same specified target, and uses all regional features of the same specified target as a set of regional features.
  • Step 3 Associate the regions corresponding to each region feature set according to the time sequence, and obtain the specified target track.
  • the specified target trajectory is determined by the multi-target tracking algorithm, and the specified target trajectory can be determined more quickly.
  • in another implementation, associating the regions corresponding to the same specified target in time sequence to obtain the specified target trajectory includes:
  • Step 1 Extract a color histogram of the RGB (Red Green Blue) color mode of each region corresponding to the specified target.
  • Step 2 Calculate the Euclidean distance of the color histogram of the RGB color mode of each region corresponding to each specified target between two adjacent video frames in time series.
  • Step 3 Associate the regions corresponding to the specified targets whose Euclidean distance is less than the preset threshold according to the time sequence, to obtain the specified target trajectory.
  • when the Euclidean distance between two specified targets in adjacent video frames is less than the preset threshold, the two specified targets are determined to be the same specified target, and the regions corresponding to the two specified targets are associated.
  • the first video frame and the second video frame are temporally adjacent video frames, the first video frame includes a specified target 1 and a specified target 2, and the second video frame includes a specified target 3 and a specified target 4.
  • if the calculated Euclidean distance of the RGB color histograms between specified target 1 and specified target 3 is 0.02, between specified target 1 and specified target 4 is 0.58, between specified target 2 and specified target 3 is 0.67, and between specified target 2 and specified target 4 is 0.09, then with a preset threshold of 0.1, the region corresponding to specified target 3 is associated with the region corresponding to specified target 1, and the region corresponding to specified target 4 is associated with the region corresponding to specified target 2.
  • the preset threshold is set according to the actual image resolution and is positively correlated with the image resolution.
  • the preset threshold may be 0.3, 0.2, 0.1, or 0.05, or the like.
  • determining the specified target trajectory by the Euclidean distance keeps the calculation simple and saves calculation cost.
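  • The histogram-distance association above can be sketched as follows. The bin count, the synthetic red/blue regions and the function names are illustrative assumptions; with the 0.1 threshold from the example, visually similar regions in adjacent frames are linked:

```python
import numpy as np

def rgb_histogram(region, bins=8):
    """Normalised per-channel colour histogram of an RGB region,
    concatenated into one feature vector."""
    hists = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def associate(prev_targets, curr_targets, threshold=0.1):
    """Link each target in the previous frame to a target in the current
    frame whose histogram lies within `threshold` Euclidean distance."""
    links = {}
    for i, hp in prev_targets.items():
        for j, hc in curr_targets.items():
            if np.linalg.norm(hp - hc) < threshold:
                links[i] = j
    return links

red = np.zeros((16, 16, 3)); red[..., 0] = 200    # target 1: reddish region
blue = np.zeros((16, 16, 3)); blue[..., 2] = 200  # target 2: bluish region
prev = {1: rgb_histogram(red), 2: rgb_histogram(blue)}
curr = {3: rgb_histogram(red + 5), 4: rgb_histogram(blue + 5)}
links = associate(prev, curr)  # {1: 3, 2: 4}, mirroring the example above
```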
  • determining, by a preset recognition algorithm, whether the specified target in each specified target set is a sensitive target includes:
  • Step 1 For each specified target set, specified target video frames of a preset number of frames are selected from the video frames of the specified target set according to a preset video frame extraction method.
  • the preset video frame extraction method is any method of extracting video frames, for example, extracting one video frame every 15 frames for a total of 5 frames; or, for a specified target set whose video frames total 9 frames, extracting the 3rd, 6th and 9th frames for a total of 3 frames.
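  • The extraction examples above can be sketched as a small helper that picks every step-th frame using 1-based indices; the function name is an assumption:

```python
def select_key_frames(total_frames, step):
    """Pick every `step`-th frame (1-based), as in the examples:
    9 frames with step 3 -> frames 3, 6, 9; every 15th of 75 -> 5 frames."""
    return list(range(step, total_frames + 1, step))
```

For instance, `select_key_frames(9, 3)` returns `[3, 6, 9]`.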
  • Step 2 The specified target in each extracted specified target video frame is identified by the preset recognition algorithm to obtain the target recognition result.
  • the desensitization system identifies each specified target video frame with a preset recognition algorithm, including target classification or recognition technology, to obtain the target recognition result. For example, for a specified target set, 5 specified target video frames are extracted in total; the preset recognition algorithm identifies the specified target as a sensitive target in 4 of the frames and as not sensitive in 1 frame. The corresponding target recognition result may then be "4 frames sensitive, 1 frame not sensitive", or "the specified target is a sensitive target with a probability of 80%", or the like.
  • Step 3 When the target recognition result meets the preset determination rule, determine that the specified target corresponding to the target recognition result is a sensitive target; when the target recognition result does not meet the preset determination rule, determine that the specified target corresponding to the target recognition result is not a sensitive target.
  • the preset determination rule may be set according to actual requirements.
  • for example, the preset determination rule may be: when the probability that the specified target corresponding to the specified target set is a sensitive target is not less than the preset similarity threshold, determine that the specified target corresponding to the target recognition result is a sensitive target.
  • the preset similarity threshold can be set according to actual conditions, for example, set to 80%, 90% or 95%.
  • for example, when the preset similarity threshold is 80% and the target recognition result is that the probability of the specified target being a sensitive target is 80%, the specified target corresponding to the target recognition result is determined to be a sensitive target; when the target recognition result is that the probability of the specified target being a sensitive target is 60%, the specified target corresponding to the target recognition result is determined not to be a sensitive target.
  • the same specified target is judged multiple times, and whether the specified target is a sensitive target is determined according to the combined result of the multiple judgments; this reduces contingency in the determination process and makes the result more accurate. Further classifying or identifying the extracted specified targets, to prevent mis-extraction of non-sensitive targets that resemble sensitive targets, can reduce mis-extraction of sensitive targets.
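  • The probability-style determination rule can be sketched as below, assuming each per-frame result is 1 for "sensitive" and 0 otherwise and using the 80% similarity threshold from the example; the function name and encoding are illustrative:

```python
def is_sensitive(per_frame_results, similarity_threshold=0.8):
    """Declare the specified target sensitive when the fraction of frames
    recognised as sensitive is not less than the preset threshold."""
    ratio = sum(per_frame_results) / len(per_frame_results)
    return ratio >= similarity_threshold

is_sensitive([1, 1, 1, 1, 0])  # 4 of 5 frames sensitive: 80% -> sensitive
is_sensitive([1, 1, 0, 1, 0])  # 3 of 5 frames sensitive: 60% -> not sensitive
```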
  • identifying, by the preset recognition algorithm, the specified target in each frame of the specified target video frames to obtain the target recognition result includes:
  • step one: extract the feature of the specified target in each frame of the specified target video frames to obtain the target features.
  • step two the sensitive feature is identified in the target feature by a preset target classification algorithm or recognition technology.
  • for example, the desensitization system normalizes the image of the specified target video frame and calculates the gradients of the normalized frame to obtain a gradient map.
  • the frame is divided into a preset number of Cell regions, and the Cell regions are grouped into a preset number of Block regions; for example, each Cell region may be set to 6×6 pixels, and each Block region to 3×3 Cell regions.
  • a weighted projection of the gradient orientations is performed for each Cell region to obtain a per-Cell histogram, and contrast normalization is performed over the overlapping Cell regions within each Block region to obtain the Block's histogram vector.
  • the histogram vectors of all Block regions are combined into one HOG feature vector, which is used as the target feature.
  • when the similarity between a target feature and a preset sensitive feature is higher than the preset similarity threshold (which may be set according to the resolution of the video frame and is positively correlated with it; for example, 70%, 80% or 90%), the target feature is determined to be a sensitive feature.
  • step 3 the quantitative relationship between the sensitive feature and the target feature is used as the target recognition result.
  • the target recognition result is used to represent the quantitative relationship between the sensitive feature and the target feature.
  • the target recognition result is: the number of sensitive features is 8, the number of target features is 10; or the target recognition result is: 8/10.
  • in this implementation, a specific method for determining the target recognition result is given, which reduces contingency in the determination process and makes the determination result more accurate.
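  • The per-Cell histogram and per-Block normalisation steps above can be sketched with a simplified HOG-style descriptor (6×6-pixel Cells, 3×3-Cell Blocks, and an assumed 9 orientation bins); this is an illustrative approximation of the described feature, not the embodiment's exact computation:

```python
import numpy as np

def hog_like_feature(gray, cell=6, block=3, bins=9):
    """Simplified HOG-style descriptor: per-cell gradient-orientation
    histograms, L2-normalised over overlapping block x block cell groups."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    ch, cw = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hist[i, j], _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
    feats = []
    for i in range(ch - block + 1):                  # overlapping blocks
        for j in range(cw - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(feats)
```

For a 36×36 input this yields a 4×4 grid of overlapping Blocks, i.e. a 4·4·3·3·9 = 1296-dimensional target feature.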
  • FIG. 2 is another schematic flowchart of a video masking area selection method according to an embodiment of the present disclosure, including:
  • the video collection module 201 is configured to collect a video to be detected.
  • the video capture module 201 is composed of a video capture device capable of recording video.
  • the video capture device in the video capture module 201 collects scenes in the live environment, generates video frames of those scenes, and transmits the collected video frames to the other modules.
  • the target extraction module 202 is configured to extract a specified target in the video frame by using a preset target detection algorithm, and associate the same specified target in a time sequence to obtain a specified target set.
  • the target extraction module 202 includes a detection target sub-module and a target tracking association sub-module in the video frame.
  • the target extraction module 202 acquires the video frame sent by the video capture module 201.
  • the in-frame detection target sub-module is used to perform target detection on each video frame captured by the video capture module 201 by using a preset target detection technology, such as Boosting, RCNN, FRCNN, Faster RCNN or SSD; determine the specified targets; detect the target frame of each specified target; and use the area within the target frame of any specified target as the region corresponding to that specified target.
  • the target tracking association sub-module is used to track each specified target in the video frames by a multi-target tracking algorithm, such as the TLD, ITV or MIL algorithm, associate the same specified target in time sequence to form a specified target set, and bind each specified target set to an ID.
  • the sensitivity determining module 203 is configured to determine whether the specified target in each specified target set is a sensitive target by using a preset target classification or recognition technology.
  • the sensitivity determination module 203 performs a sensitivity analysis on the specified target corresponding to the ID for each newly generated ID.
  • Use target classification or identification techniques such as SIFT, Dense SIFT, Color SIFT, HOG, etc., to determine whether the specified target is a sensitive target, such as a specific person or police car.
  • to prevent misjudgment, the specified target is judged multiple times, and a voting method over the multiple judgments determines whether it is a sensitive target. For example, if the specified target appears in 9 video frames in total, one video frame is selected every 3 frames and it is judged whether the specified target in that frame is a sensitive target, for three judgments in total; if the specified target is judged sensitive twice or more, the specified target corresponding to the ID is finally determined to be a sensitive target.
  • the method of determining sensitive targets multiple times can increase the robustness of determining sensitive targets and prevent false positives.
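  • The "twice or more out of three" vote above can be sketched as a simple majority test over the repeated judgments; the function name and boolean encoding are illustrative:

```python
def vote_sensitive(judgments):
    """Majority vote over repeated sensitivity judgments: with three
    judgments, two or more positives mark the target as sensitive."""
    return sum(judgments) * 2 > len(judgments)

vote_sensitive([True, True, False])   # 2 of 3 -> sensitive
vote_sensitive([True, False, False])  # 1 of 3 -> not sensitive
```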
  • the area masking module 204 is configured to mask the specified target set when the specified target in any specified target set is a sensitive target.
  • the area masking module 204 uses a method such as mosaic or graying to mask the region (pixels) corresponding to the specified target in each video frame of the specified target set corresponding to the ID, for the purpose of video desensitization.
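  • The mosaic masking performed by the area masking module can be sketched by averaging fixed-size tiles inside the occluded box; the tile size and function name are assumptions:

```python
import numpy as np

def mosaic(frame, box, block=8):
    """Mask the region corresponding to a sensitive target by replacing
    each block x block tile inside `box` with its mean colour.
    `box` is (top, left, bottom, right)."""
    top, left, bottom, right = box
    out = frame.copy()
    for i in range(top, bottom, block):
        for j in range(left, right, block):
            tile = out[i:min(i + block, bottom), j:min(j + block, right)]
            tile[:] = tile.mean(axis=(0, 1), keepdims=True).astype(out.dtype)
    return out
```

Graying the region instead would simply assign a constant value to the same pixel range.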
  • the target extraction module 202 determines the specified target by the target detection algorithm and, compared with moving target detection or sensitive region segmentation methods, can more accurately determine the region corresponding to a specified target that is stationary or nearly stationary relative to the background.
  • the sensitivity determining module 203 determines whether the specified target is a sensitive target, and determines the extracted specified target, thereby improving the accuracy of the masked area selection, thereby improving the accuracy of the video desensitization.
  • the automatic extraction of the occlusion area is realized, and a large amount of work due to manual calibration of the occlusion area is avoided, and the efficiency of the occlusion area extraction is high. Further classifying or identifying the extracted specified targets to prevent mis-extraction of non-sensitive targets that are close to sensitive targets can reduce mis-extraction of sensitive targets.
  • an embodiment of the present application provides a video masking area selection apparatus, where the apparatus includes:
  • the to-be-detected video acquisition module 301 is configured to acquire a video to be detected.
  • the specified set determining module 302 is configured to determine, by a preset target detection algorithm, each specified target set in the to-be-detected video, where any specified target set is a pixel set of one specified target in each video frame of the to-be-detected video.
  • the sensitive target determining module 303 is configured to determine, by using a preset identification algorithm, whether the specified target in each specified target set is a sensitive target.
  • the occlusion area selection module 304 is configured to use the specified target set as the occlusion area of the video to be detected when the specified target in any specified target set is a sensitive target.
  • the specified target is first determined by the target detection algorithm, and then the target is classified or identified by the target to determine whether the specified target is a sensitive target.
  • compared with moving target detection or sensitive region segmentation methods, the target detection algorithm can more accurately determine the region corresponding to a specified target that is stationary or nearly stationary relative to the background.
  • the accuracy of the selection of the masked area can be improved.
  • the automatic extraction of the occlusion area is realized, and a large amount of work due to manual calibration of the occlusion area is avoided, and the efficiency of the occlusion area extraction is high. Further classifying or identifying the extracted specified targets to prevent mis-extraction of non-sensitive targets that are close to sensitive targets can reduce mis-extraction of sensitive targets.
  • the video masking area selection device further includes:
  • a masking module is configured to mask a shadow area in each video frame of the video to be detected.
  • pixels in the occlusion area in each video frame of the video to be detected are masked, and desensitization of the video to be detected is achieved.
  • the specified set determining module 302 includes:
  • the detection target sub-module is configured to respectively detect, by using a preset target detection algorithm, an area corresponding to all the specified targets in each frame of the video frame of the to-be-detected video.
  • the target association sub-module is configured to specify a target for each of the specified targets, and associate the regions corresponding to the specified target in a time sequence to obtain the specified target trajectory.
  • a target set sub-module configured to use a specified target trajectory of each specified target as each specified target set in the to-be-detected video.
  • by determining the region corresponding to the specified target with the target detection algorithm and associating the regions corresponding to the same specified target in time sequence to obtain the specified target set, the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately than with moving target detection or sensitive region segmentation methods, and the obtained target set is more accurate.
  • the detection target sub-module includes:
  • the area dividing unit is configured to divide each frame of the video frame in the video to be detected into a preset number of regions to obtain a plurality of pixel regions.
  • the first feature acquiring unit is configured to separately extract features of each pixel region by using a pre-trained convolutional neural network.
  • the target matching unit is configured to determine, according to the feature of each pixel region, whether each pixel region matches any specified target by using a preset classifier.
  • the area determining unit is configured to determine, according to the bounding box regression algorithm, the area corresponding to the specified target according to all the pixel areas that match the same specified target when there is a pixel area that matches any of the specified targets.
  • in this implementation, a specific method for determining the region corresponding to a specified target is given.
  • an area corresponding to each specified target may also be determined by an algorithm such as Boosting, FRCNN, FasterRCNN, or SSD.
  • the target association sub-module includes:
  • a second feature acquiring unit configured to extract features of regions corresponding to all specified targets in each frame of the video frame of the to-be-detected video, to obtain a region feature.
  • the set determining unit is configured to determine, according to all the regional features, the regional feature set belonging to the same specified target by using a preset multi-target tracking algorithm.
  • the target trajectory determining unit is configured to respectively associate regions corresponding to each region feature set according to the time sequence, to obtain each specified target trajectory.
  • the specified target set is determined by the multi-target tracking algorithm, and the obtained specified target set is accurate.
  • the sensitive target determining module 303 includes:
  • the video frame selection sub-module is configured to select a specified target video frame of a preset number of frames in a video frame of the specified target set according to a preset video frame extraction method for each specified target set.
  • the first determining sub-module is configured to respectively identify a specified target in each target video frame by using a preset identification algorithm to obtain a target recognition result.
  • a second determining sub-module configured to determine that the specified target corresponding to the target recognition result is a sensitive target when the target recognition result meets the preset determination rule; or determine the target recognition result when the target recognition result does not meet the preset determination rule The corresponding specified target is not a sensitive target.
  • the same specified target is determined multiple times, and according to the comprehensive result of the multiple determinations, whether the specified target is a sensitive target is determined, the contingency in the determination process is reduced, and the result of the determination is more accurate.
  • the first determining submodule includes:
  • a third feature acquiring unit configured to extract a feature of the specified target in the specified target video frame per frame, to obtain a target feature.
  • the sensitive feature recognition unit is configured to identify the sensitive feature in the target feature by using a preset target classification algorithm or a recognition technology.
  • the recognition result determining unit is configured to use the quantitative relationship between the sensitive feature and the target feature as the target recognition result.
  • a specific method for determining the target recognition result is given, which reduces the chance of the determination process, and the result of the determination is more accurate.
  • the embodiment of the present application further provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and the processor is configured to implement the video masking area selection method of any of the foregoing embodiments when executing the program stored in the memory.
  • the electronic device in the embodiment of the present application includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402 and the memory 403 communicate with each other through the communication bus 404.
  • the memory 403 is configured to store a computer program.
  • the processor 401 is configured to perform the following steps when executing the program stored on the memory 403:
  • acquiring a to-be-detected video; determining, by a preset target detection algorithm, each specified target set in the to-be-detected video, where any specified target set is a pixel set of one specified target in each video frame of the to-be-detected video; determining, by a preset recognition algorithm, whether the specified target in each of the specified target sets is a sensitive target; and when the specified target in any of the specified target sets is a sensitive target, using that specified target set as the masked area of the to-be-detected video.
  • the pixels of a specified target are used as the specified target set, so that the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately.
  • the target is classified or identified by the recognition algorithm, and the sensitivity of the specified target is determined again, which can improve the accuracy of the selection of the masked area.
  • the automatic extraction of the occlusion area is realized, and a large amount of work due to manual calibration of the occlusion area is avoided, and the efficiency of the occlusion area extraction is high. Further classifying or identifying the extracted specified targets to prevent mis-extraction of non-sensitive targets that are close to sensitive targets can reduce mis-extraction of sensitive targets.
  • when the processor 401 executes the program stored in the memory 403, it is further configured to implement the following:
  • determining, by using a preset target detection algorithm, each specified target set in the to-be-detected video including:
  • the preset target detection algorithm detects the regions corresponding to all the specified targets in each frame of the video frame to be detected.
  • the regions corresponding to the specified target are associated in a time sequence to obtain the specified target trajectory.
  • the specified target trajectory of each specified target is taken as each specified target set in the video to be detected.
  • the area corresponding to all the specified targets in each frame of the video frame of the to-be-detected video is detected by using a preset target detection algorithm, including:
  • the video frame of each frame in the video to be detected is divided into a preset number of regions to obtain a plurality of pixel regions.
  • the features of each pixel region are extracted separately by a pre-trained convolutional neural network; based on the features of each pixel region, it is determined by the preset classifier whether each pixel region matches any of the specified targets.
  • when there is a pixel region matching any of the specified targets, the region corresponding to the specified target is determined by the bounding box regression algorithm according to all the pixel regions that match the same specified target.
  • the regions corresponding to the same specified target are associated in a time sequence, and the specified target trajectory is obtained, including:
  • the features of the regions corresponding to all specified targets are extracted, and the regional feature sets belonging to the same specified target are determined by a preset multi-target tracking algorithm.
  • the regions corresponding to each region feature set are respectively associated to obtain each specified target track.
  • determining, by using a preset identification algorithm, whether the specified target in each specified target set is a sensitive target includes:
  • for each specified target set, specified target video frames of a preset number of frames are selected from the video frames of the specified target set according to a preset video frame extraction method.
  • the specified target in the specified target video frame is identified for each frame, and the target recognition result is obtained.
  • when the target recognition result meets the preset determination rule, the specified target corresponding to the target recognition result is determined to be a sensitive target.
  • when the target recognition result does not conform to the preset determination rule, it is determined that the specified target corresponding to the target recognition result is not a sensitive target.
  • the specified target in the specified target video frame is identified by using the identification algorithm, and the target recognition result is obtained, including:
  • the feature of the specified target in the specified target video frame is extracted every frame to obtain the target feature.
  • Sensitive features are identified in the target features by a predetermined target classification algorithm or recognition technique.
  • the quantitative relationship between the sensitive feature and the target feature is used as the target recognition result.
  • the communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above electronic device and other devices.
  • the memory may include a random access memory (RAM), and may also include a non-volatile memory (NVM), such as at least one disk storage.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • the above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), or the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or a Field-Programmable Gate Array (FPGA).
  • the embodiment of the present application further provides a computer readable storage medium.
  • the computer readable storage medium stores a computer program. When the computer program is executed by the processor, the following steps are implemented:
  • acquiring a to-be-detected video; determining, by a preset target detection algorithm, each specified target set in the to-be-detected video, where any specified target set is a pixel set of one specified target in each video frame of the to-be-detected video; determining, by a preset recognition algorithm, whether the specified target in each of the specified target sets is a sensitive target; and when the specified target in any of the specified target sets is a sensitive target, using that specified target set as the masked area of the to-be-detected video.
  • the pixels of a specified target are used as the specified target set, so that the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately.
  • the target is classified or identified by the recognition algorithm, and the sensitivity of the specified target is determined again, which can improve the accuracy of the selection of the masked area.
  • the automatic extraction of the occlusion area is realized, and a large amount of work due to manual calibration of the occlusion area is avoided, and the efficiency of the occlusion area extraction is high. Further classifying or identifying the extracted specified targets to prevent mis-extraction of non-sensitive targets that are close to sensitive targets can reduce mis-extraction of sensitive targets.
  • any step of the video masking area selection method can be implemented.
  • the embodiment of the present application further provides a video masking area selection system, where the system includes a video capture device and a video processor.
  • a video capture device that collects video to be detected.
  • a video processor for implementing the following steps:
  • acquiring a to-be-detected video; determining, by a preset target detection algorithm, each specified target set in the to-be-detected video, where any specified target set is a pixel set of one specified target in each video frame of the to-be-detected video; determining, by a preset recognition algorithm, whether the specified target in each of the specified target sets is a sensitive target; and when the specified target in any of the specified target sets is a sensitive target, using that specified target set as the masked area of the to-be-detected video.
  • the pixel of the specified target is used as the specified target set, so that it is possible to more accurately determine the specified target that is relatively stationary or close to the background. Area.
  • the target is classified or identified by the recognition algorithm, and the sensitivity of the specified target is determined again, which can reduce the mis-extraction of the sensitive target, and can improve the accuracy of the selection of the masked area.
  • the automatic extraction of the occlusion area is realized, and a large amount of work due to manual calibration of the occlusion area is avoided, and the efficiency of the occlusion area extraction is high.
  • the video processor is further capable of implementing any of the above video masking area selection methods.
  • the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment. .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A video masking region selection method, apparatus, electronic device, and system, applied to the field of computer vision. The video masking region selection method includes: obtaining a video to be detected; determining, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected; determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected. The method first determines the specified targets and then judges whether each specified target is a sensitive target. Through repeated judgment, the accuracy of masking region selection can be improved.

Description

Video masking region selection method, apparatus, electronic device, and system
This application claims priority to Chinese Patent Application No. 201710957962.X, filed with the China National Intellectual Property Administration on October 16, 2017 and entitled "Video masking region selection method, apparatus, electronic device, and system", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer vision, and in particular to a video masking region selection method, apparatus, electronic device, and system.
Background
With the development of network technology and the spread of image capture devices, video has become one of the main ways of conveying information. Users can record everyday events with portable video capture devices such as mobile phones or digital cameras and upload them to the Internet for display. However, a video may contain sensitive information, such as car license plate numbers, ID card numbers, and bank card numbers, so the video needs to be desensitized. Video desensitization can be performed manually or automatically. Manual video desensitization is accurate, but it requires manually marking the masking region frame by frame, which is labor-intensive and inefficient. Given the massive amount of video on the Internet, processing every video by manual desensitization has become impossible.
In a related automatic video desensitization method, when selecting the video masking region, a moving target is detected in one image frame, and in subsequent image frames the range of the moving target is predicted by target tracking and that range is taken as the masking region. With this masking region selection method, however, inaccurate trajectory prediction can make the selected masking region inaccurate.
Summary
The purpose of the embodiments of this application is to provide a video masking region selection method, apparatus, electronic device, and system, so as to improve the accuracy of masking region selection. The specific technical solutions are as follows:
In a first aspect, an embodiment of this application provides a video masking region selection method, including:
obtaining a video to be detected;
determining, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected;
determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and
when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
Optionally, after taking that specified-target set as a masking region of the video to be detected when the specified target of any specified-target set is a sensitive target, the method further includes:
masking the masking region in each video frame of the video to be detected.
Optionally, determining, by a preset target detection algorithm, each specified-target set in the video to be detected includes:
detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected;
for each of the specified targets, associating the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory; and
taking the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
Optionally, detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected includes:
dividing each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions;
extracting the feature of each pixel region through a pre-trained convolutional neural network;
determining, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets; and
when there are pixel regions matching any of the specified targets, determining the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
Optionally, for each of the specified targets, associating the regions corresponding to the same specified target in temporal order to obtain a specified-target trajectory includes:
extracting features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features;
determining, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target; and
associating, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
Optionally, determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target includes:
for each specified-target set, selecting a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method;
recognizing the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result; and
when the target recognition result meets a preset decision rule, determining that the specified target corresponding to the target recognition result is a sensitive target; or
when the target recognition result does not meet the preset decision rule, determining that the specified target corresponding to the target recognition result is not a sensitive target.
Optionally, recognizing the specified target in each specified-target video frame through the recognition algorithm to obtain a target recognition result includes:
extracting the feature of the specified target in each specified-target video frame to obtain target features;
identifying sensitive features among the target features through a preset target classification algorithm or recognition technique; and
taking the quantitative relationship between the sensitive features and the target features as the target recognition result.
In a second aspect, an embodiment of this application provides a video masking region selection apparatus, including:
a to-be-detected video obtaining module, configured to obtain a video to be detected;
a specified-set determining module, configured to determine, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected;
a sensitive-target determining module, configured to determine, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and
a masking region selection module, configured to, when the specified target of any specified-target set is a sensitive target, take that specified-target set as a masking region of the video to be detected.
Optionally, the video masking region selection apparatus further includes:
a masking module, configured to mask the masking region in each video frame of the video to be detected.
Optionally, the specified-set determining module includes:
a target detection submodule, configured to detect, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected;
a target association submodule, configured to, for each of the specified targets, associate the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory; and
a target set submodule, configured to take the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
Optionally, the target detection submodule includes:
a region division unit, configured to divide each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions;
a first feature obtaining unit, configured to extract the feature of each pixel region through a pre-trained convolutional neural network;
a target matching unit, configured to determine, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets; and
a region determining unit, configured to, when there are pixel regions matching any of the specified targets, determine the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
Optionally, the target association submodule includes:
a second feature obtaining unit, configured to extract features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features;
a set determining unit, configured to determine, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target; and
a target trajectory determining unit, configured to associate, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
Optionally, the sensitive-target determining module includes:
a video frame selection submodule, configured to, for each specified-target set, select a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method;
a first decision submodule, configured to recognize the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result; and
a second decision submodule, configured to determine, when the target recognition result meets a preset decision rule, that the specified target corresponding to the target recognition result is a sensitive target, or determine, when the target recognition result does not meet the preset decision rule, that the specified target corresponding to the target recognition result is not a sensitive target.
Optionally, the first decision submodule includes:
a third feature obtaining unit, configured to extract the feature of the specified target in each specified-target video frame to obtain target features;
a sensitive feature recognition unit, configured to identify sensitive features among the target features through a preset target classification algorithm or recognition technique; and
a recognition result determining unit, configured to take the quantitative relationship between the sensitive features and the target features as the target recognition result.
In a third aspect, an embodiment of this application provides an electronic device, including a processor and a memory;
the memory is configured to store a computer program;
the processor is configured to implement the following steps when executing the program stored in the memory:
obtaining a video to be detected;
determining, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected;
determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and
when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
Optionally, the processor is further configured to:
mask the masking region in each video frame of the video to be detected.
Optionally, in the electronic device of this embodiment, determining, by a preset target detection algorithm, each specified-target set in the video to be detected includes:
detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected;
for each of the specified targets, associating the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory; and
taking the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
Optionally, in the electronic device of this embodiment, detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected includes:
dividing each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions;
extracting the feature of each pixel region through a pre-trained convolutional neural network;
determining, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets; and
when there are pixel regions matching any of the specified targets, determining the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
Optionally, in the electronic device of this embodiment, for each of the specified targets, associating the regions corresponding to the same specified target in temporal order to obtain a specified-target trajectory includes:
extracting features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features;
determining, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target; and
associating, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
Optionally, in the electronic device of this embodiment, determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target includes:
for each specified-target set, selecting a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method;
recognizing the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result; and
when the target recognition result meets a preset decision rule, determining that the specified target corresponding to the target recognition result is a sensitive target; or
when the target recognition result does not meet the preset decision rule, determining that the specified target corresponding to the target recognition result is not a sensitive target.
Optionally, in the electronic device of this embodiment, recognizing the specified target in each specified-target video frame through the recognition algorithm to obtain a target recognition result includes:
extracting the feature of the specified target in each specified-target video frame to obtain target features;
identifying sensitive features among the target features through a preset target classification algorithm or recognition technique; and
taking the quantitative relationship between the sensitive features and the target features as the target recognition result.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method steps of any one of the first aspect.
In a fifth aspect, an embodiment of this application provides a video masking region selection system, including a video capture device and a video processor;
the video capture device is configured to collect a video to be detected; and
the video processor is configured to execute the method steps of any one of the first aspect.
As can be seen from the above technical solutions, the video masking region selection method, apparatus, electronic device, and system provided by the embodiments of this application obtain a video to be detected; determine, by a preset target detection algorithm, each specified-target set in the video to be detected; determine, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and, when the specified target of any specified-target set is a sensitive target, take that specified-target set as a masking region of the video to be detected. Detecting the specified targets in each video frame by a target detection algorithm and taking the pixels of a specified target as the specified-target set improves the accuracy of masking region selection; judging the sensitivity of each specified target again by a recognition algorithm reduces mis-extraction of sensitive targets and thereby further improves the accuracy of masking region selection. Of course, implementing any product or method of this application does not necessarily achieve all of the above advantages at the same time.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application and of the prior art more clearly, the drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a video masking region selection method according to an embodiment of this application;
FIG. 2 is another schematic flowchart of a video masking region selection method according to an embodiment of this application;
FIG. 3 is a schematic diagram of a video masking region selection apparatus according to an embodiment of this application;
FIG. 4 is a schematic diagram of an electronic device according to an embodiment of this application.
Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
Video desensitization is a technique for hiding video content information: sensitive or private information in the video content, such as human faces, is blurred or mosaicked so as to mask it. A related video desensitization scheme, when selecting the masking region, detects a target to be masked in one image frame and then, in subsequent image frames, predicts the range of that target by target tracking and takes the range as the masking region. This has the following drawbacks: 1. the method is based on moving-target detection, so when the target to be detected is stationary relative to the background or moves very slowly, the determined masking region is incomplete; 2. the related scheme does not further classify the detected target information when selecting the masking region, so mistaken masking can occur; for example, the region that should be masked is a person wearing a police uniform, but all persons in the video are masked.
In view of the above problems, referring to FIG. 1, an embodiment of this application provides a video masking region selection method, including:
S101: obtaining a video to be detected.
The video masking region selection method of this embodiment may be implemented by a desensitization system, which is any system capable of executing the method. For example:
The desensitization system may be an electronic device including a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface are connected by the bus and communicate with each other; the memory stores executable program code; and the processor runs the program corresponding to the executable program code by reading it from the memory, so as to execute the video masking region selection method.
The desensitization system may also be an application program that executes the video masking region selection method at runtime.
The desensitization system may also be a storage medium storing executable code for executing the video masking region selection method.
The desensitization system may obtain the video to be detected from a storage device, or capture it in real time through an image capture device. Optionally, the desensitization system has a built-in video shooting apparatus or is connected to a peripheral one, and captures the video to be detected in real time through it. Capturing the video in real time increases immediacy and makes it convenient for users to process live video, giving a good user experience.
S102: determining, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected.
Using the preset target detection algorithm, the desensitization system detects all specified targets in each video frame of the video to be detected, and associates all regions corresponding to the same specified target (a region in a video frame is composed of pixels) to obtain each specified-target set.
The preset target detection algorithm is any algorithm capable of recognizing the specified targets, such as Boosting, RCNN (Regions with Convolutional Neural Network features), Fast RCNN, Faster RCNN, or SSD (Single Shot MultiBox Detector). A specified target is a preset target that needs to be masked, for example a car's license plate, or an ID card or bank card appearing in the video.
S103: determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target.
The preset recognition algorithm is any target classification or recognition technique capable of recognizing sensitive targets, such as SIFT (Scale-Invariant Feature Transform), Dense SIFT, Color SIFT, or HOG (Histogram of Oriented Gradients). A sensitive target is a preset specific target that needs to be masked, for example a police car's license plate, a minor's ID card, or a bank card with a specified card number. Further classifying or recognizing the extracted specified targets prevents mistaken extraction of non-sensitive targets resembling sensitive targets and reduces mis-extraction of sensitive targets.
S104: when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
When a sensitive target exists, the desensitization system takes the specified-target set corresponding to the sensitive target, that is, the specified-target set whose specified target is a sensitive target, as a masking region of the video to be detected.
In this embodiment, the specified targets in each video frame are detected by a target detection algorithm and the pixels of a specified target are taken as the specified-target set; since no moving-target trajectory needs to be predicted, the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately. Classifying or recognizing the specified targets by a recognition algorithm and judging their sensitivity again reduces mis-extraction of sensitive targets and improves the accuracy of masking region selection. Automatic extraction of the masking region is achieved, avoiding the heavy workload of manually marking masking regions, so masking region extraction is efficient.
Optionally, after taking that specified-target set as a masking region of the video to be detected when the specified target of any specified-target set is a sensitive target, the method further includes:
masking the masking region in each video frame of the video to be detected.
The desensitization system masks the pixels corresponding to the masking region in each video frame, for example by adding a mosaic or graying them out.
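The mosaic masking step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `mosaic_region` and the block-averaging scheme are assumptions, showing one common way to pixelate a rectangular region of a frame.

```python
import numpy as np

def mosaic_region(frame, x, y, w, h, block=8):
    """Pixelate an axis-aligned region of an H x W x 3 frame in place.

    Each block x block tile inside the region is replaced by its mean
    color, which is one common way to apply a mosaic mask.
    """
    region = frame[y:y + h, x:x + w]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = region[by:by + block, bx:bx + block]
            # Broadcast the (1, 1, 3) mean color over the whole tile.
            tile[...] = tile.mean(axis=(0, 1), keepdims=True).astype(frame.dtype)
    return frame
```

Graying out a region would be a one-line variant that assigns a constant color instead of the tile mean.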
In this embodiment, masking the pixels of the masking region in each video frame of the video to be detected achieves desensitization of the video.
Optionally, determining, by a preset target detection algorithm, each specified-target set in the video to be detected includes:
Step 1: detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected.
Using the preset target detection algorithm, the desensitization system detects each video frame of the video to be detected, determining all specified targets in each frame and the region corresponding to each specified target.
Step 2: for each of the specified targets, associating the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory.
After detecting the regions corresponding to the specified targets in each video frame, the desensitization system still needs to identify which regions in different frames belong to the same specified target. For example, the desensitization system determines the regions belonging to the same specified target using a preset target tracking algorithm, such as a MeanShift-based tracker, TLD (Tracking-Learning-Detection), IVT (Incremental Visual Tracking), or MIL (Multiple Instance Learning); or it does so using a preset target classification or recognition algorithm. The desensitization system then associates, in temporal order, the regions corresponding to the same specified target in each video frame, obtaining as many specified-target trajectories as there are specified targets.
Step 3: taking the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
For each specified target, the desensitization system takes its specified-target trajectory as its specified-target set. For a stationary target, the trajectory may be a set of unchanging position information. For example, if a specified target is stationary, the position recognized for it in every video frame does not change, so the pixels at that position in each frame can be taken as its specified-target set.
In this embodiment, the region corresponding to a specified target is determined by a target detection algorithm, and the regions corresponding to the same specified target are associated in temporal order to obtain the specified-target set. Compared with moving-target detection or sensitive-region segmentation, this determines the region of a specified target that is stationary or nearly stationary relative to the background more accurately, yielding a more accurate target set.
Optionally, detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected includes:
Step 1: dividing each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions.
The desensitization system divides each video frame into a preset number of regions, and these regions may overlap. The preset number is set according to the required recognition precision and the frame resolution, and is positively correlated with both. For example, the preset number may be 100, 500, 1000, or more.
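The division into possibly overlapping pixel regions can be sketched with a sliding window. This is a minimal illustration under assumptions not stated in the text: the function name `sliding_regions` and the fixed square window with a stride are choices made here for clarity, with stride < window size producing the overlap the text allows.

```python
import numpy as np

def sliding_regions(frame, win=64, stride=32):
    """Split a frame into overlapping square pixel regions.

    Returns a list of ((x, y), patch) pairs produced by a sliding
    window; a stride smaller than the window makes neighboring
    regions overlap.
    """
    h, w = frame.shape[:2]
    regions = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            regions.append(((x, y), frame[y:y + win, x:x + win]))
    return regions
```

For a 1920x1080 frame these parameters would already yield well over a thousand regions, consistent with the order of magnitude mentioned above.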
Step 2: extracting the feature of each pixel region through a pre-trained convolutional neural network.
The pre-trained convolutional neural network may be built under supervised learning. A neural network aimed at extracting the regions corresponding to specified targets is built, and multiple groups of regions corresponding to specified targets are input for supervised learning, so as to determine the recognition features of those regions. For example, using an SVM (Support Vector Machine) algorithm, the image features of regions containing specified targets are taken as feature values; input vectors are determined from these feature values and their rates of change; training is performed with both a linear kernel and an RBF (Radial Basis Function) kernel; and the function that performs better on the test set is selected, completing the pre-trained convolutional neural network.
Step 3: determining, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets.
Multiple kinds of specified targets may exist in the same video to be detected, for example both car license plates and ID cards; therefore the features of the pixel regions need to be classified by a classifier, to further judge whether a pixel region's features match the recognition features of a specified target.
Step 4: when there are pixel regions matching any of the specified targets, determining the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
For the pixel regions of each video frame, when there are pixel regions matching a specified target, the desensitization system combines all pixel regions in the same frame that match the same specified target to determine a preliminary region; it then refines the preliminary region with a bounding-box regression algorithm, finally obtaining the region corresponding to the specified target.
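The combination of matched pixel regions into a preliminary region can be sketched as the tight box enclosing all matched windows. This is an assumption-laden illustration: the function name `enclosing_box` is made up here, and the learned bounding-box regression refinement that the text describes is deliberately omitted, since it requires a trained regressor.

```python
def enclosing_box(matches, win):
    """Combine all windows matched to the same target into one box.

    `matches` is a list of (x, y) top-left corners of win x win pixel
    regions that the classifier matched to the target; the union of
    those windows gives the preliminary region, which a trained
    bounding-box regressor would then refine (refinement omitted).
    """
    xs = [x for x, _ in matches]
    ys = [y for _, y in matches]
    x0, y0 = min(xs), min(ys)
    x1 = max(xs) + win
    y1 = max(ys) + win
    return x0, y0, x1 - x0, y1 - y0  # x, y, width, height
```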
This embodiment gives a specific method for determining the region corresponding to a specified target. Of course, in the embodiments of this application the region of each specified target may also be determined by algorithms such as Boosting, Fast RCNN, Faster RCNN, or SSD.
Optionally, for each of the specified targets, associating the regions corresponding to the same specified target in temporal order to obtain a specified-target trajectory includes:
Step 1: extracting features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features.
A region feature is any feature capable of recognizing the specified target. The desensitization system extracts the region features of the specified targets with a preset multi-target tracking algorithm.
Step 2: determining, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target.
The multi-target tracking algorithm may be TLD (Tracking-Learning-Detection), IVT (Incremental Visual Tracking), MIL (Multiple Instance Learning), or the like. The desensitization system uses the multi-target tracking algorithm to determine the region features belonging to the same specified target, and takes all region features of one specified target as one region-feature set.
Step 3: associating, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
In this embodiment, the specified-target trajectories are determined by a multi-target tracking algorithm, which determines them more quickly.
Optionally, for each of the specified targets, associating the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory includes:
Step 1: extracting the RGB (Red Green Blue) color histogram of the region corresponding to each specified target.
Step 2: computing, in temporal order, the Euclidean distance between the RGB color histograms of the regions corresponding to the specified targets in every two adjacent video frames.
Step 3: associating, in temporal order, the regions corresponding to the specified targets whose Euclidean distance is below a preset threshold, to obtain the specified-target trajectories.
When the Euclidean distance between two specified targets in different video frames is below the preset threshold, the two are the same specified target, and their corresponding regions are associated. For example, suppose a first video frame and a second video frame are temporally adjacent; the first frame contains specified target 1 and specified target 2, and the second frame contains specified target 3 and specified target 4. The computed Euclidean distances between the RGB color histograms are: 0.02 between targets 1 and 3, 0.58 between targets 1 and 4, 0.67 between targets 2 and 3, and 0.09 between targets 2 and 4. With a preset threshold of 0.1, the region of target 1 is associated with that of target 3, and the region of target 2 with that of target 4.
The preset threshold is set according to the actual image resolution and is positively correlated with it. For example, the preset threshold may be 0.3, 0.2, 0.1, or 0.05.
In this embodiment, the specified-target trajectories are determined by Euclidean distance; the computation is simple and saves computing cost.
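The histogram-distance association above can be sketched as follows. This is a minimal illustration, assuming normalized per-channel histograms and the function names `rgb_histogram` and `same_target`; the bin count and the exact normalization are choices made here, not specified in the text.

```python
import numpy as np

def rgb_histogram(region, bins=8):
    """Normalized per-channel color histogram of an H x W x 3 region."""
    hists = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

def same_target(region_a, region_b, threshold=0.1):
    """Associate two detections from adjacent frames when the Euclidean
    distance between their RGB histograms is below the preset threshold."""
    d = np.linalg.norm(rgb_histogram(region_a) - rgb_histogram(region_b))
    return d < threshold, d
```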
Optionally, determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target includes:
Step 1: for each specified-target set, selecting a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method.
The preset frame extraction method is any method of extracting video frames. For example, one frame is extracted every 15 frames, five frames in total; or, when three frames in total are to be extracted and a specified-target set contains nine video frames, the 3rd, 6th, and 9th frames are extracted.
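Both extraction rules mentioned above can be sketched in a few lines. The function name `sample_frames` and the 0-based index convention are assumptions made for this illustration.

```python
def sample_frames(total, step=None, count=None):
    """Pick specified-target video frame indices out of `total` frames.

    Two of the extraction rules described above: either every
    `step`-th frame, or `count` evenly spaced frames (0-based indices).
    """
    if step is not None:
        return list(range(step - 1, total, step))
    span = total // count
    return [span * (i + 1) - 1 for i in range(count)]
```

With nine frames and `count=3` this returns the 3rd, 6th, and 9th frames (indices 2, 5, 8), matching the example in the text.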
Step 2: recognizing the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result.
The desensitization system recognizes each specified-target video frame using the preset recognition algorithm, including target classification or recognition techniques, to obtain the target recognition result. For example, for a specified-target set, five specified-target video frames are extracted in total; through the preset recognition algorithm, the specified target is recognized as a sensitive target in four of them and as not sensitive in one. The corresponding target recognition result may be "four frames sensitive, one frame not sensitive", or "the probability that the specified target of this set is a sensitive target is 80%", and so on.
Step 3: when the target recognition result meets a preset decision rule, determining that the specified target corresponding to the target recognition result is a sensitive target; or, when the target recognition result does not meet the preset decision rule, determining that the specified target corresponding to the target recognition result is not a sensitive target.
The preset decision rule can be set according to actual requirements. For example, it may be: when the probability that the specified target of a specified-target set is a sensitive target is not less than a preset similarity threshold, determine that the specified target corresponding to the target recognition result is a sensitive target. The preset similarity threshold can be set according to the actual situation, for example 80%, 90%, or 95%. For instance, with a preset similarity threshold of 80%, when the target recognition result gives a probability of 80% that the specified target is sensitive, the specified target is determined to be a sensitive target; when it gives a probability of 60%, the specified target is determined not to be a sensitive target.
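The decision rule above reduces to a vote over the per-frame results. A minimal sketch, with the function name `is_sensitive` and the boolean-vote representation as assumptions:

```python
def is_sensitive(frame_votes, threshold=0.8):
    """Apply the preset decision rule to per-frame recognition votes.

    `frame_votes` is one boolean per sampled frame, saying whether
    that frame's specified target was recognized as sensitive; the
    target is declared sensitive when the fraction of positive votes
    reaches the preset similarity threshold.
    """
    probability = sum(frame_votes) / len(frame_votes)
    return probability >= threshold
```

With four positive votes out of five (80%) the target is declared sensitive, and with three out of five (60%) it is not, matching the worked example above.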
In this embodiment, the same specified target is judged multiple times, and whether it is a sensitive target is determined from the combined result of those judgments; this reduces chance in the decision process and makes the result more accurate. Further classifying or recognizing the extracted specified targets prevents mistaken extraction of non-sensitive targets resembling sensitive targets and reduces mis-extraction of sensitive targets.
Optionally, recognizing the specified target in each specified-target video frame through the recognition algorithm to obtain a target recognition result includes:
Step 1: extracting the feature of the specified target in each specified-target video frame to obtain target features.
Step 2: identifying sensitive features among the target features through a preset target classification algorithm or recognition technique.
For example, sensitive features are identified through the HOG algorithm. The desensitization system normalizes the target video frame, computes the gradients of the normalized frame, and obtains a gradient histogram. According to the kind of target object, the gradient histogram is divided into a preset number of cell regions and a preset number of block regions; for example, when the target object is a human body, each cell region may be set to 6×6 pixels and each block region to 3×3 cell regions. Each cell region is projected with preset weights to obtain a histogram vector, and the overlapping cell regions in each block region are contrast-normalized to obtain a histogram vector. The histogram vectors of all block regions are combined into one HOG feature vector, which serves as the target feature. When the target feature meets a preset similarity condition with the HOG feature vector of a preset sensitive target, for example exceeds a preset similarity threshold (the similarity threshold can be set according to the frame resolution, with which it is positively correlated, for example 70%, 80%, or 90%), the target feature is determined to be a sensitive feature.
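The cell-wise gradient histogram at the core of HOG can be sketched as follows. This is a simplified sketch, not the full HOG pipeline: the function name `hog_vector` is made up here, block-level contrast normalization is omitted, and a single global normalization is used instead.

```python
import numpy as np

def hog_vector(gray, cell=6, bins=9):
    """A minimal HOG-style descriptor for a grayscale image.

    Gradients come from central differences; each cell x cell region
    contributes a magnitude-weighted histogram of gradient
    orientations; block normalization is omitted for brevity.
    """
    gray = gray.astype(float)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    h, w = gray.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            m = mag[y:y + cell, x:x + cell].ravel()
            a = ang[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Comparing such a vector against the stored vector of a preset sensitive target (e.g. by cosine similarity against the preset similarity threshold) would implement the sensitive-feature test described above.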
Step 3: taking the quantitative relationship between the sensitive features and the target features as the target recognition result.
The target recognition result characterizes the quantitative relationship between the sensitive features and the target features; for example, the target recognition result may be "the number of sensitive features is 8 and the number of target features is 10", or "8/10".
This embodiment gives a specific method for determining the target recognition result, reducing chance in the decision process and making the result more accurate.
Referring to FIG. 2, FIG. 2 is another schematic flowchart of the video masking region selection method of an embodiment of this application, including:
a video capture module 201, configured to capture the video to be detected.
The video capture module 201 consists of video capture devices capable of recording video, such as cameras. The video capture devices of the video capture module 201 record scenes in the live environment, generate video frames of those scenes, and transmit the captured video frames to the other modules.
a target extraction module 202, configured to extract the specified targets from the video frames through a preset target detection algorithm, and associate the same specified target in temporal order to obtain the specified-target sets.
Optionally, the target extraction module 202 includes an in-frame target detection submodule and a target tracking and association submodule. The target extraction module 202 receives the video frames sent by the video capture module 201. The in-frame target detection submodule performs target detection on each captured video frame through a preset target detection technique, such as Boosting, RCNN, Fast RCNN, Faster RCNN, or SSD, determining the specified targets and detecting their target boxes; the area inside the target box of any specified target is taken as the region corresponding to that specified target. The target tracking and association submodule tracks each specified target in the video frames through a multi-target tracking algorithm, such as TLD, IVT, or MIL; associates the same specified target over time to form a specified-target set; and binds an ID to each specified-target set.
a sensitivity judgment module 203, configured to judge, through a preset target classification or recognition technique, whether the specified target of each specified-target set is a sensitive target.
Each time a new specified-target set is determined, the sensitivity judgment module 203 performs sensitivity analysis on the specified target corresponding to each newly generated ID, using a target classification or recognition technique such as SIFT, Dense SIFT, Color SIFT, or HOG to judge whether the specified target is a sensitive target, for example a specific person or a police car. Thereafter, the specified target is judged again every certain number of frames, and the final decision on the sensitive target is made over the results of the multiple judgments by means such as voting. For example, if the specified target appears in 9 frames, one video frame is selected every 3 frames and its specified target is judged, for 3 judgments in total; if the specified target is judged sensitive in 2 or more of them, the specified target of that ID is finally determined to be a sensitive target. Judging the sensitive target multiple times increases the robustness of the decision and prevents misjudgment.
a region masking module 204, configured to mask a specified-target set when its specified target is a sensitive target.
When the sensitivity judgment module 203 determines that the specified target of any ID is a sensitive target, the region masking module 204 masks the region (pixels) corresponding to that specified target in each video frame of that ID's specified-target set, for example by adding a mosaic or graying it out, achieving video desensitization.
In this embodiment, the target extraction module 202 determines the specified targets through a target detection algorithm; compared with moving-target detection or sensitive-region segmentation, this determines the region of a specified target that is stationary or nearly stationary relative to the background more accurately. The sensitivity judgment module 203 judges whether a specified target is a sensitive target; judging the extracted specified targets improves the accuracy of masking region selection and thus of video desensitization. Automatic extraction of the masking region is achieved, avoiding the heavy workload of manually marking masking regions, so masking region extraction is efficient. Further classifying or recognizing the extracted specified targets prevents mistaken extraction of non-sensitive targets resembling sensitive targets and reduces mis-extraction of sensitive targets.
Referring to FIG. 3, an embodiment of this application provides a video masking region selection apparatus, including:
a to-be-detected video obtaining module 301, configured to obtain a video to be detected.
a specified-set determining module 302, configured to determine, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected.
a sensitive-target determining module 303, configured to determine, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target.
a masking region selection module 304, configured to, when the specified target of any specified-target set is a sensitive target, take that specified-target set as a masking region of the video to be detected.
In this embodiment, the specified targets are first determined by a target detection algorithm, and the specified targets are then classified or recognized to judge whether they are sensitive targets. Determining the specified targets by a target detection algorithm, compared with moving-target detection or sensitive-region segmentation, determines the region of a specified target that is stationary or nearly stationary relative to the background more accurately. Classifying or recognizing the specified targets and judging their sensitivity again improves the accuracy of masking region selection. Automatic extraction of the masking region is achieved, avoiding the heavy workload of manually marking masking regions, so masking region extraction is efficient. Further classifying or recognizing the extracted specified targets prevents mistaken extraction of non-sensitive targets resembling sensitive targets and reduces mis-extraction of sensitive targets.
Optionally, the above video masking region selection apparatus further includes:
a masking module, configured to mask the masking region in each video frame of the video to be detected.
In this embodiment, masking the pixels of the masking region in each video frame of the video to be detected achieves desensitization of the video.
Optionally, the specified-set determining module 302 includes:
a target detection submodule, configured to detect, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected.
a target association submodule, configured to, for each of the specified targets, associate the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory.
a target set submodule, configured to take the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
In this embodiment, the region corresponding to a specified target is determined by a target detection algorithm, and the regions corresponding to the same specified target are associated in temporal order to obtain the specified-target set. Compared with moving-target detection or sensitive-region segmentation, this determines the region of a specified target that is stationary or nearly stationary relative to the background more accurately, yielding a more accurate target set.
Optionally, the target detection submodule includes:
a region division unit, configured to divide each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions.
a first feature obtaining unit, configured to extract the feature of each pixel region through a pre-trained convolutional neural network.
a target matching unit, configured to determine, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets.
a region determining unit, configured to, when there are pixel regions matching any of the specified targets, determine the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
This embodiment gives a specific method for determining the region corresponding to a specified target. Of course, in the embodiments of this application the region of each specified target may also be determined by algorithms such as Boosting, Fast RCNN, Faster RCNN, or SSD.
Optionally, the target association submodule includes:
a second feature obtaining unit, configured to extract features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features.
a set determining unit, configured to determine, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target.
a target trajectory determining unit, configured to associate, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
In this embodiment, the specified-target sets are determined by a multi-target tracking algorithm, and the obtained specified-target sets are accurate.
Optionally, the sensitive-target determining module 303 includes:
a video frame selection submodule, configured to, for each specified-target set, select a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method.
a first decision submodule, configured to recognize the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result.
a second decision submodule, configured to determine, when the target recognition result meets a preset decision rule, that the specified target corresponding to the target recognition result is a sensitive target; or determine, when the target recognition result does not meet the preset decision rule, that the specified target corresponding to the target recognition result is not a sensitive target.
In this embodiment, the same specified target is judged multiple times, and whether it is a sensitive target is determined from the combined result of those judgments, reducing chance in the decision process and making the result more accurate.
Optionally, the first decision submodule includes:
a third feature obtaining unit, configured to extract the feature of the specified target in each specified-target video frame to obtain target features.
a sensitive feature recognition unit, configured to identify sensitive features among the target features through a preset target classification algorithm or recognition technique.
a recognition result determining unit, configured to take the quantitative relationship between the sensitive features and the target features as the target recognition result.
This embodiment gives a specific method for determining the target recognition result, reducing chance in the decision process and making the result more accurate.
An embodiment of this application further provides an electronic device, including a processor and a memory; the memory is configured to store a computer program, and the processor is configured to implement any one of the above video masking region selection methods when executing the program stored in the memory.
Optionally, the electronic device of this embodiment is shown in FIG. 4 and includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with each other through the communication bus 404.
The memory 403 is configured to store a computer program.
The processor 401 is configured to implement the following steps when executing the program stored in the memory 403:
obtaining a video to be detected; determining, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected; determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and, when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
In this embodiment, since the specified targets in each video frame are detected by a target detection algorithm and the pixels of a specified target are taken as the specified-target set, the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately. Classifying or recognizing the specified targets by a recognition algorithm and judging their sensitivity again improves the accuracy of masking region selection. Automatic extraction of the masking region is achieved, avoiding the heavy workload of manually marking masking regions, so masking region extraction is efficient. Further classifying or recognizing the extracted specified targets prevents mistaken extraction of non-sensitive targets resembling sensitive targets and reduces mis-extraction of sensitive targets.
Optionally, when executing the program stored in the memory 403, the processor 401 is further configured to:
mask the masking region in each video frame of the video to be detected.
Optionally, in the electronic device of this embodiment, determining, by a preset target detection algorithm, each specified-target set in the video to be detected includes:
detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected.
for each of the specified targets, associating the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory.
taking the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
Optionally, in the electronic device of this embodiment, detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected includes:
dividing each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions.
extracting the feature of each pixel region through a pre-trained convolutional neural network.
determining, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets.
when there are pixel regions matching any of the specified targets, determining the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
Optionally, in the electronic device of this embodiment, for each of the specified targets, associating the regions corresponding to the same specified target in temporal order to obtain a specified-target trajectory includes:
extracting features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features.
determining, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target.
associating, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
Optionally, in the electronic device of this embodiment, determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target includes:
for each specified-target set, selecting a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method.
recognizing the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result.
when the target recognition result meets a preset decision rule, determining that the specified target corresponding to the target recognition result is a sensitive target. or
when the target recognition result does not meet the preset decision rule, determining that the specified target corresponding to the target recognition result is not a sensitive target.
Optionally, in the electronic device of this embodiment, recognizing the specified target in each specified-target video frame through the recognition algorithm to obtain a target recognition result includes:
extracting the feature of the specified target in each specified-target video frame to obtain target features.
identifying sensitive features among the target features through a preset target classification algorithm or recognition technique.
taking the quantitative relationship between the sensitive features and the target features as the target recognition result.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include a Random Access Memory (RAM), or a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage apparatus located away from the aforementioned processor.
The above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), or the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining a video to be detected; determining, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected; determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and, when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
In this embodiment, since the specified targets in each video frame are detected by a target detection algorithm and the pixels of a specified target are taken as the specified-target set, the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately. Classifying or recognizing the specified targets by a recognition algorithm and judging their sensitivity again improves the accuracy of masking region selection. Automatic extraction of the masking region is achieved, avoiding the heavy workload of manually marking masking regions, so masking region extraction is efficient. Further classifying or recognizing the extracted specified targets prevents mistaken extraction of non-sensitive targets resembling sensitive targets and reduces mis-extraction of sensitive targets.
Optionally, when executed by a processor, the computer program can further implement any step of the above video masking region selection methods.
An embodiment of this application further provides a video masking region selection system, which includes a video capture device and a video processor.
The video capture device is configured to collect the video to be detected.
The video processor is configured to implement the following steps:
obtaining a video to be detected; determining, by a preset target detection algorithm, each specified-target set in the video to be detected, where any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected; determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and, when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
In this embodiment, since the specified targets in each video frame are detected by a target detection algorithm and the pixels of a specified target are taken as the specified-target set, the region corresponding to a specified target that is stationary or nearly stationary relative to the background can be determined more accurately. Classifying or recognizing the specified targets by a recognition algorithm and judging their sensitivity again reduces mis-extraction of sensitive targets and improves the accuracy of masking region selection. Automatic extraction of the masking region is achieved, avoiding the heavy workload of manually marking masking regions, so masking region extraction is efficient.
Optionally, the video processor can further implement any one of the above video masking region selection methods.
As for the embodiments of the video masking region selection apparatus, the electronic device, the system, and the computer-readable storage medium, since they are substantially similar to the method embodiments, their descriptions are relatively brief; for relevant details, refer to the descriptions of the method embodiments.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
The embodiments in this specification are described in a related manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their descriptions are relatively brief; for relevant details, refer to the descriptions of the method embodiments.
The above are only preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within its protection scope.

Claims (22)

  1. A video masking region selection method, characterized in that the method comprises:
    obtaining a video to be detected;
    determining, by a preset target detection algorithm, each specified-target set in the video to be detected, wherein any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected;
    determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and
    when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
  2. The method according to claim 1, characterized in that, after taking that specified-target set as a masking region of the video to be detected when the specified target of any specified-target set is a sensitive target, the method further comprises:
    masking the masking region in each video frame of the video to be detected.
  3. The method according to claim 1, characterized in that determining, by a preset target detection algorithm, each specified-target set in the video to be detected comprises:
    detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected;
    for each of the specified targets, associating the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory; and
    taking the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
  4. The method according to claim 3, characterized in that detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected comprises:
    dividing each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions;
    extracting the feature of each pixel region through a pre-trained convolutional neural network;
    determining, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets; and
    when there are pixel regions matching any of the specified targets, determining the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
  5. The method according to claim 3, characterized in that, for each of the specified targets, associating the regions corresponding to the same specified target in temporal order to obtain a specified-target trajectory comprises:
    extracting features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features;
    determining, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target; and
    associating, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
  6. The method according to claim 1, characterized in that determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target comprises:
    for each specified-target set, selecting a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method;
    recognizing the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result; and
    when the target recognition result meets a preset decision rule, determining that the specified target corresponding to the target recognition result is a sensitive target; or
    when the target recognition result does not meet the preset decision rule, determining that the specified target corresponding to the target recognition result is not a sensitive target.
  7. The method according to claim 6, characterized in that recognizing the specified target in each specified-target video frame through the recognition algorithm to obtain a target recognition result comprises:
    extracting the feature of the specified target in each specified-target video frame to obtain target features;
    identifying sensitive features among the target features through a preset target classification algorithm or recognition technique; and
    taking the quantitative relationship between the sensitive features and the target features as the target recognition result.
  8. A video masking region selection apparatus, characterized in that the apparatus comprises:
    a to-be-detected video obtaining module, configured to obtain a video to be detected;
    a specified-set determining module, configured to determine, by a preset target detection algorithm, each specified-target set in the video to be detected, wherein any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected;
    a sensitive-target determining module, configured to determine, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and
    a masking region selection module, configured to, when the specified target of any specified-target set is a sensitive target, take that specified-target set as a masking region of the video to be detected.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a masking module, configured to mask the masking region in each video frame of the video to be detected.
  10. The apparatus according to claim 8, characterized in that the specified-set determining module comprises:
    a target detection submodule, configured to detect, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected;
    a target association submodule, configured to, for each of the specified targets, associate the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory; and
    a target set submodule, configured to take the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
  11. The apparatus according to claim 10, characterized in that the target detection submodule comprises:
    a region division unit, configured to divide each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions;
    a first feature obtaining unit, configured to extract the feature of each pixel region through a pre-trained convolutional neural network;
    a target matching unit, configured to determine, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets; and
    a region determining unit, configured to, when there are pixel regions matching any of the specified targets, determine the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
  12. The apparatus according to claim 10, characterized in that the target association submodule comprises:
    a second feature obtaining unit, configured to extract features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features;
    a set determining unit, configured to determine, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target; and
    a target trajectory determining unit, configured to associate, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
  13. The apparatus according to claim 8, characterized in that the sensitive-target determining module comprises:
    a video frame selection submodule, configured to, for each specified-target set, select a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method;
    a first decision submodule, configured to recognize the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result; and
    a second decision submodule, configured to determine, when the target recognition result meets a preset decision rule, that the specified target corresponding to the target recognition result is a sensitive target; or determine, when the target recognition result does not meet the preset decision rule, that the specified target corresponding to the target recognition result is not a sensitive target.
  14. The apparatus according to claim 13, characterized in that the first decision submodule comprises:
    a third feature obtaining unit, configured to extract the feature of the specified target in each specified-target video frame to obtain target features;
    a sensitive feature recognition unit, configured to identify sensitive features among the target features through a preset target classification algorithm or recognition technique; and
    a recognition result determining unit, configured to take the quantitative relationship between the sensitive features and the target features as the target recognition result.
  15. An electronic device, characterized by comprising a processor and a memory;
    the memory is configured to store a computer program; and
    the processor is configured to implement the following steps when executing the program stored in the memory:
    obtaining a video to be detected;
    determining, by a preset target detection algorithm, each specified-target set in the video to be detected, wherein any specified-target set is the set of pixels of one specified target in the video frames of the video to be detected;
    determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target; and
    when the specified target of any specified-target set is a sensitive target, taking that specified-target set as a masking region of the video to be detected.
  16. The electronic device according to claim 15, characterized in that the processor is further configured to:
    mask the masking region in each video frame of the video to be detected.
  17. The electronic device according to claim 15, characterized in that determining, by a preset target detection algorithm, each specified-target set in the video to be detected comprises:
    detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected;
    for each of the specified targets, associating the regions corresponding to that specified target in temporal order to obtain a specified-target trajectory; and
    taking the specified-target trajectory of each specified target as a specified-target set in the video to be detected.
  18. The electronic device according to claim 17, characterized in that detecting, by a preset target detection algorithm, the regions corresponding to all specified targets in each video frame of the video to be detected comprises:
    dividing each video frame of the video to be detected into a preset number of regions to obtain multiple pixel regions;
    extracting the feature of each pixel region through a pre-trained convolutional neural network;
    determining, according to the feature of each pixel region and through a preset classifier, whether each pixel region matches any of the specified targets; and
    when there are pixel regions matching any of the specified targets, determining the region corresponding to that specified target from all pixel regions matching the same specified target through a bounding-box regression algorithm.
  19. The electronic device according to claim 17, characterized in that, for each of the specified targets, associating the regions corresponding to the same specified target in temporal order to obtain a specified-target trajectory comprises:
    extracting features of the regions corresponding to all specified targets in each video frame of the video to be detected to obtain region features;
    determining, from all region features and through a preset multi-target tracking algorithm, the region-feature sets belonging to the same specified target; and
    associating, in temporal order, the regions corresponding to each region-feature set to obtain the specified-target trajectories.
  20. The electronic device according to claim 15, characterized in that determining, by a preset recognition algorithm, whether the specified target of each specified-target set is a sensitive target comprises:
    for each specified-target set, selecting a preset number of specified-target video frames from the video frames of the specified-target set according to a preset frame extraction method;
    recognizing the specified target in each specified-target video frame through a preset recognition algorithm to obtain a target recognition result; and
    when the target recognition result meets a preset decision rule, determining that the specified target corresponding to the target recognition result is a sensitive target; or
    when the target recognition result does not meet the preset decision rule, determining that the specified target corresponding to the target recognition result is not a sensitive target.
  21. The electronic device according to claim 20, characterized in that recognizing the specified target in each specified-target video frame through the recognition algorithm to obtain a target recognition result comprises:
    extracting the feature of the specified target in each specified-target video frame to obtain target features;
    identifying sensitive features among the target features through a preset target classification algorithm or recognition technique; and
    taking the quantitative relationship between the sensitive features and the target features as the target recognition result.
  22. A video masking region selection system, characterized in that the system comprises a video capture device and a video processor;
    the video capture device is configured to collect a video to be detected; and
    the video processor is configured to implement the method steps of any one of claims 1-7.
PCT/CN2018/108222 2017-10-16 2018-09-28 Video masking region selection method and apparatus, electronic device, and system WO2019076187A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18868676.0A EP3700180A4 (en) 2017-10-16 2018-09-28 PROCEDURE AND DEVICE FOR SELECTING THE VIDEO BLOCKING REGION, ELECTRONIC DEVICE AND SYSTEM
US16/756,094 US11321945B2 (en) 2017-10-16 2018-09-28 Video blocking region selection method and apparatus, electronic device, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710957962.XA CN109670383B (zh) 2017-10-16 2017-10-16 视频遮蔽区域选取方法、装置、电子设备及系统
CN201710957962.X 2017-10-16

Publications (1)

Publication Number Publication Date
WO2019076187A1 true WO2019076187A1 (zh) 2019-04-25

Family

ID=66139242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108222 WO2019076187A1 (zh) 2017-10-16 2018-09-28 视频遮蔽区域选取方法、装置、电子设备及系统

Country Status (4)

Country Link
US (1) US11321945B2 (zh)
EP (1) EP3700180A4 (zh)
CN (1) CN109670383B (zh)
WO (1) WO2019076187A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343987A (zh) * 2021-06-30 2021-09-03 北京奇艺世纪科技有限公司 Text detection processing method and apparatus, electronic device, and storage medium
US11514582B2 (en) 2019-10-01 2022-11-29 Axis Ab Method and device for image analysis

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674678A (zh) * 2019-08-07 2020-01-10 国家计算机网络与信息安全管理中心 Method and apparatus for recognizing sensitive marks in video
US11277557B1 (en) 2020-04-06 2022-03-15 The Government of the United States of America, as represented by the Secretary of Homeland Security Privacy-aware capture and device
CN111654700B (zh) * 2020-06-19 2022-12-06 杭州海康威视数字技术股份有限公司 Privacy masking processing method and apparatus, electronic device, and surveillance system
CN112037127A (zh) * 2020-07-27 2020-12-04 浙江大华技术股份有限公司 Privacy occlusion method and apparatus for video surveillance, storage medium, and electronic apparatus
CN111985419B (zh) * 2020-08-25 2022-10-14 腾讯科技(深圳)有限公司 Video processing method and related device
CN114339049A (zh) * 2021-12-31 2022-04-12 深圳市商汤科技有限公司 Video processing method and apparatus, computer device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050129272A1 (en) * 2001-11-30 2005-06-16 Frank Rottman Video monitoring system with object masking
US20070116328A1 (en) * 2005-11-23 2007-05-24 Sezai Sablak Nudity mask for use in displaying video camera images
CN101610408A (zh) * 2008-06-16 2009-12-23 北京智安邦科技有限公司 Video protection scrambling method and structure
CN101933027A (zh) * 2008-02-01 2010-12-29 罗伯特·博世有限公司 Masking module for a video surveillance system, method for masking selected objects, and computer program
CN103167216A (zh) * 2011-12-08 2013-06-19 中国电信股份有限公司 Image masking processing method and system
CN105141901A (zh) * 2015-08-12 2015-12-09 青岛中星微电子有限公司 Video processing method and apparatus
CN106127106A (zh) * 2016-06-13 2016-11-16 东软集团股份有限公司 Method and apparatus for searching for a target person in video
CN106454492A (zh) * 2016-10-12 2017-02-22 武汉斗鱼网络科技有限公司 Delayed-transmission-based live-streaming pornographic content review system and method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3722653B2 (ja) * 1999-08-31 2005-11-30 松下電器産業株式会社 Surveillance camera apparatus and surveillance camera display method
EP1353516A1 (en) * 2002-04-08 2003-10-15 Mitsubishi Electric Information Technology Centre Europe B.V. A method and apparatus for detecting and/or tracking one or more colour regions in an image or sequence of images
US20070201694A1 (en) * 2002-06-18 2007-08-30 Bolle Rudolf M Privacy management in imaging system
CN103971082A (zh) * 2013-01-31 2014-08-06 威联通科技股份有限公司 Video object detection system based on region transformation and related method
CN103839057B (zh) * 2014-03-28 2017-03-15 中南大学 Antimony flotation working-condition recognition method and system
CN105469379B (zh) * 2014-09-04 2020-07-28 广东中星微电子有限公司 Video target region occlusion method and apparatus
CN104318782B (zh) * 2014-10-31 2016-08-17 浙江力石科技股份有限公司 Region-overlap-oriented highway video speed measurement method and system
US9471852B1 (en) * 2015-11-11 2016-10-18 International Business Machines Corporation User-configurable settings for content obfuscation
CN105957001A (zh) * 2016-04-18 2016-09-21 深圳感官密码科技有限公司 Privacy protection method and apparatus
CN107247956B (zh) * 2016-10-09 2020-03-27 成都快眼科技有限公司 Fast target detection method based on grid judgment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050129272A1 (en) * 2001-11-30 2005-06-16 Frank Rottman Video monitoring system with object masking
US20070116328A1 (en) * 2005-11-23 2007-05-24 Sezai Sablak Nudity mask for use in displaying video camera images
CN101933027A (zh) * 2008-02-01 2010-12-29 罗伯特·博世有限公司 Masking module for a video surveillance system, method for masking selected objects, and computer program
CN101610408A (zh) * 2008-06-16 2009-12-23 北京智安邦科技有限公司 Video protection scrambling method and structure
CN103167216A (zh) * 2011-12-08 2013-06-19 中国电信股份有限公司 Image masking processing method and system
CN105141901A (zh) * 2015-08-12 2015-12-09 青岛中星微电子有限公司 Video processing method and apparatus
CN106127106A (zh) * 2016-06-13 2016-11-16 东软集团股份有限公司 Method and apparatus for searching for a target person in video
CN106454492A (zh) * 2016-10-12 2017-02-22 武汉斗鱼网络科技有限公司 Delayed-transmission-based live-streaming pornographic content review system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3700180A4

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11514582B2 (en) 2019-10-01 2022-11-29 Axis Ab Method and device for image analysis
CN113343987A (zh) * 2021-06-30 2021-09-03 北京奇艺世纪科技有限公司 Text detection processing method and apparatus, electronic device, and storage medium
CN113343987B (zh) * 2021-06-30 2023-08-22 北京奇艺世纪科技有限公司 Text detection processing method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN109670383B (zh) 2021-01-29
EP3700180A1 (en) 2020-08-26
US20210056312A1 (en) 2021-02-25
EP3700180A4 (en) 2020-08-26
US11321945B2 (en) 2022-05-03
CN109670383A (zh) 2019-04-23

Similar Documents

Publication Publication Date Title
WO2019076187A1 (zh) 2019-04-25 Video masking region selection method and apparatus, electronic device, and system
US10192107B2 (en) Object detection method and object detection apparatus
CN109035304B (zh) 目标跟踪方法、介质、计算设备和装置
US9911055B2 (en) Method and system for detection and classification of license plates
US8792722B2 (en) Hand gesture detection
US9773322B2 (en) Image processing apparatus and image processing method which learn dictionary
WO2019033572A1 (zh) 人脸遮挡检测方法、装置及存储介质
CN109727275B (zh) 目标检测方法、装置、系统和计算机可读存储介质
US20120027263A1 (en) Hand gesture detection
Do et al. Early melanoma diagnosis with mobile imaging
US10659680B2 (en) Method of processing object in image and apparatus for same
CN110866428B (zh) 目标跟踪方法、装置、电子设备及存储介质
CA3104668A1 (en) Computer vision systems and methods for automatically detecting, classifing, and pricing objects captured in images or videos
US11250269B2 (en) Recognition method and apparatus for false detection of an abandoned object and image processing device
US11068707B2 (en) Person searching method and apparatus and image processing device
Ou et al. Vehicle logo recognition based on a weighted spatial pyramid framework
US10268922B2 (en) Image processing by means of cross-correlation
JP2011040070A (ja) カメラを基にしたオブジェクトの分析のためのシステム、方法及びプログラム製品
CN111967289A (zh) 一种非配合式人脸活体检测方法及计算机存储介质
CN111191575B (zh) 一种基于火苗跳动建模的明火检测方法及系统
Agrafiotis et al. HDR Imaging for Enchancing People Detection and Tracking in Indoor Environments.
Matuska et al. A novel system for non-invasive method of animal tracking and classification in designated area using intelligent camera system
US11232314B2 (en) Computer vision based approach to image injection detection
Wang et al. A Saliency Based Human Detection Framework for Infrared Thermal Images
Fusco et al. Sign Finder Application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18868676

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018868676

Country of ref document: EP

Effective date: 20200518