CN110427815B - Video processing method and device for realizing interception of effective contents of entrance guard - Google Patents

Video processing method and device for realizing interception of effective contents of entrance guard

Info

Publication number
CN110427815B
CN110427815B CN201910551347.8A
Authority
CN
China
Prior art keywords
video
video picture
human body
face
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910551347.8A
Other languages
Chinese (zh)
Other versions
CN110427815A (en)
Inventor
谢超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Terminus Beijing Technology Co Ltd
Original Assignee
Terminus Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Terminus Beijing Technology Co Ltd filed Critical Terminus Beijing Technology Co Ltd
Priority to CN201910551347.8A priority Critical patent/CN110427815B/en
Publication of CN110427815A publication Critical patent/CN110427815A/en
Application granted granted Critical
Publication of CN110427815B publication Critical patent/CN110427815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/91 Television signal processing therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/188 Capturing isolated or intermittent images triggered by the occurrence of a predetermined event, e.g. an object reaching a predetermined position
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20224 Image subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30232 Surveillance

Abstract

The invention discloses a video processing method for realizing the interception of the effective content of an entrance guard (access control) system, which comprises the following steps: firstly, moving target detection is carried out on the collected video pictures to identify the video pictures containing moving-target regions; next, it is judged whether the corresponding moving targets in those video pictures belong to human targets; then it is detected whether the human-target regions contained in the video pictures containing human targets contain valid face images; finally, the video pictures containing valid face images are stored and uploaded. Because the method stores and uploads only the video pictures containing valid face images, rather than all the collected surveillance video pictures, it reduces the required data storage space and the data storage cost, and at the same time reduces the network transmission volume and the transmission cost when video picture data are uploaded.

Description

Video processing method and device for realizing interception of effective contents of entrance guard
Technical Field
The invention relates to the technical field of image processing, and in particular to a video processing method and a video processing device for realizing the interception of the effective content of an entrance guard (access control) system.
Background
At present, more and more communities and buildings adopt video access control systems in place of traditional card-swiping access control systems. A video access control system uses a camera to shoot the video picture of a certain spatial range in front of it, identifies a person's face region from the video picture, extracts face features from that region, and compares the extracted features with face features registered in advance in a database, thereby identifying the person and judging whether the person has access rights: if so, the door is opened for passage; if not, passage is refused. For a person without access rights, the video picture or the person's face region can be transmitted to a target room, according to the room number the person provides, for manual verification.
Besides the basic function of access control, a video access control system can also store the video pictures locally, or upload the video picture data through a network to a background server for storage, as an archive of the people entering the community or building, thereby serving purposes such as entry records, security records and after-the-fact investigation.
However, most video pictures shot by video access control do not contain valid face regions. For example, when no one is passing, the shot pictures are unmanned video pictures; and when people do pass, factors such as the person being far away or the face not facing forward (for example, facing away from the camera) prevent the face from being effectively recognized. Such video pictures contain no valid face region, are not valid video pictures, and therefore need not be stored for the record-keeping and archiving of the access control system. Storing video pictures without valid face regions wastes storage capacity and increases the data storage cost, while also increasing the network transmission volume and the data transmission cost.
Disclosure of Invention
Objects of the invention
Based on the above, in order to enable the monitoring systems of communities and buildings to identify and preserve the face images collected in the video monitoring area at lower data storage and transmission cost, and to reduce the monitoring operation cost on the premise that the preserved records accurately reflect the face information, the invention discloses the following technical scheme.
(II) technical scheme
As a first aspect of the present invention, the present invention discloses a video processing method for implementing interception of effective contents of an access control, including:
carrying out moving target detection on the collected video pictures to identify the video pictures containing moving target areas;
judging whether a corresponding moving target in a video image containing the moving target area belongs to a human body target or not;
detecting whether a human target area contained in a video picture containing the human target contains an effective human face image;
and storing and/or uploading the video picture containing the effective face image.
In a possible implementation, the moving object detection on the captured video picture includes:
identifying a foreground region from the acquired video picture by using a background subtraction method, and taking the obtained foreground region as the moving target region;
the background model adopted by the background subtraction method is a Gaussian mixture model or a pixel gray level mean model.
In one possible embodiment, the identifying the foreground region of the captured video picture by using the hybrid gaussian model as the background model comprises:
sequentially matching each pixel point in the video image with each Gaussian model with the priority ordered from high to low, and judging the Gaussian model matched with the pixel point;
updating parameters of the Gaussian model matched with the pixel points;
taking a plurality of Gaussian models with the highest priority and the sum of weights larger than a background weight threshold value in the updated Gaussian models as a background;
and sequentially matching each pixel point with a plurality of background Gaussian models with priorities sorted from high to low to determine pixel points belonging to the foreground so as to obtain a foreground area.
In a possible embodiment, when the Gaussian model matched with a pixel point is determined, if no Gaussian model matches the pixel point: the Gaussian model with the smallest weight is selected as the Gaussian model matched with the pixel point.
In one possible implementation, the identifying the foreground region of the acquired video picture by using the pixel gray level mean model as the background model comprises:
taking the mean value of corresponding pixels in the training image converted into the gray image as a background pixel value to obtain a background model;
updating the obtained background model by using the video image of the current frame to obtain a new background model;
calculating the gray difference between the video picture to be detected converted into the gray image and the new background model, and obtaining the probability distribution of foreground pixels according to the gray difference;
and obtaining a foreground area according to the foreground pixel probability distribution.
In a possible implementation, the obtaining a foreground region according to the foreground pixel probability distribution includes:
dividing the foreground pixels into a plurality of grids according to the foreground pixel probability distribution, and counting the accumulated sum of the foreground probability of the pixels in each grid;
and judging, with the grid as the unit, whether each grid belongs to a foreground grid according to the ratio of its accumulated sum to the grid area, thereby obtaining a foreground region composed of foreground grids.
In a possible implementation manner, the determining whether a corresponding moving object in a video frame including the moving object region belongs to a human body object includes:
projecting the moving target area to a coordinate axis to obtain the number of pixels of the moving target area under each pixel line mark;
determining a corresponding number of pixel line marks corresponding to at least three target parts of the human body according to the characteristics of the pixel number of each pixel line mark;
and judging whether the distance ratio of the distances between the pixel lines corresponding to the human body target part on the coordinate axis is within the corresponding human body part distance ratio range or not, and if so, judging that the moving target in the moving target area belongs to the human body target.
In a possible implementation manner, the detecting whether the human target area included in the video picture including the human target area includes a valid human face image includes:
judging whether the picture area covered by the search window in the video picture containing the moving target area belongs to a face area or not through a classifier;
traversing the video picture by moving the search window within the video picture containing the moving target area;
determining the position and the size of a human face organ contained in the human face region;
and judging whether the face region belongs to an effective face image or not based on the position relation of the face organ.
In a possible embodiment, the saving and/or uploading a video frame containing the valid face image includes:
and extracting a picture area at least comprising a face part from the video picture comprising the effective face image for storage and/or uploading.
In one possible implementation manner, when it is detected that a human target region included in a video picture including the human target includes a valid human face image, before the video picture including the valid human face image is saved and/or uploaded:
and comparing the effective face image contained in the current frame video picture with the previous frame video picture, and canceling the storage and uploading of the current frame video picture when the previous frame video picture contains the face images of all human body targets contained in the current frame video picture.
As a second aspect of the present invention, the present invention further discloses a video processing apparatus for implementing interception of effective contents of an access control, including:
the moving target detection module is used for detecting a moving target of a video picture acquired by the video acquisition equipment so as to identify the video picture containing a moving target area;
the human body target judging module is used for judging whether the corresponding moving target in the video picture which contains the moving target area and is identified by the moving target detecting module belongs to a human body target or not;
the human face information detection module is used for detecting whether a human body target area contained in the video picture containing the human body target judged by the human body target judgment module contains an effective human face image or not;
and the video picture retention module is used for storing and/or uploading the video pictures, detected by the face information detection module, that contain valid face images.
In a possible implementation manner, the moving object detection module identifies a foreground region from the acquired video picture by using a background subtraction method, and takes the obtained foreground region as the moving object region;
the background model adopted by the background subtraction method is a Gaussian mixture model or a pixel gray level mean model.
In a possible implementation manner, the moving object detection module includes a first object detection sub-module, configured to perform foreground region identification on a captured video picture by using a mixed gaussian model as a background model;
the first target detection submodule includes:
the model matching unit is used for sequentially matching each pixel point in the video picture with each Gaussian model with the priority ordered from high to low and judging the Gaussian model matched with the pixel point;
the parameter updating unit is used for updating the parameters of the Gaussian model matched with the pixel points and matched by the model matching unit;
a background selecting unit, configured to use, as a background, a plurality of gaussian models that have the highest priority and whose sum of weights is greater than a background weight threshold in the gaussian models updated by the parameter updating unit;
and the first foreground obtaining unit is used for sequentially matching each pixel point, in order of priority from high to low, with the background Gaussian models selected by the background selecting unit, determining the pixel points belonging to the foreground, and obtaining a foreground region.
In a possible implementation manner, when the model matching unit determines the Gaussian model matched with a pixel point, if no Gaussian model matches the pixel point: the model matching unit selects the Gaussian model with the smallest weight as the Gaussian model matched with the pixel point.
In a possible implementation manner, the moving object detection module includes a second object detection sub-module, configured to use a pixel grayscale mean model as a background model to identify a foreground region of an acquired video frame;
the second target detection submodule includes:
the background acquisition unit is used for taking the mean value of corresponding pixels in the training image converted into the gray image as a background pixel value to obtain a background model;
the background updating unit is used for updating the background model obtained by the background obtaining unit by using the video image of the current frame to obtain a new background model;
the probability calculation unit is used for calculating the gray difference between the video picture to be detected converted into the gray image and the background model updated by the background updating unit and obtaining the probability distribution of foreground pixels according to the gray difference;
and the second foreground obtaining unit is used for obtaining a foreground area according to the probability distribution of the foreground pixels calculated by the probability calculating unit.
In a possible implementation, the second foreground obtaining unit includes:
the accumulation counting subunit is used for dividing the foreground pixels into a plurality of grids according to the foreground pixel probability distribution and counting the accumulation sum of the foreground probability of the pixels in each grid;
and the ratio judging subunit is used for judging, with the grid as the unit, whether each grid belongs to a foreground grid according to the ratio of its accumulated sum to the grid area, so as to obtain a foreground region composed of foreground grids.
In one possible implementation, the human target determination module includes:
the line pixel number counting unit is used for projecting the moving target area to a coordinate axis to obtain the pixel number of the moving target area under each pixel line mark;
the target part matching unit is used for determining pixel line marks with corresponding quantity corresponding to at least three target parts of the human body according to the characteristic of the pixel number of each pixel line mark counted by the line pixel number counting unit;
and the distance ratio judging unit is used for judging whether the ratios between the distances, on the coordinate axis, of the pixel rows corresponding to the human body target parts determined by the target part matching unit fall within the corresponding human body part distance ratio ranges, and if so, judging that the moving target in the moving-target region belongs to a human target.
In one possible implementation, the face information detection module includes:
the face area searching unit is used for judging whether an image area covered by a searching window in the video image containing the moving target area belongs to a face area or not through the classifier;
the picture searching and traversing unit is used for realizing the traversal of the video picture by moving the searching window in the video picture containing the moving target area;
an organ feature acquisition unit configured to determine a position and a size of a face organ included in the face region determined by the face region search unit;
and the effective face judgment unit is used for judging whether the face area belongs to an effective face image or not based on the position relation of the face organ determined by the organ feature acquisition unit.
In a possible embodiment, the video picture retention module extracts a picture area containing at least a face part from the video picture containing the valid face image for saving and/or uploading.
In a possible embodiment, the apparatus further comprises:
a picture retention judgment module, configured to, when the human face information detection module detects that a human body target region included in a video picture including the human body target includes an effective human face image, before the video picture retention module stores and/or uploads a video picture including the effective human face image: and comparing the effective face image contained in the current frame video picture with the previous frame video picture, and canceling the storage and uploading of the video picture retention module to the current frame video picture when the previous frame video picture contains the face images of all human body targets contained in the current frame video picture.
(III) advantageous effects
The video processing method and device for realizing the interception of effective entrance guard content can simplify the surveillance video pictures collected in real time by the access control system. When the collected surveillance pictures need to be retained as entry records, security records and material for later tracing, only the video pictures containing valid face images are stored and uploaded; there is no need to store and upload all the collected surveillance pictures. This reduces the required data storage space and the data storage cost, reduces the network transmission volume and the transmission cost when video picture data are uploaded, and additionally increases the efficiency of later person retrieval over the video pictures.
Drawings
The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining and illustrating the present invention and should not be construed as limiting the scope of the present invention.
Fig. 1 is a schematic flow chart of an embodiment of a video processing method disclosed in the present invention.
Fig. 2 is a schematic diagram of a certain frame of video image collected by the access control system.
Fig. 3 is a block diagram of an embodiment of a video processing apparatus according to the present disclosure.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described in more detail below with reference to the accompanying drawings in the embodiments of the present invention.
The following describes in detail an embodiment of a video processing method for implementing interception of effective contents of an access control according to the present disclosure with reference to fig. 1. As shown in fig. 1, the video processing method disclosed in this embodiment mainly includes the following steps:
step 100, moving object detection is performed on the collected video pictures to identify the video pictures containing the moving object areas. The access control system acquires video images of a certain area range including an access control entrance through video acquisition equipment such as a camera and the like to obtain each frame of video image including the access control entrance area, and then a moving object detection module of the video processing device detects moving objects of each frame of video image acquired by the access control system. A moving object refers to an object where a moving action actually occurs.
For example, fig. 2 is a schematic view of one frame of video picture collected by an access control system at the entrance of an office building. A person P1 inside the building is moving towards the access control system (the symbol "⊙" denotes motion towards the observation position); a person P2 inside is moving away from it (the symbol "⊕" denotes motion away from the observation position); a reception counseling desk R inside the building is stationary; and a car C on the lane outside the building is moving to the left. Among the four targets in fig. 2, P1, P2 and C are actual moving targets, whose contour regions should belong to moving-target regions, while fixed structures such as the reception desk R, the ground and the load-bearing walls are non-moving objects, whose regions should belong to the background.
The moving target detection module may perform moving target detection on each frame by a background subtraction method, an inter-frame difference method, an optical flow method and the like, so as to identify the video pictures containing moving-target regions, hereinafter called moving-target video pictures. The module will recognize the video picture shown in fig. 2 as one containing moving-target regions. Among the three detection methods, the background subtraction method and the inter-frame difference method suit relatively simple background areas but are sensitive to lighting changes; since most access control systems are installed in indoor environments with stable artificial illumination, these two methods are suitable for the access control system of this embodiment, and a specific implementation is described later.
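To make the choice concrete, the following is a minimal sketch (not code from the patent) of how background subtraction and inter-frame differencing can both be run on the collected video with OpenCV in Python; the file name entrance.mp4 and the thresholds 25 and 500 are illustrative assumptions.

```python
import cv2

cap = cv2.VideoCapture("entrance.mp4")       # hypothetical access-control video
mog = cv2.createBackgroundSubtractorMOG2()   # Gaussian-mixture background model

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Background subtraction: pixels deviating from the learned background
    fg_mask = mog.apply(frame)

    # Inter-frame difference: pixels that changed since the previous frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff_mask = cv2.threshold(cv2.absdiff(gray, prev_gray), 25, 255,
                              cv2.THRESH_BINARY)[1]
    prev_gray = gray

    # Crude test that this frame contains a moving-target region
    has_motion = cv2.countNonZero(fg_mask) > 500
cap.release()
```

Either mask can then feed step 200; the MOG2 subtractor is OpenCV's stock implementation of the mixture-of-Gaussians background model detailed below.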
Step 200, judging whether the corresponding moving target in the video image containing the moving target area belongs to the human body target. After obtaining the moving object video pictures, the human body object judgment module of the video processing device judges moving object areas included in the moving object video pictures, and if a moving object represented by the moving object area is a human body object, a video picture including the moving object as the human body object, hereinafter referred to as a human body object video picture, is obtained.
For example, the human body target determination module determines four targets included in the moving target video picture shown in fig. 2, and finds that the regions of the targets P1 and P2 are both actual human body target regions, so the video picture shown in fig. 2 is a human body target video picture.
Step 300, detecting whether a human target area contained in a video image containing a human target contains an effective human face image. After the human body target video pictures are obtained, a human face information detection module of the video processing device detects human body target areas contained in the human body target video pictures to obtain video pictures from which effective human face images can be identified, and the video pictures are referred to as human face information video pictures. The effective face image refers to a face image capable of determining the identity information of a person, so that a face information video picture can be used as data of entrance records, security records and post-investigation materials of the person entering a community or a building.
For example, the face information detection module detects and identifies the human-target regions of the two human targets P1 and P2 contained in the human-target video picture of fig. 2. Since target P1 moves towards the access control system, the image region of P1 contains the frontal face of person P1; target P2 moves away from the access control system, is far from its camera, and has a face angled away from (indeed facing away from) the camera, so the image region of P2 contains no frontal face of person P2 at all. Therefore only the image region of P1 contains a valid face image. Nevertheless, the video picture of fig. 2 contains a valid face image and thus belongs to the face information video pictures; a video picture containing no valid face image at all does not.
Step 400, storing and/or uploading a video picture containing an effective face image.
After the face information video picture is obtained, the video picture retention module of the video processing device locally stores the face information video picture, and can also upload the face information video picture to a background server of a monitoring system for storage and display so as to be used as a material for entry record, security record and post-investigation of a person corresponding to an effective face image contained in the picture.
The video processing method provided by this embodiment can simplify the surveillance video pictures collected in real time by the access control system: when the collected pictures need to be saved as entry records, security records and material for later use, only the video pictures containing valid face images are saved and uploaded, and there is no need to save and upload everything collected. This reduces the required data storage space and storage cost, likewise reduces the network transmission volume and cost when uploading video picture data, and in addition increases the efficiency of later person retrieval over the video pictures.
In one embodiment, the detecting the moving object of the captured video frame in step 100 includes:
and step 110, identifying a foreground region from the acquired video image by using a background subtraction method, and taking the obtained foreground region as a moving target region.
The background subtraction method uses a pre-established background parameter model to approximate the pixel values of the background image, and detects the moving-target region by comparing the current frame with the background model: pixel areas with a large difference are regarded as foreground, and pixel areas with a small difference as background. The region of a moving target in a video picture is called the foreground, while the regions of other, non-moving objects form the background. For example, in fig. 2 the targets P1, P2 and C are all actual foreground regions and the desk R belongs to the actual background, so the regions of P1, P2 and C are all moving-target regions. The background model selected by the background subtraction method used in step 110 is a Gaussian mixture model or a pixel gray-level mean model.
In one embodiment, the method for identifying the foreground area of the acquired video picture by using the mixed Gaussian model as the background model comprises the following steps:
and step A1, sequentially matching each pixel point in the video picture with each Gaussian model with the priority level ordered from high to low, and judging the Gaussian model matched with the pixel point.
The Gaussian mixture model is η(I_t, μ_{i,t}, σ_{i,t}), i = 1, 2, …, K, where I_t is the pixel value at time t (i.e. frame t), μ_{i,t} is the mean of the i-th Gaussian model at time t, and K is the number of Gaussian models, usually set to three to five. In each Gaussian model, ω_{i,t} is the weight of the i-th Gaussian model for the current pixel at time t, and

ω_{i,t} / σ_{i,t}

is the priority of the i-th Gaussian model.
In step A1, whether a pixel point matches the i-th Gaussian model is determined according to formula (1):

|I_t − μ_{i,t−1}| ≤ D_i · σ_{i,t−1}    (1)

where μ_{i,t−1} is the mean of the i-th Gaussian model at frame t−1, σ_{i,t−1} is the standard deviation of the i-th Gaussian model at frame t−1, and D_i is a constant.
It can be understood that the gaussian mixture model needs to be trained in advance through multiple frames of continuous video pictures.
When the Gaussian model matched with a pixel point is determined in step A1, if no Gaussian model matches the pixel point, the Gaussian model with the smallest weight is selected as the Gaussian model matched with the pixel point.
Step A2, updating the parameters of the Gaussian model matched with the pixel point. The updated parameters comprise the weight ω_{i,t}, the variance σ²_{i,t} and the mean μ_{i,t} of the Gaussian model, which are updated according to formulas (2) to (4) respectively:

ω_{i,t} = (1 − α) · ω_{i,t−1} + α    (2)

μ_{i,t} = (1 − ρ) · μ_{i,t−1} + ρ · I_t    (3)

σ²_{i,t} = (1 − ρ) · σ²_{i,t−1} + ρ · (I_t − μ_{i,t})²    (4)
Here α is a user-defined learning rate with 0 ≤ α ≤ 1; the size of α determines the updating speed of the model and is in direct proportion to it. ρ is the parameter learning rate, with ρ ≈ α/ω_{i,t}.
The Gaussian models not matched with the pixel point keep their former means and variances, and their weights decay according to ω_{i,t} = (1 − α) · ω_{i,t−1}.
If no Gaussian model matched the pixel point in step A1, so that the Gaussian model with the smallest weight was selected, then when the parameters of that Gaussian model are updated, its mean is updated to I_t, its standard deviation is reset to σ_0, and its weight is updated to ω_{K,t} = (1 − α) · ω_{K,t−1} + α.
Step A3, taking the Nb Gaussian models with the highest priority, whose weights sum to more than the background weight threshold, among the updated Gaussian models as the background. After the parameters of the Gaussian mixture model are updated, the Gaussian models are sorted by priority from large to small; in this ordering, models with higher priority are more likely to be background. The background weight partial-sum threshold T can be used as the basis for screening the models: if T is smaller than the sum of the weights of the first Nb models, the first Nb models are taken as the background distribution.
Step A4, sequentially matching each pixel point with the Nb background Gaussian models in order of priority from high to low, and determining the pixel points belonging to the foreground to obtain the foreground region. Assume the current pixel value is I_t: the pixel is matched one by one against the Nb Gaussian models screened in step A3 in the priority order of the models; if it satisfies formula (5) it is judged a foreground point, and otherwise a background point. The foreground regions of the video picture are thereby obtained, i.e. the regions of targets P1, P2 and C in fig. 2:

|I_t − μ_{i,t}| > D_2 · σ_{i,t},  i = 1, 2, …, Nb    (5)

where D_2 is a user-defined constant.
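The following single-pixel sketch illustrates the update cycle of formulas (1)–(5) in NumPy. It is an interpretation under assumed constants (K = 3, D_i = D_2 = 2.5, α = 0.01, background weight threshold T = 0.7, initial standard deviation σ_0 = 30), not the patent's reference implementation.

```python
import numpy as np

K, D_MATCH, D_FG, ALPHA, T_BG, SIGMA0 = 3, 2.5, 2.5, 0.01, 0.7, 30.0

w  = np.full(K, 1.0 / K)              # weights omega_i
mu = np.array([50.0, 120.0, 200.0])   # means mu_i (illustrative)
sd = np.full(K, SIGMA0)               # standard deviations sigma_i

def update(I):
    """Match pixel value I against the K Gaussians and update them."""
    order = np.argsort(-w / sd)       # priority omega/sigma, high to low
    matched = next((i for i in order if abs(I - mu[i]) <= D_MATCH * sd[i]), None)
    if matched is None:               # no match: replace the lowest-weight model
        matched = int(np.argmin(w))
        mu[matched], sd[matched] = I, SIGMA0
    w[:] *= (1 - ALPHA)               # unmatched weights decay
    w[matched] += ALPHA               # formula (2) for the matched model
    rho = ALPHA / w[matched]          # parameter learning rate
    mu[matched] = (1 - rho) * mu[matched] + rho * I                    # formula (3)
    var = (1 - rho) * sd[matched] ** 2 + rho * (I - mu[matched]) ** 2  # formula (4)
    sd[matched] = np.sqrt(var)

def is_foreground(I):
    """Formula (5): foreground if I matches none of the Nb background models."""
    order = np.argsort(-w / sd)
    nb = int(np.searchsorted(np.cumsum(w[order]), T_BG)) + 1  # first Nb past T
    return all(abs(I - mu[i]) > D_FG * sd[i] for i in order[:nb])
```

Calling update(I) and then is_foreground(I) for every pixel of every frame yields the foreground region; a real implementation would vectorize this over the whole image.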
In one embodiment, the identifying the foreground region of the captured video picture by using the pixel gray level mean model as the background model comprises:
and step B1, taking the average value of corresponding pixels in the training image converted into the gray level image as a background pixel value to obtain a background model. The training image is a video sequence used for training a background model, firstly, a part of training images before the current moment are selected, then the training images are converted into gray images, the mean value of corresponding pixels in the training images is used as a background pixel value, and a background model B is obtained, namely a background image is obtained.
The background model B is represented by formula (6):

B = (1/T) · Σ_{t=1}^{T} I_t    (6)

where I_t is the grayscale image at time t (frame t) and T is the total length of the video sequence used to train the background model, i.e. the above-mentioned "portion of the training images before the current moment"; T therefore also corresponds to a number of video frames. The larger the value of T, the more video pictures are selected and the more accurate the obtained background model, but the longer the calculation time.
Step B2, updating the obtained background model with the video picture of the current frame to obtain a new background model. Since the background does not remain unchanged forever, the background model needs to be updated; the pixel values of the current background model can be updated, for example, by formula (7), to obtain the updated background model:

u_t = (1 − α) · u_{t−1} + α · p_t    (7)

where p_t is the pixel value of the image at time t, u_{t−1} is the corresponding pixel value of the current background model, and α is a user-defined learning rate with 0 ≤ α ≤ 1. The size of α determines how strongly the current frame adjusts the background model: the larger α is, the greater the influence of the current frame and the faster the background model adapts to environmental changes; when α = 1, the current frame replaces the original background model as the new background.
Step B3, calculating the gray difference between the video picture to be detected, converted into a grayscale image, and the updated background model, and obtaining the foreground pixel probability distribution from the gray difference.
If a moving target exists in the video sequence, it can be detected by comparing the difference between the current video picture and the background picture. First the video picture to be detected is converted into a grayscale image, and then the gray difference dI_t(x) between the frame to be detected and the background model is calculated through formula (8):

dI_t(x) = |I_t(x) − B(x)| / thr    (8)

where I_t(x) is the video frame to be detected, B(x) is the background image, and thr is a constant. The difference represents the degree to which the gray value of the current frame deviates from the background gray value; if the deviation is greater than thr, the change is attributed to a moving target.
Specifically, whether a pixel belongs to the foreground is judged through formula (9), from which the foreground region is then obtained:

P_t(x) = min(dI_t(x), 1)    (9)

where P_t(x) is an approximate representation of the probability that the current pixel belongs to the foreground: if P_t(x) is small, the pixel can preliminarily be judged a background pixel; if P_t(x) is large, the pixel is a foreground pixel.
Step B4, obtaining the foreground region according to the foreground pixel probability distribution. All pixel points judged to be foreground pixels are combined to give the foreground region. Specifically, in order to suppress background noise so that the foreground is not split into more regions than actually exist, but stays as close as possible to the actual foreground regions, step B4 includes the following steps:
and step B41, dividing the foreground pixels into a plurality of grids according to the probability distribution of the foreground pixels, and counting the accumulated sum of the foreground probability of the pixels in each grid. The cumulative sum a is calculated as equation (10):
a- ∑ p (x) x ∈L formula (10);
where L is a local region, i.e., a grid, and x is a pixel within region L.
Step B42, judging, with the grid as the unit, whether each grid belongs to a foreground grid according to the ratio of its accumulated sum to the grid area, and thereby obtaining a foreground region composed of foreground grids. Since 0 ≤ P(x) ≤ 1, the maximum value of A is the area S of region L, so the condition for judging, region by region, whether a region belongs to the foreground or the background can adopt formula (11):

region L is foreground if A > β · S, and background otherwise    (11)

where β is a constant factor with 0 ≤ β ≤ 1; the larger β is, the more demanding the requirement on the number of foreground pixels a foreground grid must contain, and the harder it is for a grid to be judged foreground.
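The following is a minimal sketch of steps B1–B4 (formulas (6)–(11)) in NumPy; the grid size of 16, thr = 30, α = 0.05 and β = 0.4 are illustrative assumptions.

```python
import numpy as np

ALPHA, THR, BETA, GRID = 0.05, 30.0, 0.4, 16

def train_background(frames):
    """Formula (6): mean of the T grayscale training frames."""
    return np.mean(np.stack(frames).astype(np.float64), axis=0)

def detect(B, gray):
    """Update background B with grayscale frame and return (B, foreground mask)."""
    gray = gray.astype(np.float64)
    B = (1 - ALPHA) * B + ALPHA * gray            # formula (7): running update
    P = np.minimum(np.abs(gray - B) / THR, 1.0)   # formulas (8)-(9): clipped
                                                  # difference as foreground prob.
    H, W = P.shape
    fg = np.zeros((H, W), dtype=bool)
    for y in range(0, H, GRID):                   # formula (10): per-grid sum A
        for x in range(0, W, GRID):
            cell = P[y:y + GRID, x:x + GRID]
            if cell.sum() > BETA * cell.size:     # formula (11): A > beta * S
                fg[y:y + GRID, x:x + GRID] = True
    return B, fg
```

Deciding foreground grid-by-grid, rather than pixel-by-pixel, is exactly what keeps isolated noisy pixels from fragmenting the foreground region.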
In one embodiment, the step 200 of determining whether the corresponding moving object in the video frame containing the moving object region belongs to the human body object includes the following steps:
and step 210, projecting the moving target area to a coordinate axis to obtain the pixel number of the moving target area under each pixel line mark. Taking fig. 2 as an example, the moving target regions of the moving targets P1, P2, and C are converted into a binarized image such that the pixel point values in the moving target regions are 1 and the values of the other regions are 0. And then projecting to a coordinate axis respectively to obtain a line mark-pixel number statistical chart of the moving target profile, wherein an X axis in the chart is a pixel line mark, which means that a moving target area comprises a plurality of lines of pixels in total, and a Y axis is the number of pixels with the value of 1. Because the shapes (namely the outlines) of the moving target areas are different, the number of the pixel points corresponding to each line of pixel line marks is also different, the number of the pixel points corresponding to each line of pixel line marks of a circle is in a curve of ascending first and descending later along the X axis, and the number of the pixel points corresponding to each line of pixel line marks of a triangle with a horizontally placed bottom edge is in a straight-line ascending trend along the X axis.
Step 220, determining a corresponding number of pixel line marks corresponding to at least three target parts of the human body according to the characteristics of the pixel number of each pixel line mark.
The pixel-count characteristics of moving-target regions of different shapes differ, and correspondingly present different line shapes and rising/falling trends on the row-mark versus pixel-count chart. Since the human body has certain shape characteristics, the line shape of a human body in the chart is generally: a short distance from the head to the neck on the X axis, a rise from the neck, possibly some rises and falls in the middle, and a fall down to near the feet.

It follows that the most stable parts are the head, the neck and the feet, so these three parts are selected as target parts. The section from the neck to the feet varies with a person's build, but the rise from head to neck, the fall behind the neck and the fall in front of the feet are pixel-count line characteristics common to human bodies in general.

Therefore the head, the neck and the feet can be used as target parts of the human body. The pixel counts at the head and the feet are zero, because they are the two ends, and the neck, being thinnest and closer to the head, corresponds to the lowest of all the troughs in the line.

Thus the head and feet of the target are located on the X axis, and the neck is a trough slightly above the X axis. After the charts of moving targets P1, P2 and C are obtained, the positions of head, neck and feet on the X axis are determined in each of the three charts respectively.
Step 230, determining whether the distance ratio of the distances between the pixel rows corresponding to the human body target part and the pixel rows on the coordinate axis is within the corresponding human body part distance ratio range, and if so, determining that the moving target in the moving target area belongs to the human body target.
Because the ratio between the head-to-neck distance and the neck-to-feet distance of an ordinary human body is regular, on the X axis the ratio of the head-neck distance HN to the neck-feet distance NF falls within a certain range, for example 0.1 ≤ HN/NF ≤ 0.15, giving a human body part distance ratio range of [0.1, 0.15]. If the HN/NF of a moving target falls within this range, it is judged a human target; otherwise it is judged a non-human target. Targets P1 and P2 will be identified as human targets and target C as a non-human target.
Similarly, the distance ratio between the head-neck distance HN and the head-foot distance HF can be selected as the criterion for determining whether the moving object belongs to the human body object, and the range of the applicable human body distance ratio can be changed accordingly.
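A minimal sketch of steps 210–230 follows. The head and feet rows are taken as the first and last non-empty rows of the silhouette and the neck as the lowest trough in its upper third; that trough-search heuristic and the minimum-height guard are assumptions, while the ratio range [0.1, 0.15] is the example range given above.

```python
import numpy as np

def is_human(mask, ratio_range=(0.1, 0.15)):
    """mask: 2-D binary array, 1 inside the moving-target region."""
    counts = mask.sum(axis=1)            # pixels per row (the row-mark chart)
    rows = np.nonzero(counts)[0]
    if len(rows) == 0 or rows[-1] - rows[0] < 10:
        return False                     # region too small to measure
    head, feet = rows[0], rows[-1]
    upper = counts[head + 1: head + (feet - head) // 3]
    if len(upper) == 0:
        return False
    neck = head + 1 + int(np.argmin(upper))   # lowest trough near the head
    hn, nf = neck - head, feet - neck         # head-neck and neck-feet distances
    return nf > 0 and ratio_range[0] <= hn / nf <= ratio_range[1]
```

Using the head-neck to head-feet ratio instead, as suggested above, only changes the distances computed and the ratio range passed in.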
In one embodiment, the step 300 of detecting whether the human target area contained in the video picture containing the human target contains a valid human face image includes the following steps:
step 310, judging whether a picture area covered by a search window in a video picture containing a moving target area belongs to a human face area through a classifier. The search window is a window smaller than the size of a video picture and used for delimiting an area as an area for identifying the human face, and the classifier can classify the sample, so that the classifier can be used for carrying out image classification on the area delimited by the search window to obtain a human face image class, and the human face identification is realized.
Step 320, traversing the video picture by moving the search window within the video picture containing the moving-target region. The search window may move within the video picture by a step smaller than the window's length/width so as to traverse the entire picture. In the case of fig. 2, for example, the face information detection module can detect the face region contained in the region of target P1 through this traversal; target P2 contains no face, so no face is recognized there.
Step 330, determining the positions and sizes of the facial organs contained in the face region. For example, a frontal face can show both eyebrows, both eyes, the nose and the mouth but may not show the ears, while a side face can only show the eyebrow, eye and ear on one side together with the nose and mouth.
Step 340, judging whether the face region belongs to a valid face image based on the positional relations of the facial organs. A frontal face close to the camera shows its organs sharply and at a proper angle and belongs to a valid face image; for example, the region of target P1 in fig. 2 contains a valid face image. A side face far from the camera shows neither sharp organs nor a proper angle and does not belong to a valid face image.
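As a concrete stand-in for steps 310–340, the sketch below uses OpenCV's bundled Haar cascades (which are themselves trained with the AdaBoost scheme described next) for the sliding-window face search, and treats "both eyes found inside the face box" as an assumed proxy for the organ-position validity check.

```python
import cv2

face_cc = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cc = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def valid_faces(gray):
    """Return face boxes whose organ layout suggests a usable frontal face."""
    results = []
    for (x, y, w, h) in face_cc.detectMultiScale(gray, 1.1, 5):
        roi = gray[y:y + h, x:x + w]            # search organs inside the face box
        eyes = eye_cc.detectMultiScale(roi, 1.1, 5)
        if len(eyes) >= 2:                      # two eyes located: likely frontal
            results.append((x, y, w, h))
    return results
```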
In one embodiment, in step 310, it is determined whether the picture region covered by the search window belongs to a face region by an Adaboost classifier. The Adaboost classifier is trained as follows:
a training sample set S is given, wherein X and Y respectively correspond to a face positive sample and a non-face negative sample, and T is the maximum cycle number of training. The training samples are N in number, the weight corresponding to each sample in the N training samples is the same, and the weight of the initialized sample is 1/N, namely the initial probability distribution of the training samples. Under the sample distribution, a first weak classifier is obtained by training N training samples.
For the samples with wrong classification, the corresponding weight is increased, and for the samples with correct classification, the weight is reduced, so that the samples with wrong classification are highlighted, and a new sample distribution is obtained. And training the samples again under the new sample distribution to obtain a second weak classifier.
And repeating the steps for T times to obtain T weak classifiers, and superposing the T weak classifiers according to a certain weight to obtain the final desired strong classifier.
After the strong classifier is trained, the finally obtained strong classifier can be used to correctly identify face region information in a video picture. For example, the strong classifier can recognize the face region P1f of target P1 in fig. 2, so fig. 2 belongs to the video pictures containing valid face images: it contains the valid face P1f.
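A minimal generic sketch of this training loop follows; train_weak is a hypothetical helper that fits a ±1 weak classifier to the weighted samples, and the weight formulas are the standard AdaBoost ones rather than anything specific to this patent.

```python
import numpy as np

def adaboost(X, y, train_weak, T):
    """X: samples, y: labels in {-1, +1}, T: number of rounds."""
    N = len(y)
    D = np.full(N, 1.0 / N)                    # initial sample distribution 1/N
    learners, alphas = [], []
    for _ in range(T):
        h = train_weak(X, y, D)                # weak classifier under distribution D
        err = D[h(X) != y].sum()
        a = 0.5 * np.log((1 - err) / max(err, 1e-12))  # weak classifier weight
        D *= np.exp(-a * y * h(X))             # raise weights of misclassified samples
        D /= D.sum()                           # renormalize to a distribution
        learners.append(h)
        alphas.append(a)
    # Strong classifier: weighted vote of the T weak classifiers
    return lambda X: np.sign(sum(a * h(X) for a, h in zip(alphas, learners)))
```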
After recognition, video pictures containing a recognizable person's face region (valid video pictures) are obtained. However, because the same person generates a large number of valid video pictures while passing, those pictures carry the same effective content as far as the record of that person is concerned. For example, for M consecutive frames of video pictures collected within a period of time, if j frames (whether consecutive or not) are recognized through this embodiment as containing valid face images, the full picture content of those j frames is stored locally in the access control system or uploaded to the background server of the monitoring system as the retained record.
However, since there is no need to store the background region and regions of low interest, in one embodiment step 400 may specifically save and/or upload only part of the picture: a picture area containing at least the face part is extracted from the video picture containing the valid face image and saved and/or uploaded. For example, for the video picture of fig. 2, only the region of target P1 containing the face region P1f, i.e. the human-body region of P1, or even only the face region P1f itself, may be saved or uploaded. This further reduces the amount of data stored and uploaded.
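A minimal sketch of that extraction step, assuming a detected box (x, y, w, h) and an assumed padding factor:

```python
def crop_region(frame, box, margin=0.2):
    """Cut out the face (or body) box from the frame, padded by `margin`."""
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    y0, y1 = max(0, y - dy), min(frame.shape[0], y + h + dy)
    x0, x1 = max(0, x - dx), min(frame.shape[1], x + w + dx)
    return frame[y0:y1, x0:x1]
```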
In addition, the valid faces contained in the j frames may all belong to the same person, so that when the frames are saved and uploaded in step 400 the retained content would contain a large amount of repetition. It is unnecessary to save every video picture containing the same valid person information; doing so wastes storage capacity and increases the data storage cost while also increasing the network transmission volume and the data transmission cost. Therefore, in one embodiment, when step 300 detects that the human-target region of a video picture contains a valid face image, the following is done before the picture is saved and/or uploaded in step 400:
the method comprises the steps of comparing an effective face image contained in a current frame video picture with a previous frame video picture, and canceling the storage and uploading of the current frame video picture when the previous frame video picture contains face images of all human body targets contained in the current frame video picture.
Taking the current frame shown in fig. 2 as an example: fig. 2 contains a valid face image and thus meets the condition for being saved and uploaded. However, comparison with the previous frame shows that the person corresponding to the valid face image is the same in both frames, namely target P1, so the two frames have the same meaning and contribution as entry record and security record. Hence, provided the previous frame has already been saved and uploaded (or an earlier picture of the same meaning and contribution has been), the current frame need not be saved and uploaded, further reducing the amounts of stored and uploaded data.
Specifically, the access control system may buffer the previous frame and use it as the reference when deciding whether the current frame needs to be stored and uploaded. In this way, within a continuous run of video frames that all contain the same face and carry the same meaning for entry and security records, only the first frame is stored and uploaded; alternatively, when a new qualifying frame is detected, the quality of its valid face image can be compared with that of the stored frame, and the stored frame replaced if the new frame's face image quality is higher.
It should be noted that, because the stored video frames serve as entry records and security records, frames may only be discarded within a continuous segment containing the same valid face. Otherwise, if target P1 appeared a month ago and was collected and uploaded to the monitoring system, and then appears again today, the new frames belong to a discontinuous segment even though they contain the same face; if they were skipped on the grounds that the information had already been stored, today's entry record and security record would be lost.
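A minimal sketch of the previous-frame comparison described above: the identities seen in the last retained frame are cached, and the current frame is retained only if it adds a face absent from that cache. identify() is a hypothetical helper mapping a face crop to a person identifier (e.g. by the feature comparison the access control already performs).

```python
prev_ids: set = set()

def should_retain(face_crops, identify) -> bool:
    """True if the current frame contains a face not in the previous frame."""
    global prev_ids
    ids = {identify(f) for f in face_crops}
    is_new = not ids.issubset(prev_ids)   # some face absent from the last frame
    prev_ids = ids                        # current frame becomes the reference
    return is_new
```

Because the cache is refreshed every frame, a person who reappears after an absence (even a month later, as in the example above) is no longer in the cache and is retained again.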
An embodiment of the video processing apparatus for implementing interception of effective contents of an access control according to the present disclosure is described in detail below with reference to fig. 3. The present embodiment is used for implementing the aforementioned video processing method.
As shown in fig. 3, the video processing apparatus disclosed in the present embodiment includes:
the moving target detection module is used for detecting a moving target of a video picture acquired by the video acquisition equipment so as to identify the video picture containing a moving target area;
the human body target judging module is used for judging whether the corresponding moving target in the video picture containing the moving target area identified by the moving target detection module belongs to a human body target or not;
the human face information detection module is used for detecting whether a human body target area contained in the video picture containing the human body target judged by the human body target judgment module contains an effective human face image or not;
and the video picture saving module is used for storing and/or uploading the video picture which is detected by the face information detection module and contains the effective face image.
In one embodiment, the moving target detection module uses a background subtraction method to identify a foreground region from the captured video picture and takes the obtained foreground region as the moving target region;
the background model adopted by the background subtraction method is a Gaussian mixture model or a pixel gray level mean model.
In one embodiment, the moving target detection module comprises a first target detection submodule, configured to identify the foreground region of a captured video picture by using a Gaussian mixture model as the background model;
the first target detection submodule includes:
the model matching unit is used for sequentially matching each pixel point in the video picture with each Gaussian model with the priority ordered from high to low and judging the Gaussian model matched with the pixel point;
the parameter updating unit is used for updating the parameters of the Gaussian model matched with the pixel points and matched by the model matching unit;
the background selection unit is used for taking a plurality of Gaussian models which have the highest priority and the sum of weights which is greater than a background weight threshold value in the Gaussian models updated by the parameter updating unit as backgrounds;
and the first foreground obtaining unit is used for sequentially matching each pixel point against the background Gaussian models selected by the background selection unit, in order of priority from high to low, so as to determine the pixel points belonging to the foreground and obtain the foreground area.
In one embodiment, when the model matching unit determines the Gaussian model matched with a pixel point, if the pixel point matches none of the Gaussian models, the model matching unit takes the Gaussian model with the smallest weight as the model matched with that pixel point, so that its parameters are subsequently re-initialized around the current pixel value during the update.
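For illustration only, here is a minimal single-pixel Python sketch of these steps: match in priority order, update the matched model's parameters, select the background models by cumulative weight, and classify the pixel. The constants K, ALPHA and T, the initial model values, and the 2.5-sigma match test are assumptions chosen for the example, not values fixed by the present disclosure; a real implementation vectorizes this over all pixels.

```python
import numpy as np

K, ALPHA, T = 3, 0.01, 0.7          # models per pixel, learning rate, background weight threshold
w = np.full(K, 1.0 / K)             # model weights
mu = np.array([0.0, 128.0, 255.0])  # model means (gray values)
var = np.full(K, 225.0)             # model variances

def classify_pixel(x):
    """Update the mixture with gray value x; return True if x is foreground."""
    order = np.argsort(-w / np.sqrt(var))              # priority = weight / stddev, high to low
    matched = next((k for k in order
                    if abs(x - mu[k]) < 2.5 * np.sqrt(var[k])), None)
    if matched is None:                                # no model matched:
        matched = int(np.argmin(w))                    # take the minimum-weight model
        mu[matched], var[matched] = x, 225.0           # and re-center it on x
    w[:] = (1 - ALPHA) * w
    w[matched] += ALPHA                                # update the weights
    rho = ALPHA / max(w[matched], 1e-6)
    mu[matched] += rho * (x - mu[matched])             # update mean and variance
    var[matched] += rho * ((x - mu[matched]) ** 2 - var[matched])
    w[:] /= w.sum()
    order = np.argsort(-w / np.sqrt(var))              # re-rank after the update
    n_bg = int(np.searchsorted(np.cumsum(w[order]), T)) + 1
    return matched not in order[:n_bg]                 # not a background model -> foreground
```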
In one embodiment, the moving target detection module comprises a second target detection submodule, configured to identify the foreground region of the captured video picture by using a pixel gray-level mean model as the background model;
the second target detection submodule includes:
the background acquisition unit is used for taking the mean value of corresponding pixels across the training images, converted into gray images, as the background pixel values to obtain a background model;
the background updating unit is used for updating the background model obtained by the background acquisition unit with the current frame video picture to obtain a new background model;
the probability calculation unit is used for calculating the gray difference between the video picture to be detected converted into the gray image and the background model updated by the background updating unit and obtaining the probability distribution of foreground pixels according to the gray difference;
and the second foreground obtaining unit is used for obtaining a foreground area according to the probability distribution of the foreground pixels calculated by the probability calculating unit.
In one embodiment, the second foreground obtaining unit includes:
the accumulation counting subunit is used for dividing the foreground pixels into a plurality of grids according to the probability distribution of the foreground pixels and counting the accumulation sum of the foreground probability of the pixels in each grid;
and the occupation ratio judging subunit is used for judging, grid by grid, whether each grid belongs to the foreground according to the proportion that the accumulated foreground probability occupies in the corresponding grid, so as to obtain a foreground area composed of the foreground grids.
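A compact sketch of this pipeline follows: a mean background built from grayscale training frames, a running update with the current frame, a crude mapping of gray difference to foreground probability, and the grid-wise occupation test. The parameter names `rho`, `grid` and `ratio_threshold`, and the linear probability mapping, are assumptions made for illustration.

```python
import numpy as np

def build_background(gray_training_frames):
    """Mean of corresponding pixels over grayscale training frames (float arrays)."""
    return np.mean(np.stack(gray_training_frames), axis=0)

def foreground_grids(gray_frame, background, rho=0.05, grid=16, ratio_threshold=0.3):
    background = (1 - rho) * background + rho * gray_frame   # running background update
    diff = np.abs(gray_frame - background)
    prob = np.clip(diff / 255.0, 0.0, 1.0)                   # crude foreground probability
    h, w = prob.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, grid):                              # accumulate per grid cell
        for x in range(0, w, grid):
            cell = prob[y:y + grid, x:x + grid]
            if cell.sum() / cell.size >= ratio_threshold:    # occupation ratio test
                mask[y:y + grid, x:x + grid] = True
    return mask, background
```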
In one embodiment, the human target determination module comprises:
the line pixel number counting unit is used for projecting the moving target area to a coordinate axis to obtain the pixel number of the moving target area under each pixel line mark;
the target part matching unit is used for determining pixel line marks with corresponding quantity corresponding to at least three target parts of the human body according to the characteristic of the pixel number of each pixel line mark counted by the line pixel number counting unit;
and the distance ratio judging unit is used for judging whether the pairwise ratios of the distances, on the coordinate axis, between the pixel lines corresponding to the human body target parts determined by the target part matching unit fall within the corresponding human body part distance ratio ranges, and if so, judging that the moving target in the moving target area belongs to a human body target.
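As a rough illustration of the row-projection idea, the sketch below projects an assumed binary target mask onto the vertical axis, picks head, shoulder and feet rows with a simple heuristic, and tests a single distance ratio. The part-selection heuristic and the ratio range are illustrative assumptions; the disclosure's unit checks at least three parts and their pairwise ratios against calibrated ranges.

```python
import numpy as np

def looks_like_human(target_mask):
    """target_mask: 2-D boolean array marking the moving-target region."""
    rows = target_mask.sum(axis=1)           # pixel count per row index (the projection)
    nz = np.flatnonzero(rows)
    if nz.size < 3:
        return False
    head, feet = int(nz[0]), int(nz[-1])     # top and bottom occupied rows
    if feet == head:
        return False
    mid = (head + feet) // 2
    shoulder = int(np.argmax(rows[:mid]))    # widest row in the upper half ~ shoulders
    ratio = (shoulder - head) / (feet - head)
    return 0.1 <= ratio <= 0.35              # illustrative head-to-shoulder ratio range
```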
In one embodiment, the face information detection module includes:
the face area searching unit is used for judging whether an image area covered by a searching window in a video image containing a moving target area belongs to a face area or not through the classifier;
the picture searching and traversing unit is used for realizing the traversal of the video picture by moving the searching window in the video picture containing the moving target area;
an organ feature acquisition unit for determining the position and size of a face organ contained in the face region determined by the face region search unit;
and the effective face judgment unit is used for judging whether the face area belongs to an effective face image or not based on the position relation of the face organ determined by the organ characteristic acquisition unit.
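The disclosure does not tie this module to a particular classifier. As one possible realization, the sketch below uses OpenCV's pretrained Haar cascades as the search-window classifier (detectMultiScale performs the window traversal internally) and treats two eyes detected in the upper half of the face region as a crude stand-in for the organ position test.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def valid_faces(gray_frame):
    """Return boxes of face regions whose organ layout passes a crude test."""
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray_frame, 1.1, 5):
        roi = gray_frame[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(roi, 1.1, 5)
        # organ-position test: at least two eyes centered in the upper half
        upper = [e for e in eyes if e[1] + e[3] / 2.0 < h / 2.0]
        if len(upper) >= 2:
            results.append((x, y, w, h))
    return results
```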
In one embodiment, the video picture saving module extracts, from the video picture containing the effective face image, a picture region at least containing the face part, and saves and/or uploads that region.
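A minimal sketch of such an extraction: crop an assumed margin around the detected face box so that only the region carrying the face, plus a little context, is stored or uploaded. The margin value is an illustrative assumption.

```python
def crop_face_region(frame, box, margin=0.5):
    """Crop `frame` (H x W [x C] array) around face `box` = (x, y, w, h)."""
    x, y, w, h = box
    fh, fw = frame.shape[:2]
    x0 = max(0, int(x - margin * w))
    y0 = max(0, int(y - margin * h))
    x1 = min(fw, int(x + (1 + margin) * w))
    y1 = min(fh, int(y + (1 + margin) * h))
    return frame[y0:y1, x0:x1]
```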
In one embodiment, the apparatus further comprises:
the picture retention judging module is used for, when the face information detection module detects that the human body target area contained in a video picture containing a human body target contains an effective face image, and before the video picture saving module stores and/or uploads the video picture containing the effective face image: comparing the effective face image contained in the current frame video picture with the previous frame video picture, and canceling the video picture saving module's storage and uploading of the current frame video picture when the previous frame video picture contains the face images of all human body targets contained in the current frame video picture.
It should be noted that: in the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described are some embodiments of the present invention, not all embodiments, and features in embodiments and embodiments in the present application may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this document, "first", "second", and the like are used only for distinguishing one from another, and do not indicate their degree of importance, order, and the like.
The division of modules, units or components herein is merely a logical division; other divisions are possible in an actual implementation, for example, a plurality of modules and/or units may be combined or integrated into another system. Modules, units, or components described as separate parts may or may not be physically separate. Components displayed as units may or may not be physical units, and may be located in one place or distributed over a plurality of network units. Therefore, some or all of the units can be selected according to actual needs to implement the scheme of the embodiment.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A video processing method for realizing the interception of effective contents of an entrance guard is characterized by comprising the following steps:
carrying out moving target detection on the collected video pictures to identify the video pictures containing moving target areas;
judging whether a corresponding moving target in a video image containing the moving target area belongs to a human body target or not;
detecting whether a human target area contained in a video picture containing the human target contains an effective human face image;
storing and/or uploading a video picture containing the effective face image; wherein,
the detecting whether the human body target area contained in the video picture containing the human body target contains the effective human face image comprises:
judging whether the picture area covered by the search window in the video picture containing the moving target area belongs to a face area or not through a classifier;
traversing the video picture by moving the search window within the video picture containing the moving target area;
determining the position and the size of a human face organ contained in the human face region;
judging whether the face region belongs to an effective face image or not based on the position relation of the face organ; in addition,
when it is detected that a human body target area included in a video picture including the human body target includes an effective human face image, before the video picture including the effective human face image is stored and/or uploaded: comparing the effective face image contained in the current frame video picture with the previous frame video picture, and canceling the storage and uploading of the current frame video picture when the previous frame video picture contains the face images of all human body targets contained in the current frame video picture; in addition,
the judging whether the corresponding moving target in the video image containing the moving target area belongs to the human body target comprises the following steps:
projecting the moving target area to a coordinate axis to obtain the number of pixels of the moving target area under each pixel line mark;
determining a corresponding number of pixel line marks corresponding to at least three target parts of the human body according to the characteristics of the pixel number of each pixel line mark;
and judging whether the pairwise ratios of the distances, on the coordinate axis, between the pixel lines corresponding to the human body target parts are within the corresponding human body part distance ratio ranges or not, and if so, judging that the moving target in the moving target area belongs to the human body target.
2. The method of claim 1, wherein the performing moving object detection on the captured video pictures comprises:
identifying a foreground region from the acquired video picture by using a background subtraction method, and taking the obtained foreground region as the moving target region;
the background model adopted by the background subtraction method is a Gaussian mixture model or a pixel gray level mean model.
3. The method of claim 2, wherein identifying the foreground region of the captured video picture by using the Gaussian mixture model as the background model comprises:
sequentially matching each pixel point in the video image with each Gaussian model with the priority ordered from high to low, and judging the Gaussian model matched with the pixel point;
updating parameters of the Gaussian model matched with the pixel points;
taking a plurality of Gaussian models with the highest priority and the sum of weights larger than a background weight threshold value in the updated Gaussian models as a background;
and sequentially matching each pixel point with a plurality of background Gaussian models with priorities sorted from high to low to determine pixel points belonging to the foreground so as to obtain a foreground area.
4. The method of claim 1, wherein the saving and/or uploading the video picture containing the valid face image comprises:
and extracting a picture area at least comprising a face part from the video picture comprising the effective face image for storage and/or uploading.
5. A video processing apparatus for realizing interception of effective contents of entrance guard, characterized by comprising:
the moving target detection module is used for detecting a moving target of a video picture acquired by the video acquisition equipment so as to identify the video picture containing a moving target area;
the human body target judging module is used for judging whether the corresponding moving target in the video picture which contains the moving target area and is identified by the moving target detection module belongs to a human body target or not;
the human face information detection module is used for detecting whether a human body target area contained in the video picture containing the human body target judged by the human body target judgment module contains an effective human face image or not;
the video picture saving module is used for storing and/or uploading the video picture which is detected by the face information detection module and contains the effective face image; wherein,
the device further comprises:
a picture retention judging module, configured to, when the face information detection module detects that a human body target region included in a video picture including the human body target includes an effective human face image, and before the video picture saving module stores and/or uploads the video picture including the effective human face image: compare the effective face image contained in the current frame video picture with the previous frame video picture, and cancel the storage and uploading of the current frame video picture by the video picture saving module when the previous frame video picture contains face images of all human body targets contained in the current frame video picture; in addition,
the face information detection module comprises:
the face area searching unit is used for judging whether an image area covered by a searching window in the video image containing the moving target area belongs to a face area or not through the classifier;
the picture searching and traversing unit is used for realizing the traversal of the video picture by moving the searching window in the video picture containing the moving target area;
an organ feature acquisition unit configured to determine a position and a size of a face organ included in the face region determined by the face region search unit;
the effective face judgment unit is used for judging whether the face area belongs to an effective face image or not based on the position relation of the face organ determined by the organ feature acquisition unit; in addition,
the human body target judgment module comprises:
the line pixel number counting unit is used for projecting the moving target area to a coordinate axis to obtain the pixel number of the moving target area under each pixel line mark;
the target part matching unit is used for determining pixel line marks with corresponding quantity corresponding to at least three target parts of the human body according to the characteristic of the pixel number of each pixel line mark counted by the line pixel number counting unit;
and the distance ratio judging unit is used for judging whether the pairwise ratios of the distances, on the coordinate axis, between the pixel lines corresponding to the human body target parts determined by the target part matching unit fall within the corresponding human body part distance ratio ranges, and if so, judging that the moving target in the moving target area belongs to the human body target.
6. The apparatus of claim 5, wherein the moving target detection module identifies a foreground region from the captured video picture by using a background subtraction method and takes the foreground region as the moving target region;
the background model adopted by the background subtraction method is a Gaussian mixture model or a pixel gray level mean model.
7. The apparatus of claim 6, wherein the moving target detection module comprises a first target detection submodule for identifying the foreground region of the captured video picture by using a Gaussian mixture model as the background model;
the first target detection submodule includes:
the model matching unit is used for sequentially matching each pixel point in the video picture with each Gaussian model with the priority ordered from high to low and judging the Gaussian model matched with the pixel point;
the parameter updating unit is used for updating the parameters of the Gaussian model matched with the pixel points and matched by the model matching unit;
a background selecting unit, configured to use, as a background, a plurality of gaussian models that have the highest priority and whose sum of weights is greater than a background weight threshold in the gaussian models updated by the parameter updating unit;
and the first foreground obtaining unit is used for sequentially matching each pixel point against the background Gaussian models selected by the background selecting unit, in order of priority from high to low, so as to determine the pixel points belonging to the foreground and obtain a foreground area.
8. The apparatus of claim 5, wherein the video picture saving module extracts, from the video picture containing the valid face image, a picture region containing at least the face part for saving and/or uploading.
CN201910551347.8A 2019-06-24 2019-06-24 Video processing method and device for realizing interception of effective contents of entrance guard Active CN110427815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910551347.8A CN110427815B (en) 2019-06-24 2019-06-24 Video processing method and device for realizing interception of effective contents of entrance guard

Publications (2)

Publication Number Publication Date
CN110427815A CN110427815A (en) 2019-11-08
CN110427815B (en) 2020-07-10

Family

ID=68409468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910551347.8A Active CN110427815B (en) 2019-06-24 2019-06-24 Video processing method and device for realizing interception of effective contents of entrance guard

Country Status (1)

Country Link
CN (1) CN110427815B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583485A (en) * 2020-04-16 2020-08-25 北京澎思科技有限公司 Community access control system, access control method and device, access control unit and medium
CN111784896B (en) * 2020-06-17 2021-02-23 深圳南亿科技股份有限公司 Access control monitoring image storage method, system and storage medium
CN111881866B (en) * 2020-08-03 2024-01-19 杭州云栖智慧视通科技有限公司 Real-time face grabbing recommendation method and device and computer equipment
CN112637567B (en) * 2020-12-24 2021-10-26 中标慧安信息技术股份有限公司 Multi-node edge computing device-based cloud data uploading method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542271A (en) * 2010-12-23 2012-07-04 胡茂林 Video-based technology for informing people visiting and protecting privacy at house gates or entrances and exits of public places such as office buildings
JP2016176816A (en) * 2015-03-20 2016-10-06 キヤノン株式会社 Image processor, image processing method, and program
CN106372576A (en) * 2016-08-23 2017-02-01 南京邮电大学 Deep learning-based intelligent indoor intrusion detection method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100462047C (en) * 2007-03-21 2009-02-18 汤一平 Safe driving auxiliary device based on omnidirectional computer vision
CN102368301A (en) * 2011-09-07 2012-03-07 常州蓝城信息科技有限公司 Moving human body detection and tracking system based on video
CN104318202A (en) * 2014-09-12 2015-01-28 上海明穆电子科技有限公司 Method and system for recognizing facial feature points through face photograph
CN106157329B (en) * 2015-04-20 2021-08-17 中兴通讯股份有限公司 Self-adaptive target tracking method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant