CN108363953B - Pedestrian detection method and binocular monitoring equipment - Google Patents

Pedestrian detection method and binocular monitoring equipment

Info

Publication number: CN108363953B
Application number: CN201810032130.1A
Authority: CN (China)
Prior art keywords: pedestrian, frame image, pedestrian target, current frame, collector
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN108363953A
Inventors: 李乾坤, 郭晴, 卢维, 殷俊
Current and original assignee: Zhejiang Dahua Technology Co Ltd
Application filed by Zhejiang Dahua Technology Co Ltd; priority to CN201810032130.1A; published as application CN108363953A and granted as CN108363953B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training

Abstract

The invention discloses a pedestrian detection method and binocular monitoring equipment, which are used for solving the technical problem of a high false detection rate of people in video monitoring. The method comprises the following steps: obtaining a current frame image, where the current frame image comprises a first current frame image and a second current frame image, and the first current frame image is an image frame extracted from first video information at the current moment; identifying each pedestrian in the current frame image through a preset convolutional neural network model and predicting the corresponding head-shoulder position, to obtain a pedestrian target frame corresponding to the head-shoulder position of each pedestrian; and screening out real pedestrian target frames from all pedestrian target frames obtained for the current frame image, based on the current frame image and the corresponding background frame image. The background frame image is an image frame extracted from video information at a historical moment, and the historical moment is a moment that differs from the current moment by a preset time interval.

Description

Pedestrian detection method and binocular monitoring equipment
Technical Field
The invention relates to the field of intelligent management, in particular to a pedestrian detection method and binocular monitoring equipment.
Background
As the world enters the digital age, the demand for video surveillance keeps increasing. Managers hope that video surveillance will let them not only view the video image of a monitored point at any time, but also count the passenger flow at that point, so as to manage it intelligently.
For example, in a shopping mall, by counting the passenger flow at each monitoring point, counters can be added or enlarged where traffic is heavy, and sales promotions can be run where traffic is light to attract customers.
A scenic tourist area can, on the one hand, guide passenger flow by monitoring it, and on the other hand, compare the measured flow with ticket sales to judge whether fare evasion is occurring, so as to take appropriate supervisory measures in time.
In places such as stations where passengers easily gather, passenger flow is counted through video monitoring; when it reaches an early-warning value, emergency plans such as vehicle dispatching, passenger guidance and crowd evacuation can be started automatically.
In the prior art, to count the passenger flow of a monitored point automatically with a video monitoring device, people must first be identified in the collected video images; only then can the passenger flow be counted from the identified people. At present this is mostly done with a monocular method, i.e., people are detected through background modeling or deep learning.
However, during actual monitoring people are generally in motion. A monocular method mainly filters the predictions of a convolutional neural network with a fixed probability threshold: when the threshold is too large, many detections are missed; when it is too small, many false detections appear. Either way, the process of identifying people suffers from many missed or false detections, so the recognition result has a high false detection rate.
Therefore, how to effectively reduce the false detection rate of people in video monitoring becomes a technical problem to be solved urgently.
Disclosure of Invention
The invention provides a pedestrian detection method and binocular monitoring equipment, which are used for solving the technical problem of high false detection rate of people in video monitoring.
In order to solve the above technical problem, an embodiment of the present invention provides a pedestrian detection method applied to a binocular monitoring device, where the binocular monitoring device includes at least a first collector and a second collector. The method comprises:
obtaining a current frame image; the current frame image comprises a first current frame image and a second current frame image, the first current frame image is an image frame extracted from first video information at the current moment, the second current frame image is an image frame extracted from second video information at the current moment, the first video information is video information obtained from the first collector, and the second video information is video information obtained from the second collector;
identifying each pedestrian in the current frame image through a preset convolutional neural network model, and predicting a corresponding head-shoulder position to obtain a pedestrian target frame corresponding to the head-shoulder position of each pedestrian;
based on the current frame image and the corresponding background frame image, screening out real pedestrian target frames from all pedestrian target frames obtained for the current frame image; the background frame image is an image frame extracted from video information at a historical moment, and the historical moment is a moment that differs from the current moment by a preset time interval.
Optionally, the step of screening out a real pedestrian target frame from the pedestrian target frames obtained for the current frame image includes:
screening out real pedestrian target frames from all the pedestrian target frames obtained for the current frame image by adopting a preset screening mode; the preset screening mode comprises at least any one of, or any combination of, background screening, pedestrian height screening and pedestrian shoulder width screening.
Optionally, the background screening is adopted to screen out a real pedestrian target frame from each pedestrian target frame obtained for the current frame image, and the method includes:
the following operations are respectively executed for each pedestrian target frame:
calculating a proportion value of background pixels in one pedestrian target frame based on the current frame image and the background frame image; wherein the background pixels are pixels whose positions are the same in the current frame image and the background frame image, and whose gray value difference between the current frame image and the background frame image is smaller than a second preset threshold;
and when the proportion value is determined to exceed a first preset threshold value, filtering out the pedestrian target frame.
Optionally, calculating the proportion value of background pixels in the one pedestrian target frame based on the current frame image includes:
counting a first total amount of all pixels in the one pedestrian target frame and a second total amount of background pixels in the one pedestrian target frame;
obtaining the proportion value based on the first total amount and the second total amount; wherein the proportion value is proportional to the second total amount and inversely proportional to the first total amount.
Optionally, the pedestrian height screening is adopted, and a real pedestrian target frame is screened from each pedestrian target frame obtained for the current frame image, including:
the following operations are respectively executed for each pedestrian target frame:
calculating the height of the pedestrian corresponding to the pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image and the baseline distance; the baseline distance is the distance between the center point of the first collector and the center point of the second collector;
and when the pedestrian height is determined to be out of a first preset range, filtering the pedestrian target frame.
Optionally, calculating the pedestrian height corresponding to the pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image, and the baseline distance, includes:
calculating a three-dimensional coordinate value of the central pixel point of the pedestrian target frame under a camera coordinate system based on the coordinates of the central pixel point of the pedestrian target frame and the coordinates of the central pixel point of the current frame image, the baseline distance and the parallax value of the central pixel point of the pedestrian target frame; the parallax value is a difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
calculating the height of the pedestrian based on the three-dimensional coordinate value, the height value and the pitch angle; the height value is the distance between the center point of the first collector or the second collector and the ground, and the pitch angle is the pitch angle of the first collector or the second collector.
Optionally, pedestrian shoulder width screening is adopted to screen a real pedestrian target frame from each pedestrian target frame obtained for the current frame image, including:
the following operations are respectively executed for each pedestrian target frame:
calculating the pedestrian shoulder width corresponding to one pedestrian target frame based on the baseline distance, the width value of the one pedestrian target frame and the parallax value of the central pixel point of the one pedestrian target frame; the pedestrian shoulder width is in direct proportion to the product of the baseline distance and the width value, and is in inverse proportion to the parallax value, and the parallax value is the difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
and when the pedestrian shoulder width is determined to be out of a second preset range, filtering the pedestrian target frame.
Optionally, the background frame image is specifically an RGB image or a disparity map.
In a second aspect, an embodiment of the present invention provides a binocular monitoring device for pedestrian detection, including at least a first collector and a second collector, the binocular monitoring device including:
an obtaining unit configured to obtain a current frame image; the current frame image comprises a first current frame image and a second current frame image, the first current frame image is an image frame extracted from first video information at the current moment, the second current frame image is an image frame extracted from second video information at the current moment, the first video information is video information obtained from the first collector, and the second video information is video information obtained from the second collector;
the identification unit is used for identifying each pedestrian in the current frame image through a preset convolutional neural network model and predicting the corresponding head and shoulder position to obtain a pedestrian target frame corresponding to the head and shoulder position of each pedestrian;
the screening unit is used for screening out real pedestrian target frames from all the pedestrian target frames obtained for the current frame image based on the current frame image and the corresponding background frame image; the background frame image is an image frame extracted from video information at a historical moment, and the historical moment is a moment that differs from the current moment by a preset time interval.
Optionally, when a real pedestrian target frame is screened from the pedestrian target frames obtained for the current frame image, the screening unit is configured to:
screening out real pedestrian target frames from all the pedestrian target frames obtained for the current frame image by adopting a preset screening mode; the preset screening mode comprises at least any one of, or any combination of, background screening, pedestrian height screening and pedestrian shoulder width screening.
Optionally, when the background screening is adopted to screen out a real pedestrian target frame from the pedestrian target frames obtained for the current frame image, the screening unit is configured to:
the following operations are respectively executed for each pedestrian target frame:
calculating a proportion value of background pixels in one pedestrian target frame based on the current frame image and the background frame image; wherein the background pixels are pixels whose positions are the same in the current frame image and the background frame image, and whose gray value difference between the current frame image and the background frame image is smaller than a second preset threshold;
and when the proportion value is determined to exceed a first preset threshold value, filtering out the pedestrian target frame.
Optionally, when the proportion value of background pixels in the pedestrian target frame is calculated based on the current frame image, the screening unit is configured to:
count a first total amount of all pixels in the one pedestrian target frame and a second total amount of background pixels in the one pedestrian target frame;
obtain the proportion value based on the first total amount and the second total amount; wherein the proportion value is proportional to the second total amount and inversely proportional to the first total amount.
Optionally, when the pedestrian height screening is adopted, a real pedestrian target frame is screened from each pedestrian target frame obtained for the current frame image, and the screening unit is configured to:
the following operations are respectively executed for each pedestrian target frame:
calculating the height of the pedestrian corresponding to the pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image and the baseline distance; the baseline distance is the distance between the center point of the first collector and the center point of the second collector;
and when the pedestrian height is determined to be out of a first preset range, filtering the pedestrian target frame.
Optionally, the pedestrian height corresponding to a pedestrian target frame is calculated based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image, and the baseline distance, and the screening unit is configured to:
calculating a three-dimensional coordinate value of the central pixel point of the pedestrian target frame under a camera coordinate system based on the coordinates of the central pixel point of the pedestrian target frame and the coordinates of the central pixel point of the current frame image, the baseline distance and the parallax value of the central pixel point of the pedestrian target frame; the parallax value is a difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
calculating the height of the pedestrian based on the three-dimensional coordinate value, the height value and the pitch angle; the height value is the distance between the center point of the first collector or the second collector and the ground, and the pitch angle is the pitch angle of the first collector or the second collector.
Optionally, when the pedestrian shoulder width screening is adopted and a real pedestrian target frame is screened from each pedestrian target frame obtained for the current frame image, the screening unit is configured to:
the following operations are respectively executed for each pedestrian target frame:
calculating the pedestrian shoulder width corresponding to one pedestrian target frame based on the baseline distance, the width value of the one pedestrian target frame and the parallax value of the central pixel point of the one pedestrian target frame; the pedestrian shoulder width is in direct proportion to the product of the baseline distance and the width value, and is in inverse proportion to the parallax value, and the parallax value is the difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
and when the pedestrian shoulder width is determined to be out of a second preset range, filtering the pedestrian target frame.
Optionally, the background frame image is specifically an RGB image or a disparity map.
In a third aspect, an embodiment of the present invention further provides a binocular monitoring device for pedestrian detection, including:
at least one processor, and
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the at least one processor performs the method according to the first aspect by executing the instructions stored by the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, including:
the computer readable storage medium stores computer instructions which, when executed on a computer, cause the computer to perform the method of the first aspect as described above.
Through the technical solutions in one or more of the above embodiments of the present invention, the embodiments of the present invention have at least the following technical effects:
in the technical scheme provided by the application, each pedestrian in the current frame image is identified through a preset convolutional neural network model, a pedestrian target is determined, the corresponding head and shoulder position of the pedestrian is predicted, and then a pedestrian target frame corresponding to the head and shoulder position of each pedestrian in the current frame image is obtained; then, based on the current frame image and the corresponding background frame image, screening out real pedestrian target frames from all pedestrian target frames obtained aiming at the current frame image; the background frame image is an image frame extracted from the video information at a historical time, and the historical time is a time which is different from the current time by a preset time interval. Therefore, the travel human target can be recognized from the current frame image as far as possible through the preset network neural model, and the false detection target which is mistakenly recognized as the human target in the recognized human target is filtered in a screening mode, so that the real human target is obtained.
Drawings
Fig. 1 is a flowchart of a pedestrian detection method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a monitoring device according to an embodiment of the present invention.
Detailed Description
The invention provides a pedestrian detection method and binocular monitoring equipment, which are used for solving the technical problem of a high false detection rate of people in video monitoring.
In order to solve the technical problems, the general idea of the embodiment of the present application is as follows:
a method of pedestrian detection is provided, comprising: obtaining a current frame image; the current frame image comprises a first current frame image and a second current frame image, the first current frame image is an image frame extracted from first video information at the current moment, the second current frame image is an image frame extracted from second video information at the current moment, the first video information is video information obtained from a first collector, and the second video information is video information obtained from a second collector; identifying each pedestrian in the current frame image through a preset convolutional neural network model, and predicting the corresponding head and shoulder position to obtain a pedestrian target frame corresponding to the head and shoulder position of each pedestrian; screening out real pedestrian target frames from all pedestrian target frames obtained aiming at the current frame image based on the current frame image and the corresponding background frame image; the background frame image is an image frame extracted from the video information at a historical time, and the historical time is a time which is different from the current time by a preset time interval.
In this scheme, each pedestrian in the current frame image is identified through a preset convolutional neural network model, pedestrian targets are determined, and the corresponding head-shoulder positions are predicted, so that a pedestrian target frame corresponding to the head-shoulder position of each pedestrian in the current frame image is obtained. Then, based on the current frame image and the corresponding background frame image, real pedestrian target frames are screened out from all pedestrian target frames obtained for the current frame image; the background frame image is an image frame extracted from the video information at a historical moment, and the historical moment differs from the current moment by a preset time interval. In this way, pedestrian targets are recognized from the current frame image as completely as possible by the preset convolutional neural network model, and targets falsely recognized as pedestrians are then filtered out by screening, so that real pedestrian targets are obtained.
Before describing the present solution in detail, the convolutional neural network, a background concept related to this solution, is first introduced so that those skilled in the art can better understand the solution. The details are as follows:
convolutional Neural Networks (CNN) are an efficient recognition method that has been developed in recent years and has attracted much attention. In the 60's of the 20 th century, Hubel and Wiesel discovered that their unique network structures could effectively reduce the complexity of feedback neural networks when studying neurons for local sensitivity and direction selection in the feline cerebral cortex, which in turn led to the proposal of convolutional neural networks.
At present, CNN has become one of the research hotspots in many scientific fields, especially in the field of pattern classification, because the network avoids the complex preprocessing of the image and can directly input the original image, it has been more widely applied.
In general, the basic structure of CNN includes two layers, one of which is a feature extraction layer, and the input of each neuron is connected to a local acceptance domain of the previous layer and extracts the feature of the local. Once the local feature is extracted, the position relation between the local feature and other features is determined; the other is a feature mapping layer, each calculation layer of the network is composed of a plurality of feature mappings, each feature mapping is a plane, and the weights of all neurons on the plane are equal. The feature mapping structure adopts a sigmoid function with small influence function kernel as an activation function of the convolution network, so that the feature mapping has displacement invariance.
CNN is used primarily to identify two-dimensional graphs of displacement, scaling and other forms of distortion invariance. Since the feature detection layer of CNN learns from the training data, when using CNN, it avoids the feature extraction of the display, and implicitly learns from the training data; moreover, because the neuron weights on the same feature mapping surface are the same, the network can learn in parallel.
In order to better understand the technical solutions of the present invention, they are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples of the present invention are detailed illustrations of the technical solutions rather than limitations, and the technical features in the embodiments and examples may be combined with each other without conflict.
Referring to fig. 1, an embodiment of the present invention provides a method for pedestrian detection, which is applied to a binocular monitoring device, the binocular monitoring device at least includes a first collector and a second collector, and a processing procedure of the method for pedestrian detection is as follows.
Step 101: obtaining a current frame image; the current frame image comprises a first current frame image and a second current frame image, the first current frame image is an image frame extracted from first video information at the current moment, the second current frame image is an image frame extracted from second video information at the current moment, the first video information is video information obtained from a first collector, and the second video information is video information obtained from a second collector.
Before obtaining the current frame image, the first collector and the second collector of the binocular monitoring device need to be installed at preset positions. Specifically, the first collector and the second collector are mounted at the designated monitoring point so that their fields of view cover the area to be detected, and their baseline is adjusted to be parallel to the ground of the monitoring point.
After the installation positions of the first collector and the second collector are adjusted, the installation heights and the pitch angles of the first collector and the second collector need to be measured or calibrated, and the installation heights and the pitch angles are recorded.
Furthermore, epipolar line correction needs to be performed on the images of the first collector and the second collector, so that the same object point is in the same pixel row in the images of the first collector and the second collector.
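For orientation only, epipolar correction of a calibrated stereo pair can be sketched with OpenCV as below. This is a minimal sketch, not part of the patent: the intrinsics K1/K2, distortion vectors d1/d2, and the rotation R and translation T between the two collectors are assumed calibration inputs.

```python
import cv2

def rectify_pair(img_left, img_right, K1, d1, K2, d2, R, T):
    """Epipolar correction: warp both images so the same object point
    lies on the same pixel row in the first and second collector images."""
    size = (img_left.shape[1], img_left.shape[0])
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
    left_rect = cv2.remap(img_left, map1x, map1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(img_right, map2x, map2y, cv2.INTER_LINEAR)
    return left_rect, right_rect, Q  # Q can reproject disparities to 3D if needed
```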
After this preparation, the area to be detected can be monitored through the first and second collectors of the binocular monitoring device. During monitoring, image frames are extracted at preset time intervals from the first video information and the second video information collected by the first collector and the second collector.
At the current moment, the image frame extracted from the first video information is the first current frame image, and the image frame extracted from the second video information is the second current frame image; together they form the current frame image.
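A minimal sketch of this frame-pair extraction (not from the patent; the video sources and the one-second interval are assumed placeholders):

```python
import cv2

def frame_pairs(first_src, second_src, interval_s=1.0):
    """Yield (first, second) current-frame pairs sampled every interval_s seconds."""
    cap1, cap2 = cv2.VideoCapture(first_src), cv2.VideoCapture(second_src)
    fps = cap1.get(cv2.CAP_PROP_FPS) or 25.0   # fall back when FPS is unreported
    step = max(1, int(round(fps * interval_s)))
    idx = 0
    while True:
        ok1, frame1 = cap1.read()
        ok2, frame2 = cap2.read()
        if not (ok1 and ok2):
            break
        if idx % step == 0:
            yield frame1, frame2   # first / second current frame image
        idx += 1
```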
After the current frame image is obtained, step 102 may be performed.
Step 102: and identifying each pedestrian in the current frame image through a preset convolutional neural network model, and predicting the corresponding head and shoulder position to obtain a pedestrian target frame corresponding to the head and shoulder position of each pedestrian.
Before each pedestrian in the current frame image is identified through the preset convolutional neural network model, the convolutional neural network model is usually trained by using a preset number of images marked with pedestrian head and shoulder information under different scenes, so that the accuracy of the finally trained convolutional neural network model for identifying the pedestrian head and shoulder information in any image at least reaches the preset accuracy.
For example, 20,000 images marked with head-shoulder information in different scenes may be prepared to train the convolutional neural network model, of which 90% are used for training and 10% for verification. When the verification accuracy reaches the preset accuracy, training can be ended; at this point all parameters of the convolutional neural network are fixed, yielding the preset convolutional neural network model.
After the preset convolutional neural network model is obtained, each pedestrian in the current frame image can be identified through the preset convolutional neural network model, and the corresponding head and shoulder position is predicted, so that a pedestrian target frame corresponding to the head and shoulder position of each pedestrian is obtained.
It should be understood that the pedestrian target frame for the same pedestrian includes the pedestrian target frame of the pedestrian identified from the first current frame image and the second current frame image.
After the pedestrian target frame is obtained, step 103 may be performed.
Step 103: screening out real pedestrian target frames from all pedestrian target frames obtained aiming at the current frame image based on the current frame image and the corresponding background frame image; the background frame image is an image frame extracted from the video information at a historical time, and the historical time is a time which is different from the current time by a preset time interval.
Real pedestrian target frames are screened from the pedestrian target frames obtained for the current frame image as follows: a preset screening mode is adopted to screen real pedestrian target frames from all pedestrian target frames obtained for the current frame image; the preset screening mode comprises at least any one of, or any combination of, background screening, pedestrian height screening and pedestrian shoulder width screening.
For example, the actual pedestrian target frame may be screened from all the pedestrian target frames obtained for the current frame image by any one of three screening methods, namely background screening, pedestrian height screening and pedestrian shoulder width screening.
Alternatively, any two of the three screening modes (background screening, pedestrian height screening and pedestrian shoulder width screening) may be combined to screen real pedestrian target frames from the pedestrian target frames obtained for the current frame image. For example, background screening may be performed first and pedestrian height screening on its result, or vice versa; or pedestrian shoulder width screening may be performed first and pedestrian height screening on its result, so as to screen out the real pedestrian target frames. It should be understood that neither the screening combination nor the combination order is limited here.
Background screening, pedestrian height screening and pedestrian shoulder width screening can also all be combined to screen each pedestrian target frame, as sketched below. It is to be understood that the order of combination is likewise not limited.
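As a sketch only (the predicate names are hypothetical, not from the patent), chaining the screening modes could look like this; each predicate would be built from formulas (1), (2) and (3) described below:

```python
def screen_boxes(boxes, filters):
    """Keep only the pedestrian target frames that pass every enabled filter."""
    return [box for box in boxes if all(f(box) for f in filters)]

# Example (hypothetical predicates, e.g. bound with functools.partial):
# real_boxes = screen_boxes(boxes, [is_foreground, height_ok, shoulder_ok])
```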
For example, at a station, because pedestrian volume is large and people stand close together, pedestrians occlude one another in the first and second current frame images obtained by the first and second collectors, so such a scene is not well suited to head-shoulder screening.
Next, background screening, pedestrian height screening and pedestrian shoulder width screening are each described in detail, showing how a real pedestrian target frame is screened from the pedestrian target frames obtained for the current frame image.
Screening mode 1: when background screening is adopted, real pedestrian target frames are screened from the pedestrian target frames obtained for the current frame image as follows.
the following operations are respectively executed for each pedestrian target frame:
firstly, calculating a proportion value of background pixels in a pedestrian target frame based on a current frame image and a background frame image; the background pixels are pixels which have the same positions in the current frame image and the background frame image and have the gray value difference smaller than a second preset threshold value between the current frame image and the background frame image.
Based on the current frame image and the background frame image, the proportion value of background pixels in the pedestrian target frame is calculated as follows: first, a first total amount of all pixels in the pedestrian target frame and a second total amount of background pixels in it are counted; the proportion value is then obtained from the first total amount and the second total amount, being proportional to the second total amount and inversely proportional to the first total amount.
The calculation of the above proportion value can be expressed by the following formula:
P_i = (1 / Total_i) × Σ_q T( |I_L(q) - I_B(q)| < Thresh )   (1)
where P_i represents the proportion value of background pixels in the i-th pedestrian target frame; Total_i represents the number of all pixels in the i-th pedestrian target frame (i.e., the first total amount); Thresh represents the threshold for judging a pixel of the pedestrian target frame as background, which can be set to different values for different scenes; I_L(q) and I_B(q) respectively represent the gray values of the corresponding pixel point q in the current frame image and the background frame image; and T(·) denotes the judgment, i.e., T equals 1 when |I_L - I_B| is smaller than the threshold Thresh (the pixel is determined to be a background pixel) and 0 otherwise, with the sum taken over all pixels q in the i-th pedestrian target frame.
Then, when the proportion value is determined to exceed the first preset threshold, the pedestrian target frame is filtered out. That is, when the proportion of background pixels in the i-th pedestrian target frame exceeds the first preset threshold, it can be determined that the i-th pedestrian target frame does not correspond to a real pedestrian but to some other object, and the i-th pedestrian target frame is therefore filtered out.
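A minimal numpy sketch of this background screening, assuming equally sized grayscale current and background frames, a box given as (x, y, w, h), and placeholder thresholds:

```python
import numpy as np

def background_ratio(cur_gray, bg_gray, box, thresh=15):
    """Proportion value of background pixels inside one pedestrian target frame."""
    x, y, w, h = box
    cur = cur_gray[y:y + h, x:x + w].astype(np.int32)
    bg = bg_gray[y:y + h, x:x + w].astype(np.int32)
    is_background = np.abs(cur - bg) < thresh     # T(|I_L - I_B| < Thresh)
    return is_background.sum() / float(cur.size)  # second total / first total

def is_foreground(cur_gray, bg_gray, box, thresh=15, ratio_limit=0.5):
    """Filter out the box when background pixels dominate it."""
    return background_ratio(cur_gray, bg_gray, box, thresh) <= ratio_limit
```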
Obviously, by the method, the accuracy rate of identifying real pedestrians can be improved, and effective technical support is provided for various application scenes.
Screening mode 2: when pedestrian height screening is adopted, real pedestrian target frames are screened from the pedestrian target frames obtained for the current frame image as follows.
the following operations are respectively executed for each pedestrian target frame:
firstly, calculating the height of a pedestrian corresponding to a pedestrian target frame based on the coordinates of a central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image and a baseline distance; and the baseline distance is the distance between the central point of the first collector and the central point of the second collector.
The specific calculation method of the pedestrian height is as follows:
calculating a three-dimensional coordinate value of a central pixel point of a pedestrian target frame under a camera coordinate system based on the coordinates of the central pixel point of the pedestrian target frame and the coordinates of the central pixel point of the current frame image, as well as a baseline distance and a parallax value of the central pixel point of the pedestrian target frame; the parallax value is a difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector.
Then calculating the height of the pedestrian based on the three-dimensional coordinate value, the height value and the pitch angle; the height value is the distance between the center point of the first collector or the second collector and the ground, and the pitch angle is the pitch angle of the first collector or the second collector.
The above mathematical expression of the pedestrian height is as follows:
pzw = h - cos(α) × py - sin(α) × pz   (2)
where pzw is the pedestrian height, h is the installation height (i.e., the height value) of the first or second collector, and α is the pitch angle of the first or second collector;
px, py and pz are the three-dimensional coordinate values of the central pixel point of the pedestrian target frame in the camera coordinate system, computed as px = b × (i - u) / d, py = b × (j - v) / d and pz = b × f / d, where b is the baseline distance between the first collector and the second collector, f is the calibrated focal length of the first or second collector, d is the parallax value of the central pixel point of the pedestrian target frame between the first and second collectors, i and j are the image coordinates of the central pixel of the pedestrian target frame in the selected image (either the first current frame image or the second current frame image), and u and v are the coordinates of the center point of the selected image.
Then, when the pedestrian height is determined to be outside the first preset range, the pedestrian target frame is filtered out.
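A sketch of this height screening under stated assumptions (calibrated inputs; pitch angle in radians; the valid height range is a placeholder, not from the patent):

```python
import math

def pedestrian_height(i, j, u, v, d, b, f, h, alpha):
    """Pedestrian height for a box whose central pixel is (i, j).

    b: baseline distance, f: calibrated focal length, d: parallax value of the
    box center, (u, v): image center, h: mounting height, alpha: pitch angle (rad).
    """
    px = b * (i - u) / d        # camera-coordinate X (not needed for height)
    py = b * (j - v) / d        # camera-coordinate Y
    pz = b * f / d              # camera-coordinate Z (depth)
    return h - math.cos(alpha) * py - math.sin(alpha) * pz   # formula (2)

def height_ok(height_m, lo=1.0, hi=2.2):
    """Filter out boxes whose implied height is outside the preset range."""
    return lo <= height_m <= hi
```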
Obviously, by the mode, the accuracy rate of identifying real pedestrians can be improved, and effective technical support is provided for various application scenes.
Screening mode 3: when pedestrian shoulder width screening is adopted, real pedestrian target frames are screened from the pedestrian target frames obtained for the current frame image as follows.
the following operations are respectively executed for each pedestrian target frame:
firstly, calculating the pedestrian shoulder width corresponding to a pedestrian target frame based on the baseline distance, the width value of the pedestrian target frame and the parallax value of a central pixel point of the pedestrian target frame; the pedestrian shoulder width is in direct proportion to the product of the baseline distance and the width value, and in inverse proportion to the parallax value, wherein the parallax value is the difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector.
The mathematical expression for calculating the shoulder width of the pedestrian is as follows:
HeadWidth=b×width/d (3)
where HeadWidth is the pedestrian shoulder width, b is the baseline distance, width is the width value of the pedestrian target frame, and d is the parallax value.
Then, when the pedestrian shoulder width is determined to be outside the second preset range, the pedestrian target frame is filtered out. Obviously, this also improves the accuracy of identifying real pedestrians and provides effective technical support for various application scenarios.
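Formula (3) translates directly into code; a sketch with an assumed plausible range (the range values are placeholders, not from the patent):

```python
def shoulder_width(b, box_width_px, d):
    """Physical shoulder width implied by a box that is box_width_px pixels wide."""
    return b * box_width_px / d    # formula (3): HeadWidth = b * width / d

def shoulder_ok(width_m, lo=0.3, hi=0.8):
    """Filter out boxes whose implied shoulder width is outside the preset range."""
    return lo <= width_m <= hi
```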
The parallax value is used for both pedestrian height screening and pedestrian shoulder width screening, and the calculation of the parallax value of one pedestrian target frame is described below.
First, the matching cost of each pixel point in the pedestrian target frame is calculated, with the following formula:
C(p, D_p) = Σ_{q ∈ N(p)} | I_L(x_q, y_q) - I_R(x_q - D_p, y_q) |   (4)
where p represents a pixel point in the pedestrian target frame; D_p represents a candidate disparity value, and C(p, D_p) denotes the matching cost of point p at disparity D_p; I_L(x_q, y_q) represents the gray value of point q in the first current frame image, with coordinates (x_q, y_q); I_R(x_q - D_p, y_q) represents the gray value of the corresponding point in the second current frame image, with coordinates (x_q - D_p, y_q); and q ranges over the pixels in a preset neighborhood N(p) of pixel p, for example a 7 × 7 neighborhood.
Then, among the matching costs computed for the pixel points of the pedestrian target frame, the minimum one is found, and the disparity at which this minimum matching cost is attained is taken as the parallax value of the pedestrian target frame.
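A direct, unoptimized sketch of this winner-take-all disparity search, assuming rectified grayscale images, a 7 × 7 neighborhood, and an assumed maximum search range:

```python
import numpy as np

def disparity_at(left, right, cx, cy, max_disp=64, half=3):
    """Disparity of pixel (cx, cy): minimize the SAD cost of formula (4)."""
    h, w = left.shape
    y0, y1 = max(0, cy - half), min(h, cy + half + 1)
    x0, x1 = max(0, cx - half), min(w, cx + half + 1)
    patch_l = left[y0:y1, x0:x1].astype(np.int32)
    best_d, best_cost = 0, None
    for d in range(min(max_disp, x0) + 1):      # keep the shifted window in-bounds
        patch_r = right[y0:y1, x0 - d:x1 - d].astype(np.int32)
        cost = np.abs(patch_l - patch_r).sum()  # formula (4): sum of |I_L - I_R|
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d   # disparity with the minimum matching cost
```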
Further, by calculating the matching cost of each pixel in the pedestrian target frame, a disparity map of the pedestrian target frame can be generated, and then the disparity map is used as the background frame image to screen out a real pedestrian target frame from the pedestrian target frames obtained for the current frame image.
Based on the same inventive concept, an embodiment of the present invention provides a binocular monitoring device for pedestrian detection, and the specific implementation of the pedestrian detection method of the monitoring device may refer to the description of the method embodiment section, and repeated details are not repeated, please refer to fig. 2, and the binocular monitoring device includes:
an obtaining unit 201, configured to obtain a current frame image; the current frame image comprises a first current frame image and a second current frame image, the first current frame image is an image frame extracted from first video information at the current moment, the second current frame image is an image frame extracted from second video information at the current moment, the first video information is video information obtained from the first collector, and the second video information is video information obtained from the second collector;
the identification unit 202 is configured to identify each pedestrian in the current frame image through a preset convolutional neural network model, and predict a corresponding head-shoulder position to obtain a pedestrian target frame corresponding to the head-shoulder position of each pedestrian;
a screening unit 203, configured to screen a real pedestrian target frame from each pedestrian target frame obtained for the current frame image based on the current frame image and a corresponding background frame image; the background frame image is an image frame extracted from video information at a historical moment, and the historical moment is a moment which is different from the current moment by a preset time interval.
Optionally, when a real pedestrian target frame is screened from the pedestrian target frames obtained for the current frame image, the screening unit 203 is configured to:
screening out real pedestrian target frames from all the pedestrian target frames obtained for the current frame image by adopting a preset screening mode; the preset screening mode comprises at least any one of, or any combination of, background screening, pedestrian height screening and pedestrian shoulder width screening.
Optionally, when the background screening is adopted to screen out a real pedestrian target frame from the pedestrian target frames obtained for the current frame image, the screening unit 203 is configured to:
the following operations are respectively executed for each pedestrian target frame:
calculating a proportion value of background pixels in one pedestrian target frame based on the current frame image and the background frame image; the background pixels are pixels whose positions are the same in the current frame image and the background frame image, and whose gray value difference between the current frame image and the background frame image is smaller than a second preset threshold value;
and when the proportion value is determined to exceed a first preset threshold value, filtering out the pedestrian target frame.
Optionally, when the proportion value of background pixels in the pedestrian target frame is calculated based on the current frame image, the screening unit 203 is configured to:
count a first total amount of all pixels in the one pedestrian target frame and a second total amount of background pixels in the one pedestrian target frame;
obtain the proportion value based on the first total amount and the second total amount; wherein the proportion value is proportional to the second total amount and inversely proportional to the first total amount.
Optionally, when the pedestrian height screening is adopted, a real pedestrian target frame is screened from each pedestrian target frame obtained for the current frame image, and the screening unit 203 is configured to:
the following operations are respectively executed for each pedestrian target frame:
calculating the height of the pedestrian corresponding to the pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image and the baseline distance; the baseline distance is the distance between the center point of the first collector and the center point of the second collector;
and when the pedestrian height is determined to be out of a first preset range, filtering the pedestrian target frame.
Optionally, the height of the pedestrian corresponding to the pedestrian target frame is calculated based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image, and the baseline distance, and the screening unit 203 is configured to:
calculating a three-dimensional coordinate value of the central pixel point of the pedestrian target frame under a camera coordinate system based on the coordinates of the central pixel point of the pedestrian target frame and the coordinates of the central pixel point of the current frame image, the baseline distance and the parallax value of the central pixel point of the pedestrian target frame; the parallax value is a difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
calculating the height of the pedestrian based on the three-dimensional coordinate value, the height value and the pitch angle; the height value is the distance between the center point of the first collector or the second collector and the ground, and the pitch angle is the pitch angle of the first collector or the second collector.
Optionally, when the pedestrian shoulder width screening is adopted to screen a real pedestrian target frame from each pedestrian target frame obtained for the current frame image, the screening unit 203 is configured to:
the following operations are respectively executed for each pedestrian target frame:
calculating the pedestrian shoulder width corresponding to one pedestrian target frame based on the baseline distance, the width value of the one pedestrian target frame and the parallax value of the central pixel point of the one pedestrian target frame; the pedestrian shoulder width is in direct proportion to the product of the baseline distance and the width value, and is in inverse proportion to the parallax value, and the parallax value is the difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
and when the pedestrian shoulder width is determined to be out of a second preset range, filtering the pedestrian target frame.
Optionally, the background frame image is specifically an RGB image or a disparity map.
Based on the same inventive concept, the embodiment of the invention provides binocular monitoring equipment for pedestrian detection, which comprises: at least one processor, and
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the at least one processor performs the pedestrian detection method as described above by executing the instructions stored by the memory.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, including:
the computer readable storage medium stores computer instructions that, when executed on a computer, cause the computer to perform the pedestrian detection method as described above.
In the technical solution provided by the application, each pedestrian in the current frame image is identified through a preset convolutional neural network model, pedestrian targets are determined, and the corresponding head-shoulder positions are predicted, so that a pedestrian target frame corresponding to the head-shoulder position of each pedestrian in the current frame image is obtained. Then, based on the current frame image and the corresponding background frame image, real pedestrian target frames are screened out from all pedestrian target frames obtained for the current frame image; the background frame image is an image frame extracted from the video information at a historical moment, and the historical moment differs from the current moment by a preset time interval. In this way, pedestrian targets are recognized from the current frame image as completely as possible by the preset convolutional neural network model, and targets falsely recognized as pedestrians are then filtered out by screening, so that real pedestrian targets are obtained.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. A pedestrian detection method is applied to binocular monitoring equipment, the binocular monitoring equipment at least comprises a first collector and a second collector, and the method is characterized by comprising the following steps:
obtaining a current frame image; the current frame image comprises a first current frame image and a second current frame image, the first current frame image is an image frame extracted from first video information at the current moment, the second current frame image is an image frame extracted from second video information at the current moment, the first video information is video information obtained from the first collector, and the second video information is video information obtained from the second collector;
identifying each pedestrian in the current frame image through a preset convolutional neural network model, and predicting a corresponding head-shoulder position to obtain a pedestrian target frame corresponding to the head-shoulder position of each pedestrian;
screening out real pedestrian target frames from all pedestrian target frames obtained for the current frame image using a preset screening mode, based on the current frame image and the corresponding background frame image; the background frame image is an image frame extracted from video information at a historical moment, the historical moment being a moment that differs from the current moment by a preset time interval, and the preset screening mode comprises at least background screening;
when the preset screening mode is the background screening, screening out the real pedestrian target frames from the pedestrian target frames obtained for the current frame image by performing the following operations for each pedestrian target frame:
calculating a proportion value of background pixels in one pedestrian target frame based on the current frame image and the background frame image; the background pixels are pixels that occupy the same position in the current frame image and the background frame image and whose grayscale difference between the two images is smaller than a second preset threshold;
and when the proportion value is determined to exceed a first preset threshold, filtering out the pedestrian target frame.
2. The method of claim 1, wherein calculating the proportion value of background pixels in the one pedestrian target frame based on the current frame image comprises:
counting a first total number of all pixels in the one pedestrian target frame and a second total number of background pixels in the one pedestrian target frame;
obtaining the proportion value based on the first total number and the second total number; wherein the proportion value is proportional to the second total number and inversely proportional to the first total number.
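For illustration only, and not part of the claims: a minimal Python/NumPy sketch of the background screening defined by claims 1 and 2. The function names, the (x, y, w, h) box format, and the numeric thresholds are assumptions made for this example; the claims leave the first and second preset thresholds unspecified.

```python
import numpy as np

def background_proportion(cur_gray, bg_gray, box, diff_thresh=15):
    """Proportion value of claim 2: the second total number (background
    pixels) divided by the first total number (all pixels in the pedestrian
    target frame). A pixel counts as background when its grayscale
    difference between the current frame and the background frame is below
    diff_thresh (the second preset threshold; 15 is illustrative)."""
    x, y, w, h = box  # assumed (left, top, width, height) in pixels
    cur = cur_gray[y:y + h, x:x + w].astype(np.int16)  # int16 avoids uint8 wraparound
    bg = bg_gray[y:y + h, x:x + w].astype(np.int16)
    first_total = cur.size
    second_total = int(np.count_nonzero(np.abs(cur - bg) < diff_thresh))
    return second_total / first_total

def background_screen(boxes, cur_gray, bg_gray, ratio_thresh=0.5):
    """Claim 1's filter: drop a pedestrian target frame when its
    background-pixel proportion exceeds the first preset threshold
    (0.5 is an illustrative value)."""
    return [b for b in boxes
            if background_proportion(cur_gray, bg_gray, b) <= ratio_thresh]
```

Intuitively, a candidate frame that mostly covers unchanged background (e.g. a printed figure of a person) scores a high proportion value and is filtered out, while a real pedestrian occludes the background and scores low.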
3. The method of claim 1, wherein the preset screening mode further comprises pedestrian height screening;
wherein screening out real pedestrian target frames from the pedestrian target frames obtained for the current frame image using the pedestrian height screening comprises:
performing the following operations for each pedestrian target frame:
calculating the height of the pedestrian corresponding to the pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image and the baseline distance; the baseline distance is the distance between the center point of the first collector and the center point of the second collector;
and when the pedestrian height is determined to be outside a first preset range, filtering out the pedestrian target frame.
4. The method of claim 3, wherein calculating the pedestrian height corresponding to one pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image, and the baseline distance comprises:
calculating a three-dimensional coordinate value of the central pixel point of the pedestrian target frame under a camera coordinate system based on the coordinates of the central pixel point of the pedestrian target frame and the coordinates of the central pixel point of the current frame image, the baseline distance and the parallax value of the central pixel point of the pedestrian target frame; the parallax value is a difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
calculating the height of the pedestrian based on the three-dimensional coordinate value, the height value and the pitch angle; the height value is the distance between the center point of the first collector or the second collector and the ground, and the pitch angle is the pitch angle of the first collector or the second collector.
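Again for illustration: a sketch of the stereo geometry behind claims 3 and 4. The claims do not spell out the camera intrinsics, so the focal length in pixels (focal_px) and the exact pitch compensation below are assumptions; the current frame's central pixel point stands in for the principal point, matching the claim's wording.

```python
import math

def pedestrian_height_m(center_px, image_center_px, disparity_px,
                        baseline_m, focal_px, mount_height_m, pitch_rad):
    """Estimate the height of the pedestrian whose head-shoulder target
    frame is centered at center_px = (u, v).

    disparity_px: difference of horizontal pixel coordinates of the same
    object point seen by the first and second collector; baseline_m: the
    distance between the collectors' center points; mount_height_m and
    pitch_rad: one collector's mounting height above the ground and its
    pitch angle (claim 4's height value and pitch angle)."""
    u, v = center_px
    cx, cy = image_center_px  # central pixel point of the current frame
    # Triangulation: depth along the optical axis from baseline and disparity.
    z = baseline_m * focal_px / disparity_px
    # Back-projection into camera coordinates (y axis pointing down).
    y = (v - cy) * z / focal_px
    x = (u - cx) * z / focal_px  # completes the 3D coordinate value (unused below)
    # Vertical drop from the collector to the head-shoulder center, with the
    # optical axis pitched down by pitch_rad -- one plausible geometry.
    drop = y * math.cos(pitch_rad) + z * math.sin(pitch_rad)
    return mount_height_m - drop
```

A frame whose estimated height falls outside the first preset range (roughly 1 to 2.2 m would be a plausible choice covering children and adults) is then filtered out per claim 3.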
5. The method of claim 1, wherein the preset screening mode further comprises pedestrian shoulder width screening;
wherein screening out real pedestrian target frames from the pedestrian target frames obtained for the current frame image using the pedestrian shoulder width screening comprises:
performing the following operations for each pedestrian target frame:
calculating the pedestrian shoulder width corresponding to one pedestrian target frame based on the baseline distance, the width value of the one pedestrian target frame and the parallax value of the central pixel point of the one pedestrian target frame; the pedestrian shoulder width is in direct proportion to the product of the baseline distance and the width value, and is in inverse proportion to the parallax value, and the parallax value is the difference value of horizontal pixel coordinates generated when the same object point is observed through the first collector and the second collector;
and when the pedestrian shoulder width is determined to be outside a second preset range, filtering out the pedestrian target frame.
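One more sketch: the proportionality stated in claim 5 follows from similar triangles, since the real width is Z·w_px/f and Z = f·B/d, so the focal length cancels and the shoulder width reduces to B·w_px/d. The function names, the box representation, and the numeric range below are assumptions.

```python
def shoulder_width_m(baseline_m, box_width_px, disparity_px):
    """Claim 5's relation: shoulder width is proportional to the product of
    the baseline distance and the target-frame width, and inversely
    proportional to the disparity of the frame's central pixel point
    (real width = Z * w_px / f with Z = f * B / d, so f cancels)."""
    return baseline_m * box_width_px / disparity_px

def shoulder_screen(boxes, baseline_m, low_m=0.3, high_m=0.7):
    """Drop pedestrian target frames whose estimated shoulder width lies
    outside the second preset range; 0.3-0.7 m is an illustrative
    placeholder, not a value from the patent."""
    return [b for b in boxes
            if low_m <= shoulder_width_m(baseline_m, b["width_px"],
                                         b["disparity_px"]) <= high_m]
```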
6. The method according to any one of claims 1 to 5, wherein the background frame image is specifically an RGB image or a disparity map.
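Taken together, the method claims describe a cascade: every candidate head-shoulder frame produced by the convolutional neural network must survive each enabled screening mode. A trivial composition sketch (assuming each screen is a callable mapping a list of boxes to the filtered list, as in the sketches above):

```python
def apply_screens(boxes, screens):
    """Run each preset screening mode in turn; only target frames that
    survive every screen are kept as real pedestrian target frames."""
    for screen in screens:
        boxes = screen(boxes)
    return boxes

# e.g. real = apply_screens(candidates, [bg_screen, height_screen, shoulder_screen])
```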
7. A binocular monitoring device, comprising at least a first collector and a second collector, characterized by comprising:
an obtaining unit configured to obtain a current frame image; the current frame image comprises a first current frame image and a second current frame image, the first current frame image is an image frame extracted from first video information at the current moment, the second current frame image is an image frame extracted from second video information at the current moment, the first video information is video information obtained from the first collector, and the second video information is video information obtained from the second collector;
an identification unit configured to identify each pedestrian in the current frame image through a preset convolutional neural network model and predict the corresponding head-shoulder position, to obtain a pedestrian target frame corresponding to the head-shoulder position of each pedestrian;
a screening unit configured to screen out real pedestrian target frames from all pedestrian target frames obtained for the current frame image using a preset screening mode, based on the current frame image and the corresponding background frame image; when the preset screening mode is background screening, the real pedestrian target frames are screened out from the pedestrian target frames obtained for the current frame image by performing the following operations for each pedestrian target frame: calculating a proportion value of background pixels in one pedestrian target frame based on the current frame image and the background frame image; and when the proportion value is determined to exceed a first preset threshold, filtering out the pedestrian target frame; the background frame image is an image frame extracted from video information at a historical moment, the historical moment being a moment that differs from the current moment by a preset time interval; the preset screening mode comprises at least the background screening, and the background pixels are pixels that occupy the same position in the current frame image and the background frame image and whose grayscale difference between the two images is smaller than a second preset threshold.
8. The binocular monitoring device of claim 7, wherein, for calculating the proportion value of background pixels in the one pedestrian target frame based on the current frame image, the screening unit is configured to:
count a first total number of all pixels in the one pedestrian target frame and a second total number of background pixels in the one pedestrian target frame;
obtain the proportion value based on the first total number and the second total number; wherein the proportion value is proportional to the second total number and inversely proportional to the first total number.
9. The binocular monitoring device of claim 7, wherein the preset screening mode further comprises pedestrian height screening; when the pedestrian height screening is used to screen real pedestrian target frames from the pedestrian target frames obtained for the current frame image, the screening unit is configured to:
perform the following operations for each pedestrian target frame:
calculate the height of the pedestrian corresponding to one pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image, and the baseline distance; the baseline distance being the distance between the center point of the first collector and the center point of the second collector;
and when the pedestrian height is determined to be outside a first preset range, filter out the pedestrian target frame.
10. The binocular monitoring device of claim 9, wherein, for calculating the pedestrian height corresponding to one pedestrian target frame based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image, and the baseline distance, the screening unit is configured to:
calculate a three-dimensional coordinate value of the central pixel point of the pedestrian target frame in a camera coordinate system based on the coordinates of the central pixel point of the pedestrian target frame, the coordinates of the central pixel point of the current frame image, the baseline distance, and the parallax value of the central pixel point of the pedestrian target frame; the parallax value being the difference of horizontal pixel coordinates produced when the same object point is observed through the first collector and the second collector;
calculate the height of the pedestrian based on the three-dimensional coordinate value, a height value, and a pitch angle; the height value being the distance between the center point of the first collector or the second collector and the ground, and the pitch angle being the pitch angle of the first collector or the second collector.
11. The binocular monitoring device of claim 7, wherein the preset screening mode further comprises pedestrian shoulder width screening; when the pedestrian shoulder width screening is used to screen real pedestrian target frames from the pedestrian target frames obtained for the current frame image, the screening unit is configured to:
perform the following operations for each pedestrian target frame:
calculate the pedestrian shoulder width corresponding to one pedestrian target frame based on the baseline distance, the width value of the one pedestrian target frame, and the parallax value of the central pixel point of the one pedestrian target frame; the pedestrian shoulder width being proportional to the product of the baseline distance and the width value, and inversely proportional to the parallax value, the parallax value being the difference of horizontal pixel coordinates produced when the same object point is observed through the first collector and the second collector;
and when the pedestrian shoulder width is determined to be outside a second preset range, filter out the pedestrian target frame.
12. The binocular monitoring device according to any one of claims 7 to 11, wherein the background frame image is specifically an RGB image or a disparity map.
13. A binocular monitoring device for pedestrian detection, comprising:
at least one processor, and
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the at least one processor performs the method of any one of claims 1 to 6 by executing the instructions stored in the memory.
14. A computer-readable storage medium, characterized in that:
the computer-readable storage medium stores computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 6.
CN201810032130.1A 2018-01-12 2018-01-12 Pedestrian detection method and binocular monitoring equipment Active CN108363953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810032130.1A CN108363953B (en) 2018-01-12 2018-01-12 Pedestrian detection method and binocular monitoring equipment

Publications (2)

Publication Number Publication Date
CN108363953A CN108363953A (en) 2018-08-03
CN108363953B true CN108363953B (en) 2020-09-29

Family

ID=63006173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810032130.1A Active CN108363953B (en) 2018-01-12 2018-01-12 Pedestrian detection method and binocular monitoring equipment

Country Status (1)

Country Link
CN (1) CN108363953B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101929A (en) * 2018-08-16 2018-12-28 新智数字科技有限公司 A kind of pedestrian counting method and device
US10503966B1 (en) * 2018-10-11 2019-12-10 Tindei Network Technology (Shanghai) Co., Ltd. Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same
CN109711274A (en) * 2018-12-05 2019-05-03 斑马网络技术有限公司 Vehicle checking method, device, equipment and storage medium
CN111507126B (en) * 2019-01-30 2023-04-25 杭州海康威视数字技术股份有限公司 Alarm method and device of driving assistance system and electronic equipment
CN110070074B (en) * 2019-05-07 2022-06-14 安徽工业大学 Method for constructing pedestrian detection model
CN111207499B (en) * 2020-01-09 2021-06-22 珠海格力电器股份有限公司 Air conditioner control method and air conditioner adopting same
CN111950491B (en) * 2020-08-19 2024-04-02 成都飞英思特科技有限公司 Personnel density monitoring method and device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049787A (en) * 2011-10-11 2013-04-17 汉王科技股份有限公司 People counting method and system based on head and shoulder features
CN103106409A (en) * 2013-01-29 2013-05-15 北京交通大学 Composite character extraction method aiming at head shoulder detection
CN104636724A (en) * 2015-02-02 2015-05-20 华中科技大学 Vehicle-mounted camera rapid pedestrian and vehicle detection method based on goal congruence
CN106446832A (en) * 2016-09-27 2017-02-22 成都快眼科技有限公司 Pedestrian real-time detection method based on video
CN106485207A (en) * 2016-09-21 2017-03-08 清华大学 A kind of Fingertip Detection based on binocular vision image and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874894B (en) * 2017-03-28 2020-04-14 电子科技大学 Human body target detection method based on regional full convolution neural network

Also Published As

Publication number Publication date
CN108363953A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN108363953B (en) Pedestrian detection method and binocular monitoring equipment
CN108320510B (en) Traffic information statistical method and system based on aerial video shot by unmanned aerial vehicle
CN109934848B (en) Method for accurately positioning moving object based on deep learning
CN110688987A (en) Pedestrian position detection and tracking method and system
CN107576960A (en) The object detection method and system of vision radar Spatial-temporal Information Fusion
CN108875603A (en) Intelligent driving control method and device, electronic equipment based on lane line
KR20200071799A (en) object recognition and counting method using deep learning artificial intelligence technology
CN107220603A (en) Vehicle checking method and device based on deep learning
CN109255298A (en) Safety cap detection method and system in a kind of dynamic background
CN110602449A (en) Intelligent construction safety monitoring system method in large scene based on vision
CN104134364B (en) Real-time traffic sign identification method and system with self-learning capacity
CN112329747B (en) Vehicle parameter detection method based on video identification and deep learning and related device
CN109840982B (en) Queuing recommendation method and device and computer readable storage medium
CN109711322A (en) A kind of people's vehicle separation method based on RFCN
CN114267082B (en) Bridge side falling behavior identification method based on depth understanding
CN109935080A (en) The monitoring system and method that a kind of vehicle flowrate on traffic route calculates in real time
CN104954747A (en) Video monitoring method and device
CN110176024A (en) Method, apparatus, equipment and the storage medium that target is detected in video
US20170053172A1 (en) Image processing apparatus, and image processing method
CN109887273A (en) A kind of bridge mobile load Optimum Identification Method based on multi-source redundancy
WO2019015144A1 (en) Image processing method and system, storage medium, and computing device
CN113409194B (en) Parking information acquisition method and device, and parking method and device
CN113240829B (en) Intelligent gate passing detection method based on machine vision
CN111383248A (en) Method and device for judging red light running of pedestrian and electronic equipment
KR20160015091A (en) Trafic signal control using HOG-based pedestrian detection and behavior patterns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant