CN111738088B - Pedestrian distance prediction method based on monocular camera - Google Patents


Info

Publication number
CN111738088B
CN111738088B (application CN202010450217.8A)
Authority
CN
China
Prior art keywords
pedestrian
distance
model
camera
detection
Prior art date
Legal status
Active
Application number
CN202010450217.8A
Other languages
Chinese (zh)
Other versions
CN111738088A (en)
Inventor
钱学明
杨瑾
邹屹洋
侯兴松
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010450217.8A priority Critical patent/CN111738088B/en
Publication of CN111738088A publication Critical patent/CN111738088A/en
Application granted granted Critical
Publication of CN111738088B publication Critical patent/CN111738088B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00Measuring distances in line of sight; Optical rangefinders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection


Abstract

The invention discloses a pedestrian distance prediction method based on a monocular camera, comprising the following steps: determining a pedestrian head height-pedestrian camera distance model using a monocular camera, and acquiring a video; labeling a pedestrian detection and pedestrian distance sample set; building a convolutional neural network model; training the obtained convolutional neural network model on the training samples to obtain a pedestrian detection and distance prediction model; and inputting the picture to be detected into the trained pedestrian detection and distance prediction model to obtain the coordinates, confidence and distance of each pedestrian. The invention exploits the strengths of deep-learning detection methods, maintaining high accuracy and good robustness: it detects pedestrians and predicts their distance from the camera accurately using only a low-cost monocular camera, and it remains reliable, still predicting pedestrian distance normally, even when a pedestrian is close to the camera or partially occluded.

Description

Pedestrian distance prediction method based on monocular camera
Technical Field
The invention belongs to the technical field of computer digital image processing and pattern recognition, and particularly relates to a pedestrian distance prediction method based on a monocular camera.
Background
Ensuring pedestrian safety is one of the key goals of road traffic safety systems, which makes pedestrian detection a core component of advanced driver assistance systems (ADAS).
Currently, most pedestrian detection in ADAS relies on visual detection methods. From the early methods based on background modeling and statistical learning to the recent pedestrian detection models based on deep neural networks, detection performance in this field has steadily improved. Pedestrian detection models based on deep neural networks in particular have become a research hotspot owing to their higher detection accuracy and better robustness.
Pedestrian distance information can also be acquired with ranging equipment such as lidar, but lidar has the drawback of high cost.
Disclosure of Invention
The invention aims to provide a pedestrian distance prediction method based on a monocular camera, which is used for completing the detection of pedestrians and the distance estimation of the pedestrians from the monocular camera so as to reduce the cost of the pedestrian distance prediction.
In order to achieve the purpose, the invention adopts the following technical scheme:
a pedestrian distance prediction method based on a monocular camera comprises the following steps:
step 1: determining a pedestrian head height-pedestrian camera distance model by using a monocular camera, and acquiring a video;
Step 2: labeling pedestrian detection and pedestrian distance to obtain a training sample set;
Step 3: adding a structural branch for pedestrian distance regression prediction to the deep learning target detection model to construct a convolutional neural network model;
Step 4: training the convolutional neural network model obtained in step 3 with the training sample set obtained in step 2 to obtain a trained convolutional neural network model serving as the pedestrian detection and distance prediction model;
Step 5: inputting the pictures collected by the monocular camera into the pedestrian detection and distance prediction model obtained in step 4 to obtain the coordinates, confidence and distance of each pedestrian, completing pedestrian distance prediction based on the monocular camera.
Further, step 1 uses the monocular camera with its mounting height fixed and, combining the camera's intrinsic parameters with the pinhole imaging principle and triangle similarity, determines the pedestrian head height-pedestrian camera distance model: assuming the pedestrian's head height is h and the actual distance between the pedestrian and the camera is d, a conversion coefficient a is obtained such that d = h × a.
Further, step 2 specifically includes:
2.1 Carrying out frame extraction processing on the video acquired in the step 1, wherein the frame extraction interval is 25 frames, and obtaining an initial frame picture of the video;
2.2 Judging the initial frame picture by using a motion blur algorithm, and removing a blur picture in the initial frame picture;
2.3 ) Labeling the initial frame pictures after the blurred frames are removed, marking both the pedestrian and the pedestrian's head: the pedestrian class is labeled person and the head is labeled head; after labeling, each picture generates an xml file in the corresponding Pascal VOC format;
2.4 ) Processing the labeled initial xml files: using the pedestrian head height-pedestrian camera distance model obtained in step 1, converting the pixel height of each head into the distance between the pedestrian and the camera, adding the distance to the xml file as a dist attribute, and deleting the pedestrian head boxes from the initial xml file to obtain the final xml file.
Further, the deep learning target detection model in step 3 is YOLO, Faster-RCNN, SSD or RetinaNet.
Further, step 3 specifically includes:
3.1 ) Modifying the data-reading part of the deep learning target detection model: reading of the distance labels in the xml files is added, so that the pedestrian distance is read in addition to the pedestrian position coordinates;
3.2 A pedestrian distance prediction branch is added to the modified deep learning target detection model of step 3.1).
In step 3.2), the distance prediction branch consists of 5 convolutional layers; each convolution kernel is 3 × 3 with stride 1 and padding 1; the first four convolutional layers output 256 channels, and the number of output channels of the last convolution equals the number of anchor boxes. ResNet50 is selected as the base network for feature extraction of the target detection network, a 5-level feature pyramid performs feature fusion on the extracted features, and a coordinate regression branch, a target category branch and a distance prediction branch are attached after each feature level; the 5 feature maps of the feature pyramid FPN each have 256 channels, with sizes 100 × 136, 50 × 68, 25 × 34, 13 × 17 and 7 × 9, respectively.
Further, step 4 specifically includes:
4.1 ) MSELoss is added to the network's total loss function Loss to constrain the pedestrian distance branch; MSELoss is calculated as l(x, y) = (x − y)², where x is the pedestrian distance predicted by the model and y is the labeled, i.e. actual, pedestrian distance. The network's total loss function consists of 3 parts: FocalLoss for pedestrian confidence, SmoothL1Loss for coordinate regression, and the mean-square-error loss MSELoss for distance regression; the pedestrian distance loss function here may also use other regression loss functions, including but not limited to MSELoss;
4.2 ) Model training: the framework used for training is PyTorch; the model's base network ResNet50 uses a model pre-trained on the ImageNet classification task, and training proceeds from it with a fine-tuning strategy; the optimization algorithm used in training is mini-batch stochastic gradient descent, training 24 epochs in total with 4 samples per batch; the trained convolutional neural network model performs both pedestrian detection and pedestrian distance prediction.
Further, step 5 specifically includes:
5.1 ) Inputting the pictures collected from the monocular camera into the pedestrian detection and distance prediction model to obtain the network output: after detection, the network outputs n 6-dimensional vectors, the nth detection result vector being R_n = {x_n, y_n, w_n, h_n, s_n, d_n}, where {x_n, y_n, w_n, h_n} are respectively the top-left corner coordinates of the coordinate box corresponding to the nth detection result and the box's width and height, s_n is the confidence that the nth detection result is a pedestrian, and d_n is the predicted pedestrian distance of the nth detection result;
5.2 ) A confidence threshold of 0.5 is set, and all detection results whose confidence s_n is below the confidence threshold are deleted;
5.3 ) Sorting all results remaining after step 5.2) by confidence s from large to small; for the second and subsequent detection results, computing the IOU between each position coordinate box R_k ({x_k, y_k, w_k, h_k}), k > 1, and the first-ranked result R_1 according to the formula:

IOU_{1,k} = area(R_1 ∩ R_k) / area(R_1 ∪ R_k)

where IOU_{1,k} is the ratio of the overlap area to the union area of the candidate box ranked 1st and the candidate box ranked kth, area(R_1 ∩ R_k) is the area of the intersection region of the candidate box ranked 1st and the candidate box ranked kth, and area(R_1 ∪ R_k) is the area of their union region;
5.4 ) An IOU threshold of 0.5 is set, and all results whose IOU exceeds the IOU threshold are deleted;
5.5 ) After step 5.4) is finished, the first result in the confidence ranking is taken out and output as a correct result, and steps 5.3) and 5.4) are repeated on the remaining results until the number of remaining results is less than or equal to 1; all results so obtained are the final output.
Compared with the prior art, the invention has the following beneficial effects: the invention can predict the distance between the pedestrian and the camera while detecting the position of the pedestrian, the AP for detecting the pedestrian is more than 95%, and the relative error of the prediction of the distance between the pedestrian and the camera is less than 10%.
Drawings
FIG. 1 is a flowchart of a pedestrian distance prediction method based on a monocular camera according to an embodiment of the present invention;
FIG. 2 is an example of a training sample in an embodiment of the present invention; wherein, fig. 2 (a) is a picture of a training sample, and fig. 2 (b) is a label file of the training sample;
FIG. 3 is a network architecture diagram of a pedestrian distance prediction branch in an embodiment of the present invention;
fig. 4 shows the pedestrian detection and distance prediction results in the embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are given to illustrate the present invention, but are not intended to limit the scope of the present invention.
Referring to fig. 1, the present invention provides a pedestrian distance prediction method based on a monocular camera, including the following steps:
step 1: determining a pedestrian head height-pedestrian camera distance model by using a monocular camera, and acquiring a video;
The camera is placed at a fixed height; combining the camera's intrinsic parameters with the pinhole imaging principle and triangle similarity, the pedestrian head height-pedestrian camera distance model is determined: assuming the pedestrian's head height is h and the actual distance between the pedestrian and the camera is d, a conversion coefficient a is obtained such that d = h × a.
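The calibration above can be sketched in a few lines of Python. This is a minimal sketch under an assumption the patent does not spell out: the coefficient a is fitted from a handful of (head height, known distance) calibration pairs rather than derived analytically from the intrinsics; both helper names are hypothetical.

```python
def calibrate_conversion_coefficient(samples):
    """Estimate the conversion coefficient a such that d = h * a.

    `samples` is a list of (head_height_h, known_distance_d) calibration
    pairs; a is averaged over them. (Hypothetical helper: the patent only
    states that a follows from the camera intrinsics and mounting height
    via the pinhole model and triangle similarity.)
    """
    assert samples, "need at least one calibration sample"
    return sum(d / h for h, d in samples) / len(samples)


def predict_distance(head_height, a):
    """Apply the patent's head-height-to-distance model: d = h * a."""
    return head_height * a
```

In practice a would be fixed once per camera installation, since the model assumes both the mounting height and the intrinsics stay constant.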
Step 2: marking pedestrian detection and pedestrian distance to obtain a training sample set;
the method is used for generating a data set for training a deep convolutional neural network model, and comprises the following specific steps:
2.1 Carrying out frame extraction processing on the video acquired in the step 1, wherein the frame extraction interval is 25 frames, and obtaining an initial frame picture of the video.
2.2 ) Using a motion-blur algorithm to judge each initial frame picture and remove the blurred pictures; the Laplacian method may be chosen for the motion-blur algorithm: first apply the Laplacian transform to the picture, then compute the variance of the transformed picture; whether the initial frame picture is blurred is judged from the size of the variance, and if the variance is smaller than a set threshold the picture is regarded as blurred;
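The Laplacian blur test can be sketched as below. This is a plain-NumPy illustration (in practice one would typically call an image library's Laplacian operator directly); the threshold value 100 is an assumed placeholder, since the patent does not publish a concrete value.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of the 3x3 Laplacian response of a grayscale image.

    `gray` is a 2-D array; the response is computed on the interior
    pixels only (border handling is skipped for brevity).
    """
    g = gray.astype(np.float64)
    resp = (-4.0 * g[1:-1, 1:-1]
            + g[:-2, 1:-1] + g[2:, 1:-1]    # vertical neighbours
            + g[1:-1, :-2] + g[1:-1, 2:])   # horizontal neighbours
    return float(resp.var())


def is_blurred(gray, threshold=100.0):
    """A frame is treated as blurred if its Laplacian variance falls
    below the threshold; 100 is a common starting point and must be
    tuned per camera."""
    return laplacian_variance(gray) < threshold
```

A flat (featureless) frame has zero Laplacian variance, while a sharp, textured frame has a large one, which is exactly the behaviour the frame-filtering step relies on.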
2.3 ) Labeling the initial frame pictures after the blurred frames are removed, marking both the pedestrian and the pedestrian's head: the pedestrian class is labeled person and the head is labeled head. Labeling can be done directly with the labelImg tool, and after labeling each image generates an xml file in the corresponding Pascal VOC format; a labeled image is shown in fig. 2 (a);
2.4 ) Processing the labeled initial xml files: using the pedestrian head height-pedestrian camera distance model obtained in step 1, the pixel height of each head is converted into the distance between the pedestrian and the camera; the distance is added to the xml file as a dist attribute, the pedestrian head boxes are deleted from the initial xml at the same time, and the final xml file is obtained; the content of the xml file is shown in fig. 2 (b).
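The annotation rewrite of step 2.4) can be sketched with the standard-library XML tools. Two details are assumptions for illustration, not from the patent: the person/head matching rule (a head box is assigned to the person box that contains it) and storing the distance as a `<dist>` child element; `head_to_dist` stands in for the step-1 conversion model.

```python
import xml.etree.ElementTree as ET


def convert_annotation(xml_text, head_to_dist):
    """Rewrite a Pascal VOC annotation as in step 2.4): derive each
    pedestrian's distance from the pixel height of the matching head
    box, store it on the person object, and drop the head objects.
    `head_to_dist` maps head pixel height to distance (the step-1 model).
    """
    root = ET.fromstring(xml_text)
    objects = root.findall("object")
    persons = [o for o in objects if o.findtext("name") == "person"]
    heads = [o for o in objects if o.findtext("name") == "head"]

    def box(o):
        b = o.find("bndbox")
        return tuple(int(b.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))

    for head in heads:
        hx1, hy1, hx2, hy2 = box(head)
        for person in persons:
            px1, py1, px2, py2 = box(person)
            # assumed matching rule: head box contained in person box
            if px1 <= hx1 and py1 <= hy1 and hx2 <= px2 and hy2 <= py2:
                ET.SubElement(person, "dist").text = f"{head_to_dist(hy2 - hy1):.2f}"
                break
        root.remove(head)  # head boxes are deleted from the final xml
    return ET.tostring(root, encoding="unicode")
```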
And step 3: and adding a structural branch for pedestrian distance regression prediction on the deep learning target detection model RetinaNet, and constructing to obtain a convolutional neural network model.
The step 3 specifically comprises the following steps:
3.1 ) Modifying the data-reading part of RetinaNet (mainly adding reading of the distance labels in the xml files: besides the pedestrian position coordinates, the pedestrian distance must also be read) and adding support for the pedestrian distance information, since the distance of each pedestrian is newly added to the xml files of the data set and the original RetinaNet contains no handling of pedestrian distance information;
3.2 ) Adding a pedestrian distance prediction branch. The distance prediction branch is similar to the coordinate regression branch and consists of 5 convolutional layers; each convolution kernel is 3 × 3 with stride 1 and padding 1, and the first four convolutional layers are the same as those of the coordinate regression branch; the difference lies in the last layer: the number of output channels of the last convolution of the coordinate regression branch is the number of anchor boxes (Anchor) multiplied by 4, while the number of output channels of the distance prediction branch is the number of anchor boxes (Anchor). ResNet50 is selected as the base network for feature extraction of the target detection network, a 5-level feature pyramid performs feature fusion on the extracted features, and a coordinate regression branch, a target category branch and a distance prediction branch are attached after each feature level. The 5 feature maps of the feature pyramid FPN each have 256 channels, with sizes 100 × 136, 50 × 68, 25 × 34, 13 × 17 and 7 × 9, respectively; fig. 3 shows the structure of the pedestrian distance prediction branch, drawn with level 1 of the FPN as input.
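A plausible PyTorch sketch of the distance prediction branch follows. The ReLU activations between layers and the default of 9 anchors follow the usual RetinaNet head design and are assumptions; the patent itself fixes only the kernel size, stride, padding, layer count and output channel count.

```python
import torch
import torch.nn as nn


class DistancePredictionBranch(nn.Module):
    """Distance-regression head per step 3.2): five 3x3 convolutions,
    stride 1, padding 1; the first four keep the 256 FPN channels and
    the last outputs one channel per anchor (vs. 4 per anchor for the
    coordinate regression branch)."""

    def __init__(self, in_channels=256, num_anchors=9):
        super().__init__()
        layers = []
        for _ in range(4):
            layers += [nn.Conv2d(in_channels, in_channels, 3, stride=1, padding=1),
                       nn.ReLU(inplace=True)]  # assumed activation, as in RetinaNet
        layers.append(nn.Conv2d(in_channels, num_anchors, 3, stride=1, padding=1))
        self.head = nn.Sequential(*layers)

    def forward(self, x):
        # 3x3 / stride 1 / padding 1 preserves the spatial size of each
        # FPN level (e.g. 100 x 136 in, 100 x 136 out)
        return self.head(x)
```

One such branch would be shared across (or attached after) each of the 5 FPN levels, alongside the classification and box-regression heads.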
And 4, step 4: training the convolutional neural network model obtained in the step 3 through the training sample set obtained in the step 2 to obtain a trained convolutional neural network model which is used as a pedestrian detection and distance prediction model;
the step 4 specifically comprises the following steps:
4.1 ) MSELoss is added to the network's total loss function (Loss) to constrain the pedestrian distance branch; it is calculated as l(x, y) = (x − y)², where x is the pedestrian distance predicted by the model and y is the labeled, i.e. actual, pedestrian distance. The training loss consists of 3 parts: FocalLoss for pedestrian confidence, SmoothL1Loss for coordinate regression, and the mean-square-error loss of distance regression (MSELoss). Because the value of the distance-regression mean-square-error loss is large, it must be multiplied by a proportionality coefficient (adjusted according to the magnitude of the actual distance-prediction loss; 0.004 during actual training in this embodiment) to reduce its share of the total loss, so that the 3 losses stay on the same order of magnitude;
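The loss combination of step 4.1) reduces to simple scalar arithmetic; a minimal sketch, where the function names are hypothetical and the two detection losses are taken as already-computed scalars:

```python
def mse_loss(pred, target):
    """The patent's distance loss l(x, y) = (x - y)**2, averaged over a batch
    of predicted/labeled distances."""
    assert len(pred) == len(target)
    return sum((x - y) ** 2 for x, y in zip(pred, target)) / len(pred)


def total_loss(focal_loss, smooth_l1_loss, mse_dist_loss, dist_scale=0.004):
    """Combine the three loss terms of step 4.1); the distance MSE is
    scaled down (0.004 in the embodiment) so all terms stay on the same
    order of magnitude."""
    return focal_loss + smooth_l1_loss + dist_scale * mse_dist_loss
```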
4.2 ) Model training: the framework used for training is PyTorch; the model's base network ResNet50 uses a model pre-trained on the ImageNet classification task, and training proceeds from it with a fine-tuning strategy; the optimization algorithm used in training is mini-batch stochastic gradient descent, training 24 epochs in total with 4 samples per batch. The trained convolutional neural network model performs both pedestrian detection and pedestrian distance prediction;
and 5: and (4) inputting the pictures acquired from the monocular camera into the pedestrian detection and distance prediction model obtained in the step (4) to obtain an output result of the model, and further performing non-maximum suppression on the output result of the model to obtain the coordinates, confidence and predicted distance of the finally detected pedestrian.
The step 5 specifically comprises the following steps:
5.1 ) Inputting the pictures collected from the monocular camera into the pedestrian detection and distance prediction model to obtain the network output: after detection, the network outputs n 6-dimensional vectors, the nth detection result vector being R_n = {x_n, y_n, w_n, h_n, s_n, d_n}, where {x_n, y_n, w_n, h_n} are respectively the top-left corner coordinates (x_n, y_n) of the coordinate box corresponding to the nth detection result and the box width w_n and height h_n, s_n is the confidence that the nth detection result is a pedestrian, and d_n is the predicted pedestrian distance of the nth detection result;
5.2 ) A confidence threshold of 0.5 is set, and all detection results whose confidence s_n is below the threshold are deleted;
5.3 ) Sorting all results by confidence s from large to small; for the second and subsequent candidate boxes R_k, k > 1, computing the IOU with the first-ranked result R_1 according to the formula:

IOU_{1,k} = area(R_1 ∩ R_k) / area(R_1 ∪ R_k)

where IOU_{1,k} is the ratio of the overlap area to the union area of the candidate box ranked 1st and the candidate box ranked kth, area(R_1 ∩ R_k) is the area of the intersection region of the candidate box ranked 1st and the candidate box ranked kth, and area(R_1 ∪ R_k) is the area of their union region;
5.4 ) An IOU threshold of 0.5 is set, and all results whose IOU exceeds the threshold are deleted;
5.5 ) After step 5.4) is finished, the first result in the confidence ranking is taken out and output as a correct result, and steps 5.3) and 5.4) are repeated on the remaining results until the number of remaining results is less than or equal to 1. All results so obtained are the final output. Fig. 4 is an example of the model detection results, in which the dashed boxes are the ground-truth pedestrian boxes, the italic bold text gives the ground-truth pedestrian distance values, the solid boxes are the model predictions, and the non-italic bold text gives the predicted pedestrian confidence and distance values.
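Steps 5.2)-5.5) amount to standard greedy non-maximum suppression over the 6-dimensional result vectors; a self-contained sketch, with the tuple layout following step 5.1):

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x, y, w, h), (x, y) the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_thresh=0.5, iou_thresh=0.5):
    """Greedy non-maximum suppression over (x, y, w, h, confidence,
    distance) tuples, as in steps 5.2)-5.5): drop low-confidence results,
    sort by confidence, keep the top result, suppress overlaps, repeat."""
    kept = []
    remaining = sorted((d for d in detections if d[4] >= conf_thresh),
                       key=lambda d: d[4], reverse=True)
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[:4], d[:4]) <= iou_thresh]
    return kept
```

Note that the predicted distance d_n rides along unchanged through suppression: NMS only decides which boxes survive.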
The experimental result shows that the technical scheme can predict the distance between the pedestrian and the camera while detecting the position of the pedestrian, the pedestrian detection AP is more than 95%, and the relative error of the pedestrian distance prediction is less than 10%.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (5)

1. A pedestrian distance prediction method based on a monocular camera is characterized by comprising the following steps:
step 1: determining a pedestrian head height-pedestrian camera distance model by using a monocular camera, and acquiring a video;
step 2: marking pedestrian detection and pedestrian distance to obtain a training sample set;
Step 3: adding a structural branch for pedestrian distance regression prediction to the deep learning target detection model to construct a convolutional neural network model;
Step 4: training the convolutional neural network model obtained in step 3 with the training sample set obtained in step 2 to obtain a trained convolutional neural network model serving as the pedestrian detection and distance prediction model;
Step 5: inputting the pictures collected by the monocular camera into the pedestrian detection and distance prediction model obtained in step 4 to obtain the coordinates, confidence and distance of each pedestrian, completing pedestrian distance prediction based on the monocular camera;
the deep learning target detection model in step 3 is YOLO, Faster-RCNN, SSD or RetinaNet;
the step 3 specifically comprises the following steps:
3.1 Modifying a data reading portion in the deep learning target detection model;
3.2 Adding a pedestrian distance prediction branch into the modified deep learning target detection model in the step 3.1);
in step 3.2), the distance prediction branch consists of 5 convolutional layers; each convolution kernel is 3 × 3 with stride 1 and padding 1; the first four convolutional layers output 256 channels, and the number of output channels of the last convolution equals the number of anchor boxes; ResNet50 is selected as the base network for feature extraction of the target detection network, a 5-level feature pyramid performs feature fusion on the extracted features, and a coordinate regression branch, a target category branch and a distance prediction branch are attached after each feature level; the 5 feature maps of the feature pyramid each have 256 channels, with sizes 100 × 136, 50 × 68, 25 × 34, 13 × 17 and 7 × 9, respectively.
2. The pedestrian distance prediction method based on the monocular camera as claimed in claim 1, wherein step 1 uses the monocular camera, fixes the height of the camera, and determines the pedestrian head height-pedestrian camera distance model by combining the internal parameters of the camera and using the pinhole imaging principle and the triangle similarity principle: assuming that the height of the head of the pedestrian is h and the actual distance between the pedestrian and the camera is d, a conversion coefficient a is obtained, so that d = h × a.
3. The pedestrian distance prediction method based on the monocular camera according to claim 1, wherein the step 2 specifically includes:
2.1 Carrying out frame extraction processing on the video acquired in the step 1, wherein the frame extraction interval is 25 frames, and obtaining an initial frame picture of the video;
2.2 Using a motion blur algorithm to judge the initial frame picture and removing a blur picture in the initial frame picture;
2.3 ) Labeling the initial frame pictures after the blurred frames are removed, marking both the pedestrian and the pedestrian's head: the pedestrian class is labeled person and the head is labeled head; after labeling, each picture generates an xml file in the corresponding Pascal VOC format;
2.4 ) Processing the labeled initial xml files: using the pedestrian head height-pedestrian camera distance model obtained in step 1, converting the pixel height of each head into the distance between the pedestrian and the camera, adding the distance to the xml file as a dist attribute, and deleting the pedestrian head boxes from the initial xml file to obtain the final xml file.
4. The pedestrian distance prediction method based on the monocular camera according to claim 1, wherein the step 4 specifically includes:
4.1 ) MSELoss is added to the network's total loss function Loss to constrain the pedestrian distance branch; MSELoss is calculated as l(x, y) = (x − y)²; the network's total loss function consists of 3 parts: FocalLoss for pedestrian confidence, SmoothL1Loss for coordinate regression, and the mean-square-error loss MSELoss for distance regression;
4.2 ) PyTorch is used during training; the model's base network ResNet50 uses a model pre-trained on the ImageNet classification task, and training proceeds from it with a fine-tuning strategy; mini-batch stochastic gradient descent is used during training, training 24 epochs in total with 4 samples per batch; the trained convolutional neural network model performs both pedestrian detection and pedestrian distance prediction.
5. The pedestrian distance prediction method based on the monocular camera according to claim 1, wherein the step 5 specifically includes:
5.1) Input the pictures collected from the monocular camera into the pedestrian detection and distance prediction model to obtain the output of the network; after detection finishes, the network outputs n 6-dimensional vectors, the n-th detection result vector being R_n = {x_n, y_n, w_n, h_n, s_n, d_n}, where {x_n, y_n, w_n, h_n} are respectively the top-left coordinates and the width and height of the coordinate frame of the n-th detection result, s_n is the confidence that the n-th detection result is a pedestrian, and d_n is the predicted pedestrian distance of the n-th detection result;
5.2) Set a confidence threshold of 0.5 and delete all detection results whose confidence s_n is below the threshold;
5.3) Sort all results processed in step 5.2) in descending order of confidence s, and for each position coordinate frame R_k of the second and subsequent detection results (k > 1) compute the IOU with the first-ranked result R_1 using the formula:

IOU_{1,k} = area(R_1 ∩ R_k) / area(R_1 ∪ R_k)

where IOU_{1,k} is the ratio of the overlap area to the union area of the candidate frame in position 1 and the candidate frame in position k, area(R_1 ∩ R_k) is the area of the intersection region of the two frames, and area(R_1 ∪ R_k) is the area of their union region;
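The IOU formula can be computed directly for boxes in the network's (x, y, w, h) format (top-left corner plus width and height, as in the detection vectors of step 5.1); this is a straightforward sketch, not code from the patent.

```python
def iou(box1, box2):
    """IOU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # width and height of the intersection rectangle (0 if disjoint)
    ix = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))
    iy = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))
    inter = ix * iy
    union = w1 * h1 + w2 * h2 - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give IOU 1, disjoint boxes give 0, and partial overlaps fall in between.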
5.4) Set an IOU threshold of 0.5 and delete all results whose IOU exceeds the threshold;
5.5) After step 5.4) finishes, take the first result in the confidence ranking out as a correct result and output it, then repeat steps 5.3) and 5.4) on the remaining results until at most one result remains in the ranking; all the results obtained in this way constitute the final output.
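Steps 5.2)–5.5) together describe confidence filtering followed by greedy non-maximum suppression over the 6-dimensional detection vectors (x, y, w, h, s, d). A sketch of that post-processing loop, with the 0.5 thresholds given above (the function names are illustrative):

```python
def iou(b1, b2):
    """IOU of two (x, y, w, h) boxes."""
    ix = max(0.0, min(b1[0] + b1[2], b2[0] + b2[2]) - max(b1[0], b2[0]))
    iy = max(0.0, min(b1[1] + b1[3], b2[1] + b2[3]) - max(b1[1], b2[1]))
    inter = ix * iy
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0

def postprocess(detections, conf_thresh=0.5, iou_thresh=0.5):
    """detections: list of (x, y, w, h, s, d) tuples from step 5.1."""
    # step 5.2: drop low-confidence detections
    kept = [r for r in detections if r[4] >= conf_thresh]
    # step 5.3: sort by confidence, descending
    kept.sort(key=lambda r: r[4], reverse=True)
    out = []
    while kept:
        best = kept.pop(0)       # step 5.5: highest-confidence survivor
        out.append(best)
        # steps 5.3-5.4: delete everything overlapping it above the threshold
        kept = [r for r in kept if iou(best[:4], r[:4]) <= iou_thresh]
    return out
```

For example, two heavily overlapping detections of the same pedestrian collapse to the higher-confidence one, while a distant detection survives with its own predicted distance.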
CN202010450217.8A 2020-05-25 2020-05-25 Pedestrian distance prediction method based on monocular camera Active CN111738088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010450217.8A CN111738088B (en) 2020-05-25 2020-05-25 Pedestrian distance prediction method based on monocular camera


Publications (2)

Publication Number Publication Date
CN111738088A CN111738088A (en) 2020-10-02
CN111738088B true CN111738088B (en) 2022-10-25

Family

ID=72647669


Country Status (1)

Country Link
CN (1) CN111738088B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325418A (en) * 2018-08-23 2019-02-12 华南理工大学 Based on pedestrian recognition method under the road traffic environment for improving YOLOv3
WO2019177562A1 (en) * 2018-03-15 2019-09-19 Harman International Industries, Incorporated Vehicle system and method for detecting objects and object distance
CN110837775A (en) * 2019-09-30 2020-02-25 合肥合工安驰智能科技有限公司 Underground locomotive pedestrian and distance detection method based on binarization network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109029363A (en) * 2018-06-04 2018-12-18 泉州装备制造研究所 A kind of target ranging method based on deep learning
CN109920001A (en) * 2019-03-14 2019-06-21 大连民族大学 Method for estimating distance based on pedestrian head height
CN110674687A (en) * 2019-08-19 2020-01-10 天津大学 Robust and efficient unmanned pedestrian detection method
CN111027372A (en) * 2019-10-10 2020-04-17 山东工业职业学院 Pedestrian target detection and identification method based on monocular vision and deep learning
CN111145211B (en) * 2019-12-05 2023-06-30 大连民族大学 Method for acquiring pixel height of head of upright pedestrian of monocular camera


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Near infrared nighttime road pedestrians recognition based on convolutional neural network; Xiaobiao Dai et al.; Infrared Physics & Technology; 2019-03-31; vol. 97; full text *
A survey of road detection for driverless vehicles; Shi Chenyang et al.; China Illuminating Engineering Journal; 2018-10-31; vol. 29, no. 5; full text *
A survey of pedestrian tracking algorithms and applications; Cao Ziqiang et al.; Acta Physica Sinica; 2020-03-31; vol. 69, no. 8; full text *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant