WO2020237942A1 - Method and device for detecting the 3D position of a pedestrian, and vehicle-mounted terminal - Google Patents

Method and device for detecting the 3D position of a pedestrian, and vehicle-mounted terminal

Info

Publication number
WO2020237942A1
WO2020237942A1 (PCT/CN2019/108075)
Authority
WO
WIPO (PCT)
Prior art keywords
pedestrian
image
bounding box
detected
key points
Prior art date
Application number
PCT/CN2019/108075
Other languages
English (en)
French (fr)
Inventor
蒋云飞
方欣
Original Assignee
初速度(苏州)科技有限公司
Priority date
Filing date
Publication date
Application filed by 初速度(苏州)科技有限公司
Publication of WO2020237942A1 publication Critical patent/WO2020237942A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the present invention relates to the technical field of intelligent driving, and in particular to a method and device for detecting the 3D position of a pedestrian, and a vehicle-mounted terminal.
  • Pedestrian detection is one of the critical perception tasks in the field of intelligent driving. It usually refers to detecting pedestrians in an image captured by a camera installed in a vehicle. When a pedestrian bounding box is detected in the image, the position of the pedestrian's grounding point within the bounding box is used to determine the pedestrian's 3D (three-dimensional) position in the world coordinate system in which the vehicle is located. From this 3D position, the pedestrian's position relative to the vehicle can be determined, so that the driving of the vehicle can be controlled and the safety of both the pedestrian and the vehicle ensured.
  • The camera is usually installed inside the front windshield of the vehicle. When a pedestrian is close to the vehicle, the pedestrian's feet are easily occluded by the hood of the vehicle, so the image captured by the camera contains no grounding point for the pedestrian. The pedestrian bounding box detected from such an image therefore has no grounding point, and the 3D position of the pedestrian cannot be accurately determined from it.
  • the present invention provides a method and device for detecting the 3D position of a pedestrian, and a vehicle-mounted terminal, so as to accurately determine the 3D position of the pedestrian even when there is no grounding point in the pedestrian bounding box.
  • the specific technical solution is as follows.
  • In a first aspect, an embodiment of the present invention discloses a method for detecting the 3D position of a pedestrian, including:
  • acquiring an image to be detected captured by an image acquisition device in a vehicle;
  • inputting the image to be detected into a pedestrian detection model, and detecting, by the pedestrian detection model, the pedestrian bounding box and pedestrian key points in the image to be detected; wherein the pre-trained pedestrian detection model is able to associate the image to be detected with pedestrian bounding boxes and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; the feature vector of the image to be detected is determined through the trained first model parameters in the feature extraction layer, and the feature vector is regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
  • determining, according to the determined pedestrian bounding box and pedestrian key points, and a predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, the three-dimensional (3D) position of the pedestrian in the image to be detected in the world coordinate system.
  • Optionally, the pedestrian detection model also outputs information about whether a grounding point exists in the pedestrian bounding box of the image to be detected;
  • the step of determining the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined pedestrian bounding box and pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, includes:
  • determining whether a grounding point exists in the pedestrian bounding box;
  • if it exists, determining the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located;
  • if it does not exist, determining the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located.
  • Optionally, the step of determining the 3D position of the pedestrian in the image to be detected in the world coordinate system based on the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, includes:
  • determining a first height between the pedestrian key point and the upper edge of the pedestrian bounding box;
  • predicting, according to a preset proportional relationship between the pedestrian key point and the top of the human head and the sole of the human foot, and the first height, a second height between the pedestrian key point and the sole of the foot of the pedestrian corresponding to the pedestrian bounding box;
  • determining, according to the second height, the grounding point corresponding to the pedestrian bounding box in the image to be detected;
  • determining, according to the determined grounding point corresponding to the pedestrian bounding box, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, the 3D position of the pedestrian in the image to be detected in the world coordinate system.
  • Optionally, the pedestrian detection model is trained in the following manner:
  • acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points;
  • inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
  • determining the sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and sending the sample feature vector to the regression layer of the pedestrian detection model;
  • regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points in the sample pedestrian image;
  • determining the difference amounts between the sample pedestrian bounding box and the sample pedestrian key points and the corresponding standard pedestrian bounding box and standard pedestrian key points, respectively;
  • when the difference amount is not less than a preset difference amount threshold, adjusting the first model parameters and the second model parameters according to the difference amount, and returning to the step of inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
  • when the difference amount is less than the preset difference amount threshold, determining that training of the pedestrian detection model is complete.
  • Optionally, the step of regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected includes:
  • regressing the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and candidate pedestrian key points in the candidate pedestrian bounding boxes;
  • selecting, according to a non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes.
  • Optionally, the step of selecting, according to the non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes includes:
  • determining the line between the candidate pedestrian key points in each candidate pedestrian bounding box;
  • generating, according to a pre-trained target width and with the line as the height, a virtual frame corresponding to the candidate pedestrian key points;
  • filtering each virtual frame according to the non-maximum suppression algorithm, and using the candidate pedestrian bounding boxes corresponding to the filtered virtual frames and the candidate pedestrian key points in those bounding boxes as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
  • In a second aspect, an embodiment of the present invention provides a device for detecting the 3D position of a pedestrian, including:
  • an acquisition module, configured to acquire an image to be detected captured by an image acquisition device in a vehicle;
  • a detection module, configured to input the image to be detected into a pedestrian detection model, the pedestrian detection model detecting the pedestrian bounding box and pedestrian key points in the image to be detected; wherein the pre-trained pedestrian detection model is able to associate the image to be detected with pedestrian bounding boxes and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; the trained first model parameters in the feature extraction layer are used to determine the feature vector of the image to be detected, and the feature vector is regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
  • a determining module, configured to determine, according to the determined pedestrian bounding box and pedestrian key points, and a predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, the three-dimensional (3D) position of the pedestrian in the image to be detected in the world coordinate system.
  • Optionally, the pedestrian detection model also outputs information about whether a grounding point exists in the pedestrian bounding box of the image to be detected;
  • the determining module is specifically configured to:
  • determine whether a grounding point exists in the pedestrian bounding box;
  • if it exists, determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located;
  • if it does not exist, determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located.
  • Optionally, when the determining module determines the 3D position of the pedestrian in the image to be detected in the world coordinate system based on the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, it is configured to: determine the first height between the pedestrian key point and the upper edge of the pedestrian bounding box; predict, according to the preset proportional relationship between the pedestrian key point and the top of the human head and the sole of the human foot, and the first height, the second height between the pedestrian key point and the sole of the foot of the pedestrian corresponding to the pedestrian bounding box; determine, according to the second height, the grounding point corresponding to the pedestrian bounding box in the image to be detected; and determine, according to that grounding point and the conversion relationship, the 3D position of the pedestrian in the image to be detected in the world coordinate system.
  • Optionally, the device further includes a training module; the training module is configured to obtain the pedestrian detection model through training with the following operations: acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points; inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model; determining the sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and sending the sample feature vector to the regression layer; regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points; determining the difference amounts between the sample pedestrian bounding box and the sample pedestrian key points and the corresponding standard pedestrian bounding box and standard pedestrian key points, respectively;
  • when the difference amount is not less than a preset difference amount threshold, adjusting the first model parameters and the second model parameters according to the difference amount, and returning to the operation of inputting each sample pedestrian image into the feature extraction layer; when the difference amount is less than the preset difference amount threshold, determining that training of the pedestrian detection model is complete.
  • Optionally, when the detection module regresses the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected, it is configured to:
  • regress the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and candidate pedestrian key points in the candidate pedestrian bounding boxes;
  • select, according to a non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes.
  • Optionally, when the detection module selects, according to the non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes, it is configured to:
  • determine the line between the candidate pedestrian key points in each candidate pedestrian bounding box;
  • generate, according to a pre-trained target width and with the line as the height, a virtual frame corresponding to the candidate pedestrian key points;
  • filter each virtual frame according to the non-maximum suppression algorithm, and use the candidate pedestrian bounding boxes corresponding to the filtered virtual frames and the candidate pedestrian key points in those bounding boxes as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
  • In a third aspect, an embodiment of the present invention discloses a vehicle-mounted terminal, including a processor and an image acquisition device; the processor includes an acquisition module, a detection module, and a determining module;
  • the acquisition module is used to acquire the image to be detected collected by the image acquisition device in the vehicle;
  • the detection module is configured to input the image to be detected into a pedestrian detection model, the pedestrian detection model detecting the pedestrian bounding box and pedestrian key points in the image to be detected; wherein the pre-trained pedestrian detection model is able to associate the image to be detected with pedestrian bounding boxes and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; the trained first model parameters in the feature extraction layer are used to determine the feature vector of the image to be detected, and the feature vector is regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
  • the determining module is configured to determine, based on the determined pedestrian bounding box and pedestrian key points, and a predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, the three-dimensional (3D) position of the pedestrian in the image to be detected in the world coordinate system.
  • Optionally, the pedestrian detection model also outputs information about whether a grounding point exists in the pedestrian bounding box of the image to be detected; the determining module is specifically configured to:
  • determine whether a grounding point exists in the pedestrian bounding box;
  • if it exists, determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located;
  • if it does not exist, determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located.
  • Optionally, when the determining module determines the 3D position of the pedestrian in the image to be detected in the world coordinate system based on the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, it is configured to: determine the first height between the pedestrian key point and the upper edge of the pedestrian bounding box; predict, according to the preset proportional relationship between the pedestrian key point and the top of the human head and the sole of the human foot, and the first height, the second height between the pedestrian key point and the sole of the foot of the pedestrian corresponding to the pedestrian bounding box; determine, according to the second height, the grounding point corresponding to the pedestrian bounding box in the image to be detected; and determine, according to that grounding point and the conversion relationship, the 3D position of the pedestrian in the image to be detected in the world coordinate system.
  • Optionally, the processor further includes a training module; the training module is configured to obtain the pedestrian detection model through training with the following operations: acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points; inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model; determining the sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and sending the sample feature vector to the regression layer; regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points; determining the difference amounts between the sample pedestrian bounding box and the sample pedestrian key points and the corresponding standard pedestrian bounding box and standard pedestrian key points, respectively;
  • when the difference amount is not less than a preset difference amount threshold, adjusting the first model parameters and the second model parameters according to the difference amount, and returning to the operation of inputting each sample pedestrian image into the feature extraction layer; when the difference amount is less than the preset difference amount threshold, determining that training of the pedestrian detection model is complete.
  • Optionally, when the detection module regresses the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected, it is configured to:
  • regress the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and candidate pedestrian key points in the candidate pedestrian bounding boxes;
  • select, according to a non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes.
  • Optionally, when the detection module selects, according to the non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes, it is configured to: determine the line between the candidate pedestrian key points in each candidate pedestrian bounding box; generate, according to a pre-trained target width and with the line as the height, a virtual frame corresponding to the candidate pedestrian key points;
  • filter each virtual frame according to the non-maximum suppression algorithm, and use the candidate pedestrian bounding boxes corresponding to the filtered virtual frames and the candidate pedestrian key points in those bounding boxes as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
  • With the pedestrian 3D position detection method and device and the vehicle-mounted terminal provided by the embodiments of the present invention, the pedestrian bounding box and pedestrian key points in the image to be detected can be detected by the pedestrian detection model, and the 3D position of the pedestrian in the image to be detected in the world coordinate system can be determined based on the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
  • The embodiments of the present invention can detect the pedestrian bounding box and the pedestrian key points in the image to be detected simultaneously with a single pedestrian detection model. When the pedestrian bounding box has no grounding point, the pedestrian bounding box and the pedestrian key points can be combined to determine the 3D position of the pedestrian more accurately.
  • The pedestrian detection model detects the pedestrian bounding box and key points from the image to be detected in a single pass. When the bounding box has no grounding point, the pedestrian key points can be combined with it to determine the 3D position of the pedestrian, improving the accuracy of the 3D position.
  • The proportional relationship between the pedestrian key points and the parts of the human body can be used to determine the height from a pedestrian key point to the sole of the pedestrian's foot, and thus the grounding point corresponding to the pedestrian bounding box. Based on this grounding point, the 3D position of the pedestrian can be determined, which improves the accuracy of the pedestrian's 3D position.
  • The pedestrian detection model performs non-maximum suppression with respect to pedestrian key points, so that for multiple mutually occluding pedestrians, the pedestrian bounding box and pedestrian key points of each pedestrian can be determined more accurately, thereby improving the accuracy of the determined pedestrian 3D positions.
  • FIG. 1 is a schematic flowchart of a method for detecting a 3D position of a pedestrian according to an embodiment of the present invention
  • FIG. 2 is a reference diagram of a process for detecting an image to be detected according to an embodiment of the present invention
  • FIG. 3A is a schematic flowchart of a detection process of the pedestrian detection model provided by an embodiment of the present invention;
  • FIG. 3B is a schematic diagram of performing non-maximum suppression according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a device for detecting a 3D position of a pedestrian according to an embodiment of the present invention
  • Fig. 5 is a schematic structural diagram of a vehicle-mounted terminal provided by an embodiment of the present invention.
  • the embodiment of the present invention discloses a method and device for detecting the 3D position of a pedestrian, and a vehicle-mounted terminal, which can accurately determine the 3D (3 Dimensions) position of the pedestrian even when there is no grounding point in the pedestrian bounding box.
  • the embodiments of the present invention will be described in detail below.
  • FIG. 1 is a schematic flowchart of a method for detecting a 3D position of a pedestrian according to an embodiment of the present invention. This method is applied to electronic equipment.
  • the electronic device may be an ordinary computer, a server, or a smart mobile device, etc., or it may be a vehicle-mounted terminal installed in the vehicle.
  • the method specifically includes the following steps.
  • S110 Acquire an image to be detected collected by an image collecting device in the vehicle.
  • the image acquisition device can be a normal camera, a surveillance camera or a driving recorder.
  • the image acquisition device may be a camera installed inside the front windshield of the vehicle, or may be a camera installed inside the rear windshield of the vehicle.
  • the image to be detected includes pedestrians and background areas outside of pedestrians.
  • the image to be detected may contain one or more pedestrians.
  • the pedestrian may be far from the vehicle or close to it; the image to be detected may or may not contain the pedestrian's grounding point.
  • Pedestrian grounding points may be blocked by vehicles or other obstacles.
  • the grounding point can be understood as the point where pedestrians contact the road.
  • S120 Input the image to be detected into a pedestrian detection model, and the pedestrian detection model detects the pedestrian bounding box and key points of the pedestrian in the image to be detected.
  • the pre-trained pedestrian detection model can associate the image to be detected with the pedestrian bounding box and pedestrian key points.
  • the pedestrian detection model includes a feature extraction layer and a regression layer.
  • the pedestrian detection model can be trained in advance by a machine learning algorithm based on sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points.
  • the pedestrian detection model can be a neural network model in deep learning.
  • when the pedestrian detection model detects the pedestrian bounding box and pedestrian key points in the image to be detected, this may specifically include: determining the feature vector of the image to be detected through the trained first model parameters in the feature extraction layer, and regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected.
  • the pedestrian bounding box and pedestrian key points of each pedestrian in the image to be detected are associated with each other; that is, each detected pedestrian has both a pedestrian bounding box and pedestrian key points.
  • the pedestrian bounding box can be understood as a rectangular box that can enclose all pixels of the pedestrian's body area, and the pedestrian bounding box can be represented by the coordinates of the diagonal vertices of the rectangular box.
  • the pedestrian bounding box may also contain the coordinates of the center point of the pedestrian bounding box.
  • Pedestrian key points can include waist key points, shoulder key points, arm key points, head key points, leg key points, and so on. Since the feet and legs of the human body are easily occluded by objects such as vehicles, the pedestrian key points can be chosen as the waist key points and the shoulder key points. For example, the waist center point can be used as the waist key point, and the shoulder center point as the shoulder key point.
  • A pedestrian key point may be occluded and thus not directly detectable, but once the pedestrian bounding box is determined, the key point can be inferred from the bounding box. Therefore, the positions of the standard pedestrian key points, as well as their visibility, can be marked in the sample pedestrian images according to the position of the pedestrian bounding box, so that the trained pedestrian detection model can determine both the pedestrian key points and their visibility.
  • after detecting the image to be detected, the pedestrian detection model can output the following detection results: the coordinates of the pedestrian's shoulder center point and waist center point, their visibility, and the coordinates of the diagonal vertices of the pedestrian bounding box.
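  • For illustration only (the patent does not fix a concrete output format), one detection result of this kind could be represented as follows; all field names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PedestrianDetection:
    # diagonal vertices of the pedestrian bounding box, in image coordinates
    bbox_top_left: tuple[float, float]
    bbox_bottom_right: tuple[float, float]
    shoulder_center: tuple[float, float]  # pedestrian key point
    waist_center: tuple[float, float]     # pedestrian key point
    shoulder_visible: bool                # visibility of the key points
    waist_visible: bool
    has_ground_point: bool                # whether the box contains a grounding point
    score: float                          # confidence score of the detection
```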
  • S130 Determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined pedestrian bounding box and pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located.
  • the image coordinate system is the coordinate system of the image to be detected.
  • the world coordinate system is a three-dimensional coordinate system. For example, the center point of the vehicle may be taken as the origin, the traveling direction of the vehicle as the X-axis direction, and the upward direction perpendicular to the top surface of the vehicle as the Z-axis direction.
  • the determined pedestrian bounding box and pedestrian key points are parameters in the image coordinate system. From them, the pedestrian's grounding point in the image to be detected can be determined; according to the conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, the position of the grounding point can then be converted into a 3D position in the world coordinate system. The 3D position can represent the distance between the pedestrian and the vehicle along each coordinate axis.
  • the camera coordinate system is the three-dimensional coordinate system of the image acquisition device. It can be established with the optical center of the image acquisition device's photosensitive element as the origin and the optical axis as the Z-axis. According to the internal parameter matrix of the image acquisition device, the conversion relationship between the image coordinate system and the camera coordinate system can be obtained.
  • the internal parameter matrix can be

        K = | f_u   s   u_0 |
            |  0   f_v  v_0 |
            |  0    0    1  |

    where s is the skew (tilt) parameter of the optical axis, f_u and f_v are the focal lengths of the photosensitive element along the two axes, u_0 and v_0 are the distances from the origin of the image coordinate system to its center (which can also be taken as half the width and height of the image to be detected), and u and v are the two coordinate axes of the image coordinate system.
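  • As a rough sketch of the image-to-world conversion for a grounding point (assuming a camera whose axes are aligned with the vehicle, a flat road, and a known camera height above the road; none of these specifics are fixed by this description):

```python
import numpy as np

def ground_point_to_world(u, v, K, cam_height):
    """Back-project a pixel assumed to lie on the road plane.

    Assumes camera axes aligned with the vehicle (x right, y down,
    z forward) and a flat road cam_height below the camera.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray in camera coords
    scale = cam_height / ray[1]  # intersect the ray with the plane y = cam_height
    p_cam = ray * scale          # 3D point in camera coordinates
    # Express in the world convention above: X = travel direction, Z = up
    return np.array([p_cam[2], -p_cam[0], 0.0])
```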
  • the pedestrian detection model in this embodiment can thus detect the pedestrian bounding box and pedestrian key points in the image to be detected, and the 3D position of the pedestrian in the image to be detected in the world coordinate system is determined according to the determined pedestrian bounding box and pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
  • the pedestrian detection model can simultaneously detect the pedestrian bounding box and the pedestrian key points in the image to be detected. When the pedestrian bounding box has no ground point, the combination of the pedestrian bounding box and the pedestrian key points can be used to more accurately determine the pedestrian 3D position.
  • As an alternative, a first network model for detecting pedestrian bounding boxes and a second network model for detecting pedestrian key points could be trained separately. The image to be detected would be input into the first network model to detect the pedestrian bounding box, and then into the second network model to detect the pedestrian key points, after which the bounding box and key points would be combined to determine the pedestrian's 3D position. However, this solution requires inputting the image to be detected into a network model twice and detecting it twice, and two network models have to be trained in advance, so the overall processing efficiency is low.
  • In this embodiment, by contrast, the image to be detected is detected only once, and the pedestrian bounding boxes and pedestrian key points are output at the same time, which saves running time to a certain extent and improves detection efficiency.
  • the pedestrian detection model also outputs information about whether there is a ground point in the pedestrian bounding box of the image to be detected.
  • in this case, step S130, determining the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined pedestrian bounding box and pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, may specifically include the following steps 1a to 3a.
  • Step 1a Determine whether there is a grounding point in the pedestrian bounding box, if it exists, go to step 2a; if it does not exist, go to step 3a.
  • in this step, whether a grounding point exists in the pedestrian bounding box can be determined based on the information output by the pedestrian detection model.
  • Step 2a Determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located.
  • specifically, the 3D position of the grounding point of the pedestrian bounding box in the world coordinate system can be determined, and this 3D position is taken as the 3D position of the pedestrian in the image to be detected in the world coordinate system.
  • alternatively, according to the conversion relationship between the image coordinate system and the world coordinate system, the 3D positions in the world coordinate system of the grounding point of the pedestrian bounding box, the head vertex of the pedestrian bounding box, and the points representing the body width can all be determined, and the three-dimensional enclosing frame of the human body formed by these multiple 3D positions serves as the 3D position of the pedestrian in the image to be detected in the world coordinate system.
  • Step 3a: Determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located.
  • the relative position between the pedestrian bounding box and the pedestrian key point may include the distance from the pedestrian key point to the top of the pedestrian bounding box, the distance from the pedestrian key point to the side of the pedestrian bounding box, etc.
  • the relative position between the pedestrian bounding box and the pedestrian key point may include the distance from the shoulder key point to the top of the head, the distance from the waist key point to the top of the head, and so on.
  • this embodiment determines whether a grounding point exists in the pedestrian bounding box. When it exists, the 3D position of the pedestrian is determined directly from the grounding point of the pedestrian bounding box; when it does not exist, the 3D position is determined from the relative position between the pedestrian bounding box and the pedestrian key points. Handling the two situations differently in this way can improve the overall computational efficiency.
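  • The branching of steps 1a to 3a can be summarized with a small sketch; the helper functions `infer_ground_point` and `ground_point_to_world` are the hypothetical sketches given elsewhere in this description, and the field names follow the illustrative `PedestrianDetection` record above:

```python
def pedestrian_3d_position(det, K, cam_height):
    # Step 1a: branch on whether the model reported a grounding point
    if det.has_ground_point:
        # Step 2a: one plausible choice is the bottom center of the bounding box
        u = (det.bbox_top_left[0] + det.bbox_bottom_right[0]) / 2.0
        v = det.bbox_bottom_right[1]
    else:
        # Step 3a: extrapolate a grounding point from the key points
        u, v = infer_ground_point(det)
    return ground_point_to_world(u, v, K, cam_height)
```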
  • in another embodiment, step 3a, that is, determining the 3D position of the pedestrian in the image to be detected in the world coordinate system based on the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, may specifically include the following steps 3a-1 to 3a-4.
  • Step 3a-1 Determine the first height between the pedestrian key point and the upper bounding box of the pedestrian bounding box.
  • the upper bounding box can be understood as the upper edge of the rectangle forming the pedestrian bounding box.
  • the first height can be understood as the distance between the pedestrian key point and the upper bounding box of the pedestrian bounding box in the longitudinal direction of the image to be detected.
  • Step 3a-2: Predict the second height, between the pedestrian key point and the sole of the foot of the pedestrian corresponding to the pedestrian bounding box, according to the preset proportional relationship between the pedestrian key point and the top of the human head and the sole of the human foot, and the first height.
  • the proportional relationships between pedestrian key points and the top of the human head and the sole of the human foot can be obtained in advance from statistics over a large number of human samples. For example, the ratio of the distance from the shoulder center point to the top of the head to the distance from the shoulder center point to the sole of the foot, and the corresponding ratio for the waist center point, can be obtained statistically.
  • when there is no grounding point, the pedestrian bounding box may include only the upper body of the pedestrian, or the body region excluding the feet. In this case, the second height between the pedestrian key point and the sole of the foot of the pedestrian corresponding to the pedestrian bounding box can be predicted, and the position of the pedestrian's grounding point in the image to be detected is determined from the second height.
  • Step 3a-3 Determine the grounding point corresponding to the pedestrian bounding box in the image to be detected according to the second height.
  • specifically, the grounding point corresponding to the pedestrian bounding box can be obtained by extending downward from the coordinates of the pedestrian key point by the second height. Alternatively, the second height can first be scaled according to an actually measured scaling relationship between different coordinate intervals of the image to be detected and real space, and the grounding point corresponding to the pedestrian bounding box is then obtained by extending downward from the coordinates of the pedestrian key point by the scaled second height.
  • Step 3a-4 Determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the determined grounding point corresponding to the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located .
  • for the specific implementation of this step, refer to the description of step 2a.
  • in summary, when the pedestrian bounding box has no grounding point, the proportional relationship between the pedestrian key points and the parts of the human body can be used to determine the height from the pedestrian key point to the sole of the pedestrian's foot, and thus the grounding point corresponding to the pedestrian bounding box. Based on this grounding point, the 3D position of the pedestrian can be determined, which improves the accuracy of the pedestrian's 3D position.
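  • A minimal sketch of steps 3a-1 to 3a-3, using the illustrative fields above; the ratio value is a made-up placeholder standing in for the statistics over human samples mentioned earlier:

```python
# Assumed ratio of (shoulder-to-sole height) / (shoulder-to-head-top height);
# a placeholder for the statistically obtained proportional relationship.
SHOULDER_TO_SOLE_RATIO = 4.5

def infer_ground_point(det):
    u, v = det.shoulder_center                # pedestrian key point (image coords)
    first_height = v - det.bbox_top_left[1]   # step 3a-1: key point to upper edge
    second_height = first_height * SHOULDER_TO_SOLE_RATIO  # step 3a-2
    return u, v + second_height               # step 3a-3: extend down to the sole
```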
  • the pedestrian detection model can be obtained by training in the following steps 1b to 6b.
  • Step 1b Obtain multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points.
  • the standard pedestrian bounding box and standard pedestrian key points can be regarded as true values.
  • A large number of sample pedestrian images can be obtained.
  • One or more pedestrians can be included in the sample pedestrian image.
  • the sample pedestrian image contains the background area outside the pedestrian.
  • the sample pedestrian image can be collected in advance using a camera on the vehicle.
  • Each sample pedestrian image is marked with a standard pedestrian bounding box and information about whether there is a grounding point.
  • the marked standard pedestrian key points can include key point coordinates and visibility of key points.
  • Step 2b Input each sample pedestrian image into the feature extraction layer in the pedestrian detection model.
  • Step 3b Determine the sample feature vector of the sample pedestrian image through the first model parameter in the feature extraction layer, and send the sample feature vector to the regression layer in the pedestrian detection model.
  • the functions of the feature extraction layer and the regression layer can be implemented with different convolutional layers.
  • the sample feature vector can be expressed in the form of a feature matrix.
  • the initial value of the first model parameter can be preset based on experience, for example, can be set to a smaller value. In the process of each training, the first model parameter is continuously revised, gradually approaching the true value.
  • Step 4b Regress the sample feature vector through the second model parameter in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points in the sample pedestrian image.
  • the initial value of the second model parameter can be preset based on experience, for example, can be set to a smaller value.
  • the second model parameter is continuously revised, gradually approaching the true value.
  • the obtained sample pedestrian bounding box and sample pedestrian key points may not be accurate enough, and the sample pedestrian bounding box and sample pedestrian key points can be used as a reference when correcting the first model parameter and the second model parameter.
  • Step 5b Determine the amount of difference between the sample pedestrian bounding box and the sample pedestrian key points and the corresponding standard pedestrian bounding box and standard pedestrian key points, respectively.
  • the aforementioned difference amounts can be determined using a loss function (loss). For example, the difference amount between the sample pedestrian bounding box and the standard pedestrian bounding box can be determined, and the difference amount between the sample pedestrian key points and the standard pedestrian key points can be determined.
  • Step 6b When the difference amount is not less than the preset difference amount threshold, adjust the first model parameter and the second model parameter according to the difference amount, and return to step 2b. When the difference amount is less than the preset difference amount threshold, it is determined that the pedestrian detection model training is completed.
  • after determining the aforementioned difference amount, it can be determined whether the difference amount is less than a preset difference amount threshold.
  • when the difference amount is not less than the preset difference amount threshold, the difference between the prediction of the pedestrian detection model and the standard values is considered large, and the network needs to be trained further.
  • when adjusting, the specific value and direction of change of the difference amount can be referred to, and the first model parameters and the second model parameters are adjusted in the opposite direction according to that value.
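  • A minimal PyTorch-style sketch of this training loop (steps 1b to 6b); the loss choice and learning rate are assumptions, not specified by this description:

```python
import torch
import torch.nn.functional as F

def train_pedestrian_detector(model, loader, threshold, lr=1e-3):
    # The optimizer adjusts the first (feature extraction) and second
    # (regression) model parameters together.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    while True:  # step 6b loops back to step 2b until the threshold is met
        epoch_loss = 0.0
        for images, std_boxes, std_keypoints in loader:
            boxes, keypoints = model(images)  # steps 2b to 4b
            # Step 5b: difference amounts against the labeled standard values
            loss = (F.smooth_l1_loss(boxes, std_boxes)
                    + F.smooth_l1_loss(keypoints, std_keypoints))
            opt.zero_grad()
            loss.backward()
            opt.step()  # adjust parameters according to the difference amount
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < threshold:  # difference below threshold
            return model  # training complete
```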
  • in another embodiment, in step S120, the step of regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected may include the following steps 1c and 2c.
  • Step 1c Regress the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and key points of candidate pedestrians in the candidate pedestrian bounding box.
  • when regressing the feature vector, a large number of pedestrian bounding boxes and pedestrian key points can be obtained; these are used as the candidate pedestrian bounding boxes and candidate pedestrian key points.
  • Step 2c: According to a non-maximum suppression (NMS) algorithm, select the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes.
  • since each candidate pedestrian bounding box corresponds to a set of candidate pedestrian key points, filtering can be performed according to the degree of coincidence between candidate pedestrian bounding boxes. For example, the intersection-over-union (i.e., the degree of coincidence) between two candidate pedestrian bounding boxes can be determined, and when it is greater than a preset intersection-over-union threshold, the candidate pedestrian bounding box with the lower score, together with its candidate pedestrian key points, is removed. Here, the score is the confidence score.
  • in dense crowds, however, the spacing between pedestrians is very small and they often occlude one another. The intersection-over-union between the candidate pedestrian bounding boxes of such pedestrians can then be relatively large, exceeding the preset threshold, so that the bounding boxes of some pedestrians are removed and only one set of pedestrian bounding box and pedestrian key points may be detected for multiple mutually occluding pedestrians.
  • to avoid this, step 2c can be implemented in the following manner, specifically including the following steps 2c-1 to 2c-3.
  • Step 2c-1 Determine the line between the key points of the pedestrian to be selected in the bounding box of each pedestrian to be selected.
  • for example, when the candidate pedestrian key points include the shoulder center point and the waist center point, the shoulder center point and the waist center point in each candidate pedestrian bounding box can be connected.
  • Step 2c-2 According to the pre-trained target width, the above-mentioned connection line is used as the height to generate a virtual frame corresponding to the key point of the pedestrian to be selected.
  • the aforementioned target width is an optimized value determined during the training process of the pedestrian detection model.
  • the virtual frame can be understood as a rectangular frame, the height of the rectangular frame is the above-mentioned line, and the width is the above-mentioned target width. In this way, a virtual frame can be obtained for each group of candidate pedestrian bounding boxes and key points of the candidate pedestrians.
  • Step 2c-3: Filter each virtual frame according to the non-maximum suppression algorithm, and use the candidate pedestrian bounding boxes corresponding to the filtered virtual frames and the candidate pedestrian key points in those bounding boxes as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
  • specifically, this step can determine the intersection-over-union between the virtual frames; for virtual frames whose intersection-over-union is greater than the preset threshold, the lower-scoring candidate pedestrian bounding box and its corresponding candidate pedestrian key points are removed, and the remaining candidate pedestrian bounding boxes and candidate pedestrian key points are used as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
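  • A sketch of this key-point-based NMS under the same illustrative conventions (each detection is a tuple of score, shoulder center, and waist center; the IoU threshold is an assumed value):

```python
def virtual_frame(shoulder, waist, target_width):
    # Steps 2c-1/2c-2: the shoulder-waist line is the height, the learned
    # target width is the width of the virtual frame.
    (x1, y1), (x2, y2) = shoulder, waist
    cx = (x1 + x2) / 2.0
    top, bottom = min(y1, y2), max(y1, y2)
    return (cx - target_width / 2, top, cx + target_width / 2, bottom)

def iou(a, b):
    # intersection-over-union of two (left, top, right, bottom) rectangles
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def keypoint_nms(detections, target_width, thresh=0.5):
    # Step 2c-3: greedy NMS over virtual frames instead of full boxes.
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    for score, shoulder, waist in detections:
        vf = virtual_frame(shoulder, waist, target_width)
        if all(iou(vf, virtual_frame(s, w, target_width)) <= thresh
               for _, s, w in kept):
            kept.append((score, shoulder, waist))
    return kept
```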
  • FIG. 3A is a schematic flow chart of the pedestrian detection model detecting the image to be detected to obtain the output result.
  • the image to be detected is input into the feature extraction layer, which determines the feature vector of the image to be detected according to the first model parameters to obtain a feature vector map and inputs the feature vector map into the regression layer.
  • the regression layer determines a large number of possible region proposals from the feature vector map according to the second model parameters. Each region proposal includes a score indicating its confidence, the diagonal vertices of a pedestrian bounding box, the coordinates of the pedestrian key points, and the visibility of the key points. These numerous region proposals correspond to the candidate pedestrian bounding boxes and candidate pedestrian key points in the foregoing embodiment.
  • on the left side of FIG. 3B, dashed frames represent the pedestrian bounding boxes of two pedestrians, and the black dots are their shoulder centers and waist centers. Because the two pedestrians occlude each other, the intersection-over-union between their pedestrian bounding boxes is very high, and one of the pedestrians is easily removed.
  • P1 is the pedestrian's shoulder center point and P2 is the pedestrian's waist center point. With the line h between P1 and P2 as the height and the target width w as the width, a virtual frame is generated (shown with dotted lines); that is, the key-point connection line is expanded horizontally. In this way, NMS on a line is expanded to NMS on a pose: the line between the key points is given a virtual width, and NMS is then performed. As can be seen on the right side of FIG. 3B, the intersection-over-union between virtual frames is much smaller than that between pedestrian bounding boxes, which can increase the pedestrian recall rate.
  • after the above screening, the remaining region proposals and feature vectors can be input into the pooling layer for normalization, and the output result of the model is finally obtained.
  • in this embodiment, non-maximum suppression is performed with respect to pedestrian key points, so that for multiple mutually occluding pedestrians, the pedestrian bounding box and pedestrian key points of each pedestrian can be determined more accurately, thereby improving the accuracy of the determined pedestrian 3D positions.
  • during training of the pedestrian detection model, steps 2c-1 to 2c-3 of the foregoing embodiment can also be applied to perform NMS on the reference pedestrian bounding boxes and reference pedestrian key points detected from the sample pedestrian images. In this process, the value of the target width w is continuously adjusted according to the difference between the reference values and the standard values, and an optimized value of w is finally determined.
  • when building the pedestrian detection model, a transfer learning approach can be used: an existing deep convolutional neural network that has achieved good results in the field of pedestrian detection, such as Faster R-CNN, is taken as a basis; the number of output categories and any other parts of the structure that need modification are adjusted accordingly, and the fully trained parameters of the original network model are directly used as the model parameters.
  • the above-mentioned pedestrian detection model may further include a pooling layer and a fully connected layer.
  • after the regression layer regresses the sample feature vector according to the second model parameters to obtain the sample pedestrian bounding box and sample pedestrian key points, the sample feature vector, sample pedestrian bounding box, and sample pedestrian key points are input into the pooling layer. The pooling layer can normalize the sample pedestrian bounding box and sample pedestrian key points, and the normalized result is input into the fully connected layer, which maps the normalized sample pedestrian bounding box and sample pedestrian key points to obtain the output result of the model.
  • during training, the transformation vector of the key points can be calculated according to the following formulas:

        d_x = (g_x - P_x) / P_width
        d_y = (g_y - P_y) / P_height

    where g_x and g_y represent the two coordinate components of the standard pedestrian key point; P_x and P_y represent the two coordinate components of the pedestrian key point in the region proposal; P_width and P_height represent the width and height of the pedestrian bounding box in the region proposal; and d_x and d_y represent the mapping relationship, computed in each training pass, between the standard pedestrian key point and the pedestrian key point in the region proposal.
  • after the pedestrian detection model is trained, the key points output by the model can be recovered from the d_x and d_y obtained in the training phase and the information in the region proposal:

        g'_x = P_x + P_width * d_x
        g'_y = P_y + P_height * d_y

    where g'_x and g'_y are the coordinate components of the pedestrian key points output by the pedestrian detection model.
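  • In code, the encoding used during training and the decoding used at inference might look as follows (a direct transcription of the formulas above, with hypothetical function names):

```python
def encode_keypoint(gx, gy, px, py, pw, ph):
    # Training: offset of the standard key point from the proposal key
    # point, normalized by the proposal box size.
    return (gx - px) / pw, (gy - py) / ph

def decode_keypoint(dx, dy, px, py, pw, ph):
    # Inference: recover the output key point from the predicted offsets.
    return px + pw * dx, py + ph * dy
```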
  • FIG. 4 is a schematic structural diagram of a device for detecting a 3D position of a pedestrian provided by an embodiment of the present invention.
  • the device is applied to electronic equipment, and the device embodiment corresponds to the method embodiment shown in FIG. 1.
  • the device includes:
  • the acquiring module 410 is configured to acquire the image to be detected collected by the image acquisition device in the vehicle;
  • the detection module 420 is configured to input the image to be detected into a pedestrian detection model, the pedestrian detection model detecting the pedestrian bounding box and pedestrian key points in the image to be detected; wherein the pre-trained pedestrian detection model is able to associate the image to be detected with pedestrian bounding boxes and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; the feature vector of the image to be detected is determined through the trained first model parameters in the feature extraction layer, and the feature vector is regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
  • the determining module 430 is configured to determine, according to the determined pedestrian bounding box and pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, the three-dimensional (3D) position of the pedestrian in the image to be detected in the world coordinate system.
  • the pedestrian detection model also outputs information about whether there is a ground point in the pedestrian bounding box of the image to be detected;
  • the determining module 430 is specifically configured to: determine whether a grounding point exists in the pedestrian bounding box; if it exists, determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located; if it does not exist, determine the 3D position according to the determined relative position between the pedestrian bounding box and the pedestrian key points and that conversion relationship.
  • when the determining module 430 determines the 3D position based on the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, it is configured to: determine the first height between the pedestrian key point and the upper edge of the pedestrian bounding box; predict the second height between the pedestrian key point and the sole of the pedestrian's foot according to the preset proportional relationship and the first height; determine, according to the second height, the grounding point corresponding to the pedestrian bounding box in the image to be detected; and determine the 3D position of the pedestrian according to that grounding point and the conversion relationship.
  • the device further includes a training module (not shown in the figure), configured to obtain the pedestrian detection model through training with the following operations: acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points; inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model; determining the sample feature vector through the first model parameters in the feature extraction layer and sending it to the regression layer; regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points; determining the difference amounts between them and the corresponding standard pedestrian bounding box and standard pedestrian key points;
  • when the difference amount is not less than a preset difference amount threshold, adjusting the first model parameters and the second model parameters according to the difference amount, and returning to the operation of inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model; when the difference amount is less than the preset threshold, determining that training of the pedestrian detection model is complete.
  • when the detection module 420 regresses the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected, it is configured to:
  • regress the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and candidate pedestrian key points in those bounding boxes; and select, according to a non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes.
  • when the detection module 420 selects, according to the non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes, it is configured to:
  • determine the line between the candidate pedestrian key points in each candidate pedestrian bounding box;
  • generate, according to a pre-trained target width and with the line as the height, a virtual frame corresponding to the candidate pedestrian key points;
  • filter each virtual frame according to the non-maximum suppression algorithm, and use the candidate pedestrian bounding boxes corresponding to the filtered virtual frames and the candidate pedestrian key points in those bounding boxes as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
  • the foregoing device embodiment corresponds to the method embodiment, and has the same technical effect as the method embodiment.
  • the device embodiment is obtained based on the method embodiment, and the specific description can be found in the method embodiment part, which will not be repeated here.
  • Fig. 5 is a schematic structural diagram of a vehicle-mounted terminal provided by an embodiment of the present invention.
  • the vehicle-mounted terminal includes: a processor 510 and an image acquisition device 520; the processor 510 includes an acquisition module 11, a detection module 12, and a determination module 13;
  • the acquisition module 11 is used to acquire the image to be detected collected by the image acquisition device 520 in the vehicle;
  • the detection module 12 is used to input the image to be detected into a pedestrian detection model, the pedestrian detection model detecting the pedestrian bounding box and pedestrian key points in the image to be detected; wherein the pre-trained pedestrian detection model is able to associate the image to be detected with pedestrian bounding boxes and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; the feature vector of the image to be detected is determined through the trained first model parameters in the feature extraction layer, and the feature vector is regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
  • the determining module 13 is used to determine, according to the determined pedestrian bounding box and pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, the three-dimensional (3D) position of the pedestrian in the image to be detected in the world coordinate system.
  • the pedestrian detection model also outputs information about whether a grounding point exists in the pedestrian bounding box of the image to be detected; the determining module 13 is specifically configured to: determine whether a grounding point exists in the pedestrian bounding box; if it exists, determine the 3D position of the pedestrian in the image to be detected in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located; if it does not exist, determine the 3D position according to the determined relative position between the pedestrian bounding box and the pedestrian key points and that conversion relationship.
  • when the determining module 13 determines the 3D position based on the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located, it is configured to: determine the first height between the pedestrian key point and the upper edge of the pedestrian bounding box; predict the second height between the pedestrian key point and the sole of the pedestrian's foot according to the preset proportional relationship and the first height; determine, according to the second height, the grounding point corresponding to the pedestrian bounding box in the image to be detected; and determine the 3D position of the pedestrian according to that grounding point and the conversion relationship.
  • the processor 510 further includes a training module (not shown in the figure), configured to obtain the pedestrian detection model through training with the following operations: acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points; inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model; determining the sample feature vector through the first model parameters in the feature extraction layer and sending it to the regression layer; regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points; determining the difference amounts between them and the corresponding standard pedestrian bounding box and standard pedestrian key points;
  • when the difference amount is not less than a preset difference amount threshold, adjusting the first model parameters and the second model parameters according to the difference amount, and returning to the operation of inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model; when the difference amount is less than the preset threshold, determining that training of the pedestrian detection model is complete.
  • when the detection module 12 regresses the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected, it is configured to:
  • regress the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and candidate pedestrian key points in those bounding boxes; and select, according to a non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes.
  • when the detection module 12 selects, according to the non-maximum suppression algorithm, the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and the candidate pedestrian key points in those bounding boxes, it is configured to:
  • determine the line between the candidate pedestrian key points in each candidate pedestrian bounding box;
  • generate, according to a pre-trained target width and with the line as the height, a virtual frame corresponding to the candidate pedestrian key points;
  • filter each virtual frame according to the non-maximum suppression algorithm, and use the candidate pedestrian bounding boxes corresponding to the filtered virtual frames and the candidate pedestrian key points in those bounding boxes as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
  • the embodiment of the terminal and the embodiment of the method shown in FIG. 1 are embodiments obtained based on the same inventive concept, and relevant points can be referred to each other.
  • the foregoing terminal embodiment corresponds to the method embodiment, and has the same technical effect as the method embodiment. For specific description, refer to the method embodiment.
  • the modules of the device in this embodiment may be distributed in the device as described in the embodiment, or may be located, with corresponding changes, in one or more devices different from the one in this embodiment.
  • the modules of the above-mentioned embodiments can be combined into one module or further divided into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present invention discloses a method and device for detecting the 3D position of a pedestrian, and a vehicle-mounted terminal. The method includes: inputting an image to be detected, captured by an image acquisition device in a vehicle, into a pedestrian detection model, and detecting, by the pedestrian detection model, the pedestrian bounding box and pedestrian key points in the image to be detected, wherein the pedestrian detection model includes a feature extraction layer and a regression layer, the feature vector of the image to be detected is determined through the trained first model parameters in the feature extraction layer, and the feature vector is regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected; and determining the 3D position of the pedestrian in the image to be detected according to the determined pedestrian bounding box and pedestrian key points, and a predetermined conversion relationship between the image coordinate system and the world coordinate system in which the vehicle is located. By applying the solution provided by the embodiments of the present invention, the 3D position of a pedestrian can be accurately determined even when the pedestrian bounding box has no grounding point.

Description

Method and device for detecting the 3D position of a pedestrian, and vehicle-mounted terminal
Technical Field
The present invention relates to the technical field of intelligent driving, and in particular to a method and device for detecting the 3D position of a pedestrian, and a vehicle-mounted terminal.
Background
Pedestrian detection is one of the critical perception tasks in intelligent driving. It usually means detecting the pedestrians in an image collected by a camera installed in a vehicle: when a pedestrian bounding box is detected in the image, the 3D (3 Dimensions) position of the pedestrian in the world coordinate system where the vehicle is located is determined from the position of the pedestrian's grounding point in the bounding box. From this 3D position, the position of the pedestrian relative to the vehicle can be determined, so that the driving of the vehicle can be controlled and the safety of both pedestrian and vehicle ensured.
The camera is usually installed on the inside of the vehicle's front windshield. When a pedestrian is close to the vehicle, the pedestrian's feet are easily occluded by the hood, so the image collected by the camera contains no grounding point for the pedestrian; the pedestrian bounding box detected from such an image then has no grounding point, and the pedestrian's 3D position cannot be determined accurately from it.
Summary of the Invention
The present invention provides a method and device for detecting the 3D position of a pedestrian, and a vehicle-mounted terminal, so that the 3D position of a pedestrian can be determined accurately even when the pedestrian bounding box has no grounding point. The specific technical solutions are as follows.
In a first aspect, an embodiment of the present invention discloses a method for detecting the 3D position of a pedestrian, including:
acquiring an image to be detected, collected by an image collection device in a vehicle;
inputting the image to be detected into a pedestrian detection model, and detecting the pedestrian bounding box and pedestrian key points in the image to be detected by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image to be detected with the pedestrian bounding box and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; a feature vector of the image to be detected is determined through trained first model parameters in the feature extraction layer, and the feature vector is regressed through trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
determining the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image to be detected, according to the determined pedestrian bounding box and pedestrian key points and a predetermined conversion relationship between the image coordinate system and the world coordinate system.
Optionally, the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image to be detected;
the step of determining the 3D position of the pedestrian in the world coordinate system according to the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship includes:
judging whether the pedestrian bounding box has a grounding point;
if so, determining the 3D position of the pedestrian in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located;
if not, determining the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship.
Optionally, the step of determining the 3D position of the pedestrian from the determined relative position between the pedestrian bounding box and the pedestrian key points and the predetermined conversion relationship includes:
determining a first height between the pedestrian key points and the upper edge of the pedestrian bounding box;
predicting a second height between the pedestrian key points and the soles of the pedestrian corresponding to the bounding box, according to a preset proportional relationship between the pedestrian key points and the top of the head and the soles of the feet of a human body, together with the first height;
determining the grounding point corresponding to the pedestrian bounding box in the image to be detected according to the second height;
determining the 3D position of the pedestrian in the world coordinate system according to the determined grounding point and the predetermined conversion relationship.
Optionally, the pedestrian detection model is trained as follows:
acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points;
inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
determining a sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and sending the sample feature vector to the regression layer of the pedestrian detection model;
regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points in the sample pedestrian image;
determining the difference between the sample pedestrian bounding box and sample pedestrian key points and the corresponding standard pedestrian bounding box and standard pedestrian key points;
when the difference is not less than a preset difference threshold, adjusting the first model parameters and the second model parameters according to the difference, and returning to the step of inputting each sample pedestrian image into the feature extraction layer;
when the difference is less than the preset difference threshold, determining that training of the pedestrian detection model is complete.
Optionally, the step of regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected includes:
regressing the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate pedestrian bounding box;
selecting the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and their candidate pedestrian key points according to a non-maximum suppression algorithm.
Optionally, the step of selecting the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and their candidate pedestrian key points according to the non-maximum suppression algorithm includes:
determining the connecting line between the candidate pedestrian key points in each candidate pedestrian bounding box;
generating a virtual box corresponding to the candidate pedestrian key points, with a pre-trained target width as its width and the connecting line as its height;
filtering each virtual box according to the non-maximum suppression algorithm, and taking the candidate pedestrian bounding boxes corresponding to the retained virtual boxes, and the candidate pedestrian key points within them, as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
In a second aspect, an embodiment of the present invention provides a device for detecting the 3D position of a pedestrian, including:
an acquisition module configured to acquire an image to be detected, collected by an image collection device in a vehicle;
a detection module configured to input the image to be detected into a pedestrian detection model and detect the pedestrian bounding box and pedestrian key points in the image to be detected by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image to be detected with the pedestrian bounding box and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; a feature vector of the image to be detected is determined through trained first model parameters in the feature extraction layer, and the feature vector is regressed through trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
a determination module configured to determine the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image to be detected, according to the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
Optionally, the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image to be detected;
the determination module is specifically configured to:
judge whether the pedestrian bounding box has a grounding point;
if so, determine the 3D position of the pedestrian in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located;
if not, determine the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship.
Optionally, when determining the 3D position of the pedestrian from the determined relative position between the pedestrian bounding box and the pedestrian key points and the predetermined conversion relationship, the determination module:
determines a first height between the pedestrian key points and the upper edge of the pedestrian bounding box;
predicts a second height between the pedestrian key points and the soles of the pedestrian corresponding to the bounding box, according to the preset proportional relationship between the pedestrian key points and the top of the head and the soles of the feet of a human body, together with the first height;
determines the grounding point corresponding to the pedestrian bounding box in the image to be detected according to the second height;
determines the 3D position of the pedestrian in the world coordinate system according to the determined grounding point and the predetermined conversion relationship.
Optionally, the device further includes a training module configured to train the pedestrian detection model through the following operations:
acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points;
inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
determining a sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and sending the sample feature vector to the regression layer;
regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points;
determining the difference between the sample pedestrian bounding box and sample pedestrian key points and the corresponding standard ones;
when the difference is not less than the preset difference threshold, adjusting the first and second model parameters according to the difference and returning to the step of inputting each sample pedestrian image into the feature extraction layer;
when the difference is less than the preset difference threshold, determining that training of the pedestrian detection model is complete.
Optionally, when regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected, the detection module:
regresses the feature vector to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate pedestrian bounding box;
selects the pedestrian bounding box and pedestrian key points in the image to be detected from them according to a non-maximum suppression algorithm.
Optionally, when selecting according to the non-maximum suppression algorithm, the detection module:
determines the connecting line between the candidate pedestrian key points in each candidate pedestrian bounding box;
generates the virtual box corresponding to the candidate pedestrian key points, with a pre-trained target width as its width and the connecting line as its height;
filters the virtual boxes according to the non-maximum suppression algorithm, and takes the candidate pedestrian bounding boxes corresponding to the retained virtual boxes, and the candidate pedestrian key points within them, as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
In a third aspect, an embodiment of the present invention discloses a vehicle-mounted terminal, including a processor and an image collection device, the processor including an acquisition module, a detection module and a determination module;
the acquisition module is used to acquire an image to be detected, collected by the image collection device in the vehicle;
the detection module is used to input the image to be detected into a pedestrian detection model and detect the pedestrian bounding box and pedestrian key points in the image to be detected by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image to be detected with the pedestrian bounding box and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; a feature vector of the image to be detected is determined through trained first model parameters in the feature extraction layer, and the feature vector is regressed through trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
the determination module is used to determine the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image to be detected, according to the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
Optionally, the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image to be detected; the determination module is specifically used to:
judge whether the pedestrian bounding box has a grounding point;
if so, determine the 3D position of the pedestrian in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located;
if not, determine the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship.
Optionally, when determining the 3D position of the pedestrian from the determined relative position between the pedestrian bounding box and the pedestrian key points and the predetermined conversion relationship, the determination module:
determines a first height between the pedestrian key points and the upper edge of the pedestrian bounding box;
predicts a second height between the pedestrian key points and the soles of the pedestrian corresponding to the bounding box, according to the preset proportional relationship between the pedestrian key points and the top of the head and the soles of the feet of a human body, together with the first height;
determines the grounding point corresponding to the pedestrian bounding box in the image to be detected according to the second height;
determines the 3D position of the pedestrian in the world coordinate system according to the determined grounding point and the predetermined conversion relationship.
Optionally, the processor further includes a training module used to train the pedestrian detection model through the following operations:
acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points;
inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
determining a sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and sending the sample feature vector to the regression layer;
regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points;
determining the difference between the sample pedestrian bounding box and sample pedestrian key points and the corresponding standard ones;
when the difference is not less than the preset difference threshold, adjusting the first and second model parameters according to the difference and returning to the step of inputting each sample pedestrian image into the feature extraction layer;
when the difference is less than the preset difference threshold, determining that training of the pedestrian detection model is complete.
Optionally, when regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected, the detection module:
regresses the feature vector to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate pedestrian bounding box;
selects the pedestrian bounding box and pedestrian key points in the image to be detected from them according to a non-maximum suppression algorithm.
Optionally, when selecting according to the non-maximum suppression algorithm, the detection module:
determines the connecting line between the candidate pedestrian key points in each candidate pedestrian bounding box;
generates the virtual box corresponding to the candidate pedestrian key points, with a pre-trained target width as its width and the connecting line as its height;
filters the virtual boxes according to the non-maximum suppression algorithm, and takes the candidate pedestrian bounding boxes corresponding to the retained virtual boxes, and the candidate pedestrian key points within them, as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
As can be seen from the above, with the method and device for detecting the 3D position of a pedestrian and the vehicle-mounted terminal provided by the embodiments of the present invention, the pedestrian bounding box and pedestrian key points in the image to be detected can be detected by the pedestrian detection model, and the 3D position of the pedestrian in the world coordinate system can be determined according to the determined bounding box and key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system. The embodiments of the present invention can detect the pedestrian bounding box and pedestrian key points in the image simultaneously with one pedestrian detection model; when the bounding box has no grounding point, the combination of bounding box and key points can be used to determine the pedestrian's 3D position more accurately.
The innovations of the embodiments of the present invention include:
1. For each image to be detected, the pedestrian detection model detects the pedestrian bounding box and pedestrian key points from the image in a single pass; when the bounding box has no grounding point, the key points can be combined to determine the pedestrian's 3D position, improving the accuracy of the 3D position.
2. When the pedestrian bounding box has no grounding point, the height from the pedestrian key points to the soles of the feet can be determined from the proportional relationship between the key points and the parts of the human body, and the grounding point corresponding to the bounding box can then be determined. Once the pedestrian's grounding point in the image to be detected is determined, the 3D position can be determined, which improves the accuracy of the pedestrian's 3D position.
3. In the process of the pedestrian detection model detecting pedestrian bounding boxes and key points, non-maximum suppression is performed on the pedestrian key points, so that even for multiple mutually occluding pedestrians the bounding box and key points of each pedestrian can be determined more accurately, which in turn improves the accuracy of the determined pedestrian 3D positions.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of the method for detecting the 3D position of a pedestrian according to an embodiment of the present invention;
Fig. 2 is a reference flowchart of detecting an image to be detected according to an embodiment of the present invention;
Fig. 3A is a schematic flowchart of the detection process of the pedestrian detection model according to an embodiment of the present invention;
Fig. 3B is a schematic diagram of performing non-maximum suppression according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the device for detecting the 3D position of a pedestrian according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the vehicle-mounted terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that the terms "comprising" and "having" in the embodiments and drawings of the present invention, and any variants thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units not listed, or optionally also includes other steps or units inherent to the process, method, product or device.
The embodiments of the present invention disclose a method and device for detecting the 3D position of a pedestrian, and a vehicle-mounted terminal, which can determine the 3D (3 Dimensions) position of a pedestrian accurately even when the pedestrian bounding box has no grounding point. The embodiments of the present invention are described in detail below.
Fig. 1 is a schematic flowchart of the method for detecting the 3D position of a pedestrian according to an embodiment of the present invention. The method is applied to an electronic device, which may be an ordinary computer, a server, a smart mobile device or the like, or a vehicle-mounted terminal installed in a vehicle. The method specifically includes the following steps.
S110: acquire an image to be detected, collected by an image collection device in a vehicle.
The image collection device may be an ordinary camera, a surveillance camera or a driving recorder, and may be installed on the inside of the front windshield of the vehicle or on the inside of the rear windshield.
The image to be detected contains pedestrians and the background region beyond them. It may contain one or more pedestrians; a pedestrian may be far from or close to the vehicle; the image may or may not contain the pedestrian's grounding point, which may be occluded by the vehicle or by other obstacles.
Here, the grounding point can be understood as the point where the pedestrian touches the road.
S120: input the image to be detected into a pedestrian detection model, and detect the pedestrian bounding box and pedestrian key points in the image to be detected by the pedestrian detection model.
The pre-trained pedestrian detection model is able to associate the image to be detected with the pedestrian bounding box and pedestrian key points. It includes a feature extraction layer and a regression layer, and may be obtained in advance by training with a machine learning algorithm on sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points. The pedestrian detection model may be a neural network model in deep learning.
When the pedestrian detection model detects the pedestrian bounding box and pedestrian key points in the image to be detected, this may specifically include: determining a feature vector of the image through the trained first model parameters in the feature extraction layer, and regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected.
The pedestrian bounding box and pedestrian key points of each pedestrian in the image are associated, and every pedestrian has both. The pedestrian bounding box can be understood as the rectangle enclosing all pixels of the pedestrian's body region; it may be expressed by the coordinates of diagonal vertices of the rectangle, and may also include the coordinates of the center point of the bounding box.
The pedestrian key points may include waist key points, shoulder key points, arm key points, head key points, leg key points and so on. Since the feet and legs of the human body are easily occluded by vehicles and other objects, the waist key point and the shoulder key point may be chosen, for example the waist center point as the waist key point and the shoulder center point as the shoulder key point.
A key point may be occluded and undetectable, but when the bounding box is determined, the key point can be determined from the bounding box. Therefore, in the sample pedestrian images, the positions of the standard key points and the visibility of the key points can be labeled according to the position of the bounding box, so that the trained pedestrian detection model can likewise determine the key points and their visibility.
For example, after detecting an image to be detected, the pedestrian detection model may output the following results: the coordinates and visibility of the pedestrian's shoulder center point and waist center point, and the coordinates of the diagonal vertices of the pedestrian bounding box.
S130: determine the 3D position, in the world coordinate system where the vehicle is located, of the pedestrian in the image to be detected, according to the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
The image coordinate system is the coordinate system of the image to be detected. The world coordinate system is a three-dimensional coordinate system; it may take the center point of the vehicle as the origin, the direction of travel of the vehicle as the X axis, and the upward direction perpendicular to the top surface of the vehicle as the Z axis.
The determined bounding box and key points are both quantities in the image coordinate system; from them the pedestrian's grounding point in the image can be determined, and via the conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located, the position of the grounding point can be converted into a 3D position in the world coordinate system. This 3D position expresses the pedestrian's distance from the vehicle along each coordinate axis.
The conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located can be determined from the conversion relationship between the image coordinate system and the camera coordinate system, together with the conversion relationship between the camera coordinate system and the world coordinate system. The camera coordinate system is the three-dimensional coordinate system of the image collection device; it may take the optical center of the photosensitive element of the image collection device as its origin and the optical axis as its Z axis. From the intrinsic matrix of the image collection device, the conversion relationship between the image coordinate system and the camera coordinate system can be obtained.
For example, the intrinsic matrix may be

    K = | f_u  s    u_0 |
        | 0    f_v  v_0 |
        | 0    0    1   |

where s is the skew of the optical axis, f_u and f_v are the focal lengths of the photosensitive element, and u_0 and v_0 are the distances from the origin of the image coordinate system to its center point, which may also be taken as half the length and half the width of the image to be detected. u and v are the two coordinate axes of the image coordinate system.
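By way of illustration only, the following is a minimal Python sketch of back-projecting a grounding-point pixel onto the road surface through such an intrinsic matrix. It assumes a pinhole camera with zero skew, an optical axis parallel to a flat road, and a known camera height above the ground; every name and numeric value here is an assumption made for the example, not a parameter given in this disclosure.

    import numpy as np

    # Illustrative intrinsics for a 1920x1080 image (assumed values).
    f_u, f_v = 1000.0, 1000.0   # focal lengths of the photosensitive element, in pixels
    u_0, v_0 = 960.0, 540.0     # half the image length and width
    cam_height = 1.3            # assumed camera height above the road, in meters

    def ground_point_to_world(u, v):
        """Convert a grounding-point pixel (u, v) to a position on the road.

        Returns (X, Y): X is the distance ahead of the vehicle along its
        direction of travel, Y the lateral offset, both in meters.
        """
        x_n = (u - u_0) / f_u   # normalized camera coordinates of the pixel ray
        y_n = (v - v_0) / f_v
        if y_n <= 0:
            raise ValueError("pixel at or above the horizon never meets the road")
        t = cam_height / y_n    # ray length at which it intersects the ground plane
        return np.array([t, -t * x_n])

    print(ground_point_to_world(980.0, 700.0))   # roughly [8.13, -0.16]

In a real system the rotation between the camera coordinate system and the world coordinate system would enter this computation as well; the identity rotation used here is purely a simplifying assumption.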
As can be seen from the above, in this embodiment the pedestrian detection model can detect the pedestrian bounding box and pedestrian key points in the image to be detected, and the 3D position of the pedestrian in the world coordinate system can be determined according to the determined bounding box and key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system. This embodiment detects the bounding box and key points simultaneously with one model; when the bounding box has no grounding point, the combination of bounding box and key points determines the pedestrian's 3D position more accurately.
To determine the 3D position of a pedestrian, in one alternative scheme a first network model could be trained for detecting pedestrian bounding boxes and a second network model for detecting pedestrian key points: when the first model finds that a detected bounding box has no grounding point, the image is fed into the second model, which detects the key points, and box and key points are combined to determine the pedestrian's 3D position. However, this scheme requires feeding the image into a network model twice and detecting the image twice, and two network models must be trained beforehand, so the overall processing efficiency is low. Compared with it, the embodiment shown in Fig. 1 detects the image once and outputs the bounding box and key points simultaneously, which saves running time to a certain extent and improves detection efficiency.
In another embodiment of the present invention, based on the embodiment shown in Fig. 1, the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image to be detected.
In this embodiment, step S130, determining the 3D position of the pedestrian in the world coordinate system according to the determined bounding box and key points and the predetermined conversion relationship, may specifically include the following steps 1a to 3a.
Step 1a: judge whether the pedestrian bounding box has a grounding point; if so, execute step 2a; if not, execute step 3a.
This judgment can be made from the information output by the pedestrian detection model.
Step 2a: determine the 3D position of the pedestrian in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
In this step, the 3D position of the grounding point of the bounding box in the world coordinate system can be determined via the conversion relationship between the image coordinate system and the world coordinate system; this 3D position is the 3D position of the pedestrian in the image to be detected in the world coordinate system.
Alternatively, via the conversion relationship, the 3D positions in the world coordinate system of the grounding point of the bounding box, the head vertex of the bounding box and the points representing the body width may be determined, and the stereoscopic enclosing box of the human body formed by these 3D positions taken as the 3D position of the pedestrian in the world coordinate system.
Step 3a: determine the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
The relative position between the bounding box and the key points may include the distance from a key point to the top of the bounding box, the distance from a key point to a side of the bounding box, and so on; for example, it may include the distance from the shoulder key point to the top of the head, or from the waist key point to the top of the head.
In summary, this embodiment can judge whether the pedestrian bounding box has a grounding point, determine the 3D position directly from the grounding point when it exists, and from the relative position between bounding box and key points when it does not. Handling the different cases differently improves overall computational efficiency.
In another embodiment of the present invention, based on the embodiment shown in Fig. 1, step 3a, determining the 3D position of the pedestrian in the world coordinate system from the determined relative position between the pedestrian bounding box and the pedestrian key points and the predetermined conversion relationship, may specifically include steps 3a-1 to 3a-4.
Step 3a-1: determine a first height between the pedestrian key points and the upper edge of the pedestrian bounding box.
The upper edge can be understood as the top side of the rectangle of the pedestrian bounding box; the first height can be understood as the vertical distance, in the image to be detected, between the key points and this upper edge.
Step 3a-2: predict a second height between the pedestrian key points and the soles of the pedestrian corresponding to the bounding box, according to a preset proportional relationship between the key points and the top of the head and the soles of the feet of a human body, together with the first height.
The proportional relationship between the key points and the top of the head and the soles of the feet may be statistics obtained in advance from a large number of human body samples, for example the proportions from the shoulder center to the head top and to the soles, and from the waist center to the head top and to the soles.
When the bounding box has no grounding point, it may contain only the upper body, or the region excluding the feet. To determine the pedestrian's 3D position in the world coordinate system, the second height between the key points and the soles corresponding to the bounding box can be predicted, and the position of the grounding point in the image determined from this second height.
Step 3a-3: determine the grounding point corresponding to the pedestrian bounding box in the image to be detected according to the second height.
In this step, the position extended downward from the coordinates of the key points by the second height may be taken directly as the grounding point corresponding to the bounding box; or the second height may first be scaled according to a scaling relationship, determined beforehand by actual measurement, between different coordinate intervals of the image and the real space, and the grounding point obtained by extending downward from the key points by the processed second height.
Step 3a-4: determine the 3D position of the pedestrian in the world coordinate system according to the determined grounding point and the predetermined conversion relationship between the image coordinate system and the world coordinate system. For the specific implementation of this step, refer to the description of step 2a.
In summary, in this embodiment, when the pedestrian bounding box has no grounding point, the height from the key points to the soles can be determined from the proportional relationship between the key points and the parts of the human body, and the grounding point corresponding to the bounding box determined from it. Once the pedestrian's grounding point in the image to be detected is determined, the 3D position can be determined, which improves the accuracy of the pedestrian's 3D position.
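The following is a minimal Python sketch of steps 3a-1 to 3a-3 for a shoulder key point. The anthropometric ratio below is an assumed statistic standing in for the preset proportional relationship described above; it is not a value given in this disclosure.

    # Assumed statistic: the shoulder center sits about 18% of total body
    # height below the top of the head (illustrative, not from the patent).
    SHOULDER_RATIO = 0.18

    def estimate_ground_point_row(keypoint_v, box_top_v):
        """Predict the image row of the occluded grounding point.

        keypoint_v : image row (v coordinate) of the shoulder key point
        box_top_v  : image row of the upper edge of the pedestrian bounding box
        """
        first_height = keypoint_v - box_top_v            # key point to head top
        # rescale the key-point-to-head distance into key-point-to-sole distance
        second_height = first_height * (1.0 - SHOULDER_RATIO) / SHOULDER_RATIO
        return keypoint_v + second_height                # image rows grow downward

    print(estimate_ground_point_row(keypoint_v=300.0, box_top_v=240.0))  # ~573.3

The predicted row, together with the horizontal position of the key points, then plays the role of the grounding point in step 3a-4.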
In another embodiment of the present invention, based on the embodiment shown in Fig. 1, the pedestrian detection model may be trained through the following steps 1b to 6b.
Step 1b: acquire multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points. The standard bounding boxes and key points can be regarded as ground-truth values.
In practical applications, a large number of sample pedestrian images may be acquired so that the model is trained more accurately. A sample image may contain one or more pedestrians, together with the background region beyond them.
The sample pedestrian images may be collected in advance with a camera on a vehicle. Each sample image is labeled with a standard pedestrian bounding box and with information on whether a grounding point exists; the labeled standard key points may include the key point coordinates and the visibility of the key points.
Step 2b: input each sample pedestrian image into the feature extraction layer of the pedestrian detection model.
Step 3b: determine the sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and send the sample feature vector to the regression layer of the pedestrian detection model.
The functions of the feature extraction layer and the regression layer may each be implemented with different convolutional layers. The sample feature vector may be expressed in the form of a feature matrix. The initial values of the first model parameters may be preset empirically, for example to small values; in each pass of training the first model parameters are continually corrected and gradually approach the true values.
Step 4b: regress the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points in the sample image.
The initial values of the second model parameters may likewise be preset empirically, for example to small values; in each pass of training the second model parameters are continually corrected and gradually approach the true values.
During training, the obtained sample bounding boxes and sample key points may not be accurate enough; they serve as the reference for correcting the first and second model parameters.
Step 5b: determine the difference between the sample pedestrian bounding box and sample pedestrian key points and the corresponding standard pedestrian bounding box and standard pedestrian key points.
The difference may be determined with a loss function; the difference between the sample and standard bounding boxes and the difference between the sample and standard key points may be determined separately.
Step 6b: when the difference is not less than a preset difference threshold, adjust the first and second model parameters according to the difference and return to step 2b; when the difference is less than the preset difference threshold, determine that training of the pedestrian detection model is complete.
When returning to step 2b, other sample pedestrian images may be input into the feature extraction layer for the next round of learning.
In this embodiment, once the difference is determined, it can be judged whether it is less than the preset difference threshold. When it is not, the model's predictions are considered to deviate considerably from the standard values and the network needs further training. When adjusting the first and second model parameters according to the difference, the specific value and the direction of change of the difference may be consulted, and the parameters adjusted in the opposite direction according to that value.
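As a concrete illustration of this loop, here is a toy, runnable PyTorch sketch of steps 1b to 6b. The two-layer network, the random stand-in data and the threshold value are all assumptions chosen for brevity; they do not reproduce the actual model of this disclosure.

    import torch
    import torch.nn as nn

    # "First model parameters": a tiny stand-in feature extraction layer.
    feature_layer = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1),
                                  nn.ReLU(), nn.Flatten())
    # "Second model parameters": a stand-in regression layer producing
    # 4 bounding-box values plus 2 key points (x, y each).
    regression_layer = nn.Linear(8 * 32 * 32, 8)
    optimizer = torch.optim.SGD(list(feature_layer.parameters()) +
                                list(regression_layer.parameters()), lr=1e-3)

    images = torch.rand(16, 3, 64, 64)   # sample pedestrian images (random stand-ins)
    targets = torch.rand(16, 8)          # standard boxes and standard key points

    THRESHOLD = 0.05                     # preset difference threshold (assumed)
    for step in range(1000):
        features = feature_layer(images)             # step 3b: sample feature vectors
        predictions = regression_layer(features)     # step 4b: sample boxes + key points
        difference = nn.functional.l1_loss(predictions, targets)   # step 5b
        if difference.item() < THRESHOLD:            # step 6b: training complete
            break
        optimizer.zero_grad()
        difference.backward()                        # adjust both parameter sets
        optimizer.step()
    print(f"stopped at step {step}, difference {difference.item():.4f}")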
In another embodiment of the present invention, based on the embodiment shown in Fig. 1, in step S120 the step of regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected includes:
Step 1c: regress the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate bounding box.
When the feature vector is regressed directly with the second model parameters, a large number of bounding boxes and key points are obtained as candidate pedestrian bounding boxes and candidate pedestrian key points. To filter this large set and eliminate redundant detections, NMS (Non-Maximum Suppression) may be applied to them.
Step 2c: select the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate bounding boxes and their candidate key points according to the non-maximum suppression algorithm.
Since the candidate bounding boxes and candidate key points are associated with each other, filtering may be based on the overlap between the candidate bounding boxes: for example, the intersection-over-union (i.e. the overlap) between two candidate boxes may be determined, and for candidate boxes whose IoU exceeds a preset value, the lower-scoring candidate box and its corresponding candidate key points are removed. The score is the confidence score.
In crowded scenes with many pedestrians, the spacing between pedestrians is very small and they often occlude one another. In that case the IoU between their candidate bounding boxes is rather large and exceeds the preset IoU threshold, so the bounding boxes of some pedestrians are cleared away, and several mutually occluding pedestrians may yield only a single detected pair of bounding box and key points.
In order to detect every pedestrian in the image as far as possible and improve the recall rate of the algorithm, in another embodiment of the present invention step 2c may be implemented as follows, specifically including steps 2c-1 to 2c-3.
Step 2c-1: determine the connecting line between the candidate pedestrian key points in each candidate pedestrian bounding box.
For example, when the candidate key points include the shoulder center point and the waist center point, the shoulder center and the waist center in each candidate bounding box may be connected by a line.
Step 2c-2: generate a virtual box corresponding to the candidate key points, with a pre-trained target width as its width and the above connecting line as its height.
The target width is a relatively optimal value determined during the training of the pedestrian detection model. The virtual box can be understood as a rectangle whose height is the connecting line and whose width is the target width. In this way one virtual box is obtained for each pair of candidate bounding box and candidate key points.
Step 2c-3: filter the virtual boxes according to the non-maximum suppression algorithm, and take the candidate bounding boxes corresponding to the retained virtual boxes, and the candidate key points within them, as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
Specifically, this step may determine the IoU between the virtual boxes; for the candidate bounding boxes and key points corresponding to virtual boxes whose IoU exceeds the preset IoU threshold, the lower-scoring candidate box and its candidate key points are removed, and the remaining candidates are taken as the pedestrian bounding boxes and key points in the image to be detected.
This embodiment is explained below with a concrete example. Fig. 3A is a schematic flowchart of the pedestrian detection model detecting an image to be detected and producing its output. The image is input into the feature extraction layer, which determines the feature vector of the image from the first model parameters to obtain a feature map, and inputs the feature map into the regression layer. From the second model parameters the regression layer determines a large number of possible region proposals from the feature map; each proposal includes a score expressing the proposal's confidence, the diagonal vertices of the pedestrian bounding box, the coordinates of the pedestrian key points and the visibility of the key points. These numerous region proposals correspond to the candidate bounding boxes and candidate key points in the above embodiments.
Referring to the left part of Fig. 3B, the bounding boxes of two pedestrians are drawn with dashed rectangles, and the black dots are the shoulder center points and waist center points. When non-maximum suppression is performed on the region proposals using the bounding boxes, and two pedestrians are very close, the IoU between their bounding boxes is very high and one of the pedestrians is easily cleared away. Referring to the right part of Fig. 3B, P1 is a pedestrian's shoulder center point and P2 the waist center point; with the line h connecting P1 and P2 as the height and the target width w as the width, a virtual box (drawn with dash-dot lines) is generated, i.e. the key-point line is dilated laterally. Extending from two key points to a virtual box turns NMS on lines into NMS on poses: the line between the key points is given a virtual width, and NMS is then performed. As the right part of Fig. 3B shows, the IoU between virtual boxes is much smaller than that between the pedestrian bounding boxes, which improves the recall of pedestrians.
After NMS is performed on the region proposals in Fig. 3A, the remaining proposals and the feature vector may both be input into a pooling layer for normalization, finally yielding the output of the model.
In summary, in this embodiment, during the detection of pedestrian bounding boxes and key points by the pedestrian detection model, non-maximum suppression is performed on the pedestrian key points, so that even for multiple mutually occluding pedestrians the bounding box and key points of each pedestrian can be determined more accurately, improving the accuracy of the determined 3D positions.
To determine the target width, σ = h/w may be defined, and a more optimal target width obtained by setting different σ values in the training stage.
During the training of the pedestrian detection model, NMS may also be performed, in the manner of steps 2c-1 to 2c-3 above, on the reference pedestrian bounding boxes and reference pedestrian key points detected from the sample images. During training, the σ value is adjusted continually according to the difference between the reference values and the standard values, finally settling on a relatively optimal σ. A sketch of the virtual-box suppression follows.
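The Python sketch below condenses steps 2c-1 to 2c-3: each candidate's shoulder-waist line is given a virtual width, and ordinary IoU-based suppression is run on the resulting virtual boxes. The target width and the IoU threshold are assumed values for illustration, not values given in this disclosure.

    import numpy as np

    TARGET_WIDTH = 20.0    # pre-trained target width w (assumed value, in pixels)
    IOU_THRESHOLD = 0.5    # preset IoU threshold (assumed)

    def virtual_box(shoulder, waist):
        """Rectangle whose height is the shoulder-waist line, width TARGET_WIDTH."""
        cx = (shoulder[0] + waist[0]) / 2.0
        top, bottom = min(shoulder[1], waist[1]), max(shoulder[1], waist[1])
        return np.array([cx - TARGET_WIDTH / 2, top, cx + TARGET_WIDTH / 2, bottom])

    def iou(a, b):
        """Intersection over union of two [x1, y1, x2, y2] rectangles."""
        x1, y1 = np.maximum(a[:2], b[:2])
        x2, y2 = np.minimum(a[2:], b[2:])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    def keypoint_nms(candidates):
        """candidates: list of (score, shoulder_xy, waist_xy, bbox) tuples;
        keeps a candidate only if its virtual box barely overlaps kept ones."""
        kept = []
        for cand in sorted(candidates, key=lambda c: c[0], reverse=True):
            box = virtual_box(cand[1], cand[2])
            if all(iou(box, virtual_box(k[1], k[2])) <= IOU_THRESHOLD for k in kept):
                kept.append(cand)
        return kept

Because the virtual boxes are far narrower than full pedestrian bounding boxes, two closely spaced pedestrians produce a much smaller IoU and both survive suppression, which is exactly the recall benefit described above.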
To obtain the pedestrian detection model more quickly, transfer learning may be adopted: an existing deep convolutional neural network that has achieved good results in pedestrian detection, such as Faster R-CNN, is taken; the number of output categories and the structure of any other parts that need modification are changed accordingly, and the parameters already fully trained in the original network are used directly as the model parameters.
In another implementation, the pedestrian detection model may further include a pooling layer and a fully connected layer. After the regression layer regresses the sample feature vector with the second model parameters, the sample bounding boxes and sample key points are obtained; the sample feature vector together with the sample bounding boxes and key points is input into the pooling layer, which may normalize them, and the normalized result is input into the fully connected layer. The fully connected layer may map the normalized sample bounding boxes and key points to obtain the output of the model.
Referring to Fig. 3A, in the training stage, when regressing the coordinates of the pedestrian key points, the transformation vector of a key point may be computed according to the following formulas:

    d_x = (g_x - P_x) / P_width
    d_y = (g_y - P_y) / P_height

where g_x and g_y are the two components of the standard pedestrian key point, P_x and P_y are the two components of the pedestrian key point in the region proposal, P_width and P_height are the width and height of the pedestrian bounding box in the region proposal, and d_x and d_y express the mapping, computed in each training pass, between the standard key point and the key point in the region proposal. Denoting by ĝ_x and ĝ_y the coordinate components of the reference pedestrian key point, the difference between ĝ and g can be used as the loss function during training; by adjusting the first and second model parameters the loss is reduced continually, and better d_x and d_y are learned.
After the pedestrian detection model is trained, the reference key point can be converted from the d_x and d_y obtained in the training stage and the information in the region proposal:

    ĝ_x = P_x + P_width · d_x
    ĝ_y = P_y + P_height · d_y

and ĝ is the pedestrian key point output by the pedestrian detection model.
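The following small numeric sketch illustrates the encoding and decoding of this key-point transformation vector; the proposal and label coordinates are made-up numbers for the example.

    def encode(g, p_xy, p_wh):
        """d_x = (g_x - P_x) / P_width, d_y = (g_y - P_y) / P_height."""
        return ((g[0] - p_xy[0]) / p_wh[0], (g[1] - p_xy[1]) / p_wh[1])

    def decode(d, p_xy, p_wh):
        """g_hat = P + P_size * d: the key point the trained model outputs."""
        return (p_xy[0] + p_wh[0] * d[0], p_xy[1] + p_wh[1] * d[1])

    d = encode(g=(210.0, 318.0), p_xy=(200.0, 300.0), p_wh=(80.0, 180.0))
    print(d)                                          # (0.125, 0.1)
    print(decode(d, (200.0, 300.0), (80.0, 180.0)))   # (210.0, 318.0)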
Fig. 4 is a schematic structural diagram of the device for detecting the 3D position of a pedestrian according to an embodiment of the present invention. The device is applied to an electronic device, and this device embodiment corresponds to the method embodiment shown in Fig. 1. The device includes:
an acquisition module 410 configured to acquire an image to be detected, collected by an image collection device in a vehicle;
a detection module 420 configured to input the image to be detected into a pedestrian detection model and detect the pedestrian bounding box and pedestrian key points in the image by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image with the pedestrian bounding box and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; the feature vector of the image is determined through the trained first model parameters in the feature extraction layer, and regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points;
a determination module 430 configured to determine the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image, according to the determined bounding box and key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
In another embodiment of the present invention, based on the embodiment shown in Fig. 4, the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image; the determination module 430 is specifically configured to:
judge whether the pedestrian bounding box has a grounding point;
if so, determine the 3D position of the pedestrian in the world coordinate system according to the grounding point of the bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system;
if not, determine the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the bounding box and the key points, and the predetermined conversion relationship.
In another embodiment of the present invention, based on the embodiment shown in Fig. 4, when determining the 3D position from the determined relative position between the bounding box and the key points and the predetermined conversion relationship, the determination module 430:
determines a first height between the key points and the upper edge of the bounding box;
predicts a second height between the key points and the soles of the pedestrian corresponding to the bounding box, according to the preset proportional relationship between the key points and the top of the head and the soles of the feet of a human body, together with the first height;
determines the grounding point corresponding to the bounding box in the image according to the second height;
determines the 3D position of the pedestrian in the world coordinate system according to the determined grounding point and the predetermined conversion relationship.
In another embodiment of the present invention, based on the embodiment shown in Fig. 4, the device further includes a training module (not shown in the figure) configured to train the pedestrian detection model through the following operations:
acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points;
inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
determining the sample feature vector of the sample image through the first model parameters in the feature extraction layer and sending it to the regression layer;
regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points;
determining the difference between the sample bounding box and key points and the corresponding standard ones;
when the difference is not less than the preset difference threshold, adjusting the first and second model parameters according to the difference and returning to the step of inputting each sample pedestrian image into the feature extraction layer;
when the difference is less than the preset difference threshold, determining that training is complete.
In another embodiment of the present invention, based on the embodiment shown in Fig. 4, when regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and key points, the detection module 420:
regresses the feature vector to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate bounding box;
selects the pedestrian bounding box and key points in the image from them according to the non-maximum suppression algorithm.
In another embodiment of the present invention, based on the embodiment shown in Fig. 4, when selecting according to the non-maximum suppression algorithm, the detection module 420:
determines the connecting line between the candidate key points in each candidate bounding box;
generates the virtual box corresponding to the candidate key points, with the pre-trained target width as its width and the connecting line as its height;
filters the virtual boxes according to the non-maximum suppression algorithm and takes the candidate bounding boxes corresponding to the retained virtual boxes, and their candidate key points, as the pedestrian bounding boxes and key points in the image.
The above device embodiment corresponds to the method embodiment and has the same technical effects; for the specific description, refer to the method embodiment. The device embodiment is obtained on the basis of the method embodiment, to which the reader is referred; it is not repeated here.
Fig. 5 is a schematic structural diagram of the vehicle-mounted terminal according to an embodiment of the present invention. The vehicle-mounted terminal includes a processor 510 and an image collection device 520; the processor 510 includes an acquisition module 11, a detection module 12 and a determination module 13;
the acquisition module 11 is used to acquire an image to be detected, collected by the image collection device 520 in the vehicle;
the detection module 12 is used to input the image to be detected into a pedestrian detection model and detect the pedestrian bounding box and pedestrian key points in the image by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image with the pedestrian bounding box and pedestrian key points; the pedestrian detection model includes a feature extraction layer and a regression layer; the feature vector of the image is determined through the trained first model parameters in the feature extraction layer, and regressed through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points;
the determination module 13 is used to determine the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image, according to the determined bounding box and key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
In another embodiment of the present invention, based on the embodiment shown in Fig. 5, the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image; the determination module 13 is specifically used to:
judge whether the pedestrian bounding box has a grounding point;
if so, determine the 3D position of the pedestrian in the world coordinate system according to the grounding point of the bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system;
if not, determine the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the bounding box and the key points, and the predetermined conversion relationship.
In another embodiment of the present invention, based on the embodiment shown in Fig. 5, when determining the 3D position from the determined relative position between the bounding box and the key points and the predetermined conversion relationship, the determination module 13:
determines a first height between the key points and the upper edge of the bounding box;
predicts a second height between the key points and the soles of the pedestrian corresponding to the bounding box, according to the preset proportional relationship between the key points and the top of the head and the soles of the feet of a human body, together with the first height;
determines the grounding point corresponding to the bounding box in the image according to the second height;
determines the 3D position of the pedestrian in the world coordinate system according to the determined grounding point and the predetermined conversion relationship.
In another embodiment of the present invention, based on the embodiment shown in Fig. 5, the processor 510 further includes a training module (not shown in the figure) used to train the pedestrian detection model through the following operations:
acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points;
inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
determining the sample feature vector through the first model parameters in the feature extraction layer and sending it to the regression layer;
regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points;
determining the difference between the sample bounding box and key points and the corresponding standard ones;
when the difference is not less than the preset difference threshold, adjusting the first and second model parameters according to the difference and returning to the step of inputting each sample pedestrian image into the feature extraction layer;
when the difference is less than the preset difference threshold, determining that training is complete.
In another embodiment of the present invention, based on the embodiment shown in Fig. 5, when regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and key points, the detection module 12:
regresses the feature vector to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate bounding box;
selects the pedestrian bounding box and key points in the image from them according to the non-maximum suppression algorithm.
In another embodiment of the present invention, based on the embodiment shown in Fig. 5, when selecting according to the non-maximum suppression algorithm, the detection module 12:
determines the connecting line between the candidate key points in each candidate bounding box;
generates the virtual box corresponding to the candidate key points, with the pre-trained target width as its width and the connecting line as its height;
filters the virtual boxes according to the non-maximum suppression algorithm and takes the candidate bounding boxes corresponding to the retained virtual boxes, and their candidate key points, as the pedestrian bounding boxes and key points in the image.
This terminal embodiment and the method embodiment shown in Fig. 1 are obtained from the same inventive concept, and relevant parts may refer to each other. The terminal embodiment corresponds to the method embodiment and has the same technical effects; for the specific description, refer to the method embodiment.
Those of ordinary skill in the art can understand that the drawings are merely schematic diagrams of one embodiment, and the modules or flows in the drawings are not necessarily required for implementing the present invention.
Those of ordinary skill in the art can understand that the modules in the device of an embodiment may be distributed in the device of the embodiment as described, or may be located, with corresponding changes, in one or more devices different from this embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features replaced by equivalents; such modifications and replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A method for detecting the 3D position of a pedestrian, comprising:
    acquiring an image to be detected, collected by an image collection device in a vehicle;
    inputting the image to be detected into a pedestrian detection model, and detecting the pedestrian bounding box and pedestrian key points in the image to be detected by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image to be detected with the pedestrian bounding box and pedestrian key points; the pedestrian detection model comprises a feature extraction layer and a regression layer; a feature vector of the image to be detected is determined through trained first model parameters in the feature extraction layer, and the feature vector is regressed through trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
    determining the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image to be detected, according to the determined pedestrian bounding box and pedestrian key points and a predetermined conversion relationship between the image coordinate system and the world coordinate system.
  2. The method according to claim 1, wherein the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image to be detected;
    the step of determining the 3D position of the pedestrian in the world coordinate system according to the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship comprises:
    judging whether the pedestrian bounding box has a grounding point;
    if so, determining the 3D position of the pedestrian in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located;
    if not, determining the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship.
  3. The method according to claim 2, wherein the step of determining the 3D position of the pedestrian from the determined relative position between the pedestrian bounding box and the pedestrian key points and the predetermined conversion relationship comprises:
    determining a first height between the pedestrian key points and the upper edge of the pedestrian bounding box;
    predicting a second height between the pedestrian key points and the soles of the pedestrian corresponding to the bounding box, according to a preset proportional relationship between the pedestrian key points and the top of the head and the soles of the feet of a human body, together with the first height;
    determining the grounding point corresponding to the pedestrian bounding box in the image to be detected according to the second height;
    determining the 3D position of the pedestrian in the world coordinate system according to the determined grounding point and the predetermined conversion relationship.
  4. The method according to claim 1, wherein the pedestrian detection model is trained as follows:
    acquiring multiple sample pedestrian images and labeled standard pedestrian bounding boxes and standard pedestrian key points;
    inputting each sample pedestrian image into the feature extraction layer of the pedestrian detection model;
    determining a sample feature vector of the sample pedestrian image through the first model parameters in the feature extraction layer, and sending the sample feature vector to the regression layer of the pedestrian detection model;
    regressing the sample feature vector through the second model parameters in the regression layer to obtain the sample pedestrian bounding box and sample pedestrian key points in the sample pedestrian image;
    determining the difference between the sample pedestrian bounding box and sample pedestrian key points and the corresponding standard pedestrian bounding box and standard pedestrian key points;
    when the difference is not less than a preset difference threshold, adjusting the first model parameters and the second model parameters according to the difference, and returning to the step of inputting each sample pedestrian image into the feature extraction layer;
    when the difference is less than the preset difference threshold, determining that training of the pedestrian detection model is complete.
  5. The method according to claim 1, wherein the step of regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected comprises:
    regressing the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate pedestrian bounding box;
    selecting the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and their candidate pedestrian key points according to a non-maximum suppression algorithm.
  6. The method according to claim 5, wherein the step of selecting the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and their candidate pedestrian key points according to the non-maximum suppression algorithm comprises:
    determining the connecting line between the candidate pedestrian key points in each candidate pedestrian bounding box;
    generating a virtual box corresponding to the candidate pedestrian key points, with a pre-trained target width as its width and the connecting line as its height;
    filtering each virtual box according to the non-maximum suppression algorithm, and taking the candidate pedestrian bounding boxes corresponding to the retained virtual boxes, and the candidate pedestrian key points within them, as the pedestrian bounding boxes and pedestrian key points in the image to be detected.
  7. A device for detecting the 3D position of a pedestrian, comprising:
    an acquisition module configured to acquire an image to be detected, collected by an image collection device in a vehicle;
    a detection module configured to input the image to be detected into a pedestrian detection model and detect the pedestrian bounding box and pedestrian key points in the image to be detected by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image to be detected with the pedestrian bounding box and pedestrian key points; the pedestrian detection model comprises a feature extraction layer and a regression layer; a feature vector of the image to be detected is determined through trained first model parameters in the feature extraction layer, and the feature vector is regressed through trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
    a determination module configured to determine the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image to be detected, according to the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
  8. The device according to claim 7, wherein the pedestrian detection model also outputs information on whether a grounding point exists in the pedestrian bounding box of the image to be detected;
    the determination module is specifically configured to:
    judge whether the pedestrian bounding box has a grounding point;
    if so, determine the 3D position of the pedestrian in the world coordinate system according to the grounding point of the pedestrian bounding box and the predetermined conversion relationship between the image coordinate system and the world coordinate system where the vehicle is located;
    if not, determine the 3D position of the pedestrian in the world coordinate system according to the determined relative position between the pedestrian bounding box and the pedestrian key points, and the predetermined conversion relationship.
  9. A vehicle-mounted terminal, comprising a processor and an image collection device, the processor comprising an acquisition module, a detection module and a determination module;
    the acquisition module is used to acquire an image to be detected, collected by the image collection device in the vehicle;
    the detection module is used to input the image to be detected into a pedestrian detection model and detect the pedestrian bounding box and pedestrian key points in the image to be detected by the pedestrian detection model, wherein the pre-trained pedestrian detection model is able to associate the image to be detected with the pedestrian bounding box and pedestrian key points; the pedestrian detection model comprises a feature extraction layer and a regression layer; a feature vector of the image to be detected is determined through trained first model parameters in the feature extraction layer, and the feature vector is regressed through trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected;
    the determination module is used to determine the three-dimensional (3D) position, in the world coordinate system where the vehicle is located, of the pedestrian in the image to be detected, according to the determined pedestrian bounding box and pedestrian key points and the predetermined conversion relationship between the image coordinate system and the world coordinate system.
  10. The terminal according to claim 9, wherein when regressing the feature vector through the trained second model parameters in the regression layer to obtain the pedestrian bounding box and pedestrian key points in the image to be detected, the detection module:
    regresses the feature vector through the trained second model parameters in the regression layer to obtain multiple candidate pedestrian bounding boxes and the candidate pedestrian key points within each candidate pedestrian bounding box;
    selects the pedestrian bounding box and pedestrian key points in the image to be detected from the multiple candidate pedestrian bounding boxes and their candidate pedestrian key points according to the non-maximum suppression algorithm.
PCT/CN2019/108075 2019-05-30 2019-09-26 一种行人3d位置的检测方法及装置、车载终端 WO2020237942A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910460564.6A CN110956069B (zh) 2019-05-30 2019-05-30 Method and device for detecting the 3D position of a pedestrian, and vehicle-mounted terminal
CN201910460564.6 2019-05-30

Publications (1)

Publication Number Publication Date
WO2020237942A1 true WO2020237942A1 (zh) 2020-12-03

Family

ID=69975483

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/108075 WO2020237942A1 (zh) 2019-05-30 2019-09-26 Method and device for detecting the 3D position of a pedestrian, and vehicle-mounted terminal

Country Status (2)

Country Link
CN (1) CN110956069B (zh)
WO (1) WO2020237942A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597959B (zh) * 2020-05-12 2023-09-26 盛景智能科技(嘉兴)有限公司 Behavior detection method and device, and electronic device
CN113139504B (zh) * 2021-05-11 2023-02-17 支付宝(杭州)信息技术有限公司 Identity recognition method and device, equipment, and storage medium
CN113246931B (zh) * 2021-06-11 2021-09-28 创新奇智(成都)科技有限公司 Vehicle control method and device, electronic device, and storage medium
CN115861316B (zh) * 2023-02-27 2023-09-29 深圳佑驾创新科技股份有限公司 Training method and device for a pedestrian detection model, and pedestrian detection method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616277A (zh) * 2013-11-01 2015-05-13 深圳中兴力维技术有限公司 Pedestrian positioning method and device in video structured description
CN105631440A (zh) * 2016-02-22 2016-06-01 清华大学 Joint detection method for vulnerable road users
WO2017065883A1 (en) * 2015-10-14 2017-04-20 Qualcomm Incorporated Systems and methods for producing an image visualization
CN109726627A (zh) * 2018-09-29 2019-05-07 初速度(苏州)科技有限公司 Neural network model training and universal ground line detection method
CN109766868A (zh) * 2019-01-23 2019-05-17 哈尔滨工业大学 Real-scene occluded pedestrian detection network based on body key point detection, and detection method thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596058A (zh) * 2018-04-11 2018-09-28 西安电子科技大学 Driving obstacle ranging method based on computer vision
CN109145756A (zh) * 2018-07-24 2019-01-04 湖南万为智能机器人技术有限公司 Target detection method based on machine vision and deep learning
CN109285190B (zh) * 2018-09-06 2021-06-04 广东天机工业智能系统有限公司 Object positioning method and device, electronic device, and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220101038A1 (en) * 2020-09-28 2022-03-31 Rakuten Group, Inc., Information processing device, information processing method, and storage medium
US11861879B2 (en) * 2020-09-28 2024-01-02 Rakuten Group, Inc. Information processing device, information processing method, and storage medium
CN112801880A (zh) * 2021-03-08 2021-05-14 广州敏视数码科技有限公司 Method for fused display of vehicle-mounted panoramic image imaging and target detection
CN112801880B (zh) * 2021-03-08 2024-06-07 广州敏视数码科技有限公司 Method for fused display of vehicle-mounted panoramic image imaging and target detection
CN114475577A (zh) * 2021-12-17 2022-05-13 斑马网络技术有限公司 Vehicle control method and device, and storage medium
CN114475577B (zh) * 2021-12-17 2023-11-03 斑马网络技术有限公司 Vehicle control method and device, and storage medium
CN116258722A (zh) * 2023-05-16 2023-06-13 青岛奥维特智能科技有限公司 Intelligent bridge construction detection method based on image processing
CN116258722B (zh) * 2023-05-16 2023-08-11 青岛奥维特智能科技有限公司 Intelligent bridge construction detection method based on image processing

Also Published As

Publication number Publication date
CN110956069A (zh) 2020-04-03
CN110956069B (zh) 2022-06-21

Similar Documents

Publication Publication Date Title
WO2020237942A1 (zh) 2020-12-03 Method and device for detecting the 3D position of a pedestrian, and vehicle-mounted terminal
JP5926228B2 (ja) 自律車両用の奥行き検知方法及びシステム
JP6574611B2 (ja) 立体画像に基づいて距離情報を求めるためのセンサシステム
JP4409035B2 (ja) 画像処理装置、特異箇所検出方法、及び特異箇所検出プログラムを記録した記録媒体
WO2019202397A2 (en) Vehicle environment modeling with a camera
CN111259706B (zh) 一种车辆的车道线压线判断方法和系统
WO2022151664A1 (zh) 一种基于单目摄像头的3d物体检测方法
KR20200095384A (ko) 운전자 상태에 따라 맞춤형의 캘리브레이션을 위해 운전자 보조 장치를 자동으로 조정하는 방법 및 장치
JP2004118638A (ja) ステレオ画像処理装置およびステレオ画像処理方法
KR101869266B1 (ko) 극한 심층학습 기반 차선 검출 시스템 및 그 방법
CN111738033B (zh) 基于平面分割的车辆行驶信息确定方法及装置、车载终端
CN110717445A (zh) 一种用于自动驾驶的前车距离跟踪系统与方法
CN111098850A (zh) 一种自动停车辅助系统及自动泊车方法
CN115496923B (zh) 一种基于不确定性感知的多模态融合目标检测方法及装置
Yeol Baek et al. Scene understanding networks for autonomous driving based on around view monitoring system
CN110991264A (zh) 前方车辆检测方法和装置
US6925194B2 (en) Curved lane recognizing method in road modeling system
CN115700796A (zh) 模型生成方法、模型生成装置、非瞬时性存储介质、移动体姿势推定方法及其推定装置
CN110197104B (zh) 基于车辆的测距方法及装置
JP4070450B2 (ja) 前方車両認識装置及び認識方法
JP4106163B2 (ja) 障害物検出装置及びその方法
KR20220151572A (ko) IPM 이미지와 정밀도로지도(HD Map) 피팅을 통해 노면객체의 변화를 자동으로 판단하고 갱신하는 정밀도로지도 자동갱신 방법 및 시스템
CN113569803A (zh) 一种基于多尺度卷积的多模态数据融合车道目标检测的方法及系统
GB2605621A (en) Monocular depth estimation
WO2022186814A1 (en) Vehicle environment modeling with a camera

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19931269

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19931269

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.06.2022)
