WO2022134624A1 - Pedestrian target detection method, electronic device and storage medium


Info

Publication number
WO2022134624A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
detection
detection frame
frame
pedestrian
Application number
PCT/CN2021/113129
Other languages
English (en)
French (fr)
Inventor
邱志明
赵俊
Original Assignee
亿咖通(湖北)技术有限公司
Application filed by 亿咖通(湖北)技术有限公司 filed Critical 亿咖通(湖北)技术有限公司
Publication of WO2022134624A1 publication Critical patent/WO2022134624A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/50 - Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model

Definitions

  • the invention relates to the technical field of image processing, and in particular, to a pedestrian target detection method, an electronic device and a storage medium.
  • a camera device is installed on the vehicle.
  • the electronic device can obtain the image to be detected through the camera device, and then input the image to be detected into a pre-trained pedestrian detection model to determine whether pedestrians are present in the image to be detected.
  • the above-mentioned pedestrian target detection method cannot accurately detect all the pedestrians in the to-be-detected image.
  • the image to be detected often contains only part of the pedestrian, and pedestrian target detection based on such an image may not be able to accurately detect the pedestrian, which in turn can cause danger when the vehicle is moving at a slow speed.
  • the purpose of the embodiments of the present invention is to provide a pedestrian target detection method, an electronic device and a storage medium, so as to improve the accuracy of the pedestrian target detection result and avoid danger during slow driving of the vehicle.
  • the specific technical solutions are as follows:
  • an embodiment of the present invention provides a method for detecting a pedestrian target, the method comprising:
  • for any first detection frame and any second detection frame, determining, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether the first detection frame and the second detection frame identify the same pedestrian;
  • in the case that the first detection frame and the second detection frame identify the same pedestrian, determining a target detection frame based on the first detection frame, wherein the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected;
  • in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determining the target detection frame based on the first detection frame and the second detection frame, respectively.
  • an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
  • the processor is used to implement the steps of the pedestrian target detection method when executing the program stored in the memory.
  • an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the above-mentioned steps of the pedestrian target detection method are implemented.
  • in the solutions provided by the embodiments of the present invention, the electronic device can acquire the image collected by the image acquisition device installed on the vehicle to obtain the first image to be detected; perform pedestrian target detection on the first image to be detected to obtain at least one first detection frame in the first image to be detected, where the first detection frame is used to identify the area occupied by a pedestrian in the first image to be detected; perform pedestrian lower body target detection on the first image to be detected to obtain at least one second detection frame, where the second detection frame is used to identify the area occupied by the pedestrian's lower body in the first image to be detected; for any first detection frame and any second detection frame, determine, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether they identify the same pedestrian; in the case that they identify the same pedestrian, determine the target detection frame based on the first detection frame, where the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected; in the case that they do not identify the same pedestrian, determine the target detection frame based on the first detection frame and the second detection frame, respectively.
  • in this way, the electronic device can obtain a first detection frame by performing pedestrian target detection on the first image to be detected, obtain a second detection frame by performing pedestrian lower body target detection on it, and determine, based on the positions of the first detection frame and the second detection frame, whether they identify the same pedestrian, and then obtain the pedestrian target detection result. Thus, when there is an incomplete pedestrian in the first image to be detected, the electronic device can accurately detect the incomplete pedestrian, which can improve the accuracy of the pedestrian target detection result and avoid danger when the vehicle is driving at a slow speed.
  • FIG. 1 is a flowchart of a method for detecting a pedestrian target according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for determining candidate frames and the suppression attributes of the candidate frames in an embodiment of the present invention;
  • FIG. 3 is a flowchart of a method for determining the first to-be-deduplicated detection frames in an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of a brightness gradient direction histogram in an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of a pedestrian target detection device according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • embodiments of the present invention provide a pedestrian target detection method, device, electronic device, computer-readable storage medium and computer program product. The following first introduces a pedestrian target detection method provided by an embodiment of the present invention.
  • the pedestrian target detection method provided by the embodiment of the present invention can be applied to any electronic device that needs to detect a pedestrian target behind a vehicle, for example, an on-board computer, an image acquisition device, a processor, and the like. For convenience of description, it is hereinafter referred to as an electronic device.
  • an embodiment of the present invention provides a pedestrian target detection method, the method including:
  • S101: acquiring an image collected by an image acquisition device installed on a vehicle to obtain a first image to be detected;
  • S102: performing pedestrian target detection on the first image to be detected to obtain at least one first detection frame in the first image to be detected;
  • the first detection frame is used to identify the area occupied by the pedestrian in the first image to be detected.
  • S103: performing pedestrian lower body target detection on the first image to be detected to obtain at least one second detection frame in the first image to be detected;
  • the second detection frame is used to identify the area occupied by the pedestrian's lower body in the first image to be detected.
  • S104: for any first detection frame and any second detection frame, determining, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether the first detection frame and the second detection frame identify the same pedestrian;
  • S105: in the case that the first detection frame and the second detection frame identify the same pedestrian, determining a target detection frame based on the first detection frame;
  • the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected.
  • S106: in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determining the target detection frame based on the first detection frame and the second detection frame, respectively.
  • in the embodiment of the present invention, the electronic device can obtain the first detection frame by performing pedestrian target detection on the first image to be detected, obtain the second detection frame by performing pedestrian lower body target detection on it, and determine, based on the positions of the first detection frame and the second detection frame, whether they identify the same pedestrian, and then obtain the pedestrian target detection result. In this way, when there is an incomplete pedestrian in the first image to be detected, the electronic device can accurately detect the incomplete pedestrian, which can improve the accuracy of the pedestrian target detection result and avoid danger when the vehicle is driving slowly.
  • the electronic device may acquire the image collected by the image acquisition device installed on the vehicle to obtain the first image to be detected.
  • the image acquisition device installed on the vehicle can collect images near the vehicle, and then send the collected images to the electronic device as the first image to be detected.
  • when the above-mentioned electronic device is itself an image acquisition device, the first image to be detected may be collected by the electronic device directly.
  • the above-mentioned first image to be detected may also be an image acquired by the electronic device when the vehicle is traveling at a slow speed.
  • the above-mentioned image acquisition device can be installed at the front, rear, sides, or other positions of the vehicle.
  • the electronic device can use a pre-trained pedestrian target detection model to perform pedestrian target detection on the first image to be detected.
  • here, a complete pedestrian refers to the whole of a pedestrian, including the head, torso, legs, feet, etc., and the first detection frame is used to identify the area occupied by the pedestrian in the first image to be detected.
  • the specific manner in which the electronic device performs pedestrian target detection on the first image to be detected may be a corresponding manner in the technical field of image processing, which is not specifically limited here, as long as the pedestrian in the first image to be detected can be detected.
  • the above-mentioned first image to be detected may include incomplete pedestrians.
  • for example, when a pedestrian is close to the vehicle, the first image to be detected obtained by the electronic device may include only the pedestrian's lower body image. In this case, when the electronic device uses the pedestrian target detection model to perform pedestrian target detection on the first image to be detected, it is likely that the incomplete pedestrian in the first image to be detected cannot be accurately detected.
  • an ultrasonic sensor can be used to detect pedestrians near the vehicle.
  • the ultrasonic sensor may fail during thunderstorms. Therefore, the use of ultrasonic sensors cannot guarantee the accurate detection of pedestrians within a close range of the vehicle.
  • in view of this, the electronic device can use the pre-trained pedestrian lower body target detection model to perform pedestrian lower body target detection on the first image to be detected, where the detection target is the lower body of the pedestrian, and at least one second detection frame in the first image to be detected can be obtained. Here, the lower body of the pedestrian refers to the part below the pedestrian's waist, and the second detection frame is used to identify the area occupied by the pedestrian's lower body in the first image to be detected.
  • the specific method for the electronic device to perform pedestrian lower body target detection on the first image to be detected may be any corresponding method in the field of image processing technology, which is not specifically limited here, as long as the lower body of the pedestrian in the first image to be detected can be detected.
  • the execution order of the above steps S102 and S103 can be in any order.
  • in the first image to be detected, it may be that only complete pedestrians are detected, or that only the lower bodies of pedestrians are detected, which is not specifically limited here.
  • after obtaining the first detection frame and the second detection frame, the electronic device can determine, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether the first detection frame and the second detection frame identify the same pedestrian.
  • the above-mentioned first detection frame and second detection frame are both rectangular detection frames.
  • after the electronic device obtains the first detection frame and the second detection frame, it can determine, in the image coordinate system of the first image to be detected, the vector (x1, y1, h1, w1) corresponding to the first detection frame and the vector (x2, y2, h2, w2) corresponding to the second detection frame.
  • here, (x1, y1) are the coordinates of the upper left corner vertex of the first detection frame in the image coordinate system of the first image to be detected, and h1 and w1 are the height and width of the first detection frame, respectively;
  • (x2, y2) are the coordinates of the upper left corner vertex of the second detection frame in the image coordinate system of the first image to be detected, and h2 and w2 are the height and width of the second detection frame, respectively.
  • the electronic device can calculate the intersection ratio between the first detection frame and the second detection frame according to the vector (x1, y1, h1, w1) corresponding to the first detection frame and the vector (x2, y2, h2, w2) corresponding to the second detection frame, and then determine, based on the relationship between this intersection ratio and an intersection ratio threshold, whether the first detection frame and the second detection frame identify the same pedestrian.
  • the intersection ratio of the first detection frame and the second detection frame is the ratio between the area of the overlapping part of the two frames and the total area covered by the two frames in the first image to be detected.
  • when the intersection ratio between the first detection frame and the second detection frame is not less than the intersection ratio threshold, it means that the area of the overlapping part of the two frames is large, and the electronic device can determine that the first detection frame and the second detection frame identify the same pedestrian; when the intersection ratio is less than the intersection ratio threshold, it means that the area of the overlapping part is small, and the electronic device can determine that the first detection frame and the second detection frame identify different pedestrians.
  • the intersection ratio threshold can be set according to empirical values.
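  • As an illustration of the intersection ratio described above, the following is a minimal Python sketch (not part of the original patent text); the (x, y, h, w) box convention follows the vectors defined above, and the 0.5 threshold value is an assumption, since the patent only says the threshold is empirical:

    # Intersection ratio (intersection over union) of two detection frames
    # given as (x, y, h, w): top-left corner (x, y), height h, width w.
    def iou(box_a, box_b):
        xa, ya, ha, wa = box_a
        xb, yb, hb, wb = box_b
        # Overlapping rectangle (zero-sized if the frames are disjoint).
        iw = max(0, min(xa + wa, xb + wb) - max(xa, xb))
        ih = max(0, min(ya + ha, yb + hb) - max(ya, yb))
        inter = iw * ih
        union = ha * wa + hb * wb - inter   # total area covered by both frames
        return inter / union if union > 0 else 0.0

    IOU_THRESHOLD = 0.5   # assumed empirical value
    # Same pedestrian if the intersection ratio reaches the threshold:
    same_pedestrian = iou((20, 200, 300, 120), (50, 350, 150, 110)) >= IOU_THRESHOLD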
  • the electronic device can determine the target detection frame based on the first detection frame.
  • the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected.
  • the electronic device may determine the target detection frame based only on the first detection frame, that is, determine the first detection frame as the target detection frame.
  • the electronic device may also determine the target detection frame based on the positions of the first detection frame and the second detection frame.
  • regarding step S106, the first detection frame and the second detection frame may fail to identify the same pedestrian for several reasons. For example, due to errors of the target detection model itself, a complete pedestrian in the first image to be detected may yield only a first detection frame without a corresponding second detection frame, or the second detection frame may not be obtained because part of the pedestrian's lower body is occluded. In such cases, a second detection frame obtained elsewhere in the first image to be detected does not identify the lower body area of the pedestrian identified by that first detection frame, and the second detection frame and the first detection frame correspond to different pedestrians.
  • in addition, each second detection frame identifies the lower body area of some pedestrian in the first image to be detected. When a second detection frame does not match any first detection frame, the area it identifies is likely to be the lower body of an incomplete pedestrian in the first image to be detected, that is, a pedestrian of whom only the lower body is present in the image.
  • the electronic device may determine the target detection frame based on the first detection frame and the second detection frame, respectively.
  • for example, the electronic device may determine the first detection frame as a target detection frame, and at the same time determine the second detection frame as another target detection frame.
  • the electronic device can detect incomplete pedestrians in the first to-be-detected image and avoid missing pedestrians in the first to-be-detected image, thereby improving the accuracy of pedestrian target detection results and making driving safer.
  • in an embodiment, in the case that the first detection frame and the second detection frame do not identify the same pedestrian, the above-mentioned step of determining the target detection frame may include: performing pedestrian head target detection on a second image to be detected, where the second image to be detected is the image within the first detection frame; and, when a pedestrian head is detected in the second image to be detected, using the first detection frame as the target detection frame.
  • for example, the electronic device may perform pedestrian head target detection on the second image to be detected by using a pedestrian head target detection model trained in advance.
  • the above pedestrian head target detection model is obtained by pre-training an initial pedestrian head target detection model on a plurality of image samples containing pedestrian head regions. During training, the parameters of the initial pedestrian head target detection model can be continuously adjusted to make them more suitable, and a pedestrian head target detection model that can accurately detect the pedestrian head area in an image is then obtained.
  • the above pedestrian head target detection model can be a machine learning model such as a deep convolutional neural network, an SVM (support vector machine), or an Adaboost model, and its parameters can be randomly initialized, which is not specifically limited here.
  • when the electronic device does not detect a pedestrian's head in the second image to be detected, it is considered that the area in the first detection frame does not contain a pedestrian's head; the first detection frame probably does not correspond to a pedestrian and is a detection error. Therefore, it can be determined that the first detection frame is not a detection frame corresponding to a pedestrian, and the electronic device can discard the first detection frame during the detection process.
  • when the electronic device detects a pedestrian's head in the second image to be detected, it is considered that a pedestrian's head exists in the area within the first detection frame.
  • although the first detection frame may not include the lower body area of the pedestrian, it includes the head area of the pedestrian, which means that in the first image to be detected, the lower body of the pedestrian corresponding to the first detection frame is likely to be occluded; the first detection frame is therefore still a detection frame corresponding to a pedestrian, and at this time the electronic device can determine the target detection frame based on the first detection frame.
  • in the embodiment of the present invention, in the case where the first detection frame and the second detection frame do not identify the same pedestrian, the electronic device can perform pedestrian head target detection on the second image to be detected, and when a pedestrian head is detected in the first detection frame, use the first detection frame as the target detection frame. In this way, even when the lower body area of a pedestrian in the first image to be detected is occluded, the electronic device can accurately detect the occluded pedestrian by performing head target detection on the second image to be detected.
  • the above-mentioned step of determining a target detection frame based on the first detection frame in the case that the first detection frame and the second detection frame identify the same pedestrian may include:
  • the area identified by the second detection frame is the area occupied by the pedestrian's lower body in the first image to be detected, so the position of the lower boundary of the second detection frame can accurately identify the position of the pedestrian's feet in the first image to be detected. Therefore, in order to identify the position of the pedestrian's feet more accurately, the electronic device can adjust the lower boundary of the first detection frame based on the lower boundary of the second detection frame to obtain a third detection frame, and then use the third detection frame as the target detection frame.
  • specifically, the electronic device may keep the width of the first detection frame unchanged and adjust the lower boundary of the first detection frame to the lower boundary of the second detection frame; that is, the lower boundary of the second detection frame is used as the lower boundary of the third detection frame, and the left, right, and upper boundaries of the first detection frame are used as the left, right, and upper boundaries of the third detection frame, respectively, so that the third detection frame is obtained as the target detection frame.
  • the electronic device adjusts the lower boundary of the first detection frame based on the lower boundary of the second detection frame to obtain the third detection frame, and uses the third detection frame as the target detection frame.
  • the lower boundary of the third detection frame can more accurately identify the position of the pedestrian's foot in the first image to be detected, thereby improving the accuracy of the pedestrian target detection result.
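  • A minimal sketch of this boundary adjustment (illustrative only; it assumes the same (x, y, h, w) convention as above):

    # Build the third detection frame: keep the first frame's left, right and
    # upper boundaries, and take the lower boundary of the second frame.
    def merge_lower_boundary(first_box, second_box):
        x1, y1, h1, w1 = first_box    # frame for the whole pedestrian
        x2, y2, h2, w2 = second_box   # lower-body frame of the same pedestrian
        new_lower = y2 + h2           # y coordinate of the second frame's lower boundary
        return (x1, y1, new_lower - y1, w1)

    # Example: the lower boundary moves from y = 250 down to y = 270.
    target_frame = merge_lower_boundary((100, 50, 200, 80), (105, 180, 90, 70))
    # -> (100, 50, 220, 80)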
  • the above-mentioned step of performing pedestrian target detection on the first image to be detected to obtain at least one first detection frame in the first image to be detected may include:
  • obtaining the image features of the first image to be detected; using a pedestrian target detection model to perform pedestrian target detection on the image features of the first image to be detected to obtain a plurality of first to-be-deduplicated detection frames in the first image to be detected;
  • performing deduplication processing on the plurality of first to-be-deduplicated detection frames to obtain at least one first detection frame in the first image to be detected.
  • the electronic device can extract the image features of the first image to be detected, and perform pedestrian target detection on these image features through the pedestrian target detection model to obtain a plurality of first to-be-deduplicated detection frames in the first image to be detected.
  • the above pedestrian target detection model is obtained by pre-training an initial pedestrian target detection model on multiple image samples containing complete pedestrians. During training, the parameters of the initial pedestrian target detection model can be continuously adjusted to make them more suitable, and a pedestrian target detection model that can accurately detect pedestrians in an image is then obtained.
  • the above pedestrian target detection model may be a machine learning model such as SVM, deep convolutional neural network, Adaboost model, and its parameters may be randomly initialized, which is not specifically limited here.
  • during detection, multiple first to-be-deduplicated detection frames may be generated for the same pedestrian; that is, among the above-mentioned plurality of first to-be-deduplicated detection frames, there are likely to be multiple frames that identify the same pedestrian, in which case those frames overlap.
  • therefore, the electronic device can perform deduplication processing on the plurality of first to-be-deduplicated detection frames to obtain at least one first detection frame in the first image to be detected.
  • the above-mentioned step of performing pedestrian lower body target detection on the first to-be-detected image to obtain at least one second detection frame in the first to-be-detected image may include:
  • using the pedestrian lower body detection model to perform pedestrian lower body target detection on the image features of the first image to be detected to obtain multiple second to-be-deduplicated detection frames in the first image to be detected; performing deduplication processing on the plurality of second to-be-deduplicated detection frames to obtain at least one second detection frame in the first image to be detected.
  • the electronic device can extract the image features of the first image to be detected, and use the pedestrian lower body detection model to perform pedestrian lower body target detection on these image features to obtain a plurality of second to-be-deduplicated detection frames in the first image to be detected, where the second to-be-deduplicated detection frame is used to identify the area occupied by the pedestrian's lower body in the first image to be detected.
  • the above pedestrian lower body target detection model is obtained by pre-training an initial pedestrian lower body target detection model on a plurality of image samples containing pedestrian lower bodies. During training, the parameters of the initial pedestrian lower body target detection model can be continuously adjusted to make them more suitable, and a pedestrian lower body target detection model that can accurately detect the lower body of a pedestrian in an image is then obtained.
  • the above-mentioned pedestrian lower body target detection model can be a machine learning model such as SVM, deep convolutional neural network, Adaboost model, and its parameters can be randomly initialized, which is not specifically limited here.
  • during detection, multiple second to-be-deduplicated detection frames may be generated for the same pedestrian; that is, among the above-mentioned plurality of second to-be-deduplicated detection frames, there are likely to be multiple frames that identify the same pedestrian, in which case those frames overlap.
  • therefore, the electronic device may perform deduplication processing on the plurality of second to-be-deduplicated detection frames to obtain at least one second detection frame in the first image to be detected.
  • in the embodiment of the present invention, the electronic device can obtain the image features of the first image to be detected; use the pedestrian target detection model to perform pedestrian target detection on these image features to obtain multiple first to-be-deduplicated detection frames in the first image to be detected; perform deduplication processing on them to obtain at least one first detection frame; use the pedestrian lower body detection model to perform pedestrian lower body target detection on the image features to obtain multiple second to-be-deduplicated detection frames; and perform deduplication processing on them to obtain at least one second detection frame in the first image to be detected.
  • the electronic device can accurately detect each pedestrian in the first image to be detected, thereby improving the accuracy of the pedestrian target detection result.
  • in an embodiment, the above-mentioned step of performing deduplication processing on the plurality of first to-be-deduplicated detection frames to obtain at least one first detection frame in the first image to be detected may include:
  • the electronic device performs non-maximum suppression (NMS) processing on all the first to-be-deduplicated detection frames; the first to-be-deduplicated detection frames remaining at the end are the first candidate frames, and at the same time the first suppression attribute of each first candidate frame can be obtained.
  • the first suppression attribute is the number of first to-be-deduplicated detection frames that are removed based on the first candidate frame during the non-maximum suppression process, and the first candidate frame is a first to-be-deduplicated detection frame that has not been removed after the non-maximum suppression process.
  • for example, if during non-maximum suppression the number of first to-be-deduplicated detection frames removed based on the first candidate frame k2 is 10, then the first suppression attribute of the first candidate frame k2 is 10.
  • when the pedestrian target detection model performs pedestrian target detection, it may output the confidence level of each first to-be-deduplicated detection frame. The model may produce some first to-be-deduplicated detection frames that have high confidence but do not represent an area actually occupied by a pedestrian in the image; such a first to-be-deduplicated detection frame may be referred to as a false alarm detection frame.
  • in order to discard false alarm detection frames, the electronic device can determine the magnitude relationship between the suppression attribute of each first candidate frame and a first threshold, where the first threshold may be set according to empirical values.
  • when the suppression attribute of a first candidate frame is less than the first threshold, it means that few detection frames overlapped with that candidate frame; in this case, the probability that the first candidate frame is a false alarm detection frame is high, and the first candidate frame can be discarded.
  • when the suppression attribute of a first candidate frame is not less than the first threshold, it means that many detection frames overlapped with that candidate frame; in this case, the possibility that the first candidate frame is a false alarm detection frame is low, and the first candidate frame can be retained as a first detection frame.
  • the above-mentioned step of performing de-duplication processing on the plurality of second to-be-de-duplicated detection frames to obtain at least one second detection frame in the first to-be-detected image may include:
  • correspondingly, the electronic device performs non-maximum suppression processing on all the second to-be-deduplicated detection frames; the second to-be-deduplicated detection frames remaining at the end are the second candidate frames, and at the same time the second suppression attribute of each second candidate frame can be obtained.
  • the second suppression attribute is the number of second to-be-deduplicated detection frames that are removed based on the second candidate frame during the non-maximum suppression process, and the second candidate frame is a second to-be-deduplicated detection frame that has not been removed after the non-maximum suppression process.
  • for example, if during non-maximum suppression the number of second to-be-deduplicated detection frames removed based on the second candidate frame k3 is 15, then the second suppression attribute of the second candidate frame k3 is 15.
  • when the pedestrian lower body target detection model performs detection, it may output the confidence level of each second to-be-deduplicated detection frame. The model may produce some second to-be-deduplicated detection frames that have high confidence but do not represent an area actually occupied by a pedestrian's lower body in the image; such a second to-be-deduplicated detection frame may also be referred to as a false alarm detection frame.
  • in order to discard false alarm detection frames, the electronic device can determine the magnitude relationship between the suppression attribute of each second candidate frame and a second threshold, where the second threshold may be set according to empirical values.
  • when the suppression attribute of a second candidate frame is less than the second threshold, it means that few detection frames overlapped with that candidate frame; in this case, the second candidate frame is more likely to be a false alarm detection frame and can be discarded.
  • when the suppression attribute of a second candidate frame is not less than the second threshold, it means that many detection frames overlapped with that candidate frame; in this case, the possibility that the second candidate frame is a false alarm detection frame is low, and the second candidate frame can be retained as a second detection frame.
  • the first detection frame and the second detection frame in the first image to be detected can be determined through the above steps.
  • in this way, false alarm detection frames can be discarded and excluded from the pedestrian target detection result, so that the accuracy of the pedestrian target detection result can be improved.
  • non-maximum value suppression processing is performed on all detection frames to be deduplicated, and the candidate frame and the suppression attribute of the candidate frame are obtained, which may include:
  • first, the electronic device may use the to-be-deduplicated detection frame with the highest confidence as the candidate frame, and use the to-be-deduplicated detection frames other than the candidate frame as the redundant frame set.
  • for example, suppose the confidence level of the to-be-deduplicated detection frame D1 is 0.90, that of D2 is 0.81, that of D3 is 0.94, and that of D4 is 0.73. Then D3, which has the highest confidence, can be used as the candidate frame, and D1, D2, and D4 can be used as the redundant frame set.
  • when the first to-be-deduplicated detection frames are used as the to-be-deduplicated detection frames, the obtained candidate frames are the first candidate frames; when the second to-be-deduplicated detection frames are used as the to-be-deduplicated detection frames, the obtained candidate frames are the second candidate frames.
  • next, for each to-be-deduplicated detection frame in the redundant frame set, the electronic device may calculate the intersection ratio between the candidate frame and that to-be-deduplicated detection frame.
  • specifically, the electronic device can calculate the area of the overlapping part between the candidate frame and the to-be-deduplicated detection frame according to their positions, calculate the total area covered by the candidate frame and the to-be-deduplicated detection frame, and then calculate the ratio between the area of the overlapping part and the total area to obtain the intersection ratio.
  • then, the intersection ratio between the candidate frame and the to-be-deduplicated detection frame may be compared with a third threshold.
  • the above-mentioned third threshold can be set according to an empirical value, for example, can be set as 0.6.
  • when the intersection ratio between the candidate frame and the to-be-deduplicated detection frame is not less than the third threshold, it means that the area of their overlapping part is large, and the to-be-deduplicated detection frame is likely to be a duplicate of the candidate frame; the electronic device can remove the to-be-deduplicated detection frame from the redundant frame set and update the suppression attribute of the candidate frame, that is, increase the suppression attribute of the candidate frame by 1.
  • when the first to-be-deduplicated detection frames are used as the to-be-deduplicated detection frames, the obtained candidate frames are the first candidate frames and the updated suppression attribute is the first suppression attribute; when the second to-be-deduplicated detection frames are used, the obtained candidate frames are the second candidate frames and the updated suppression attribute is the second suppression attribute. That is, the first suppression attribute is the number of first to-be-deduplicated detection frames removed based on the first candidate frame, and the second suppression attribute is the number of second to-be-deduplicated detection frames removed based on the second candidate frame.
  • when the intersection ratio between the candidate frame and the to-be-deduplicated detection frame is smaller than the third threshold, it means that the area of their overlapping part is small, and the to-be-deduplicated detection frame is probably not a duplicate; the electronic device can retain the to-be-deduplicated detection frame in the redundant frame set.
  • then, the electronic device can determine the to-be-deduplicated detection frame with the highest confidence in the redundant frame set as a new candidate frame, and return to the step of using the to-be-deduplicated detection frames other than the candidate frame as the redundant frame set, until each candidate frame and the suppression attribute of each candidate frame are determined. Finally, no overlapping detection frames remain among the candidate frames.
  • the electronic device can determine each candidate box and the suppression attribute of each candidate box according to the above steps. In this way, the electronic device can realize the deduplication processing of the detection frame to be deduplicated, so that the efficiency of determining the target detection frame can be improved, and the accuracy of the target detection frame can be improved.
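  • The following Python sketch illustrates the whole procedure: non-maximum suppression that counts, for each surviving candidate frame, how many overlapping frames were removed based on it (the suppression attribute), followed by the false-alarm filtering described earlier. The 0.6 value of the third threshold matches the example above; the suppression-attribute threshold of 1 is an assumption:

    def iou(a, b):   # intersection ratio of (x, y, h, w) frames, as sketched earlier
        (xa, ya, ha, wa), (xb, yb, hb, wb) = a, b
        iw = max(0, min(xa + wa, xb + wb) - max(xa, xb))
        ih = max(0, min(ya + ha, yb + hb) - max(ya, yb))
        inter = iw * ih
        union = ha * wa + hb * wb - inter
        return inter / union if union > 0 else 0.0

    def nms_with_suppression(boxes, scores, third_threshold=0.6, min_suppressed=1):
        # Sort frame indices by confidence, highest first.
        remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        kept = []
        while remaining:
            best = remaining.pop(0)       # highest-confidence frame -> candidate frame
            suppressed = 0                # suppression attribute of this candidate
            survivors = []
            for i in remaining:           # the redundant frame set
                if iou(boxes[best], boxes[i]) >= third_threshold:
                    suppressed += 1       # duplicate frame: remove it and count it
                else:
                    survivors.append(i)   # retained for the next round
            remaining = survivors
            kept.append((best, suppressed))
        # Candidates that suppressed too few frames are treated as false alarms.
        return [boxes[i] for i, n in kept if n >= min_suppressed]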
  • in an embodiment, the above-mentioned step of obtaining the image features of the first image to be detected and performing pedestrian target detection on them with the pedestrian target detection model to obtain a plurality of first to-be-deduplicated detection frames in the first image to be detected may include:
  • the first image to be detected may include a plurality of pedestrians occupying areas of different sizes, while the target detection model usually performs target detection on an image through a detection window of a preset size, so the target size that the target detection model can detect is fixed.
  • S301: constructing an image pyramid of the first image to be detected based on the first image to be detected and a preset scaling ratio;
  • that is, the electronic device may construct an image pyramid of the first image to be detected based on the first image to be detected and the preset scaling ratio, so that pedestrians of different sizes fall within the detectable size at some layer.
  • the image pyramid of the first image to be detected is a multi-scale representation of the first image to be detected. The first image to be detected can be used as the first layer of the image pyramid; the pyramid further includes multiple layers of sub-images, where each layer of sub-images is obtained by scaling the sub-image of the previous layer according to the preset scaling ratio, and the size of each sub-image is not smaller than the size of the detection window of the target detection model, the target detection model being the pedestrian target detection model, the pedestrian lower body target detection model, or the pedestrian head target detection model.
  • for example, if the size of the first image to be detected is 1280×720 and the preset scaling ratio is 1.06, 47 sub-images of different sizes can be obtained, of which the smallest sub-image has a size of 80×45, finally yielding an image pyramid of 48 images (the first image to be detected plus 47 sub-images).
  • when scaling the images, the pixel values of the pixels in each sub-image may be determined by means of linear interpolation; for example, single linear interpolation, bilinear interpolation, or trilinear interpolation can be used. The specific linear interpolation method can be selected according to requirements and is not specifically limited here.
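  • A minimal sketch of the pyramid construction (illustrative; OpenCV's bilinear resize stands in for the interpolation step, and the 80×45 lower bound is taken from the example above as a stand-in for the detection-window size):

    import cv2

    def build_pyramid(image, ratio=1.06, min_w=80, min_h=45):
        layers = [image]                  # layer 0: the first image to be detected
        h, w = image.shape[:2]
        while True:
            w, h = int(w / ratio), int(h / ratio)
            if w < min_w or h < min_h:    # would fall below the detection window
                break
            # Each layer scales the previous one; pixel values come from
            # (bi)linear interpolation, as described above.
            layers.append(cv2.resize(layers[-1], (w, h), interpolation=cv2.INTER_LINEAR))
        return layers                     # about 48 layers for a 1280x720 input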
  • S302: extracting the image features of the sub-images in each layer respectively, and performing pedestrian target detection on the image features of the sub-images in each layer by using the pedestrian target detection model to obtain the first candidate detection frames in the sub-images of each layer;
  • that is, the electronic device can extract the image features of each layer of sub-images, and perform pedestrian target detection on the image features of each layer through the pedestrian target detection model to obtain the first candidate detection frame in each layer of sub-images, where the first candidate detection frame is used to identify the area occupied by a pedestrian in the sub-image.
  • then, the electronic device may determine the first to-be-deduplicated detection frames in the first image to be detected based on each first candidate detection frame and the scaling ratio between the sub-image to which the first candidate detection frame belongs and the first image to be detected.
  • specifically, the electronic device can scale the first candidate detection frame in the sub-image according to the scaling ratio between the sub-image and the first image to be detected; the size and position of the scaled first candidate detection frame are the size and position of the corresponding first to-be-deduplicated detection frame in the first image to be detected, so that the first to-be-deduplicated detection frame is obtained.
  • for example, suppose the sub-image Dt1 includes the first candidate detection frame k1, where the vector corresponding to k1 is (1, 10, 20, 15), indicating that the coordinates of the upper left corner of k1 in the image coordinate system of the sub-image are (1, 10), the width of k1 is 20, and its height is 15. If the scaling ratio between the sub-image Dt1 and the first image to be detected H1 is 1:20, then based on this scaling ratio it can be determined that the vector of the first to-be-deduplicated detection frame corresponding to k1 in the first image to be detected is (20, 200, 400, 300); that is, the coordinates of the upper left corner of the first to-be-deduplicated detection frame in the image coordinate system of the first image to be detected are (20, 200), its width is 400, and its height is 300.
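  • A one-function sketch of this mapping (illustrative; it uses the (x, y, width, height) reading of the example vector above):

    # Map a candidate frame from a pyramid layer back to the first image to
    # be detected by multiplying every component by the layer's scale factor.
    def to_original(box, scale):
        x, y, w, h = box                  # frame in the sub-image
        return (x * scale, y * scale, w * scale, h * scale)

    print(to_original((1, 10, 20, 15), 20))   # -> (20, 200, 400, 300), as above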
  • in an embodiment, the above-mentioned step of using the pedestrian lower body detection model to perform pedestrian lower body target detection on the image features of the first image to be detected to obtain a plurality of second to-be-deduplicated detection frames in the first image to be detected may include:
  • performing pedestrian lower body target detection on the image features of each sub-image through the pedestrian lower body detection model to obtain the second candidate detection frame in each sub-image; and determining the second to-be-deduplicated detection frames in the first image to be detected based on each second candidate detection frame and the scaling ratio between the sub-image to which it belongs and the first image to be detected.
  • that is, the electronic device can extract the image features of each layer of sub-images respectively, and perform pedestrian lower body target detection on the image features of each layer through the pedestrian lower body detection model to obtain the second candidate detection frame in each layer of sub-images, where the second candidate detection frame is used to identify the area occupied by the pedestrian's lower body in the sub-image.
  • then, the electronic device may determine the second to-be-deduplicated detection frames in the first image to be detected based on each second candidate detection frame and the scaling ratio between the sub-image to which the second candidate detection frame belongs and the first image to be detected.
  • specifically, the electronic device can scale the second candidate detection frame in the sub-image according to the scaling ratio between the sub-image and the first image to be detected; the size and position of the scaled second candidate detection frame are the size and position of the corresponding second to-be-deduplicated detection frame in the first image to be detected, so that the second to-be-deduplicated detection frame is obtained.
  • the electronic device can determine the first detection frame to be deduplicated and the second detection frame to be deduplicated in the first image to be detected according to the above steps. In this way, the electronic device can accurately detect each pedestrian in the first image to be detected, thereby improving the accuracy of the pedestrian target detection result.
  • the above-mentioned image features include at least one of the following:
  • the first method: obtaining the brightness value of each pixel in the image to be extracted as the image feature of the image to be extracted, where the image to be extracted is the above-mentioned first image to be detected or the above-mentioned sub-image.
  • specifically, the image to be extracted may be an RGB image; for each pixel in the image to be extracted, the electronic device may extract the parameter values of the Red channel, Green channel, and Blue channel corresponding to the pixel, and then calculate the brightness value of the pixel from these three channel parameter values, thereby obtaining the brightness value I of each pixel in the image to be extracted as the image feature of the image to be extracted.
  • alternatively, the above-mentioned image to be extracted may be a fisheye image in YUV420SP format. In this case, for each pixel, the electronic device may extract the Y channel parameter corresponding to the pixel as the brightness value of the pixel: I(x, y) = Y(x, y), where I(x, y) is the brightness value of the pixel (x, y), and Y(x, y) is the Y channel parameter value of the pixel (x, y).
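  • A short sketch of this first feature (illustrative; the patent's exact RGB-to-brightness formula is not reproduced in this text, so the common ITU-R BT.601 luma weights are assumed here, while the Y-plane extraction follows I(x, y) = Y(x, y) as stated above):

    import numpy as np

    def brightness_from_rgb(rgb):
        # rgb: H x W x 3 array; the channel weights are an assumption, see above.
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        return 0.299 * r + 0.587 * g + 0.114 * b

    def brightness_from_yuv420sp(buf, height, width):
        # In YUV420SP the first height*width bytes are the Y plane,
        # which is used directly as the brightness: I(x, y) = Y(x, y).
        y = np.frombuffer(buf, dtype=np.uint8, count=height * width)
        return y.reshape(height, width)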
  • the second method: the electronic device can obtain the brightness value of each pixel in the image to be extracted, and then determine the brightness gradient magnitude of each pixel based on the brightness values and coordinates of the pixels, as the image feature of the image to be extracted.
  • the brightness gradient magnitude G of a pixel in the image to be extracted can be expressed as G = √(Gx² + Gy²), where Gx represents the brightness gradient in the x-axis direction of the image coordinate system, Gy represents the brightness gradient in the y-axis direction, Gx = I(x+1, y) - I(x, y), and Gy = I(x, y+1) - I(x, y); here I(x+1, y) is the brightness value of the pixel (x+1, y), I(x, y) is the brightness value of the pixel (x, y), and I(x, y+1) is the brightness value of the pixel (x, y+1).
  • the third method: obtain the brightness value of each pixel in the image to be extracted; determine the brightness gradient magnitude of each pixel based on the brightness values; determine the brightness gradient direction of each pixel; and determine, based on the brightness gradient directions of all pixels in the image to be extracted, the brightness gradient direction histogram corresponding to the image to be extracted, as the image feature of the image to be extracted.
  • here, the electronic device can calculate the brightness gradient direction θ of each pixel by the following formula: θ = arctan(Gy/Gx).
  • the electronic device may determine the luminance gradient direction histogram corresponding to the to-be-extracted image based on the luminance-gradient direction of each pixel, as an image feature of the to-be-extracted image.
  • the abscissa represents the brightness gradient direction
  • the ordinate represents the brightness gradient magnitude of the pixel in the image to be extracted in the corresponding brightness gradient direction.
  • 401 is the luminance gradient direction value of pixel 1
  • 402 is the luminance gradient direction value of pixel 2
  • 403 is the luminance gradient magnitude of pixel 1
  • 404 is the luminance gradient magnitude of pixel 2.
  • for example, when the brightness gradient direction of pixel 1 is 80° and its brightness gradient magnitude is 2, the electronic device can determine that the brightness gradient magnitude of pixel 1 in the 80° direction is 2; when the brightness gradient direction of pixel 2 is 10° and its brightness gradient magnitude is 4, the electronic device can split the magnitude between the two adjacent direction categories in proportion to the angular distance, determining the brightness gradient magnitude of pixel 2 in the 0° direction as 2 and the brightness gradient magnitude of pixel 2 in the 20° direction as 2.
  • it should be noted that the brightness gradient direction can be discretized into direction categories according to specific requirements. For example, the difference between every two adjacent direction categories can be set to 30°, in which case there are 6 direction categories in the gradient direction histogram; or the difference between every two adjacent direction categories can be set to 45°, in which case there are 4 direction categories in the gradient direction histogram.
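  • The second and third features can be sketched together as follows (illustrative; forward differences implement Gx and Gy as defined above, np.arctan2 stands in for arctan(Gy/Gx) to avoid division by zero, and the 20° bin width is taken from the pixel-2 example):

    import numpy as np

    def gradient_direction_histogram(I, bin_width=20.0):
        I = I.astype(np.float32)
        gx = np.zeros_like(I)
        gy = np.zeros_like(I)
        gx[:, :-1] = I[:, 1:] - I[:, :-1]    # Gx = I(x+1, y) - I(x, y)
        gy[:-1, :] = I[1:, :] - I[:-1, :]    # Gy = I(x, y+1) - I(x, y)
        mag = np.sqrt(gx ** 2 + gy ** 2)     # brightness gradient magnitude G
        theta = np.degrees(np.arctan2(gy, gx)) % 180.0   # direction in [0, 180)
        n_bins = int(180.0 / bin_width)
        hist = np.zeros(n_bins, dtype=np.float32)
        lower = np.floor(theta / bin_width)  # index of the lower direction category
        frac = theta / bin_width - lower     # share going to the upper category
        for b in range(n_bins):
            is_lower = lower == b
            hist[b] += np.sum(mag[is_lower] * (1.0 - frac[is_lower]))
            is_upper = (lower + 1) % n_bins == b   # 180 wraps back to 0
            hist[b] += np.sum(mag[is_upper] * frac[is_upper])
        return hist
    # A pixel with direction 10 degrees and magnitude 4 contributes 2 to the
    # 0-degree bin and 2 to the 20-degree bin, matching the example above.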
  • in an embodiment, in order to determine Gy/Gx more quickly, a first lookup table for Gy/Gx may be established in advance; the electronic device may then determine the value of Gy/Gx according to the correspondence among Gy, Gx, and Gy/Gx contained in the first lookup table.
  • the pre-built first lookup table is shown in the following table:
  • for example, the value of Gy/Gx can be determined to be 0.2 according to the above table.
  • similarly, in order to determine the gradient direction θ of each pixel more quickly, the electronic device can pre-establish a second lookup table corresponding to the arctangent function, and determine the brightness gradient direction θ according to the correspondence between Gy/Gx and arctan(Gy/Gx) contained in the second lookup table.
  • the pre-built second lookup table is shown in the following table:
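  • A minimal sketch of the lookup-table idea (illustrative; the patent's tables are not reproduced in this text, so the grid resolution and the ratio cap used here are assumptions):

    import math

    STEP = 0.01          # table resolution (assumed)
    # Precomputed arctan values for ratios 0.00 .. 10.00, in degrees.
    ATAN_TABLE = [math.degrees(math.atan(i * STEP)) for i in range(1001)]

    def gradient_direction(gy, gx):
        # Read the direction off the table instead of evaluating arctan per pixel.
        ratio = abs(gy / gx) if gx != 0 else 10.0
        idx = min(int(round(ratio / STEP)), len(ATAN_TABLE) - 1)
        return ATAN_TABLE[idx]

    print(gradient_direction(1.0, 5.0))   # Gy/Gx = 0.2 -> about 11.31 degrees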
  • the electronic device can determine the image feature of the image to be extracted in the above manner. In this way, the electronic device can accurately determine the image features in the image to be extracted in various ways.
  • the training method of the target detection model may include the following steps:
  • A1: obtaining an initial target detection model and a plurality of image samples, wherein each image sample includes a target object;
  • the target detection model includes the above-mentioned pedestrian target detection model, pedestrian lower body target detection model and pedestrian head target detection model.
  • to train these models, the electronic device can obtain the initial target detection model and multiple image samples, and train the initial target detection model based on the multiple image samples.
  • the above image samples can be image samples containing complete pedestrians, image samples containing pedestrian lower body areas, or image samples containing pedestrian head areas; correspondingly, the above-mentioned target object is the complete pedestrian, the lower body of the pedestrian, or the head of the pedestrian.
  • when the image sample is an image sample containing a complete pedestrian, the pedestrian target detection model can be obtained by training the initial target detection model based on the image samples; when the image sample is an image sample containing a pedestrian's lower body area, the pedestrian lower body target detection model can be obtained by training the initial target detection model; when the image sample is an image sample containing a pedestrian head region, the pedestrian head target detection model can be obtained by training the initial target detection model based on the image samples.
  • A2: marking the area occupied by the target object in each image sample to obtain the marked detection frame corresponding to each image sample as its marked label;
  • since the target detection model obtained by training needs to detect an image and output the detection frame identifying the area occupied by the target object, for each image sample the area occupied by the target object can be pre-marked in the image sample, and the marked detection frame corresponding to the image sample is obtained as the marked label of the image sample.
  • A3: input each image sample into the initial target detection model, perform detection according to the image features of each image sample, and obtain the detection frame corresponding to the target object included in each image sample as the predicted label of that image sample;
  • A4: based on the difference between the predicted label and the calibration label of the corresponding image sample, adjust the parameters of the initial target detection model until the initial target detection model converges, then stop training to obtain the target detection model.
  • the electronic device can compare the predicted label with the corresponding calibration label, and then adjust the parameters of the initial target detection model according to the difference between the predicted label and the corresponding calibration label, so that the parameters of the initial target detection model are more suitable.
  • the method of adjusting the parameters of the initial target detection model may be a gradient descent algorithm, a stochastic gradient descent algorithm, or other model parameter adjustment methods, which are not specifically limited or described herein.
  • In order to determine whether the above initial target detection model has converged, the electronic device may determine whether the number of iterations of the initial target detection model reaches a preset number, or whether the total loss function of the initial target detection model is not greater than a preset value. If the number of iterations reaches the preset number, or the total loss function is not greater than the preset value, the current initial target detection model has converged; that is, it can perform detection on images and obtain accurate output results, so training can be stopped at this point to obtain the target detection model.
  • the above-mentioned preset number of times may be set according to factors such as detection requirements, model structure, etc., for example, may be 6000 times, 9000 times, 12000 times, etc., which are not specifically limited here.
  • the preset value can be set according to factors such as detection requirements, model structure, etc., for example, can be 1, 0.9, 0.75, etc., which is not specifically limited here.
  • If the number of iterations of the initial target detection model has not reached the preset number, or its total loss function is greater than the preset value, the current initial target detection model has not yet converged, that is, its detection outputs are not yet accurate enough, and the electronic device needs to continue to train the initial target detection model.
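  • A minimal PyTorch-style training-loop sketch of the convergence check described above; the model, loss function, optimizer and data loader are stand-in assumptions, not interfaces defined by the patent:

```python
def train_target_detection_model(model, data_loader, optimizer, loss_fn,
                                 preset_iters=9000, preset_loss=0.9):
    """Adjust parameters until the iteration count reaches a preset number or the
    total loss function is not greater than a preset value."""
    iters = 0
    converged = False
    while not converged:
        for images, calibration_labels in data_loader:
            predicted_labels = model(images)      # detection frames predicted by the model
            loss = loss_fn(predicted_labels, calibration_labels)
            optimizer.zero_grad()
            loss.backward()                       # e.g. a (stochastic) gradient descent step
            optimizer.step()
            iters += 1
            if iters >= preset_iters or loss.item() <= preset_loss:
                converged = True                  # convergence criterion from the text above
                break
    return model
```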
  • In one embodiment, after the above pedestrian target detection result is obtained, the above method may further include:
  • Step 1: for each target detection frame, based on the coordinates of the preset detection point in the target detection frame in the image coordinate system of the first image to be detected and a preset mapping relationship, determine the coordinates of the preset detection point in the image coordinate system of the top view corresponding to the first image to be detected, as the target coordinates;
  • After the pedestrian target detection result is obtained, in order to determine the distance between the vehicle and the pedestrian corresponding to each detection frame, the electronic device can, for each target detection frame, use the coordinates of the preset detection point in the image coordinate system of the first image to be detected, together with the preset mapping relationship, to determine the coordinates of that point in the image coordinate system of the top view corresponding to the first image to be detected, as the target coordinates.
  • The above-mentioned preset detection point is a preset coordinate point used to identify the position, in the first image to be detected, of the pedestrian corresponding to the target detection frame. It may be a pixel point in the target detection frame, for example, the lower-left corner point, the lower-right corner point, or the midpoint of the lower boundary of the target detection frame.
  • the electronic device may determine, based on the coordinates of the preset detection point and the preset mapping relationship, the coordinates of the preset detection point in the image coordinate system of the top view corresponding to the first image to be detected, as the target coordinates.
  • The above-mentioned target coordinates are used to identify the position of the pedestrian in the above-mentioned top view. The above-mentioned preset mapping relationship is used to map pixels in the first image to be detected to the top view corresponding to the first image to be detected, and it can be set according to the intrinsic and extrinsic parameters of the image acquisition device. The intrinsic parameters can include the focal length, distortion coefficients, principal point coordinates, etc. of the image acquisition device, and the extrinsic parameters can include the position, pitch angle, roll angle, etc. of the image acquisition device.
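  • As a sketch of one common realization of such a mapping, a planar ground homography calibrated from the camera's intrinsic and extrinsic parameters can be applied to each preset detection point; the matrix values below are placeholders, not calibration data from the patent:

```python
import numpy as np
import cv2

# Hypothetical ground-plane homography calibrated offline from camera intrinsics/extrinsics.
H = np.array([[0.1, 0.0, -50.0],
              [0.0, 0.1, -20.0],
              [0.0, 0.0,   1.0]], dtype=np.float64)

def to_top_view(point_xy):
    """Map a preset detection point (e.g., the midpoint of a frame's lower boundary)
    to its target coordinates in the top view."""
    src = np.array([[point_xy]], dtype=np.float64)  # shape (1, 1, 2) as required by OpenCV
    dst = cv2.perspectiveTransform(src, H)
    return dst[0, 0]

print(to_top_view((640.0, 700.0)))
```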
  • Step 2 for each target detection frame, determine the distance between the target pedestrian and the vehicle corresponding to the target detection frame based on the distance between the target coordinates and the pre-calibrated image acquisition device coordinates in the top view, and the scale of the top view;
  • the electronic device can determine the distance between the target pedestrian and the vehicle corresponding to the target detection frame based on the distance between the target coordinates and the pre-calibrated image acquisition device coordinates in the top view and the scale of the top view.
  • the coordinates of the image capture device are used to identify the position of the image capture device in the above-mentioned top view.
  • For example, assume the pre-calibrated image acquisition device coordinates are (x0, y0), the target coordinates are (x1, y1), and the scale of the top view is 1:2.5 cm, meaning that every two adjacent pixels in the top view correspond to a distance of 2.5 cm. The distance between the two points along the X axis of the top-view coordinate system can be expressed as Dx = (x1 − x0) × 2.5, and the distance along the Y axis as Dy = (y1 − y0) × 2.5. The electronic device can then calculate the distance D between the vehicle and the target pedestrian corresponding to the target detection frame as: D = sqrt(Dx² + Dy²).
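  • The same computation as a short sketch; the coordinates and the 2.5 cm-per-pixel scale are the example values above:

```python
import math

def pedestrian_distance(target_xy, camera_xy, cm_per_pixel=2.5):
    """Distance between pedestrian and vehicle from top-view coordinates and the top-view scale."""
    dx = (target_xy[0] - camera_xy[0]) * cm_per_pixel
    dy = (target_xy[1] - camera_xy[1]) * cm_per_pixel
    return math.hypot(dx, dy)  # D = sqrt(Dx^2 + Dy^2), in centimeters

print(pedestrian_distance((120, 80), (40, 20)))
```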
  • In one embodiment, in order to obtain an image with a wider angle of view, the image acquisition device may use a fisheye camera, and the first image to be detected may be a fisheye photo in YUV format.
  • Step 3: based on the distance between the target pedestrian and the vehicle and the preset vehicle control rules, control the vehicle to give an alarm and/or brake.
  • After determining the distance between the target pedestrian and the vehicle, in order to warn the target pedestrian to avoid the moving vehicle and to prevent the vehicle from colliding with the target pedestrian, the electronic device can control the vehicle to give an alarm and/or brake according to that distance and the preset vehicle control rules.
  • The above-mentioned vehicle control rules include a correspondence between distances and vehicle behaviors, where the vehicle behaviors are behaviors such as the alarming and braking mentioned above.
  • For example, the preset vehicle control rules may be: when the distance between the target pedestrian and the vehicle is greater than 3 meters, control the vehicle to sound the horn as an alarm; when the distance is greater than 1.5 meters and not greater than 3 meters, control the vehicle to brake; when the distance is not greater than 1.5 meters, control the vehicle to sound the horn as an alarm and brake.
  • When the electronic device determines that the distance between target pedestrian M1 and the vehicle is 2.9 meters, it can control the vehicle to brake; when the electronic device determines that the distance between target pedestrian M2 and the vehicle is 3.5 meters, it can control the vehicle to sound the horn as an alarm.
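  • A sketch of this rule table; the 3 m and 1.5 m thresholds are the example values above, and the action names are placeholders:

```python
def vehicle_actions(distance_m):
    """Map the pedestrian-to-vehicle distance to vehicle behaviors per the example rules."""
    if distance_m > 3.0:
        return ["sound_horn_alarm"]
    if distance_m > 1.5:                  # greater than 1.5 m and not greater than 3 m
        return ["brake"]
    return ["sound_horn_alarm", "brake"]  # not greater than 1.5 m

print(vehicle_actions(2.9))  # ['brake'], matching the M1 example
print(vehicle_actions(3.5))  # ['sound_horn_alarm'], matching the M2 example
```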
  • In this way, the electronic device can accurately determine the distance between the target pedestrian and the vehicle, and control the vehicle's behavior in a timely manner based on that distance and the preset vehicle control rules, thereby avoiding danger while the vehicle is traveling at a slow speed.
  • Corresponding to the above pedestrian target detection method, an embodiment of the present invention further provides a pedestrian target detection device. The pedestrian target detection device provided by an embodiment of the present invention is described below.
  • As shown in FIG. 5, a pedestrian target detection device includes:
  • an image acquisition module 501, configured to acquire the image collected by the image acquisition device installed on the vehicle, to obtain a first image to be detected;
  • a first detection module 502, configured to perform pedestrian target detection on the first image to be detected, to obtain at least one first detection frame in the first image to be detected;
  • the first detection frame is used to identify the area occupied by the pedestrian in the first image to be detected.
  • the second detection module 503 is configured to perform pedestrian lower body target detection on the first to-be-detected image to obtain at least one second detection frame in the first to-be-detected image;
  • the second detection frame is used to identify the area occupied by the pedestrian's lower body in the first image to be detected.
  • a judgment module 504, configured to, for any one of the first detection frames and any one of the second detection frames, determine, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether the first detection frame and the second detection frame identify the same pedestrian;
  • a first detection result determination module 505 configured to determine a target detection frame based on the first detection frame in the case that the first detection frame and the second detection frame identify the same pedestrian;
  • the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected.
  • a second detection result determination module 506, configured to, in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determine the target detection frame based on the first detection frame and the second detection frame respectively.
  • It can be seen that, in the solution provided by this embodiment of the present invention, the image collected by the image acquisition device installed on the vehicle is acquired to obtain the first image to be detected; pedestrian target detection is performed on the first image to be detected to obtain at least one first detection frame, where the first detection frame is used to identify the area occupied by a pedestrian in the first image to be detected; pedestrian lower-body target detection is performed on the first image to be detected to obtain at least one second detection frame, where the second detection frame is used to identify the area occupied by a pedestrian's lower body in the first image to be detected; for any first detection frame and any second detection frame, whether the two frames identify the same pedestrian is determined according to their positions in the first image to be detected; in the case that the first detection frame and the second detection frame identify the same pedestrian, the target detection frame is determined based on the first detection frame, where the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected; in the case that they do not identify the same pedestrian, the target detection frames are determined based on the first detection frame and the second detection frame respectively.
  • The electronic device can perform pedestrian target detection and pedestrian lower-body target detection on the first image to be detected to obtain the first detection frames and the second detection frames, and determine, based on the positions of a first detection frame and a second detection frame, whether they identify the same pedestrian, thereby obtaining the pedestrian target detection result. In this way, when there is an incomplete pedestrian in the first image to be detected, the electronic device can accurately detect the incomplete pedestrian, which can improve the accuracy of the pedestrian target detection result and avoid danger when the vehicle is driving slowly.
  • the foregoing second detection result determination module 506 may include:
  • a detection sub-module (not shown in FIG. 5), configured to perform pedestrian head target detection on a second image to be detected, where the second image to be detected is the image within the first detection frame;
  • the detection result determination sub-module (not shown in FIG. 5 ) is configured to use the first detection frame as the target detection frame when a pedestrian head is detected in the first detection frame.
  • the foregoing first detection result determination module 505 may include:
  • a first detection result determination sub-module (not shown in FIG. 5), configured to adjust the lower boundary of the first detection frame based on the lower boundary of the second detection frame to obtain a third detection frame, and to use the third detection frame as the target detection frame.
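  • As a concrete reading of this sub-module (the (x, y, w, h) box layout with the origin at the top-left is an assumption for illustration), the adjustment can be sketched as:

```python
def adjust_lower_boundary(first_box, second_box):
    """Replace the lower boundary of the first frame with that of the second frame.

    Boxes are (x, y, w, h) with y growing downward; the left, right and top
    edges of the first frame are kept unchanged.
    """
    x1, y1, w1, _ = first_box
    _, y2, _, h2 = second_box
    new_bottom = y2 + h2                  # lower boundary of the second (lower-body) frame
    return (x1, y1, w1, new_bottom - y1)  # the third detection frame

print(adjust_lower_boundary((100, 50, 60, 150), (105, 140, 50, 70)))
```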
  • the foregoing first detection module 502 may include:
  • a first to-be-deduplicated detection frame determination sub-module (not shown in FIG. 5), configured to acquire the image features of the first image to be detected, and to perform pedestrian target detection on the image features of the first image to be detected by using the pedestrian target detection model, to obtain a plurality of first to-be-deduplicated detection frames in the first image to be detected;
  • a first deduplication sub-module (not shown in FIG. 5), configured to perform deduplication processing on the plurality of first to-be-deduplicated detection frames, to obtain at least one first detection frame in the first image to be detected;
  • The above-mentioned second detection module 503 may include:
  • a second to-be-deduplicated detection frame determination sub-module (not shown in FIG. 5), configured to perform pedestrian lower-body target detection on the image features of the first image to be detected by using the pedestrian lower-body detection model, to obtain a plurality of second to-be-deduplicated detection frames in the first image to be detected;
  • a second deduplication sub-module (not shown in FIG. 5), configured to perform deduplication processing on the plurality of second to-be-deduplicated detection frames, to obtain at least one second detection frame in the first image to be detected.
  • the above-mentioned first deduplication sub-module may include:
  • a first deduplication unit (not shown in FIG. 5), configured to perform non-maximum suppression processing on all the first to-be-deduplicated detection frames, to obtain first candidate frames and the first suppression attribute of each first candidate frame, where the first suppression attribute is the number of first to-be-deduplicated detection frames removed based on that first candidate frame during non-maximum suppression processing;
  • a first detection frame determination unit (not shown in FIG. 5 ), configured to use the first candidate frame whose first suppression attribute is not less than a first threshold as a first detection frame;
  • the above-mentioned second deduplication submodule may include:
  • a second deduplication unit (not shown in FIG. 5), configured to perform non-maximum suppression processing on all the second to-be-deduplicated detection frames, to obtain second candidate frames and the second suppression attribute of each second candidate frame, where the second suppression attribute is the number of second to-be-deduplicated detection frames removed based on that second candidate frame during non-maximum suppression processing;
  • a second detection frame determining unit (not shown in FIG. 5 ) is configured to use the second candidate frame whose second suppression attribute is not less than a second threshold as a second detection frame.
  • the above-mentioned apparatus may further include:
  • a selection module (not shown in FIG. 5), configured to take the to-be-deduplicated detection frame with the highest confidence as a candidate frame, and to take the to-be-deduplicated detection frames other than the candidate frame as a redundant frame set, where the to-be-deduplicated detection frame is a first to-be-deduplicated detection frame or a second to-be-deduplicated detection frame, and the candidate frame is a first candidate frame or a second candidate frame;
  • an intersection-over-union determination module (not shown in FIG. 5), configured to calculate, for each to-be-deduplicated detection frame in the redundant frame set, the intersection-over-union ratio between the candidate frame and that to-be-deduplicated detection frame;
  • a removal module (not shown in FIG. 5), configured to remove a to-be-deduplicated detection frame from the redundant frame set if the intersection-over-union ratio is not less than a third threshold, and to update the suppression attribute of the candidate frame, where the suppression attribute is the first suppression attribute or the second suppression attribute;
  • a return module (not shown in FIG. 5), configured to determine the to-be-deduplicated detection frame with the highest confidence in the redundant frame set as the candidate frame, and to return to the step of taking the to-be-deduplicated detection frames other than the candidate frame as the redundant frame set, until each candidate frame and the suppression attribute of each candidate frame are determined.
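  • The following sketch shows this NMS variant, which records a per-candidate suppression count and later discards candidates whose count falls below a threshold as likely false alarms; the (x, y, w, h) box format and the threshold values are assumptions for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2, bx2, by2 = a[0] + a[2], a[1] + a[3], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms_with_suppression_count(boxes, scores, iou_thresh=0.6, count_thresh=2):
    """Non-maximum suppression that keeps, for each candidate frame, the number of
    frames it suppressed (the 'suppression attribute'); low-count candidates are dropped."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        cand = order.pop(0)                   # highest-confidence remaining frame
        suppressed = [i for i in order if iou(boxes[cand], boxes[i]) >= iou_thresh]
        order = [i for i in order if i not in suppressed]
        kept.append((cand, len(suppressed)))  # candidate frame and its suppression attribute
    return [cand for cand, count in kept if count >= count_thresh]
```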
  • the above-mentioned first to-be-removed detection frame determination submodule may include:
  • an image pyramid construction unit (not shown in FIG. 5 ), configured to construct an image pyramid of the first image to be detected based on the first image to be detected and a preset scaling ratio;
  • the image pyramid includes multiple layers of sub-images.
  • a first candidate detection frame determination unit (not shown in FIG. 5), configured to extract the image features of each layer of sub-images respectively, and to perform pedestrian target detection on the image features of each layer of sub-images through the pedestrian target detection model, to obtain the first candidate detection frames in each layer of sub-images;
  • a first to-be-deduplicated detection frame determination unit (not shown in FIG. 5), configured to determine the first to-be-deduplicated detection frames in the first image to be detected based on the first candidate detection frames and the scaling ratio between the sub-image to which each first candidate detection frame belongs and the first image to be detected;
  • the above-mentioned second to-be-removed detection frame determination submodule may include:
  • a second candidate detection frame determination unit (not shown in FIG. 5), configured to perform pedestrian lower-body target detection on the image features of each layer of sub-images through the pedestrian lower-body detection model, to obtain the second candidate detection frames in each layer of sub-images;
  • a second to-be-deduplicated detection frame determination unit (not shown in FIG. 5), configured to determine the second to-be-deduplicated detection frames in the first image to be detected based on the second candidate detection frames and the scaling ratio between the sub-image to which each second candidate detection frame belongs and the first image to be detected.
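  • An image pyramid of this kind can be built by repeated rescaling, and candidate frames can be mapped back to the base image by the layer's scaling ratio. In the sketch below, the 1.06 shrink factor and the 80x45 minimum sub-image size are example assumptions, not requirements of the patent:

```python
import cv2

def build_image_pyramid(image, shrink=1.06, min_size=(80, 45)):
    """Repeatedly shrink the image by a preset ratio until a sub-image would no longer
    fit the detection window."""
    pyramid = [image]  # the first image to be detected is the base layer
    h, w = image.shape[:2]
    while True:
        w, h = int(w / shrink), int(h / shrink)
        if w < min_size[0] or h < min_size[1]:
            break
        # Linear interpolation determines the pixel values of each scaled sub-image.
        pyramid.append(cv2.resize(pyramid[-1], (w, h), interpolation=cv2.INTER_LINEAR))
    return pyramid

def box_to_base(box, scale):
    """Map a candidate frame from a sub-image back to the base image by the scaling ratio."""
    x, y, w, h = box
    return (x * scale, y * scale, w * scale, h * scale)
```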
  • the above-mentioned image features include at least one of the following:
  • The first: acquire the brightness value of each pixel in the image to be extracted as an image feature of the image to be extracted, where the image to be extracted is the first image to be detected or a sub-image;
  • The second: acquire the brightness value of each pixel in the image to be extracted, determine the brightness gradient magnitude of each pixel based on the brightness values, and use the brightness gradient magnitudes as the image features of the image to be extracted;
  • The third: acquire the brightness value of each pixel in the image to be extracted, determine the brightness gradient magnitude of each pixel based on the brightness values, determine the brightness gradient direction of each pixel based on the brightness gradient magnitudes, and determine the brightness gradient direction histogram corresponding to the image to be extracted based on the brightness gradient directions, using the histogram as the image feature of the image to be extracted.
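  • A compact sketch of these feature types, assuming an RGB input, standard luminance weights, forward differences for the gradients, and the cheap |Gx| + |Gy| magnitude approximation (all assumptions for illustration):

```python
import numpy as np

def luminance(rgb):
    """Feature type 1: per-pixel brightness, I = 0.299 R + 0.587 G + 0.114 B."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def gradient_features(I):
    """Feature types 2 and 3: gradient magnitude (|Gx| + |Gy| approximation,
    cheaper than sqrt(Gx^2 + Gy^2)) and gradient direction folded to [0, 180)."""
    gx = np.zeros_like(I)
    gy = np.zeros_like(I)
    gx[:, :-1] = I[:, 1:] - I[:, :-1]    # Gx = I(x+1, y) - I(x, y)
    gy[:-1, :] = I[1:, :] - I[:-1, :]    # Gy = I(x, y+1) - I(x, y)
    magnitude = np.abs(gx) + np.abs(gy)
    direction = np.degrees(np.arctan2(gy, gx)) % 180.0
    return magnitude, direction
```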
  • An embodiment of the present invention further provides an electronic device, as shown in FIG. 6, including a processor 601, a communication interface 602, a memory 603 and a communication bus 604, where the processor 601, the communication interface 602 and the memory 603 communicate with each other through the communication bus 604. The memory 603 is configured to store a computer program.
  • the processor 601 is configured to implement the steps of the pedestrian target detection method described in any of the foregoing embodiments when executing the program stored in the memory 603 .
  • In the solution provided by this embodiment of the present invention, the electronic device can acquire the image collected by the image acquisition device installed on the vehicle to obtain the first image to be detected; perform pedestrian target detection on the first image to be detected to obtain at least one first detection frame, where the first detection frame is used to identify the area occupied by a pedestrian in the first image to be detected; perform pedestrian lower-body target detection on the first image to be detected to obtain at least one second detection frame, where the second detection frame is used to identify the area occupied by a pedestrian's lower body in the first image to be detected; for any first detection frame and any second detection frame, determine, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether they identify the same pedestrian; in the case that they identify the same pedestrian, determine the target detection frame based on the first detection frame, where the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected; in the case that they do not identify the same pedestrian, determine the target detection frames based on the first detection frame and the second detection frame respectively.
  • The electronic device can perform pedestrian target detection and pedestrian lower-body target detection on the first image to be detected to obtain the first detection frames and the second detection frames, and determine, based on the positions of a first detection frame and a second detection frame, whether they identify the same pedestrian, thereby obtaining the pedestrian target detection result. In this way, when there is an incomplete pedestrian in the first image to be detected, the electronic device can accurately detect the incomplete pedestrian, which can improve the accuracy of the pedestrian target detection result and avoid danger when the vehicle is running at a slow speed.
  • the communication bus mentioned in the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above electronic device and other devices.
  • the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • The above-mentioned processor can be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it can also be a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • In yet another embodiment provided by the present invention, a computer-readable storage medium is also provided, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the pedestrian target detection method described in any of the above embodiments are implemented.
  • In the solution provided by this embodiment of the present invention, when the computer program stored in the computer-readable storage medium is executed by a processor, the image collected by the image acquisition device installed on the vehicle can be acquired to obtain the first image to be detected; pedestrian target detection can be performed on the first image to be detected to obtain at least one first detection frame, where the first detection frame is used to identify the area occupied by a pedestrian in the first image to be detected; pedestrian lower-body target detection can be performed on the first image to be detected to obtain at least one second detection frame, where the second detection frame is used to identify the area occupied by a pedestrian's lower body in the first image to be detected; for any first detection frame and any second detection frame, whether the two frames identify the same pedestrian can be determined according to their positions in the first image to be detected; in the case that they identify the same pedestrian, the target detection frame is determined based on the first detection frame; in the case that they do not identify the same pedestrian, the target detection frames are determined based on the first detection frame and the second detection frame respectively.
  • The electronic device can perform pedestrian target detection and pedestrian lower-body target detection on the first image to be detected to obtain the first detection frames and the second detection frames, and determine, based on the positions of a first detection frame and a second detection frame, whether they identify the same pedestrian, thereby obtaining the pedestrian target detection result. In this way, when there is an incomplete pedestrian in the first image to be detected, the electronic device can accurately detect the incomplete pedestrian, which can improve the accuracy of the pedestrian target detection result and avoid danger when the vehicle is driving at a slow speed.
  • In yet another embodiment provided by the present invention, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to execute the steps of the pedestrian target detection method described in any of the above embodiments.
  • In the solution provided by this embodiment of the present invention, when the computer program product containing instructions runs on a computer, the image collected by the image acquisition device installed on the vehicle can be acquired to obtain the first image to be detected; pedestrian target detection can be performed on the first image to be detected to obtain at least one first detection frame, where the first detection frame is used to identify the area occupied by a pedestrian in the first image to be detected; pedestrian lower-body target detection can be performed on the first image to be detected to obtain at least one second detection frame, where the second detection frame is used to identify the area occupied by a pedestrian's lower body in the first image to be detected; for any first detection frame and any second detection frame, whether the two frames identify the same pedestrian can be determined according to their positions in the first image to be detected; in the case that they identify the same pedestrian, the target detection frame is determined based on the first detection frame, where the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected; in the case that they do not identify the same pedestrian, the target detection frames are determined based on the first detection frame and the second detection frame respectively.
  • The electronic device can perform pedestrian target detection and pedestrian lower-body target detection on the first image to be detected to obtain the first detection frames and the second detection frames, and determine, based on the positions of a first detection frame and a second detection frame, whether they identify the same pedestrian, thereby obtaining the pedestrian target detection result. In this way, when there is an incomplete pedestrian in the first image to be detected, the electronic device can accurately detect the incomplete pedestrian, which can improve the accuracy of the pedestrian target detection result and avoid danger when the vehicle is driving slowly.
  • In the above-mentioned embodiments, the implementation may be realized in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, it can be realized in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that includes an integration of one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), and the like.


Abstract

Embodiments of the present invention provide a pedestrian target detection method, an electronic device and a storage medium. The method includes: acquiring a first image to be detected; performing pedestrian target detection on the first image to be detected to obtain at least one first detection frame; performing pedestrian lower-body target detection on the first image to be detected to obtain at least one second detection frame; for any first detection frame and any second detection frame, determining whether the first detection frame and the second detection frame identify the same pedestrian; in the case that the first detection frame and the second detection frame identify the same pedestrian, determining a target detection frame based on the first detection frame; in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determining target detection frames based on the first detection frame and the second detection frame respectively. By adopting the embodiments of the present invention, the accuracy of pedestrian target detection results can be improved, and danger can be avoided while the vehicle is traveling at a slow speed.

Description

一种行人目标的检测方法、电子设备及存储介质
本申请要求于2020年12月22日提交中国专利局、申请号为202011522139.4、申请名称为“一种行人目标的检测方法、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及图像处理技术领域,特别是涉及一种行人目标的检测方法、电子设备及存储介质。
背景技术
在车辆泊车、车辆启动等车辆慢速行驶的过程中,为了及时发现车辆附近的行人,以避免在车辆慢速行驶的过程中出现车辆碰撞到行人等危险发生,基于机器学习模型的行人检测的方法已广泛应用在车辆的行人检测系统中。
具体来说,车辆上安装有摄像设备,在车辆慢速行驶的过程中,电子设备可以通过摄像设备获取待检测图像,然后将待检测图像输入预先训练的行人检测模型,确定待检测图像中是否存在行人。
当待检测图像中的行人并不完整时,采用上述行人目标的检测方法并不能准确地检测到待检测图像中的所有行人。例如,当车辆附近的行人与车辆上安装的摄像设备之间的距离较近时,待检测图像中往往只包含行人的部分图像,基于这种图像进行行人目标检测时很可能无法准确的检测出行人,进而会导致车辆慢速行驶的过程中发生危险。
发明内容
本发明实施例的目的在于提供一种行人目标的检测方法、电子设备及存储介质,以提高行人目标检测结果的准确度,避免车辆慢速行驶的过程中发生危险。具体技术方案如下:
第一方面,本发明实施例提供了一种行人目标的检测方法,所述方法包括:
获取车辆上安装的图像采集设备所采集的图像,得到第一待检测图像;
对所述第一待检测图像进行行人目标检测,得到所述第一待检测图像中的至少一个第 一检测框,其中,所述第一检测框用于标识行人在所述第一待检测图像中所占区域;
对所述第一待检测图像进行行人下半身目标检测,得到所述第一待检测图像中的至少一个第二检测框,其中,所述第二检测框用于标识行人的下半身在所述第一待检测图像中所占区域;
针对任一所述第一检测框和任一所述第二检测框,根据所述第一检测框和所述第二检测框在所述第一待检测图像中的位置,确定所述第一检测框和所述第二检测框是否标识同一行人;
针对所述第一检测框和所述第二检测框为标识同一行人的情况,基于所述第一检测框确定目标检测框,其中,所述目标检测框用于标识所述行人在所述第一待检测图像中的行人目标检测结果;
针对所述第一检测框和所述第二检测框不为标识同一行人的情况,基于所述第一检测框、所述第二检测框分别确定所述目标检测框。
第二方面,本发明实施例提供了一种电子设备,包括处理器、通信接口、存储器和通信总线,其中,处理器,通信接口,存储器通过通信总线完成相互间的通信;
存储器,用于存放计算机程序;
处理器,用于执行存储器上所存放的程序时,实现上述行人目标的检测方法步骤。
第三方面,本发明实施例提供了一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现上述行人目标的检测方法步骤。
本发明实施例所提供的方案中,电子设备可以获取车辆上安装的图像采集设备所采集的图像,得到第一待检测图像;对第一待检测图像进行行人目标检测,得到第一待检测图像中的至少一个第一检测框,其中,第一检测框用于标识行人在第一待检测图像中所占区域;对第一待检测图像进行行人下半身目标检测,得到第一待检测图像中的至少一个第二检测框,其中,第二检测框用于标识行人的下半身在第一待检测图像中所占区域;针对任一第一检测框和任一第二检测框,根据第一检测框和第二检测框在第一待检测图像中的位置,确定第一检测框和第二检测框是否标识同一行人;针对第一检测框和第二检测框为标识同一行人的情况,基于第一检测框确定目标检测框,其中,目标检测框用于标识行人在第一待检测图像中的行人目标检测结果;针对第一检测框和第二检测框不为标识同一行人的情况,基于第一检测框、第二检测框分别确定目标检测框。
电子设备可以对第一待检测图像进行行人目标检测,并可以通过对第一待检测图像进行行人下半身目标检测,得到第一检测框及第二检测框,并基于第一检测框及第二检测框的位置确定第一检测框及第二检测框是否标识同一行人,进而得到行人目标检测结果。这 样,当第一待检测图像中存在不完整的行人时,电子设备可以准确地检测出该不完整的行人,可以提高行人目标检测结果的准确度,避免车辆慢速行驶的过程中发生危险。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的实施例。
图1为本发明实施例所提供的一种行人目标的检测方法的流程图;
图2为本发明实施例中备选框以及备选框的抑制属性的确定方式的流程图;
图3为本发明实施例中第一待去重检测框的确定方式的流程图;
图4为梯度方向直方图的示意图;
图5为本发明实施例所提供的一种行人目标的检测装置的结构示意图;
图6为本发明实施例所提供的一种电子设备的结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
为了提高行人目标检测的准确度,避免车辆慢速行驶的过程中发生危险,本发明实施例提供了一种行人目标的检测方法、装置、电子设备、计算机可读存储介质及计算机程序产品。下面首先对本发明实施例提供的一种行人目标的检测方法进行介绍。
本发明实施例所提供的一种行人目标的检测方法可以应用于任意需要检测车辆后方的行人目标的电子设备,例如,可以为车载电脑、图像采集设备、处理器等。为了方便描述,后续称为电子设备。
如图1所示,一种行人目标的检测方法,所述方法包括:
S101,获取车辆上安装的图像采集设备所采集的图像,得到第一待检测图像;
S102,对所述第一待检测图像进行行人目标检测,得到所述第一待检测图像中的至少一个第一检测框;
其中,所述第一检测框用于标识行人在所述第一待检测图像中所占区域。
S103,对所述第一待检测图像进行行人下半身目标检测,得到所述第一待检测图像中的至少一个第二检测框;
其中,所述第二检测框用于标识行人的下半身在所述第一待检测图像中所占区域。
S104,针对任一所述第一检测框和任一所述第二检测框,根据所述第一检测框和所述第二检测框在所述第一待检测图像中的位置,确定所述第一检测框和所述第二检测框是否标识同一行人;
S105,针对所述第一检测框和所述第二检测框为标识同一行人的情况,基于所述第一检测框确定目标检测框;
其中,所述目标检测框用于标识所述行人在所述第一待检测图像中的行人目标检测结果。
S106,针对所述第一检测框和所述第二检测框不为标识同一行人的情况,基于所述第一检测框、所述第二检测框分别确定所述目标检测框。
可见,本发明实施例所提供的方案中,电子设备可以对第一待检测图像进行行人目标检测,并可以通过对第一待检测图像进行行人下半身目标检测,得到第一检测框及第二检测框,并基于第一检测框及第二检测框的位置确定第一检测框及第二检测框是否标识同一行人,进而得到行人目标检测结果。这样,当第一待检测图像中存在不完整的行人时,电子设备可以准确地检测出该不完整的行人,可以提高行人目标检测结果的准确度,避免车辆慢速行驶的过程中发生危险。
在车辆慢速行驶的过程中,为了确定正在行驶的车辆附近是否存在行人,在上述步骤S101中,电子设备可以获取车辆上安装的图像采集设备所采集的图像,得到第一待检测图像。具体的,在车辆慢速行驶的过程中,安装在车辆上的图像采集设备可以采集车辆附近的图像,然后将采集的图像发送至电子设备,作为第一待检测图像。如果上述电子设备为图像采集设备,上述第一待检测图像也可以为电子设备在车辆慢速行驶过程中采集的图像。其中,上述图像采集设备可以安装在车辆的头部、尾部、侧方等位置。
为了确定第一待检测图像中是否存在行人,在上述步骤S102中,电子设备可以对第一待检测图像利用预先训练的行人目标检测模型进行行人目标检测,若存在完整的行人,可以得到第一待检测图像中的至少一个第一检测框。其中,完整的行人是指包括行人头部、躯干、腿部、脚部等整体,第一检测框用于标识行人在第一待检测图像中所占区域。电子设备对第一待检测图像进行行人目标检测的具体方式可以为图像处理技术领域中的相应方式,在此不做具体限定,只要可以检测出第一待检测图像中的行人即可。
上述第一待检测图像中可能包括不完整的行人,例如,当行人与图像采集设备之间的距离较近(0~2m)时,电子设备获取到的第一待检测图像中可能包含行人的下半身图像,在这种情况下,电子设备利用行人目标检测模型对第一待检测图像进行行人目标检测时,很可能无法准确的检测出第一待检测图像中不完整的行人。通常,为了确定车辆较近范围内(0~2m)是否存在行人,可以通过超声波传感器检测车辆附近的行人。但由于超声波传感器对于非金属材质的物体的灵敏度并不高,甚至在特殊的天气情况下会出现失灵的问题,例如,在雷雨天气时超声波传感器会出现失灵的问题。因此,采用超声波传感器无法保证能够准确检测出车辆较近范围内的行人。
针对这种情况,为了提高行人目标检测的准确度,在上述步骤S103中,电子设备可以对第一待检测图像利用预先训练的行人下半身目标检测模型进行行人下半身目标检测,若存在行人整体或者仅仅是行人的下半身,均可以得到第一待检测图像中的至少一个第二检测框,其中,行人的下半身是指行人的腰部以下部位,第二检测框用于标识行人的下半身在第一待检测图像中所占区域。电子设备对第一待检测图像进行行人下半身目标检测的具体方式可以为图像处理技术领域中的相应方式,在此不做具体限定,只要可以检测出第一待检测图像中的行人的下半身即可。
其中,上述步骤S102及S103的执行顺序可以为任意顺序,可以先执行步骤S102后执行步骤S103,也可以先执行步骤S103后执行步骤S102,还可以同时执行步骤S103及步骤S102,当然在第一待检测图像中,可能只检测出行人,若只检测出行人,也可能只检测出行人的下半身,在此不做具体限定。
若第一待检测图像中存在完整的行人,在对第一待检测图像进行行人检测时,会检测出行人标识为第一检测框,在对第一待检测图像进行行人下半身检测时,也会检测出行人下半身标识为第二检测框,因此,在得到第一待检测图像中的第一检测框及第二检测框后,第一检测框和第二检测框可能为标识同一行人的检测框。为了避免在检测结果中出现多个检测框标识同一行人的情况出现,在上述步骤S104中,针对任一第一检测框和任一第二检测框,电子设备可以根据第一检测框和第二检测框在第一待检测图像中的位置,确定第一检测框和第二检测框是否标识为同一行人。
在一种实施方式中,上述第一检测框及第二检测框均为矩形检测框,电子设备在得到第一检测框及第二检测框时,可以在第一待检测图像的图像坐标系中确定第一检测框对应的向量(x1,y1,h1,w1)及第二检测框对应的向量(x2,y2,h2,w2)。其中,(x1,y1)为第一检测框的左上角顶点在第一待检测图像的图像坐标系中的坐标,h1,w1分别为第一检测框的高和宽;(x2,y2)为第二检测框的左上角顶点在第一待检测图像的图像坐 标系中的坐标,h2,w2分别为第二检测框的高和宽。
电子设备可以根据第一检测框对应的向量(x1,y1,h1,w1)及第二检测框对应的向量(x2,y2,h2,w2),计算第一检测框与第二检测框之间的交并比,然后基于第一检测框与第二检测框之间的交并比与交并比阈值之间的大小关系,确定第一检测框和第二检测框是否标识同一行人,这里第一检测框与第二检测框的交并比为在第一待检测图像中第一检测框和第二检测框的重叠面积与第一检测框和第二检测框的总面积之间的比值。当第一检测框与第二检测框之间的交并比不小于交并比阈值时,说明第一检测框与第二检测框重叠部分的面积较大,那么电子设备可以确定第一检测框和第二检测框标识的是同一行人;当第一检测框与第二检测框之间的交并比小于交并比阈值时,说明第一检测框与第二检测框重叠部分的面积较小,那么电子设备可以确定第一检测框和第二检测框标识的是不同的行人。其中,交并比阈值可以根据经验值进行设置。
在上述步骤S105中,针对第一检测框和第二检测框为标识同一行人的情况,在这种情况下第二检测框所标识的行人的下半身区域为第一检测框所标识的完整的行人的下半身区域,那么电子设备便可以基于第一检测框确定目标检测框。其中,目标检测框用于标识行人在第一待检测图像中的行人目标检测结果。
在一种实施方式中,当第一检测框和第二检测框为标识同一行人的情况时,电子设备可以仅仅基于第一检测框就确定目标检测框,即将第一检测框确定为目标检测框。
在一种实施方式中,第一检测框和第二检测框为标识同一行人的情况时,电子设备也可以基于第一检测框与第二检测框的位置来确定目标检测框。
在上述步骤S106中,针对第一检测框和第二检测框不为标识同一行人的情况,例如由于目标检测模型本身的误差,可能对第一待检测图像中某一完整的行人进行检测时,仅仅检测得到第一检测框,而没有得到第二检测框,或者由于行人下半身某些部位被遮挡,而没有得到第二检测框,在这种情况下在第一待检测图像中得到的第二检测框所标识的行人的下半身区域并不是第一检测框所标识的行人的下半身区域,第二检测框和第一检测框对应不同的行人。由于第一待检测图像中可能包含多个行人,每个第二检测框可以分别标识第一待检测图像中每个行人的下半身区域,那么,在第一检测框和第二检测框不为同一行人对应的检测框的情况下,第二检测框所标识的区域很可能是第一待检测图像中不完整的行人的下半身区域,也就是说,第二检测框所对应的行人在第一待检测图像中只存在下半身的图像。为了确保行人目标检测结果的准确性,电子设备可以基于第一检测框、第二检测框分别确定目标检测框。
在一种实施方式中,当第一检测框和第二检测框为标识不为同一行人的情况时,电子 设备可以将第一检测框确定为目标检测框,并同时将第二检测框确定为目标检测框。
这样,电子设备可以检测出第一待检测图像中不完整的行人,避免漏检第一待检测图像中的行人,从而可以提高行人目标检测结果的准确度,使得行车过程中更加安全。
作为本发明实施例的一种实施方式,针对上述第一检测框和上述第二检测框不为标识同一行人的情况,基于第一检测框确定目标检测框,可以包括:
对第二待检测图像进行行人头部目标检测;针对第一检测框内检测到行人头部的情况,将第一检测框作为目标检测框。
针对上述第一检测框和上述第二检测框不为标识同一行人的情况,在这种情况下,还可能是因为第一检测框所标识的区域并不包括行人,也就是说,此时第一检测框可能不是行人对应的检测框,检测模型存在误检。为了进一步确定第一检测框是否为行人对应的检测框,电子设备可以对第二待检测图像进行行人头部目标检测,其中,第二待检测图像为第一检测框内的图像。
在一种实施方式中,电子设备可以通过预先训练完成的行人头部目标检测模型对第二待检测图像进行行人头部目标检测。其中,上述行人头部目标检测模型为预先通过多个包含行人头部区域的图像样本对初始行人头部目标检测模型进行训练得到的,在训练过程中,可以不断调整初始行人头部目标检测模型的参数,以使初始行人头部目标检测模型的参数更加合适,进而得到可以准确检测图像中的行人的头部区域的行人头部目标检测模型。其中,上述行人头部目标检测模型可以深度卷积神经网络、SVM(support vector machines,支持向量机)、Adaboost模型等机器学习模型,其参数可以随机初始化,在此不做具体限定。
当电子设备并未在第二待检测图像中检测到行人头部时,即认为在第一检测框内的区域并不包含行人的头部,第一检测框中很可能不是行人,存在检测错误,因此,可以确定第一检测框并不是行人对应的检测框,那么电子设备在检测过程中也就可以丢弃第一检测框。
当电子设备在第二待检测图像中检测到行人头部时,即认为在第一检测框内的区域存在行人的头部。在这种情况下,第一检测框虽然可能不包含行人的下半身区域,但包含行人的头部区域,这表示在第一待检测图像中,第一检测框中所对应的行人的下半身很可能被遮挡,那么第一检测框也是行人所对应的检测框,此时电子设备可以基于第一检测框确定目标检测框。
可见,本发明实施例所提供的方案中,针对第一检测框和第二检测框不为标识同一行人的情况,电子设备可以对第二待检测图像进行行人头部目标检测;针对第一检测框内检 测到行人头部的情况,将第一检测框作为目标检测框。这样,当第一待检测图像中行人的下半身区域被遮挡时,电子设备也可以通过第二待检测图像进行行人头部目标检测的方式准确地检测出该被遮挡的行人。
作为本发明实施例的一种实施方式,上述针对所述第一检测框和所述第二检测框为标识同一行人的情况,基于所述第一检测框确定目标检测框的步骤,可以包括:
基于第二检测框的下边界调整第一检测框的下边界得到第三检测框,将第三检测框作为目标检测框。
针对第一检测框和第二检测框为标识同一行人的情况,第二检测框所标识的区域为行人的下半身在第一待检测图像中所占的区域,由于第二检测框的下边界的位置可以准确地标识行人的脚部在第一待检测图像中的位置,因此,为了更加准确地标识行人的脚部在第一待检测图像中的位置,电子设备可以基于第二检测框的下边界调整第一检测框的下边界得到第三检测框,然后将第三检测框作为目标检测框。
在一种实施方式中,电子设备可以保持第一检测框的宽度不变,将第一检测框的下边界调整为第二检测框的下边界,即将第二检测框的下边界作为第三检测框的下边界,第一检测框的左边界、右边界、上边界分别为第三检测框的左边界、右边界、上边界,得到第三检测框作为目标检测框。
可见,在本发明实施例提供的方案中,电子设备基于第二检测框的下边界调整第一检测框的下边界得到第三检测框,将第三检测框作为目标检测框。这样,第三检测框的下边界可以更加准确地标识行人的脚部在第一待检测图像中的位置,从而可以提高行人目标检测结果的准确度。
作为本发明实施例的一种实施方式,上述对所述第一待检测图像进行行人目标检测,得到所述第一待检测图像中的至少一个第一检测框的步骤,可以包括:
获取第一待检测图像的图像特征,利用行人目标检测模型对第一待检测图像的图像特征进行行人目标检测,获得第一待检测图像中的多个第一待去重检测框;对多个第一待去重检测框进行去重处理,得到第一待检测图像中的至少一个第一检测框。
在得到第一待检测图像后,电子设备可以提取第一待检测图像的图像特征,并通过行人目标检测模型对第一待检测图像的图像特征进行行人目标检测,获得第一待检测图像中的多个第一待去重检测框,其中,第一待去重检测框用于标识行人在第一待检测图像中所占区域。
上述行人目标检测模型为预先通过多个包含完整的行人的图像样本对初始行人目标检测模型进行训练得到的,在训练过程中,可以不断调整初始行人目标检测模型的参数, 以使初始行人目标检测模型的参数更加合适,进而得到可以准确检测图像中的行人的行人目标检测模型。其中,上述行人目标检测模型可以为SVM、深度卷积神经网络、Adaboost模型等机器学习模型,其参数可以随机初始化,在此不做具体限定。
利用行人目标检测模型对第一待检测图像的图像特征进行行人目标检测的过程中,可能会对同一个行人生成多个第一待去重检测框,也就是说,上述多个第一待去重检测框中,很可能存在多个第一待去重检测框标识的是同一个行人。在多个第一待去重检测框标识同一行人的情况下,该多个第一待去重检测框之间是重叠的。为了从标识同一行人的多个第一待去重检测框中确定可以最准确的表征行人所占区域的检测框,电子设备可以对上述多个第一待去重检测框进行去重处理,得到第一待检测图像中的至少一个第一检测框。
相应的,上述对所述第一待检测图像进行行人下半身目标检测,得到所述第一待检测图像中的至少一个第二检测框的步骤,可以包括:
利用行人下半身检测模型对第一待检测图像的图像特征进行行人下半身目标检测,获得第一待检测图像中的多个第二待去重检测框;对多个第二待去重检测框进行去重处理,得到第一待检测图像中的至少一个第二检测框。
在得到第一待检测图像后,电子设备可以提取第一待检测图像的图像特征,并通过行人下半身检测模型对第一待检测图像的图像特征进行行人下半身目标检测,获得第一待检测图像中的多个第二待去重检测框,其中,第二待去重检测框用于标识行人的下半身在第一待检测图像中所占区域。
上述行人下半身目标检测模型为预先通过多个包含行人下半身的图像样本对初始行人下半身目标检测模型进行训练得到的,在训练过程中,可以不断调整初始行人下半身目标检测模型的参数,以使初始行人下半身目标检测模型的参数更加合适,进而得到可以准确检测图像中行人的下半身的行人下半身目标检测模型。其中,上述行人下半身目标检测模型可以为SVM、深度卷积神经网络、Adaboost模型等机器学习模型,其参数可以随机初始化,在此不做具体限定。
利用行人下半身目标检测模型对第一待检测图像的图像特征进行行人下半身目标检测的过程中,可能会对同一个行人生成多个第二待去重检测框,也就是说,上述多个第二待去重检测框中,很可能存在多个第二待去重检测框标识的是同一个行人。在多个第二待去重检测框标识同一行人的情况下,该多个第二待去重检测框之间是重叠的。为了从标识同一行人的多个第二待去重检测框中确定可以最准确的表征行人下半身所占区域的检测框,电子设备可以对上述多个第二待去重检测框进行去重处理,得到第一待检测图像中的至少一个第一检测框。
可见,上述本发明实施例所提供的,电子设备可以获取第一待检测图像的图像特征,利用行人目标检测模型对第一待检测图像的图像特征进行行人目标检测,获得第一待检测图像中的多个第一待去重检测框;对多个第一待去重检测框进行去重处理,得到第一待检测图像中的至少一个第一检测框;利用行人下半身检测模型对第一待检测图像的图像特征进行行人下半身目标检测,获得第一待检测图像中的多个第二待去重检测框;对多个第二待去重检测框进行去重处理,得到第一待检测图像中的至少一个第二检测框。这样,电子设备可以准确地检测出第一待检测图像中的每个行人,提高行人目标检测结果的准确度。
作为本发明实施例的一种实施方式,上述对所述多个第一待去重检测框进行去重处理,得到所述第一待检测图像中的至少一个第一检测框的步骤,可以包括:
对所有第一待去重检测框进行非极大值抑制处理,获得第一备选框以及第一备选框的第一抑制属性;将第一抑制属性不小于第一阈值的第一备选框作为第一检测框。
针对同一个行人目标,在得到所有第一待去重检测框之后,可能在第一待检测图像中存在多个重叠的第一待去重检测框。为了从第一待去重检测框中确定能够最准确的标识行人所占区域的第一检测框,电子设备对所有第一待去重检测框进行非极大值抑制处理(Non-Maximum Suppression,NMS),最后剩余一个第一待去重检测框,即得到的第一备选框,同时可以得到该第一备选框的第一抑制属性。
其中,第一抑制属性为非极大值抑制处理时基于第一备选框去除的第一待去重检测框的数量,第一备选框即为非极大值抑制处理后未被去除的第一待去重检测框。例如,在非极大值抑制处理过程中,基于第一备选框k2去除的第一待去重检测框的数量为10,那么第一备选框k2的第一抑制属性也就是10。
在通过上述行人目标检测模型检测出每个第一待去重检测框时,行人目标检测模型可以输出第一待去重检测框的置信度。在这个过程中,上述行人目标检测模型可能会检测出一些置信度较高但并非表征行人在子图像中所占区域的第一待去重检测框,这些置信度较高但并非表征行人在子图像中所占区域的第一待去重检测框可以称为虚警检测框。
为了避免得到的第一备选检测框中存在虚警检测框影响行人目标检测结果的准确度,针对每个第一备选检测框,电子设备可以判断该第一备选检测框的抑制属性与第一阈值之间的大小关系。其中,上述第一阈值可以根据经验值进行设置。
当第一备选检测框的抑制属性小于第一阈值时,说明上述第一候选检测框中与该第一备选检测框之间为重叠关系的检测框的数量较少,在这种情况下,该第一备选检测框为虚警检测框的可能性较高,那么便可以丢弃该第一备选检测框。
当第一备选检测框的抑制属性不小于第一阈值时,说明上述第一候选检测框中与该第 一备选检测框之间为重叠关系的检测框的数量较多,在这种情况下,该第一备选检测框为虚警检测框的可能性较低,那么便可以保留该第一备选检测框,作为第一检测框。
相应的,上述对所述多个第二待去重检测框进行去重处理,得到所述第一待检测图像中的至少一个第二检测框的步骤,可以包括:
对所有第二待去重检测框进行非极大值抑制处理,获得第二备选框以及第二备选框的第二抑制属性;将第二抑制属性不小于第二阈值的第二备选框作为第二检测框。
针对同一个行人下半身目标,在得到所有第二待去重检测框之后,可能在第一待检测图像中存在多个重叠的第二待去重检测框。为了从第二待去重检测框中确定能够最准确的标识行人下半身所占区域的第二检测框,电子设备对所有第二待去重检测框进行非极大值抑制处理,最后剩余一个第二待去重检测框,即得到的第二备选框,同时可以得到该第二备选框的第二抑制属性。
其中,第二抑制属性为非极大值抑制处理时基于第二备选框去除的第二待去重检测框的数量,第二备选框即为非极大值抑制处理后未被去除的第二待去重检测框。例如,在非极大值抑制处理过程中,基于第二备选框k3去除的第二待去重检测框的数量为15,那么第二备选框k3的第一抑制属性也就是15。
在通过上述行人下半身目标检测模型检测出每个第二待去重检测框时,行人下半身目标检测模型可以输出第二待去重检测框的置信度。在这个过程中,上述行人下半身目标检测模型可能会检测出一些置信度较高但并非表征行人下半身在子图像中所占区域的第二待去重检测框,这些置信度较高但并非表征行人下半身在子图像中所占区域的第二待去重检测框也可以称为虚警检测框。
为了避免得到的第二备选检测框中存在虚警检测框影响行人目标检测结果的准确度,针对每个第二备选检测框,电子设备可以判断该第二备选检测框的抑制属性与第二阈值之间的大小关系。其中,上述第二阈值可以根据经验值进行设置。
当第二备选检测框的抑制属性小于第二阈值时,说明上述第二候选检测框中与该第二备选检测框之间为重叠关系的检测框的数量较少,在这种情况下,该第二备选检测框为虚警检测框的可能性较高,那么便可以丢弃该第二备选检测框。
当第二备选检测框的抑制属性不小于第二阈值时,说明上述第二候选检测框中与该第二备选检测框之间为重叠关系的检测框的数量较多,在这种情况下,该第二备选检测框为虚警检测框的可能性较低,那么便可以保留该第二备选检测框,作为第二检测框。
可见,本发明实施例所提供的方案中,可以通过上述步骤确定第一待检测图像中的第一检测框和第二检测框。这样,可以将虚警检测框丢弃,避免行人目标检测结果中包括虚 警检测框,从而可以提高行人目标检测结果的准确度。
作为本发明实施例的一种实施方式,如图2所示,对所有待去重检测框进行非极大值抑制处理,获得备选框以及备选框的抑制属性,可以包括:
S201,将置信度最高的待去重检测框作为备选框,将除所述备选框外的待去重检测框作为冗余框集合;
为了确定出准确的备选框,电子设备可以将置信度最高的待去重检测框作为备选框,将除所述备选框外的待去重检测框作为冗余框集合。例如,待去重检测框D1的置信度为0.90,待去重检测框D2的置信度为0.81,待去重检测框D3的置信度为0.94,待去重检测框D4的置信度为0.73,那么可以将待去重检测框D3为备选框,并将待去重检测框D1、待去重检测框D2的置信度为0.81及待去重检测框D4作为冗余框集合。其中,将第一待去重检测框作为待去重检测框,获得的备选框为第一备选框,将第二待去重检测框作为待去重检测框,获得的备选框为第二备选框。
S202,针对所述冗余框集合中的每一所述待去重检测框,计算所述备选框与该待去重检测框之间的交并比;
为了确定冗余框集合中是否存在与备选框重叠的待去重检测框,针对所述冗余框集合中的每一所述待去重检测框,电子设备可以计算该备选框与该待去重检测框之间的交并比。
具体的,电子设备可以根据备选框的位置,以及待去重检测框的位置,计算备选框与待去重检测框之间重叠部分的面积,并计算备选框与待去重检测框的总面积,进而计算重叠部分的面积与总面积之间的比值,得到交并比。
例如,备选框B1对应的向量为(1,10,10,10),表示备选框B1的左上角在第一待检测图像的图像坐标系中的坐标为(1,10)、宽为10,高为10,待去重检测框D1对应的向量为(6,10,10,10),表示待去重检测框D1的左上角在第一待检测图像的图像坐标系中的坐标为(6,10)、宽为10,高为10,那么备选框B1与待去重检测框D1之间重叠部分的面积即为(6-1)×10=50,备选框B1与待去重检测框D1的总面积即为(6+10-1)×10=150,那么备选框B1与待去重检测框D1之间的交并比即为50÷150≈0.33。
S203,若所述交并比不小于第三阈值,从所述冗余框集合中去除该待去重检测框,并更新所述备选框的抑制属性;
在得到备选框与待去重检测框之间的交并比后,可以比较备选框与待去重检测框之间的交并比与第三阈值之间的大小。其中,上述第三阈值可以根据经验值进行设置,例如,可以设置为0.6。
当备选框与待去重检测框之间的交并比不小于第三阈值时,说明备选框与待去重检测 框之间的重叠部分的面积较大,那么备选框与待去重检测框很可能为重叠的检测框,电子设备便可以从冗余框集合中去除该待去重检测框,并更新备选框的抑制属性,也就是将备选框的抑制属性加1。其中,将第一待去重检测框作为待去重检测框,获得的备选框为第一备选框,当第一备选框与第一待去重检测框之间的交并比不小于第三阈值时,更新的抑制属性为第一抑制属性;将第二待去重检测框作为待去重检测框,获得的备选框为第二备选框,当第二备选框与第二待去重检测框之间的交并比不小于第三阈值时,更新的抑制属性为第二抑制属性。也就是说,第一抑制属性为基于第一备选框去除的第一待去重检测框的数量,第二抑制属性为基于第二备选框去除的第二待去重检测框的数量。
当备选框与待去重检测框之间的交并比小于第三阈值时,说明备选框与待去重检测框之间的重叠部分的面积较小,那么备选框与待去重检测框很可能不为重叠的检测框,电子设备可以保留该待去重检测框。
S204,从所述冗余框集合中确定置信度最高的待去重检测框为所述备选框,返回所述将除所述备选框外的待去重检测框作为冗余框集合的步骤,直至确定每个所述备选框以及每个所述备选框的抑制属性。
在将冗余框集合中与备选框之间的交并比小于第三阈值的所有待去重检测框去除后,由于未被去除待去重检测框中很可能存在重叠的检测框,因此,的电子设备可以从冗余框集合中确定置信度最高的待去重检测框为新的备选框,并返回执行上述将除所述备选框外的待去重检测框作为冗余框集合的步骤,直至确定每个备选框以及每个备选框的抑制属性。最后,备选框中也就不存在重叠的检测框。
可见,本发明实施例所提供的方案中,电子设备可以按照上述步骤确定每个备选框以及每个备选框的抑制属性。这样,电子设备可以实现对待去重检测框的去重处理,从而可以提高确定目标检测框的效率,并可以提高目标检测框的准确度。
作为本发明实施例的一种实施方式,如图3所示,上述获取所述第一待检测图像的图像特征,利用行人目标检测模型对所述第一待检测图像的图像特征进行行人目标检测,获得所述第一待检测图像中的多个第一待去重检测框的步骤,可以包括:
S301,基于所述第一待检测图像和预设缩放比例,构建所述第一待检测图像的图像金字塔;
在车辆附近可能存在多个行人,且该多个行人与车辆之间的距离是不一样的。当行人与车辆之间的距离越近时,该行人在第一待检测图像中所占的区域的面积越大;当行人与车辆之间的距离越远时,该行人在后方图像中所占的区域的面积越小。因此,当获取到第一待检测图像时,该第一待检测图像中可能包含多个所占的区域的面积不同的行人。
由于目标检测模型通常通过预设尺寸的检测窗口对图像进行目标检测,所以目标检测模型能够检测的目标大小是固定的。为了准确地检测出上述第一待检测图像中的每个行人,电子设备可以基于第一待检测图像和预设缩放比例,构建第一待检测图像的图像金字塔。其中,第一待检测图像的图像金字塔为对第一待检测图像进行多尺度变换处理的一种方式,在构建第一待检测图像的图像金字塔时,第一待检测图像可以作为图像金字塔的第一层子图像,图像金字塔中包括多层子图像,每一层子图像均为将其上一层子图像按照预设缩放比例进行缩放得到的,子图像的尺寸不小于目标检测模型的检测窗口的尺寸,目标检测模型为行人目标检测模型、行人下半身目标检测模型或行人头部目标检测模型。
例如,第一待检测图像的尺寸为1280×720,预设的缩小比例为1.06,电子设备按照该缩小比例对尺寸为1280×720的第一待检测图像进行缩小处理47次后,可以得到47张不同大小的子图像,其中尺寸最小的子图像的尺寸为80×45,最终得到包括48张(第一待检测图像1张+子图像47张)图像的图像金字塔。
构建第一待检测图像的图像金字塔时,可以采用线性插值的方式确定子图像中像素点的像素值。例如,可以采用单线性插值、双线性插值、三线性插值的方式,确定子图像中像素点的像素值,具体的线性插值的方式可以根据需求进行选择,在此不做具体限定。
S302,分别提取每层所述子图像的图像特征,并分别通过行人目标检测模型对每层所述子图像的图像特征进行行人目标检测,获得每层所述子图像中的第一候选检测框;
在得到第一待检测图像的图像金字塔后,电子设备可以分别提取每层子图像的图像特征,并分别通过行人目标检测模型对每层子图像的图像特征进行行人目标检测,获得每层子图像中的第一候选检测框,其中,第一候选检测框用于标识行人在子图像中所占的区域。
S303,基于所述第一候选检测框及所述第一候检测框所属的子图像与所述第一待检测图像之间的缩放比例,确定所述第一待检测图像中的第一待去重检测框;
针对每个行人,在不同的子图像中可能存在多个用于标识该行人所占区域的第一候选检测框。为了从该多个第一候选检测框中确定可以最准确的表征行人所占区域的检测框,电子设备可以基于第一候选检测框及第一候检测框所属的子图像与第一待检测图像之间的缩放比例,确定第一待检测图像中的第一待去重检测框。
具体的,针对每个子图像中的第一候选检测框,电子设备可以按照该子图像与第一待检测图像之间的缩放比例,缩放该子图像中的第一候选检测框,得到缩放后的第一候选检测框的尺寸和位置,也就是该第一候选检测框在第一待检测图像对应的第一待去重检测框的尺寸和位置,从而得到第一待去重检测框。
例如,子图像Dt1中包括第一候选检测框k1,其中第一候选检测框k1对应的向量为 (1,10,20,15),表示第一候选检测框k1的左上角在图像坐标系中的坐标为(1,10)、第一候选检测框k1的宽为20,高为15,若子图像Dt1与第一待检测图像H1之间的缩放比例为1:20时,那么基于该缩放比例,可以确定第一候选检测框k1在第一待检测图像中对应的第一待去重检测框的向量为(20,200,400,300),也就是第一待去重检测框的左上角在第一待检测图像的图像坐标系中的坐标为(20,200)、第一待去重检测框的宽为400,高为300。
相应的,上述利用行人下半身检测模型对所述第一待检测图像的图像特征进行行人下半身目标检测,获得所述第一待检测图像中的多个第二待去重检测框的步骤,可以包括:
分别通过行人下半身检测模型对每层子图像的图像特征进行行人下半身目标检测,获得每层子图像中的第二候选检测框;基于第二候选检测框及第二候选检测框所属的子图像与第一待检测图像之间的缩放比例,确定第一待检测图像中的第二待去重检测框。
在得到第一待检测图像的图像金字塔后,电子设备可以分别提取每层子图像的图像特征,并分别通过行人下半身检测模型对每层子图像的图像特征进行行人下半身目标检测,获得每层子图像中的第二候选检测框。
针对每个行人,在不同的子图像中可能存在多个用于标识该行人的下半身所占区域的第二候选检测框。为了从该多个第二候选检测框中确定可以最准确的表征行人下半身所占区域的检测框,电子设备可以基于第二候选检测框及第二候检测框所属的子图像与第一待检测图像之间的缩放比例,确定第一待检测图像中的第二待去重检测框。
具体的,针对每个子图像中的第二候选检测框,电子设备可以按照该子图像与第一待检测图像之间的缩放比例,缩放该子图像中的第二候选检测框,得到缩放后的第二候选检测框的尺寸和位置,也就是该第二候选检测框在第一待检测图像对应的第一待去重检测框的尺寸和位置,从而得到第二待去重检测框。
可见,本发明实施例所提供的方案中,电子设备可以根据上述步骤确定第一待检测图像中的第一待去重检测框和第二待去重检测框。这样,电子设备可以准确地检测出第一待检测图像中的每个行人,提高行人目标检测结果的准确度。
作为本发明实施例的一种实施方式,上述图像特征至少包括以下一种:
第一种:获取待提取图像中每个像素点的亮度值,作为待提取图像的图像特征。
电子设备可以获取待提取图像中每个像素点的亮度值,作为待提取图像的图像特征。其中,待提取图像为上述第一待检测图像或上述子图像。
在一种实施方式中,上述待提取图像可以为RGB图像,针对待提取图像中的每个像素点,电子设备可以提取该像素点对应的Red通道、Green通道、Blue通道的参数值,然后根据如下所示公式计算该像素点的亮度值,进而得到待提取图像中的每个像素点的亮度值I, 作为子图像的图像特征:
I=0.299R+0.587G+0.114B
在另一种实施方式中,上述待提取图像可以为YUV420SP格式的鱼眼图像,针对待提取图像中的每个像素点,电子设备可以提取该像素点对应的Y通道参数,作为该像素点的亮度值:I(x,y)=Y(x,y),其中,I(x,y)为像素点(x,y)的亮度值,Y(x,y)为像素点(x,y)的Y通道参数值。
第二种:获取待提取图像中每个像素点的亮度值,基于待提取图像中每个像素点的亮度值,确定待提取图像中每个像素点的亮度梯度幅值,将亮度梯度幅值作为待提取图像的图像特征。
电子设备可以获取待提取图像中每个像素点的亮度值,然后基于待提取图像中每个像素点的亮度值及其坐标,确定待提取图像中每个像素点的亮度梯度幅值,作为待提取图像的图像特征。
待提取图像的亮度梯度幅值G可以表示为:
Figure PCTCN2021113129-appb-000001
其中,G x表示图像坐标系x轴方向的亮度梯度幅值,G y表示图像坐标系y轴方向的亮度梯度幅值,I(x+1,y)为像素点(x+1,y)的亮度值,I(x,y)为像素点(x,y)的亮度值,I(x,y+1)为像素点(x,y+1)的亮度值。
那么,待提取图像的亮度梯度幅值G的数值即为:
Figure PCTCN2021113129-appb-000002
在一种实施方式中,由于计算待提取图像的亮度梯度幅值G的数值时需要求
Figure PCTCN2021113129-appb-000003
的平方根,该计算量较大,耗时较长,为了减少计算耗时,电子设备可以根据如下所示公式计算待提取图像的亮度梯度幅值G的数值:
G(x,y)=|G x+G y|
第三种:获取待提取图像中每个像素点的亮度值,基于待提取图像中每个像素点的亮度值,确定待提取图像中每个像素点的亮度梯度幅值,基于待提取图像中每个像素点的亮度梯度幅值确定每个像素点的亮度梯度方向,并基于待提取图像中每个像素点的亮度梯度方向确定待提取图像对应的亮度梯度方向直方图,将亮度梯度方向直方图作为待提取图像的图像特征。
电子设备在确定每个像素点的亮度梯度幅值之后,可以通过如下所示的公式计算每个像素点的亮度梯度方向θ:
θ=arctan G y/G x
在得到待提取图像中每个像素点的亮度梯度方向后,电子设备可以基于每个像素点的亮度梯度方向,确定待提取图像对应的亮度梯度方向直方图,作为待提取图像的图像特征。其中,在亮度梯度方向直方图中,横坐标表示亮度梯度方向,纵坐标表示 待提取图像中的像素点在对应的亮度梯度方向的亮度梯度幅值。
具体的,如图4所示,梯度方向直方图中设置有9个方向类别,分别为0°方向、20°方向、40°方向、60°方向、80°方向、100°方向、120°方向、140°方向及160°方向,每两个相邻的方向类别之间相差20°。其中,401为像素点1的亮度梯度方向值,402为像素点2的亮度梯度方向值,403为像素点1的亮度梯度幅值,404为像素点2的亮度梯度幅值。
当像素点1的亮度梯度方向为80°、亮度梯度幅值为2时,电子设备可以确定像素点1在80°方向的亮度梯度幅值为2;当像素点2的亮度梯度方向为10°、亮度梯度幅值为4时,电子设备可以确定像素点2在0°方向的亮度梯度幅值为
Figure PCTCN2021113129-appb-000004
并确定像素点2在20°方向的亮度梯度幅值为
Figure PCTCN2021113129-appb-000005
其中,亮度梯度方向可以根据具体要求离散,例如,可以设置每两个相邻的方向类别之间相差30°,那么梯度方向直方图中也就设置有6个方向类别;又例如,可以设置每两个相邻的方向类别之间相差45°,那么梯度方向直方图中也就设置有4个方向类别。
在一种实施方式中,为了更快的确定G y/G x的数值,可以预先建立G y/G x对应的第一查找表,当需要计算G y/G x的数值时,电子设备可以根据第一查找表包含的G y、G x与G y/G x之间的对应关系,确定G y/G x的数值。
例如,预先建立的第一查找表如下表所示:
G y G x G y/G x
1 5 0.2
2 4 0.5
6 10 0.6
1 12 1
当电子设备确定G y为1、G x为5时,根据上表可以确定G y/G x的数值为0.2。
在另一种实施方式中,为了更快地确定每个像素点的梯度方向θ,电子设备可以预先建立反正切函数对应的第二查找表,根据arctan G y/G x的数值与G y/G x之间的对应关系,确定亮度梯度方向θ。
例如,预先建立的第二查找表如下表所示:
Figure PCTCN2021113129-appb-000006
当G y/G x的数值为1时,电子设备可以根据上表确定θ=arctan G y/G x=45°。
可见,本发明实施例所提供的方案中,电子设备可以通过以上方式确定待提取图像的 图像特征。这样,电子设备可以通过多种方式准确地确定待提取图像中的图像特征。
作为本发明实施例的一种实施方式,目标检测模型的训练方式可以包括如下步骤:
A1,获取初始目标检测模型及多个图像样本,其中,每个图像样本包括目标对象;
目标检测模型包括上述行人目标检测模型、行人下半身目标检测模型及行人头部目标检测模型。为了获得可以准确检测图像中的完整的行人、行人的下半身区域或行人的头部区域的目标检测模型,电子设备可以获取初始目标检测模型及多个图像样本,基于多个图像样本对初始目标模型进行训练。
由于训练完成的目标检测模型需要检测图像中的完整的行人、行人的下半身或行人的头部,因此,上述图像样本可以为包含行人的图像样本、包含行人下半身区域的图像样本或包含行人头部区域的图像样本,上述目标对象也就是完整的行人、行人的下半身或行人的头部。
当图像样本为包含完整的行人的图像样本时,基于图像样本对初始目标检测模型进行训练可以得到上述行人目标检测模型;当图像样本为包含行人下半身区域的图像样本时,基于图像样本对初始目标检测模型进行训练可以得到上述行人下半身目标检测模型;当图像样本为包含行人头部区域的图像样本时,基于图像样本对初始目标检测模型进行训练可以得到上述行人头部目标检测模型。
A2,标记目标对象在每个图像样本中所占的区域,得到每个图像样本对应的标记检测框,作为标记标签;
由于训练得到的目标检测模型需要对图像进行检测,得到用于标识目标对象在图像中所占的区域的检测框,因此,针对每个图像样本,可以预先标记目标对象在该图像样本中所占的区域,得到该图像样本对应的标记检测框,作为该图像样本的标记标签。
A3,将每个图像样本输入初始目标检测模型,根据每个图像样本的图像特征进行检测,得到每个图像样本包括的目标对象对应的检测框,作为每个图像样本的预测标签;
A4,基于预测标签及对应的图像样本的标定标签的差异,调整初始目标检测模型的参数,直到初始目标检测模型收敛,停止训练,得到目标检测模型。
电子设备可以将上述预测标签与对应的标定标签进行对比,进而根据预测标签与对应的标定标签之间的差异,调整初始目标检测模型的参数,以使初始目标检测模型的参数更加合适。其中,调整初始目标检测模型的参数的方式可以为梯度下降算法、随机梯度下降算法等模型参数调整方式,在此不做具体限定及说明。
为了确定上述初始目标检测模型是否收敛,电子设备可以判断初始目标检测模型的迭代次数是否达到预设次数,或,初始目标检测模型的总损失函数是否不大于预设值。
如果初始目标检测模型的迭代次数达到预设次数,或,初始目标检测模型的总损失函数不大于预设值,说明当前初始目标检测模型已经收敛,也就是说,当前初始目标检测模型可以对图像进行检测得到准确的输出结果,所以此时可以停止训练,得到目标检测模型。
其中,上述预设次数可以根据检测要求、模型结构等因素设定,例如,可以为6000次、9000次、12000次等,在此不做具体限定。预设值可以根据检测要求、模型结构等因素设定,例如可以为1、0.9、0.75等,在此不做具体限定。
如果初始目标检测模型的迭代次数未达到预设次数,或,初始目标检测模型的总损失函数大于预设值,说明当前初始目标检测模型还未收敛,也就是说,当前初始目标检测模型对图像进行检测得到输出结果还不够准确,那么电子设备需要继续训练初始目标检测模型。
作为本发明实施例的一种实施方式,在得到上述行人目标检测结果之后,上述方法还可以包括:
步骤1,针对每个目标检测框,基于该目标检测框中的预设检测点在第一待检测图像的图像坐标系中的坐标以及预设映射关系,确定该预设检测点在第一待检测图像对应的俯视图的图像坐标系中的坐标,作为目标坐标;
在得到行人目标检测结果之后,为了确定行人目标检测结果中包含的检测框所对应的行人与车辆之间的距离,针对每个目标检测框,电子设备可以基于该目标检测框中的预设检测点在第一待检测图像的图像坐标系中的坐标以及预设映射关系,确定该预设检测点在第一待检测图像对应的俯视图的图像坐标系中的坐标,作为目标坐标。其中,上述预设检测点为预先设置的用于标识目标检测框所对应的行人在第一待检测图像中位置的坐标点,可以为目标检测框中的像素点,例如,可以为目标检测框的左下角点、右下角点、下边界的中点等。
电子设备可以基于预设检测点的坐标,以及预设映射关系,确定预设检测点在第一待检测图像对应的俯视图的图像坐标系中的坐标,作为目标坐标。其中,上述目标坐标用于标识行人在上述俯视图中的位置,上述预设映射关系用于将第一待检测图像中的像素点映射至第一待检测图像对应的俯视图中,上述预设映射关系可以根据图像采集设备的内参和外参进行设置,内参可以包括图像采集设备的焦距、畸变系数、像主点坐标等,外参可以包括图像采集设备的位置、俯仰角、翻滚角等。
步骤2,针对每个目标检测框,基于目标坐标与俯视图中预先标定的图像采集设备坐标之间的距离,以及俯视图的比例尺,确定该目标检测框对应的目标行人与车辆之间的距离;
针对每个目标检测框,电子设备可以基于目标坐标与俯视图中预先标定的图像采集设备坐标之间的距离,以及俯视图的比例尺,确定该目标检测框对应的目标行人与车辆之间的距离。其中,图像采集设备坐标用于标识图像采集设备在上述俯视图中的位置。
假设预先标定的图像采集设备坐标为(x0,y0),目标坐标为(x*1,y*1),俯视图的比例尺为1:2.5cm,表示俯视图中每两个相邻的像素点之间对应的距离为2.5cm,那么目标坐标与图像采集设备坐标在俯视图坐标系的X轴方向上的距离可以表示为:Dx=(x*1-x0)×2.5,目标坐标与图像采集设备坐标在俯视图坐标系的Y轴方向上的距离可以表示为:Dy=(y*1-y0)×2.5。然后,电子设备便可以计算目标检测框对应的目标行人与车辆之间的距离D为:
Figure PCTCN2021113129-appb-000007
在一种实施方式中,为了获取到角度更广的后方图像,上述图像采集设备可以采用鱼眼相机,上述第一待检测图像可以为YUV格式的鱼眼照片。
步骤3,基于目标行人与车辆之间的距离以及预设的车辆控制规则,控制车辆进行报警和/或刹车。
在确定目标行人与车辆之间的距离之后,为了警示目标行人避让正在行驶的车辆,并避免车辆碰撞到目标行人,电子设备可以根据该距离以及预设的车辆控制规则,控制车辆进行报警和/或刹车。其中,上述车辆控制规则包括距离与车辆行为之间的对应关系,车辆行为即为上述报警、刹车等行为。
例如,预设的车辆控制规则为:当目标行人与车辆之间的距离大于3米时,控制车辆鸣笛报警;当目标行人与车辆之间的距离大于1.5米且不大于3米时,控制车辆刹车;当目标行人与车辆之间的距离不大于1.5米时,控制车辆鸣笛报警并刹车。当电子设备确定目标行人M1与车辆之间的距离为2.9米时,便可以控制车辆刹车;当电子设备确定目标行人M2与车辆之间的距离为3.5米时,便可以控制车辆鸣笛报警。
这样,电子设备可以准确确定目标行人与车辆之间的距离,并且基于该距离和预设的车辆控制规则及时控制车辆的车辆行为,可以避免车辆慢速行驶的过程中发生危险。
相应于上述一种行人目标的检测方法,本发明实施例还提供了一种行人目标的检测装置。下面对本发明实施例提供的一种行人目标的检测装置进行介绍。
如图5所示,一种行人目标的检测装置,所述装置包括:
图像获取模块501,用于获取车辆上安装的图像采集设备所采集的图像,得到第一待检测图像;
第一检测模块502,用于对所述第一待检测图像进行行人目标检测,得到所述第一待 检测图像中的至少一个第一检测框;
其中,所述第一检测框用于标识行人在所述第一待检测图像中所占区域。
第二检测模块503,用于对所述第一待检测图像进行行人下半身目标检测,得到所述第一待检测图像中的至少一个第二检测框;
其中,所述第二检测框用于标识行人的下半身在所述第一待检测图像中所占区域。
判断模块504,用于针对任一所述第一检测框和任一所述第二检测框,根据所述第一检测框和所述第二检测框在所述第一待检测图像中的位置,确定所述第一检测框和所述第二检测框是否标识同一行人;
第一检测结果确定模块505,用于针对所述第一检测框和所述第二检测框为标识同一行人的情况,基于所述第一检测框确定目标检测框;
其中,所述目标检测框用于标识所述行人在所述第一待检测图像中的行人目标检测结果。
第二检测结果确定模块506,用于针对所述第一检测框和所述第二检测框不为标识同一行人的情况,基于所述第一检测框、所述第二检测框分别确定所述目标检测框。
可见,本发明实施例所提供的方案中,获取车辆上安装的图像采集设备所采集的图像,得到第一待检测图像;对第一待检测图像进行行人目标检测,得到第一待检测图像中的至少一个第一检测框,其中,第一检测框用于标识行人在第一待检测图像中所占区域;对第一待检测图像进行行人下半身目标检测,得到第一待检测图像中的至少一个第二检测框,其中,第二检测框用于标识行人的下半身在第一待检测图像中所占区域;针对任一第一检测框和任一第二检测框,根据第一检测框和第二检测框在第一待检测图像中的位置,确定第一检测框和第二检测框是否标识同一行人;针对第一检测框和第二检测框为标识同一行人的情况,基于第一检测框确定目标检测框,其中,目标检测框用于标识行人在第一待检测图像中的行人目标检测结果;针对第一检测框和第二检测框不为标识同一行人的情况,基于第一检测框、第二检测框分别确定目标检测框。
电子设备可以对第一待检测图像进行行人目标检测,并可以通过对第一待检测图像进行行人下半身目标检测,得到第一检测框及第二检测框,并基于第一检测框及第二检测框的位置确定第一检测框及第二检测框是否标识同一行人,进而得到行人目标检测结果。这样,当第一待检测图像中存在不完整的行人时,电子设备可以准确地检测出该不完整的行人,可以提高行人目标检测结果的准确度,避免车辆慢速行驶的过程中发生危险。
作为本发明实施例的一种实施方式,上述第二检测结果确定模块506可以包括:
检测子模块(图5中未示出),用于对第二待检测图像进行行人头部目标检测;
其中,所述第二待检测图像为所述第一检测框内的图像。
检测结果确定子模块(图5中未示出),用于针对所述第一检测框内检测到行人头部的情况,将所述第一检测框作为所述目标检测框。
作为本发明实施例的一种实施方式,上述第一检测结果确定模块505可以包括:
第一检测结果确定子模块(图5中未示出),用于基于所述第二检测框的下边界调整所述第一检测框的下边界得到第三检测框,将所述第三检测框作为目标检测框。
作为本发明实施例的一种实施方式,上述第一检测模块502可以包括:
第一待去重检测框确定子模块(图5中未示出),用于获取所述第一待检测图像的图像特征,利用行人目标检测模型对所述第一待检测图像的图像特征进行行人目标检测,获得所述第一待检测图像中的多个第一待去重检测框;
第一去重子模块(图5中未示出),用于对所述多个第一待去重检测框进行去重处理,得到所述第一待检测图像中的至少一个第一检测框;
上述第二检测模块503可以包括
第二待去重检测框确定子模块(图5中未示出),用于利用行人下半身检测模型对所述第一待检测图像的图像特征进行行人下半身目标检测,获得所述第一待检测图像中的多个第二待去重检测框;
第二去重子模块(图5中未示出),用于对所述多个第二待去重检测框进行去重处理,得到所述第一待检测图像中的至少一个第二检测框。
作为本发明实施例的一种实施方式,上述第一去重子模块可以包括:
第一去重单元(图5中未示出),用于对所有所述第一待去重检测框进行非极大值抑制处理,获得第一备选框以及所述第一备选框的第一抑制属性;
其中,所述第一抑制属性为非极大值抑制处理时基于所述第一备选框去除的第一待去重检测框的数量。
第一检测框确定单元(图5中未示出),用于将所述第一抑制属性不小于第一阈值的所述第一备选框作为第一检测框;
上述第二去重子模块可以包括:
第二去重单元(图5中未示出),用于对所有所述第二待去重检测框进行非极大值抑制处理,获得第二备选框以及所述第二备选框的第二抑制属性;
其中,所述抑制属性为非极大值抑制处理时基于所述第二备选框去除的第二待去重检测框的数量。
第二检测框确定单元(图5中未示出),用于将所述第二抑制属性不小于第二阈值的 所述第二备选框作为第二检测框。
作为本发明实施例的一种实施方式,上述装置还可以包括:
选择模块(图5中未示出),用于将置信度最高的待去重检测框作为备选框,将除所述备选框外的待去重检测框作为冗余框集合;
其中,所述待去重检测框为所述第一待去重检测框或所述第二待去重检测框,所述备选框为所述第一备选框或所述第二备选框。
交并比确定模块(图5中未示出),用于针对所述冗余框集合中的每一所述待去重检测框,计算所述备选框与该待去重检测框之间的交并比;
去除模块(图5中未示出),用于若所述交并比不小于第三阈值,从所述冗余框集合中去除该待去重检测框,并更新所述备选框的抑制属性;
其中,所述抑制属性为所述第一抑制属性或所述第二抑制属性。
返回模块(图5中未示出),用于从所述冗余框集合中确定置信度最高的待去重检测框为所述备选框,返回将除所述备选框外的待去重检测框作为冗余框集合,直至确定每个所述备选框以及每个所述备选框的抑制属性。
作为本发明实施例的一种实施方式,上述第一待去重检测框确定子模块可以包括:
图像金字塔构建单元(图5中未示出),用于基于所述第一待检测图像和预设缩放比例,构建所述第一待检测图像的图像金字塔;
其中,所述图像金字塔包括多层子图像。
第一候选检测框确定单元(图5中未示出),用于分别提取每层所述子图像的图像特征,并分别通过行人目标检测模型对每层所述子图像的图像特征进行行人目标检测,获得每层所述子图像中的第一候选检测框;
第一待去重检测框确定单元(图5中未示出),用于基于所述第一候选检测框及所述第一候检测框所属的子图像与所述第一待检测图像之间的缩放比例,确定所述第一待检测图像中的第一待去重检测框;
上述第二待去重检测框确定子模块可以包括:
第二候选检测框确定单元(图5中未示出),用于分别通过行人下半身检测模型对每层所述子图像的图像特征进行行人下半身目标检测,获得每层所述子图像中的第二候选检测框;
第二待去重检测框确定单元(图5中未示出),用于基于所述第二候选检测框及所述第二候选检测框所属的子图像与所述第一待检测图像之间的缩放比例,确定所述第一待检测图像中的第二待去重检测框。
作为本发明实施例的一种实施方式,上述图像特征至少包括以下一种:
第一种:获取待提取图像中每个像素点的亮度值,作为所述待提取图像的图像特征,其中,所述待提取图像为所述第一待检测图像或所述子图像;
第二种:获取所述待提取图像中每个像素点的亮度值,基于所述待提取图像中每个像素点的亮度值,确定所述待提取图像中每个像素点的亮度梯度幅值,将所述亮度梯度幅值作为所述待提取图像的图像特征;
第三种:获取所述待提取图像中每个像素点的亮度值,基于所述待提取图像中每个像素点的亮度值,确定所述待提取图像中每个像素点的亮度梯度幅值,基于所述待提取图像中每个像素点的亮度梯度幅值确定每个像素点的亮度梯度方向,并基于所述待提取图像中每个像素点的亮度梯度方向确定所述待提取图像对应的亮度梯度方向直方图,将所述亮度梯度方向直方图作为所述待提取图像的图像特征。
本发明实施例还提供了一种电子设备,如图6所示,包括处理器601、通信接口602、存储器603和通信总线604,其中,处理器601,通信接口602,存储器603通过通信总线604完成相互间的通信,
存储器603,用于存放计算机程序;
处理器601,用于执行存储器603上所存放的程序时,实现上述任一实施例所述的行人目标检测的方法步骤。
可见,本发明实施例所提供的方案中,电子设备可以获取车辆上安装的图像采集设备所采集的图像,得到第一待检测图像;对第一待检测图像进行行人目标检测,得到第一待检测图像中的至少一个第一检测框,其中,第一检测框用于标识行人在第一待检测图像中所占区域;对第一待检测图像进行行人下半身目标检测,得到第一待检测图像中的至少一个第二检测框,其中,第二检测框用于标识行人的下半身在第一待检测图像中所占区域;针对任一第一检测框和任一第二检测框,根据第一检测框和第二检测框在第一待检测图像中的位置,确定第一检测框和第二检测框是否标识为同一行人;针对第一检测框和第二检测框为标识同一行人的情况,基于第一检测框确定目标检测框,其中,目标检测框用于标识行人在第一待检测图像中的行人目标检测结果;针对第一检测框和第二检测框不为标识同一行人的情况,基于第一检测框、第二检测框分别确定目标检测框。
电子设备可以对第一待检测图像进行行人目标检测,并可以通过对第一待检测图像进行行人下半身目标检测,得到第一检测框及第二检测框,并基于第一检测框及第二检测框的位置确定第一检测框及第二检测框是否标识同一行人,进而得到行人目标检测结果。这样,当第一待检测图像中存在不完整的行人时,电子设备可以准确地检测出该不完整的行 人,可以提高行人目标检测结果的准确度,避免车辆慢速行驶的过程中发生危险。
上述电子设备提到的通信总线可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。该通信总线可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
通信接口用于上述电子设备与其他设备之间的通信。
存储器可以包括随机存取存储器(Random Access Memory,RAM),也可以包括非易失性存储器(Non-Volatile Memory,NVM),例如至少一个磁盘存储器。可选的,存储器还可以是至少一个位于远离前述处理器的存储装置。
上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,CPU)、网络处理器(Network Processor,NP)等;还可以是数字信号处理器(Digital Signal Processing,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。
In yet another embodiment provided by the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method steps of the pedestrian target detection described in any of the above embodiments are implemented.
It can be seen that, in the solution provided by the embodiments of the present invention, when the computer program stored in the computer-readable storage medium is executed by a processor, it can acquire an image captured by an image capture device installed on a vehicle to obtain a first image to be detected; perform pedestrian target detection on the first image to be detected to obtain at least one first detection frame in the first image to be detected, wherein the first detection frame is used to identify the region occupied by a pedestrian in the first image to be detected; perform pedestrian lower-body target detection on the first image to be detected to obtain at least one second detection frame in the first image to be detected, wherein the second detection frame is used to identify the region occupied by the lower body of a pedestrian in the first image to be detected; for any first detection frame and any second detection frame, determine, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether the first detection frame and the second detection frame identify the same pedestrian; in the case that the first detection frame and the second detection frame identify the same pedestrian, determine a target detection frame based on the first detection frame, wherein the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected; and in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determine target detection frames based on the first detection frame and the second detection frame respectively.
The electronic device can perform pedestrian target detection on the first image to be detected and, by performing pedestrian lower-body target detection on the first image to be detected, obtain the first detection frames and the second detection frames, determine based on the positions of the first and second detection frames whether they identify the same pedestrian, and thereby obtain the pedestrian target detection result. In this way, when an incomplete pedestrian is present in the first image to be detected, the electronic device can still detect the incomplete pedestrian accurately, which improves the accuracy of the pedestrian target detection result and avoids danger while the vehicle is moving at low speed.
In yet another embodiment provided by the present invention, a computer program product containing instructions is further provided, which, when run on a computer, causes the computer to execute the method steps of the pedestrian target detection described in any of the above embodiments.
It can be seen that, in the solution provided by the embodiments of the present invention, when the computer program product containing instructions runs on a computer, it can acquire an image captured by an image capture device installed on a vehicle to obtain a first image to be detected; perform pedestrian target detection on the first image to be detected to obtain at least one first detection frame in the first image to be detected, wherein the first detection frame is used to identify the region occupied by a pedestrian in the first image to be detected; perform pedestrian lower-body target detection on the first image to be detected to obtain at least one second detection frame in the first image to be detected, wherein the second detection frame is used to identify the region occupied by the lower body of a pedestrian in the first image to be detected; for any first detection frame and any second detection frame, determine, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether the first detection frame and the second detection frame identify the same pedestrian; in the case that the first detection frame and the second detection frame identify the same pedestrian, determine a target detection frame based on the first detection frame, wherein the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected; and in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determine target detection frames based on the first detection frame and the second detection frame respectively.
The electronic device can perform pedestrian target detection on the first image to be detected and, by performing pedestrian lower-body target detection on the first image to be detected, obtain the first detection frames and the second detection frames, determine based on the positions of the first and second detection frames whether they identify the same pedestrian, and thereby obtain the pedestrian target detection result. In this way, when an incomplete pedestrian is present in the first image to be detected, the electronic device can still detect the incomplete pedestrian accurately, which improves the accuracy of the pedestrian target detection result and avoids danger while the vehicle is moving at low speed.
The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wirelessly (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), etc.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
Each embodiment in this specification is described in a related manner; for the same or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments.
The above descriptions are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

  1. A pedestrian target detection method, characterized in that the method comprises:
    acquiring an image captured by an image capture device installed on a vehicle, to obtain a first image to be detected;
    performing pedestrian target detection on the first image to be detected, to obtain at least one first detection frame in the first image to be detected, wherein the first detection frame is used to identify the region occupied by a pedestrian in the first image to be detected;
    performing pedestrian lower-body target detection on the first image to be detected, to obtain at least one second detection frame in the first image to be detected, wherein the second detection frame is used to identify the region occupied by the lower body of a pedestrian in the first image to be detected;
    for any of the first detection frames and any of the second detection frames, determining, according to the positions of the first detection frame and the second detection frame in the first image to be detected, whether the first detection frame and the second detection frame identify the same pedestrian;
    in the case that the first detection frame and the second detection frame identify the same pedestrian, determining a target detection frame based on the first detection frame, wherein the target detection frame is used to identify the pedestrian target detection result of the pedestrian in the first image to be detected;
    in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determining target detection frames based on the first detection frame and the second detection frame respectively.
  2. The method according to claim 1, characterized in that, in the case that the first detection frame and the second detection frame do not identify the same pedestrian, determining the target detection frame based on the first detection frame comprises:
    performing pedestrian head target detection on a second image to be detected, wherein the second image to be detected is the image within the first detection frame;
    in the case that a pedestrian head is detected within the first detection frame, taking the first detection frame as the target detection frame.
  3. The method according to claim 1, characterized in that, in the case that the first detection frame and the second detection frame identify the same pedestrian, the step of determining a target detection frame based on the first detection frame comprises:
    adjusting the lower boundary of the first detection frame based on the lower boundary of the second detection frame to obtain a third detection frame, and taking the third detection frame as the target detection frame.
  4. The method according to claim 1, characterized in that the step of performing pedestrian target detection on the first image to be detected to obtain at least one first detection frame in the first image to be detected comprises:
    acquiring image features of the first image to be detected, and performing pedestrian target detection on the image features of the first image to be detected by using a pedestrian target detection model, to obtain a plurality of first detection frames to be deduplicated in the first image to be detected;
    performing deduplication processing on the plurality of first detection frames to be deduplicated, to obtain at least one first detection frame in the first image to be detected;
    the step of performing pedestrian lower-body target detection on the first image to be detected to obtain at least one second detection frame in the first image to be detected comprises:
    performing pedestrian lower-body target detection on the image features of the first image to be detected by using a pedestrian lower-body detection model, to obtain a plurality of second detection frames to be deduplicated in the first image to be detected;
    performing deduplication processing on the plurality of second detection frames to be deduplicated, to obtain at least one second detection frame in the first image to be detected.
  5. The method according to claim 4, characterized in that the step of performing deduplication processing on the plurality of first detection frames to be deduplicated to obtain at least one first detection frame in the first image to be detected comprises:
    performing non-maximum suppression processing on all the first detection frames to be deduplicated, to obtain first candidate frames and the first suppression attribute of each first candidate frame, wherein the first suppression attribute is the number of first detection frames to be deduplicated that are removed based on the first candidate frame during the non-maximum suppression processing;
    taking a first candidate frame whose first suppression attribute is not less than a first threshold as a first detection frame;
    the step of performing deduplication processing on the plurality of second detection frames to be deduplicated to obtain at least one second detection frame in the first image to be detected comprises:
    performing non-maximum suppression processing on all the second detection frames to be deduplicated, to obtain second candidate frames and the second suppression attribute of each second candidate frame, wherein the second suppression attribute is the number of second detection frames to be deduplicated that are removed based on the second candidate frame during the non-maximum suppression processing;
    taking a second candidate frame whose second suppression attribute is not less than a second threshold as a second detection frame.
  6. The method according to claim 5, characterized in that performing non-maximum suppression processing on all detection frames to be deduplicated, to obtain candidate frames and the suppression attribute of each candidate frame, comprises:
    taking the detection frame to be deduplicated with the highest confidence as the candidate frame, and taking the detection frames to be deduplicated other than the candidate frame as a redundant frame set, wherein the detection frame to be deduplicated is the first detection frame to be deduplicated or the second detection frame to be deduplicated, and the candidate frame is the first candidate frame or the second candidate frame;
    for each detection frame to be deduplicated in the redundant frame set, calculating the intersection-over-union between the candidate frame and that detection frame;
    if the intersection-over-union is not less than a third threshold, removing that detection frame from the redundant frame set, and updating the suppression attribute of the candidate frame, wherein the suppression attribute is the first suppression attribute or the second suppression attribute;
    determining the detection frame to be deduplicated with the highest confidence in the redundant frame set as the candidate frame, and returning to the step of taking the detection frames to be deduplicated other than the candidate frame as the redundant frame set, until every candidate frame and the suppression attribute of every candidate frame have been determined.
  7. The method according to claim 4, characterized in that the step of acquiring image features of the first image to be detected and performing pedestrian target detection on the image features of the first image to be detected by using the pedestrian target detection model, to obtain a plurality of first detection frames to be deduplicated in the first image to be detected, comprises:
    constructing an image pyramid of the first image to be detected based on the first image to be detected and a preset scaling ratio, wherein the image pyramid includes multiple layers of sub-images;
    extracting the image features of each layer of sub-images respectively, and performing pedestrian target detection on the image features of each layer of sub-images respectively through the pedestrian target detection model, to obtain the first candidate detection frames in each layer of sub-images;
    determining the first detection frames to be deduplicated in the first image to be detected based on the first candidate detection frames and the scaling ratio between the sub-image to which each first candidate detection frame belongs and the first image to be detected;
    the step of performing pedestrian lower-body target detection on the image features of the first image to be detected by using the pedestrian lower-body detection model, to obtain a plurality of second detection frames to be deduplicated in the first image to be detected, comprises:
    performing pedestrian lower-body target detection on the image features of each layer of sub-images respectively through the pedestrian lower-body detection model, to obtain the second candidate detection frames in each layer of sub-images;
    determining the second detection frames to be deduplicated in the first image to be detected based on the second candidate detection frames and the scaling ratio between the sub-image to which each second candidate detection frame belongs and the first image to be detected.
  8. The method according to any one of claims 4-7, characterized in that the image features include at least one of the following:
    First: acquiring the brightness value of each pixel in the image to be extracted as the image feature of the image to be extracted, wherein the image to be extracted is the first image to be detected or the sub-image;
    Second: acquiring the brightness value of each pixel in the image to be extracted, determining the brightness gradient magnitude of each pixel in the image to be extracted based on the brightness value of each pixel, and taking the brightness gradient magnitudes as the image feature of the image to be extracted;
    Third: acquiring the brightness value of each pixel in the image to be extracted, determining the brightness gradient magnitude of each pixel based on the brightness value of each pixel, determining the brightness gradient direction of each pixel based on the brightness gradient magnitude of each pixel, determining the histogram of brightness gradient directions corresponding to the image to be extracted based on the brightness gradient direction of each pixel, and taking the histogram of brightness gradient directions as the image feature of the image to be extracted.
  9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
    the memory is configured to store a computer program;
    the processor is configured to implement the method steps of any one of claims 1-8 when executing the program stored in the memory.
  10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps of any one of claims 1-8 are implemented.
PCT/CN2021/113129 2020-12-22 2021-08-17 Pedestrian target detection method, electronic device and storage medium WO2022134624A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011522139.4 2020-12-22
CN202011522139.4A CN112257692B (zh) 2020-12-22 2020-12-22 Pedestrian target detection method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022134624A1 true WO2022134624A1 (zh) 2022-06-30

Family

ID=74225323

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113129 WO2022134624A1 (zh) 2020-12-22 2021-08-17 Pedestrian target detection method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112257692B (zh)
WO (1) WO2022134624A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115249355A (zh) * 2022-09-22 2022-10-28 杭州枕石智能科技有限公司 Target association method, device and computer-readable storage medium
CN115272814A (zh) * 2022-09-28 2022-11-01 南昌工学院 Long-distance spatially adaptive multi-scale small target detection method
CN116563521A (zh) * 2023-04-14 2023-08-08 依未科技(北京)有限公司 Detection frame processing method for target detection, apparatus thereof, and electronic device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257692B (zh) * 2020-12-22 2021-03-12 湖北亿咖通科技有限公司 Pedestrian target detection method, electronic device and storage medium
CN112562093B (zh) * 2021-03-01 2021-05-18 湖北亿咖通科技有限公司 Target detection method, electronic medium and computer storage medium
CN112926500B (zh) * 2021-03-22 2022-09-20 重庆邮电大学 Pedestrian detection method combining head and whole-body information
CN113420725B (zh) * 2021-08-20 2021-12-31 天津所托瑞安汽车科技有限公司 Method, device, system and storage medium for identifying missed-detection scenarios of BSD products
CN117152258B (zh) * 2023-11-01 2024-01-30 中国电建集团山东电力管道工程有限公司 Product positioning method and system for an intelligent pipeline production workshop

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378202A (zh) * 2019-06-05 2019-10-25 魔视智能科技(上海)有限公司 Omnidirectional pedestrian collision warning method based on a fisheye lens
CN110532985A (zh) * 2019-09-02 2019-12-03 北京迈格威科技有限公司 Target detection method, apparatus and system
US20200226415A1 (en) * 2015-07-09 2020-07-16 Texas Instruments Incorporated Window grouping and tracking for fast object detection
WO2020198704A1 (en) * 2019-03-28 2020-10-01 Phase Genomics, Inc. Systems and methods for karyotyping by sequencing
CN112257692A (zh) * 2020-12-22 2021-01-22 湖北亿咖通科技有限公司 Pedestrian target detection method, electronic device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416265A (zh) * 2018-01-30 2018-08-17 深圳大学 Face detection method, apparatus, device and storage medium
CN109190680A (zh) * 2018-08-11 2019-01-11 复旦大学 Deep-learning-based detection and classification method for medical drug images
CN109766796B (zh) * 2018-12-20 2023-04-18 西华大学 Deep pedestrian detection method for dense crowds
CN110659576A (zh) * 2019-08-23 2020-01-07 深圳久凌软件技术有限公司 Pedestrian search method and apparatus based on joint judgment and generative learning
CN111191533B (zh) * 2019-12-18 2024-03-19 北京迈格威科技有限公司 Pedestrian re-identification processing method, apparatus, computer device and storage medium
CN111126399B (zh) * 2019-12-28 2022-07-26 苏州科达科技股份有限公司 Image detection method, apparatus, device and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200226415A1 (en) * 2015-07-09 2020-07-16 Texas Instruments Incorporated Window grouping and tracking for fast object detection
WO2020198704A1 (en) * 2019-03-28 2020-10-01 Phase Genomics, Inc. Systems and methods for karyotyping by sequencing
CN110378202A (zh) * 2019-06-05 2019-10-25 魔视智能科技(上海)有限公司 Omnidirectional pedestrian collision warning method based on a fisheye lens
CN110532985A (zh) * 2019-09-02 2019-12-03 北京迈格威科技有限公司 Target detection method, apparatus and system
CN112257692A (zh) * 2020-12-22 2021-01-22 湖北亿咖通科技有限公司 Pedestrian target detection method, electronic device and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115249355A (zh) * 2022-09-22 2022-10-28 杭州枕石智能科技有限公司 Target association method, device and computer-readable storage medium
CN115249355B (zh) * 2022-09-22 2022-12-27 杭州枕石智能科技有限公司 Target association method, device and computer-readable storage medium
CN115272814A (zh) * 2022-09-28 2022-11-01 南昌工学院 Long-distance spatially adaptive multi-scale small target detection method
CN115272814B (zh) * 2022-09-28 2022-12-27 南昌工学院 Long-distance spatially adaptive multi-scale small target detection method
CN116563521A (zh) * 2023-04-14 2023-08-08 依未科技(北京)有限公司 Detection frame processing method for target detection, apparatus thereof, and electronic device
CN116563521B (zh) * 2023-04-14 2024-04-23 依未科技(北京)有限公司 Detection frame processing method for target detection, apparatus thereof, and electronic device

Also Published As

Publication number Publication date
CN112257692A (zh) 2021-01-22
CN112257692B (zh) 2021-03-12

Similar Documents

Publication Publication Date Title
WO2022134624A1 (zh) Pedestrian target detection method, electronic device and storage medium
CN108009543B (zh) License plate recognition method and apparatus
CN109635685B (zh) Target object 3D detection method, apparatus, medium and device
CN109087510B (zh) Traffic monitoring method and apparatus
US11455805B2 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
WO2018108129A1 (zh) Method and apparatus for identifying object category, and electronic device
CN108846826B (zh) Object detection method, apparatus, image processing device and storage medium
WO2021217625A1 (zh) Parking detection method, system, processing device and storage medium
KR20210008083A (ko) Target detection method and apparatus, and intelligent driving method, device and storage medium
WO2019227954A1 (zh) Method, apparatus, readable medium and electronic device for recognizing traffic light signals
CN110598512B (zh) Parking space detection method and apparatus
KR101848019B1 (ko) Method and apparatus for detecting vehicle license plates through vehicle region detection
CN109919002B (zh) Yellow no-parking line recognition method, apparatus, computer device and storage medium
US20220058422A1 (en) Character recognition method and terminal device
WO2023124133A1 (zh) Traffic behavior detection method and apparatus, electronic device, storage medium and computer program product
US20210350705A1 (en) Deep-learning-based driving assistance system and method thereof
WO2024016524A1 (zh) Connected vehicle position estimation method and apparatus based on independent non-uniform incremental sampling
CN111383246B (zh) Banner detection method, apparatus and device
CN111898491A (zh) Method, apparatus and electronic device for recognizing a vehicle driving in the wrong direction
CN110751619A (zh) Insulator defect detection method
WO2021115040A1 (zh) Image correction method, apparatus, terminal device and storage medium
CN117215327A (zh) UAV-based highway inspection and intelligent flight control method
JP5928010B2 (ja) Road marking detection device and program
CN112308061B (zh) License plate character recognition method and apparatus
CN112464938A (zh) License plate detection and recognition method, apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908629

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21908629

Country of ref document: EP

Kind code of ref document: A1