CN111950543B

CN111950543B - Target detection method and device

Info

Publication number: CN111950543B
Application number: CN201910397246.XA
Authority: CN
Inventors: 杨磊
Original assignee: Beijing Jingdong Qianshi Technology Co Ltd
Current assignee: Beijing Jingbangda Trade Co Ltd; Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2024-08-16
Anticipated expiration: 2039-05-14
Also published as: CN111950543A

Abstract

The invention discloses a target detection method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: image detection is carried out on the collected original image so as to obtain category information and a boundary frame of a target in the original image; clustering pixels belonging to the same target in the boundary frame based on depth information corresponding to the original image so as to divide the target in the boundary frame into areas; and counting the distribution of the depth values of the pixels in the boundary box to obtain the region corresponding to the pixels with the depth values in the set depth interval, and determining the position information of the target according to the region. According to the method, the boundary frame is obtained through image detection on an original image, after the boundary frame is subjected to regional division in a clustering mode, the non-target area in the boundary frame is removed through filtering by combining the counted depth values of the pixels in the boundary frame, the target area is reserved, the precision of the boundary frame covering the target is improved, and then the precision of target detection is improved.

Description

Target detection method and device

Technical Field

The present invention relates to the field of computers, and in particular, to a target detection method and apparatus.

Background

The unmanned vehicle is faced with the problem of how to avoid obstacles in the road running process, the target detection is an important component part of interaction between the unmanned vehicle and the surrounding environment, and is also a main basis for behavior prediction and decision planning, and the overall performance level of the unmanned vehicle is directly influenced by the good or bad effect of the target detection. The currently widely adopted target detection method comprises the following steps: target detection based on laser radar, target detection based on millimeter wave radar, target detection based on monocular, etc.

In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:

Based on target detection of the laser radar, the sensor is high in cost, category information of targets cannot be obtained easily, and recognition capability of remote targets is weak; the detection precision is low based on the target detection of the millimeter wave radar, and pedestrians cannot be detected; based on monocular target detection, the method is difficult to calibrate with a laser radar and the like, and has low detection precision and large error.

Disclosure of Invention

In view of this, an embodiment of the present invention provides a method and an apparatus for detecting a target, where an original image is subjected to image detection to obtain a bounding box, and a clustering manner is adopted to divide a target in the bounding box, and then, a non-target area in the bounding box is filtered and removed by combining a counted depth value of a pixel in the bounding box, so that a target area is reserved, the precision of the bounding box covering the target is improved, and further, the precision of target detection is improved.

In order to achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a target detection method.

The target detection method of the embodiment of the invention comprises the following steps: image detection is carried out on the collected original image so as to obtain category information and a boundary box of a target in the original image; clustering pixels belonging to the same target in the boundary box based on the depth information corresponding to the original image so as to divide the target in the boundary box into areas; and counting the depth value distribution of pixels in the boundary box to obtain a region corresponding to the pixels with the depth values in the set depth interval, and determining the position information of the target according to the region.

Optionally, the counting depth value distribution of pixels within the bounding box includes: dividing the depth value into subintervals with the same interval according to the depth value of the pixel in the boundary box; and traversing the depth values of the pixels in the boundary frame, and respectively counting the number of the pixels falling in the subinterval to obtain the depth value distribution of the pixels in the boundary frame.

Optionally, the method further comprises: and determining a section with the most dense pixel distribution in the depth value distribution, and taking the determined section as the depth section.

Optionally, the method further comprises: collecting an original image by adopting a binocular camera; the clustering the pixels belonging to the same target in the boundary box comprises the following steps: traversing pixels within the bounding box; calculating an included angle between a line segment formed by two pixels and a line segment formed by one pixel and the coordinate origin by taking the origin of a binocular camera reference system as the coordinate origin and combining the depth values of any two traversed pixels; if the included angle is larger than a preset threshold value, the two pixels belong to the same target.

Optionally, the determining the location information of the target according to the area includes: converting pixels in the region into point cloud data; and determining a minimum three-dimensional boundary box surrounding the point cloud data, wherein the three-dimensional coordinates of the minimum three-dimensional boundary box are the position information of the target.

In order to achieve the above object, according to another aspect of the embodiments of the present invention, there is provided an object detection apparatus.

An object detection device according to an embodiment of the present invention includes: the image detection module is used for carrying out image detection on the acquired original image so as to obtain category information and a boundary box of a target in the original image; the region dividing module is used for clustering pixels belonging to the same target in the boundary frame based on the depth information corresponding to the original image so as to divide the region of the target in the boundary frame; the statistics determining module is used for counting the depth value distribution of the pixels in the boundary box so as to obtain a region corresponding to the pixels with the depth values in the set depth interval, and determining the position information of the target according to the region.

Optionally, the statistics determining module is further configured to: dividing the depth value into subintervals with the same interval according to the depth value of the pixel in the boundary box; and traversing the depth values of the pixels in the boundary frame, and respectively counting the number of the pixels falling in the subinterval to obtain the depth value distribution of the pixels in the boundary frame.

Optionally, the apparatus further comprises: and the depth interval determining module is used for determining an interval with the most dense pixel distribution in the depth value distribution, and taking the determined interval as the depth interval.

Optionally, the apparatus further comprises: the image acquisition module is used for acquiring an original image by adopting a binocular camera; the area dividing module is further configured to: traversing pixels within the bounding box; calculating an included angle between a line segment formed by two pixels and a line segment formed by one pixel and the coordinate origin by taking the origin of a binocular camera reference system as the coordinate origin and combining the depth values of any two traversed pixels; and if the included angle is larger than a preset threshold value, the two pixels belong to the same target.

Optionally, the statistics determining module is further configured to: converting pixels in the region into point cloud data; and determining a minimum three-dimensional boundary box surrounding the point cloud data, wherein the three-dimensional coordinates of the minimum three-dimensional boundary box are the position information of the target.

To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device.

An electronic device according to an embodiment of the present invention includes: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the target detection method according to the embodiment of the invention.

To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable medium.

A computer readable medium of an embodiment of the present invention has stored thereon a computer program which, when executed by a processor, implements a target detection method of an embodiment of the present invention.

One embodiment of the above invention has the following advantages or benefits: the method comprises the steps of obtaining a boundary frame through image detection on an original image, dividing the target in the boundary frame by adopting a clustering mode, and filtering to remove non-target areas in the boundary frame by combining the counted depth values of pixels in the boundary frame, so that the accuracy of the boundary frame covering the target is improved, and the accuracy of target detection is further improved; dividing the depth value into subintervals, and counting the number of pixels falling into each subinterval to obtain the depth value distribution of the pixels in the boundary frame, so that the distribution density state of the pixels can be intuitively seen; taking the interval with the most dense pixel distribution as a depth interval, extracting the pixel corresponding area of the depth interval, and accurately obtaining the target area only containing the target in the boundary frame, thereby further improving the target detection precision; the binocular camera is adopted to acquire the original image, so that the effective detection of a remote target can be realized, the detection precision is high, and the calibration difficulty between the camera and the laser radar is reduced; clustering is carried out based on the depth value, so that the clustering efficiency is high; and the pixels of the target area in the two-dimensional bounding box are converted into three-dimensional point cloud data, so that the stability of target detection is improved.

Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of the main steps of a target detection method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of the detection principle of the target detection method according to the embodiment of the invention;

FIG. 3 is a schematic flow chart of a target detection method according to an embodiment of the invention;

FIG. 4 is a schematic diagram of determining whether two pixels belong to the same object according to an embodiment of the present invention;

Fig. 5 (a) is a schematic diagram of an image detection result of performing image detection on an original image based on CNN in the first embodiment of the present invention;

fig. 5 (b) is a depth image corresponding to a pixel of an original image one according to the first embodiment of the present invention;

FIG. 5 (c) is a schematic diagram of the area division result according to the first embodiment of the present invention;

FIG. 6 is a schematic diagram of a method for distinguishing target areas from non-target areas according to an embodiment of the present invention;

fig. 7 (a) is a schematic diagram of an image detection result of performing image detection on an original image based on CNN in a second embodiment of the present invention;

FIG. 7 (b) is a schematic diagram of a result of filtering non-target areas of the original image II according to the second embodiment of the present invention;

Fig. 7 (c) is a schematic diagram of a target detection result according to a second embodiment of the present invention;

FIG. 8 is a schematic diagram of the main modules of an object detection apparatus according to an embodiment of the present invention;

FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;

Fig. 10 is a schematic structural diagram of a computer device suitable for use in an electronic apparatus to implement an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram of main steps of a target detection method according to an embodiment of the present invention. As shown in fig. 1, the target detection method in the embodiment of the invention mainly includes the following steps:

Step S101: and carrying out image detection on the acquired original image to obtain category information and a bounding box of the target in the original image. Image detection refers to locating objects of interest from an image, accurately judging specific categories of each object, and giving out a boundary box of each object. An image detection algorithm may be used to detect the image of the captured raw image. The image detection algorithm can be divided into two major types of a traditional method and a deep learning method based on CNN (Convolutional Neural Networks, convolutional neural network), and the robustness of the deep learning method based on CNN under the detection precision and different environments is superior to that of the traditional method. In the embodiment of the invention, the CNN-based deep learning method is selected for image detection, so that the category of the target and the two-dimensional boundary box of the target on the original image can be obtained.

Step S102: and clustering pixels belonging to the same target in the boundary box based on the depth information corresponding to the original image so as to divide the target in the boundary box into areas. And clustering pixels in the two-dimensional boundary box obtained by image detection, and judging whether the two pixels belong to the same target or not based on depth values corresponding to a plurality of pixels in the two-dimensional boundary box in the clustering process. Pixels belonging to the same object in the two-dimensional boundary box are gathered into the same class, and pixels with the same color can be used for representing the pixels belonging to the same object, so that the region division of the object in the two-dimensional boundary box is realized.

Step S103: and counting the depth value distribution of pixels in the boundary box to obtain a region corresponding to the pixels with the depth values in the set depth interval, and determining the position information of the target according to the region. In the above steps, although pixels belonging to the same target are clustered together, the influence of non-target areas such as background and noise on target positioning cannot be eliminated. Therefore, in the embodiment, on the basis of region division, a statistical method is introduced to determine the depth value distribution of a plurality of pixels in the two-dimensional boundary box, and based on the prior knowledge that the target occupies most of the area of the two-dimensional boundary box, the region with dense pixel distribution is regarded as a target region, and the region with sparse pixel distribution is regarded as a non-target region, so that the distinction between the target region and the non-target region can be realized. The target area can be subsequently surrounded by a more accurate two-dimensional bounding box (i.e., the smallest two-dimensional bounding box surrounding the target), the coordinate position of which is the position of the target. Pixels in the target area can be converted into three-dimensional point cloud data, and the minimum three-dimensional boundary box surrounding the three-dimensional point cloud data is the position of the target in the real world.

Fig. 2 is a schematic diagram of a detection principle of a target detection method according to an embodiment of the present invention. As shown in fig. 2, the target detection method according to the embodiment of the present invention includes: data input, detection process and detection result. The input data is an RGB image acquired by a binocular camera and a depth image of the same size and pixel-aligned with the RGB image. Wherein R in RGB represents Red (Red), G represents Green (Green), and B represents Blue (Blue). The detection process includes CNN-based image detection and depth value-based clustering processes. CNN-based image detection can obtain the category of the target and a two-dimensional boundary box on the RGB image; the clustering process is to group pixels belonging to the same target into the same class in a two-dimensional boundary box through a clustering algorithm so as to divide the region of the boundary box; then, based on the depth value statistics, filtering the background and noise areas in the boundary box and reserving the target area; and finally, converting pixels belonging to the target area in the bounding box into three-dimensional point cloud data, and obtaining a minimum three-dimensional bounding box covering the point cloud data, namely the position of the target in the real world.

Fig. 3 is a schematic flow chart of a target detection method according to an embodiment of the invention. As shown in fig. 3, the target detection method according to the embodiment of the present invention mainly includes the following steps:

Step S301: and acquiring an original image of the periphery of the unmanned vehicle by adopting a binocular camera so as to acquire the original image and a depth image corresponding to pixels of the original image. The binocular stereo camera model calculates depth from two images (color RGB images or gray scales) output from the left and right cameras. The implementation process of obtaining the depth image from the original image may be: firstly, correcting left and right images, wherein the two corrected images are positioned on the same plane and are parallel to each other; performing pixel matching on the two corrected images; and calculating the depth of each pixel according to the matching result, and further obtaining a depth image. In the embodiment, a binocular stereo camera is adopted to directly output RGB images and depth images corresponding to pixels of the RGB images.

Step S302: and performing image detection on the original image by using a deep learning method based on CNN to obtain category information and a bounding box of the target in the original image. Unmanned vehicles, such as unmanned delivery logistics vehicles, have limited computing power of an embedded platform, and when an image detection model is selected, the requirement of detection precision is met, and whether the light weight can reach the deployable efficiency index of an embedded end/a mobile end is considered. Based on the above requirements, the image detection model in the embodiment may employ MobileNets model, fast-R-CNN series, SSD (Single Shot MultiBox Detector, single-shot multi-box detector) series, yolo series, or the like.

The MobileNets model is a set of mobile-end-first computer vision models, and can be used for processing various training tasks, including face analysis, common object detection, photo positioning and fine-grained recognition tasks. The model adopts DEPTHWISE SEPARABLE CONVOLUTIONS (depth separable convolution) to replace the traditional 3D convolution, reduces the redundant expression of convolution kernels, and reduces the calculated amount and parameter efficiency of the model; simultaneously, two super-parameter width factors (Width Multiplier) and resolution factors (Resolution Multiplier) are introduced, so that a smaller and faster model can be changed with minimum change under the condition of not redesigning the model, and the deployment requirement under the condition of extremely required running speed or memory is met. In the embodiment, an SSD-mobilenet-v2 model is adopted for image detection, and Object Detection API (google open source application program interface) is used for realizing quick building and training of the model.

The Faster-R-CNN creatively adopts the convolution network to automatically generate the candidate frames, and shares the convolution network with the target detection network, so that the number of the candidate frames is greatly reduced, and the quality of the candidate frames is also improved. The SSD has the main idea that a series of sparse candidate frames are generated through a heuristic method or a CNN network, and then the candidate frames are classified and regressed, so that the detection speed is high. Yolo is a regression model-based target detection algorithm, and has the characteristics of low accuracy but high detection speed.

Step S303: based on depth information which is the same in size and aligned with pixels of the original image, pixels belonging to the same target in the boundary frame are clustered into one type through a clustering algorithm, so that the targets in the boundary frame are divided into areas. And clustering pixels in the two-dimensional boundary box obtained by image detection, and gathering pixels belonging to the same target into the same class. The specific implementation process for judging whether two pixels belong to the same target is as follows: traversing all pixels in the boundary frame by adopting a breadth-first traversal (Breadth-FIRSTSEARCH, BFS) algorithm, taking the origin of a binocular camera reference system as a coordinate origin, and calculating an included angle between a line segment formed by the two pixels and a line segment formed by one pixel and the coordinate origin by combining the depth values of any two traversed pixels; if the included angle is larger than a preset threshold (thresold), the two pixels are attributed to the same target; otherwise, the two pixels do not belong to the same target.

Fig. 4 is a schematic diagram of determining whether two pixels belong to the same object according to an embodiment of the present invention. As shown in fig. 4, O is the origin of coordinates (i.e., the origin of the binocular camera reference system), A, B is two pixels each corresponding to a determined horizontal azimuth; the calculation formula of the angle beta between OA and OB is as follows:

Wherein d ₁ = |oa| is the depth value of the pixel a; d ₂ = |ob| is the depth value of pixel B; alpha is the difference between the horizontal azimuth angles of pixel a and pixel B, and is a known value.

If β > thresold, then pixel A and pixel B are assigned to the same target;

If β is less than or equal to thresold, then pixel A and pixel B do not belong to the same target.

Fig. 5 (a) is a schematic diagram of an image detection result of performing image detection on an original image based on CNN in the first embodiment of the present invention; fig. 5 (b) is a depth image corresponding to a pixel of an original image one according to the first embodiment of the present invention; fig. 5 (c) is a schematic diagram of the area division result according to the first embodiment of the present invention. As shown in fig. 5 (a), after the CNN-based deep learning method performs image detection on the first original image, the first original image is obtained to include a pedestrian as a target, and is surrounded by a white solid two-dimensional bounding box. As shown in fig. 5 (c), after clustering the objects of the white solid two-dimensional bounding box, the same-color pixel representations may be used to attribute to the same object.

Step S304: and counting the distribution of the depth values of the pixels in the boundary box to obtain the region corresponding to the pixels with the depth values in the set depth interval. The specific implementation process of the statistical depth value distribution is as follows: dividing the depth value into subintervals with the same interval according to the depth value of the pixel in the two-dimensional boundary box; and traversing the depth values of the pixels in the boundary box, and respectively counting the number of the pixels falling in each subinterval to obtain a depth value distribution histogram of all the pixels in the two-dimensional boundary box. After the depth value distribution histogram is obtained, a section with the most dense pixel distribution is obtained through a sliding window method, wherein pixels in the section are distributed on an interested target, and pixels at other positions are pixels on non-target areas such as background and noise. In the embodiment, a window with a fixed size is slid from the leftmost side to the rightmost side of the depth value distribution histogram, and the window is slid by one step each time, and a section corresponding to a position with the largest number of pixels in the sliding process is a depth section with the most dense pixel distribution.

Fig. 6 is a schematic diagram of the principle of distinguishing the target area from the non-target area according to the embodiment of the present invention. As shown in fig. 6, the abscissa represents depth (depth), the ordinate represents the number of pixels (quality), and the region corresponding to the pixels of the subinterval (i.e., the hatched portion) in which the pixels are densely distributed is the target region; the region corresponding to the pixels of the subinterval (i.e., the non-hatched portion) where the pixel distribution is sparse may be a non-target region such as background, noise, or the like. The regions corresponding to the pixels of the shadow portion are extracted and the regions corresponding to the pixels of the non-shadow portion are filtered, so that the target region and the non-target region in the two-dimensional bounding box can be distinguished.

Step S305: pixels within the region are converted into point cloud data, and a minimum three-dimensional bounding box surrounding the point cloud data is determined to determine location and size information of the target based on the minimum three-dimensional bounding box. Pixels in the depth image and the three-dimensional point cloud data are in one-to-one correspondence, and the depth image and the three-dimensional point cloud data can be mutually converted. The pixels of the target area in the two-dimensional boundary box are converted into three-dimensional point cloud data, the three-dimensional point cloud data are surrounded by a minimum three-dimensional boundary box capable of being covered by the three-dimensional point cloud data, and six-dimensional information (three-dimensional coordinates, length, width and height) of the minimum three-dimensional boundary box is information of the position and the size (namely length, width and height) of the target in the three-dimensional space.

Fig. 7 (a) is a schematic diagram of an image detection result of performing image detection on an original image based on CNN in a second embodiment of the present invention; FIG. 7 (b) is a schematic diagram of a result of filtering non-target areas of the original image II according to the second embodiment of the present invention; fig. 7 (c) is a schematic diagram of a target detection result according to a second embodiment of the present invention.

As shown in fig. 7 (a), after the CNN-based deep learning method performs image detection on the second original image, the second original image is obtained to include the targets of the pedestrian 1, the pedestrian 2, the vehicle 1 and the vehicle 2, and is respectively surrounded by a white solid line two-dimensional bounding box.

As shown in fig. 7 (b), after clustering the targets of the white solid two-dimensional bounding box, counting the depth value distribution of pixels in the bounding box; based on the prior knowledge that the target occupies most of the area in the white solid line two-dimensional boundary box, the area with the largest color area is regarded as a target area, and other color areas are regarded as non-target areas such as background, noise and the like; surrounding the target area with a white dashed two-dimensional bounding box results in a more accurate two-dimensional bounding box containing only the target.

As shown in fig. 7 (c), after filtering the background, noise, and the like, the pixels of the object within the white dotted two-dimensional bounding box in fig. 7 (b) are converted into point cloud data, and then the point cloud data is surrounded by a three-dimensional bounding box. The position of the target can be obtained based on the coordinates of the three-dimensional boundary frame, and the size information of the target can be obtained based on the length, width and height of the three-dimensional boundary frame.

According to the target detection method, the boundary frame is obtained by carrying out image detection on the original image, the targets in the boundary frame are subjected to region division in a clustering mode, and the non-target regions in the boundary frame are filtered and removed by combining the counted depth values of the pixels in the boundary frame, so that the target regions are reserved, the precision of the boundary frame covering the targets is improved, and the precision of target detection is further improved; dividing the depth value into subintervals, and counting the number of pixels falling into each subinterval to obtain the depth value distribution of the pixels in the boundary frame, so that the distribution density state of the pixels can be intuitively seen; taking the interval with the most dense pixel distribution as a depth interval, extracting the pixel corresponding area of the depth interval, and accurately obtaining the target area only containing the target in the boundary frame, thereby further improving the target detection precision; the binocular camera is adopted to acquire the original image, so that the effective detection of a remote target can be realized, the detection precision is high, and the calibration difficulty between the camera and the laser radar is reduced; clustering is carried out based on the depth value, so that the clustering efficiency is high; and the pixels of the target area in the two-dimensional bounding box are converted into three-dimensional point cloud data, so that the stability of target detection is improved.

Fig. 8 is a schematic diagram of main modules of an object detection apparatus according to an embodiment of the present invention. As shown in fig. 8, an object detection device 800 according to an embodiment of the present invention mainly includes:

The image detection module 801 is configured to perform image detection on an acquired original image to obtain category information and a bounding box of a target in the original image. Image detection refers to locating objects of interest from an image, accurately judging specific categories of each object, and giving out a boundary box of each object. An image detection algorithm may be used to detect the image of the captured raw image. The image detection algorithm can be divided into two main types of a traditional method and a deep learning method based on CNN, and the robustness of the deep learning method based on CNN under the detection precision and different environments is superior to that of the traditional method. In the embodiment of the invention, the CNN-based deep learning method is selected for image detection, so that the category of the target and the two-dimensional boundary box of the target on the original image can be obtained.

And the region dividing module 802 is configured to cluster pixels belonging to the same target in the bounding box based on depth information corresponding to the original image, so as to divide the region of the target in the bounding box. And clustering pixels in the two-dimensional boundary box obtained by image detection, and judging whether the two pixels belong to the same target or not based on depth values corresponding to a plurality of pixels in the two-dimensional boundary box in the clustering process. Pixels belonging to the same object in the two-dimensional boundary box are gathered into the same class, and pixels with the same color can be used for representing the pixels belonging to the same object, so that the region division of the object in the two-dimensional boundary box is realized.

The statistics determining module 803 is configured to perform statistics on a depth value distribution of pixels in the bounding box, so as to obtain an area corresponding to pixels with depth values in a set depth interval, and determine location information of the target according to the area. In the above module, although pixels belonging to the same target are clustered together, the influence of non-target areas such as background and noise on target positioning cannot be eliminated. Therefore, in the embodiment, on the basis of region division, a statistical method is introduced to determine the depth value distribution of a plurality of pixels in the two-dimensional boundary box, and based on the prior knowledge that the target occupies most of the area of the two-dimensional boundary box, the region with dense pixel distribution is regarded as a target region, and the region with sparse pixel distribution is regarded as a non-target region, so that the distinction between the target region and the non-target region can be realized. The target area can be subsequently surrounded by a more accurate two-dimensional bounding box (i.e., the smallest two-dimensional bounding box surrounding the target), the coordinate position of which is the position of the target. Pixels in the target area can be converted into three-dimensional point cloud data, and the minimum three-dimensional boundary box surrounding the three-dimensional point cloud data is the position of the target in the real world.

In addition, the object detection apparatus 800 according to the embodiment of the present invention may further include: a depth interval determination module and an image acquisition module (not shown in fig. 8). The depth interval determining module is used for determining an interval with the most dense pixel distribution in the depth value distribution, and taking the determined interval as the depth interval. And the image acquisition module is used for acquiring an original image by adopting the binocular camera.

From the above description, it can be seen that, by performing image detection on the original image to obtain a bounding box, and performing region division on the target in the bounding box in a clustering manner, combining the counted depth values of the pixels in the bounding box, filtering to remove the non-target region in the bounding box, and reserving the target region, the precision of the bounding box covering the target is improved, and further the precision of target detection is improved; dividing the depth value into subintervals, and counting the number of pixels falling into each subinterval to obtain the depth value distribution of the pixels in the boundary frame, so that the distribution density state of the pixels can be intuitively seen; taking the interval with the most dense pixel distribution as a depth interval, extracting the pixel corresponding area of the depth interval, and accurately obtaining the target area only containing the target in the boundary frame, thereby further improving the target detection precision; the binocular camera is adopted to acquire the original image, so that the effective detection of a remote target can be realized, the detection precision is high, and the calibration difficulty between the camera and the laser radar is reduced; clustering is carried out based on the depth value, so that the clustering efficiency is high; and the pixels of the target area in the two-dimensional bounding box are converted into three-dimensional point cloud data, so that the stability of target detection is improved.

Fig. 9 illustrates an exemplary system architecture 900 to which the object detection method or object detection apparatus of embodiments of the present invention may be applied.

As shown in fig. 9, system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a server 905. The network 904 is the medium used to provide communications links between the terminal devices 901, 902, 903 and the server 905. The network 904 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may interact with the server 905 over the network 904 using the terminal devices 901, 902, 903 to receive or send messages, etc. Various communication client applications can be installed on the terminal devices 901, 902, 903.

Terminal devices 901, 902, 903 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 905 may be a server that provides various services, such as a background management server that processes RGB images and depth images provided by an administrator using the terminal apparatuses 901, 902, 903. The background management server can perform image detection on the received RGB image, perform clustering and other processing on the combination of the depth image and the detection result, and feed back the processing result (such as the type, the position, the size and the like of the target) to the terminal equipment.

It should be noted that, the object detection method provided in the embodiment of the present application is generally executed by the server 905, and accordingly, the object detection device is generally disposed in the server 905.

It should be understood that the number of terminal devices, networks and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

According to an embodiment of the invention, the invention further provides an electronic device and a computer readable medium.

The electronic device of the present invention includes: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the target detection method according to the embodiment of the invention.

The computer readable medium of the present invention has stored thereon a computer program which, when executed by a processor, implements a target detection method of an embodiment of the present invention.

Referring now to FIG. 10, there is illustrated a schematic diagram of a computer system 1000 suitable for use in implementing an electronic device of an embodiment of the present invention. The electronic device shown in fig. 10 is merely an example, and should not impose any limitation on the functionality and scope of use of embodiments of the present invention.

As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the computer system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1100 is also connected to the I/O interface 1005 as needed. A removable medium 1101 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 1100 as needed, so that a computer program read out therefrom is installed into the storage section 1008 as needed.

In particular, the processes described above in the main step diagrams may be implemented as computer software programs according to the disclosed embodiments of the invention. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the main step diagrams. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1101. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 1001.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, as: a processor includes an image detection module, a region division module, and a statistics determination module. The names of these modules do not limit the module itself in some cases, for example, the image detection module may also be described as a "module for performing image detection on the acquired original image to obtain category information of the object in the original image and a bounding box".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to include: image detection is carried out on the collected original image so as to obtain category information of a target in the original image and a boundary box; clustering pixels belonging to the same target in the boundary box based on the depth information corresponding to the original image so as to divide the target in the boundary box into areas; and counting the depth value distribution of pixels in the boundary box to obtain a region corresponding to the pixels with the depth values in the set depth interval, and determining the position information of the target according to the region.

From the above description, it can be seen that, by performing image detection on the original image to obtain a bounding box, and performing region division on the target in the bounding box in a clustering manner, combining the counted depth values of the pixels in the bounding box, filtering to remove the non-target region in the bounding box, and reserving the target region, the precision of the bounding box covering the target is improved, and further the precision of target detection is improved.

The product can execute the method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present invention.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method of detecting an object, comprising:

Image detection is carried out on the collected original image so as to obtain category information and a boundary box of a target in the original image;

clustering pixels belonging to the same target in the boundary box based on the depth information corresponding to the original image so as to divide the target in the boundary box into areas; the clustering the pixels belonging to the same target in the boundary box comprises the following steps: traversing pixels within the bounding box; calculating an included angle between a line segment formed by two pixels and a line segment formed by one pixel and the coordinate origin by taking the origin of a binocular camera reference system as the coordinate origin and combining the depth values of any two traversed pixels; if the included angle is larger than a preset threshold, the two pixels belong to the same target;

And counting the depth value distribution of pixels in the boundary box to obtain a region corresponding to the pixels with the depth values in the set depth interval, and determining the position information of the target according to the region.

2. The method of claim 1, wherein said counting the depth value distribution of pixels within said bounding box comprises:

Dividing the depth value into subintervals with the same interval according to the depth value of the pixel in the boundary box;

And traversing the depth values of the pixels in the boundary frame, and respectively counting the number of the pixels falling in the subinterval to obtain the depth value distribution of the pixels in the boundary frame.

3. The method according to claim 2, wherein the method further comprises:

And determining a section with the most dense pixel distribution in the depth value distribution, and taking the determined section as the depth section.

4. The method according to claim 1, wherein the method further comprises: the original image was acquired using a binocular camera.

5. The method of claim 1, wherein said determining location information of said object from said region comprises:

converting pixels in the region into point cloud data;

And determining a minimum three-dimensional boundary box surrounding the point cloud data, wherein the three-dimensional coordinates of the minimum three-dimensional boundary box are the position information of the target.

6. An object detection apparatus, comprising:

The image detection module is used for carrying out image detection on the acquired original image so as to obtain category information and a boundary box of a target in the original image;

The region dividing module is used for clustering pixels belonging to the same target in the boundary frame based on the depth information corresponding to the original image so as to divide the region of the target in the boundary frame; the clustering the pixels belonging to the same target in the boundary box comprises the following steps: traversing pixels within the bounding box; calculating an included angle between a line segment formed by two pixels and a line segment formed by one pixel and the coordinate origin by taking the origin of a binocular camera reference system as the coordinate origin and combining the depth values of any two traversed pixels; if the included angle is larger than a preset threshold, the two pixels belong to the same target;

The statistics determining module is used for counting the depth value distribution of the pixels in the boundary box so as to obtain a region corresponding to the pixels with the depth values in the set depth interval, and determining the position information of the target according to the region.

7. The apparatus of claim 6, wherein the statistics determination module is further configured to:

dividing the depth value into subintervals with the same interval according to the depth value of the pixel in the boundary box; and

8. The apparatus of claim 7, wherein the apparatus further comprises: the depth interval determining module is used for determining an interval with the most densely distributed pixels of the depth value distribution, and taking the determined interval as the depth interval.

9. The apparatus of claim 6, wherein the apparatus further comprises: and the image acquisition module is used for acquiring an original image by adopting the binocular camera.

10. The apparatus of claim 6, wherein the statistics determination module is further configured to:

Converting pixels in the region into point cloud data; and

11. An electronic device, comprising:

One or more processors;

Storage means for storing one or more programs,

When executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-5.

12. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.