WO2020199834A1 - Object detection method and apparatus, and network device and storage medium - Google Patents

Object detection method and apparatus, and network device and storage medium

Info

Publication number
WO2020199834A1
WO2020199834A1 (PCT/CN2020/077721)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
area
network
information
candidate object
Prior art date
Application number
PCT/CN2020/077721
Other languages
French (fr)
Chinese (zh)
Inventor
杨泽同
孙亚楠
贾佳亚
戴宇荣
沈小勇
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Publication of WO2020199834A1 publication Critical patent/WO2020199834A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V 20/00: Scenes; scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/60: Type of objects
    • G06V 20/64: Three-dimensional objects
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; context of image processing
    • G06T 2207/30248: Vehicle exterior or interior
    • G06T 2207/30252: Vehicle exterior; vicinity of vehicle
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/06: Recognition of objects for industrial automation

Definitions

  • This application relates to the field of artificial intelligence technology, specifically to object detection technology.
  • Object detection refers to determining the location and category of objects in a scene.
  • Object detection technology has been widely used in various scenarios, such as autonomous driving and drones.
  • Current object detection schemes generally collect scene images, extract features from the scene images, and then determine the position and category of objects in the scene image based on the extracted features.
  • However, current object detection schemes suffer from problems such as low detection accuracy, especially in 3D object detection scenes.
  • the embodiments of the present application provide an object detection method, device, network device, and storage medium, which can improve the accuracy of object detection.
  • An embodiment of the present application provides an object detection method, executed by a network device, which includes: detecting foreground points from a point cloud of a scene; constructing candidate object areas corresponding to the foreground points based on the foreground points and a predetermined size, and obtaining initial positioning information of the candidate object areas; performing feature extraction on all points in the point cloud based on a point cloud network, to obtain a feature set corresponding to the point cloud; constructing area feature information of the candidate object areas based on the feature set; predicting the type and positioning information of the candidate object areas based on an area prediction network and the area feature information, to obtain the predicted type and predicted positioning information of the candidate object areas; and optimizing the candidate object areas based on their initial positioning information, predicted type, and predicted positioning information, to obtain the target object detection area and the positioning information of the target object detection area.
  • An embodiment of the present application also provides an object detection device, including:
  • a detection unit, configured to detect foreground points from a point cloud of a scene;
  • an area construction unit, configured to construct candidate object areas corresponding to the foreground points based on the foreground points and a predetermined size, to obtain initial positioning information of the candidate object areas;
  • a feature extraction unit, configured to perform feature extraction on all points in the point cloud based on a point cloud network, to obtain a feature set corresponding to the point cloud;
  • a feature construction unit, configured to construct area feature information of the candidate object areas based on the feature set;
  • a prediction unit, configured to predict the type and positioning information of the candidate object areas based on an area prediction network and the area feature information, to obtain the predicted type and predicted positioning information of the candidate object areas;
  • an optimization unit, configured to optimize the candidate object areas based on the initial positioning information, predicted type, and predicted positioning information of the candidate object areas, to obtain the target object detection area and the positioning information of the target object detection area.
  • An embodiment of the present application also provides a network device, including a memory and a processor; the memory stores multiple instructions, and the processor loads the instructions in the memory to execute the steps in any object detection method provided in the embodiments of the present application.
  • An embodiment of the present application further provides a storage medium that stores a plurality of instructions, the instructions being suitable for loading by a processor to execute the steps in any object detection method provided in the embodiments of the present application.
  • embodiments of the present application also provide a computer program product, including instructions, which when run on a computer, cause the computer to execute the steps in any object detection method provided in the embodiments of the present application.
  • The embodiment of the present application can detect foreground points from the point cloud of a scene; construct candidate object areas corresponding to the foreground points based on the foreground points and a predetermined size, and determine the initial positioning information of the candidate object areas; perform feature extraction on all points in the point cloud to obtain the feature set corresponding to the point cloud; construct the area feature information of the candidate object areas based on the feature set; predict the type and positioning information of the candidate object areas based on the area prediction network and the area feature information, to obtain the predicted type and predicted positioning information of the candidate object areas; and optimize the candidate object areas based on their initial positioning information, predicted type, and predicted positioning information, to obtain the target object detection area and the positioning information of the target object detection area.
  • This solution uses the point cloud data of the scene for object detection, generates a candidate object area for every foreground point in the point cloud, and optimizes the candidate object areas based on their area features; therefore, it can greatly improve the accuracy of object detection, and for 3D object detection in particular the detection effect is significantly improved.
  • FIG. 1a is a schematic diagram of a scene of an object detection method provided by an embodiment of the present application.
  • Figure 1b is a flowchart of an object detection method provided by an embodiment of the present application.
  • Figure 1c is a schematic structural diagram of a point cloud network provided by an embodiment of the present application.
  • Figure 1d is a schematic diagram of the PointNet++ network structure provided by an embodiment of the present application.
  • Figure 1e is a schematic diagram of an object detection effect in an automatic driving scene provided by an embodiment of the present application.
  • Figure 2a is a schematic diagram of image semantic segmentation provided by an embodiment of the present application.
  • FIG. 2b is a schematic diagram of point cloud segmentation provided by an embodiment of the present application.
  • Figure 2c is a schematic diagram of candidate region generation provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of feature construction of candidate regions provided by an embodiment of the present application.
  • Figure 4a is a schematic structural diagram of a regional prediction network provided by an embodiment of the present application.
  • FIG. 4b is another schematic diagram of the structure of the regional prediction network provided by an embodiment of the present application.
  • FIG. 5a is a schematic diagram of another process of object detection provided by an embodiment of the present application.
  • FIG. 5b is an architecture diagram of object detection provided by an embodiment of the present application.
  • FIG. 5c is a schematic diagram of test experiment results provided by an embodiment of the present application.
  • Figure 6a is a schematic structural diagram of an object detection device provided by an embodiment of the present application.
  • FIG. 6b is another schematic structural diagram of the object detection device provided by an embodiment of the present application.
  • FIG. 6c is another schematic structural diagram of the object detection device provided by an embodiment of the present application.
  • FIG. 6d is another schematic structural diagram of the object detection device provided by an embodiment of the present application.
  • FIG. 6e is another schematic structural diagram of the object detection device provided by an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • the embodiments of the present application provide an object detection method, device, network device, and storage medium.
  • The object detection device may be integrated in a network device, and the network device may be a server, a terminal, or another device; for example, the network device may include a vehicle-mounted device, a micro-processing box, and the like.
  • The so-called object detection refers to determining or recognizing the location and category of objects in a scene, for example, recognizing the category and location of objects in a road scene, such as street lights and vehicles and their locations.
  • An embodiment of the present application provides an object detection system including a network device, a collection device, and the like; the network device and the collection device are communicatively connected, for example, through a wired or wireless network.
  • the network device and the collection device may be integrated into one device.
  • the collection device can be used to collect point cloud data or image data of the scene.
  • the collection device can upload the collected point cloud data to a network device for processing.
  • The network device can be used for object detection. Specifically, it can detect foreground points from the point cloud of a scene; construct candidate object areas corresponding to the foreground points based on the foreground points and a predetermined size, to obtain the initial positioning information of the candidate object areas; perform feature extraction on all points in the point cloud based on the point cloud network, to obtain the feature set corresponding to the point cloud; construct the area feature information of the candidate object areas based on the feature set; predict the type and positioning information of the candidate object areas based on the area prediction network and the area feature information, to obtain the predicted type and predicted positioning information of the candidate object areas; and optimize the candidate object areas based on their initial positioning information, predicted type, and predicted positioning information, to obtain the target object detection area and its positioning information.
  • the detected objects can be identified in the scene image according to the location information.
  • the detected objects can be selected in the scene image in the form of a detection frame.
  • the type of the detected object may also be identified in the scene image.
  • the object detection device can be integrated in a network device.
  • The network device can be a server or a terminal, where the terminal can include a mobile phone, a tablet, a notebook computer, a personal computer (PC), a micro-processing terminal, and other equipment.
  • An object detection method provided by an embodiment of the present application may be executed by a processor of a network device. As shown in FIG. 1b, the specific process of the object detection method may be as follows:
  • A point cloud is a set of points representing the surface characteristics of a scene or target.
  • the points in the point cloud may include the position information of the points, such as three-dimensional coordinates, and may also include color information (RGB) or reflection intensity information (Intensity).
  • the point cloud can be detected by the principle of laser measurement or photogrammetry, for example, the point cloud of the object can be obtained by scanning with a laser scanner or a photographic scanner.
  • The principle of laser-based point cloud acquisition is as follows: when a laser beam irradiates the surface of an object, the reflected laser carries information such as position and distance. If the laser beam is scanned along a certain track, the reflected laser point information is recorded while scanning; because the scanning is extremely fine, a large number of laser points can be obtained, thereby forming a laser point cloud. Common point cloud formats include *.las, *.pcd, and *.txt.
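  • As a concrete illustration (not from the patent), a whitespace-separated *.txt point cloud export can be loaded into an N x 4 array of (x, y, z, intensity) values; the file name below is hypothetical.

```python
import numpy as np

def load_point_cloud_txt(path: str) -> np.ndarray:
    """Load a point cloud stored one point per line as: x y z intensity."""
    points = np.loadtxt(path, dtype=np.float32)
    assert points.ndim == 2 and points.shape[1] >= 3, "expected one point per row"
    return points

# cloud = load_point_cloud_txt("scene_000001.txt")  # hypothetical file; shape (N, 4)
```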
  • The point cloud data of the scene can be collected by the network device itself, collected by other devices and then obtained by the network device from them, or retrieved from a network database, and so on.
  • There can be many kinds of scenes, for example, a road scene in automatic driving, an aerial scene in a drone flight, and so on.
  • Foreground points are defined relative to background points.
  • A scene can be divided into a background and a foreground.
  • The points in the background are called background points, and the points in the foreground are called foreground points.
  • The point cloud of the scene can be semantically segmented to identify the foreground points in the point cloud of the scene.
  • For example, the point cloud of the scene can be semantically segmented directly to obtain the foreground points in the point cloud.
  • Semantic Segmentation refers to classifying each point in the point cloud of a scene so as to identify points belonging to a certain type.
  • For example, 2D or 3D semantic segmentation can be used to perform semantic segmentation on the point cloud.
  • Alternatively, the image of the scene may be semantically segmented to obtain foreground pixels, and the foreground pixels are then mapped to the point cloud to obtain the foreground points.
  • the step of "detecting the front scenic spot from the point cloud of the scene" may include:
  • the point corresponding to the foreground pixel in the point cloud of the scene is determined as the front scenic spot.
  • Specifically, the foreground pixels can be mapped to the point cloud of the scene to obtain the target points in the point cloud corresponding to the foreground pixels. For example, the mapping can be realized based on the mapping relationship between pixels in the image and points in the point cloud (such as a location mapping relationship), and the target points having a mapping relationship with the foreground pixels are determined as foreground points.
  • the points in the point cloud can be projected into the image of the scene.
  • the points in the point cloud can be projected into the image of the scene through the mapping relationship matrix or transformation matrix between the point cloud and the pixels.
  • The segmentation result corresponding to a point in the image (such as foreground pixel or background pixel) is used as the segmentation result of that point, and based on this segmentation result, it is determined whether the point is a foreground point, so that each foreground point is determined from the point cloud. Specifically, when the segmentation result of a point is a foreground pixel, the point is determined to be a foreground point.
  • the semantic segmentation in the embodiments of the present application can be implemented by a segmentation network based on deep learning.
  • For example, an Xception-based DeepLabV3 segmentation network can be used, and the image of the scene can be segmented through this network to obtain foreground pixels, such as the foreground pixels of cars, pedestrians, and cyclists in autonomous driving. Each point in the point cloud is then projected into the image of the scene, and its corresponding segmentation result in the image is used as the segmentation result of that point, thereby determining the foreground points in the point cloud. This method can accurately detect the foreground points in the point cloud.
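  • The projection step above can be sketched as follows, assuming a 3x4 camera projection matrix P from the sensor calibration and a segmentation mask in which nonzero pixels are foreground; the function and label conventions are illustrative, not taken from the patent.

```python
import numpy as np

def label_points_from_mask(points_xyz, P, seg_mask):
    """points_xyz: (N, 3); P: (3, 4) projection matrix; seg_mask: (H, W) labels."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1), dtype=points_xyz.dtype)])
    proj = homo @ P.T                          # (N, 3) homogeneous image coordinates
    z = proj[:, 2]
    front = z > 1e-6                           # keep only points in front of the camera
    u = np.round(proj[front, 0] / z[front]).astype(int)
    v = np.round(proj[front, 1] / z[front]).astype(int)
    h, w = seg_mask.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.zeros(n, dtype=np.int64)
    idx = np.flatnonzero(front)[ok]
    labels[idx] = seg_mask[v[ok], u[ok]]       # copy the pixel's segmentation result
    return labels                              # nonzero entries mark foreground points
```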
  • The embodiment of the present application may construct the object area corresponding to each foreground point based on the foreground point and the predetermined size, and use the object area corresponding to the foreground point as a candidate object area.
  • the candidate object area may be a two-dimensional area, that is, a 2D area, or a three-dimensional area, that is, a 3D area, which may be determined according to actual requirements.
  • the predetermined size can be set according to actual needs, and the predetermined size can include predetermined size parameters, for example, length l*width w in the 2D area, and length l*width w*height h in the 3D area.
  • Specifically, the foreground point can be used as the center point, and the candidate object area corresponding to the foreground point can be generated according to a predetermined size.
  • the location information of the candidate object area may include position information, size information, and so on of the candidate object area.
  • The position information of the candidate object area may be represented by the position information of a reference point in the candidate object area, and the reference point may be set according to actual requirements; for example, the center point of the candidate object area can be used as the reference point.
  • the position information of the candidate object area may include the 3D coordinates of the center point such as (x, y, z).
  • the size information of the candidate object area may include the size parameter of the area.
  • For a 2D area, the size information of the candidate object area may include length l * width w;
  • for a 3D area, it may include length l * width w * height h, and so on.
  • In object detection, the orientation of the object is also important reference information. Therefore, in some embodiments, the positioning information of the candidate object area may also include the orientation of the candidate object area, such as forward, backward, upward, downward, etc.; the orientation of the candidate object area can indicate the orientation of the object in the scene. In practical applications, the orientation of the candidate object area can be expressed as an angle; for example, two orientations can be defined, 0° and 90° respectively.
  • The candidate object area may be identified in the form of a detection frame, for example, a 2D detection frame or a 3D detection frame.
  • For example, a 2D segmentation network can be used to semantically segment the image to obtain the image segmentation result (including foreground pixels, etc.); then, referring to Figure 2b, the image segmentation result is mapped to the point cloud to obtain the point cloud segmentation result (including the foreground points). Then, with each foreground point as the center, a candidate object area is generated.
  • A schematic diagram of candidate object area generation is shown in Figure 2c: with each foreground point as the center, a 3D detection frame of artificially specified size is generated as a candidate object area.
  • The embodiment of the present application uses two orientations, 0° and 90° respectively.
  • In this way, the embodiment of the application can generate a candidate object area, such as a 3D candidate object detection frame, for each foreground point.
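  • A minimal sketch of this point-based proposal generation is given below: every foreground point becomes the center of a 3D box of predetermined size, once per orientation (0° and 90°, as stated above). The size values are placeholders, not values from the patent.

```python
import numpy as np

def generate_proposals(fg_points, size=(3.9, 1.6, 1.56), angles=(0.0, np.pi / 2)):
    """fg_points: (M, 3) foreground points.
    Returns (M * len(angles), 7) boxes encoded as (x, y, z, l, w, h, angle)."""
    l, w, h = size                             # placeholder predetermined size
    boxes = [[x, y, z, l, w, h, a]
             for (x, y, z) in fg_points
             for a in angles]                  # one box per point per orientation
    return np.asarray(boxes, dtype=np.float32)
```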
  • the point cloud network may be a network based on deep learning, for example, it may be a point cloud network such as PointNet and PointNet++.
  • the time sequence between step 103 and step 102 is not limited by the sequence number, and step 102 may be executed before step 103, or step 103 may be executed before step 102, or simultaneously.
  • all points in the point cloud can be input to the point cloud network, and the point cloud network performs feature extraction on the input points to obtain a feature set corresponding to the point cloud.
  • the point cloud network may include a first sampling network and a second sampling network; wherein, the first sampling network is connected to the second sampling network.
  • the first sampling network can be called an encoder
  • the second sampling network can be a decoder.
  • the feature downsampling process is performed on all points in the point cloud through the first sampling network to obtain the initial feature of the point cloud; the initial feature is upsampled through the second sampling network to obtain the feature set of the point cloud.
  • the first sampling network includes a plurality of set abstraction layers (SA) connected in sequence
  • The second sampling network includes a plurality of feature propagation (FP) layers connected in sequence, corresponding one-to-one with the set abstraction layers (SA) in the first sampling network.
  • the SA in the first sampling network corresponds to the FP in the second sampling network, and the number can be set according to actual needs.
  • the first sampling network and the second sampling network include three layers of SA and FP respectively.
  • For example, the first sampling network can include three downsampling steps (that is, the encoding stage includes three downsampling steps), with point counts of 1024, 256, and 64, respectively;
  • the second sampling network can include three upsampling steps (that is, the decoding stage includes three upsampling steps), with point counts of 256, 1024, and N, respectively.
  • the feature extraction process of the point cloud network is as follows:
  • All the points of the point cloud are input to the first sampling network; the points in the point cloud are then divided into local areas through each set abstraction (SA) layer in the first sampling network, and the features of the center point of each local area are extracted, to obtain the initial features of the point cloud.
  • For example, the output point cloud feature is 64 × 1024.
  • PointNet++ uses the idea of hierarchical feature extraction, and each round is called a set abstraction. It is divided into three parts: a sampling layer, a grouping layer, and a feature extraction layer. Consider the sampling layer first: in order to extract some relatively important center points from the dense point cloud, the farthest point sampling (FPS) method is adopted, although random sampling is also possible. Next is the grouping layer, which searches for the k nearest neighbors within a certain range of each center point extracted by the previous layer to form a patch. The feature extraction layer then performs convolution and pooling on these k points through a small PointNet network, and the obtained features are used as the features of this center point and sent to the next layer. In this way, the center points obtained at each layer are a subset of the center points of the previous layer, and as the depth increases, the number of center points decreases while each center point contains more and more information.
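  • The following is a generic numpy sketch of farthest point sampling, the center-point selection the sampling layer uses; it is a standard implementation of the technique, not code from the patent.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """points: (N, 3); returns the indices of k centers spread over the cloud."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=np.int64)
    chosen[0] = np.random.randint(n)           # arbitrary first center
    dist = np.full(n, np.inf)
    for i in range(1, k):
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(dist))       # pick the point farthest from all chosen
    return chosen
```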
  • the first sampling network in the embodiment of the present application is composed of multiple SA layers. At each level, a set of points is processed and abstracted to generate a new set with fewer elements.
  • The set abstraction layer consists of three key parts: a sampling layer, a grouping layer, and a point cloud network layer (PointNet layer).
  • sampling layer selects a set of points from the input points, which define the centroid of the local area.
  • grouping layer constructs a set of local regions by finding the "adjacent" points around the centroid.
  • The point cloud network layer uses a mini point cloud network (mini-PointNet) to encode the local area set into a feature vector.
  • the embodiment of the present application proposes an improved SA layer.
  • The grouping layer in the SA layer can use multi-scale grouping (MSG).
  • That is, the local features under each radius are extracted during grouping and then combined; the idea is to sample multi-scale features in the grouping layer and concatenate them.
  • For example, MSG is used in the first and second SA layers.
  • Single-scale grouping (SSG) may also be used in an SA layer, for example in the SA layer that produces the output.
  • The initial features of the point cloud can be input to the second sampling network, and the second sampling network performs up-sampling processing, such as residual up-sampling, on the initial features.
  • For example, the three FP layers of the second sampling network perform up-sampling processing on the 64 × 1024 features and then output N × 128 features.
  • the step of "upsampling the initial features through the second sampling network to obtain the feature set of the point cloud” includes:
  • the current input feature is up-sampled through the current feature propagation layer to obtain the feature set of the point cloud.
  • The previous layer of the current FP layer can be an SA layer or another FP layer, and its output features serve as part of the current input features.
  • For example, for the first FP layer: after the 64*1024 point cloud features are input to the first FP layer, the first FP layer determines the 64*1024 point cloud features and the 256*256 features that were input to the third SA layer as the current input features, performs up-sampling on them, and outputs the obtained features to the second FP layer.
  • The second FP layer takes the 256*128 features output by the previous FP layer and the 1024*128 features that were input to the second SA layer as the input features of the current layer, performs up-sampling on them, and obtains 1024*128 features that are input to the third FP layer.
  • The third FP layer uses the 1024*128 features output by the second FP layer and the N*4 features that were input to the first SA layer as the input features of the current layer, and performs up-sampling processing to output the final features of the point cloud.
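  • As an illustration of the skip-connected up-sampling just described, here is a generic numpy sketch of one PointNet++-style feature propagation step: features are interpolated from the coarse level by inverse-distance weighting of the three nearest neighbors and concatenated with the encoder's skip features. This is a common reading of an FP layer, not the patent's code, and the shared MLP that normally follows is omitted.

```python
import numpy as np

def feature_propagation(dense_xyz, coarse_xyz, coarse_feat, skip_feat, k=3):
    """dense_xyz: (N, 3) target points; coarse_xyz: (M, 3) source points;
    coarse_feat: (M, C1) source features; skip_feat: (N, C2) skip features.
    Returns (N, C1 + C2) up-sampled features."""
    d = np.linalg.norm(dense_xyz[:, None, :] - coarse_xyz[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]                       # k nearest coarse points
    nd = np.take_along_axis(d, idx, axis=1) + 1e-8
    w = (1.0 / nd) / (1.0 / nd).sum(axis=1, keepdims=True)   # inverse-distance weights
    interp = (coarse_feat[idx] * w[..., None]).sum(axis=1)   # (N, C1) interpolated
    return np.concatenate([interp, skip_feat], axis=1)       # fuse with skip connection
```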
  • feature extraction can be performed on all points in the point cloud to obtain a feature set of the point cloud, which prevents information loss and improves the accuracy of object detection.
  • For example, the features of some points can be selected from the feature set as the feature information of the candidate object area to which those points belong;
  • the position information of some points can also be used as feature information of the candidate object area to which they belong;
  • alternatively, the features and position information of some points can be assembled together to construct the area feature information.
  • the step of "constructing the region feature information of the candidate object region based on the feature set" may include:
  • the first part of feature information and the second part of feature information are fused to obtain the regional features of the candidate object region.
  • The number of target points and the selection method can be set according to actual needs; for example, a certain number of points can be selected in the candidate object area randomly or according to a certain selection method (such as selection based on the distance from the center point), for example, 512 points.
  • the feature of the target point can be extracted from the feature set of the point cloud, and the extracted feature of the target point is used as the first part of the feature information of the candidate object area (which can be represented by F1).
  • For example, if 512 target points are selected, the features of these 512 points can be extracted from the feature set of the point cloud to form the first part of the feature information, F1.
  • In some embodiments, the position information of the target points can be directly used as the second part of the feature information of the candidate object area (which can be represented by F2).
  • the step of "constructing the second part of the feature information of the candidate object region based on the position information of the target point" may include:
  • The position information of the target point may include the coordinate information of the target point, such as the 3D coordinates xyz, and the standardization method for the position information can be set according to actual needs.
  • For example, the position information of the target point can be adjusted based on the position information of the center point of the candidate object area; for example, the 3D coordinates of the center of the candidate object area can be subtracted from the 3D coordinates of the target point.
  • the first part of feature information and standardized location information are fused to obtain the fused feature information of the target point.
  • For example, the two can be fused using Concat (concatenation) to obtain the fused features (B, N, C+3).
  • the fusion feature can also be spatially transformed.
  • For the transformation, a spatial transformation network may be used, for example, a supervised spatial transformation network such as T-Net.
  • For example, the fused features (B, N, C+3) can be spatially transformed through T-Net to obtain the transformed coordinates (B, 3).
  • the normalized position value of the target point can be subtracted from the transformed position value to obtain the second partial feature F2 of the candidate object region.
  • the normalized 3D coordinates (B, N, 3) of the target point can be subtracted from the transformed 3D coordinates (B, 3) to obtain the second partial feature F2.
  • the geometric stability or spatial invariance of the position feature can be improved, thereby improving the accuracy of feature extraction.
  • the first part feature information and the second part feature information of each candidate object area can be obtained by the above method, and then the two parts of features are fused to obtain the area feature information of each candidate object area.
  • F1 and F2 can be concatenated (Concat) to obtain the connected features (B, N, C+3) of the candidate object region, and this feature is used as the regional feature of the candidate object region.
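  • A hedged sketch of the region-feature construction described above: target points are sampled inside a candidate box, their point cloud features are gathered (F1), their coordinates are normalized to the box center (F2), and the two are concatenated. The T-Net refinement of F2 is noted in a comment but stubbed out; all names are illustrative.

```python
import numpy as np

def build_region_feature(point_features, points_xyz, inside_idx, box_center,
                         num_target=512):
    """point_features: (N, C) per-point features; points_xyz: (N, 3);
    inside_idx: indices of points inside the candidate box; box_center: (3,)."""
    pick = np.random.choice(inside_idx, num_target,
                            replace=len(inside_idx) < num_target)
    f1 = point_features[pick]                  # (512, C) first-part features
    f2 = points_xyz[pick] - box_center         # (512, 3) normalized coordinates
    # In the full pipeline, a T-Net predicts a residual offset from the fused
    # (f1, f2) features that is further subtracted from f2 before fusion.
    return np.concatenate([f1, f2], axis=1)    # (512, C + 3) region feature
```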
  • The area prediction network can be used to predict the type and positioning information of the candidate object areas; for example, it can classify and locate the candidate object areas to obtain the predicted type and predicted positioning information of the candidate object areas.
  • The area prediction network can be a network based on deep learning.
  • the region prediction network can be trained from the point cloud or image of the sample object.
  • the predicted positioning information may include predicted position information such as 2D or 3D coordinates, dimensions such as length, width, and height.
  • it may also include predicted orientation information such as 0° or 90°.
  • the regional prediction network may include a feature extraction network, a classification network, and a regression network.
  • the classification network and the regression network are respectively connected to the feature extraction network. as follows:
  • the feature extraction network is used to perform feature extraction on input information, for example, perform feature extraction on the area feature information of the candidate object area to obtain the global feature information of the candidate object area.
  • the classification network is used to classify the area.
  • the candidate object area can be classified based on the global feature information of the candidate object area to obtain the prediction type of the candidate object area.
  • the regression network is used to locate the area, for example, to locate the candidate object area to obtain the predicted location information of the candidate object area. Because the regression network is used to predict the positioning, the output predicted positioning information can also be called regression information, such as predicted regression information.
  • the step of "predicting the type and location information of the candidate object area based on the area prediction network and area feature information to obtain the prediction type and predicted location information of the candidate object area” may include:
  • classifying the candidate object area based on the classification network and the global feature information, to obtain the predicted type of the candidate object area;
  • locating the candidate object area based on the regression network and the global feature information, to obtain the predicted positioning information of the candidate object area.
  • The feature extraction network in the embodiment of the present application may include a plurality of sequentially connected set abstraction (SA) layers. The classification network may include a plurality of sequentially connected fully connected (fc) layers; as shown in Figure 4b, it includes multiple fc layers for classification, such as cls-fc1, cls-fc2, and cls-pred. The regression network likewise includes a plurality of sequentially connected fully connected layers; as shown in Figure 4b, it includes multiple fc layers for regression, such as reg-fc1, reg-fc2, and reg-pred. In the embodiment of the present application, the numbers of SA layers and fc layers can be set according to actual requirements.
  • the process of extracting the global feature information of the region may include: sequentially performing feature extraction on the region feature information through each set abstraction layer in the feature extraction network to obtain the global feature information of the candidate object region.
  • For the structure of the set abstraction layer, refer to the introduction above.
  • The grouping in these SA layers can be single-scale, that is, SSG is used, to improve the accuracy and efficiency of global feature extraction.
  • For example, the area prediction network can perform feature extraction on the area feature information through three SA layers in turn; when the input is M × 131 features, features such as 128 × 128 and 32 × 256 are obtained after the three SA layers. After the SA feature extraction, the global feature information is obtained, which can then be input to the classification network and the regression network respectively.
  • The classification network uses the first two layers, cls-fc1 and cls-fc2, to reduce the dimensionality of the global feature information, performs classification prediction through the final cls-pred layer, and outputs the predicted type of the candidate object area.
  • The regression network uses the first two layers, reg-fc1 and reg-fc2, to reduce the dimensionality of the global feature information, and performs regression prediction through the final reg-pred layer to obtain the predicted positioning information of the candidate object area.
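  • The two-branch head named above (cls-fc1/cls-fc2/cls-pred and reg-fc1/reg-fc2/reg-pred) can be sketched in PyTorch as follows; the layer widths, class count, and activation choices are assumptions, and the SA-based feature extraction network is abstracted away. The 7-value regression output follows the (x, y, z, l, h, w, angle) parameterization given later in the text.

```python
import torch
import torch.nn as nn

class BoxPredictionHead(nn.Module):
    def __init__(self, feat_dim=256, num_classes=2, reg_dim=7):
        super().__init__()
        # classification branch: two fc layers reduce dimensionality,
        # then a prediction layer outputs the type
        self.cls = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes))
        # regression branch: predicts (x, y, z, l, h, w, angle)
        self.reg = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, reg_dim))

    def forward(self, global_feat):            # global_feat: (B, feat_dim)
        return self.cls(global_feat), self.reg(global_feat)

# head = BoxPredictionHead()
# cls_logits, reg_params = head(torch.randn(4, 256))
```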
  • The types of candidate object areas can be set according to actual needs; for example, according to whether there are objects in the area, areas can be divided into those containing objects and empty ones, or, according to quality, into high, medium, and low quality.
  • the type and positioning information of each candidate object area can be predicted.
  • the positioning information of the candidate object area may be adjusted based on the predicted positioning information first, and then the candidate object area may be filtered based on the prediction type.
  • the candidate object regions may be screened based on the prediction type first, and then the positioning information may be adjusted.
  • the step of "optimizing the candidate object area based on the initial positioning information, prediction type, and predicted positioning information to obtain the target object detection area and the positioning information of the target object detection area" may include:
  • the initial location information of the filtered object area is optimized and adjusted to obtain the target object detection area and the location information of the target object detection area.
  • For example, the candidate object areas whose predicted type is "empty area" can be filtered out, and then, based on the predicted positioning information of the remaining candidate object areas, their initial positioning information is optimized and adjusted.
  • the positioning information optimization adjustment method can be adjusted based on the difference information between the predicted positioning information and the initial positioning information, for example, the difference in the 3D coordinates of the area, the difference in size, and so on.
  • Alternatively, optimal positioning information can be determined based on the predicted positioning information and the initial positioning information, and the positioning information of the candidate object area is then adjusted to this optimal positioning information; for example, optimal 3D coordinates and length, width, and height are determined for the area.
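  • A minimal sketch of this optimization step, under the assumption that class index 0 denotes an empty area and that the regression output is used directly as the refined box parameters (rather than as residuals added to the initial box):

```python
import numpy as np

def optimize_proposals(init_boxes, pred_types, pred_boxes, empty_class=0):
    """init_boxes, pred_boxes: (M, 7) boxes as (x, y, z, l, w, h, angle);
    pred_types: (M,) predicted class indices."""
    keep = pred_types != empty_class           # screen out empty candidate areas
    return pred_boxes[keep], keep              # refined boxes and the kept mask
```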
  • the object detection area can also be identified in the scene image based on the location information of the target object detection area.
  • The object detection method provided by the embodiment of the present application can accurately detect the position, size, and orientation of objects on the current road in an automatic driving scene, which facilitates decision-making and judgment in autonomous driving.
  • the object detection provided by the embodiments of the present application may be applicable to various scenarios, such as scenarios such as autonomous driving, drones, and security monitoring.
  • The embodiment of the present application can detect foreground points from the point cloud of a scene; construct the object areas corresponding to the foreground points based on the foreground points and the predetermined size, to obtain the initial positioning information of the candidate object areas; perform feature extraction on all points of the point cloud based on the point cloud network, to obtain the feature set corresponding to the point cloud; construct the area feature information of the candidate object areas based on the feature set; predict the type and positioning information of the candidate object areas based on the area prediction network and the area feature information, to obtain the predicted type and predicted positioning information of the candidate object areas; and optimize the candidate object areas based on their initial positioning information, predicted type, and predicted positioning information, to obtain the target object detection area and the positioning information of the target object detection area.
  • Using the point cloud data of the scene for object detection can improve the accuracy of object detection.
  • This solution can also generate candidate object areas for each foreground point in the point cloud, which can avoid information loss.
  • Since candidate object areas are generated for every foreground point, a corresponding candidate area will be generated for any object; therefore, detection is not affected by object scale changes or severe occlusion, which improves the effectiveness and success rate of object detection.
  • this solution can also optimize the candidate object region based on the region characteristics of the candidate object region; therefore, the accuracy and quality of object detection can be further improved.
  • In the following description, the object detection device being specifically integrated in a network device is taken as an example.
  • the network device can obtain a training set of the semantic segmentation network, which includes sample images labeled with pixel types (such as foreground pixels, background pixels, etc.).
  • The network device can train the semantic segmentation network based on the training set and a loss function.
  • the sample image can be semantically segmented through the semantic segmentation network to obtain foreground pixels of the sample image, and then the segmented pixel type and the labeled pixel type are converged based on the loss function to obtain the trained semantic segmentation network.
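  • As a generic illustration of this convergence step (not the patent's training code), a single training step can drive the segmentation output toward the labeled pixel types with a cross-entropy loss; seg_net and optimizer are assumed to be provided.

```python
import torch.nn.functional as F

def train_step(seg_net, optimizer, images, pixel_labels):
    """images: (B, 3, H, W); pixel_labels: (B, H, W) integer pixel types."""
    optimizer.zero_grad()
    logits = seg_net(images)                   # (B, num_classes, H, W)
    loss = F.cross_entropy(logits, pixel_labels)
    loss.backward()                            # converge predictions toward labels
    optimizer.step()
    return loss.item()
```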
  • the network device obtains a training set of the point cloud network, and the training set includes sample point clouds of sample objects or scenes.
  • the network device can train the point cloud network based on the sample point cloud training set.
  • The network device obtains the training set of the area prediction network, which may include sample point clouds labeled with object area types and positioning information; the area prediction network is trained with this training set. Specifically, the object area type and positioning information of the sample point cloud are predicted, the predicted type is converged toward the true type, and the predicted positioning information is converged toward the true positioning information, to obtain the trained area prediction network.
  • the foregoing network training may be performed by the network device itself, or it may be obtained by the network device after the training of other devices is completed. It should be understood that the network applied in the embodiment of the present application is not limited to training in the foregoing manner, and may also be trained in other manners.
  • An object detection method is provided, the specific process of which can be as follows:
  • the network device acquires an image and a point cloud of the scene.
  • The network device can obtain the scene image and the point cloud from an image acquisition device and a point cloud acquisition device, respectively.
  • the network device uses a semantic segmentation network to perform semantic segmentation on the image of the scene to obtain foreground pixels.
  • a road scene image can be collected first, and a 2D semantic segmentation network can be used to segment the scene image to obtain a segmentation result, including foreground pixels, background pixels, and so on.
  • The network device maps the foreground pixels to the point cloud of the scene to obtain the foreground points in the point cloud.
  • For example, an Xception-based DeepLabV3 can be used as the segmentation network, and the image of the scene can be segmented through this network to obtain foreground pixels, such as the foreground pixels of cars, pedestrians, and cyclists in autonomous driving. Each point in the point cloud is then projected into the image of the scene, and its corresponding segmentation result in the image is used as the segmentation result of that point, thereby obtaining the foreground points in the point cloud. This method can accurately detect the foreground points in the point cloud.
  • The network device constructs a three-dimensional candidate object area corresponding to each foreground point based on each foreground point and a predetermined size, and obtains the initial positioning information of the candidate object areas.
  • For example, each foreground point is used as a center point, and the three-dimensional candidate object area corresponding to that foreground point is generated according to a predetermined size.
  • the location information of the candidate object area may include position information, size information, and so on of the candidate object area.
  • Specifically, the candidate object area corresponding to a foreground point can be generated according to a predetermined size with the foreground point as the center point, that is, point-based proposal generation.
  • the network device performs feature extraction on all points in the point cloud through the point cloud network to obtain a feature set corresponding to the point cloud.
  • all points in the point cloud (B, N, 4) can be input to PointNet++, and the feature of the point cloud can be extracted through PointNet++ to obtain (B, N, C).
  • the network device constructs regional feature information of the candidate object region based on the feature set.
  • the network device can generate the area feature information of the candidate object area based on the feature set of the point cloud (ie, Proposal Feature Generation).
  • The network device selects multiple target points in the candidate object area; extracts the features of the target points from the feature set to obtain the first part of the feature information of the candidate object area; standardizes the position information of the target points to obtain the standardized position information of the target points; fuses the first part of the feature information with the standardized position information to obtain the fused feature information of the target points; spatially transforms the fused feature information of the target points to obtain the transformed position information of the target points; adjusts the standardized position information of the target points based on the transformed position information to obtain the second part of the feature information of the candidate object area; and fuses the first part of the feature information with the second part of the feature information to obtain the area features of the candidate object area.
  • the region feature generation can refer to the above-mentioned embodiment and the description of FIG. 3.
  • the network device predicts the type and location information of the candidate object area based on the area prediction network and the area feature information, and obtains the prediction type and predicted location information of the candidate object area.
  • For example, the candidate areas can be classified (cls) and regressed (reg) through the box prediction network (Box Prediction Net), so as to predict the type and regression parameters of the candidate object areas.
  • The regression parameters are the predicted positioning information, including parameters such as the three-dimensional coordinates, length, width, height, and orientation, i.e., (x, y, z, l, h, w, angle).
  • the network device optimizes the candidate object area based on the initial positioning information of the candidate object area, the prediction type of the candidate object area, and the predicted positioning information to obtain the target object detection area and the positioning information of the target object detection area.
  • For example, the network device can filter the candidate object areas based on their predicted types to obtain the filtered object areas; then, according to the predicted positioning information of the filtered object areas, the initial positioning information of the filtered object areas is optimized and adjusted to obtain the optimized object detection area and its positioning information.
  • the object detection area can also be identified in the scene image based on the location information of the target object detection area.
  • The object detection method provided by the embodiment of the present application can accurately detect the position, size, and orientation of objects on the current road in an automatic driving scene, which facilitates decision-making and judgment in autonomous driving.
  • The embodiment of the present application may use the entire point cloud as input, use a PointNet++ structure to generate a feature for each point in the point cloud, then use each point in the point cloud as an anchor point to generate candidate areas, and finally use the feature of each point as input to optimize the candidate areas and generate the final detection result.
  • the algorithm capabilities provided by the embodiments of this application have been tested on some data sets.
  • For example, the capabilities of the algorithm provided by the embodiments of this application have been tested on an open-source autonomous driving data set, namely the KITTI data set.
  • The KITTI data set is an autonomous driving data set that contains objects of various sizes and distances at the same time and is therefore very challenging.
  • The algorithm of the embodiment of this application surpasses all existing 3D object detection algorithms on KITTI, reaching a new state of the art, and is at the same time far superior to the previous best algorithm on the hard difficulty set.
  • The point clouds of 7481 training images and 7518 test images covering three categories (cars, pedestrians, and cyclists) are tested.
  • In extensive experiments, the average precision (AP) is compared with that of other methods.
  • The other methods include MV3D (Multi-View 3D object detection), AVOD (Aggregate View Object Detection), VoxelNet, F-PointNet (Frustum PointNet), and AVOD-FPN (AVOD with a feature pyramid network).
  • Figure 5c shows the test results.
  • the accuracy of the object detection method (Ours in FIG. 5c) provided by the embodiment of the present application is significantly higher than other methods.
  • an embodiment of the present application also provides an object detection device.
  • the object detection device can be integrated in a network device.
  • The network device can be a server, a terminal, a vehicle-mounted device, equipment such as a drone, or a miniature processing box.
  • the object detection device may include a detection unit 601, a region construction unit 602, a feature extraction unit 603, a feature construction unit 604, a prediction unit 605, and an optimization unit 606, as follows:
  • The detection unit 601 is configured to detect foreground points from the point cloud of a scene;
  • the area construction unit 602 is configured to construct the candidate object areas corresponding to the foreground points based on the foreground points and a predetermined size, and determine the initial positioning information of the candidate object areas;
  • the feature extraction unit 603 is configured to perform feature extraction on all points in the point cloud based on the point cloud network to obtain a feature set corresponding to the point cloud;
  • the feature construction unit 604 is configured to construct the area feature information of the candidate object area based on the feature set;
  • the prediction unit 605 is configured to predict the type and location information of the candidate object area based on the area prediction network and the area feature information, and obtain the prediction type and predicted location information of the candidate object area;
  • the optimization unit 606 is configured to perform optimization processing on the candidate object area based on the initial positioning information of the candidate object area, the prediction type of the candidate object area, and the predicted positioning information to obtain the target object detection area and the positioning information of the target object detection area.
  • the detection unit 601 is specifically configured to:
  • determine the points corresponding to the foreground pixels in the point cloud of the scene as foreground points.
  • the area construction unit 602 is specifically configured to:
  • generate, with the foreground point as the center point, the candidate object area corresponding to the foreground point according to a predetermined size.
  • the feature construction unit 604 specifically includes:
  • the selection subunit 6041 is configured to select multiple target points in the candidate object area
  • An extraction subunit 6042 configured to extract the feature of the target point from the feature set to obtain the first part of feature information of the candidate object region;
  • the constructing subunit 6043 is configured to construct the second part of the feature information of the candidate object region based on the position information of the target point;
  • the fusion subunit 6045 is configured to fuse the first partial feature information and the second partial feature information to obtain the region feature information of the candidate object region.
  • The construction subunit 6043 is specifically configured to:
  • the standardized position information of the target point is adjusted to obtain the second partial feature information of the candidate object region.
  • the point cloud network includes: a first sampling network, and a second sampling network connected to the first sampling network; the feature extraction unit 603 specifically includes:
  • a down-sampling subunit 6031 configured to perform feature down-sampling processing on all points in the point cloud through the first sampling network to obtain initial features of the point cloud;
  • the up-sampling subunit 6032 is configured to perform up-sampling processing on the initial features through the second sampling network to obtain a feature set of the point cloud.
  • the first sampling network includes a plurality of set abstraction layers connected in sequence
  • the second sampling network includes a plurality of feature propagation layers connected in sequence, corresponding one-to-one with the set abstraction layers in the first sampling network.
  • the downsampling subunit 6031 is specifically used for:
  • the points in the point cloud are sequentially divided into local areas through the set abstraction layer, and the characteristics of the central points of the local areas are extracted to obtain the initial characteristics of the point cloud;
  • the up-sampling subunit 6032 is specifically used for:
  • the current input feature is up-sampled through the current feature propagation layer to obtain the feature set of the point cloud.
  • the region prediction network includes a feature extraction network, a classification network connected to the feature extraction network, and a regression network connected to the feature extraction network; referring to FIG. 6d, the prediction unit 605 specifically includes:
  • the global feature extraction subunit 6051, configured to perform feature extraction on the region feature information through the feature extraction network to obtain the global feature information of the candidate object region;
  • the classification subunit 6052, configured to classify the candidate object region based on the classification network and the global feature information to obtain the prediction type of the candidate object region;
  • the regression subunit 6053, configured to locate the candidate object region based on the regression network and the global feature information to obtain the predicted positioning information of the candidate object region.
  • the feature extraction network includes a plurality of sequentially connected set abstraction layers;
  • the classification network includes a plurality of sequentially connected fully connected layers;
  • the regression network includes a plurality of sequentially connected fully connected layers;
  • the global feature extraction subunit 6051 is specifically configured to perform feature extraction on the region feature information in turn through the set abstraction layers in the feature extraction network to obtain the global feature information of the candidate object region.
  • the optimization unit 606 specifically includes:
  • the screening subunit 6061 is configured to screen the candidate object regions based on the prediction types of the candidate object regions to obtain the filtered object regions;
  • the optimization subunit 6062 is configured to optimize and adjust the initial positioning information of the filtered object areas according to the predicted positioning information of the filtered object areas, to obtain the target object detection area and the positioning information of the target object detection area (see the sketch following this list).
  • each of the above units can be implemented as an independent entity, or can be combined arbitrarily and implemented as the same entity or as several entities.
  • for the specific implementation of each of the above units, please refer to the previous method embodiments, which will not be repeated here.
  • the object detection device of this embodiment can detect foreground points from the point cloud of the scene through the detection unit 601; the region construction unit 602 then constructs the candidate object area corresponding to each foreground point based on the foreground point and a predetermined size, and obtains the initial positioning information of the candidate object area; the feature extraction unit 603 performs feature extraction on all points in the point cloud based on the point cloud network to obtain the feature set corresponding to the point cloud; the feature construction unit 604 constructs the region feature information of the candidate object area based on the feature set; the prediction unit 605 predicts the type and positioning information of the candidate object area based on the region prediction network and the region feature information, and obtains the prediction type and predicted positioning information of the candidate object area; and the optimization unit 606 optimizes the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area to obtain the target object detection area and its positioning information.
  • this solution can use the point cloud data of the scene for object detection, generate a candidate object area for each foreground point, and optimize the candidate object areas based on their region features; therefore, it can greatly improve the accuracy of object detection and is especially suitable for 3D object detection.
  • FIG. 7 shows a schematic structural diagram of the network device involved in the embodiment of the present application, specifically:
  • the network device may include a processor 701 with one or more processing cores, a memory 702 with one or more computer-readable storage media, a power supply 703, an input unit 704, and other components.
  • the structure shown in FIG. 7 does not constitute a limitation on the network device, which may include more or fewer components than shown in the figure, combine some components, or use a different arrangement of components. Specifically:
  • the processor 701 is the control center of the network device. It uses various interfaces and lines to connect the various parts of the entire network device, runs or executes the software programs and/or modules stored in the memory 702, calls the data stored in the memory 702, performs the various functions of the network device, and processes data, thereby monitoring the network device as a whole.
  • the processor 701 may include one or more processing cores; preferably, the processor 701 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, while the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 701.
  • the memory 702 may be used to store software programs and modules.
  • the processor 701 executes various functional applications and data processing by running the software programs and modules stored in the memory 702.
  • the memory 702 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data created according to the use of the network device, and the like.
  • the memory 702 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the memory 702 may further include a memory controller to provide the processor 701 with access to the memory 702.
  • the network device also includes a power supply 703 for supplying power to various components.
  • the power supply 703 may be logically connected to the processor 701 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
  • the power supply 703 may also include one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
  • the network device may further include an input unit 704, which can be used to receive input digital or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
  • the network device may also include a display unit, etc., which will not be repeated here.
  • the processor 701 in the network device loads the executable files corresponding to the processes of one or more applications into the memory 702 according to the following instructions, and runs the applications stored in the memory 702, thereby realizing various functions, as follows:
  • detect foreground points from the point cloud of the scene; construct the candidate object area corresponding to each foreground point based on the foreground point and a predetermined size to obtain the initial positioning information of the candidate object area; perform feature extraction on all points in the point cloud based on the point cloud network to obtain the feature set corresponding to the point cloud; construct the region feature information of the candidate object area based on the feature set; predict the type and positioning information of the candidate object area based on the region prediction network and the region feature information to obtain the prediction type and predicted positioning information of the candidate object area; and optimize the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area to obtain the target object detection area and the positioning information of the target object detection area.
  • the network device of this embodiment detects foreground points from the point cloud of the scene; constructs the candidate object area corresponding to each foreground point based on the foreground point and a predetermined size to obtain the initial positioning information of the candidate object area; performs feature extraction on all points in the point cloud based on the point cloud network to obtain the feature set corresponding to the point cloud; constructs the region feature information of the candidate object area based on the feature set; predicts the type and positioning information of the candidate object area based on the region prediction network and the region feature information to obtain the prediction type and predicted positioning information of the candidate object area; and optimizes the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area to obtain the target object detection area and its positioning information.
  • this solution can use the point cloud data of the scene for object detection, generate a candidate object area for each foreground point, and optimize the candidate object areas based on their region features; therefore, it can greatly improve the accuracy of object detection and is especially suitable for 3D object detection.
  • an embodiment of the present application further provides a storage medium in which multiple instructions are stored, and the instructions can be loaded by a processor to execute the steps in any object detection method provided in the embodiments of the present application.
  • the instruction can perform the following steps:
  • detect foreground points from the point cloud of the scene; construct the candidate object area corresponding to each foreground point based on the foreground point and a predetermined size to obtain the initial positioning information of the candidate object area; perform feature extraction on all points in the point cloud based on the point cloud network to obtain the feature set corresponding to the point cloud; construct the region feature information of the candidate object area based on the feature set; predict the type and positioning information of the candidate object area based on the region prediction network and the region feature information to obtain the prediction type and predicted positioning information of the candidate object area; and optimize the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area to obtain the target object detection area and the positioning information of the target object detection area.
  • the storage medium may include: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.
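To make the screening-and-adjustment step referenced in the list above concrete, here is a rough Python sketch. It is an illustration only, not the patent's implementation: the score threshold, the residual layout (dx, dy, dz, dl, dh, dw, dangle), and the function name are all assumptions.

```python
import numpy as np

def refine_proposals(boxes, scores, residuals, score_thresh=0.5):
    """Screen proposals by predicted type confidence, then adjust the
    surviving boxes with the predicted positioning residuals.

    boxes:     (K, 7) initial (x, y, z, l, h, w, angle) proposals.
    scores:    (K,) per-proposal confidence for the predicted type.
    residuals: (K, 7) predicted offsets for the seven box parameters.
    """
    keep = scores >= score_thresh            # screening by prediction type
    refined = boxes[keep] + residuals[keep]  # adjust initial positioning info
    return refined, scores[keep]
```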

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an object detection method and apparatus, and a network device and a storage medium. In the embodiments of the present application, a foreground point can be detected from a point cloud of a scene; a candidate object area corresponding to the foreground point is constructed on the basis of the foreground point and a predetermined size to obtain initial positioning information of the candidate object area; feature extraction is carried out on all points in the point cloud on the basis of a point cloud network to obtain a feature set corresponding to the point cloud; area feature information of the candidate object area is constructed on the basis of the feature set; the type and positioning information of the candidate object area are predicted on the basis of an area prediction network and the area feature information to obtain the predicted type and the predicted positioning information of the candidate object area; and the candidate object area is optimized on the basis of the initial positioning information of the candidate object area and the predicted type and the predicted positioning information of the candidate object area to obtain a target object detection area and positioning information of the target object detection area. The solution can improve the accuracy of object detection.

Description

Object detection method, device, network equipment and storage medium
This application claims priority to Chinese patent application No. 201910267019.5, filed with the Chinese Patent Office on April 3, 2019 and entitled "Object detection method, device, network equipment and storage medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence technology, and specifically to object detection technology.
Background
Object detection refers to determining the location, category, and other attributes of objects in a scene. At present, object detection technology is widely used in various scenarios, such as autonomous driving and drones.
Current object detection schemes generally collect scene images, extract features from the scene images, and then determine the position and category of objects in the scene image based on the extracted features. However, practice has shown that current object detection schemes suffer from problems such as low detection accuracy, especially in 3D object detection scenes.
Summary of the invention
The embodiments of the present application provide an object detection method, device, network device, and storage medium, which can improve the accuracy of object detection.
An embodiment of the present application provides an object detection method, executed by a network device, including:
detecting foreground points from a point cloud of a scene;
constructing a candidate object area corresponding to each foreground point based on the foreground point and a predetermined size, and determining initial positioning information of the candidate object area;
performing feature extraction on all points in the point cloud based on a point cloud network to obtain a feature set corresponding to the point cloud;
constructing region feature information of the candidate object area based on the feature set;
predicting the type and positioning information of the candidate object area based on a region prediction network and the region feature information, to obtain a prediction type and predicted positioning information of the candidate object area;
optimizing the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area, to obtain a target object detection area and positioning information of the target object detection area.
Correspondingly, an embodiment of the present application also provides an object detection device, including:
a detection unit, configured to detect foreground points from a point cloud of a scene;
an area construction unit, configured to construct a candidate object area corresponding to each foreground point based on the foreground point and a predetermined size, to obtain initial positioning information of the candidate object area;
a feature extraction unit, configured to perform feature extraction on all points in the point cloud based on a point cloud network to obtain a feature set corresponding to the point cloud;
a feature construction unit, configured to construct region feature information of the candidate object area based on the feature set;
a prediction unit, configured to predict the type and positioning information of the candidate object area based on a region prediction network and the region feature information, to obtain a prediction type and predicted positioning information of the candidate object area;
an optimization unit, configured to optimize the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area, to obtain a target object detection area and positioning information of the target object detection area.
An embodiment of the present application also provides a network device, including a memory and a processor; the memory stores multiple instructions, and the processor loads the instructions in the memory to execute the steps in any of the object detection methods provided in the embodiments of the present application.
In addition, an embodiment of the present application provides a storage medium storing multiple instructions, the instructions being suitable for loading by a processor to execute the steps in any of the object detection methods provided in the embodiments of the present application.
In addition, an embodiment of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to execute the steps in any of the object detection methods provided in the embodiments of the present application.
The embodiments of the present application can detect foreground points from the point cloud of a scene; construct the candidate object area corresponding to each foreground point based on the foreground point and a predetermined size, and determine the initial positioning information of the candidate object area; perform feature extraction on all points in the point cloud based on a point cloud network to obtain the feature set corresponding to the point cloud; construct the region feature information of the candidate object area based on the feature set; predict the type and positioning information of the candidate object area based on a region prediction network and the region feature information, to obtain the prediction type and predicted positioning information of the candidate object area; and optimize the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area, to obtain the target object detection area and its positioning information. Since this solution can use the point cloud data of the scene for object detection, generate a candidate object area for each foreground point in the point cloud, and optimize the candidate object areas based on their region features, it can greatly improve the accuracy of object detection; the improvement is particularly notable for 3D object detection.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative work.
FIG. 1a is a schematic diagram of a scene of an object detection method provided by an embodiment of the present application;
FIG. 1b is a flowchart of an object detection method provided by an embodiment of the present application;
FIG. 1c is a schematic structural diagram of a point cloud network provided by an embodiment of the present application;
FIG. 1d is a schematic diagram of the PointNet++ network structure provided by an embodiment of the present application;
FIG. 1e is a schematic diagram of an object detection effect in an autonomous driving scene provided by an embodiment of the present application;
FIG. 2a is a schematic diagram of image semantic segmentation provided by an embodiment of the present application;
FIG. 2b is a schematic diagram of point cloud segmentation provided by an embodiment of the present application;
FIG. 2c is a schematic diagram of candidate region generation provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of candidate region feature construction provided by an embodiment of the present application;
FIG. 4a is a schematic structural diagram of a region prediction network provided by an embodiment of the present application;
FIG. 4b is another schematic structural diagram of the region prediction network provided by an embodiment of the present application;
FIG. 5a is another schematic flowchart of object detection provided by an embodiment of the present application;
FIG. 5b is an architecture diagram of object detection provided by an embodiment of the present application;
FIG. 5c is a schematic diagram of test experiment results provided by an embodiment of the present application;
FIG. 6a is a schematic structural diagram of an object detection device provided by an embodiment of the present application;
FIG. 6b is another schematic structural diagram of the object detection device provided by an embodiment of the present application;
FIG. 6c is another schematic structural diagram of the object detection device provided by an embodiment of the present application;
FIG. 6d is another schematic structural diagram of the object detection device provided by an embodiment of the present application;
FIG. 6e is another schematic structural diagram of the object detection device provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a network device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of this application.
The embodiments of the present application provide an object detection method, device, network device, and storage medium. The object detection device may be integrated in a network device, and the network device may be a server, a terminal, or another device; for example, the network device may include a vehicle-mounted device, a micro processing box, and other devices.
Object detection refers to determining or recognizing the location, category, and other attributes of objects in a scene, for example, recognizing the category and location of objects in a road scene, such as street lights, vehicles, and their positions.
Referring to FIG. 1a, an embodiment of the present application provides an object detection system including a network device, a collection device, and the like; the network device and the collection device are communicatively connected, for example, through a wired or wireless network. In an embodiment, the network device and the collection device may be integrated into one device.
The collection device can be used to collect point cloud data or image data of the scene. In an embodiment, the collection device can upload the collected point cloud data to the network device for processing.
The network device can be used for object detection. Specifically, it can detect foreground points from the point cloud of a scene; construct the candidate object area corresponding to each foreground point based on the foreground point and a predetermined size, to obtain the initial positioning information of the candidate object area; perform feature extraction on all points in the point cloud based on a point cloud network to obtain the feature set corresponding to the point cloud; construct the region feature information of the candidate object area based on the feature set; predict the type and positioning information of the candidate object area based on a region prediction network and the region feature information, to obtain the prediction type and predicted positioning information of the candidate object area; and optimize the candidate object area based on the initial positioning information, the prediction type, and the predicted positioning information of the candidate object area, to obtain the target object detection area and its positioning information. In practical applications, after the positioning information of the target object detection area is obtained, the detected object can be marked in the scene image according to the positioning information, for example, framed in the scene image in the form of a detection box. In an embodiment, the type of the detected object can also be marked in the scene image.
Detailed descriptions are given below. It should be noted that the description order of the following embodiments does not limit the preferred order of the embodiments.
This embodiment will be described from the perspective of an object detection device. The object detection device can be integrated in a network device, and the network device can be a server, a terminal, or another device; the terminal can include a mobile phone, a tablet computer, a notebook computer, a personal computer (PC), a micro processing terminal, and other devices.
An embodiment of the present application provides an object detection method, which may be executed by a processor of a network device. As shown in FIG. 1b, the specific flow of the object detection method may be as follows:
101. Detect foreground points from the point cloud of the scene.
A point cloud is a set of points describing the surface characteristics of a scene or target. The points in a point cloud may contain position information such as three-dimensional coordinates, and may also include color information (RGB) or reflection intensity information (Intensity).
A point cloud can be obtained through the principle of laser measurement or photogrammetry; for example, the point cloud of an object can be obtained by scanning with a laser scanner or a photographic scanner. The principle of laser-based point cloud acquisition is as follows: when a laser beam irradiates the surface of an object, the reflected laser carries information such as direction and distance. If the laser beam is scanned along a certain track, the reflected laser point information is recorded while scanning. Because the scanning is extremely fine, a large number of laser points can be obtained, thus forming a laser point cloud. Point cloud formats include *.las, *.pcd, *.txt, and so on.
In the embodiments of the present application, the point cloud data of the scene can be collected by the network device itself, collected by another device and then obtained by the network device from that device, retrieved from a network database, and so on.
There can be many kinds of scenes, for example, road scenes in autonomous driving, aerial scenes in drone flight, and so on.
Foreground points are defined relative to background points: a scene can be divided into a background and a foreground, points in the background are called background points, and points in the foreground are called foreground points. In the embodiments of the present application, the point cloud of the scene can be semantically segmented to identify the foreground points in the scene point cloud.
In the embodiments of the present application, there are many ways to detect foreground points from the point cloud. For example, semantic segmentation can be performed directly on the point cloud of the scene to obtain the foreground points in the point cloud. Semantic segmentation refers to classifying each point in the point cloud of a scene so as to identify the points belonging to a certain type. There are many ways to perform semantic segmentation; for example, 2D semantic segmentation or 3D semantic segmentation can be used to segment the point cloud.
For another example, in order to detect more foreground points and improve the credibility and accuracy of foreground point detection, in one embodiment, semantic segmentation may first be performed on the image of the scene to obtain foreground pixels, and the foreground pixels are then mapped into the point cloud to obtain the foreground points. Specifically, the step of "detecting foreground points from the point cloud of the scene" may include:
performing semantic segmentation on the image of the scene to obtain foreground pixels;
determining the points in the point cloud of the scene that correspond to the foreground pixels as foreground points.
In one embodiment, the foreground pixels can be mapped into the point cloud of the scene to obtain the target points in the point cloud corresponding to the foreground pixels. For example, the mapping can be implemented based on the mapping relationship between the pixels in the image and the points in the point cloud (such as a position mapping relationship), and the target points that have a mapping relationship with the foreground pixels are determined as foreground points.
In another embodiment, the points in the point cloud can be projected into the image of the scene, for example, through a mapping relationship matrix or transformation matrix between the point cloud and the pixels. Then, the segmentation result corresponding to each point in the image (such as foreground pixel or background pixel) is taken as the segmentation result of that point, and whether the point is a foreground point is determined based on this segmentation result, thereby determining each foreground point in the point cloud. Specifically, when the segmentation result of a point is a foreground pixel, the point is determined to be a foreground point.
In order to improve the accuracy of semantic segmentation, the semantic segmentation in the embodiments of the present application can be implemented by a segmentation network based on deep learning. For example, DeepLabV3 based on Xception can be used as the segmentation network, and the image of the scene is segmented through this segmentation network to obtain foreground pixels, such as the foreground pixels of cars, pedestrians, and cyclists in autonomous driving. Then, the points in the point cloud are projected into the image of the scene, and the corresponding segmentation result of each point in the image is taken as the segmentation result of that point, thereby determining the foreground points in the point cloud. This method can accurately detect the foreground points in the point cloud.
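As a rough illustration of the projection-based labeling described above, the following Python sketch maps each 3D point onto the 2D segmentation mask and keeps the points that land on foreground pixels. It is not part of the patent; the 3x4 projection matrix, the binary mask format, and the function name are assumptions made for illustration.

```python
import numpy as np

def foreground_points_from_mask(points_xyz, mask, proj_mat):
    """Label each 3D point as foreground if it projects onto a foreground pixel.

    points_xyz: (N, 3) point coordinates in the sensor frame.
    mask:       (H, W) binary array, 1 where the 2D segmentation network
                predicted foreground.
    proj_mat:   (3, 4) camera projection matrix mapping homogeneous 3D
                points to image coordinates (assumed given by calibration).
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])     # (N, 4)
    uvw = homo @ proj_mat.T                             # (N, 3)
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)  # perspective divide

    h, w = mask.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    # Keep only points that fall inside the image and in front of the camera.
    valid = (uvw[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    fg = np.zeros(n, dtype=bool)
    fg[valid] = mask[v[valid], u[valid]] > 0            # inherit the pixel label
    return points_xyz[fg]
```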
102. Construct the candidate object area corresponding to each foreground point based on the foreground point and a predetermined size, and determine the initial positioning information of the candidate object area.
After obtaining the foreground points, the embodiments of the present application can construct the object area corresponding to each foreground point based on the foreground point and a predetermined size, and take the object area corresponding to the foreground point as a candidate object area.
The candidate object area can be a two-dimensional (2D) area or a three-dimensional (3D) area, which can be determined according to actual requirements. The predetermined size can be set according to actual requirements and can include predetermined size parameters, for example, length l * width w for a 2D area, and length l * width w * height h for a 3D area.
For example, in order to improve the accuracy of object detection, the foreground point can be taken as the center point, and the candidate object area corresponding to the foreground point is generated according to the predetermined size.
The positioning information of the candidate object area can include the position information, size information, and so on of the candidate object area.
For example, in one embodiment, to facilitate subsequent calculations in the object detection process, the position information of the candidate object area can be represented by the position information of a reference point in the candidate object area, and the reference point can be set according to actual requirements; for example, the center point of the candidate object area can be used as the reference point. Taking a three-dimensional area as an example, the position information of the candidate object area can include the 3D coordinates of the center point, such as (x, y, z).
The size information of the candidate object area can include the size parameters of the area. For example, when the candidate object area is a 2D area, its size information can include length l * width w; when the candidate object area is a 3D area, its size information can include length l * width w * height h, and so on.
In addition, in some scenes, the orientation of an object is also important reference information. Therefore, in some embodiments, the positioning information of the candidate object area can also include the orientation of the candidate object area, such as forward, backward, downward, or upward; the orientation of the candidate object area can indicate the orientation of the object in the scene. In practical applications, the orientation of the candidate object area can be expressed as an angle; for example, two orientations can be defined, 0° and 90° respectively.
In practical applications, to facilitate object detection and user observation, the candidate object area can be identified in the form of a detection box, for example, a 2D detection box or a 3D detection box.
For example, taking a driving road scene as an example, referring to FIG. 2a, a 2D segmentation network can be used to semantically segment the image to obtain the image segmentation result (including foreground pixels, etc.); then, referring to FIG. 2b, the image segmentation result is mapped into the point cloud to obtain the point cloud segmentation result (including the foreground points). Next, a candidate object area is generated with each foreground point as the center. A schematic diagram of candidate object area generation is shown in FIG. 2c. With each foreground point as the center, a 3D detection box of a manually specified size is generated as the candidate object area. The candidate object area is represented as (x, y, z, l, h, w, angle), where x, y, z are the 3D coordinates of the center point, and l, h, w are the length, height, and width set for the candidate area. In the actual experiment, l = 3.8, h = 1.6, w = 1.5. angle represents the orientation of the 3D candidate area; when generating the candidate object areas, the embodiment of the present application uses two orientations, 0° and 90° respectively.
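To make the proposal generation step concrete, here is a minimal Python sketch. The fixed size l = 3.8, h = 1.6, w = 1.5 and the two orientations come from the text above, while the function name and data layout are illustrative assumptions.

```python
import numpy as np

# Predetermined box size from the text: length, height, width.
L, H, W = 3.8, 1.6, 1.5
ORIENTATIONS = (0.0, np.pi / 2)  # the two orientations, 0° and 90°

def generate_proposals(foreground_xyz):
    """Build one (x, y, z, l, h, w, angle) proposal per foreground point
    and per predefined orientation, centered on the point."""
    proposals = []
    for x, y, z in foreground_xyz:
        for angle in ORIENTATIONS:
            proposals.append((x, y, z, L, H, W, angle))
    return np.array(proposals, dtype=np.float32)  # shape (2N, 7)
```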
Through the above steps, the embodiment of the present application can generate a candidate object area, such as a 3D candidate object detection box, for each foreground point.
103. Perform feature extraction on all points in the point cloud based on the point cloud network to obtain the feature set corresponding to the point cloud.
The point cloud network can be a network based on deep learning, for example, a point cloud network such as PointNet or PointNet++. In the embodiments of the present application, the order of step 103 and step 102 is not limited by their numbering: step 102 can be executed before step 103, step 103 can be executed before step 102, or they can be executed simultaneously.
Specifically, all points in the point cloud can be input into the point cloud network, and the point cloud network performs feature extraction on the input points to obtain the feature set corresponding to the point cloud.
The following takes PointNet++ as an example to introduce the point cloud network. As shown in FIG. 1c, the point cloud network can include a first sampling network and a second sampling network, where the first sampling network is connected to the second sampling network. In practical applications, the first sampling network can be called the encoder and the second sampling network the decoder. Specifically, feature down-sampling is performed on all points in the point cloud through the first sampling network to obtain the initial features of the point cloud; the initial features are then up-sampled through the second sampling network to obtain the feature set of the point cloud.
Referring to FIG. 1d, the first sampling network includes a plurality of sequentially connected set abstraction (SA) layers, and the second sampling network includes a plurality of sequentially connected feature propagation (FP) layers, each corresponding one-to-one to a set abstraction layer in the first sampling network. The SA layers in the first sampling network correspond to the FP layers in the second sampling network, and their number can be set according to actual requirements; for example, the first sampling network and the second sampling network each include three SA or FP layers.
Referring to FIG. 1d, the first sampling network can include three down-sampling steps (that is, the encoding stage includes three down-sampling steps), with 1024, 256, and 64 points respectively; the second sampling network can include three up-sampling steps (that is, the decoding stage includes three up-sampling steps), with 256, 1024, and N points respectively. The feature extraction process of the point cloud network is as follows:
All points of the point cloud are input into the first sampling network; each set abstraction (SA) layer in the first sampling network in turn divides the points in the point cloud into local regions and extracts the features of the center points of the local regions, obtaining the initial features of the point cloud. For example, referring to FIG. 1d, with a point cloud of size N×4 as input, after three layers of SA down-sampling, the output point cloud features are 64×1024.
In the embodiments of the present application, PointNet++ uses the idea of hierarchical feature extraction, and each round is called a set abstraction. It is divided into three parts: a sampling layer, a grouping layer, and a feature extraction layer. First, the sampling layer: in order to extract some relatively important center points from the dense point cloud, the farthest point sampling (FPS) method is adopted, although random sampling is also possible. Then comes the grouping layer, which searches for the k nearest neighbor points within a certain range of each center point extracted by the previous layer to form a patch. The feature extraction layer performs convolution and pooling on these k points through a small PointNet network, and the obtained feature is used as the feature of this center point and sent to the next layer. In this way, the center points obtained at each layer are a subset of the center points of the previous layer, and as the number of layers deepens, the number of center points becomes smaller and smaller, but each center point contains more and more information.
According to the above description, the first sampling network in the embodiments of the present application is composed of multiple SA layers. At each level, a set of points is processed and abstracted to produce a new set with fewer elements. The set abstraction layer consists of three key layers: a sampling layer, a grouping layer, and a PointNet layer. The sampling layer selects a set of points from the input points, which define the centroids of local regions. The grouping layer constructs local region sets by finding the "neighboring" points around each centroid. The PointNet layer uses a mini-PointNet to encode each local region set into a feature vector.
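As a concrete illustration of the sampling layer just described, the following Python sketch implements farthest point sampling, a standard algorithm; the function name and array layout are illustrative assumptions rather than text from the patent.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Iteratively pick the point farthest from the already-selected set.

    points:    (N, 3) array of coordinates.
    n_samples: number of centroids to select.
    Returns the indices of the selected centroids.
    """
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=int)
    # Distance from every point to the nearest selected centroid so far.
    dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)  # arbitrary starting point
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = int(np.argmax(dist))  # farthest remaining point
    return selected
```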
In one embodiment, considering that real point clouds are rarely uniformly distributed, small-scale sampling should be used in dense regions to capture the finest details, while large-scale sampling should be used in sparse regions, because too small a scale leads to insufficient sampling in sparse areas. Therefore, the embodiments of the present application propose an improved SA layer. Specifically, the grouping layer in the SA layer can use multi-scale grouping (MSG): during grouping, the local features under each radius are extracted and then combined together. The idea is to sample multi-scale features in the grouping layer and concatenate (concat) them. For example, referring to FIG. 1d, MSG grouping is used in the first and second SA layers.
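A minimal sketch of the multi-scale grouping idea follows; it is illustrative only, and the radii, the stand-in feature extractor (a simple max-pool instead of the mini-PointNet), and the function name are assumptions.

```python
import numpy as np

def multi_scale_group(points, feats, centroid, radii=(0.1, 0.2, 0.4), k=16):
    """Gather neighbors of one centroid at several radii and concatenate
    the per-scale features, in the spirit of PointNet++ MSG grouping."""
    per_scale = []
    d = np.linalg.norm(points - centroid, axis=1)
    for r in radii:
        idx = np.where(d < r)[0][:k]             # up to k neighbors within radius r
        if idx.size == 0:
            idx = np.array([int(np.argmin(d))])  # fall back to the nearest point
        # Stand-in for the mini-PointNet: max-pool the neighbor features.
        per_scale.append(feats[idx].max(axis=0))
    return np.concatenate(per_scale)             # one multi-scale feature vector
```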
In addition, in one embodiment, to improve robustness to variations in sampling density, single-scale grouping (SSG) can also be used in an SA layer, for example, in the SA layer serving as the output.
After the first sampling network outputs the initial features of the point cloud, the initial features can be input into the second sampling network, and the second sampling network up-samples the initial features, for example, with residual up-sampling. For example, referring to FIG. 1d, after the three FP layers of the second sampling network up-sample the 64×1024 features, features of size N×128 are output.
In one implementation, to better prevent feature gradients from degrading or information from being lost, the up-sampling in the second sampling network also needs to take into account the features output by each SA layer in the first sampling network. Specifically, the step of "up-sampling the initial features through the second sampling network to obtain the feature set of the point cloud" includes:
determining the output feature of the previous layer and the input feature of the set abstraction layer corresponding to the current feature propagation layer as the current input feature of the current feature propagation layer;
up-sampling the current input feature through the current feature propagation layer to obtain the feature set of the point cloud.
The output feature of the previous layer can come from the SA layer or FP layer immediately preceding the current FP layer. For example, referring to FIG. 1d, the 64×1024 point cloud features are input to the first FP layer; the first FP layer takes the 64×1024 point cloud features and the 256×256 features input to the third SA layer as the current input features, up-samples them, and outputs the resulting features to the second FP layer. The second FP layer takes the 256×128 output features of the previous FP layer and the 1024×128 features input to the second SA layer as the current input features, up-samples them, and obtains 1024×128 features that are input to the third FP layer. The third FP layer takes the 1024×128 features output by the second FP layer and the N×4 features input to the first SA layer as the current input features, performs up-sampling, and outputs the final features of the point cloud.
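The skip-connection pattern described above can be sketched as follows. This is a rough Python/NumPy illustration; the inverse-distance interpolation scheme and the names are assumptions based on the usual PointNet++ feature propagation layer, not a quotation of the patent's implementation.

```python
import numpy as np

def feature_propagation(xyz_dst, xyz_src, feats_src, feats_skip, k=3):
    """Propagate features from a sparse level (src) back to a denser level (dst).

    xyz_dst:    (M, 3) coordinates of the denser point set.
    xyz_src:    (S, 3) coordinates of the sparser point set.
    feats_src:  (S, C) features coming from the previous (deeper) layer.
    feats_skip: (M, D) input features of the corresponding SA layer
                (the skip connection).
    Returns (M, C + D) concatenated features, ready for a shared MLP.
    """
    # Inverse-distance-weighted interpolation over the k nearest source points.
    d = np.linalg.norm(xyz_dst[:, None, :] - xyz_src[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]                    # (M, k)
    w = 1.0 / np.maximum(np.take_along_axis(d, idx, 1), 1e-8)
    w = w / w.sum(axis=1, keepdims=True)
    interp = (feats_src[idx] * w[..., None]).sum(axis=1)  # (M, C)
    return np.concatenate([interp, feats_skip], axis=1)   # skip connection
```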
Through the above steps, feature extraction can be performed on all points in the point cloud to obtain the feature set of the point cloud, which prevents information loss and improves the accuracy of object detection.
104. Construct the region feature information of the candidate object area based on the feature set.
In the embodiments of the present application, there are many ways to construct the feature information of a candidate object area based on the feature set of the point cloud. For example, the features of some points can be selected from the feature set as the feature information of the candidate object area to which they belong; for another example, the position information of some points can be selected from the feature set as the feature information of the candidate object area to which they belong.
For another example, to improve the accuracy of region feature extraction, the features and position information of some points can be combined to construct the region feature information. Specifically, the step of "constructing the region feature information of the candidate object area based on the feature set" can include:
selecting multiple target points in the candidate object area;
extracting the features of the target points from the feature set to obtain the first partial feature information of the candidate object area;
constructing the second partial feature information of the candidate object area based on the position information of the target points;
fusing the first partial feature information and the second partial feature information to obtain the region feature of the candidate object area.
The number of target points and the selection method can be set according to actual requirements. For example, a certain number of points, such as 512 points, can be selected in the candidate object area either randomly or according to a certain selection method (such as selection based on the distance from the center point).
After the target points are selected from the candidate object area, the features of the target points can be extracted from the feature set of the point cloud, and the extracted features of the target points serve as the first partial feature information of the candidate object area (which can be denoted F1). For example, after randomly selecting 512 points, the features of these 512 points can be extracted from the feature set of the point cloud to form the first partial feature information F1.
For example, referring to FIG. 3, the features of the 512 target points within the candidate object area can be cropped from the point cloud feature set (B, N, C) to form F1 (B, M, C), where M is the number of target points, e.g., M = 512, and N is the number of points in the point cloud.
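A minimal illustration of this cropping step is given below. The box-membership test and names are assumptions: an axis-aligned box is used for simplicity, whereas the patent's proposals also carry an orientation angle.

```python
import numpy as np

def crop_region_features(points, feats, center, size, m=512):
    """Select up to m target points inside an axis-aligned box and return
    their features (F1) and coordinates.

    points: (N, 3) point coordinates; feats: (N, C) per-point features.
    center: (3,) box center; size: (3,) box extents (l, w, h).
    """
    half = np.asarray(size) / 2.0
    inside = np.all(np.abs(points - center) <= half, axis=1)
    idx = np.where(inside)[0]
    if idx.size == 0:
        return None  # empty proposal, skipped in practice
    # Randomly pick m target points (with replacement if too few fall inside).
    choice = np.random.choice(idx, size=m, replace=idx.size < m)
    return feats[choice], points[choice]
```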
其中,基于目标点的位置信息构建候选物体区域的第二部分特征的方式可以有多种,比如,可以将目标点的位置信息直接作为候选物体区域的第二部分特征信息(可以用F2表示)。Among them, there are many ways to construct the second part of the feature of the candidate object area based on the location information of the target point. For example, the location information of the target point can be directly used as the second part of the feature information of the candidate object area (which can be represented by F2) .
又比如,为了提升位置特征的提取精确性,还可以在对位置信息做一些变换后构建候选物体区域的第二部分特征。比如,步骤“基于目标点的位置信息构建候选物体区域的第二部分特征信息”,可以包括:For another example, in order to improve the accuracy of location feature extraction, it is also possible to construct the second part of the feature of the candidate object region after some transformation of the location information. For example, the step of "constructing the second part of the feature information of the candidate object region based on the position information of the target point" may include:
(1)、对目标点的位置信息进行标准化处理,得到目标点的标准化位置信息。(1) Standardize the position information of the target point to obtain the standardized position information of the target point.
其中,目标点的位置信息可以包括目标点的坐标信息如3D坐标xyz,位置信息的标准化处理(Normalize)可以根据实际需求设定,比如,可以基于候选物体区域的中心点位置信息对目标点的位置信息进行调整。譬如,将目标点的3D坐标减去候选物体区域中心的3D坐标等。Among them, the position information of the target point may include the coordinate information of the target point, such as 3D coordinates xyz, and the normalize of the position information can be set according to actual needs. For example, the target point can be determined based on the position information of the center point of the candidate object area. Position information is adjusted. For example, subtract the 3D coordinates of the center of the candidate object from the 3D coordinates of the target point.
(2) Fusing the first part of the feature information with the normalized position information to obtain the fused feature information of the target points.
For example, referring to Figure 3, the normalized positions (e.g., 3D coordinates) of the M=512 points can be fused with the first partial features F1; specifically, the two can be fused by concatenation (Concat), yielding the fused features (B, M, C+3).
(3) Applying a spatial transformation to the fused feature information of the target points to obtain their transformed position information.
To further improve the accuracy of the second part of the features, a spatial transformation can be applied to the fused features.
For example, in one embodiment, a spatial transformer network (STN) can be used, such as a supervised variant like T-Net. Referring to Figure 3, the fused features (B, M, C+3) can be spatially transformed by T-Net to obtain the transformed coordinates (B, 3).
(4) Adjusting the normalized position information of the target points based on the transformed position information, to obtain the second part of the feature information of the candidate object region.
For example, the transformed position values can be subtracted from the normalized position values of the target points to obtain the second partial feature F2. Referring to Figure 3, the transformed 3D coordinates (B, 3) can be subtracted from the normalized 3D coordinates of the target points (B, M, 3) to obtain F2.
Because the features are spatially transformed and the transformed position is subtracted from the position features, the geometric stability (spatial invariance) of the position features is improved, which in turn improves the accuracy of feature extraction.
Through the above steps, the first and second parts of the feature information of each candidate object region are obtained; fusing these two parts then yields the region feature information of each candidate object region. For example, referring to Figure 3, F1 and F2 can be concatenated (Concat) to obtain the concatenated features (B, M, C+3) of the candidate object region, which serve as its region features.
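As a sketch of steps (1) through (4) and the final fusion, the following PyTorch code follows the shapes given above; the internal layer widths of the T-Net and the max-pooling readout are illustrative assumptions, since the description above only specifies that a supervised spatial transformer such as T-Net maps the fused (B, M, C+3) features to transformed coordinates of shape (B, 3).

```python
import torch
import torch.nn as nn

class TNetSketch(nn.Module):
    """Illustrative T-Net: predicts one 3D offset per proposal from the fused
    per-point features of shape (B, M, C+3); layer widths are assumptions."""
    def __init__(self, in_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 64), nn.ReLU())
        self.head = nn.Linear(64, 3)

    def forward(self, x):                       # x: (B, M, C+3)
        x = self.mlp(x).max(dim=1).values       # (B, 64), max-pool over the M points
        return self.head(x)                     # (B, 3) transformed coordinates

def build_region_feature(F1, xyz, center, tnet):
    """F1: (B, M, C) cropped features; xyz: (B, M, 3) target-point coordinates;
    center: (B, 3) proposal center. Returns the region feature (B, M, C+3)."""
    norm_xyz = xyz - center.unsqueeze(1)            # step (1): normalize positions
    fused = torch.cat([F1, norm_xyz], dim=-1)       # step (2): Concat -> (B, M, C+3)
    t = tnet(fused)                                 # step (3): spatial transform -> (B, 3)
    F2 = norm_xyz - t.unsqueeze(1)                  # step (4): adjusted positions (B, M, 3)
    return torch.cat([F1, F2], dim=-1)              # final fusion of F1 and F2

tnet = TNetSketch(in_dim=128 + 3)
region_feat = build_region_feature(torch.randn(2, 512, 128),
                                   torch.randn(2, 512, 3),
                                   torch.randn(2, 3), tnet)   # (2, 512, 131)
```

With C = 128, the final region feature has width C + 3 = 131 per point, consistent with the M×131 input to the region prediction network mentioned below.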
105. Predict the type and localization information of the candidate object region based on the region prediction network and the region feature information, obtaining the predicted type and the predicted localization information of the candidate object region.
The region prediction network is used to predict the type and localization information of candidate object regions; for example, it can classify and localize a candidate object region to obtain its predicted type and predicted localization information. The network can be a deep-learning-based region prediction network trained on point clouds or images of sample objects.
The predicted localization information may include predicted position information such as 2D or 3D coordinates and predicted dimensions such as length, width, and height; in one embodiment it may further include predicted orientation information, such as 0° or 90°.
The structure of the region prediction network is described below. Referring to Figure 4a, the region prediction network may include a feature extraction network, a classification network, and a regression network, with the classification network and the regression network each connected to the feature extraction network, as follows:
The feature extraction network performs feature extraction on the input information; for example, it extracts features from the region feature information of a candidate object region to obtain the global feature information of that region.
The classification network classifies regions; for example, a candidate object region can be classified based on its global feature information to obtain its predicted type.
The regression network localizes regions; for example, it localizes a candidate object region to obtain its predicted localization information. Because a regression network is used for this prediction, the output localization information may also be called regression information, e.g., predicted regression information.
For example, the step of "predicting the type and localization information of the candidate object region based on the region prediction network and the region feature information to obtain the predicted type and predicted localization information of the candidate object region" may include:
performing feature extraction on the region feature information through the feature extraction network to obtain the global feature information of the candidate object region;
classifying the candidate object region based on the classification network and the global feature information to obtain the predicted type of the candidate object region;
localizing the candidate object region based on the regression network and the global feature information to obtain the predicted localization information of the candidate object region.
To improve prediction accuracy, referring to Figure 4b, in this embodiment of the present application the feature extraction network may include several sequentially connected set abstraction (SA) layers. The classification network may include several sequentially connected fully connected (fc) layers; as shown in Figure 4b, it includes several fc layers for classification, such as cls-fc1, cls-fc2, and cls-pred. The regression network likewise includes several sequentially connected fully connected layers; as shown in Figure 4b, it includes several fc layers for regression, such as reg-fc1, reg-fc2, and reg-pred. In this embodiment, the numbers of SA and fc layers can be set according to actual requirements.
In this embodiment, extracting the global feature information of a region may include: performing feature extraction on the region feature information sequentially through the set abstraction layers of the feature extraction network to obtain the global feature information of the candidate object region.
For the structure of the set abstraction layer, refer to the description above. In one embodiment, grouping within the SA layers can be done at a single scale, i.e., single-scale grouping (SSG), which improves the accuracy and efficiency of global feature extraction.
Referring to Figure 4b, the region prediction network can extract features from the region feature information through three SA layers in sequence; for example, when the input is an M×131 feature, the three SA layers successively produce features such as 128×128 and 32×256. After the SA-layer feature extraction, the global feature information is obtained and can then be fed into the classification network and the regression network respectively.
The classification network reduces the dimensionality of the global feature information through the first two layers, cls-fc1 and cls-fc2, performs classification through the final cls-pred layer, and outputs the predicted type of the candidate object region.
The regression network reduces the dimensionality of the global feature information through the first two layers, reg-fc1 and reg-fc2, and performs regression through the final reg-pred layer to obtain the predicted localization information of the candidate object region.
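A minimal sketch of these two prediction heads is given below, mirroring the cls-fc1/cls-fc2/cls-pred and reg-fc1/reg-fc2/reg-pred structure; the layer widths, the number of classes, and the 7-dimensional regression output (x, y, z, l, h, w, angle) are illustrative assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class PredictionHeadsSketch(nn.Module):
    """cls-fc1/cls-fc2/cls-pred and reg-fc1/reg-fc2/reg-pred branches over the
    global feature vector; widths and output sizes are illustrative."""
    def __init__(self, in_dim=512, num_classes=2, reg_dim=7):
        super().__init__()
        self.cls = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),   # cls-fc1
                                 nn.Linear(256, 128), nn.ReLU(),      # cls-fc2
                                 nn.Linear(128, num_classes))         # cls-pred
        self.reg = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),   # reg-fc1
                                 nn.Linear(256, 128), nn.ReLU(),      # reg-fc2
                                 nn.Linear(128, reg_dim))             # reg-pred

    def forward(self, global_feat):           # (B, in_dim) from the SA layers
        return self.cls(global_feat), self.reg(global_feat)

heads = PredictionHeadsSketch()
cls_logits, reg_params = heads(torch.randn(4, 512))   # (4, 2) and (4, 7)
```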
The types of candidate object regions can be defined according to actual requirements; for example, regions can be divided by whether they contain an object (object region vs. empty region), or graded by quality as high, medium, or low.
Through the above steps, the type and localization information of each candidate object region can be predicted.
106. Optimize the candidate object region based on the initial localization information, the predicted type, and the predicted localization information, obtaining the target object detection region and the localization information of the target object detection region.
Various optimization strategies are possible. For example, the localization information of the candidate object regions can first be adjusted based on the predicted localization information, and the regions then filtered based on the predicted types. Alternatively, in one embodiment, the candidate object regions can first be filtered based on the predicted types, and the localization information adjusted afterwards.
For example, the step of "optimizing the candidate object region based on the initial localization information, the predicted type, and the predicted localization information to obtain the target object detection region and its localization information" may include:
filtering the candidate object regions based on their predicted types to obtain the filtered object regions;
optimizing and adjusting the initial localization information of the filtered object regions according to their predicted localization information, to obtain the target object detection region and the localization information of the target object detection region.
For example, when the predicted types distinguish object regions from empty regions, the candidate object regions predicted to be empty can be filtered out; the initial localization information of the candidate regions remaining after filtering is then optimized based on their predicted localization information.
Specifically, the localization information can be adjusted, for example, based on the difference between the predicted and the initial localization information, such as differences in the region's 3D coordinates or dimensions.
As another example, an optimal localization can be determined from the predicted and the initial localization information, and the localization of the candidate object region then set to this optimal value, e.g., optimal region 3D coordinates and length, width, and height. A sketch of the first, difference-based variant is given below.
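For illustration, the following sketch implements the filter-then-adjust variant: regions predicted empty are discarded, and the remaining boxes are adjusted by the predicted residuals. Treating the regression output as additive residuals and the 0.5 score threshold are assumptions of the sketch, not details fixed by the description above.

```python
import torch

def refine_proposals(init_boxes, cls_logits, reg_deltas, score_thresh=0.5):
    """init_boxes: (K, 7) initial (x, y, z, l, h, w, angle) per candidate region;
    cls_logits: (K, 2) scores for (empty, object); reg_deltas: (K, 7) predicted
    localization residuals. Residual-style adjustment is one of the options
    described above, not the only possible choice."""
    probs = torch.softmax(cls_logits, dim=-1)
    keep = probs[:, 1] > score_thresh               # filter out regions predicted empty
    refined = init_boxes[keep] + reg_deltas[keep]   # adjust by the predicted difference
    return refined, probs[keep, 1]

boxes, scores = refine_proposals(torch.randn(100, 7),
                                 torch.randn(100, 2),
                                 0.1 * torch.randn(100, 7))
```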
In practical applications, the object detection region can further be marked in the scene image based on the localization information of the target object detection region. For example, referring to Figure 1e, the object detection method provided in this embodiment of the present application can accurately detect the position, size, and orientation of objects on the current road in an autonomous driving scene, which benefits the decision-making and judgment of autonomous driving.
The object detection provided in the embodiments of the present application is applicable to various scenarios, such as autonomous driving, drones, and security surveillance.
As can be seen from the above, this embodiment of the present application can detect foreground points from the point cloud of a scene; construct the object region corresponding to each foreground point based on the foreground point and a predetermined size, obtaining the initial localization information of the candidate object regions; perform feature extraction on all points in the point cloud based on a point cloud network, obtaining the feature set corresponding to the point cloud; construct the region feature information of the candidate object regions based on the feature set; predict the type and localization information of the candidate object regions based on the region prediction network and the region feature information, obtaining their predicted types and predicted localization information; and optimize the candidate object regions based on their initial localization information, predicted types, and predicted localization information, obtaining the target object detection region and its localization information. Because this scheme performs object detection on the point cloud data of the scene, it can improve the accuracy of object detection.
Moreover, this scheme generates a candidate object region for every foreground point in the point cloud, which avoids information loss. Since a candidate region is generated for each foreground point, every object gives rise to a corresponding candidate region, so detection is not affected by changes in object scale or by severe occlusion, which improves the effectiveness and success rate of object detection.
In addition, this scheme optimizes the candidate object regions based on their region features, which can further improve the precision and quality of object detection.
The method described in the above embodiments is illustrated below in further detail with an example.
In this embodiment, the description takes as an example an object detection apparatus integrated in a network device.
(1) The semantic segmentation network, the point cloud network, and the region prediction network are trained separately, as follows:
1. Training of the semantic segmentation network.
First, the network device can obtain a training set for the semantic segmentation network; the training set includes sample images annotated with pixel types (e.g., foreground pixels, background pixels).
The network device can train the semantic segmentation network based on this training set and a loss function. Specifically, the sample images can be semantically segmented by the network to obtain their foreground pixels; the predicted pixel types are then made to converge to the annotated pixel types under the loss function, yielding the trained semantic segmentation network.
2. Training of the point cloud network.
The network device obtains a training set for the point cloud network; the training set includes sample point clouds of sample objects or scenes. The network device can train the point cloud network on this training set.
3. Training of the region prediction network.
The network device obtains a training set for the region prediction network, which may include sample point clouds annotated with object region types and localization information. The region prediction network is trained on this set: specifically, it predicts the object region types and localization information of the sample point clouds, the predicted types are made to converge to the ground-truth types, and the predicted localization information is made to converge to the ground-truth localization information, yielding the trained region prediction network.
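As a hedged sketch of the convergence objective described above, the following loss combines a classification term over the annotated region types with a regression term over the annotated localization; the specific loss functions (cross-entropy and smooth-L1) are common choices assumed here, not quoted from the patent.

```python
import torch
import torch.nn.functional as F

def region_prediction_loss(cls_logits, reg_params, gt_types, gt_boxes):
    """cls_logits: (K, num_types) predicted type scores; reg_params, gt_boxes:
    (K, 7) predicted and annotated localization; gt_types: (K,) annotated types,
    with 0 meaning an empty region in this sketch."""
    cls_loss = F.cross_entropy(cls_logits, gt_types)
    pos = gt_types > 0                         # regress only non-empty regions
    if pos.any():
        reg_loss = F.smooth_l1_loss(reg_params[pos], gt_boxes[pos])
    else:
        reg_loss = reg_params.sum() * 0.0      # keep the graph valid without positives
    return cls_loss + reg_loss

loss = region_prediction_loss(torch.randn(8, 2), torch.randn(8, 7),
                              torch.randint(0, 2, (8,)), torch.randn(8, 7))
```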
The above training can be performed by the network device itself, or the network device can obtain the networks after they have been trained by other devices. It should be understood that the networks applied in the embodiments of the present application are not limited to the training described above and can also be trained in other ways.
(2) With the trained semantic segmentation network, point cloud network, and region prediction network, object detection can be performed based on the point cloud; see Figures 5a and 5b for details.
As shown in Figure 5a, the specific flow of an object detection method can be as follows:
501. The network device acquires an image and a point cloud of the scene.
For example, the network device can obtain the image and the point cloud of the scene from an image acquisition device and a point cloud acquisition device respectively.
502. The network device performs semantic segmentation on the image of the scene with the semantic segmentation network to obtain foreground pixels.
Referring to Figure 5b, taking an autonomous driving scene as an example, a road scene image can be collected first, and a 2D semantic segmentation network can be used to segment it, producing a segmentation result that includes foreground pixels, background pixels, and so on.
503. The network device maps the foreground pixels into the point cloud of the scene to obtain the foreground points in the point cloud.
For example, DeepLabV3 with an Xception backbone can be used as the segmentation network; the scene image is segmented through this network to obtain foreground pixels, e.g., the pixels of cars, pedestrians, and cyclists in autonomous driving. Each point in the point cloud is then projected into the scene image, and the segmentation result of the corresponding pixel is taken as the segmentation result of that point, thereby producing the foreground points of the point cloud. This approach can accurately detect the foreground points in the point cloud.
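For illustration, this projection step can be sketched as follows in NumPy; the KITTI-style 3×4 camera projection matrix and the handling of out-of-image points are assumptions of the sketch.

```python
import numpy as np

def label_points_from_image(points_xyz, seg_mask, proj_matrix):
    """points_xyz: (N, 3) LiDAR points; seg_mask: (H, W) per-pixel class ids from
    the 2D segmentation network; proj_matrix: (3, 4) camera projection matrix,
    assumed to come from the sensor calibration (e.g., KITTI-style).
    Returns (N,) per-point class ids; points outside the image get -1."""
    N = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((N, 1))])   # homogeneous coordinates (N, 4)
    uvw = homo @ proj_matrix.T                        # project to the image plane (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                     # perspective divide
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    H, W = seg_mask.shape
    valid = (uvw[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels = np.full(N, -1, dtype=np.int64)
    labels[valid] = seg_mask[v[valid], u[valid]]      # copy the pixel's segmentation
    return labels   # points labeled with a foreground class become foreground points
```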
504. The network device constructs a three-dimensional candidate object region for each foreground point based on the point and a predetermined size, obtaining the initial localization information of the candidate object regions.
For example, each foreground point is taken as a center point, and the three-dimensional candidate object region corresponding to that point is generated according to the predetermined size.
The localization information of a candidate object region may include its position information, size information, and so on.
For example, referring to Figure 5b, after the foreground points are obtained, the candidate object region corresponding to each foreground point can be generated by taking the point as the center and applying the predetermined size, i.e., point-based proposal generation.
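A minimal sketch of point-based proposal generation follows; the per-class preset sizes and the zero initial orientation are hypothetical values for illustration, since the predetermined size is a design choice of the deployment rather than something fixed by the patent.

```python
import numpy as np

# Hypothetical per-class preset sizes (l, h, w) in meters, loosely KITTI-like;
# the actual predetermined sizes are a deployment design choice.
PRESET_SIZES = {"car": (3.9, 1.6, 1.6), "pedestrian": (0.8, 1.7, 0.6)}

def point_based_proposals(foreground_xyz, cls="car"):
    """foreground_xyz: (P, 3) coordinates of the foreground points. Returns
    (P, 7) proposals (x, y, z, l, h, w, angle): one box per foreground point,
    centered on the point, with the preset size and a fixed initial angle."""
    l, h, w = PRESET_SIZES[cls]
    P = foreground_xyz.shape[0]
    size_and_angle = np.tile([l, h, w, 0.0], (P, 1))    # angle initialized to 0
    return np.hstack([foreground_xyz, size_and_angle])  # initial localization info

proposals = point_based_proposals(np.random.rand(5, 3) * 20.0)   # (5, 7)
```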
For details of the candidate object regions, refer to Figures 2a and 2b and the related description above.
505. The network device performs feature extraction on all points in the point cloud through the point cloud network, obtaining the feature set corresponding to the point cloud.
Referring to Figure 5b, all points of the point cloud (B, N, 4) can be fed into PointNet++, which extracts the features of the point cloud and produces (B, N, C).
For the specific point cloud network structure and feature extraction process, refer to the description of the foregoing embodiments.
506. The network device constructs the region feature information of the candidate object regions based on the feature set.
Referring to Figure 5b, after obtaining the initial localization information of the candidate object regions and the feature set of the point cloud, the network device can generate the region feature information of the candidate object regions from the feature set of the point cloud (i.e., proposal feature generation).
For example, the network device selects multiple target points in a candidate object region; extracts the features of the target points from the feature set to obtain the first part of the feature information of the candidate object region; normalizes the position information of the target points to obtain their normalized position information; fuses the first part of the feature information with the normalized position information to obtain the fused feature information of the target points; applies a spatial transformation to the fused feature information to obtain the transformed position information of the target points; adjusts the normalized position information of the target points based on the transformed position information to obtain the second part of the feature information of the candidate object region; and fuses the first part of the feature information with the second part to obtain the region features of the candidate region.
Specifically, for the region feature generation, refer to the foregoing embodiments and the description of Figure 3.
507. The network device predicts the type and localization information of the candidate object regions based on the region prediction network and the region feature information, obtaining the predicted types and predicted localization information of the candidate object regions.
For example, referring to Figure 5b, the candidate regions can be classified (cls) and regressed (reg) by a box prediction network (Box Prediction Net), thereby predicting the type and regression parameters of each candidate object region. The regression parameters are the predicted localization information, including parameters such as 3D coordinates, length, width, height, and orientation, e.g., (x, y, z, l, h, w, angle).
508. The network device optimizes the candidate object regions based on their initial localization information, predicted types, and predicted localization information, obtaining the target object detection region and the localization information of the target object detection region.
For example, the network device can filter the candidate object regions based on their predicted types to obtain the filtered object regions, and then optimize and adjust the initial localization information of the filtered regions according to their predicted localization information, obtaining the optimized object detection regions and their localization information.
In practical applications, the object detection region can further be marked in the scene image based on the localization information of the target object detection region. For example, referring to Figure 1e, the object detection method provided in this embodiment of the present application can accurately detect the position, size, and orientation of objects on the current road in an autonomous driving scene, which benefits the decision-making and judgment of autonomous driving.
This embodiment of the present application can take the entire point cloud as input, use a PointNet++ structure to produce a feature for every point in the point cloud, generate a candidate region anchored at each point, and then refine the candidate regions using the per-point features as input, thereby producing the final detection results.
The capability of the algorithm provided in the embodiments of the present application has been tested on several datasets, for example on the open-source autonomous driving dataset KITTI. KITTI is an autonomous driving dataset containing objects of many sizes and distances and is very challenging. On KITTI, the algorithm of this embodiment surpasses all existing 3D object detection algorithms, reaching a new state of the art, and on the hard subset it far exceeds the previous best algorithm.
On the KITTI dataset, the point clouds of 7481 training images and 7518 test images were evaluated for three categories (cars, pedestrians, and cyclists). The most widely used metric, average precision (AP), was used to compare against other methods, including MV3D (Multi-View 3D object detection), AVOD (Aggregate View Object Detection), VoxelNet, F-PointNet (Frustum PointNet), and AVOD-FPN (AVOD with a feature pyramid network). Figure 5c shows the test results: the accuracy of the object detection method provided in this embodiment of the present application ("Ours" in Figure 5c) is significantly higher than that of the other methods.
To better implement the above method, an embodiment of the present application accordingly provides an object detection apparatus. The apparatus can be integrated in a network device, which may be a server, a terminal, a vehicle-mounted device, a drone, or a device such as a micro processing box.
For example, as shown in Figure 6a, the object detection apparatus may include a detection unit 601, a region construction unit 602, a feature extraction unit 603, a feature construction unit 604, a prediction unit 605, and an optimization unit 606, as follows:
the detection unit 601, configured to detect foreground points from the point cloud of a scene;
the region construction unit 602, configured to construct the candidate object region corresponding to each foreground point based on the foreground point and a predetermined size, and to determine the initial localization information of the candidate object region;
the feature extraction unit 603, configured to perform feature extraction on all points in the point cloud based on a point cloud network, obtaining the feature set corresponding to the point cloud;
the feature construction unit 604, configured to construct the region feature information of the candidate object region based on the feature set;
the prediction unit 605, configured to predict the type and localization information of the candidate object region based on the region prediction network and the region feature information, obtaining the predicted type and predicted localization information of the candidate object region;
the optimization unit 606, configured to optimize the candidate object region based on the initial localization information of the candidate object region, its predicted type, and its predicted localization information, obtaining the target object detection region and the localization information of the target object detection region.
In one embodiment, the detection unit 601 is specifically configured to:
perform semantic segmentation on the image of the scene to obtain foreground pixels;
determine the points in the point cloud of the scene that correspond to the foreground pixels as the foreground points.
In one embodiment, the region construction unit 602 is specifically configured to:
generate, with the foreground point as a center point, the candidate object region corresponding to the foreground point according to the predetermined size.
In one embodiment, referring to Figure 6b, the feature construction unit 604 specifically includes:
a selection subunit 6041, configured to select multiple target points in the candidate object region;
an extraction subunit 6042, configured to extract the features of the target points from the feature set, obtaining the first part of the feature information of the candidate object region;
a construction subunit 6043, configured to construct the second part of the feature information of the candidate object region based on the position information of the target points;
a fusion subunit 6045, configured to fuse the first part of the feature information with the second part of the feature information, obtaining the region feature information of the candidate object region.
In one embodiment, the construction subunit 6043 is specifically configured to:
normalize the position information of the target points to obtain the normalized position information of the target points;
fuse the first part of the feature information with the normalized position information to obtain the fused feature information of the target points;
perform a spatial transformation on the fused feature information of the target points to obtain the transformed position information;
adjust the normalized position information of the target points based on the transformed position information to obtain the second part of the feature information of the candidate object region.
In one embodiment, referring to Figure 6c, the point cloud network includes a first sampling network and a second sampling network connected to the first sampling network; the feature extraction unit 603 specifically includes:
a down-sampling subunit 6031, configured to perform feature down-sampling on all points in the point cloud through the first sampling network, obtaining the initial features of the point cloud;
an up-sampling subunit 6032, configured to up-sample the initial features through the second sampling network, obtaining the feature set of the point cloud.
In one embodiment, the first sampling network includes several sequentially connected set abstraction layers, and the second sampling network includes several sequentially connected feature propagation layers in one-to-one correspondence with the set abstraction layers of the first sampling network.
The down-sampling subunit 6031 is specifically configured to:
divide the points of the point cloud into local regions sequentially through the set abstraction layers and extract the features of the local-region center points, obtaining the initial features of the point cloud;
input the initial features of the point cloud to the second sampling network.
The up-sampling subunit 6032 is specifically configured to:
determine the output features of the previous layer, together with the output features of the set abstraction layer corresponding to the current feature propagation layer, as the current input features of the current feature propagation layer;
up-sample the current input features through the current feature propagation layer, obtaining the feature set of the point cloud.
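For illustration of this data flow only, the following sketch wires two stand-in set abstraction (SA) layers to two stand-in feature propagation (FP) layers with one-to-one skip connections. The SA and FP bodies here are deliberate simplifications (strided subsampling, repetition-based upsampling, single linear layers); a real implementation such as PointNet++ uses farthest point sampling, neighborhood grouping, and distance-weighted interpolation instead.

```python
import torch
import torch.nn as nn

class SASketch(nn.Module):
    """Stand-in for a set abstraction layer: keeps every stride-th point as a
    local-region center and encodes its feature with a single linear layer."""
    def __init__(self, in_c, out_c, stride=4):
        super().__init__()
        self.stride, self.mlp = stride, nn.Linear(in_c, out_c)

    def forward(self, xyz, feat):
        xyz, feat = xyz[:, ::self.stride], feat[:, ::self.stride]
        return xyz, torch.relu(self.mlp(feat))

class FPSketch(nn.Module):
    """Stand-in for a feature propagation layer: upsamples by repetition and
    concatenates the skip feature from the matching set abstraction layer."""
    def __init__(self, in_c, skip_c, out_c):
        super().__init__()
        self.mlp = nn.Linear(in_c + skip_c, out_c)

    def forward(self, feat, skip_feat):
        factor = skip_feat.shape[1] // feat.shape[1]
        up = feat.repeat_interleave(factor, dim=1)        # back to the denser level
        return torch.relu(self.mlp(torch.cat([up, skip_feat], dim=-1)))

# Encoder (first sampling network) -> decoder (second sampling network), with
# one-to-one skip links between SA layers and FP layers.
xyz, f0 = torch.randn(2, 1024, 3), torch.randn(2, 1024, 4)
sa1, sa2 = SASketch(4, 64), SASketch(64, 128)
fp2, fp1 = FPSketch(128, 64, 64), FPSketch(64, 4, 32)
xyz1, f1 = sa1(xyz, f0)           # down-sample: 1024 -> 256 points
xyz2, f2 = sa2(xyz1, f1)          # down-sample: 256 -> 64 points
g1 = fp2(f2, f1)                  # up-sample to 256 points, fused with the SA1 output
feature_set = fp1(g1, f0)         # up-sample to 1024 points: (B, N, 32) feature set
```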
In one embodiment, the region prediction network includes a feature extraction network, a classification network connected to the feature extraction network, and a regression network connected to the feature extraction network; referring to Figure 6d, the prediction unit 605 specifically includes:
a global feature extraction subunit 6051, configured to perform feature extraction on the region feature information through the feature extraction network, obtaining the global feature information of the candidate object region;
a classification subunit 6052, configured to classify the candidate object region based on the classification network and the global feature information, obtaining the predicted type of the candidate region;
a regression subunit 6053, configured to localize the candidate object region based on the regression network and the global feature information, obtaining the predicted localization information of the candidate object region.
In one embodiment, the feature extraction network includes several sequentially connected set abstraction layers, the classification network includes several sequentially connected fully connected layers, and the regression network includes several sequentially connected fully connected layers;
the global feature extraction subunit 6051 is specifically configured to perform feature extraction on the region feature information sequentially through the set abstraction layers of the feature extraction network, obtaining the global feature information of the candidate object region.
In one embodiment, referring to Figure 6e, the optimization unit 606 specifically includes:
a filtering subunit 6061, configured to filter the candidate object regions based on their predicted types, obtaining the filtered object regions;
an optimization subunit 6062, configured to optimize and adjust the initial localization information of the filtered object regions according to their predicted localization information, obtaining the target object detection region and the localization information of the target object detection region.
In specific implementations, the above units can be implemented as independent entities, or combined arbitrarily and implemented as one or several entities; for the specific implementation of each unit, refer to the foregoing method embodiments, which are not repeated here.
As can be seen from the above, the object detection apparatus of this embodiment can detect foreground points from the point cloud of a scene through the detection unit 601; the region construction unit 602 then constructs the candidate object region corresponding to each foreground point based on the foreground point and a predetermined size, obtaining the initial localization information of the candidate object regions; the feature extraction unit 603 performs feature extraction on all points in the point cloud based on a point cloud network, obtaining the feature set corresponding to the point cloud; the feature construction unit 604 constructs the region feature information of the candidate object regions based on the feature set; the prediction unit 605 predicts the type and localization information of the candidate object regions based on the region prediction network and the region feature information, obtaining their predicted types and predicted localization information; and the optimization unit 606 optimizes the candidate object regions based on their initial localization information, predicted types, and predicted localization information, obtaining the target object detection region and its localization information. Because this scheme performs object detection on the point cloud data of the scene, generates a candidate object region for every foreground point, and optimizes the candidate object regions based on their region features, it can greatly improve the precision of object detection and is especially suitable for 3D object detection.
In addition, an embodiment of the present application further provides a network device. Figure 7 shows a schematic structural diagram of the network device involved in the embodiments of the present application. Specifically:
The network device may include a processor 701 with one or more processing cores, a memory 702 with one or more computer-readable storage media, a power supply 703, an input unit 704, and other components. Those skilled in the art will understand that the network device structure shown in Figure 7 does not limit the network device; the device may include more or fewer components than shown, combine certain components, or arrange the components differently. Specifically:
The processor 701 is the control center of the network device. It connects the parts of the entire network device through various interfaces and lines, and performs the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 702 and invoking the data stored in the memory 702, thereby monitoring the network device as a whole. Optionally, the processor 701 may include one or more processing cores. Preferably, the processor 701 may integrate an application processor, which mainly handles the operating system, the user interface, application programs, and so on, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 701.
The memory 702 can be used to store software programs and modules; the processor 701 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area: the program storage area can store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area can store data created through the use of the network device, and so on. In addition, the memory 702 may include a high-speed random access memory and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Correspondingly, the memory 702 may further include a memory controller to provide the processor 701 with access to the memory 702.
The network device further includes a power supply 703 that supplies power to the components. Preferably, the power supply 703 can be logically connected to the processor 701 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 703 may further include any components such as one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The network device may further include an input unit 704, which can be used to receive input digital or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 701 loads the executable files corresponding to the processes of one or more application programs into the memory 702 according to the following instructions, and runs the application programs stored in the memory 702, thereby implementing various functions as follows:
detecting foreground points from the point cloud of a scene; constructing the candidate object region corresponding to each foreground point based on the foreground point and a predetermined size, obtaining the initial localization information of the candidate object regions; performing feature extraction on all points in the point cloud based on a point cloud network, obtaining the feature set corresponding to the point cloud; constructing the region feature information of the candidate object regions based on the feature set; predicting the type and localization information of the candidate object regions based on the region prediction network and the region feature information, obtaining their predicted types and predicted localization information; and optimizing the candidate object regions based on the initial localization information of the candidate regions, their predicted types, and their predicted localization information, obtaining the target object detection region and the localization information of the target object detection region.
For the specific implementation of the above operations, refer to the foregoing embodiments, which are not repeated here.
As can be seen from the above, the network device of this embodiment detects foreground points from the point cloud of a scene; constructs the candidate object region corresponding to each foreground point based on the foreground point and a predetermined size, obtaining the initial localization information of the candidate object regions; performs feature extraction on all points in the point cloud based on a point cloud network, obtaining the feature set corresponding to the point cloud; constructs the region feature information of the candidate object regions based on the feature set; predicts the type and localization information of the candidate object regions based on the region prediction network and the region feature information, obtaining their predicted types and predicted localization information; and optimizes the candidate object regions based on their initial localization information, predicted types, and predicted localization information, obtaining the target object detection region and its localization information. Because this scheme performs object detection on the point cloud data of the scene, generates a candidate object region for every foreground point, and optimizes the candidate object regions based on their region features, it can greatly improve the precision of object detection and is especially suitable for 3D object detection.
A person of ordinary skill in the art will understand that all or part of the steps of the various methods of the foregoing embodiments can be completed by instructions, or by instructions controlling the relevant hardware; the instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application further provides a storage medium storing multiple instructions that can be loaded by a processor to execute the steps of any object detection method provided in the embodiments of the present application. For example, the instructions can perform the following steps:
detecting foreground points from the point cloud of a scene; constructing the candidate object region corresponding to each foreground point based on the foreground point and a predetermined size, obtaining the initial localization information of the candidate object regions; performing feature extraction on all points in the point cloud based on a point cloud network, obtaining the feature set corresponding to the point cloud; constructing the region feature information of the candidate object regions based on the feature set; predicting the type and localization information of the candidate object regions based on the region prediction network and the region feature information, obtaining their predicted types and predicted localization information; and optimizing the candidate object regions based on their initial localization information, predicted types, and predicted localization information, obtaining the target object detection region and the localization information of the target object detection region.
For the specific implementation of the above operations, refer to the foregoing embodiments, which are not repeated here.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can execute the steps of any object detection method provided in the embodiments of the present application, they can achieve the beneficial effects achievable by any such method; see the foregoing embodiments for details, which are not repeated here.
The object detection method, apparatus, network device, and storage medium provided in the embodiments of the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core ideas. Meanwhile, a person skilled in the art may, based on the ideas of the present application, make changes to the specific implementations and the scope of application. In conclusion, the content of this specification should not be construed as a limitation on the present application.

Claims (23)

1. An object detection method, executed by a network device, the method comprising:
    detecting foreground points from a point cloud of a scene;
    constructing a candidate object region corresponding to each foreground point based on the foreground point and a predetermined size, and determining initial localization information of the candidate object region;
    performing feature extraction on all points in the point cloud based on a point cloud network to obtain a feature set corresponding to the point cloud;
    constructing region feature information of the candidate object region based on the feature set;
    predicting a type and localization information of the candidate object region based on a region prediction network and the region feature information to obtain a predicted type and predicted localization information of the candidate object region;
    optimizing the candidate object region based on the initial localization information, the predicted type, and the predicted localization information to obtain a target object detection region and localization information of the target object detection region.
  2. 如权利要求1所述的物体检测方法,所述从场景的点云中检测出前景点,包括:The object detection method according to claim 1, wherein the detecting the front scenic spot from the point cloud of the scene includes:
    对所述场景的图像进行语义分割,得到前景像素;Perform semantic segmentation on the image of the scene to obtain foreground pixels;
    将所述场景的点云中与所述前景像素对应的点确定为所述前景点。The point corresponding to the foreground pixel in the point cloud of the scene is determined as the front scenic spot.
  3. The object detection method according to claim 1, wherein the constructing a candidate object area corresponding to the foreground point based on the foreground point and a predetermined size comprises:
    generating, with the foreground point as a center point, the candidate object area corresponding to the foreground point according to the predetermined size.
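Concretely, this box construction attaches one fixed-size, axis-aligned 3D box to every foreground point, so each foreground point proposes exactly one candidate region. A minimal helper; the (w, l, h) value is an assumed placeholder prior.

    import numpy as np

    def candidate_box(fg_point, size=(1.6, 3.9, 1.5)):   # size: assumed (w, l, h) prior
        # Returns the initial positioning information [x, y, z, w, l, h] of the
        # candidate object area centered on the foreground point.
        return np.concatenate([np.asarray(fg_point, float), np.asarray(size, float)])

    print(candidate_box([1.0, 2.0, 0.5]))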
  4. The object detection method according to claim 1, wherein the constructing area feature information of the candidate object area based on the feature set comprises:
    selecting a plurality of target points in the candidate object area;
    extracting features of the target points from the feature set to obtain first partial feature information of the candidate object area;
    constructing second partial feature information of the candidate object area based on position information of the target points;
    fusing the first partial feature information with the second partial feature information to obtain the area feature information of the candidate object area.
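In other words, each region descriptor in claim 4 has two halves: per-point features looked up in the point cloud's feature set for a fixed number of sampled target points, and features derived from those points' positions. The sketch below assumes concatenation as the fusion operator and region-relative coordinates as the raw position feature; the claim fixes neither, and claim 5 elaborates the position half further.

    import numpy as np

    def region_feature(center, size, points, feature_set, k=64, seed=0):
        # points: (N, 3); feature_set: (N, C) per-point features from the point
        # cloud network. Returns a (k, C + 3) fused region feature.
        rng = np.random.default_rng(seed)
        inside = np.flatnonzero(
            np.all(np.abs(points - center) <= np.asarray(size) / 2, axis=1))
        if len(inside) == 0:
            return np.zeros((k, feature_set.shape[1] + 3))
        target = rng.choice(inside, size=k, replace=len(inside) < k)  # target points
        part1 = feature_set[target]               # first partial feature information
        part2 = points[target] - center           # second part: relative positions
        return np.concatenate([part1, part2], axis=1)   # fusion by concatenation

    pts = np.random.default_rng(1).uniform(-5, 5, (500, 3))
    feats = np.ones((500, 8))
    print(region_feature(np.zeros(3), (2.0, 2.0, 2.0), pts, feats).shape)  # (64, 11)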
  5. The object detection method according to claim 4, wherein the constructing second partial feature information of the candidate object area based on the position information of the target points comprises:
    normalizing the position information of the target points to obtain normalized position information of the target points;
    fusing the first partial feature information with the normalized position information to obtain fused feature information of the target points;
    performing a spatial transformation on the fused feature information of the target points to obtain transformed position information;
    adjusting the normalized position information of the target points based on the transformed position information, to obtain the second partial feature information of the candidate object area.
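This normalize-fuse-transform-adjust sequence is reminiscent of the T-Net-style alignment used in point networks: positions are canonicalized, combined with the point features, a transform is regressed from the pooled result, and the positions are adjusted by it. A toy PyTorch sketch; the MLP widths, the pooling choice, and the 3x3 transform parameterization are illustrative assumptions.

    import torch
    import torch.nn as nn

    class PositionRefiner(nn.Module):
        # Toy analogue of claim 5: normalize positions, fuse with point features,
        # regress a spatial transform, and adjust the positions with it.
        def __init__(self, feat_dim):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim + 3, 64), nn.ReLU(),
                nn.Linear(64, 9))                        # predicts a 3x3 transform

        def forward(self, positions, part1_feats):
            # positions: (B, k, 3) target-point coords; part1_feats: (B, k, C)
            norm_pos = positions - positions.mean(dim=1, keepdim=True)
            fused = torch.cat([part1_feats, norm_pos], dim=-1)
            t = self.mlp(fused.mean(dim=1)).view(-1, 3, 3)
            t = t + torch.eye(3, device=t.device)        # bias toward the identity
            return torch.bmm(norm_pos, t)                # adjusted position features

    out = PositionRefiner(16)(torch.randn(2, 64, 3), torch.randn(2, 64, 16))
    print(out.shape)                                     # torch.Size([2, 64, 3])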
  6. The object detection method according to claim 1, wherein the point cloud network comprises a first sampling network and a second sampling network connected to the first sampling network; and the performing feature extraction on all points in the point cloud based on the point cloud network to obtain the feature set of the point cloud comprises:
    performing feature down-sampling on all points in the point cloud through the first sampling network to obtain initial features of the point cloud;
    performing up-sampling on the initial features through the second sampling network to obtain the feature set of the point cloud.
  7. The object detection method according to claim 6, wherein the first sampling network comprises a plurality of sequentially connected set abstraction layers, and the second sampling network comprises a plurality of sequentially connected feature propagation layers in one-to-one correspondence with the set abstraction layers in the first sampling network;
    the performing feature down-sampling on all points in the point cloud through the first sampling network to obtain the initial features of the point cloud comprises:
    dividing the points in the point cloud into local areas sequentially through the plurality of set abstraction layers, and extracting features of center points of the local areas, to obtain the initial features of the point cloud;
    inputting the initial features of the point cloud into the second sampling network; and
    the performing up-sampling on the initial features through the second sampling network to obtain the feature set of the point cloud comprises:
    determining output features of a previous layer and input features of the set abstraction layer corresponding to a current feature propagation layer as current input features of the current feature propagation layer;
    performing up-sampling on the current input features through the current feature propagation layer to obtain the feature set of the point cloud.
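The two sampling networks of claims 6-7 mirror the familiar encoder-decoder pattern of set abstraction (down-sampling) and feature propagation (up-sampling, fused with the matching abstraction layer's input as a skip connection). A heavily simplified, runnable PyTorch sketch: real set abstraction would use farthest-point sampling and ball-query grouping, which are replaced here by strided slicing and nearest-neighbor interpolation purely for brevity.

    import torch
    import torch.nn as nn

    class ToySetAbstraction(nn.Module):
        def __init__(self, c_in, c_out, stride=4):
            super().__init__()
            self.stride = stride
            self.mlp = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU())

        def forward(self, xyz, feats):
            # "Local area division" stand-in: keep every stride-th point as a center.
            return xyz[:, ::self.stride], self.mlp(feats[:, ::self.stride])

    class ToyFeaturePropagation(nn.Module):
        def __init__(self, c_in, c_skip, c_out):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(c_in + c_skip, c_out), nn.ReLU())

        def forward(self, xyz_dense, skip_feats, xyz_sparse, sparse_feats):
            # Up-sample by copying each dense point's nearest sparse neighbor's feature,
            # then fuse with the matching abstraction layer's input (the skip path).
            idx = torch.cdist(xyz_dense, xyz_sparse).argmin(dim=-1)
            up = torch.gather(sparse_feats, 1,
                              idx.unsqueeze(-1).expand(-1, -1, sparse_feats.shape[-1]))
            return self.mlp(torch.cat([up, skip_feats], dim=-1))

    xyz = torch.randn(2, 256, 3)
    sa1, sa2 = ToySetAbstraction(3, 32), ToySetAbstraction(32, 64)
    fp2, fp1 = ToyFeaturePropagation(64, 32, 32), ToyFeaturePropagation(32, 3, 16)
    xyz1, f1 = sa1(xyz, xyz)                  # raw coordinates as input features
    xyz2, f2 = sa2(xyz1, f1)                  # "initial features" of the point cloud
    g1 = fp2(xyz1, f1, xyz2, f2)
    feature_set = fp1(xyz, xyz, xyz1, g1)     # (2, 256, 16) per-point feature set
    print(feature_set.shape)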
  8. The object detection method according to claim 1, wherein the area prediction network comprises a feature extraction network, a classification network connected to the feature extraction network, and a regression network connected to the feature extraction network;
    the predicting the type and positioning information of the candidate object area based on the area prediction network and the area feature information to obtain the predicted type and predicted positioning information of the candidate object area comprises:
    performing feature extraction on the area feature information through the feature extraction network to obtain global feature information of the candidate object area;
    classifying the candidate object area based on the classification network and the global feature information to obtain the predicted type of the candidate object area;
    locating the candidate object area based on the regression network and the global feature information to obtain the predicted positioning information of the candidate object area.
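Claims 8-9 thus describe one shared extractor feeding two fully connected heads: one classifies the region (predicted type), the other regresses its box (predicted positioning information). A compact PyTorch sketch; the layer widths, the max-pooled global feature, and the 7-value box output (center, size, heading) are assumptions for illustration.

    import torch
    import torch.nn as nn

    class RegionPredictionNet(nn.Module):
        def __init__(self, c_in=19, num_classes=4):
            super().__init__()
            self.extract = nn.Sequential(nn.Linear(c_in, 128), nn.ReLU(),
                                         nn.Linear(128, 256), nn.ReLU())
            self.cls_head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                                          nn.Linear(128, num_classes))  # predicted type
            self.reg_head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                                          nn.Linear(128, 7))    # box residuals

        def forward(self, region_feats):
            # region_feats: (B, k, c_in) fused region features from claim 4.
            global_feat = self.extract(region_feats).max(dim=1).values
            return self.cls_head(global_feat), self.reg_head(global_feat)

    logits, boxes = RegionPredictionNet()(torch.randn(8, 64, 19))
    print(logits.shape, boxes.shape)          # (8, 4) and (8, 7)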
  9. The object detection method according to claim 8, wherein the feature extraction network comprises a plurality of sequentially connected set abstraction layers, the classification network comprises a plurality of sequentially connected fully connected layers, and the regression network comprises a plurality of sequentially connected fully connected layers;
    the performing feature extraction on the area feature information through the feature extraction network to obtain the global feature information of the candidate object area comprises:
    performing feature extraction on the area feature information sequentially through each set abstraction layer in the feature extraction network, to obtain the global feature information of the candidate object area.
  10. The object detection method according to claim 1, wherein the optimizing the candidate object area based on the initial positioning information, the predicted type, and the predicted positioning information to obtain the target object detection area and the positioning information of the target object detection area comprises:
    screening the candidate object areas based on the predicted type to obtain screened object areas;
    optimizing and adjusting the initial positioning information of the screened object areas according to the predicted positioning information of the screened object areas, to obtain the target object detection area and the positioning information of the target object detection area.
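This post-processing thins the dense per-point proposals into final detections: candidates whose predicted type is background or low-confidence are screened out, and each survivor's initial box is corrected by its regressed residuals. A sketch with a score threshold and a residual-style update as assumed details (the claim fixes neither); in practice a non-maximum suppression step over overlapping boxes would typically follow.

    import numpy as np

    def refine_detections(init_boxes, fg_scores, residuals, score_thresh=0.5):
        # init_boxes: (M, 6) [x, y, z, w, l, h] initial positioning information;
        # fg_scores: (M,) foreground probability from the predicted type;
        # residuals: (M, 6) predicted positioning residuals.
        keep = fg_scores > score_thresh              # screening by predicted type
        boxes = init_boxes[keep].copy()
        boxes[:, :3] += residuals[keep, :3]          # shift the centers
        boxes[:, 3:] *= np.exp(residuals[keep, 3:])  # rescale the sizes
        return boxes, fg_scores[keep]

    init = np.zeros((3, 6)) + [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    boxes, scores = refine_detections(init, np.array([0.9, 0.2, 0.7]), np.zeros((3, 6)))
    print(boxes.shape, scores)                       # (2, 6) [0.9 0.7]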
  11. An object detection apparatus, comprising:
    a detection unit, configured to detect foreground points from a point cloud of a scene;
    an area construction unit, configured to construct a candidate object area corresponding to the foreground point based on the foreground point and a predetermined size, and determine initial positioning information of the candidate object area;
    a feature extraction unit, configured to perform feature extraction on all points in the point cloud based on a point cloud network, to obtain a feature set corresponding to the point cloud;
    a feature construction unit, configured to construct area feature information of the candidate object area based on the feature set;
    a prediction unit, configured to predict a type and positioning information of the candidate object area based on an area prediction network and the area feature information, to obtain a predicted type and predicted positioning information of the candidate object area;
    an optimization unit, configured to optimize the candidate object area based on the initial positioning information, the predicted type, and the predicted positioning information, to obtain a target object detection area and positioning information of the target object detection area.
  12. The object detection apparatus according to claim 11, wherein the detection unit is specifically configured to:
    perform semantic segmentation on an image of the scene to obtain foreground pixels;
    determine points in the point cloud of the scene that correspond to the foreground pixels as the foreground points.
  13. The object detection apparatus according to claim 11, wherein the area construction unit is specifically configured to:
    generate, with the foreground point as a center point, the candidate object area corresponding to the foreground point according to the predetermined size.
  14. The object detection apparatus according to claim 11, wherein the feature construction unit specifically comprises:
    a selection subunit, configured to select a plurality of target points in the candidate object area;
    an extraction subunit, configured to extract features of the target points from the feature set to obtain first partial feature information of the candidate object area;
    a construction subunit, configured to construct second partial feature information of the candidate object area based on position information of the target points;
    a fusion subunit, configured to fuse the first partial feature information with the second partial feature information to obtain the area feature information of the candidate object area.
  15. The object detection apparatus according to claim 14, wherein the construction subunit is specifically configured to:
    normalize the position information of the target points to obtain normalized position information of the target points;
    fuse the first partial feature information with the normalized position information to obtain fused feature information of the target points;
    perform a spatial transformation on the fused feature information of the target points to obtain transformed position information;
    adjust the normalized position information of the target points based on the transformed position information, to obtain the second partial feature information of the candidate object area.
  16. The object detection apparatus according to claim 11, wherein the point cloud network comprises a first sampling network and a second sampling network connected to the first sampling network, and the feature extraction unit specifically comprises:
    a down-sampling subunit, configured to perform feature down-sampling on all points in the point cloud through the first sampling network to obtain initial features of the point cloud;
    an up-sampling subunit, configured to perform up-sampling on the initial features through the second sampling network to obtain the feature set of the point cloud.
  17. The object detection apparatus according to claim 16, wherein the first sampling network comprises a plurality of sequentially connected set abstraction layers, and the second sampling network comprises a plurality of sequentially connected feature propagation layers in one-to-one correspondence with the set abstraction layers in the first sampling network;
    the down-sampling subunit is specifically configured to:
    divide the points in the point cloud into local areas sequentially through the set abstraction layers, and extract features of center points of the local areas, to obtain the initial features of the point cloud;
    input the initial features of the point cloud into the second sampling network; and
    the up-sampling subunit is specifically configured to:
    determine output features of a previous layer and input features of the set abstraction layer corresponding to a current feature propagation layer as current input features of the current feature propagation layer;
    perform up-sampling on the current input features through the current feature propagation layer to obtain the feature set of the point cloud.
  18. The object detection apparatus according to claim 11, wherein the area prediction network comprises a feature extraction network, a classification network connected to the feature extraction network, and a regression network connected to the feature extraction network, and the prediction unit specifically comprises:
    a global feature extraction subunit, configured to perform feature extraction on the area feature information through the feature extraction network to obtain global feature information of the candidate object area;
    a classification subunit, configured to classify the candidate object area based on the classification network and the global feature information to obtain the predicted type of the candidate object area;
    a regression subunit, configured to locate the candidate object area based on the regression network and the global feature information to obtain the predicted positioning information of the candidate object area.
  19. The object detection apparatus according to claim 18, wherein the feature extraction network comprises a plurality of sequentially connected set abstraction layers, the classification network comprises a plurality of sequentially connected fully connected layers, and the regression network comprises a plurality of sequentially connected fully connected layers;
    the global feature extraction subunit is specifically configured to perform feature extraction on the area feature information sequentially through each set abstraction layer in the feature extraction network, to obtain the global feature information of the candidate object area.
  20. The object detection apparatus according to claim 11, wherein the optimization unit specifically comprises:
    a screening subunit, configured to screen the candidate object areas based on the predicted type of the candidate object areas to obtain screened object areas;
    an optimization subunit, configured to optimize and adjust the initial positioning information of the screened object areas according to the predicted positioning information of the screened object areas, to obtain the target object detection area and the positioning information of the target object detection area.
  21. A storage medium, storing a plurality of instructions, the instructions being suitable for being loaded by a processor to perform the steps in the object detection method according to any one of claims 1 to 10.
  22. A network device, comprising a memory and a processor, the memory storing a plurality of instructions, and the processor loading the instructions in the memory to perform the steps in the object detection method according to any one of claims 1 to 10.
  23. A computer program product, comprising instructions that, when run on a computer, cause the computer to perform the steps of the object detection method according to any one of claims 1 to 10.
PCT/CN2020/077721 2019-04-03 2020-03-04 Object detection method and apparatus, and network device and storage medium WO2020199834A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910267019.5A CN110032962B (en) 2019-04-03 2019-04-03 Object detection method, device, network equipment and storage medium
CN201910267019.5 2019-04-03

Publications (1)

Publication Number Publication Date
WO2020199834A1 (en)

Family

ID=67237387

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/077721 WO2020199834A1 (en) 2019-04-03 2020-03-04 Object detection method and apparatus, and network device and storage medium

Country Status (2)

Country Link
CN (1) CN110032962B (en)
WO (1) WO2020199834A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032962B (en) * 2019-04-03 2022-07-08 腾讯科技(深圳)有限公司 Object detection method, device, network equipment and storage medium
CN110400304B (en) * 2019-07-25 2023-12-12 腾讯科技(深圳)有限公司 Object detection method, device, equipment and storage medium based on deep learning
JPWO2021024805A1 (en) 2019-08-06 2021-02-11
CN110837789B (en) * 2019-10-31 2023-01-20 北京奇艺世纪科技有限公司 Method and device for detecting object, electronic equipment and medium
EP4073688A4 (en) * 2019-12-12 2023-01-25 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Target detection method, device, terminal device, and medium
CN111144304A (en) * 2019-12-26 2020-05-12 上海眼控科技股份有限公司 Vehicle target detection model generation method, vehicle target detection method and device
CN111209840B (en) * 2019-12-31 2022-02-18 浙江大学 3D target detection method based on multi-sensor data fusion
CN111145174B (en) * 2020-01-02 2022-08-09 南京邮电大学 3D target detection method for point cloud screening based on image semantic features
CN110807461B (en) * 2020-01-08 2020-06-02 深圳市越疆科技有限公司 Target position detection method
CN111260773B (en) * 2020-01-20 2023-10-13 深圳市普渡科技有限公司 Three-dimensional reconstruction method, detection method and detection system for small obstacle
CN111340766B (en) * 2020-02-21 2024-06-11 北京市商汤科技开发有限公司 Target object detection method, device, equipment and storage medium
CN113496160B (en) * 2020-03-20 2023-07-11 百度在线网络技术(北京)有限公司 Three-dimensional object detection method, three-dimensional object detection device, electronic equipment and storage medium
CN111444839B (en) * 2020-03-26 2023-09-08 北京经纬恒润科技股份有限公司 Target detection method and system based on laser radar
CN111578951B (en) * 2020-04-30 2022-11-08 阿波罗智能技术(北京)有限公司 Method and device for generating information in automatic driving
CN112215861A (en) * 2020-09-27 2021-01-12 深圳市优必选科技股份有限公司 Football detection method and device, computer readable storage medium and robot
CN112183330B (en) * 2020-09-28 2022-06-28 北京航空航天大学 Target detection method based on point cloud
WO2022126523A1 (en) * 2020-12-17 2022-06-23 深圳市大疆创新科技有限公司 Object detection method, device, movable platform, and computer-readable storage medium
CN112598635B (en) * 2020-12-18 2024-03-12 武汉大学 Point cloud 3D target detection method based on symmetric point generation
CN112734931B (en) * 2020-12-31 2021-12-07 罗普特科技集团股份有限公司 Method and system for assisting point cloud target detection
CN113312983B (en) * 2021-05-08 2023-09-05 华南理工大学 Semantic segmentation method, system, device and medium based on multi-mode data fusion
CN113989188A (en) * 2021-09-26 2022-01-28 华为技术有限公司 Object detection method and related equipment thereof
CN114898094B (en) * 2022-04-22 2024-07-12 湖南大学 Point cloud upsampling method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017155970A1 (en) * 2016-03-11 2017-09-14 Kaarta, Inc. Laser scanner with real-time, online ego-motion estimation
CN109410238B (en) * 2018-09-20 2021-10-26 中国科学院合肥物质科学研究院 Wolfberry identification and counting method based on PointNet + + network
CN109410307B (en) * 2018-10-16 2022-09-20 大连理工大学 Scene point cloud semantic segmentation method
CN109523552B (en) * 2018-10-24 2021-11-02 青岛智能产业技术研究院 Three-dimensional object detection method based on viewing cone point cloud

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150339541A1 (en) * 2014-05-22 2015-11-26 Nokia Technologies Oy Point cloud matching method
CN108010036A (en) * 2017-11-21 2018-05-08 江南大学 A kind of object symmetry axis detection method based on RGB-D cameras
CN109242951A (en) * 2018-08-06 2019-01-18 宁波盈芯信息科技有限公司 A kind of face's real-time three-dimensional method for reconstructing
CN109345510A (en) * 2018-09-07 2019-02-15 百度在线网络技术(北京)有限公司 Object detecting method, device, equipment, storage medium and vehicle
CN109543601A (en) * 2018-11-21 2019-03-29 电子科技大学 A kind of unmanned vehicle object detection method based on multi-modal deep learning
CN110032962A (en) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 A kind of object detecting method, device, the network equipment and storage medium

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381837A (en) * 2020-11-12 2021-02-19 联想(北京)有限公司 Image processing method and electronic equipment
CN112633376A (en) * 2020-12-24 2021-04-09 南京信息工程大学 Point cloud data ground feature classification method and system based on deep learning and storage medium
CN112766170B (en) * 2021-01-21 2024-04-16 广西财经学院 Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN112766170A (en) * 2021-01-21 2021-05-07 广西财经学院 Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN112884884A (en) * 2021-02-06 2021-06-01 罗普特科技集团股份有限公司 Candidate region generation method and system
CN112862017A (en) * 2021-04-01 2021-05-28 北京百度网讯科技有限公司 Point cloud data labeling method, device, equipment and medium
CN112862017B (en) * 2021-04-01 2023-08-01 北京百度网讯科技有限公司 Point cloud data labeling method, device, equipment and medium
CN113205531B (en) * 2021-04-30 2024-03-08 北京云圣智能科技有限责任公司 Three-dimensional point cloud segmentation method, device and server
CN113205531A (en) * 2021-04-30 2021-08-03 北京云圣智能科技有限责任公司 Three-dimensional point cloud segmentation method and device and server
CN113240656A (en) * 2021-05-24 2021-08-10 浙江商汤科技开发有限公司 Visual positioning method and related device and equipment
CN113674348A (en) * 2021-05-28 2021-11-19 中国科学院自动化研究所 Object grabbing method, device and system
CN113674348B (en) * 2021-05-28 2024-03-15 中国科学院自动化研究所 Object grabbing method, device and system
CN113256793A (en) * 2021-05-31 2021-08-13 浙江科技学院 Three-dimensional data processing method and system
WO2023035822A1 (en) * 2021-09-13 2023-03-16 上海芯物科技有限公司 Target detection method and apparatus, and device and storage medium
CN114372944A (en) * 2021-12-30 2022-04-19 深圳大学 Multi-mode and multi-scale fusion candidate region generation method and related device
CN114372944B (en) * 2021-12-30 2024-05-17 深圳大学 Multi-mode and multi-scale fused candidate region generation method and related device
CN114359561A (en) * 2022-01-10 2022-04-15 北京百度网讯科技有限公司 Target detection method and training method and device of target detection model
CN114092478A (en) * 2022-01-21 2022-02-25 合肥中科类脑智能技术有限公司 Anomaly detection method
CN114549958A (en) * 2022-02-24 2022-05-27 四川大学 Night and disguised target detection method based on context information perception mechanism
CN114549958B (en) * 2022-02-24 2023-08-04 四川大学 Night and camouflage target detection method based on context information perception mechanism
CN114820465B (en) * 2022-04-06 2024-04-26 合众新能源汽车股份有限公司 Point cloud detection model training method and device, electronic equipment and storage medium
CN114820465A (en) * 2022-04-06 2022-07-29 合众新能源汽车有限公司 Point cloud detection model training method and device, electronic equipment and storage medium
CN115937644A (en) * 2022-12-15 2023-04-07 清华大学 Point cloud feature extraction method and device based on global and local fusion
CN115937644B (en) * 2022-12-15 2024-01-02 清华大学 Point cloud feature extraction method and device based on global and local fusion
CN116229388A (en) * 2023-03-27 2023-06-06 哈尔滨市科佳通用机电股份有限公司 Method, system and equipment for detecting motor car foreign matters based on target detection network
CN116229388B (en) * 2023-03-27 2023-09-12 哈尔滨市科佳通用机电股份有限公司 Method, system and equipment for detecting motor car foreign matters based on target detection network
CN116912488B (en) * 2023-06-14 2024-02-13 中国科学院自动化研究所 Three-dimensional panorama segmentation method and device based on multi-view camera
CN116912488A (en) * 2023-06-14 2023-10-20 中国科学院自动化研究所 Three-dimensional panorama segmentation method and device based on multi-view camera
CN116912238B (en) * 2023-09-11 2023-11-28 湖北工业大学 Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion
CN116912238A (en) * 2023-09-11 2023-10-20 湖北工业大学 Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion
CN117475397A (en) * 2023-12-26 2024-01-30 安徽蔚来智驾科技有限公司 Target annotation data acquisition method, medium and device based on multi-mode sensor
CN117475397B (en) * 2023-12-26 2024-03-22 安徽蔚来智驾科技有限公司 Target annotation data acquisition method, medium and device based on multi-mode sensor

Also Published As

Publication number Publication date
CN110032962B (en) 2022-07-08
CN110032962A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
WO2020199834A1 (en) Object detection method and apparatus, and network device and storage medium
WO2020207166A1 (en) Object detection method and apparatus, electronic device, and storage medium
US10078790B2 (en) Systems for generating parking maps and methods thereof
Du et al. Car detection for autonomous vehicle: LIDAR and vision fusion approach through deep learning framework
US9142011B2 (en) Shadow detection method and device
CN113378686B (en) Two-stage remote sensing target detection method based on target center point estimation
CN111951212A (en) Method for identifying defects of contact network image of railway
CN114708585A (en) Three-dimensional target detection method based on attention mechanism and integrating millimeter wave radar with vision
Zhong et al. Multi-scale feature fusion network for pixel-level pavement distress detection
CN110222686B (en) Object detection method, object detection device, computer equipment and storage medium
CN111368600A (en) Method and device for detecting and identifying remote sensing image target, readable storage medium and equipment
US20230102467A1 (en) Method of detecting image, electronic device, and storage medium
CN113706480A (en) Point cloud 3D target detection method based on key point multi-scale feature fusion
KR101907883B1 (en) Object detection and classification method
CN110807362A (en) Image detection method and device and computer readable storage medium
CN115731355B (en) SuperPoint-NeRF-based three-dimensional building reconstruction method
CN112733815B (en) Traffic light identification method based on RGB outdoor road scene image
CN113033516A (en) Object identification statistical method and device, electronic equipment and storage medium
CN113281780A (en) Method and device for labeling image data and electronic equipment
Pellis et al. Assembling an image and point cloud dataset for heritage building semantic segmentation
Drobnitzky et al. Survey and systematization of 3D object detection models and methods
CN113505834A (en) Method for training detection model, determining image updating information and updating high-precision map
CN113139540A (en) Backboard detection method and equipment
Chaturvedi et al. Small object detection using retinanet with hybrid anchor box hyper tuning using interface of Bayesian mathematics
CN113514053B (en) Method and device for generating sample image pair and method for updating high-precision map

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20783617

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20783617

Country of ref document: EP

Kind code of ref document: A1