WO2020108311A1 - 3d detection method and apparatus for target object, and medium and device - Google Patents
3D detection method and apparatus for target object, and medium and device
- Publication number
- WO2020108311A1 (PCT/CN2019/118126)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- information
- neural network
- initial frame
- cloud data
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the present disclosure relates to computer vision technology, and in particular, to a target object 3D detection method and device, vehicle intelligent control method and device, obstacle avoidance navigation method and device, electronic equipment, computer readable storage medium, and computer program.
- 3D detection can be applied to various technologies such as intelligent driving and obstacle avoidance navigation.
- in intelligent driving technology, through 3D detection, the specific location, shape, size, and direction of movement of target objects such as vehicles and pedestrians surrounding an intelligent driving vehicle can be obtained, which can help the intelligent driving vehicle make intelligent driving decisions.
- Embodiments of the present disclosure provide technical solutions for target object 3D detection, vehicle intelligent control, and obstacle avoidance navigation.
- a 3D detection method for a target object includes: extracting feature information of point cloud data of an acquired scene; performing semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; predicting, according to the first semantic information, at least one foreground point corresponding to the target object among the multiple points; generating, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point; and determining a 3D detection frame of the target object in the scene according to the 3D initial frame.
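- the claimed flow can be summarized as a short sketch; the stage callables, parameter names, and the 0.5 threshold below are hypothetical stand-ins for illustration, not part of the disclosure:

```python
import numpy as np

def detect_3d(points, extract, segment, predict_fg, gen_frames, nms, thresh=0.5):
    """Chains the five claimed steps; each stage is supplied as a callable."""
    feats = extract(points)        # feature information of the point cloud data
    sem = segment(feats)           # first semantic information of multiple points
    conf = predict_fg(sem)         # per-point confidence of being a foreground point
    fg = conf > thresh             # foreground points of the target object
    boxes = gen_frames(sem[fg])    # one 3D initial frame per foreground point
    return nms(boxes)              # redundancy removal -> final 3D detection frames
```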
- a vehicle intelligent control method includes: obtaining the 3D detection frame of the target object using the above-mentioned 3D detection method for a target object; and generating an instruction or early warning information for controlling the vehicle according to the 3D detection frame.
- an obstacle avoidance navigation method includes: obtaining the 3D detection frame of the target object using the above-mentioned 3D detection method for a target object; and generating an instruction or warning information for obstacle avoidance navigation control of a robot according to the 3D detection frame.
- a target object 3D detection device includes: a feature extraction module for extracting feature information of point cloud data of an acquired scene; a first semantic segmentation module for performing semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; a foreground point prediction module for predicting, according to the first semantic information, at least one foreground point corresponding to the target object among the multiple points; an initial frame generation module for generating, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point; and a detection frame determination module for determining the 3D detection frame of the target object in the scene according to the 3D initial frame.
- a vehicle intelligent control device includes: the above target object 3D detection device, configured to obtain the 3D detection frame of the target object; and a first control module configured to generate an instruction or early warning information for controlling the vehicle according to the 3D detection frame.
- an obstacle avoidance navigation device includes: the above target object 3D detection device, configured to obtain the 3D detection frame of the target object; and a second control module configured to generate an instruction or early warning information for obstacle avoidance navigation control of a robot according to the 3D detection frame.
- an electronic device includes: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, where any method embodiment of the present disclosure is implemented when the computer program is executed.
- a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, any method embodiment of the present disclosure is implemented.
- a computer program includes computer instructions which, when executed by a processor of a device, implement any method embodiment of the present disclosure.
- in the present disclosure, feature extraction is performed on the point cloud data, and semantic segmentation is performed on the point cloud data based on the extracted feature information; this part is equivalent to lower-layer data analysis. The 3D detection frame generated and determined based on the semantic segmentation results is equivalent to upper-layer data analysis. Therefore, in the 3D detection process of the target object, the present disclosure forms a bottom-up way to generate the 3D detection frame.
- the technical solution provided by the present disclosure is beneficial to improve the detection performance of the 3D detection frame.
- FIG. 1 is a flowchart of an embodiment of a 3D detection method for a target object of the present disclosure
- FIG. 2 is a flowchart of another embodiment of the target object 3D detection method of the present disclosure.
- FIG. 3 is a schematic structural diagram of a first-stage neural network of the present disclosure
- FIG. 4 is another schematic structural diagram of the first-stage neural network of the present disclosure.
- FIG. 5 is a schematic structural diagram of a second-stage neural network of the present disclosure.
- FIG. 6 is a flowchart of an embodiment of a vehicle intelligent control method of the present disclosure
- FIG. 7 is a flowchart of an embodiment of an obstacle avoidance navigation method of the present disclosure.
- FIG. 8 is a schematic structural diagram of an embodiment of a target object 3D detection device of the present disclosure.
- FIG. 9 is a schematic structural diagram of an embodiment of a vehicle intelligent control device of the present disclosure.
- FIG. 10 is a schematic structural diagram of an embodiment of an obstacle avoidance navigation device of the present disclosure.
- FIG. 11 is a block diagram of an exemplary device that implements an embodiment of the present disclosure.
- Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above.
- Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system executable instructions (such as program modules) executed by the computer system.
- program modules may include routines, programs, target programs, components, logic, and data structures, etc., which perform specific tasks or implement specific abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment, where tasks are performed by remote processing devices linked through a communication network.
- program modules may be located on local or remote computing system storage media including storage devices.
- the scene in the present disclosure may refer to a visual-based presentation screen.
- the image captured by the camera and the point cloud data (Point Cloud Data) obtained by the lidar scan can be regarded as a scene.
- the point cloud data in the present disclosure generally refers to scanning information recorded in the form of points.
- point cloud data obtained through lidar scanning.
- Each point in the point cloud data can be described by a variety of information; that is, each point in the point cloud data usually includes a variety of information, which may include but is not limited to one or more of the following: three-dimensional coordinates of the point, color information (such as RGB information, etc.), and reflection intensity (Intensity) information, etc.
- a point in the point cloud data can be described by one or more types of information such as three-dimensional coordinates, color information, and reflection intensity information.
- the present disclosure may utilize at least one convolutional layer in the neural network to process the point cloud data to form feature information of the point cloud data, for example, forming a piece of feature information for each point in the point cloud data. Since the feature information formed here is formed separately for each point while considering all points in the entire spatial range of the point cloud data, this feature information may be called global feature information.
- S110 Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
- the present disclosure can use a neural network to perform semantic segmentation on point cloud data.
- the neural network can form first semantic information for multiple points in the point cloud data, or even for each point in the point cloud data. For example, after the point cloud data is provided to the neural network and the feature information of the point cloud data is extracted by the neural network, the neural network continues to process the feature information of the point cloud data to obtain the first semantic information of multiple points in the point cloud data.
- the first semantic information of a point in the present disclosure generally refers to a semantic feature (Semantic Feature) generated for the point in consideration of the entire point cloud data; therefore, the first semantic information may also be called the first semantic feature or the global semantic feature.
- the global semantic features of points in the present disclosure can generally be expressed in the form of a one-dimensional vector array including multiple (eg, 256) elements.
- the global semantic features in this disclosure may also be referred to as global semantic feature vectors.
- the foreground points and background points in the present disclosure are defined with respect to a target object. The points belonging to a target object are the foreground points of that target object, while a point not belonging to the target object is a background point of that target object. In addition, a point belonging to one target object is a foreground point of that target object, but since the point does not belong to other target objects, it is a background point of those other target objects.
- the first semantic information of the multiple points obtained by the present disclosure generally includes: the global semantic features of the foreground points of the target object and the global semantic features of the background points of the target object.
- the scene in the present disclosure may include one or more target objects.
- Target objects in this disclosure include, but are not limited to: vehicles, non-motor vehicles, pedestrians, and/or obstacles, and the like.
- the present disclosure may use a neural network to predict at least one foreground point corresponding to the target object among the multiple points; the neural network may make predictions separately for some points in the point cloud data, or even for each point in the point cloud data, to generate the confidence of each point being a foreground point.
- the confidence of a point can be expressed as: the probability of the point being a foreground point.
- the neural network continues to process the global semantic features to predict the confidences of multiple points in the point cloud data being foreground points of the target object, and the neural network can generate a confidence for each point separately.
- each confidence generated by the neural network can be judged separately, and a point whose confidence exceeds a predetermined value can be used as a foreground point of the target object.
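- for example, with hypothetical confidence values, selecting foreground points by thresholding could look as follows:

```python
import numpy as np

confidences = np.array([0.92, 0.10, 0.73, 0.40])  # hypothetical per-point confidences
PREDETERMINED_VALUE = 0.5                          # example threshold (e.g. 0.5)

foreground_idx = np.flatnonzero(confidences > PREDETERMINED_VALUE)
print(foreground_idx)  # [0 2] -- these points are treated as foreground points
```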
- the operation of determining the confidence in the present disclosure may be performed in S120 or S130.
- if the confidence judgment operation is performed in S120 and the judgment result is that there is no point whose confidence exceeds the predetermined value, that is, there is no foreground point, it can be considered that there is no target object in the scene.
- optionally, the present disclosure may obtain the global semantic feature of each point in S110 and generate a 3D initial frame for each point in S130. In this case, all the confidences obtained in S120 can be judged to select the foreground points of the target object, and the selected foreground points can be used to pick, from the 3D initial frames generated by S130, the 3D initial frame corresponding to each foreground point. That is, the 3D initial frames generated by S130 usually include both 3D initial frames corresponding to foreground points and 3D initial frames corresponding to background points, so S130 needs to filter out the 3D initial frames corresponding to the foreground points from all the generated 3D initial frames. Alternatively, the present disclosure may generate a 3D initial frame according to the global semantic feature of each predicted foreground point, so that every 3D initial frame obtained corresponds to a foreground point; that is to say, S130 may generate 3D initial frames only for the foreground points.
- the 3D initial frame in the present disclosure may be described by the position information of the center point of the 3D initial frame, the length, width, and height information of the 3D initial frame, and the direction information of the 3D initial frame; that is, in the present disclosure, the 3D initial frame may include the position information of its center point, its length, width, and height information, and its direction information.
- the 3D initial frame may also be referred to as 3D initial frame information.
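- the seven values described above can be captured in a small container; the field names and example values below are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class InitialFrame3D:
    x: float; y: float; z: float  # position information of the center point
    h: float; w: float; l: float  # height, width, and length information
    theta: float                  # direction information (angle to the X axis)

box = InitialFrame3D(x=12.3, y=-0.8, z=1.1, h=1.6, w=1.8, l=4.2, theta=0.3)
```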
- the present disclosure may utilize neural networks to generate 3D initial frames. For example, after the point cloud data is provided to the neural network, the feature information of the point cloud data is extracted by the neural network and semantic segmentation processing is performed by the neural network; the neural network then continues to process the global semantic features to generate a 3D initial frame for each of the multiple points.
- alternatively, when the point cloud data is provided to the neural network, the feature information of the point cloud data is extracted by the neural network, semantic segmentation processing is performed, and the neural network performs prediction processing on the global semantic features to obtain the confidences of multiple points in the point cloud data being foreground points of the target object; the neural network can then continue to process the global semantic features of the points whose confidence exceeds the predetermined value, to generate a 3D initial frame for each foreground point.
- since semantic segmentation is based on the feature information of all points in the point cloud data, the semantic features formed by semantic segmentation include not only the semantic features of the point itself but also the semantic features of surrounding points, so that multiple foreground points in this disclosure can semantically point to the same target object in the scene.
- the 3D initial frames corresponding to different foreground points that point to the same target object are somewhat different, but usually the difference is not large.
- if no 3D initial frame corresponding to a foreground point exists among the 3D initial frames generated by S130 according to the first semantic information, it may be considered that there is no target object in the scene.
- S140 Determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
- the present disclosure finally determines a 3D detection frame for each target object.
- the present disclosure may perform redundancy processing on the aforementioned 3D initial frames corresponding to all the foreground points, thereby obtaining the 3D detection frame of the target object, that is, the 3D detection frame finally obtained by performing target object detection on the point cloud data.
- the present disclosure may use the degree of overlap between the 3D initial frames to remove redundant 3D initial frames, thereby obtaining the 3D detection frame of the target object.
- the present disclosure may determine the degree of overlap between the 3D initial frames corresponding to multiple foreground points, filter out the 3D initial frames whose overlap is greater than a set threshold, and then determine the 3D detection frame of the target object from the remaining 3D initial frames.
- the present disclosure may use the NMS (Non-Maximum Suppression) algorithm to perform redundancy processing on the 3D initial frames corresponding to all the foreground points, thereby removing redundant 3D initial frames that overlap each other and obtaining the final 3D detection frame.
- the present disclosure can obtain a final 3D detection frame for each target object in the scene.
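- a minimal sketch of this redundancy removal, assuming an axis-aligned bird's-eye-view overlap measure (the disclosure does not fix a specific overlap definition, and frame rotation is ignored here for brevity):

```python
import numpy as np

def bev_iou(a, b):
    """Axis-aligned bird's-eye-view IoU of two (x, z, l, w) footprints."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    az1, az2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    bz1, bz2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iz = max(0.0, min(az2, bz2) - max(az1, bz1))
    inter = ix * iz
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms_3d(boxes, scores, iou_thresh=0.7):
    """Keeps the highest-confidence frame and suppresses frames overlapping it."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        ious = np.array([bev_iou(boxes[i], boxes[j]) for j in order[1:]])
        order = order[1:][ious <= iou_thresh]
    return keep
```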
- the present disclosure may also perform correction (or optimization) on the 3D initial frames corresponding to the currently obtained foreground points, and then perform redundancy processing on all the corrected 3D initial frames to obtain the 3D detection frame of the target object, that is, the 3D detection frame finally obtained by performing target object detection on the point cloud data.
- the process of respectively correcting the 3D initial frame corresponding to each foreground point in the present disclosure may include the following steps A1, B1, and C1:
- Step A1 Acquire feature information of points in a partial area in the point cloud data, where the partial area includes at least a 3D initial frame.
- the present disclosure may set a 3D expansion frame containing a 3D initial frame, and obtain feature information of each point in the 3D expansion frame in the point cloud data.
- the 3D expansion box in the present disclosure is an implementation of partial regions in point cloud data.
- the 3D initial frame corresponding to each foreground point in the present disclosure respectively corresponds to a 3D expansion frame, and the space range occupied by the 3D expansion frame generally completely covers, and is slightly larger than, the space range occupied by the 3D initial frame.
- optionally, no surface of the 3D initial frame lies in the same plane as any surface of its corresponding 3D expansion frame; the center point of the 3D initial frame coincides with the center point of the 3D expansion frame, and each surface of the 3D initial frame is parallel to the corresponding surface of its 3D expansion frame. Since the positional relationship between such a 3D expansion frame and the 3D initial frame is relatively standardized, it helps reduce the difficulty of forming the 3D expansion frame, and thus helps reduce the implementation difficulty of the present disclosure. Of course, the present disclosure does not exclude the case in which the two center points do not coincide but each surface of the 3D initial frame is still parallel to the corresponding surface of its 3D expansion frame.
- based on at least one of a preset X-axis direction increment (such as 20 cm), a Y-axis direction increment (such as 20 cm), and a Z-axis direction increment (such as 20 cm), the present disclosure may expand the 3D initial frame corresponding to the foreground point in 3D space, so as to form a 3D expansion frame that contains the 3D initial frame, whose center point coincides with that of the 3D initial frame, and whose surfaces are parallel to the corresponding surfaces of the 3D initial frame.
- the increment in the present disclosure can be set according to actual needs; for example, the increment in a given direction does not exceed one N-th (N greater than 4, for example) of the corresponding side length of the 3D initial frame. Optionally, the increment in the X-axis direction does not exceed one tenth of the length of the 3D initial frame, the increment in the Y-axis direction does not exceed one tenth of the width of the 3D initial frame, and the increment in the Z-axis direction does not exceed one tenth of the height of the 3D initial frame.
- the increment in the X-axis direction, the increment in the Y-axis direction, and the increment in the Z-axis direction may be the same or different.
- η represents the increment.
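- for illustration, a sketch of forming the 3D expansion frame, under the assumptions that the X/Y/Z increments map to the frame's length/width/height and that each increment is added once to the corresponding side:

```python
def expand_frame(box, eta_x=0.2, eta_y=0.2, eta_z=0.2):
    """box: (x, y, z, h, w, l, theta). Returns the 3D expansion frame: same
    center point and direction, each side enlarged by its increment (meters)."""
    x, y, z, h, w, l, theta = box
    return (x, y, z, h + eta_z, w + eta_y, l + eta_x, theta)

expanded = expand_frame((12.3, -0.8, 1.1, 1.6, 1.8, 4.2, 0.3))
```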
- the present disclosure may use a neural network to obtain the feature information of points in a partial area of the point cloud data; for example, all points in the partial area of the point cloud data are used as input to the neural network, and at least one convolutional layer in the neural network processes the point cloud data in the partial area, so that feature information can be formed for each point in the partial area.
- the feature information formed this time may be referred to as local feature information.
- since the feature information formed this time is formed separately for each point in the partial area while considering all the points in the partial area of the point cloud data, the feature information formed this time can be called local feature information.
- Step B1 Perform semantic segmentation on the points in the partial area according to the feature information of the points in the partial area to obtain second semantic information on the points in the partial area.
- the second semantic information of a point in the present disclosure refers to: a semantic feature vector formed for the point in consideration of all points in the spatial range formed by the 3D extension box.
- the second semantic information in this disclosure may be referred to as a second semantic feature or a local spatial semantic feature.
- a local spatial semantic feature can also be expressed in the form of a one-dimensional vector array including multiple (eg, 256) elements.
- a neural network may be used to obtain local spatial semantic features of all points in the 3D expansion box, and a method of using neural networks to obtain local spatial semantic features of points may include the following steps a and b:
- the preset target position of the 3D extension frame may include: the center point of the 3D extension frame (that is, the center point of the 3D initial frame) is located at the origin of coordinates, and the length of the 3D extension frame is parallel to the X axis.
- the above coordinate origin and X axis may be the coordinate origin and X axis of the coordinate system of the point cloud data, and of course, may also be the coordinate origin and X axis of other coordinate systems.
- for example, the i-th 3D initial frame may be expressed as b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i, and z_i represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i, and l_i respectively represent the height, width, and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame, e.g., the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then, after performing coordinate transformation on the 3D expansion frame that contains the i-th 3D initial frame, the present disclosure obtains a new 3D initial frame b̃_i, which can be expressed as b̃_i = (0, 0, 0, h_i, w_i, l_i, 0); that is, the center point of the new 3D initial frame is located at the origin of coordinates, and in a bird's-eye view, the angle between the length of the new 3D initial frame and the X coordinate axis is 0.
- the above coordinate transformation manner of the present disclosure may be referred to as regularized coordinate transformation.
- the present disclosure performs coordinate conversion on a point, and usually only changes the coordinate information of the point, but does not change other information of a point.
- in this way, the coordinates of the points in different 3D initial frames can be concentrated within a rough range, which is beneficial to the training of the neural network, that is, to improving the accuracy with which the neural network forms local spatial semantic features, and thus helps improve the accuracy of the 3D initial frame correction.
- the coordinate transformation method described above is only an optional example, and those skilled in the art may also adopt other transformation methods that transform the coordinates into a certain range.
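- a sketch of the regularized coordinate transformation under an assumed axis convention (Z vertical, rotation in the X-Y ground plane); only the coordinates of the points change:

```python
import numpy as np

def canonical_transform(points, box):
    """Translate the points so the 3D initial frame's center point sits at the
    origin, then rotate by -theta so the frame's length becomes parallel to the
    X axis (bird's-eye-view angle 0). points: (N, 3); box: (x, y, z, h, w, l, theta)."""
    x, y, z, h, w, l, theta = box
    shifted = points - np.array([x, y, z])
    c, s = np.cos(-theta), np.sin(-theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return shifted @ rot.T

pts = np.random.rand(100, 3)  # points inside one 3D expansion frame
local = canonical_transform(pts, (12.3, -0.8, 1.1, 1.6, 1.8, 4.2, 0.3))
```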
- the coordinate-transformed point cloud data (that is, the coordinate-transformed point cloud data located in the 3D expansion frame) is provided to the neural network, and the neural network performs semantic segmentation processing on the received points, generating a local spatial semantic feature for each point located in the 3D expansion frame.
- the present disclosure may form a foreground point mask according to the confidences generated for the foreground points in the above steps (e.g., setting points whose confidence exceeds a predetermined value (such as 0.5) to 1, and setting points whose confidence does not exceed the predetermined value to 0, thereby forming the foreground point mask).
- the present disclosure can provide the foreground point mask and the coordinate-transformed point cloud data together to the neural network, so that the neural network can refer to the foreground point mask when performing semantic processing, which helps improve the description accuracy of the local spatial semantic features.
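- a minimal sketch of forming the foreground point mask and attaching it to the points; treating the mask as an extra per-point input channel is an assumption for illustration:

```python
import numpy as np

confidences = np.array([0.92, 0.10, 0.73, 0.40])       # per-point confidences
mask = (confidences > 0.5).astype(np.float32)           # -> [1., 0., 1., 0.]

points = np.random.rand(4, 3)                           # coordinate-transformed points
augmented = np.concatenate([points, mask[:, None]], 1)  # mask as an extra channel
```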
- Step C1 Form the corrected 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area.
- the method for obtaining the global semantic features of multiple points in the 3D expansion frame in the present disclosure may be: first, according to the coordinate information of each point in the point cloud data, determine whether each point belongs to the spatial range of the 3D expansion frame (i.e., whether it is located in the 3D expansion frame, which may include being located on any surface of the 3D expansion frame); for a point, if its position belongs to the spatial range of the 3D expansion frame, the point can be regarded as a point belonging to the 3D expansion frame, and if not, the point is not regarded as a point belonging to the 3D expansion frame. Then, the global semantic features of these points are determined: for a point belonging to the 3D expansion frame, its global semantic feature can be found from the global semantic features of each point obtained above, and so on; in this way, the present disclosure can obtain the global semantic features of all points in the 3D expansion frame.
- the neural network can process the global semantic features and local semantic features of each point, and obtain the corrected 3D initial frame according to the processing result of the neural network.
- for example, the neural network encodes the global semantic features and local spatial semantic features of the points in the 3D expansion frame to obtain features used to describe the 3D initial frame in the 3D expansion frame; via the neural network, the confidence of the 3D initial frame being the target object is predicted from these features, and the 3D initial frame is adjusted according to the features used to describe it, thereby obtaining the corrected 3D initial frame.
- this helps improve the accuracy of the 3D initial frame, and thus helps improve the accuracy of the 3D detection frame.
- the global semantic feature and the local spatial semantic feature of each point in the 3D expansion frame can be stitched together; that is, for each point, its global semantic feature and its local spatial semantic feature are concatenated to form a stitched semantic feature. The stitched semantic features are used as input to the neural network so that the neural network can encode them; after the encoding processing, the neural network generates features used to describe the 3D initial frame in the 3D expansion frame (hereinafter referred to as the encoded features).
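- the stitching operation amounts to a per-point concatenation, e.g. (the dimensions below are assumed):

```python
import numpy as np

n_points = 128
global_feat = np.random.rand(n_points, 256)  # first semantic (global) features
local_feat = np.random.rand(n_points, 256)   # second semantic (local spatial) features

stitched = np.concatenate([global_feat, local_feat], axis=1)  # shape (128, 512)
```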
- the neural network can predict, for each input encoded feature, the confidence of the corresponding 3D initial frame being the target object, forming a confidence for each 3D initial frame.
- the confidence level can represent the probability that the corrected 3D initial frame is the target object.
- the neural network can form a new 3D initial frame (that is, the corrected 3D initial frame) for each input encoded feature; for example, according to each input encoded feature, the neural network respectively forms the position information of the center point of the new 3D initial frame, the length, width, and height information of the new 3D initial frame, and the direction information of the new 3D initial frame.
- for the process in which the present disclosure performs redundancy processing on all the corrected 3D initial frames to obtain the 3D detection frame of the target object, please refer to the corresponding descriptions above, which will not be repeated here.
- one embodiment of the target object 3D detection method of the present disclosure includes steps: S200 and S210. Each step in FIG. 2 is described in detail below.
- S200 Provide point cloud data to a neural network, perform feature extraction processing on points in the point cloud data via the neural network, perform semantic segmentation processing on the point cloud data according to the extracted feature information to obtain semantic features of multiple points, predict foreground points among the multiple points according to the semantic features, and generate 3D initial frames corresponding to at least some of the multiple points.
- the neural network in the present disclosure is mainly used to generate a 3D initial frame for multiple points in the input point cloud data (such as all points or multiple points in the point cloud data), so that each of the multiple points in the point cloud data corresponds to a 3D initial frame. Since the multiple points (such as each point) in the point cloud data usually contain both foreground points and background points, the 3D initial frames generated by the neural network of the present disclosure usually include: 3D initial frames corresponding to foreground points and 3D initial frames corresponding to background points.
- since the input of the neural network of the present disclosure is point cloud data, the neural network performs feature extraction on the point cloud data and performs semantic segmentation on the point cloud data based on the extracted feature information, which belongs to lower-layer data analysis; and because the neural network of the present disclosure generates the 3D initial frame based on the result of semantic segmentation, which is equivalent to upper-layer data analysis, the present disclosure forms a bottom-up way to generate the 3D detection frame in the process of 3D detection of a target object.
- the neural network of the present disclosure generates the 3D initial frame in a bottom-up manner, which avoids projecting the point cloud data and performing 3D detection frame detection on the image obtained after projection, a process that loses original information of the point cloud data; such loss of original information is not conducive to improving the performance of 3D detection frame detection. The present disclosure can also avoid performing 3D detection frame detection on 2D images captured by a camera device, in which a target object (such as a vehicle or an obstacle) may be blocked, affecting the detection of the 3D detection frame; this phenomenon is likewise not conducive to improving the performance of 3D detection frame detection. It can be seen that generating the 3D initial frame in a bottom-up manner is beneficial to improving the detection performance of the 3D detection frame.
- the neural network in the present disclosure may be divided into multiple parts, and each part may be implemented by a small neural network (also called a neural network unit or a neural network module, etc.); that is, the neural network consists of multiple small neural networks. Since part of the structure of the neural network of the present disclosure can adopt the structure of an RCNN (Regions with Convolutional Neural Network, i.e., regional convolutional neural network), the neural network of the present disclosure can be called Point RCNN (point-based regional convolutional neural network).
- the 3D initial frame generated by the neural network of the present disclosure may include: position information of the center point of the 3D initial frame (such as the coordinates of the center point), length, width, and height information of the 3D initial frame, and direction information of the 3D initial frame (such as the angle between the length of the 3D initial frame and the X coordinate axis), etc.
- the 3D initial frame formed by the present disclosure may also include: position information of the center point of the bottom or top surface of the 3D initial frame, length, width, and height information of the 3D initial frame, and direction information of the 3D initial frame.
- the present disclosure does not limit the specific expression form of the 3D initial frame.
- the neural network of the present disclosure may include: a first neural network, a second neural network, and a third neural network.
- the point cloud data is provided to the first neural network.
- the first neural network is used to: perform feature extraction processing on multiple points (such as all points) in the received point cloud data, so as to form a piece of global feature information separately for each point in the point cloud data, and perform semantic segmentation processing according to the global feature information of the multiple points (such as all points), thereby forming a global semantic feature for each point; the first neural network outputs the global semantic feature of each point.
- the global semantic features of points can usually be expressed in the form of a one-dimensional vector array including multiple (eg, 256) elements.
- the global semantic features in this disclosure may also be referred to as global semantic feature vectors.
- since the points in the point cloud data include foreground points and background points, the information output by the first neural network usually includes: the global semantic features of the foreground points and the global semantic features of the background points.
- the first neural network in the present disclosure may be implemented using Point Cloud Encoder (Point Cloud Data Encoder) and Point Cloud Decoder (Point Cloud Data Decoder).
- the first neural network may use a network structure such as the PointNet++ or PointSIFT network model.
- the second neural network in the present disclosure may be implemented using MLP (Multi-Layer Perceptron), and the output dimension of the MLP used to implement the second neural network may be 1.
- the third neural network in the present disclosure may also be implemented using MLP, and the output dimensions of the MLP used to implement the third neural network are multi-dimensional, and the number of dimensions is related to the information included in the 3D detection frame information.
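- as a rough illustration of the two MLP heads described above, the following PyTorch sketch uses assumed hidden sizes; the 7 output dimensions of the box head (center, size, direction) are a simplification, since the disclosure only states that the output is multi-dimensional and related to the 3D detection frame information:

```python
import torch
import torch.nn as nn

feat_dim = 256  # dimensionality of the global semantic feature (e.g. 256)

# Second neural network: per-point foreground confidence, output dimension 1.
confidence_head = nn.Sequential(
    nn.Linear(feat_dim, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),
)

# Third neural network: multi-dimensional 3D initial frame regression
# (here 7 values: center x/y/z, h/w/l, and direction theta).
box_head = nn.Sequential(
    nn.Linear(feat_dim, 128), nn.ReLU(),
    nn.Linear(128, 7),
)

semantic_features = torch.randn(1024, feat_dim)  # global semantic features of 1024 points
conf = confidence_head(semantic_features)         # (1024, 1) confidences
boxes = box_head(semantic_features)               # (1024, 7) 3D initial frames
```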
- the present disclosure needs to use the global semantic features to realize the prediction of the foreground points and the generation of the 3D initial frame.
- the present disclosure can adopt the following two manners to realize the prediction of the foreground points and the generation of the 3D initial frame.
- Manner 1 The global semantic features of each point output by the first neural network are provided to the second neural network and the third neural network simultaneously (as shown in FIG. 3).
- the second neural network is used to predict, for each input global semantic feature, the confidence of the corresponding point being a foreground point, and to output a confidence for each point.
- the confidence predicted by the second neural network may indicate the probability that the point is a foreground point.
- the third neural network is used to generate a 3D initial frame for the global semantic feature of each input point and output it. For example, the third neural network outputs the position information of the center point of the 3D initial frame, the length, width, and height information of the 3D initial frame, and the direction information of the 3D initial frame for each point according to the global semantic features of each point.
- the 3D initial frames output by the third neural network usually include: 3D initial frames corresponding to foreground points and 3D initial frames corresponding to background points; however, the third neural network itself cannot distinguish whether each output 3D initial frame corresponds to a foreground point or to a background point.
- Manner 2 The global semantic features of each point output by the first neural network are first provided to the second neural network, and the second neural network predicts, for each input global semantic feature, the confidence of the corresponding point being a foreground point; for a point whose confidence exceeds a predetermined value, the global semantic feature of the point is provided to the third neural network (as shown in FIG. 4).
- the third neural network generates a 3D initial frame for each received global semantic feature of a foreground point, and outputs the corresponding 3D initial frame for each foreground point.
- in Manner 2, when the confidence output by the second neural network for a point does not exceed the predetermined value, the present disclosure does not provide the global semantic feature of that point to the third neural network; therefore, all the 3D initial frames output by the third neural network are 3D initial frames corresponding to foreground points.
- for Manner 1, the present disclosure may determine, according to the confidences output by the second neural network, whether each 3D initial frame output by the third neural network corresponds to a foreground point or to a background point. For example, if the confidence of a point exceeds the predetermined value, the point is determined to be a foreground point, so the 3D initial frame output by the third neural network for that point is determined to be a 3D initial frame corresponding to a foreground point; and so on, according to the confidences output by the second neural network, the present disclosure can select the 3D initial frames corresponding to all the foreground points from all the 3D initial frames output by the third neural network. Afterwards, the present disclosure may perform redundancy processing on the 3D initial frames corresponding to all the selected foreground points, thereby obtaining the final 3D detection frame, that is, the 3D detection frame detected from the point cloud data.
- the present disclosure may use the NMS (Non-Maximum Suppression) algorithm to perform redundancy processing on the 3D initial frames corresponding to all currently selected foreground points, thereby removing redundant 3D detection frames that overlap each other, to obtain the final 3D detection frame.
- for Manner 2, since the present disclosure can directly obtain the 3D initial frames corresponding to foreground points from the output of the third neural network, the present disclosure can directly perform redundancy processing on all the 3D initial frames output by the third neural network to obtain the final 3D detection frame, that is, the 3D detection frame detected from the point cloud data (refer to the related description in the above embodiment).
- the present disclosure may use the NMS algorithm to perform redundant processing on all 3D initial frames output by the third neural network, thereby removing redundant 3D initial frames that overlap each other, to obtain a final 3D detection frame.
- the present disclosure can also correct the 3D initial frame corresponding to each foreground point separately, and then perform redundancy processing on the corrected 3D initial frames corresponding to the foreground points to obtain the final 3D detection frame. That is to say, the process of generating the 3D detection frame by the neural network of the present disclosure can be divided into two stages: the 3D initial frames generated by the first-stage neural network are provided to the second-stage neural network; the second-stage neural network corrects the 3D initial frames generated by the first-stage neural network (such as position optimization, etc.); and then the present disclosure determines the final 3D detection frame according to the 3D initial frames corrected by the second-stage neural network.
- the final 3D detection frame is the 3D detection frame detected by the present disclosure based on point cloud data.
- the neural network of the present disclosure may also include only the first-stage neural network and not the second-stage neural network. In the case where the process of generating the 3D initial frame includes only the first-stage neural network, it is also completely feasible for the present disclosure to determine the final 3D detection frame according to the 3D initial frames generated by the first-stage neural network.
- Both the first-stage neural network and the second-stage neural network in this disclosure can be implemented by neural networks that can exist independently, or can be composed of part of the network structural units in a complete neural network. In addition, for ease of description, the networks involved may be called the first neural network, the second neural network, the third neural network, the fourth neural network, the fifth neural network, the sixth neural network, or the seventh neural network, but it should be understood that each of the first to seventh neural networks may be an independent neural network, or may be composed of some network structural units in a larger neural network, which is not limited in this disclosure.
- the process of using the neural network to correct the 3D initial frame corresponding to each foreground point in the present disclosure may include the following steps A2, B2, and C2:
- Step A2 Set a 3D expansion frame containing a 3D initial frame, and obtain global semantic features of points in the 3D expansion frame.
- each 3D initial frame in the present disclosure corresponds to a 3D extension frame, and the space range occupied by the 3D extension frame generally completely covers the space range occupied by the 3D initial frame.
- optionally, no surface of the 3D initial frame lies in the same plane as any surface of its corresponding 3D expansion frame; the center point of the 3D initial frame coincides with the center point of the 3D expansion frame, and each surface of the 3D initial frame is parallel to the corresponding surface of its 3D expansion frame. Of course, the present disclosure does not exclude the case in which the two center points do not coincide but each surface of the 3D initial frame is still parallel to the corresponding surface of its 3D expansion frame.
- based on at least one of a preset X-axis direction increment (such as 20 cm), a Y-axis direction increment (such as 20 cm), and a Z-axis direction increment (such as 20 cm), the present disclosure may expand the 3D initial frame of the foreground point in 3D space, so as to form a 3D expansion frame that contains the 3D initial frame, whose center point coincides with that of the 3D initial frame, and whose surfaces are parallel to the corresponding surfaces of the 3D initial frame.
- η represents the increment.
- the local space in the present disclosure generally refers to: the spatial range formed by the 3D expansion frame.
- the local spatial semantic feature of a point generally refers to a semantic feature vector formed for that point when considering all the points in the spatial range formed by the 3D extension box.
- a local spatial semantic feature can also be expressed in the form of a one-dimensional vector array including multiple (eg, 256) elements.
- the method for obtaining the global semantic features of multiple points in the 3D expansion frame in the present disclosure may be: first, according to the coordinate information of each point in the point cloud data, determine whether each point belongs to the spatial range of the 3D expansion frame (i.e., whether it is located in the 3D expansion frame, which may include being located on any surface of the 3D expansion frame); for a point, if its position belongs to the spatial range of the 3D expansion frame, the point can be regarded as a point belonging to the 3D expansion frame, and if not, the point is not regarded as a point belonging to the 3D expansion frame. Then, the global semantic features of these points are determined: for a point belonging to the 3D expansion frame, its global semantic feature can be found from the global semantic features of each point obtained above, and so on; in this way, the present disclosure can obtain the global semantic features of all points in the 3D expansion frame.
- Step B2 The point cloud data located in the 3D extension box is provided to the fourth neural network in the neural network, and the local spatial semantic features of the points in the 3D extension box are generated via the fourth neural network.
- the method for obtaining the local spatial semantic features of all points in the 3D extension frame in the present disclosure may include the following steps a and b:
- the preset target position of the 3D extension frame may include: the center point of the 3D extension frame (that is, the center point of the 3D initial frame) is located at the origin of coordinates, and the length of the 3D extension frame is parallel to the X axis.
- the above coordinate origin and X axis may be the coordinate origin and X axis of the coordinate system of the point cloud data, and of course, may also be the coordinate origin and X axis of other coordinate systems.
- for example, the i-th 3D initial frame may be expressed as b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i, and z_i represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i, and l_i respectively represent the height, width, and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame, e.g., the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then, after performing coordinate transformation on the 3D expansion frame that contains the i-th 3D initial frame, the present disclosure obtains a new 3D initial frame b̃_i, which can be expressed as b̃_i = (0, 0, 0, h_i, w_i, l_i, 0); that is, the center point of the new 3D initial frame is located at the origin of coordinates, and in a bird's-eye view, the angle between the length of the new 3D initial frame and the X coordinate axis is 0.
- the coordinate-transformed point cloud data (that is, the coordinate-transformed point cloud data in the 3D expansion frame) is provided to the fourth neural network in the neural network, and the fourth neural network performs feature extraction processing on the received points and semantic segmentation processing based on the extracted local feature information, so as to generate a local spatial semantic feature for each point located in the 3D expansion frame.
- similarly, the present disclosure can also form a foreground point mask (such as setting points whose confidence exceeds a predetermined value (such as 0.5) to 1, while points whose confidence does not exceed the predetermined value are set to 0).
- the present disclosure can provide the foreground point mask together with the coordinate-transformed point cloud data to the fourth neural network, so that the fourth neural network can refer to the foreground point mask when performing feature extraction and semantic processing, thereby helping improve the description accuracy of the local spatial semantic features.
- the fourth neural network in the present disclosure may be implemented using MLP, and the output dimensions of the MLP used to implement the fourth neural network are generally multi-dimensional, and the number of dimensions is related to the information included in the local spatial semantic features.
- Step C2 Through the fifth neural network in the neural network, encode the global semantic features and local spatial semantic features of the points in the 3D expansion frame to obtain features describing the 3D initial frame in the 3D expansion frame; predict the confidence of the 3D initial frame according to the features of the 3D initial frame through the sixth neural network in the neural network; and correct the 3D initial frame according to the features of the 3D initial frame through the seventh neural network in the neural network. This helps improve the accuracy of the 3D initial frame, and thus helps improve the accuracy of the 3D detection frame.
- the fifth neural network in the present disclosure may be implemented using Point Cloud Encoder (point cloud data encoder).
- the fifth neural network may adopt a partial network structure of a network model such as PointNet++ or PointSIFT.
- the sixth neural network in the present disclosure may be implemented using MLP, and the output dimension of the MLP used to implement the sixth neural network may be 1, and the number of dimensions may be related to the number of types of target objects.
- the seventh neural network in the present disclosure may also be implemented using MLP, and the output dimensions of the MLP used to implement the seventh neural network are multi-dimensional, and the number of dimensions is related to the information included in the 3D detection frame information.
- the first neural network to the seventh neural network in the present disclosure may all be implemented by a neural network that can exist independently, or by a part of a neural network that cannot exist independently.
- for example, the global semantic feature and the local spatial semantic feature of each point in the 3D expansion frame can be stitched together; that is, for each point, its global semantic feature and its local spatial semantic feature are concatenated to form a stitched semantic feature. The stitched semantic features are provided as input to the fifth neural network, so that the fifth neural network can encode them; the fifth neural network outputs encoded features used to describe the 3D initial frame in the 3D expansion frame (hereinafter referred to as the encoded features).
- the encoded features output by the fifth neural network are simultaneously provided to the sixth neural network and the seventh neural network (as shown in FIG. 5).
- the sixth neural network is used to predict, for each input encoded feature, the confidence of the corresponding 3D initial frame being the target object, and to output a confidence for each 3D initial frame.
- the confidence predicted by the sixth neural network may represent the probability that the corrected 3D initial frame is the target object.
- the target object here may be a vehicle or a pedestrian.
- the seventh neural network is used to form a new 3D initial frame (that is, the corrected 3D initial frame) for each input feature after encoding processing, and output it.
- for example, the seventh neural network respectively outputs, according to each input encoded feature, the position information of the center point of the new 3D initial frame, the length, width, and height information of the new 3D initial frame, and the direction information of the new 3D initial frame, etc.
- the neural network of the present disclosure is obtained by training using multiple point cloud data samples with 3D annotation frames.
- for example, the present disclosure can obtain the loss corresponding to the confidence generated by the neural network to be trained, and the loss of the 3D initial frame generated by the neural network to be trained for the point cloud data sample relative to the 3D annotation frame of the point cloud data sample, and use these losses to adjust the network parameters so that the neural network can be trained.
- the network parameters in this disclosure may include, but are not limited to, convolution kernel parameters and weight values.
- optionally, the present disclosure can obtain the loss corresponding to the confidence generated by the first-stage neural network and the loss corresponding to the 3D initial frame, and use these two losses of the first-stage neural network to adjust the network parameters of the first-stage neural network (such as the first neural network, the second neural network, and the third neural network); after the first-stage neural network is successfully trained, the entire neural network is successfully trained.
- the present disclosure can separately train the first stage neural network and the second stage neural network. For example, first obtain the loss corresponding to the confidence generated by the first stage neural network and the loss corresponding to the 3D initial frame, and use these two losses to adjust the network parameters of the first stage neural network.
- the 3D initial frame corresponding to the front sight output by the first stage neural network is input as the input to the second stage neural network, and the corresponding confidence generated by the second stage neural network is obtained Loss and the corresponding loss of the corrected 3D initial box, and using these two losses of the second-stage neural network, the second-stage neural network (such as the fourth neural network, fifth neural network, sixth neural network, and seventh Neural network) network parameters are adjusted.
- the entire neural network is successfully trained.
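- the separate training schedule described above can be sketched as follows; this is a hedged illustration in which the data loader, the cross-entropy confidence term, and the smooth-L1 term (standing in for the bucket-based losses described later) are all assumptions of the example.

```python
import torch
import torch.nn.functional as F

def train_two_stages(stage1, stage2, loader, epochs=10, lr=1e-3):
    """Hedged sketch: train the first stage, then train the second stage
    on the (frozen) first stage's proposals, as described above."""
    opt1 = torch.optim.Adam(stage1.parameters(), lr=lr)
    for _ in range(epochs):
        for points, fg_mask, gt_boxes in loader:
            fg_score, boxes = stage1(points)
            # Confidence loss plus a regression loss for the 3D initial
            # frames; smooth-L1 stands in for the bucket-based losses.
            loss = F.binary_cross_entropy(fg_score, fg_mask.float()) + \
                   F.smooth_l1_loss(boxes, gt_boxes)
            opt1.zero_grad(); loss.backward(); opt1.step()

    opt2 = torch.optim.Adam(stage2.parameters(), lr=lr)
    for _ in range(epochs):
        for points, fg_mask, gt_boxes in loader:
            with torch.no_grad():              # the first stage is fixed here
                _, proposals = stage1(points)
            conf, refined = stage2(points, proposals)
            loss = F.binary_cross_entropy(conf, fg_mask.float()) + \
                   F.smooth_l1_loss(refined, gt_boxes)
            opt2.zero_grad(); loss.backward(); opt2.step()
```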
- the loss corresponding to the confidence generated by the first-stage neural network in the present disclosure can be expressed by the following formula (1):
- in formula (1), L_reg represents the regression loss function of the 3D detection frame, and N_pos represents the number of foreground points;
- a bucket in the present disclosure may refer to a value interval obtained by dividing the spatial range around a point into a number of ranges; each such range is called a bucket, and each bucket may have a corresponding number. The range covered by a bucket is usually fixed.
- when the range of a bucket is a length, the bucket has a fixed length. For example, the length of a bucket may be 0.5m, so that the value ranges of different buckets may be 0-0.5m and 0.5m-1m.
- when the range of a bucket is an angle range, the bucket has a fixed angle interval. For example, the present disclosure can divide 2π into multiple angle intervals, where one angle interval corresponds to one value range; the size of such a bucket is its angle interval.
- S represents the search distance of the foreground point p along the x-axis or z-axis; that is, in the case that the parameter u is x, S bounds the range, along the x-axis, within which the 3D initial frame generated for the foreground point p is searched.
- C is a constant value, and C may be related to the length of the bucket; for example, C may be equal to the length of the bucket or to half the length of the bucket.
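- as an illustration of this bucket discretization, the following sketch maps a value to the number of the bucket it falls in plus its offset inside that bucket; the bucket sizes are the examples given above and the function name is invented.

```python
# A small sketch of the "bucket" discretization described above: a value is
# encoded as the index of the fixed-size interval it falls in plus its
# offset inside that interval. Bucket sizes are illustrative.
import math

def to_bucket(value, bucket_size, search_range=0.0):
    """Map a residual in [-search_range, ...) to (bucket number, offset)."""
    shifted = value + search_range            # make the value non-negative
    idx = int(shifted // bucket_size)         # bucket number
    offset = shifted - idx * bucket_size      # position inside the bucket
    return idx, offset

# Length buckets of 0.5m: 0-0.5m is bucket 0, 0.5m-1m is bucket 1, ...
print(to_bucket(0.7, bucket_size=0.5))        # -> (1, ~0.2)

# Angle buckets: divide 2*pi into 12 intervals of pi/6 each.
print(to_bucket(1.0, bucket_size=2 * math.pi / 12))
```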
- when a predetermined iteration condition is reached, the training process ends.
- the predetermined iteration conditions in the present disclosure may include: the difference between the 3D initial frame output by the third neural network and the 3D annotation frame of the point cloud data sample meets a predetermined difference requirement, and the confidence output by the second neural network meets a predetermined requirement. When both requirements are met, the first to third neural networks are successfully trained this time.
- the predetermined iteration conditions in the present disclosure may also include: the number of point cloud data samples used to train the first to third neural networks reaches a predetermined number requirement, and so on. If the number of point cloud data samples used reaches the predetermined number requirement but the two requirements above are not both met, the first to third neural networks are not successfully trained this time.
- the first to third neural networks that have been successfully trained can be used for 3D detection of the target object.
- the first to third neural networks that have been successfully trained can also be used to generate, for the point cloud data samples, the 3D initial frames corresponding to the foreground points; that is, the present disclosure can again provide the point cloud data samples to the successfully trained first neural network, and store the information output by the second neural network and the third neural network separately, so as to provide input (that is, the 3D initial frames corresponding to the foreground points) to the second-stage neural network. After that, the loss corresponding to the confidence generated in the second stage and the loss corresponding to the corrected 3D initial frame are obtained, and the obtained losses are used to adjust the network parameters of the fourth to seventh neural networks; after the fourth to seventh neural networks are successfully trained, the entire neural network is successfully trained.
- the loss function used for adjusting the network parameters of the fourth to seventh neural networks in the second-stage neural network in the present disclosure includes the loss corresponding to the confidence and the loss corresponding to the corrected 3D initial frame, and can be expressed by the following formula (9):
- in formula (9), B represents the set of 3D initial frames, and |B| represents the number of 3D initial frames in the set;
- B_pos is a subset of B in which the overlap between each 3D initial frame and the corresponding 3D annotation frame exceeds a set threshold, and |B_pos| represents the number of 3D initial frames in the subset;
- the remaining symbols denote, respectively, the information of the i-th 3D annotation frame and the i-th 3D annotation frame after coordinate conversion; (x_i, y_i, z_i, h_i, w_i, l_i, θ_i) denotes the i-th corrected 3D initial frame, and its coordinate-converted counterpart denotes the i-th 3D initial frame after coordinate conversion;
- δ represents the size of a bucket, that is, the angle interval of the bucket.
- when a predetermined iteration condition is reached, the training process ends.
- the predetermined iteration conditions in the present disclosure may include: the difference between the 3D initial frame output by the seventh neural network and the 3D annotation frame of the point cloud data sample meets a predetermined difference requirement, and the confidence output by the sixth neural network meets a predetermined requirement. When both requirements are met, the fourth to seventh neural networks are successfully trained this time.
- the predetermined iteration conditions in the present disclosure may also include: the number of point cloud data samples used to train the fourth to seventh neural networks reaches a predetermined number requirement, and so on. If the number of point cloud data samples used reaches the predetermined number requirement but the two requirements above are not both met, the fourth to seventh neural networks are not successfully trained this time.
- FIG. 6 is a flowchart of an embodiment of a vehicle intelligent control method of the present disclosure.
- the method of this embodiment includes steps: S600, S610, S620, S630, S640, and S650. Each step in FIG. 6 will be described in detail below.
- S600: Extract feature information of the point cloud data of the acquired scene.
- S610: Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
- S620: Predict at least one foreground point corresponding to the target object among the multiple points according to the first semantic information.
- S630: Generate a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
- S640: Determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
- the above S600-S640 can be implemented by providing the point cloud data to a neural network: the neural network extracts feature information from the points in the point cloud data, performs semantic segmentation based on the extracted feature information to obtain the semantic features of multiple points, predicts the foreground points among the multiple points according to the semantic features, and generates 3D initial frames corresponding to at least some of the multiple points.
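- as an illustration, this first-stage pipeline can be sketched as follows; the tiny shared MLP stands in for the real backbone, and every name and size is an assumption of the example.

```python
# A hedged sketch of the S600-S640 pipeline: per-point features, semantic
# segmentation into foreground/background, and one 3D initial frame per
# predicted foreground point.
import torch
import torch.nn as nn

class Stage1(nn.Module):
    def __init__(self, in_dim=3, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(          # S600: per-point features
            nn.Linear(in_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.seg_head = nn.Linear(feat_dim, 1)  # S610/S620: foreground score
        self.box_head = nn.Linear(feat_dim, 7)  # S630: per-point 3D initial frame

    def forward(self, points):                  # points: (N, 3)
        feats = self.backbone(points)           # first semantic information
        fg_score = torch.sigmoid(self.seg_head(feats)).squeeze(-1)
        boxes = self.box_head(feats)            # (N, 7): center, size, yaw
        keep = fg_score > 0.5                   # predicted foreground points
        return fg_score, boxes[keep]            # S640 then filters/refines these

pts = torch.randn(1024, 3)
scores, proposals = Stage1()(pts)
```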
- S650: Generate an instruction or warning prompt information for controlling the vehicle according to the 3D detection frame.
- the present disclosure may first determine at least one of the following information of the target object according to the 3D detection frame: the spatial position of the target object in the scene, its size, its distance to the vehicle, and its orientation relative to the vehicle. Then, according to the determined at least one piece of information, an instruction or warning prompt information for controlling the vehicle is generated.
- the instructions generated by the present disclosure are, for example, an instruction to increase the speed, an instruction to decrease the speed, or an emergency braking instruction.
- the generated warning prompt information may be, for example, a prompt to pay attention to a target object such as a vehicle or pedestrian at a certain position.
- the present disclosure does not limit the specific implementation of generating instructions or warning prompt information according to the 3D detection frame.
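- purely as an example of one such implementation, the following sketch maps the information derived from a 3D detection frame to an instruction or warning; the thresholds, labels, and class are invented for the illustration.

```python
# An illustrative (non-limiting) decision rule turning a 3D detection frame
# into a control instruction or warning of the kinds mentioned above.
from dataclasses import dataclass

@dataclass
class Detection3D:
    distance_m: float      # distance from the ego vehicle
    bearing_deg: float     # relative orientation of the target object
    label: str             # e.g. "vehicle" or "pedestrian"

def decide(det: Detection3D) -> str:
    in_path = abs(det.bearing_deg) < 15.0
    if in_path and det.distance_m < 5.0:
        return "emergency_brake"
    if in_path and det.distance_m < 20.0:
        return "decrease_speed"
    if det.distance_m < 30.0:
        return f"warning: {det.label} at bearing {det.bearing_deg:.0f} deg"
    return "no_action"

print(decide(Detection3D(distance_m=12.0, bearing_deg=3.0, label="pedestrian")))
```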
- FIG. 7 is a flowchart of an embodiment of an obstacle avoidance navigation method of the present disclosure.
- the method in this embodiment includes steps: S700, S710, S720, S730, S740, and S750. Next, each step in FIG. 7 will be described in detail.
- S700: Extract feature information of the point cloud data of the acquired scene.
- S710: Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
- S720: Predict at least one foreground point corresponding to the target object among the multiple points according to the first semantic information.
- S730: Generate a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
- S740: Determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
- the above S700-S740 may be implemented by providing the point cloud data to a neural network: the neural network extracts feature information from the points in the point cloud data, performs semantic segmentation based on the extracted feature information to obtain the semantic features of multiple points, predicts the foreground points among the multiple points according to the semantic features, and generates 3D initial frames corresponding to at least some of the multiple points.
- S750: According to the 3D detection frame, generate an instruction or warning prompt information for performing obstacle avoidance navigation control on the robot where the lidar is located.
- the present disclosure may first determine at least one of the following information of the target object according to the 3D detection frame: the spatial position of the target object in the scene, its size, its distance to the robot, and its orientation relative to the robot. Then, according to the determined at least one piece of information, an instruction or warning prompt information for performing obstacle avoidance navigation control on the robot is generated.
- the instructions generated by the present disclosure are, for example, an instruction to reduce the speed of an action, an instruction to suspend an action, or a turn instruction.
- the generated warning prompt information may be, for example, a prompt to pay attention to an obstacle (that is, a target object) in a certain direction.
- the present disclosure does not limit the specific implementation of generating instructions or warning prompt information according to the 3D detection frame.
- FIG. 8 is a schematic structural diagram of an embodiment of a target object 3D detection device of the present disclosure.
- the device shown in FIG. 8 includes: a feature extraction module 800, a first semantic segmentation module 810, a foreground point prediction module 820, an initial frame generation module 830, and a detection frame determination module 840.
- the feature extraction module 800 is mainly used to extract feature information of point cloud data of the acquired scene.
- the first semantic segmentation module 810 is mainly used to perform semantic segmentation processing on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
- the foreground point prediction module 820 is mainly used to predict at least one foreground point corresponding to the target object among the multiple points according to the first semantic information.
- the initial frame generation module 830 is mainly used to generate a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
- the detection frame determination module 840 is mainly used to determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
- the detection frame determination module 840 may include: a first submodule, a second submodule, and a third submodule.
- the first sub-module is mainly used to obtain characteristic information of points in a partial area in the point cloud data, where the partial area includes at least one of the 3D initial frames.
- the second sub-module is mainly used for semantically segmenting the points in the partial area according to the feature information of the points in the partial area to obtain second semantic information of the points in the partial area.
- the third sub-module is mainly used to determine the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area.
- the third submodule in the present disclosure may include: a fourth submodule and a fifth submodule.
- the fourth sub-module is mainly used to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area to obtain the corrected 3D initial frame.
- the fifth sub-module is mainly used to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame.
- the third submodule in the present disclosure may be further used to determine, according to the first semantic information and the second semantic information of the points in the partial area, the confidence that the 3D initial frame corresponds to the target object, and to determine the 3D detection frame of the target object in the scene according to the 3D initial frame and its confidence.
- the third submodule in the present disclosure may include: a fourth submodule, a sixth submodule, and a seventh submodule.
- the fourth sub-module is mainly used to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area to obtain the corrected 3D initial frame.
- the sixth sub-module is mainly used to determine the confidence of the target object corresponding to the corrected 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area.
- the seventh sub-module is mainly used to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame and its confidence.
- the partial area in the present disclosure may include: a 3D expansion frame obtained by edge-expanding the 3D initial frame according to a predetermined strategy.
- the 3D expansion frame may be formed by expanding the 3D initial frame in 3D space according to a preset X-axis direction increment, Y-axis direction increment, and/or Z-axis direction increment, so that the resulting 3D expansion frame contains the 3D initial frame.
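- a minimal sketch of this expansion, assuming an axis-aligned (min, max)-corner box representation and illustrative increments:

```python
# A minimal sketch of forming the 3D expansion frame by enlarging an
# axis-aligned 3D initial frame with preset per-axis increments; the
# (min, max) corner representation is an assumption for illustration.
def expand_box(box_min, box_max, dx=0.5, dy=0.5, dz=0.5):
    """box_min/box_max: (x, y, z) corners of the 3D initial frame."""
    expanded_min = (box_min[0] - dx, box_min[1] - dy, box_min[2] - dz)
    expanded_max = (box_max[0] + dx, box_max[1] + dy, box_max[2] + dz)
    return expanded_min, expanded_max   # the 3D expansion frame contains the box

print(expand_box((0.0, 0.0, 0.0), (3.9, 1.6, 1.5)))
```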
- the second submodule in the present disclosure may include: an eighth submodule and a ninth submodule.
- the eighth submodule is mainly used to perform, according to the preset target position of the 3D expansion frame, coordinate transformation on the coordinate information of the points located in the 3D expansion frame in the point cloud data, to obtain the feature information of the coordinate-transformed points.
- the ninth submodule is mainly used to perform semantic segmentation based on the 3D expansion frame according to the feature information of the coordinate-transformed points, to obtain the second semantic features of the points in the 3D expansion frame.
- the ninth submodule may perform the semantic segmentation based on the 3D expansion frame according to the mask of the foreground points and the feature information of the coordinate-transformed points, to obtain the second semantic features of the points.
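- the coordinate transformation into the local frame of the 3D expansion frame can be sketched as follows, assuming a box parametrized by its center and yaw; numpy and the parametrization are assumptions of the example.

```python
# A sketch of the coordinate transformation described above: points inside
# the 3D expansion frame are translated (and here also rotated) into a
# canonical frame anchored at a preset target position of the box.
import numpy as np

def to_canonical(points, box_center, box_yaw):
    """points: (N, 3); returns the points expressed in the box's local frame."""
    shifted = points - box_center                 # move the box center to the origin
    c, s = np.cos(-box_yaw), np.sin(-box_yaw)     # undo the box heading
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return shifted @ rot.T

pts = np.array([[1.0, 2.0, 0.5], [2.0, 2.5, 0.4]])
print(to_canonical(pts, box_center=np.array([1.5, 2.0, 0.0]), box_yaw=0.3))
```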
- the detection frame determination module 840 in the present disclosure may first determine the degree of overlap between the 3D initial frames corresponding to the multiple foreground points, then screen out the 3D initial frames whose degree of overlap exceeds a set threshold, and finally determine the 3D detection frame of the target object in the scene according to the filtered 3D initial frames.
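- this screening is essentially a non-maximum-suppression rule; a hedged sketch is given below, using a bird's-eye-view IoU over axis-aligned rectangles as an illustrative overlap measure.

```python
# Keep the highest-confidence frame and drop others that overlap it beyond
# a threshold (a standard NMS-style rule). The bird's-eye-view IoU over
# axis-aligned rectangles is an illustrative assumption.
def bev_iou(a, b):
    """a, b: (x_min, z_min, x_max, z_max) bird's-eye-view rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iz = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iz
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def screen_boxes(boxes, scores, thresh=0.8):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(bev_iou(boxes[i], boxes[j]) <= thresh for j in kept):
            kept.append(i)
    return kept  # indices of the filtered 3D initial frames

print(screen_boxes([(0, 0, 2, 4), (0.1, 0, 2.1, 4)], [0.9, 0.7]))  # -> [0]
```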
- the feature extraction module 800, the first semantic segmentation module 810, the foreground point prediction module 820, and the initial frame generation module 830 in the present disclosure may be implemented by a first-stage neural network.
- the device of the present disclosure may further include a first training module.
- the first training module is used to train the first-stage neural network to be trained using point cloud data samples with 3D annotation frames.
- the process of the first training module training the first stage neural network includes:
- the first training module provides the point cloud data samples to the first-stage neural network; the first-stage neural network extracts the feature information of the point cloud data samples, performs semantic segmentation on the point cloud data samples according to the extracted feature information, predicts at least one foreground point corresponding to the target object among the multiple points according to the first semantic features of the multiple points obtained by the semantic segmentation, and generates a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
- the first training module obtains the loss corresponding to the foreground points and the loss formed by the 3D initial frame relative to the corresponding 3D annotation frame, and adjusts the network parameters in the first-stage neural network according to these losses.
- the first training module may determine the first loss, corresponding to the foreground point prediction result, according to the confidence of the foreground points predicted by the first-stage neural network.
- the first training module generates a second loss according to the number of the bucket in which the parameters in the 3D initial frame generated for the foreground point are located and the number of the bucket in which the parameters in the 3D annotation frame information of the point cloud data sample are located.
- the first training module generates a third loss according to the offsets, within the corresponding buckets, of the parameters in the 3D initial frame generated for the foreground point and the offsets, within the corresponding buckets, of the parameters in the 3D annotation frame information of the point cloud data sample.
- the first training module generates a fourth loss according to the offsets of the parameters in the 3D initial frame generated for the foreground point relative to predetermined parameters.
- the first training module generates a fifth loss according to the offset of the coordinate parameters of the foreground point relative to the coordinate parameters of the 3D initial frame generated for that foreground point.
- the first training module adjusts the network parameters of the first-stage neural network according to the first loss, the second loss, the third loss, the fourth loss, and the fifth loss.
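- combining the five losses can be sketched as a weighted sum; the weights and the function name below are illustrative assumptions.

```python
# A sketch of combining the five first-stage losses described above into a
# single training objective; the weights are illustrative assumptions.
def stage1_total_loss(l_conf, l_bucket_id, l_bucket_offset, l_size, l_center,
                      weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """First to fifth losses, in the order listed above."""
    losses = (l_conf, l_bucket_id, l_bucket_offset, l_size, l_center)
    return sum(w * l for w, l in zip(weights, losses))

print(stage1_total_loss(0.3, 0.8, 0.1, 0.05, 0.2))
```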
- the first submodule, the second submodule, and the third submodule in the present disclosure are implemented by a second-stage neural network.
- the device of the present disclosure further includes a second training module, and the second training module is used to train the second-stage neural network to be trained using point cloud data samples with 3D annotation frames.
- the process of the second training module training the second-stage neural network includes:
- the second training module provides the 3D initial frames obtained with the first-stage neural network to the second-stage neural network; the second-stage neural network obtains the feature information of the points in a partial area of the point cloud data sample, semantically segments the points in the partial area according to that feature information to obtain the second semantic features of the points in the partial area, determines the confidence that the 3D initial frame corresponds to the target object based on the first semantic features and the second semantic features of the points in the partial area, and generates a position-corrected 3D initial frame based on those first and second semantic features.
- the second training module obtains the loss corresponding to the confidence that the 3D initial frame corresponds to the target object and the loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame, and adjusts the network parameters in the second-stage neural network according to the obtained losses.
- the second training module may determine a sixth loss, corresponding to the prediction result, according to the confidence, predicted by the second-stage neural network, that the 3D initial frame corresponds to the target object.
- the second training module generates a seventh loss according to the number of the bucket in which the parameters of a position-corrected 3D initial frame (generated by the second-stage neural network and whose overlap with the corresponding 3D annotation frame exceeds a set threshold) are located, and the number of the bucket in which the parameters in the 3D annotation frame information of the point cloud data sample are located.
- the second training module generates an eighth loss according to the offsets, within the corresponding buckets, of the parameters of such a position-corrected 3D initial frame and the offsets, within the corresponding buckets, of the parameters in the 3D annotation frame information of the point cloud data sample.
- the second training module generates a ninth loss according to the offsets of the parameters of such a position-corrected 3D initial frame relative to predetermined parameters.
- the second training module generates a tenth loss according to the offset of the coordinate parameters of such a position-corrected 3D initial frame relative to the coordinate parameters of the center point of the corresponding 3D annotation frame.
- the second training module adjusts the network parameters of the second-stage neural network according to the sixth loss, the seventh loss, the eighth loss, the ninth loss, and the tenth loss.
- FIG. 9 is a schematic structural diagram of an embodiment of a vehicle intelligent control device of the present disclosure.
- the device of this embodiment includes: a target object 3D detection device 900 and a first control module 910.
- the target object 3D detection device 900 is used to obtain a 3D detection frame of the target object based on the point cloud data.
- the specific structure and specific operations of the target object 3D detection device 900 are as described in the above device and method embodiments, and will not be described in detail here.
- the first control module 910 is mainly used to generate an instruction or early warning information for controlling the vehicle according to the 3D detection frame. For details, reference may be made to the relevant description in the above method embodiment, and no more detailed description is provided here.
- FIG. 10 is a schematic structural diagram of an embodiment of an obstacle avoidance navigation device of the present disclosure.
- the device of this embodiment includes: a target object 3D detection device 1000 and a second control module 1010.
- the target object 3D detection device 1000 is used to obtain a 3D detection frame of the target object based on the point cloud data.
- the specific structure and specific operations of the target object 3D detection device 1000 are as described in the above-mentioned device and method embodiments, and will not be described in detail here.
- the second control module 1010 is mainly used to generate instructions or warning prompt information for performing obstacle avoidance navigation control on the robot according to the 3D detection frame. For details, reference may be made to the relevant description in the above method embodiment, and no more detailed description is provided here.
- FIG. 11 shows an exemplary device 1100 suitable for implementing the present disclosure.
- the device 1100 may be a control system/electronic system configured in a car, a mobile terminal (e.g., a smartphone), a personal computer (PC, e.g., a desktop or notebook computer), a tablet computer, or a server.
- the device 1100 includes one or more processors, a communication part, and the like. The one or more processors may be one or more central processing units (CPUs) 1101 and/or one or more graphics processing units (GPUs) 1113. The processor may execute executable instructions stored in a read-only memory (ROM) 1102 or loaded from the storage section 1108 into a random access memory (RAM) 1103.
- the communication part 1112 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card.
- the processor can communicate with the read-only memory 1102 and/or the random access memory 1103 to execute executable instructions, connect to the communication part 1112 through the bus 1104, and communicate with other target devices via the communication part 1112, thereby completing the corresponding steps in the present disclosure.
- the ROM 1102 is an optional module.
- the RAM 1103 stores executable instructions, or writes executable instructions to the ROM 1102 at runtime.
- the executable instructions cause the central processing unit 1101 to perform the steps included in the target object 3D detection method.
- An input/output (I/O) interface 1105 is also connected to the bus 1104.
- the communication part 1112 may be provided as an integrated unit, or may be provided as multiple sub-modules (for example, multiple IB network cards) that are respectively connected to the bus.
- the following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card or a modem.
- the communication section 1109 performs communication processing via a network such as the Internet.
- a drive 1110 is also connected to the I/O interface 1105 as necessary.
- a removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed on the drive 1110 as necessary, so that the computer program read out therefrom is installed in the storage portion 1108 as needed.
- FIG. 11 shows only an optional implementation.
- the number and types of the components in FIG. 11 can be selected, deleted, added, or replaced according to actual needs;
- the GPU 1113 and the CPU 1101 may be provided separately, or the GPU 1113 may be integrated on the CPU 1101; likewise, the communication part 1112 may be provided separately, or may be integrated on the CPU 1101 or the GPU 1113.
- the embodiments of the present disclosure include a computer program product that includes a computer program tangibly contained on a machine-readable medium.
- the computer program includes program code for performing the steps shown in the flowchart, and the program code may include instructions corresponding to the steps in the method provided by the present disclosure.
- the computer program may be downloaded and installed from a network through the communication section 1109, and/or installed from the removable medium 1111.
- the embodiments of the present disclosure also provide a computer program product for storing computer-readable instructions, which, when executed, cause a computer to perform the target object 3D detection method described in any of the above embodiments.
- the computer program product may be implemented in hardware, software, or a combination thereof.
- the computer program product may be embodied as a computer storage medium, or as a software product, such as a software development kit (Software Development Kit, SDK).
- the embodiments of the present disclosure also provide another target object 3D detection method, together with its corresponding apparatus, electronic device, computer storage medium, computer program, and computer program product. This 3D detection method includes: a first device sends a target object 3D detection instruction to a second device, the instruction causing the second device to perform the target object 3D detection method in any of the above possible embodiments; and the first device receives the 3D detection result of the target object.
- the target object 3D detection instruction may specifically be a call instruction; the first device may instruct, by calling, the second device to perform the target object 3D detection operation. Accordingly, in response to receiving the call instruction, the second device may perform the steps and/or processes in any of the embodiments of the above target object 3D detection method.
- "first" and "second" in the embodiments of the present disclosure are only for distinction and should not be construed as limiting the embodiments of the present disclosure.
- "plurality" may refer to two or more, and "at least one" may refer to one, two, or more than two.
- any component, data, or structure mentioned in the present disclosure can be generally understood as one or more, unless it is explicitly defined or given the opposite enlightenment in the context.
- description of the embodiments of the present disclosure emphasizes the differences between the embodiments, and the same or similarities can be referred to each other, and for the sake of brevity, they will not be described one by one.
- the method and apparatus, electronic device, and computer-readable storage medium of the present disclosure may be implemented in many ways.
- the method and apparatus, electronic device, and computer-readable storage medium of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
- the above order of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated.
- the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
- the present disclosure also covers the recording medium storing the program for executing the method according to the present disclosure.
Claims (43)
- 1. A target object 3D detection method, comprising: extracting feature information of point cloud data of an acquired scene; performing semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; predicting, according to the first semantic information, at least one foreground point corresponding to a target object among the multiple points; generating, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point; and determining a 3D detection frame of the target object in the scene according to the 3D initial frame.
- 2. The method according to claim 1, wherein the determining the 3D detection frame of the target object in the scene according to the 3D initial frame comprises: acquiring feature information of points in a partial area of the point cloud data, wherein the partial area includes at least one 3D initial frame; performing semantic segmentation on the points in the partial area according to the feature information of the points in the partial area to obtain second semantic information of the points in the partial area; and determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area.
- 3. The method according to claim 2, wherein the determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area comprises: correcting the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area to obtain a corrected 3D initial frame; and determining the 3D detection frame of the target object in the scene according to the corrected 3D initial frame.
- 4. The method according to claim 2, wherein the determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area comprises: determining, according to the first semantic information and the second semantic information of the points in the partial area, a confidence that the 3D initial frame corresponds to the target object; and determining the 3D detection frame of the target object in the scene according to the 3D initial frame and its confidence.
- 5. The method according to claim 2, wherein the determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area comprises: correcting the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area to obtain a corrected 3D initial frame; determining, according to the first semantic information and the second semantic information of the points in the partial area, a confidence that the corrected 3D initial frame corresponds to the target object; and determining the 3D detection frame of the target object in the scene according to the corrected 3D initial frame and its confidence.
- 6. The method according to any one of claims 2 to 5, wherein the partial area comprises a 3D expansion frame obtained by edge-expanding the 3D initial frame according to a predetermined strategy.
- 7. The method according to claim 6, wherein the 3D expansion frame comprises: a 3D expansion frame containing the 3D initial frame, formed by expanding the 3D initial frame in 3D space according to a preset X-axis direction increment, Y-axis direction increment, and/or Z-axis direction increment.
- 8. The method according to claim 6 or 7, wherein the performing semantic segmentation on the points in the partial area according to the feature information of the points in the partial area to obtain the second semantic information of the points in the partial area comprises: performing, according to a preset target position of the 3D expansion frame, coordinate transformation on the coordinate information of the points located in the 3D expansion frame in the point cloud data to obtain feature information of the coordinate-transformed points; and performing semantic segmentation based on the 3D expansion frame according to the feature information of the coordinate-transformed points to obtain second semantic features of the points in the 3D expansion frame.
- 9. The method according to claim 8, wherein the performing semantic segmentation based on the 3D expansion frame according to the feature information of the coordinate-transformed points comprises: performing semantic segmentation based on the 3D expansion frame according to a mask of the foreground points and the feature information of the coordinate-transformed points.
- 10. The method according to claim 1, wherein there are multiple foreground points, and the determining the 3D detection frame of the target object in the scene according to the 3D initial frame comprises: determining a degree of overlap between the 3D initial frames corresponding to the multiple foreground points; screening the 3D initial frames whose degree of overlap is greater than a set threshold; and determining the 3D detection frame of the target object in the scene according to the screened 3D initial frames.
- 11. The method according to any one of claims 1 to 10, wherein the extracting of the feature information of the point cloud data of the acquired scene, the performing of the semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain the first semantic information of the multiple points in the point cloud data, the predicting, according to the first semantic information, of the at least one foreground point corresponding to the target object among the multiple points, and the generating, according to the first semantic information, of the 3D initial frame corresponding to each of the at least one foreground point are implemented by a first-stage neural network; and the first-stage neural network is obtained by training with point cloud data samples carrying 3D annotation frames.
- 12. The method according to claim 11, wherein the training process of the first-stage neural network comprises: providing a point cloud data sample to the first-stage neural network, extracting feature information of the point cloud data sample based on the first-stage neural network, performing semantic segmentation on the point cloud data sample according to the feature information of the point cloud data sample, predicting, according to first semantic features of multiple points obtained by the semantic segmentation, at least one foreground point corresponding to the target object among the multiple points, and generating, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point; and obtaining a loss corresponding to the foreground points and a loss formed by the 3D initial frame relative to the corresponding 3D annotation frame, and adjusting network parameters in the first-stage neural network according to the losses.
- 13. The method according to claim 12, wherein the obtaining of the losses and the adjusting of the network parameters in the first-stage neural network comprise: determining, according to the confidence of the foreground points predicted by the first-stage neural network, a first loss corresponding to the foreground point prediction result; generating a second loss according to the number of the bucket in which the parameters in the 3D initial frame generated for the foreground point are located and the number of the bucket in which the parameters in the 3D annotation frame information of the point cloud data sample are located; generating a third loss according to the offsets, within the corresponding buckets, of the parameters in the 3D initial frame generated for the foreground point and the offsets, within the corresponding buckets, of the parameters in the 3D annotation frame information of the point cloud data sample; generating a fourth loss according to the offsets of the parameters in the 3D initial frame generated for the foreground point relative to predetermined parameters; generating a fifth loss according to the offset of the coordinate parameters of the foreground point relative to the coordinate parameters of the 3D initial frame generated for that foreground point; and adjusting the network parameters of the first-stage neural network according to the first loss, the second loss, the third loss, the fourth loss, and the fifth loss.
- 14. The method according to any one of claims 2 to 9, wherein the acquiring of the feature information of the points in the partial area of the point cloud data, the performing of the semantic segmentation on the points in the partial area according to the feature information of the points in the partial area to obtain the second semantic information of the points in the partial area, and the determining of the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area are implemented by a second-stage neural network; and the second-stage neural network is obtained by training with point cloud data samples carrying 3D annotation frames.
- 15. The method according to claim 14, wherein the training process of the second-stage neural network comprises: providing the 3D initial frame to the second-stage neural network, acquiring, based on the second-stage neural network, feature information of points in a partial area of the point cloud data sample, performing semantic segmentation on the points in the partial area of the point cloud data sample according to that feature information to obtain second semantic features of the points in the partial area of the point cloud data sample, determining, according to the first semantic features and the second semantic features of the points in the partial area of the point cloud data sample, a confidence that the 3D initial frame corresponds to the target object, and generating a position-corrected 3D initial frame according to the first semantic features and the second semantic features of the points in the partial area of the point cloud data sample; and obtaining a loss corresponding to the confidence that the 3D initial frame corresponds to the target object and a loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame, and adjusting network parameters in the second-stage neural network according to the losses.
- 16. The method according to claim 15, wherein the obtaining of the losses and the adjusting of the network parameters in the second-stage neural network comprise: determining a sixth loss corresponding to the prediction result according to the confidence, predicted by the second-stage neural network, that the 3D initial frame corresponds to the target object; generating a seventh loss according to the number of the bucket in which the parameters of a position-corrected 3D initial frame, generated by the second-stage neural network and whose overlap with the corresponding 3D annotation frame exceeds a set threshold, are located and the number of the bucket in which the parameters in the 3D annotation frame information of the point cloud data sample are located; generating an eighth loss according to the offsets, within the corresponding buckets, of the parameters of such a position-corrected 3D initial frame and the offsets, within the corresponding buckets, of the parameters in the 3D annotation frame information of the point cloud data sample; generating a ninth loss according to the offsets of the parameters of such a position-corrected 3D initial frame relative to predetermined parameters; generating a tenth loss according to the offset of the coordinate parameters of such a position-corrected 3D initial frame relative to the coordinate parameters of the center point of the 3D annotation frame; and adjusting the network parameters of the second-stage neural network according to the sixth loss, the seventh loss, the eighth loss, the ninth loss, and the tenth loss.
- 17. A vehicle intelligent control method, comprising: obtaining a 3D detection frame of a target object by using the target object 3D detection method according to any one of claims 1 to 16; and generating, according to the 3D detection frame, an instruction or warning prompt information for controlling a vehicle.
- 18. The method according to claim 17, wherein the generating, according to the 3D detection frame, of the instruction or warning prompt information for controlling the vehicle comprises: determining, according to the 3D detection frame, at least one of the following information of the target object: a spatial position of the target object in the scene, a size, a distance to the vehicle, and relative orientation information with respect to the vehicle; and generating, according to the determined at least one piece of information, an instruction or warning prompt information for controlling the vehicle.
- 19. An obstacle avoidance navigation method, comprising: obtaining a 3D detection frame of a target object by using the target object 3D detection method according to any one of claims 1 to 16; and generating, according to the 3D detection frame, an instruction or warning prompt information for performing obstacle avoidance navigation control on a robot.
- 20. The method according to claim 19, wherein the generating, according to the 3D detection frame, of the instruction or warning prompt information for performing obstacle avoidance navigation control on the robot comprises: determining, according to the 3D detection frame, at least one of the following information of the target object: a spatial position of the target object in the scene, a size, a distance to the robot, and relative orientation information with respect to the robot; and generating, according to the determined at least one piece of information, an instruction or warning prompt information for performing obstacle avoidance navigation control on the robot.
- 一种目标对象3D检测装置,其特征在于,包括:A target object 3D detection device, characterized in that it includes:提取特征模块,用于提取获取到的场景的点云数据的特征信息;Feature extraction module, used to extract the feature information of the acquired point cloud data of the scene;第一语义分割模块,用于根据所述点云数据的特征信息对所述点云数据进行语义分割,获得所述点云数据中的多个点的第一语义信息;A first semantic segmentation module, configured to perform semantic segmentation on the point cloud data according to the feature information of the point cloud data, to obtain first semantic information of multiple points in the point cloud data;预测前景点模块,用于根据所述第一语义信息预测所述多个点中对应目标对象的至少一个前景点;A pre-prediction point module for predicting at least one pre-point of sight corresponding to the target object in the plurality of points according to the first semantic information;生成初始框模块,用于根据所述第一语义信息生成所述至少一个前景点各自对应的3D初始框;Generating an initial frame module, configured to generate a 3D initial frame corresponding to each of the at least one front sight according to the first semantic information;确定检测框模块,用于根据所述3D初始框确定所述场景中的所述目标对象的3D检测框。The detection frame determination module is configured to determine a 3D detection frame of the target object in the scene according to the 3D initial frame.
- 根据权利要求21所述的装置,所述确定检测框模块,进一步包括:The apparatus of claim 21, the determination detection frame module further comprising:第一子模块,用于获取所述点云数据中的部分区域内的点的特征信息,其中,所述部分区域至少包括一所述3D初始框;The first sub-module is used to obtain feature information of points in a partial area in the point cloud data, wherein the partial area includes at least one initial 3D frame;第二子模块,用于根据所述部分区域内的点的特征信息对所述部分区域内的点进行语义分割,获得所述部分区域内的点的第二语义信息;A second sub-module for semantically segmenting the points in the partial area according to the feature information of the points in the partial area to obtain second semantic information of the points in the partial area;第三子模块,用于根据所述部分区域内的点的第一语义信息和第二语义信息,确定所述场景中的所述目标对象的3D检测框。The third submodule is used to determine the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area.
- 根据权利要求22所述的装置,所述第三子模块包括:The apparatus according to claim 22, the third submodule comprises:第四子模块,用于根据所述部分区域内的点的第一语义信息和第二语义信息,校正所述3D初始框,得到校正后的3D初始框;A fourth submodule, configured to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area, to obtain a corrected 3D initial frame;第五子模块,用于根据校正后的3D初始框确定所述场景中的所述目标对象的3D检测框。The fifth sub-module is used to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame.
- 根据权利要求22所述的装置,所述第三子模块,进一步用于:The apparatus according to claim 22, the third submodule is further used for:根据所述部分区域内的点的第一语义信息和第二语义信息,确定所述3D初始框对应目标对象的置信度;Determine the confidence of the target object corresponding to the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area;根据所述3D初始框及其置信度确定所述场景中的所述目标对象的3D检测框。The 3D detection frame of the target object in the scene is determined according to the 3D initial frame and its confidence.
- 根据权利要求22所述的装置,所述第三子模块包括:The apparatus according to claim 22, the third submodule comprises:第四子模块,用于根据所述部分区域内的点的第一语义信息和第二语义信息,校正所述3D初始框,得到校正后的3D初始框;A fourth submodule, configured to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area, to obtain a corrected 3D initial frame;第六子模块,用于根据所述部分区域内的点的第一语义信息和第二语义信息确定所述校正后的3D初始框对应目标对象的置信度;A sixth submodule, configured to determine the confidence of the target object corresponding to the corrected 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area;第七子模块,用于根据所述校正后的3D初始框及其置信度确定所述场景中的所述目标对象的3D检测框。The seventh sub-module is used to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame and its confidence.
- 根据权利要求22至25中任一项所述的装置,其特征在于,所述部分区域包括:根据预定策略对3D初始框进行边缘扩展,得到的3D扩展框。The device according to any one of claims 22 to 25, wherein the partial area includes: a 3D expansion frame obtained by performing edge expansion on the 3D initial frame according to a predetermined strategy.
- 根据权利要求26所述的装置,其特征在于,所述3D扩展框,包括:The apparatus according to claim 26, wherein the 3D expansion frame comprises:根据预先设定的X轴方向增量、Y轴方向增量和/或Z轴方向增量,对所述3D初始框进行3D空间扩展,形成包含有所述3D初始框的3D扩展框。According to a preset increment in the X-axis direction, increment in the Y-axis direction, and/or increment in the Z-axis direction, the 3D initial frame is expanded in 3D space to form a 3D expansion frame including the 3D initial frame.
- 根据权利要求26或27所述的装置,其特征在于,所述第二子模块包括:The device according to claim 26 or 27, wherein the second submodule includes:第八子模块,用于根据所述3D扩展框的预设目标位置,对点云数据中位于所述3D扩展框内的点的坐标信息进行坐标变换,获取坐标变换后的点的特征信息;The eighth submodule is used to perform coordinate transformation on the coordinate information of the point in the 3D extension box in the point cloud data according to the preset target position of the 3D extension box, to obtain the feature information of the point after the coordinate transformation;第九子模块,用于根据坐标变换后的点的特征信息,进行基于所述3D扩展框的语义分割,获得所述3D扩展框中的点的第二语义特征。The ninth sub-module is used to perform semantic segmentation based on the 3D extension box according to the feature information of the coordinate-transformed point to obtain the second semantic feature of the point in the 3D extension box.
- 根据权利要求28所述的装置,其特征在于,所述第九子模块进一步用于:The apparatus according to claim 28, wherein the ninth sub-module is further used to:根据所述前景点的掩膜以及坐标变换后的点的特征信息,进行基于所述3D扩展框的语义分割。The semantic segmentation based on the 3D expansion frame is performed according to the mask of the front scenic spot and the feature information of the point after coordinate transformation.
- 根据权利要求21所述的装置,所述前景点为多个,所述确定检测框模块进一步用于:The apparatus according to claim 21, wherein there are a plurality of front spots, and the determination detection frame module is further used to:确定多个所述前景点对应的3D初始框之间的重叠度;Determine the degree of overlap between the 3D initial frames corresponding to the plurality of front sights;对重叠度大于设定阈值的3D初始框进行筛选;Screen the 3D initial frame whose overlap is greater than the set threshold;根据筛选后的3D初始框确定所述场景中的所述目标对象的3D检测框。The 3D detection frame of the target object in the scene is determined according to the filtered 3D initial frame.
- The apparatus according to any one of claims 21 to 30, wherein the feature extraction module, the first semantic segmentation module, the foreground point prediction module, and the initial frame generation module are implemented by a first-stage neural network, and the first-stage neural network is obtained by a first training module through training with point cloud data samples that carry 3D annotation frames.
- The apparatus according to claim 31, wherein the first training module is configured to: provide point cloud data samples to the first-stage neural network; extract feature information of the point cloud data samples based on the first-stage neural network; perform semantic segmentation on the point cloud data samples according to their feature information; predict, according to the first semantic features of the multiple points obtained by the semantic segmentation, at least one foreground point corresponding to the target object among the multiple points, and generate, according to the first semantic information, a 3D initial frame for each of the at least one foreground point; obtain the loss corresponding to the foreground points (one plausible form is sketched after the claims) and the loss formed by the 3D initial frames relative to the corresponding 3D annotation frames; and adjust the network parameters of the first-stage neural network according to the losses.
- The apparatus according to claim 32, wherein the first training module is further configured to: determine, according to the confidence of the foreground points predicted by the first-stage neural network, a first loss corresponding to the foreground point prediction result; generate a second loss according to the number of the bin in which a parameter of the 3D initial frame generated for a foreground point falls and the number of the bin in which the corresponding parameter of the 3D annotation frame information in the point cloud data sample falls; generate a third loss according to the offset, within its bin, of a parameter of the 3D initial frame generated for the foreground point and the offset, within its bin, of the corresponding parameter of the 3D annotation frame information in the point cloud data sample; generate a fourth loss according to the offset of a parameter of the 3D initial frame generated for the foreground point relative to a predetermined parameter; generate a fifth loss according to the offset of the coordinate parameters of the foreground point relative to the coordinate parameters of the 3D initial frame generated for that foreground point; and adjust the network parameters of the first-stage neural network according to the first, second, third, fourth, and fifth losses (the bin-number and in-bin-offset pairing is sketched after the claims).
- The apparatus according to any one of claims 22 to 29, wherein the first submodule, the second submodule, and the third submodule are implemented by a second-stage neural network, and the second-stage neural network is obtained by a second training module through training with point cloud data samples that carry 3D annotation frames.
- The apparatus according to claim 34, wherein the second training module is configured to: provide the 3D initial frame to the second-stage neural network; obtain, based on the second-stage neural network, feature information of the points within a partial region of the point cloud data sample; perform semantic segmentation on the points within the partial region according to that feature information, to obtain second semantic features of those points; determine, according to the first semantic features and the second semantic features of the points within the partial region, the confidence that the 3D initial frame is a target object (a train-time labeling sketch follows the claims), and generate a position-corrected 3D initial frame according to the same first and second semantic features; obtain the loss corresponding to that confidence and the loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame; and adjust the network parameters of the second-stage neural network according to the losses.
- The apparatus according to claim 35, wherein the second training module is further configured to: determine, according to the confidence predicted by the second-stage neural network that a 3D initial frame is a target object, a sixth loss corresponding to the prediction result; for each position-corrected 3D initial frame generated by the second-stage neural network whose degree of overlap with the corresponding 3D annotation frame exceeds a set threshold, generate a seventh loss according to the number of the bin in which a parameter of that frame falls and the number of the bin in which the corresponding parameter of the 3D annotation frame information in the point cloud data sample falls, generate an eighth loss according to the offset, within its bin, of a parameter of that frame and the offset, within its bin, of the corresponding parameter of the 3D annotation frame information, generate a ninth loss according to the offset of a parameter of that frame relative to a predetermined parameter, and generate a tenth loss according to the offset of the coordinate parameters of that frame relative to the coordinate parameters of the center point of the 3D annotation frame; and adjust the network parameters of the second-stage neural network according to the sixth, seventh, eighth, ninth, and tenth losses.
- A vehicle intelligent control apparatus, wherein the apparatus comprises: the target object 3D detection apparatus according to any one of claims 21 to 36, configured to obtain a 3D detection frame of a target object; and a first control module, configured to generate, according to the 3D detection frame, an instruction for controlling a vehicle or early-warning prompt information.
- The apparatus according to claim 37, wherein the first control module is further configured to: determine, according to the 3D detection frame, at least one of the following pieces of information about the target object: its spatial position in the scene, its size, its distance from the vehicle, and its orientation relative to the vehicle (the derivation is sketched after the claims); and generate, according to the determined at least one piece of information, an instruction for controlling the vehicle or early-warning prompt information.
- An obstacle avoidance navigation apparatus, wherein the apparatus comprises: the target object 3D detection apparatus according to any one of claims 21 to 36, configured to obtain a 3D detection frame of a target object; and a second control module, configured to generate, according to the 3D detection frame, an instruction for performing obstacle avoidance navigation control on a robot or early-warning prompt information.
- The apparatus according to claim 39, wherein the second control module is further configured to: determine, according to the 3D detection frame, at least one of the following pieces of information about the target object: its spatial position in the scene, its size, its distance from the robot, and its orientation relative to the robot; and generate, according to the determined at least one piece of information, an instruction for performing obstacle avoidance navigation control on the robot or early-warning prompt information.
- An electronic device, comprising: a memory, configured to store a computer program; and a processor, configured to execute the computer program stored in the memory, wherein when the computer program is executed, the method according to any one of claims 1 to 20 is implemented.
- A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the method according to any one of claims 1 to 20 is implemented.
- A computer program, comprising computer instructions, wherein when the computer instructions run in a processor of a device, the method according to any one of claims 1 to 20 is implemented.
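The claims above state the geometric and training steps compactly; the following Python sketches make the individual steps concrete. First, the 3D expansion of an initial frame by preset per-axis increments. This is a minimal sketch assuming a hypothetical [cx, cy, cz, sx, sy, sz, heading] box layout; the claims fix only the per-axis increments, not the parameterization.

```python
import numpy as np

def expand_box_3d(box, dx=0.5, dy=0.5, dz=0.5):
    """Expand a 3D initial frame by preset increments along each axis.

    `box` is assumed to be [cx, cy, cz, sx, sy, sz, heading]: a center,
    per-axis sizes, and a yaw angle. Growing each size by twice the
    increment keeps the original frame centered inside the expanded one.
    """
    cx, cy, cz, sx, sy, sz, heading = box
    return np.array([cx, cy, cz,
                     sx + 2.0 * dx, sy + 2.0 * dy, sz + 2.0 * dz,
                     heading])
```

Expanding the frame pulls in nearby context points that the tight initial frame would miss, which is what the second-stage segmentation then operates on.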
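Next, the coordinate transformation of the points pooled from an expanded frame to its preset target position. The sketch assumes the target pose is the frame's own center with zero heading, so every proposal is seen in one canonical pose; the claims leave the target pose open, as is the Z-up axis convention used here.

```python
import numpy as np

def canonicalize_points(points, box):
    """Move points inside an expanded frame into a canonical pose:
    translate the frame center to the origin, then undo the yaw
    rotation about the Z axis (an assumed axis convention)."""
    points = np.asarray(points, dtype=float)        # shape (N, 3)
    cx, cy, cz, _, _, _, heading = box
    shifted = points - np.array([cx, cy, cz])       # center at origin
    c, s = np.cos(-heading), np.sin(-heading)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return shifted @ rot.T
```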
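The ninth submodule also consumes the foreground mask produced by stage one. One plausible way to combine mask and point features, assumed here since the claim does not fix the combination, is to append the mask as an extra per-point channel:

```python
import numpy as np

def augment_with_mask(point_features, fg_mask):
    """Append the stage-one foreground mask as one extra channel per
    point before the expanded-frame semantic segmentation, so the
    second stage knows which pooled points were already predicted
    to be foreground. The concatenation strategy is an assumption."""
    feats = np.asarray(point_features, dtype=np.float32)          # (N, C)
    mask = np.asarray(fg_mask, dtype=np.float32).reshape(-1, 1)   # (N, 1)
    return np.concatenate([feats, mask], axis=1)                  # (N, C + 1)
```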
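The overlap-based filtering of 3D initial frames is a greedy non-maximum suppression. The sketch below uses axis-aligned 3D IoU for brevity; since the initial frames carry a heading, a real implementation would use rotated-box overlap instead.

```python
import numpy as np

def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes given as [x1, y1, z1, x2, y2, z2]."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter + 1e-9)

def filter_overlapping_frames(boxes, scores, iou_thresh=0.7):
    """Keep the highest-scoring frame, drop every frame whose overlap
    with it exceeds the set threshold, and repeat on the remainder."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([iou_3d_axis_aligned(boxes[i], boxes[j]) for j in rest])
        order = rest[ious <= iou_thresh]
    return keep
```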
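For the first loss on foreground prediction, the claims only require a loss driven by the predicted confidence. One plausible choice, assumed here, is a focal loss, which copes with the heavy background/foreground imbalance of outdoor point clouds:

```python
import numpy as np

def foreground_point_loss(p_fg, is_fg, alpha=0.25, gamma=2.0):
    """Focal loss on one point's predicted foreground probability.
    The focal form and the alpha/gamma values are assumptions, not
    taken from the claims."""
    p_t = p_fg if is_fg else 1.0 - p_fg          # probability of the true class
    a_t = alpha if is_fg else 1.0 - alpha        # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * np.log(max(p_t, 1e-9))
```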
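The second and third losses pair a bin number with an in-bin offset: a regression range is split into equal bins, the bin index is supervised as a classification target, and the residual inside the bin as a regression target. A minimal sketch for a single scalar parameter, with illustrative bin counts and ranges:

```python
import numpy as np

def bin_targets(value, lo, hi, num_bins):
    """Return the bin index of `value` in [lo, hi) plus its normalized
    offset inside that bin, centered on the bin midpoint."""
    width = (hi - lo) / num_bins
    idx = int(np.clip((value - lo) // width, 0, num_bins - 1))
    offset = (value - (lo + idx * width)) / width - 0.5
    return idx, offset

def bin_losses(pred_logits, pred_offsets, gt_value, lo=-3.0, hi=3.0):
    """Second loss: cross-entropy on the predicted bin number.
    Third loss: smooth-L1 on the offset predicted for the true bin."""
    pred_logits = np.asarray(pred_logits, dtype=float)
    gt_bin, gt_offset = bin_targets(gt_value, lo, hi, len(pred_logits))
    z = pred_logits - pred_logits.max()                    # stable log-softmax
    cross_entropy = -(z[gt_bin] - np.log(np.exp(z).sum()))
    diff = abs(float(pred_offsets[gt_bin]) - gt_offset)
    smooth_l1 = 0.5 * diff ** 2 if diff < 1.0 else diff - 0.5
    return cross_entropy, smooth_l1
```

The fourth and fifth losses are plain offset regressions (a parameter against a predetermined value, and the foreground point's coordinates against the frame it spawned), so they need no binning.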
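For the sixth loss, the second stage needs a train-time label saying whether a 3D initial frame counts as a target object. A common recipe, assumed here with illustrative thresholds, labels each frame by its overlap with the matched 3D annotation frame:

```python
def confidence_target(iou, fg_thresh=0.6, bg_thresh=0.45):
    """Label a 3D initial frame for the confidence loss by its overlap
    with the matched 3D annotation frame. Frames in the ambiguous
    middle band are ignored. Thresholds are illustrative."""
    if iou >= fg_thresh:
        return 1.0    # supervised as target object
    if iou <= bg_thresh:
        return 0.0    # supervised as background
    return None       # excluded from the confidence loss
```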
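Finally, the control modules derive per-object quantities from the 3D detection frame and map them to an instruction or a warning. A sketch under the same hypothetical box layout; the thresholds and the instruction strings are illustrative, not from the claims:

```python
import numpy as np

def describe_detection(box):
    """Derive the claimed quantities from a 3D detection frame given in
    the ego (vehicle or robot) frame: spatial position, size, distance,
    and relative bearing."""
    cx, cy, cz, sx, sy, sz, heading = box
    return {
        "position": (cx, cy, cz),
        "size": (sx, sy, sz),
        "distance": float(np.hypot(cx, cy)),       # planar range to ego
        "bearing_rad": float(np.arctan2(cy, cx)),  # relative orientation
    }

def control_or_warn(info, brake_dist=5.0, warn_dist=15.0):
    """Turn the derived information into a control instruction or an
    early-warning prompt."""
    if info["distance"] < brake_dist:
        return "instruction: brake"
    if info["distance"] < warn_dist:
        return "warning: target object nearby"
    return "no action"
```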
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021526222A JP2022515591A (en) | 2018-11-29 | 2019-11-13 | 3D detection method, device, medium and device of target object |
KR1020217015013A KR20210078529A (en) | 2018-11-29 | 2019-11-13 | Target object 3D detection method, apparatus, medium and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811446588.8A CN109635685B (en) | 2018-11-29 | 2018-11-29 | Target object 3D detection method, device, medium and equipment |
CN201811446588.8 | 2018-11-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020108311A1 (en) | 2020-06-04 |
Family
ID=66070171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/118126 WO2020108311A1 (en) | 2018-11-29 | 2019-11-13 | 3d detection method and apparatus for target object, and medium and device |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2022515591A (en) |
KR (1) | KR20210078529A (en) |
CN (1) | CN109635685B (en) |
WO (1) | WO2020108311A1 (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635685B (en) * | 2018-11-29 | 2021-02-12 | 北京市商汤科技开发有限公司 | Target object 3D detection method, device, medium and equipment |
CN112101066B (en) * | 2019-06-17 | 2024-03-08 | 商汤集团有限公司 | Target detection method and device, intelligent driving method and device and storage medium |
WO2020258218A1 (en) * | 2019-06-28 | 2020-12-30 | 深圳市大疆创新科技有限公司 | Obstacle detection method and device for mobile platform, and mobile platform |
CN110458112B (en) * | 2019-08-14 | 2020-11-20 | 上海眼控科技股份有限公司 | Vehicle detection method and device, computer equipment and readable storage medium |
CN112444784B (en) * | 2019-08-29 | 2023-11-28 | 北京市商汤科技开发有限公司 | Three-dimensional target detection and neural network training method, device and equipment |
CN110751090B (en) * | 2019-10-18 | 2022-09-20 | 宁波博登智能科技有限公司 | Three-dimensional point cloud labeling method and device and electronic equipment |
CN110991468B (en) * | 2019-12-13 | 2023-12-19 | 深圳市商汤科技有限公司 | Three-dimensional target detection and intelligent driving method, device and equipment |
CN111179247A (en) * | 2019-12-27 | 2020-05-19 | 上海商汤智能科技有限公司 | Three-dimensional target detection method, training method of model thereof, and related device and equipment |
CN111507973B (en) * | 2020-04-20 | 2024-04-12 | 上海商汤临港智能科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN111539347B (en) * | 2020-04-27 | 2023-08-08 | 北京百度网讯科技有限公司 | Method and device for detecting target |
CN111931727A (en) * | 2020-09-23 | 2020-11-13 | 深圳市商汤科技有限公司 | Point cloud data labeling method and device, electronic equipment and storage medium |
CN112183330B (en) * | 2020-09-28 | 2022-06-28 | 北京航空航天大学 | Target detection method based on point cloud |
CN112287939B (en) * | 2020-10-29 | 2024-05-31 | 平安科技(深圳)有限公司 | Three-dimensional point cloud semantic segmentation method, device, equipment and medium |
CN115035359A (en) * | 2021-02-24 | 2022-09-09 | 华为技术有限公司 | Point cloud data processing method, training data processing method and device |
CN113822146A (en) * | 2021-08-02 | 2021-12-21 | 浙江大华技术股份有限公司 | Target detection method, terminal device and computer storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008012635A (en) * | 2006-07-07 | 2008-01-24 | Toyota Motor Corp | Personal identification system |
US10733651B2 (en) * | 2014-01-01 | 2020-08-04 | Andrew S Hansen | Methods and systems for identifying physical objects |
CN108509820B (en) * | 2017-02-23 | 2021-12-24 | 百度在线网络技术(北京)有限公司 | Obstacle segmentation method and device, computer equipment and readable medium |
CN108470174B (en) * | 2017-02-23 | 2021-12-24 | 百度在线网络技术(北京)有限公司 | Obstacle segmentation method and device, computer equipment and readable medium |
US10885398B2 (en) * | 2017-03-17 | 2021-01-05 | Honda Motor Co., Ltd. | Joint 3D object detection and orientation estimation via multimodal fusion |
CN107622244B (en) * | 2017-09-25 | 2020-08-28 | 华中科技大学 | Indoor scene fine analysis method based on depth map |
CN108895981B (en) * | 2018-05-29 | 2020-10-09 | 南京怀萃智能科技有限公司 | Three-dimensional measurement method, device, server and storage medium |
- 2018-11-29: CN application CN201811446588.8A, granted as CN109635685B (active)
- 2019-11-13: PCT application PCT/CN2019/118126, published as WO2020108311A1 (active, application filing)
- 2019-11-13: KR application KR1020217015013A, published as KR20210078529A (not active, application discontinuation)
- 2019-11-13: JP application JP2021526222A, published as JP2022515591A (active, pending)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150227775A1 (en) * | 2012-09-11 | 2015-08-13 | Southwest Research Institute | 3-D Imaging Sensor Based Location Estimation |
CN105976400A (en) * | 2016-05-10 | 2016-09-28 | 北京旷视科技有限公司 | Object tracking method and device based on neural network model |
CN108122245A (en) * | 2016-11-30 | 2018-06-05 | 华为技术有限公司 | A kind of goal behavior describes method, apparatus and monitoring device |
CN108171217A (en) * | 2018-01-29 | 2018-06-15 | 深圳市唯特视科技有限公司 | A kind of three-dimension object detection method based on converged network |
CN109635685A (en) * | 2018-11-29 | 2019-04-16 | 北京市商汤科技开发有限公司 | Target object 3D detection method, device, medium and equipment |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860373A (en) * | 2020-07-24 | 2020-10-30 | 浙江商汤科技开发有限公司 | Target detection method and device, electronic equipment and storage medium |
CN111860373B (en) * | 2020-07-24 | 2022-05-20 | 浙江商汤科技开发有限公司 | Target detection method and device, electronic equipment and storage medium |
WO2022017140A1 (en) * | 2020-07-24 | 2022-01-27 | 浙江商汤科技开发有限公司 | Target detection method and apparatus, electronic device, and storage medium |
CN111968133A (en) * | 2020-07-31 | 2020-11-20 | 上海交通大学 | Three-dimensional point cloud data example segmentation method and system in automatic driving scene |
CN112200768A (en) * | 2020-09-07 | 2021-01-08 | 华北水利水电大学 | Point cloud information extraction system based on geographic position |
CN116420096A (en) * | 2020-09-24 | 2023-07-11 | 埃尔构人工智能有限责任公司 | Method and system for marking LIDAR point cloud data |
CN112598635B (en) * | 2020-12-18 | 2024-03-12 | 武汉大学 | Point cloud 3D target detection method based on symmetric point generation |
CN112598635A (en) * | 2020-12-18 | 2021-04-02 | 武汉大学 | Point cloud 3D target detection method based on symmetric point generation |
CN112766206B (en) * | 2021-01-28 | 2024-05-28 | 深圳市捷顺科技实业股份有限公司 | High-order video vehicle detection method and device, electronic equipment and storage medium |
CN112766206A (en) * | 2021-01-28 | 2021-05-07 | 深圳市捷顺科技实业股份有限公司 | High-order video vehicle detection method and device, electronic equipment and storage medium |
CN112862953A (en) * | 2021-01-29 | 2021-05-28 | 上海商汤临港智能科技有限公司 | Point cloud data processing method and device, electronic equipment and storage medium |
CN112800971A (en) * | 2021-01-29 | 2021-05-14 | 深圳市商汤科技有限公司 | Neural network training and point cloud data processing method, device, equipment and medium |
CN112862953B (en) * | 2021-01-29 | 2023-11-28 | 上海商汤临港智能科技有限公司 | Point cloud data processing method and device, electronic equipment and storage medium |
CN112907760A (en) * | 2021-02-09 | 2021-06-04 | 浙江商汤科技开发有限公司 | Three-dimensional object labeling method and device, tool, electronic equipment and storage medium |
CN112990200A (en) * | 2021-03-31 | 2021-06-18 | 上海商汤临港智能科技有限公司 | Data labeling method and device, computer equipment and storage medium |
CN113516013A (en) * | 2021-04-09 | 2021-10-19 | 阿波罗智联(北京)科技有限公司 | Target detection method and device, electronic equipment, road side equipment and cloud control platform |
CN113516013B (en) * | 2021-04-09 | 2024-05-14 | 阿波罗智联(北京)科技有限公司 | Target detection method, target detection device, electronic equipment, road side equipment and cloud control platform |
CN113298163A (en) * | 2021-05-31 | 2021-08-24 | 国网湖北省电力有限公司黄石供电公司 | Target identification monitoring method based on LiDAR point cloud data |
CN113537316B (en) * | 2021-06-30 | 2024-04-09 | 南京理工大学 | Vehicle detection method based on 4D millimeter wave radar point cloud |
CN113537316A (en) * | 2021-06-30 | 2021-10-22 | 南京理工大学 | Vehicle detection method based on 4D millimeter wave radar point cloud |
CN113570535A (en) * | 2021-07-30 | 2021-10-29 | 深圳市慧鲤科技有限公司 | Visual positioning method and related device and equipment |
CN113984037A (en) * | 2021-09-30 | 2022-01-28 | 电子科技大学长三角研究院(湖州) | Semantic map construction method based on target candidate box in any direction |
CN113984037B (en) * | 2021-09-30 | 2023-09-12 | 电子科技大学长三角研究院(湖州) | Semantic map construction method based on target candidate frame in any direction |
CN113822277A (en) * | 2021-11-19 | 2021-12-21 | 万商云集(成都)科技股份有限公司 | Illegal advertisement picture detection method and system based on deep learning target detection |
CN114298581A (en) * | 2021-12-30 | 2022-04-08 | 广州极飞科技股份有限公司 | Quality evaluation model generation method, quality evaluation device, electronic device, and readable storage medium |
CN114241110B (en) * | 2022-02-23 | 2022-06-03 | 北京邮电大学 | Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation |
CN114241110A (en) * | 2022-02-23 | 2022-03-25 | 北京邮电大学 | Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation |
CN114743001A (en) * | 2022-04-06 | 2022-07-12 | 合众新能源汽车有限公司 | Semantic segmentation method and device, electronic equipment and storage medium |
CN115880470A (en) * | 2023-03-08 | 2023-03-31 | 深圳佑驾创新科技有限公司 | Method, device and equipment for generating 3D image data and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109635685A (en) | 2019-04-16 |
CN109635685B (en) | 2021-02-12 |
JP2022515591A (en) | 2022-02-21 |
KR20210078529A (en) | 2021-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020108311A1 (en) | 3d detection method and apparatus for target object, and medium and device | |
CN113486796B (en) | Unmanned vehicle position detection method, unmanned vehicle position detection device, unmanned vehicle position detection equipment, storage medium and vehicle | |
WO2019179464A1 (en) | Method for predicting direction of movement of target object, vehicle control method, and device | |
WO2020253121A1 (en) | Target detection method and apparatus, intelligent driving method and device, and storage medium | |
Zhou et al. | Efficient road detection and tracking for unmanned aerial vehicle | |
US20190156144A1 (en) | Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device | |
US9147255B1 (en) | Rapid object detection by combining structural information from image segmentation with bio-inspired attentional mechanisms | |
US20150253864A1 (en) | Image Processor Comprising Gesture Recognition System with Finger Detection and Tracking Functionality | |
US20210117704A1 (en) | Obstacle detection method, intelligent driving control method, electronic device, and non-transitory computer-readable storage medium | |
EP4088134A1 (en) | Object size estimation using camera map and/or radar information | |
WO2020184207A1 (en) | Object tracking device and object tracking method | |
WO2023116631A1 (en) | Training method and training apparatus for rotating-ship target detection model, and storage medium | |
US20150278589A1 (en) | Image Processor with Static Hand Pose Recognition Utilizing Contour Triangulation and Flattening | |
WO2020238008A1 (en) | Moving object detection method and device, intelligent driving control method and device, medium, and apparatus | |
KR20210012012A (en) | Object tracking methods and apparatuses, electronic devices and storage media | |
CN112651274A (en) | Road obstacle detection device, road obstacle detection method, and recording medium | |
WO2020238073A1 (en) | Method for determining orientation of target object, intelligent driving control method and apparatus, and device | |
US20220335572A1 (en) | Semantically accurate super-resolution generative adversarial networks | |
CN116310993A (en) | Target detection method, device, equipment and storage medium | |
CN115100741A (en) | Point cloud pedestrian distance risk detection method, system, equipment and medium | |
US20230087261A1 (en) | Three-dimensional target estimation using keypoints | |
CN117115414B (en) | GPS-free unmanned aerial vehicle positioning method and device based on deep learning | |
Petković et al. | An overview on horizon detection methods in maritime video surveillance | |
Hashmani et al. | A survey on edge detection based recent marine horizon line detection methods and their applications | |
CN117372928A (en) | Video target detection method and device and related equipment |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19889305; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2021526222; Country of ref document: JP; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 20217015013; Country of ref document: KR; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 24.09.2021) |
122 | Ep: pct application non-entry in european phase | Ref document number: 19889305; Country of ref document: EP; Kind code of ref document: A1 |