WO2020108311A1 - 3D detection method and apparatus for target object, and medium and device


Info

Publication number
WO2020108311A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
information
neural network
initial frame
cloud data
Prior art date
Application number
PCT/CN2019/118126
Other languages
French (fr)
Chinese (zh)
Inventor
Shaoshuai Shi (史少帅)
Hongsheng Li (李鸿升)
Xiaogang Wang (王晓刚)
Original Assignee
Beijing SenseTime Technology Development Co., Ltd. (北京市商汤科技开发有限公司)
Priority date
Filing date
Publication date
Application filed by Beijing SenseTime Technology Development Co., Ltd. (北京市商汤科技开发有限公司)
Priority to JP2021526222A (published as JP2022515591A)
Priority to KR1020217015013A (published as KR20210078529A)
Publication of WO2020108311A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Definitions

  • the present disclosure relates to computer vision technology, and in particular, to a target object 3D detection method and device, vehicle intelligent control method and device, obstacle avoidance navigation method and device, electronic equipment, computer readable storage medium, and computer program.
  • 3D detection can be applied to various technologies such as intelligent driving and obstacle avoidance navigation.
  • in intelligent driving technology, 3D detection can obtain the specific location, shape, size, and direction of movement of target objects around an intelligent driving vehicle, such as surrounding vehicles and pedestrians, which can help the intelligent driving vehicle make intelligent driving decisions.
  • Embodiments of the present disclosure provide technical solutions for target object 3D detection, intelligent vehicle control, and obstacle avoidance navigation.
  • a 3D detection method for a target object includes: extracting feature information of point cloud data of an acquired scene; performing semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; predicting, based on the first semantic information, at least one foreground point corresponding to the target object among the multiple points; generating, based on the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point; and determining a 3D detection frame of the target object in the scene according to the 3D initial frame.
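As a rough illustration of the claimed flow (feature extraction, semantic segmentation, foreground prediction, per-foreground-point proposal), the following Python sketch wires the steps together with random placeholder "networks"; all function names, array shapes, and the 0.5 threshold are assumptions for illustration only, not the patented implementation.

```python
# Illustrative sketch of the claimed pipeline; the two "network" functions are
# random placeholders standing in for the trained neural networks.
import numpy as np

def extract_features(points):
    """Step 1 (assumed stub): one global feature vector per point."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((points.shape[0], 256)).astype(np.float32)

def segment_semantics(features):
    """Steps 2-4 (assumed stub): per-point foreground confidence and a
    7-parameter 3D initial frame (cx, cy, cz, h, w, l, theta) per point."""
    confidence = 1.0 / (1.0 + np.exp(-features[:, 0]))        # sigmoid of one channel
    frames = np.random.default_rng(1).standard_normal((features.shape[0], 7))
    return confidence, frames.astype(np.float32)

def detect_3d(points, threshold=0.5):
    features = extract_features(points)                # extract feature information
    confidence, frames = segment_semantics(features)   # first semantic information
    foreground = confidence > threshold                # predict foreground points
    proposals = frames[foreground]                     # 3D initial frames of foreground points
    return proposals                                   # redundancy removal (NMS) would follow

if __name__ == "__main__":
    cloud = np.random.rand(1024, 4).astype(np.float32)  # x, y, z, intensity
    print(detect_3d(cloud).shape)
```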
  • a vehicle intelligent control method includes: obtaining the 3D detection frame of the target object using the above 3D detection method for a target object; and generating, according to the 3D detection frame, an instruction or warning information for controlling the vehicle.
  • an obstacle avoidance navigation method includes: obtaining the 3D detection frame of the target object using the above 3D detection method for a target object; and generating, according to the 3D detection frame, an instruction or warning information for obstacle avoidance navigation control of the robot.
  • a target object 3D detection device includes: a feature extraction module for extracting feature information of point cloud data of an acquired scene; a first semantic segmentation module for performing semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; a foreground point prediction module for predicting, based on the first semantic information, at least one foreground point corresponding to the target object among the multiple points; an initial frame generation module for generating a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information; and a detection frame determination module for determining the 3D detection frame of the target object in the scene according to the 3D initial frame.
  • a vehicle intelligent control device includes: the above target object 3D detection device for obtaining the 3D detection frame of the target object; and a first control module configured to generate, according to the 3D detection frame, instructions or early-warning information for controlling the vehicle.
  • an obstacle avoidance navigation device includes: the above target object 3D detection device for obtaining the 3D detection frame of the target object; and a second control module configured to generate, according to the 3D detection frame, instructions or early-warning information for obstacle avoidance navigation control of the robot.
  • an electronic device includes: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, wherein, when the computer program is executed, any method embodiment of the present disclosure is implemented.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, any method embodiment of the present disclosure is implemented.
  • a computer program including computer instructions which, when run in a processor of a device, implement any method embodiment of the present disclosure.
  • in the present disclosure, feature extraction is performed on the point cloud data and semantic segmentation is performed on the point cloud data based on the extracted feature information; this part is equivalent to lower-layer data analysis. The 3D detection frame generated and determined based on the semantic segmentation result is equivalent to upper-layer data analysis. Therefore, in the 3D detection process of the target object, the present disclosure forms a bottom-up way to generate the 3D detection frame.
  • the technical solution provided by the present disclosure is beneficial to improve the detection performance of the 3D detection frame.
  • FIG. 1 is a flowchart of an embodiment of a 3D detection method for a target object of the present disclosure
  • FIG. 2 is a flowchart of another embodiment of the target object 3D detection method of the present disclosure.
  • FIG. 3 is a schematic structural diagram of a first-stage neural network of the present disclosure
  • FIG. 4 is another schematic structural diagram of the first-stage neural network of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a second-stage neural network of the present disclosure.
  • FIG. 6 is a flowchart of an embodiment of a vehicle intelligent control method of the present disclosure
  • FIG. 7 is a flowchart of an embodiment of an obstacle avoidance navigation method of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an embodiment of a target object 3D detection device of the present disclosure.
  • FIG. 9 is a schematic structural diagram of an embodiment of a vehicle intelligent control device of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an embodiment of an obstacle avoidance navigation device of the present disclosure.
  • FIG. 11 is a block diagram of an exemplary device that implements an embodiment of the present disclosure.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above, etc.
  • Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system executable instructions (such as program modules) executed by the computer system.
  • program modules may include routines, programs, target programs, components, logic, and data structures, etc., which perform specific tasks or implement specific abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment, where tasks are performed by remote processing devices linked through a communication network.
  • program modules may be located on local or remote computing system storage media including storage devices.
  • the scene in the present disclosure may refer to a visually presented picture.
  • the image captured by the camera and the point cloud data (Point Cloud Data) obtained by the lidar scan can be regarded as a scene.
  • the point cloud data in the present disclosure generally refers to scanning information recorded in the form of points.
  • point cloud data obtained through lidar scanning.
  • Each point in the point cloud data can be described by a variety of information, and it can also be considered that each point in the point cloud data usually includes a variety of information, for example, it may include but is not limited to one or more of the following: Three-dimensional coordinates of points, color information (such as RGB information, etc.), and reflection intensity (Intensity) information, etc.
  • a point in the point cloud data can be described by one or more types of information such as three-dimensional coordinates, color information, and reflection intensity information.
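For concreteness, a lidar point cloud of this kind is often held as an N x C array with one row per point; the layout below (x, y, z, reflection intensity) is only an assumed example of the per-point information listed above.

```python
import numpy as np

# Hypothetical point cloud: 5 points, each described by 3D coordinates and intensity.
points = np.array([
    [12.3, -0.8, 1.1, 0.42],   # x, y, z, reflection intensity
    [12.5, -0.7, 1.0, 0.40],
    [ 3.1,  4.2, 0.2, 0.75],
    [ 3.0,  4.3, 0.3, 0.71],
    [50.6,  9.9, 2.4, 0.05],
], dtype=np.float32)

coords = points[:, :3]       # three-dimensional coordinates of the points
intensity = points[:, 3]     # reflection intensity information
print(coords.shape, intensity.mean())
```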
  • the present disclosure may utilize at least one convolutional layer in the neural network to process the point cloud data to form a feature map of the point cloud data, for example, forming a piece of feature information for each point in the point cloud data. Since the feature information formed here is formed separately for each point while considering all points in the entire spatial range of the point cloud data, the feature information formed here can be called global feature information.
  • S110 Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
  • the present disclosure can use a neural network to perform semantic segmentation on point cloud data.
  • the neural network can form first semantic information for multiple points in the point cloud data, or even for each point in the point cloud data. For example, after the point cloud data is provided to the neural network and the feature information of the point cloud data is extracted by the neural network, the neural network continues to process the feature information of the point cloud data to obtain the first semantic information of multiple points in the point cloud data.
  • the first semantic information of a point in the present disclosure generally refers to a semantic feature generated for the point in consideration of the entire point cloud data; therefore, the first semantic information can also be called the first semantic feature or the global semantic feature.
  • the global semantic features of points in the present disclosure can generally be expressed in the form of a one-dimensional vector array including multiple (e.g., 256) elements.
  • the global semantic features in this disclosure may also be referred to as global semantic feature vectors.
  • the foreground points and background points in the present disclosure are defined with respect to the target object.
  • a point belonging to a target object is a foreground point of that target object, while a point not belonging to the target object is a background point of that target object. For example, a point belonging to one target object is a foreground point of that target object, but since the point does not belong to other target objects, it is a background point of those other target objects.
  • the first semantic information of the multiple points obtained by the present disclosure generally includes: the global semantic features of the foreground points of the target object and the global semantic features of the background points of the target object.
  • the scene in the present disclosure may include one or more target objects.
  • Target objects in this disclosure include, but are not limited to: vehicles, non-motor vehicles, pedestrians, and/or obstacles, and the like.
  • the present disclosure may use a neural network to predict at least one foreground point corresponding to the target object among the multiple points; the neural network may make predictions separately for some points in the point cloud data, or even for each point in the point cloud data, so as to generate the confidence of each point being a foreground point.
  • the confidence of a point can be expressed as the probability of the point being a foreground point.
  • the neural network continues to process the global semantic features to predict, for multiple points in the point cloud data, the confidence of each point being a foreground point of the target object; the neural network can generate the confidence for each point separately.
  • each confidence generated by the neural network can be judged separately, and a point whose confidence exceeds a predetermined value can be taken as a foreground point of the target object.
  • the operation of determining the confidence in the present disclosure may be performed in S120 or S130.
  • if the confidence judgment operation is performed in S120 and the judgment result is that there is no point whose confidence exceeds the predetermined value, that is, there is no foreground point, it can be considered that there is no target object in the scene.
  • the present disclosure may obtain a global semantic feature of each point in S110, and generate a 3D initial frame for each point.
  • all the confidences obtained in S120 can be judged to select the foreground points of the target object, and the selected foreground points can be used to pick out, from the 3D initial frames generated in S130, the 3D initial frame corresponding to each foreground point. That is, the 3D initial frames generated in S130 usually include both the 3D initial frames corresponding to foreground points and the 3D initial frames corresponding to background points, so the 3D initial frames corresponding to the foreground points need to be filtered out from all the generated 3D initial frames.
  • alternatively, the present disclosure may generate a 3D initial frame respectively according to the global semantic feature of each foreground point predicted above, so that every obtained 3D initial frame is a 3D initial frame corresponding to a foreground point. That is, each 3D initial frame generated in S130 corresponds to a foreground point; in other words, S130 may generate 3D initial frames only for the foreground points.
  • the 3D initial frame in the present disclosure may be described by the position information of the center point of the 3D initial frame, the length, width, and height information of the 3D initial frame, and the direction information of the 3D initial frame; that is, in the present disclosure, the 3D initial frame may include the position information of its center point, its length, width, and height information, and its direction information.
  • the 3D initial frame may also be referred to as 3D initial frame information.
  • the present disclosure may utilize a neural network to generate the 3D initial frames. For example, after the point cloud data is provided to the neural network, the feature information of the point cloud data is extracted by the neural network, and the semantic segmentation process is performed by the neural network, the neural network continues to process the global semantic features to generate a 3D initial frame for each of the multiple points.
  • alternatively, after the point cloud data is provided to the neural network, the feature information is extracted, the semantic segmentation processing is performed, and the prediction processing on the global semantic features yields the confidences of multiple points in the point cloud data being foreground points of the target object, the neural network can continue to process the global semantic features of the points whose confidence exceeds the predetermined value, so as to generate a 3D initial frame for each foreground point.
  • since semantic segmentation is based on the feature information of all points in the point cloud data, the semantic features formed by semantic segmentation include not only the semantic feature of the point itself but also the semantic features of the surrounding points, so that multiple foreground points in this disclosure can semantically point to the same target object in the scene.
  • the 3D initial frames corresponding to different foreground points that point to the same target object are somewhat different, but usually the difference is not large.
  • if no 3D initial frame corresponding to a foreground point exists among the 3D initial frames generated in S130 according to the first semantic information, it may be considered that there is no target object in the scene.
  • S140 Determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
  • the present disclosure finally determines a 3D detection frame for each target object.
  • the present disclosure may perform redundancy removal on the aforementioned 3D initial frames corresponding to all the foreground points, thereby obtaining the 3D detection frame of the target object, that is, the 3D detection frame finally obtained by performing target object detection on the point cloud data.
  • the present disclosure may use the degree of overlap between the 3D initial frames to remove redundant 3D initial frames, thereby obtaining the 3D detection frame of the target object.
  • for example, the present disclosure may determine the degree of overlap between the 3D initial frames corresponding to multiple foreground points, filter the 3D initial frames according to whether the overlap is greater than a set threshold, and then determine the 3D detection frame of the target object from the filtered 3D initial frames.
  • for example, the present disclosure may use the NMS (Non-Maximum Suppression) algorithm to perform redundancy removal on the 3D initial frames corresponding to all the foreground points, thereby removing redundant 3D frames that overlap each other and obtaining the final 3D detection frame.
  • in this way, the present disclosure can obtain a final 3D detection frame for each target object in the scene.
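A minimal sketch of overlap-based redundancy removal is shown below; for simplicity it scores overlap with an axis-aligned bird's-eye-view IoU rather than an oriented-box IoU, and the 0.7 threshold is an assumed value, so it only approximates the NMS step described above.

```python
import numpy as np

def bev_iou_axis_aligned(a, b):
    """Approximate bird's-eye-view IoU of two frames (cx, cy, cz, h, w, l, theta),
    ignoring the heading angle for simplicity."""
    ax1, ax2 = a[0] - a[5] / 2, a[0] + a[5] / 2
    ay1, ay2 = a[1] - a[4] / 2, a[1] + a[4] / 2
    bx1, bx2 = b[0] - b[5] / 2, b[0] + b[5] / 2
    by1, by2 = b[1] - b[4] / 2, b[1] + b[4] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[5] * a[4] + b[5] * b[4] - inter
    return inter / union if union > 0 else 0.0

def nms_3d(frames, scores, iou_threshold=0.7):
    """Keep the highest-confidence frame, drop frames that overlap it too much, repeat."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = np.array([i for i in rest
                          if bev_iou_axis_aligned(frames[best], frames[i]) <= iou_threshold])
    return keep
```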
  • the present disclosure may also perform correction (or optimization) on the 3D initial frames corresponding to the currently obtained foreground points, and then perform redundancy removal on all the corrected 3D initial frames to obtain the 3D detection frame of the target object, that is, the 3D detection frame finally obtained by performing target object detection on the point cloud data.
  • the process of respectively correcting the 3D initial frame corresponding to each foreground point in the present disclosure may include the following steps A1, B1, and C1:
  • Step A1 Acquire feature information of points in a partial area in the point cloud data, where the partial area includes at least a 3D initial frame.
  • the present disclosure may set a 3D expansion frame containing a 3D initial frame, and obtain feature information of each point in the 3D expansion frame in the point cloud data.
  • the 3D expansion frame in the present disclosure is one implementation of the partial area in the point cloud data.
  • the 3D initial frame corresponding to each foreground point in the present disclosure corresponds to a respective 3D expansion frame, and the space range occupied by the 3D expansion frame generally completely covers, and is slightly larger than, the space range occupied by the 3D initial frame.
  • optionally, no surface of the 3D initial frame lies in the same plane as any surface of its corresponding 3D expansion frame, the center point of the 3D initial frame coincides with the center point of the 3D expansion frame, and every surface of the 3D initial frame is parallel to the corresponding surface of its 3D expansion frame. Since the positional relationship between such a 3D expansion frame and the 3D initial frame is relatively standardized, it is beneficial to reducing the difficulty of forming the 3D expansion frame, thereby helping to reduce the implementation difficulty of the present disclosure. Of course, the present disclosure does not exclude the case in which the two center points do not coincide while every surface of the 3D initial frame is still parallel to the corresponding surface of its 3D expansion frame.
  • the present disclosure may, based on at least one of a preset X-axis direction increment (such as 20 cm), a Y-axis direction increment (such as 20 cm), and a Z-axis direction increment (such as 20 cm), expand the 3D initial frame corresponding to the foreground point in 3D space, so as to form a 3D expansion frame that contains the 3D initial frame, whose center point coincides with that of the 3D initial frame and whose surfaces are parallel to the corresponding surfaces of the 3D initial frame.
  • the increments in the present disclosure can be set according to actual needs; for example, the increment in a given direction does not exceed one N-th (N greater than 4, for example) of the corresponding side length of the 3D initial frame. Optionally, the increment in the X-axis direction does not exceed one tenth of the length of the 3D initial frame, the increment in the Y-axis direction does not exceed one tenth of the width of the 3D initial frame, and the increment in the Z-axis direction does not exceed one tenth of the height of the 3D initial frame. The increments in the X-axis, Y-axis, and Z-axis directions may be the same or different.
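The following helper illustrates the expansion just described: the initial frame's extents grow by per-axis increments while the center and direction stay fixed. The 0.2 m defaults mirror the 20 cm example; the axis convention and the per-side growth are assumptions of this sketch.

```python
import numpy as np

def expand_frame(frame, dx=0.2, dy=0.2, dz=0.2):
    """frame: (cx, cy, cz, h, w, l, theta). Returns a 3D expansion frame that keeps
    the center point and direction of the initial frame and enlarges its extents."""
    cx, cy, cz, h, w, l, theta = frame
    # Assumed axis convention for this sketch: l along X, w along Y, h along Z.
    # Each extent grows by twice the increment so both opposite faces move outward;
    # whether the increment is meant per side or in total is not specified in the text.
    return np.array([cx, cy, cz, h + 2 * dz, w + 2 * dy, l + 2 * dx, theta],
                    dtype=np.float32)

initial = np.array([5.0, 1.0, -0.5, 1.6, 1.8, 4.2, 0.3], dtype=np.float32)
print(expand_frame(initial))
```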
  • the present disclosure may use a neural network to obtain the feature information of the points in the partial area of the point cloud data; for example, all points in the partial area of the point cloud data are used as the input of the neural network, and at least one convolutional layer in the neural network processes the point cloud data in the partial area, so that feature information can be formed for each point in the partial area.
  • the feature information formed this time may be referred to as local feature information.
  • the feature information of the point cloud data formed this time is the feature information separately formed for each point in the partial area while considering all the points in the partial area of the point cloud data; therefore, the feature information formed this time can be called local feature information.
  • Step B1 Perform semantic segmentation on the points in the partial area according to the feature information of the points in the partial area to obtain second semantic information on the points in the partial area.
  • the second semantic information of a point in the present disclosure refers to: a semantic feature vector formed for the point in consideration of all points in the spatial range formed by the 3D extension box.
  • the second semantic information in this disclosure may be referred to as a second semantic feature or a local spatial semantic feature.
  • a local spatial semantic feature can also be expressed in the form of a one-dimensional vector array including multiple (e.g., 256) elements.
  • a neural network may be used to obtain local spatial semantic features of all points in the 3D expansion box, and a method of using neural networks to obtain local spatial semantic features of points may include the following steps a and b:
  • the preset target position of the 3D extension frame may include: the center point of the 3D extension frame (that is, the center point of the 3D initial frame) is located at the origin of coordinates, and the length of the 3D extension frame is parallel to the X axis.
  • the above coordinate origin and X axis may be the coordinate origin and X axis of the coordinate system of the point cloud data, and of course, may also be the coordinate origin and X axis of other coordinate systems.
  • for example, the i-th 3D initial frame may be expressed as b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i and z_i represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i and l_i respectively represent the height, width and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame, e.g., the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then, after performing coordinate transformation on the 3D expansion frame that contains the i-th 3D initial frame, the present disclosure obtains a new 3D initial frame, which under this transformation can be expressed as b̃_i = (0, 0, 0, h_i, w_i, l_i, 0); that is, the center point of the new 3D initial frame is located at the origin of coordinates and, in a bird's-eye view, the angle between the length of the new 3D initial frame and the X coordinate axis is 0.
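A sketch of the regularized coordinate transformation described above: points inside the expansion frame are translated so the initial frame's center sits at the origin and rotated by -θ_i so the frame's length aligns with the X axis. Treating Z as the vertical rotation axis is an assumption of this sketch; the text only states the resulting canonical pose.

```python
import numpy as np

def canonical_transform(points_xyz, frame):
    """points_xyz: (N, 3) coordinates of points inside the 3D expansion frame.
    frame: (cx, cy, cz, h, w, l, theta) of the i-th 3D initial frame.
    Returns the coordinates after the regularized coordinate transformation."""
    cx, cy, cz, _, _, _, theta = frame
    shifted = points_xyz - np.array([cx, cy, cz], dtype=np.float32)  # center -> origin
    c, s = np.cos(-theta), np.sin(-theta)
    # Rotate by -theta around the (assumed) vertical Z axis so the frame's
    # length direction coincides with the X axis in the bird's-eye view.
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]], dtype=np.float32)
    return shifted @ rot.T

pts = np.array([[5.5, 1.2, -0.4], [4.8, 0.9, -0.6]], dtype=np.float32)
frame = np.array([5.0, 1.0, -0.5, 1.6, 1.8, 4.2, 0.3], dtype=np.float32)
print(canonical_transform(pts, frame))
```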
  • the above coordinate transformation manner of the present disclosure may be referred to as regularized coordinate transformation.
  • the present disclosure performs coordinate conversion on a point, and usually only changes the coordinate information of the point, but does not change other information of a point.
  • in this way, the coordinates of the points in different 3D initial frames can be concentrated within a roughly common range, which is beneficial to the training of the neural network, that is, to improving the accuracy with which the neural network forms local spatial semantic features, and thus helps to improve the accuracy of the 3D initial frame correction.
  • the coordinate transformation described above is only an optional example; those skilled in the art may also adopt other transformation methods that transform the coordinates into a certain range.
  • the coordinate-converted point cloud data (that is, the coordinate-converted points located in the 3D expansion frame) is provided to the neural network, and the neural network performs semantic segmentation processing on the received points, so as to generate a local spatial semantic feature for each point located in the 3D expansion frame.
  • in addition, the present disclosure may form a foreground point mask according to the confidences generated for the foreground points in the above steps (e.g., a point whose confidence exceeds a predetermined value (such as 0.5) is set to 1, and a point whose confidence does not exceed the predetermined value is set to 0, thereby forming the foreground point mask).
  • the present disclosure can provide the foreground point mask and the coordinate-transformed point cloud data together to the neural network, so that the neural network can refer to the foreground point mask when performing semantic processing, thereby helping to improve the description accuracy of the local spatial semantic features.
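The foreground point mask described above can be built with a simple threshold; the 0.5 cutoff follows the example in the text, and stacking the mask as an extra per-point input channel is one assumed way of "providing it together with" the transformed points.

```python
import numpy as np

confidence = np.array([0.91, 0.12, 0.67, 0.08], dtype=np.float32)  # from the first stage
mask = (confidence > 0.5).astype(np.float32)   # 1 for foreground points, 0 otherwise

canonical_pts = np.zeros((4, 3), dtype=np.float32)   # coordinate-transformed points (stub)
# One assumed packaging: append the mask as a fourth channel of the network input.
local_net_input = np.concatenate([canonical_pts, mask[:, None]], axis=1)
print(local_net_input.shape)   # (4, 4)
```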
  • Step C1 Form the corrected 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area.
  • the method for obtaining the global semantic features of multiple points in the 3D expansion frame in the present disclosure may be: first, according to the coordinate information of each point in the point cloud data, determine whether each point belongs to the spatial range of the 3D expansion frame (i.e., whether it is located in the 3D expansion frame, which may include being located on any surface of the 3D expansion frame); for a point, if its position belongs to the spatial range of the 3D expansion frame, the point is regarded as a point belonging to the 3D expansion frame, and if not, the point is not regarded as a point belonging to the 3D expansion frame.
  • then, the global semantic features of these points are determined: for each point belonging to the 3D expansion frame, its global semantic feature can be found from the global semantic features of the points obtained above, and so on, so that the present disclosure can obtain the global semantic features of all points in the 3D expansion frame.
  • the neural network can process the global semantic features and local semantic features of each point, and obtain the corrected 3D initial frame according to the processing result of the neural network.
  • for example, the neural network encodes the global semantic features and local spatial semantic features of the points in the 3D expansion frame to obtain features used to describe the 3D initial frame in the 3D expansion frame; the neural network predicts, from the features used to describe the 3D initial frame, the confidence of the 3D initial frame being the target object, and adjusts the 3D initial frame according to the features used to describe the 3D initial frame, thereby obtaining the corrected 3D initial frame.
  • this is beneficial to improving the accuracy of the 3D initial frame, thereby helping to improve the accuracy of the 3D detection frame.
  • optionally, the global semantic feature and the local spatial semantic feature of each point in the 3D expansion frame can be stitched together; that is, for each point, its global semantic feature and its local spatial semantic feature are concatenated to form a stitched semantic feature. The stitched semantic features are used as the input of the neural network so that the neural network can encode them; after the encoding process, the neural network generates the features used to describe the 3D initial frame in the 3D expansion frame (hereinafter referred to as the encoded features).
  • the neural network can predict, for each input encoded feature, the confidence of the corresponding 3D initial frame being the target object, thereby forming a confidence for each 3D initial frame.
  • the confidence level can represent the probability that the corrected 3D initial frame is the target object.
  • the neural network can form a new 3D initial frame (that is, the corrected 3D initial frame) for each input encoded feature.
  • for example, the neural network respectively forms, according to each input encoded feature, the position information of the center point of the new 3D initial frame, the length, width, and height information of the new 3D initial frame, and the direction information of the new 3D initial frame.
  • for the process in which the present disclosure performs redundancy removal on all the corrected 3D initial frames to obtain the 3D detection frame of the target object, please refer to the corresponding descriptions above; it is not described in detail here.
  • one embodiment of the target object 3D detection method of the present disclosure includes steps: S200 and S210. Each step in FIG. 2 is described in detail below.
  • S200: Provide the point cloud data to a neural network, perform feature extraction processing on the points in the point cloud data via the neural network, perform semantic segmentation processing on the point cloud data according to the extracted feature information to obtain the semantic features of multiple points, predict the foreground points among the multiple points according to the semantic features, and generate a 3D initial frame corresponding to at least some of the multiple points.
  • the neural network in the present disclosure is mainly used to generate a 3D initial frame for multiple points in the input point cloud data (such as all points or multiple points in the point cloud data), so that each of the multiple points in the point cloud data corresponds to a 3D initial frame. Since the multiple points (such as each point) in the point cloud data usually contain both foreground points and background points, the 3D initial frames generated by the neural network of the present disclosure usually include the 3D initial frames corresponding to foreground points and the 3D initial frames corresponding to background points.
  • since the input of the neural network of the present disclosure is point cloud data, the neural network performs feature extraction on the point cloud data and performs semantic segmentation on the point cloud data based on the extracted feature information, which belongs to lower-layer data analysis; and since the neural network of the present disclosure generates the 3D initial frames based on the result of the semantic segmentation, which is equivalent to upper-layer data analysis, the present disclosure forms, in the process of 3D detection of a target object, a bottom-up way to generate the 3D detection frame.
  • the neural network of the present disclosure generates the 3D initial frames in a bottom-up manner. This avoids projecting the point cloud data and performing 3D detection frame detection on the images obtained after projection, which causes a loss of the original information of the point cloud data, a loss that is not conducive to improving the performance of 3D detection frame detection. The present disclosure can also avoid using 2D images taken by a camera for 3D detection frame detection, where a target object (such as a vehicle or an obstacle) may be blocked, which affects the detection of the 3D detection frame and is likewise not conducive to improving the performance of 3D detection frame detection. It can be seen that generating the 3D initial frames in a bottom-up manner is beneficial to improving the detection performance of the 3D detection frame.
  • the neural network in the present disclosure may be divided into multiple parts, and each part may be implemented by a small neural network (also called a neural network unit or a neural network module, etc.); that is, the neural network consists of multiple small neural networks. Since part of the structure of the neural network of the present disclosure can adopt the structure of RCNN (Regions with Convolutional Neural Network), the neural network of the present disclosure can be called PointRCNN (Point Regions with Convolutional Neural Network, i.e., a point-based regional convolutional neural network).
  • the 3D initial frame generated by the neural network of the present disclosure may include: position information of the center point of the 3D initial frame (such as the coordinates of the center point), length, width, and height information of the 3D initial frame, and direction information of the 3D initial frame (such as the angle between the length of the 3D initial frame and the X coordinate axis), etc.
  • the 3D initial frame formed by the present disclosure may also include: position information of the center point of the bottom or top surface of the 3D initial frame, length, width, and height information of the 3D initial frame, and direction information of the 3D initial frame.
  • the present disclosure does not limit the specific expression form of the 3D initial frame.
  • the neural network of the present disclosure may include: a first neural network, a second neural network, and a third neural network.
  • the point cloud data is provided to the first neural network.
  • the first neural network is used to perform feature extraction processing on multiple points (such as all points) in the received point cloud data, so as to form a piece of global feature information separately for each point in the point cloud data, and to perform semantic segmentation processing according to the global feature information of the multiple points (such as all points), thereby forming a global semantic feature for each point; the first neural network outputs the global semantic feature of each point.
  • the global semantic features of points can usually be expressed in the form of a one-dimensional vector array including multiple (e.g., 256) elements.
  • the global semantic features in this disclosure may also be referred to as global semantic feature vectors.
  • the points in the point cloud data include: foreground points and background points.
  • the information output by the first neural network usually includes: the global semantic features of the foreground points and the global semantic features of the background points.
  • the first neural network in the present disclosure may be implemented using Point Cloud Encoder (Point Cloud Data Encoder) and Point Cloud Decoder (Point Cloud Data Decoder).
  • for example, the first neural network may adopt a network structure such as the PointNet++ or PointSIFT network model.
  • the second neural network in the present disclosure may be implemented using MLP (Multi-Layer Perceptron), and the output dimension of the MLP used to implement the second neural network may be 1.
  • the third neural network in the present disclosure may also be implemented using MLP, and the output dimensions of the MLP used to implement the third neural network are multi-dimensional, and the number of dimensions is related to the information included in the 3D detection frame information.
  • the present disclosure needs to use the global semantic features to realize the prediction of the foreground points and the generation of the 3D initial frames.
  • the present disclosure can adopt the following two manners to realize the prediction of the foreground points and the generation of the 3D initial frames.
  • Manner 1: the global semantic features of each point output by the first neural network are provided to the second neural network and the third neural network simultaneously (as shown in FIG. 3).
  • the second neural network is used to predict, for each input global semantic feature, the confidence of the corresponding point being a foreground point, and to output a confidence for each point.
  • the confidence predicted by the second neural network may indicate the probability that the point is a foreground point.
  • the third neural network is used to generate and output a 3D initial frame for the global semantic feature of each input point. For example, the third neural network outputs, for each point and according to its global semantic feature, the position information of the center point of the 3D initial frame, the length, width, and height information of the 3D initial frame, and the direction information of the 3D initial frame.
  • in Manner 1, the 3D initial frames output by the third neural network usually include both the 3D initial frames corresponding to foreground points and the 3D initial frames corresponding to background points; however, the third neural network itself cannot distinguish whether each output 3D initial frame corresponds to a foreground point or to a background point.
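As an illustration of Manner 1, the sketch below feeds each point's 256-dimensional global semantic feature into two small MLP heads in parallel: a 1-dimensional confidence head (standing in for the second neural network) and a frame head (standing in for the third neural network). The layer sizes, the 7-parameter frame output, and the PyTorch framing are assumptions; the text only fixes the roles and output meanings of the heads.

```python
import torch
import torch.nn as nn

class Stage1Heads(nn.Module):
    """Parallel heads over per-point global semantic features (Manner 1 sketch)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Confidence head (second neural network stand-in): output dimension 1.
        self.confidence_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        # Frame head (third neural network stand-in): (cx, cy, cz, h, w, l, theta).
        self.frame_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 7))

    def forward(self, global_semantic_features):      # (N, feat_dim)
        confidence = torch.sigmoid(self.confidence_head(global_semantic_features))
        frames = self.frame_head(global_semantic_features)
        return confidence.squeeze(-1), frames

features = torch.randn(1024, 256)                      # global semantic features of 1024 points
conf, frames = Stage1Heads()(features)
print(conf.shape, frames.shape)                        # torch.Size([1024]) torch.Size([1024, 7])
```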
  • Manner 2: the global semantic features of each point output by the first neural network are first provided to the second neural network, and the second neural network predicts, for each input global semantic feature, the confidence of the corresponding point being a foreground point; for a point determined to be a foreground point, the global semantic feature of the point is then provided to the third neural network (as shown in FIG. 4).
  • the third neural network generates a 3D initial frame for each global semantic feature it receives (each belonging to a foreground point), and outputs the corresponding 3D initial frame for each foreground point.
  • in Manner 2, the present disclosure does not provide the global semantic feature of a point to the third neural network when the confidence, output by the second neural network, of that point being a foreground point does not exceed the predetermined value; therefore, all the 3D initial frames output by the third neural network are 3D initial frames corresponding to foreground points.
  • in Manner 1, the present disclosure may determine, according to the confidences output by the second neural network, whether each 3D initial frame output by the third neural network corresponds to a foreground point or to a background point. For example, if the confidence of the first point exceeds the predetermined value, the first point is determined to be a foreground point, so that the 3D initial frame output by the third neural network for the first point is determined to be a 3D initial frame corresponding to a foreground point; and so on, according to the confidences output by the second neural network, the present disclosure can select, from all the 3D initial frames output by the third neural network, the 3D initial frames corresponding to all the foreground points. Afterwards, the present disclosure may perform redundancy removal on the 3D initial frames corresponding to all the selected foreground points, thereby obtaining the final 3D detection frame, that is, the 3D detection frame detected from the point cloud data.
  • for example, the present disclosure may use the NMS (Non-Maximum Suppression) algorithm to perform redundancy removal on the 3D initial frames corresponding to all the currently selected foreground points, thereby removing redundant 3D detection frames that overlap each other and obtaining the final 3D detection frame.
  • in Manner 2, since the 3D initial frames output by the third neural network are already the 3D initial frames corresponding to foreground points, the present disclosure can directly perform redundancy removal on all the 3D initial frames output by the third neural network to obtain the final 3D detection frame, that is, the 3D detection frame detected from the point cloud data (refer to the related description in the above embodiment).
  • for example, the present disclosure may use the NMS algorithm to perform redundancy removal on all the 3D initial frames output by the third neural network, thereby removing redundant 3D initial frames that overlap each other and obtaining the final 3D detection frame.
  • in addition, the present disclosure can correct the 3D initial frame corresponding to each foreground point separately, and perform redundancy removal on the corrected 3D initial frames corresponding to the foreground points to obtain the final 3D detection frame. That is to say, the process of generating the 3D detection frame by the neural network of the present disclosure can be divided into two stages: the 3D initial frames generated by the first-stage neural network are provided to the second-stage neural network, the second-stage neural network corrects the 3D initial frames generated by the first-stage neural network (such as position optimization), and then the present disclosure determines the final 3D detection frame according to the 3D initial frames corrected by the second-stage neural network.
  • the final 3D detection frame is the 3D detection frame detected by the present disclosure based on point cloud data.
  • the process of generating the 3D detection frame by the neural network of the present disclosure may also involve only the first-stage neural network and not the second-stage neural network. In the case where only the first-stage neural network is involved, it is also completely feasible for the present disclosure to determine the final 3D detection frame according to the 3D initial frames generated by the first-stage neural network.
  • Both the first-stage neural network and the second-stage neural network in this disclosure can be implemented by neural networks that can exist independently, or can be composed of part of the network structural units of a complete neural network. In addition, for ease of description, the networks involved are called the first neural network, the second neural network, the third neural network, the fourth neural network, the fifth neural network, the sixth neural network, and the seventh neural network, but it should be understood that each of the first to seventh neural networks may be an independent neural network or may be composed of some network structural units in a larger neural network, which is not limited in this disclosure.
  • the process of using the neural network to correct the 3D initial frame corresponding to each foreground point in the present disclosure may include the following steps A2, B2, and C2:
  • Step A2: Set a 3D expansion frame containing the 3D initial frame, and obtain the global semantic features of the points in the 3D expansion frame.
  • each 3D initial frame in the present disclosure corresponds to a 3D extension frame, and the space range occupied by the 3D extension frame generally completely covers the space range occupied by the 3D initial frame.
  • optionally, no surface of the 3D initial frame lies in the same plane as any surface of its corresponding 3D expansion frame, the center point of the 3D initial frame coincides with the center point of the 3D expansion frame, and every surface of the 3D initial frame is parallel to the corresponding surface of its 3D expansion frame.
  • the present disclosure does not exclude the case that although the two center points do not coincide, any face of the 3D initial frame is parallel to the corresponding face of the corresponding 3D extension frame.
  • the present disclosure may, based on at least one of a preset X-axis direction increment (such as 20 cm), a Y-axis direction increment (such as 20 cm), and a Z-axis direction increment (such as 20 cm), expand the 3D initial frame of the foreground point in 3D space, so as to form a 3D expansion frame that contains the 3D initial frame, whose center point coincides with that of the 3D initial frame and whose surfaces are parallel to the corresponding surfaces of the 3D initial frame.
  • the local space in the present disclosure generally refers to: the spatial range formed by the 3D expansion frame.
  • the local spatial semantic feature of a point generally refers to a semantic feature vector formed for that point when considering all the points in the spatial range formed by the 3D extension box.
  • a local spatial semantic feature can also be expressed in the form of a one-dimensional vector array including multiple (e.g., 256) elements.
  • the method for obtaining the global semantic features of multiple points in the 3D expansion frame in the present disclosure may be: first, according to the coordinate information of each point in the point cloud data, determine whether each point belongs to the spatial range of the 3D expansion frame (i.e., whether it is located in the 3D expansion frame, which may include being located on any surface of the 3D expansion frame); for a point, if its position belongs to the spatial range of the 3D expansion frame, the point is regarded as a point belonging to the 3D expansion frame, and if not, the point is not regarded as a point belonging to the 3D expansion frame.
  • then, the global semantic features of these points are determined: for each point belonging to the 3D expansion frame, its global semantic feature can be found from the global semantic features of the points obtained above, and so on, so that the present disclosure can obtain the global semantic features of all points in the 3D expansion frame.
  • Step B2 The point cloud data located in the 3D extension box is provided to the fourth neural network in the neural network, and the local spatial semantic features of the points in the 3D extension box are generated via the fourth neural network.
  • the method for obtaining the local spatial semantic features of all points in the 3D extension frame in the present disclosure may include the following steps a and b:
  • the preset target position of the 3D extension frame may include: the center point of the 3D extension frame (that is, the center point of the 3D initial frame) is located at the origin of coordinates, and the length of the 3D extension frame is parallel to the X axis.
  • the above coordinate origin and X axis may be the coordinate origin and X axis of the coordinate system of the point cloud data, and of course, may also be the coordinate origin and X axis of other coordinate systems.
  • for example, the i-th 3D initial frame may be expressed as b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i and z_i represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i and l_i respectively represent the height, width and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame, e.g., the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then, after performing coordinate transformation on the 3D expansion frame that contains the i-th 3D initial frame, the present disclosure obtains a new 3D initial frame, which under this transformation can be expressed as b̃_i = (0, 0, 0, h_i, w_i, l_i, 0); that is, the center point of the new 3D initial frame is located at the origin of coordinates and, in a bird's-eye view, the angle between the length of the new 3D initial frame and the X coordinate axis is 0.
  • the coordinate-converted point cloud data (that is, the coordinate-converted points located in the 3D expansion frame) is provided to the fourth neural network in the neural network; the fourth neural network performs feature extraction processing on the received points and semantic segmentation processing based on the extracted local feature information, so as to generate a local spatial semantic feature for each point located in the 3D expansion frame.
  • in addition, the present disclosure can also form a foreground point mask (e.g., a point whose confidence exceeds a predetermined value (such as 0.5) is set to 1, while a point whose confidence does not exceed the predetermined value is set to 0).
  • the present disclosure can provide the foreground point mask together with the coordinate-converted point cloud data to the fourth neural network, so that the fourth neural network can refer to the foreground point mask when performing feature extraction and semantic processing, thereby helping to improve the description accuracy of the local spatial semantic features.
  • the fourth neural network in the present disclosure may be implemented using MLP, and the output dimensions of the MLP used to implement the fourth neural network are generally multi-dimensional, and the number of dimensions is related to the information included in the local spatial semantic features.
  • Step C2 Through the fifth neural network in the neural network, encode the global semantic features and local spatial semantic features of the points in the 3D extension box to obtain the features describing the 3D initial box in the 3D extension box, And through the sixth neural network in the neural network to predict the confidence of the 3D initial frame according to the characteristics of the 3D initial frame, the seventh neural network in the neural network according to the characteristics of the 3D initial frame, Correcting the 3D initial frame is beneficial to improve the accuracy of the 3D initial frame, and thus to improve the accuracy of the 3D detection frame.
  • the fifth neural network in the present disclosure may be implemented using Point Cloud Encoder (point cloud data encoder).
  • for example, the fifth neural network may adopt a partial network structure of a network model such as PointNet++ or PointSIFT.
  • the sixth neural network in the present disclosure may be implemented using MLP, and the output dimension of the MLP used to implement the sixth neural network may be 1, and the number of dimensions may be related to the number of types of target objects.
  • the seventh neural network in the present disclosure may also be implemented using MLP, and the output dimensions of the MLP used to implement the seventh neural network are multi-dimensional, and the number of dimensions is related to the information included in the 3D detection frame information.
  • the first neural network to the seventh neural network in the present disclosure may all be implemented by a neural network that can exist independently, or by a part of a neural network that cannot exist independently.
  • optionally, the global semantic feature and the local spatial semantic feature of each point in the 3D expansion frame can be stitched together; that is, for each point, its global semantic feature and its local spatial semantic feature are concatenated to form a stitched semantic feature. The stitched semantic features are provided as input to the fifth neural network, so that the fifth neural network can encode the stitched semantic features; the fifth neural network outputs the encoded features, which describe the 3D initial frame in the 3D expansion frame (hereinafter referred to as the encoded features).
  • the encoded features output by the fifth neural network are simultaneously provided to the sixth neural network and the seventh neural network (as shown in FIG. 5).
  • the sixth neural network is used to predict, for each input encoded feature, the confidence of the corresponding 3D initial frame being the target object, and to output the confidence for each 3D initial frame.
  • the confidence predicted by the sixth neural network may represent the probability that the corrected 3D initial frame is the target object.
  • the target object here may be a vehicle or a pedestrian.
  • the seventh neural network is used to form, for each input encoded feature, a new 3D initial frame (that is, the corrected 3D initial frame), and to output it.
  • for example, the seventh neural network respectively outputs, according to each input encoded feature, the position information of the center point of the new 3D initial frame, the length, width, and height information of the new 3D initial frame, and the direction information of the new 3D initial frame, etc.
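A compact sketch of the Step C2 wiring: stitched (global + local) per-point features are pooled by a stand-in encoder into one descriptor per 3D initial frame, which a confidence head (sixth-network role) and a correction head (seventh-network role) then consume. The max-pooling encoder and all layer sizes are assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class Stage2Refinement(nn.Module):
    def __init__(self, global_dim=256, local_dim=256):
        super().__init__()
        stitched = global_dim + local_dim
        # Encoder (fifth-network stand-in): encode stitched per-point features
        # into one feature describing the 3D initial frame.
        self.encoder = nn.Sequential(nn.Linear(stitched, 256), nn.ReLU())
        # Confidence head (sixth-network role): probability that the corrected frame
        # is the target object.
        self.confidence_head = nn.Linear(256, 1)
        # Correction head (seventh-network role): corrected frame parameters.
        self.correction_head = nn.Linear(256, 7)

    def forward(self, global_feats, local_feats):      # each (num_points, dim)
        stitched = torch.cat([global_feats, local_feats], dim=-1)
        encoded = self.encoder(stitched).max(dim=0).values   # pool points -> frame feature
        confidence = torch.sigmoid(self.confidence_head(encoded))
        corrected = self.correction_head(encoded)
        return confidence, corrected

conf, corrected = Stage2Refinement()(torch.randn(128, 256), torch.randn(128, 256))
print(conf.shape, corrected.shape)    # torch.Size([1]) torch.Size([7])
```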
  • the neural network of the present disclosure is obtained by training using multiple point cloud data samples with 3D annotation frames.
  • for example, the present disclosure can obtain the loss corresponding to the confidence generated by the neural network to be trained, and the loss of the 3D initial frame generated by the neural network to be trained for the point cloud data sample relative to the 3D annotation frame of the point cloud data sample, and train the neural network using these losses.
  • the network parameters in this disclosure may include, but are not limited to, convolution kernel parameters and weight values.
  • for example, the present disclosure can obtain the loss corresponding to the confidence generated by the first-stage neural network and the loss corresponding to the 3D initial frame, and use these two losses of the first-stage neural network to adjust the network parameters of the first-stage neural network (such as the first neural network, the second neural network, and the third neural network); after the first-stage neural network is successfully trained, the entire neural network is successfully trained.
  • the present disclosure can separately train the first stage neural network and the second stage neural network. For example, first obtain the loss corresponding to the confidence generated by the first stage neural network and the loss corresponding to the 3D initial frame, and use these two losses to adjust the network parameters of the first stage neural network.
  • Then, the 3D initial frames corresponding to the foreground points output by the first-stage neural network are provided as input to the second-stage neural network; the loss corresponding to the confidence generated by the second-stage neural network and the loss corresponding to the corrected 3D initial frame are obtained, and these two losses of the second-stage neural network are used to adjust the network parameters of the second-stage neural network (such as the fourth neural network, the fifth neural network, the sixth neural network, and the seventh neural network).
  • After both stages have been trained in this way, the entire neural network is successfully trained.
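  • A minimal sketch of this separate, stage-by-stage training procedure is shown below, assuming PyTorch; the loss functions, data loaders, and network classes are illustrative placeholders rather than the disclosure's exact components.

```python
# A minimal sketch of the separate two-stage training procedure described
# above, assuming PyTorch; the loss functions, data loader, and network
# classes are illustrative placeholders, not the disclosure's exact API.
import torch

def train_stage(network, data_loader, loss_fn, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(network.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in data_loader:
            # Each stage produces a confidence output and a box output,
            # and is supervised by a confidence loss plus a box regression loss.
            confidence, boxes = network(batch["points"])
            loss = loss_fn(confidence, boxes, batch["gt_boxes"])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1 is trained first on its confidence + 3D-initial-frame losses;
# its output proposals then supervise stage 2's confidence + corrected-frame losses.
# train_stage(stage1_net, point_cloud_loader, stage1_loss)
# train_stage(stage2_net, proposal_loader, stage2_loss)
```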
  • The loss corresponding to the confidence generated by the first-stage neural network in the present disclosure can be expressed by formula (1), in which L_reg represents the regression loss function of the 3D detection frame and N_pos represents the number of foreground points.
  • A bucket in the present disclosure may refer to one of the value ranges obtained by dividing the spatial range around a point; each bucket may have a corresponding number, and the range covered by a bucket is usually fixed. When the range of a bucket is a length, the bucket has a fixed length; for example, the length of a bucket may be 0.5 m, and the value ranges of different buckets may then be 0-0.5 m, 0.5 m-1 m, and so on. When the range of a bucket is an angle range, the bucket has a fixed angle interval; for example, the present disclosure can divide 2π into multiple angle intervals, where one angle interval corresponds to one value range, and the size of the bucket is that angle interval.
  • S represents the search distance of the foreground point p along the x-axis or z-axis; that is, in the case where the parameter u is x, S represents the search distance along the x-axis used when generating the 3D initial frame for the foreground point p.
  • C is a constant value, and C may be related to the length of the bucket, for example, C is equal to the length of the bucket or half the length of the bucket.
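  • The following is a minimal sketch of the bucket-based encoding implied above, in which an offset is split into a bucket number plus a residual within that bucket; the 0.5 m bucket length, the 3 m search distance S, and the 12 angle buckets are illustrative assumptions.

```python
# A minimal sketch of the bucket (bin) based localization encoding implied
# above: a coordinate offset is split into a bucket index plus a residual
# inside that bucket. Values such as the 0.5 m bucket length and the 3 m
# search distance S are illustrative assumptions.
import math

def encode_with_buckets(value, search_distance=3.0, bucket_length=0.5):
    """Return (bucket_index, residual) for a signed offset `value` in
    [-search_distance, search_distance)."""
    shifted = value + search_distance          # map to [0, 2 * S)
    bucket_index = int(shifted // bucket_length)
    residual = shifted - bucket_index * bucket_length
    return bucket_index, residual

def encode_angle_with_buckets(theta, num_buckets=12):
    """Split an orientation angle in [0, 2*pi) into a bucket index and residual."""
    bucket_size = 2.0 * math.pi / num_buckets  # fixed angle interval per bucket
    theta = theta % (2.0 * math.pi)
    bucket_index = int(theta // bucket_size)
    residual = theta - bucket_index * bucket_size
    return bucket_index, residual

# Example: an x-offset of 0.7 m falls in bucket 7 with a 0.2 m residual
# when S = 3.0 m and the bucket length is 0.5 m.
print(encode_with_buckets(0.7))
```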
  • When a predetermined iteration condition is met, the training process ends.
  • The predetermined iteration conditions in the present disclosure may include: the difference between the 3D initial frame output by the third neural network and the 3D annotation frame of the point cloud data sample meets a predetermined difference requirement, and the confidence output by the second neural network meets a predetermined requirement. In the case that both requirements are met, the first to third neural networks are successfully trained this time.
  • The predetermined iteration conditions in the present disclosure may also include: the number of point cloud data samples used to train the first to third neural networks reaches a predetermined number requirement, and so on. If the number of point cloud data samples used reaches the predetermined number requirement but the two requirements above are not both met, the first to third neural networks are not successfully trained this time.
  • the first to third neural networks that have been successfully trained can be used for 3D detection of the target object.
  • The successfully trained first to third neural networks can also be used to generate the 3D initial frames corresponding to the foreground points of the point cloud data samples. That is, the present disclosure can again provide point cloud data samples to the successfully trained first neural network and store the information output by the second neural network and the third neural network, so as to provide input (that is, the 3D initial frames corresponding to the foreground points) to the second-stage neural network. After that, the loss corresponding to the confidence generated in the second stage and the loss corresponding to the corrected 3D initial frame are obtained, and the obtained losses are used to adjust the parameters of the fourth to seventh neural networks; after the fourth to seventh neural networks are successfully trained, the entire neural network is successfully trained.
  • The loss function used for adjusting the network parameters of the fourth to seventh neural networks in the second-stage neural network in the present disclosure includes the loss corresponding to the confidence and the loss corresponding to the corrected 3D initial frame, and can be expressed by formula (9).
  • In formula (9), B represents the set of 3D initial frames, and the number of 3D initial frames in the set is used; B_pos is a subset of B in which the overlap between each 3D initial frame and the corresponding 3D annotation frame exceeds the set threshold, and the number of 3D initial frames in that subset is likewise used; the formula further involves the information of the i-th 3D annotation frame and the information of the i-th 3D annotation frame after coordinate conversion, the i-th corrected 3D initial frame (xi, yi, zi, hi, wi, li, θi) and the i-th corrected 3D initial frame after coordinate conversion, and the size of the bucket, that is, the angle interval of the bucket.
  • When a predetermined iteration condition is met, the training process ends.
  • The predetermined iteration conditions in the present disclosure may include: the difference between the 3D initial frame output by the seventh neural network and the 3D annotation frame of the point cloud data sample meets a predetermined difference requirement, and the confidence output by the sixth neural network meets a predetermined requirement. In the case that both requirements are met, the fourth to seventh neural networks are successfully trained this time.
  • The predetermined iteration conditions in the present disclosure may also include: the number of point cloud data samples used to train the fourth to seventh neural networks reaches a predetermined number requirement, and so on. If the number of point cloud data samples used reaches the predetermined number requirement but the two requirements above are not both met, the fourth to seventh neural networks are not successfully trained this time.
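  • The following is a minimal sketch of the structure of this second-stage loss: a confidence loss computed over all 3D initial frames plus a regression loss computed only over the positive subset B_pos whose overlap with the matched 3D annotation frame exceeds a threshold. PyTorch, the specific loss choices (binary cross-entropy, smooth L1), and the 0.55 threshold are assumptions for illustration.

```python
# Sketch of the second-stage loss structure around formula (9): a confidence
# loss over all proposals plus a regression loss over the positive subset.
import torch
import torch.nn.functional as F

def stage2_loss(pred_conf, pred_boxes, gt_boxes, ious, iou_threshold=0.55):
    # pred_conf: probabilities in [0, 1], shape (B,)
    # pred_boxes / gt_boxes: (B, 7); ious: overlap with matched annotation, (B,)
    labels = (ious > iou_threshold).float()
    conf_loss = F.binary_cross_entropy(pred_conf, labels)      # over all of B
    pos = labels.bool()
    if pos.any():                                               # over B_pos only
        reg_loss = F.smooth_l1_loss(pred_boxes[pos], gt_boxes[pos])
    else:
        reg_loss = torch.zeros((), device=pred_conf.device)
    return conf_loss + reg_loss
```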
  • FIG. 6 is a flowchart of an embodiment of a vehicle intelligent control method of the present disclosure.
  • the method of this embodiment includes steps: S600, S610, S620, S630, S640, and S650. Each step in FIG. 6 will be described in detail below.
  • S600. Extract feature information of the point cloud data of the acquired scene.
  • S610. Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
  • S620. Predict at least one foreground point corresponding to the target object among the multiple points according to the first semantic information.
  • S630. Generate a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
  • S640. Determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
  • S650. Generate, according to the 3D detection frame, an instruction or early warning prompt information for controlling the vehicle.
  • The above S600-S640 can be implemented by providing the point cloud data to a neural network: the neural network extracts feature information from the points in the point cloud data, performs semantic segmentation based on the extracted feature information to obtain the semantic features of multiple points, predicts the foreground points among the multiple points according to the semantic features, and generates 3D initial frames corresponding to at least some of the multiple points.
  • the present disclosure may first determine at least one of the following information of the target object according to the 3D detection frame: the spatial position, size, distance to the vehicle, and relative orientation information of the target object in the scene. Then, according to the determined at least one piece of information, an instruction or early warning message for controlling the vehicle is generated.
  • the instructions generated by the present disclosure are, for example, an instruction to increase the speed, an instruction to decrease the speed, or an emergency braking instruction.
  • The generated early warning prompt information may be, for example, a prompt to pay attention to a target object, such as a vehicle or a pedestrian, at a certain position.
  • the present disclosure does not limit the specific implementation of generating instructions or warning prompt information according to the 3D detection frame.
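  • As an illustration only, the following sketch maps the 3D detection frame of a target object to a control instruction or warning based on its distance to the vehicle; the thresholds and instruction names are assumptions, not values given by the disclosure.

```python
# A minimal sketch of turning a 3D detection frame into a control instruction
# or warning, as described above. The distance thresholds and instruction
# names are illustrative assumptions.
import math

def decide_vehicle_action(box_center_xyz, ego_xyz=(0.0, 0.0, 0.0)):
    # Distance from the ego vehicle to the detected target object's box center.
    distance = math.dist(box_center_xyz, ego_xyz)
    if distance < 5.0:
        return "emergency_braking"
    if distance < 15.0:
        return "decrease_speed"
    if distance < 30.0:
        return "warning: target object ahead, pay attention"
    return "maintain_or_increase_speed"

print(decide_vehicle_action((0.0, 0.0, 12.0)))  # -> "decrease_speed"
```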
  • FIG. 7 is a flowchart of an embodiment of an obstacle avoidance navigation method of the present disclosure.
  • the method in this embodiment includes steps: S700, S710, S720, S730, S740, and S750. Next, each step in FIG. 7 will be described in detail.
  • S700. Extract feature information of the point cloud data of the acquired scene.
  • S710. Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
  • S720. Predict at least one foreground point corresponding to the target object among the multiple points according to the first semantic information.
  • S730. Generate a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
  • S740. Determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
  • The above S700-S740 may be implemented by providing the point cloud data to a neural network: the neural network extracts feature information from the points in the point cloud data, performs semantic segmentation based on the extracted feature information to obtain the semantic features of multiple points, predicts the foreground points among the multiple points according to the semantic features, and generates 3D initial frames corresponding to at least some of the multiple points.
  • S750 According to the above 3D detection frame, generate an instruction or warning prompt information for performing obstacle avoidance navigation control on the robot where the lidar is located.
  • the present disclosure may first determine at least one of the following information of the target object according to the 3D detection frame: the spatial position, size, distance to the robot, and relative orientation information of the target object in the scene. Then, according to the determined at least one piece of information, an instruction or early warning prompt information for performing obstacle avoidance navigation control on the robot is generated.
  • the instructions generated by the present disclosure are, for example, an instruction to reduce the speed of an action, an instruction to suspend an action, or a turn instruction.
  • The generated early warning prompt information may be, for example, a prompt to pay attention to an obstacle (that is, a target object) in a certain direction.
  • the present disclosure does not limit the specific implementation of generating instructions or warning prompt information according to the 3D detection frame.
  • FIG. 8 is a schematic structural diagram of an embodiment of a target object 3D detection device of the present disclosure.
  • The device shown in FIG. 8 includes: a feature extraction module 800, a first semantic segmentation module 810, a foreground point prediction module 820, an initial frame generation module 830, and a detection frame determination module 840.
  • the feature extraction module 800 is mainly used to extract feature information of point cloud data of the acquired scene.
  • the first semantic segmentation module 810 is mainly used to perform semantic segmentation processing on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
  • The foreground point prediction module 820 is mainly used to predict at least one foreground point corresponding to the target object among the multiple points according to the first semantic information.
  • The initial frame generation module 830 is mainly used to generate a 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
  • the detection frame determination module 840 is mainly used to determine the 3D detection frame of the target object in the scene according to the 3D initial frame.
  • The detection frame determination module 840 may include: a first submodule, a second submodule, and a third submodule.
  • the first sub-module is mainly used to obtain characteristic information of points in a partial area in the point cloud data, where the partial area includes at least one of the 3D initial frames.
  • the second sub-module is mainly used for semantically segmenting the points in the partial area according to the feature information of the points in the partial area to obtain second semantic information of the points in the partial area.
  • the third sub-module is mainly used to determine the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial area.
  • the third submodule in the present disclosure may include: a fourth submodule and a fifth submodule.
  • the fourth sub-module is mainly used to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area to obtain the corrected 3D initial frame.
  • the fifth sub-module is mainly used to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame.
  • the third submodule in the present disclosure may be further used to determine the confidence of the target object corresponding to the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area, and according to the 3D The initial frame and its confidence determine the 3D detection frame of the target object in the scene.
  • the third submodule in the present disclosure may include: a fourth submodule, a sixth submodule, and a seventh submodule.
  • the fourth sub-module is mainly used to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area to obtain the corrected 3D initial frame.
  • the sixth sub-module is mainly used to determine the confidence of the target object corresponding to the corrected 3D initial frame according to the first semantic information and the second semantic information of the points in the partial area.
  • the seventh sub-module is mainly used to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame and its confidence.
  • The partial area in the present disclosure includes: the 3D expansion frame obtained by expanding the edges of the 3D initial frame according to a predetermined strategy.
  • The 3D expansion frame may be formed by expanding the 3D initial frame in 3D space according to a preset X-axis direction increment, Y-axis direction increment, and/or Z-axis direction increment.
  • the second submodule in the present disclosure may include: an eighth submodule and a ninth submodule.
  • the eighth sub-module is mainly used to perform coordinate transformation on the coordinate information of the points located in the 3D extension box in the point cloud data according to the preset target position of the 3D extension box to obtain the feature information of the point after the coordinate transformation.
  • the ninth sub-module is mainly used to perform semantic segmentation based on the 3D extension box according to the feature information of the coordinate-transformed point, to obtain the second semantic feature of the point in the 3D extension box.
  • The ninth sub-module may perform semantic segmentation based on the 3D expansion frame according to the mask of the foreground points and the feature information of the coordinate-transformed points, to obtain the second semantic features of the points.
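  • A minimal sketch of the box expansion and of transforming points into the expanded frame's canonical coordinate system is given below, assuming NumPy, a (cx, cy, cz, l, w, h, theta) box format with yaw about the vertical y-axis, and illustrative increment values.

```python
# Sketch of expanding a 3D initial frame by per-axis increments and moving
# points into the expanded frame's canonical coordinate system. Box format
# (cx, cy, cz, l, w, h, theta) with yaw about the vertical y-axis and the
# 0.5 increments are illustrative assumptions.
import numpy as np

def expand_box(box, dx=0.5, dy=0.5, dz=0.5):
    cx, cy, cz, l, w, h, theta = box
    # Enlarge the box dimensions by the preset increments along each axis.
    return np.array([cx, cy, cz, l + 2 * dx, w + 2 * dy, h + 2 * dz, theta])

def to_canonical(points, box):
    cx, cy, cz, _, _, _, theta = box
    shifted = points - np.array([cx, cy, cz])      # move box center to origin
    ct, st = np.cos(theta), np.sin(theta)
    rot = np.array([[ct, 0.0, -st],                # rotate by -theta about y
                    [0.0, 1.0, 0.0],
                    [st, 0.0, ct]])
    return shifted @ rot.T
```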
  • The detection frame determination module 840 in the present disclosure may first determine the degree of overlap between the 3D initial frames corresponding to the multiple foreground points, filter the 3D initial frames according to whether the overlap is greater than the set threshold, and then determine the 3D detection frame of the target object in the scene according to the filtered 3D initial frames.
  • The feature extraction module 800, the first semantic segmentation module 810, the foreground point prediction module 820, and the initial frame generation module 830 in the present disclosure may be implemented by a first-stage neural network.
  • the device of the present disclosure may further include a first training module.
  • the first training module is used to train the first-stage neural network to be trained using point cloud data samples with 3D annotation frames.
  • the process of the first training module training the first stage neural network includes:
  • The first training module provides the point cloud data samples to the first-stage neural network; the first-stage neural network extracts the feature information of the point cloud data samples, performs semantic segmentation on the point cloud data samples according to the extracted feature information, predicts at least one foreground point corresponding to the target object among the multiple points according to the first semantic features of the multiple points obtained by the semantic segmentation, and generates the 3D initial frame corresponding to each of the at least one foreground point according to the first semantic information.
  • The first training module obtains the loss corresponding to the foreground points and the loss formed by the 3D initial frames relative to the corresponding 3D annotation frames, and adjusts the network parameters in the first-stage neural network according to these losses.
  • The first training module may determine the first loss, corresponding to the foreground point prediction result, according to the confidence of the foreground points predicted by the first-stage neural network.
  • The first training module generates a second loss according to the numbers of the buckets in which the parameters of the 3D initial frame generated for a foreground point are located and the numbers of the buckets given by the 3D annotation frame information in the point cloud data sample.
  • The first training module generates a third loss according to the offsets, within the corresponding buckets, of the parameters of the 3D initial frame generated for the foreground point and the offsets, within the corresponding buckets, of the parameters in the 3D annotation frame information of the point cloud data sample. The first training module generates a fourth loss according to the offsets of the parameters of the 3D initial frame generated for the foreground point from predetermined parameters.
  • The first training module generates a fifth loss according to the offset of the coordinate parameters of the foreground point relative to the coordinate parameters of the 3D initial frame generated for that foreground point.
  • the first training module adjusts the network parameters of the first-stage neural network according to the first loss, the second loss, the third loss, the fourth loss, and the fifth loss.
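  • The following is a minimal sketch of combining the five stage-one losses named above into one training objective; the equal weighting is an assumption, and the individual loss terms are placeholders for whatever concrete losses are used.

```python
# Sketch of assembling the five stage-1 losses into a single objective.
# The loss tensors and the equal weights are illustrative assumptions.
def stage1_total_loss(first, second, third, fourth, fifth, weights=(1.0,) * 5):
    # first:  foreground-point classification (confidence) loss
    # second: bucket-number classification loss for box parameters
    # third:  intra-bucket offset regression loss
    # fourth: regression loss against predetermined parameters
    # fifth:  offset of the foreground point relative to the box coordinates
    losses = (first, second, third, fourth, fifth)
    return sum(w * l for w, l in zip(weights, losses))
```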
  • the first submodule, the second submodule, and the third submodule in the present disclosure are implemented by a second-stage neural network.
  • the device of the present disclosure further includes a second training module, and the second training module is used to train the second-stage neural network to be trained using point cloud data samples with 3D annotation frames.
  • the process of the second training module training the second-stage neural network includes:
  • The second training module provides the 3D initial frames obtained by the first-stage neural network to the second-stage neural network. The second-stage neural network obtains the feature information of the points in the partial region of the point cloud data sample, performs semantic segmentation on the points in the partial area according to that feature information to obtain the second semantic features of the points in the partial area, determines, according to the first semantic features and the second semantic features of the points in the partial area, the confidence that the 3D initial frame is a target object, and generates a position-corrected 3D initial frame based on the first and second semantic features of the points in the partial area.
  • The second training module obtains the loss corresponding to the confidence that the 3D initial frame is a target object and the loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame, and adjusts the network parameters in the second-stage neural network according to the obtained losses.
  • The second training module may determine the sixth loss, corresponding to the prediction result, according to the confidence, predicted by the second-stage neural network, that the 3D initial frame is a target object.
  • For the position-corrected 3D initial frames generated by the second-stage neural network whose overlap with the corresponding 3D annotation frame exceeds the set threshold, the second training module generates a seventh loss according to the numbers of the buckets in which the parameters of those frames are located and the numbers of the buckets given by the 3D annotation frame information in the point cloud data sample.
  • For the same position-corrected 3D initial frames, the second training module generates an eighth loss according to the offsets of their parameters within the corresponding buckets and the offsets of the parameters in the 3D annotation frame information of the point cloud data sample within the corresponding buckets.
  • The second training module further generates a loss according to the offsets of the parameters of the position-corrected 3D initial frames relative to the parameters of the corresponding 3D annotation frames.
  • FIG. 9 is a schematic structural diagram of an embodiment of a vehicle intelligent control device of the present disclosure.
  • The device of this embodiment includes: a target object 3D detection device 900 and a first control module 910.
  • the target object 3D detection device 900 is used to obtain a 3D detection frame of the target object based on the point cloud data.
  • the specific structure and specific operations of the target object 3D detection device 900 are as described in the above device and method embodiments, and will not be described in detail here.
  • the first control module 910 is mainly used to generate an instruction or early warning information for controlling the vehicle according to the 3D detection frame. For details, reference may be made to the relevant description in the above method embodiment, and no more detailed description is provided here.
  • FIG. 10 is a schematic structural diagram of an embodiment of an obstacle avoidance navigation device of the present disclosure.
  • the device of this embodiment includes: a target object 3D detection device 1000 and a second control module 1010.
  • the target object 3D detection device 1000 is used to obtain a 3D detection frame of the target object based on the point cloud data.
  • the specific structure and specific operations of the target object 3D detection device 1000 are as described in the above-mentioned device and method embodiments, and will not be described in detail here.
  • the second control module 1010 is mainly used to generate instructions or warning prompt information for performing obstacle avoidance navigation control on the robot according to the 3D detection frame. For details, reference may be made to the relevant description in the above method embodiment, and no more detailed description is provided here.
  • FIG. 11 shows an exemplary device 1100 suitable for implementing the present disclosure.
  • The device 1100 may be a control system/electronic system configured in a car, a mobile terminal (e.g., a smart mobile phone), a personal computer (PC, e.g., a desktop or notebook computer), a tablet computer, a server, or the like.
  • The device 1100 includes one or more processors, a communication part, and the like. The one or more processors may be one or more central processing units (CPUs) 1101 and/or one or more graphics processing units (GPUs) 1113, and the processors can perform appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 1102 or loaded from the storage section 1108 into a random access memory (RAM) 1103.
  • the communication part 1112 may include but is not limited to a network card, and the network card may include but not limited to an IB (Infiniband) network card.
  • the processor can communicate with the read-only memory 1102 and/or the random access memory 1103 to execute executable instructions, connect to the communication section 1112 through the bus 1104, and communicate with other target devices via the communication section 1112, thereby completing the corresponding steps in the present disclosure .
  • ROM1102 is an optional module.
  • the RAM 1103 stores executable instructions, or writes executable instructions to the ROM 1102 at runtime.
  • the executable instructions cause the central processing unit 1101 to perform the steps included in the target object 3D detection method.
  • An input/output (I/O) interface 1105 is also connected to the bus 1104.
  • The communication part 1112 may be provided as an integrated unit, or may be provided as multiple sub-modules (for example, multiple IB network cards) that are respectively connected to the bus.
  • The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card or a modem.
  • the communication section 1109 performs communication processing via a network such as the Internet.
  • the driver 1110 is also connected to the I/O interface 1105 as necessary.
  • a removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed on the drive 1110 as necessary, so that the computer program read out therefrom is installed in the storage portion 1108 as needed.
  • FIG. 11 shows only an optional implementation.
  • The number and types of the components in FIG. 11 may be selected, deleted, added, or replaced according to actual needs.
  • The GPU 1113 and the CPU 1101 may be provided separately, or the GPU 1113 may be integrated on the CPU 1101; similarly, the communication part 1112 may be provided separately, or may be integrated on the CPU 1101 or the GPU 1113, and so on.
  • The embodiments of the present disclosure include a computer program product that includes a computer program tangibly contained on a machine-readable medium. The computer program includes program code for performing the steps shown in the flowcharts, and the program code may include instructions corresponding to the steps in the methods provided by the present disclosure.
  • the computer program may be downloaded and installed from the network through the communication section 1109, and/or installed from the removable medium 1111.
  • The embodiments of the present disclosure also provide a computer program product for storing computer-readable instructions that, when executed, cause a computer to perform the target object 3D detection method described in any of the above embodiments.
  • the computer program product may be implemented in hardware, software, or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • In another optional example, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
  • The embodiments of the present disclosure also provide another target object 3D detection method and its corresponding device, electronic device, computer storage medium, computer program, and computer program product. The method includes: a first device sends a target object 3D detection instruction to a second device, the instruction causing the second device to perform the target object 3D detection method in any of the above possible embodiments; and the first device receives the target object 3D detection result returned by the second device.
  • The target object 3D detection instruction may specifically be a call instruction. The first device may instruct, by means of the call, the second device to perform the target object 3D detection operation; accordingly, in response to receiving the call instruction, the second device may perform the steps and/or processes in any embodiment of the above target object 3D detection method.
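  • Purely as an illustration of this first-device/second-device interaction, the sketch below uses a plain HTTP request as a stand-in for the call instruction; the endpoint, payload format, and field names are assumptions, not an interface defined by the disclosure.

```python
# Illustrative stand-in for the call instruction between the first device
# (caller) and the second device (which runs the 3D detection).
import json
import urllib.request

def request_3d_detection(second_device_url, point_cloud):
    payload = json.dumps({"command": "detect_3d", "points": point_cloud}).encode()
    req = urllib.request.Request(
        second_device_url, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:      # second device runs detection
        return json.loads(resp.read())             # first device receives 3D boxes
```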
  • The terms "first" and "second" in the embodiments of the present disclosure are only for distinction and should not be construed as limiting the embodiments of the present disclosure.
  • "Plurality" may refer to two or more, and "at least one" may refer to one, two, or more than two.
  • any component, data, or structure mentioned in the present disclosure can be generally understood as one or more, unless it is explicitly defined or given the opposite enlightenment in the context.
  • description of the embodiments of the present disclosure emphasizes the differences between the embodiments, and the same or similarities can be referred to each other, and for the sake of brevity, they will not be described one by one.
  • the method and apparatus, electronic device, and computer-readable storage medium of the present disclosure may be implemented in many ways.
  • the method and apparatus, electronic device, and computer-readable storage medium of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above order of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated.
  • the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
  • the present disclosure also covers the recording medium storing the program for executing the method according to the present disclosure.


Abstract

Disclosed are a 3D detection method and apparatus for a target object, and an electronic device, a computer-readable storage medium and a computer program. The 3D detection method for a target object comprises: extracting feature information of point cloud data of an acquired scene; carrying out semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; predicting at least one foreground point, corresponding to a target object, in the multiple points according to the first semantic information; generating a 3D initial frame respectively corresponding to the at least one foreground point according to the first semantic information; and determining a 3D detection frame for the target object in the scene according to the 3D initial frame.

Description

Target object 3D detection method, device, medium and equipment
The present disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on November 29, 2018, with application number 201811446588.8 and entitled "Target object 3D detection method, device, medium and equipment", the entire contents of which are incorporated into the present disclosure by reference.
Technical field
The present disclosure relates to computer vision technology, and in particular, to a target object 3D detection method and device, a vehicle intelligent control method and device, an obstacle avoidance navigation method and device, an electronic device, a computer-readable storage medium, and a computer program.
Background technique
3D detection can be applied in various technologies such as intelligent driving and obstacle avoidance navigation. In intelligent driving technology, through 3D detection, information such as the specific position, shape, size, and moving direction of target objects such as the surrounding vehicles and pedestrians of an intelligent driving vehicle can be obtained, which can help the intelligent driving vehicle make intelligent driving decisions.
Summary of the invention
Embodiments of the present disclosure provide technical solutions for target object 3D detection, vehicle intelligent control, and obstacle avoidance navigation.
According to one aspect of the embodiments of the present disclosure, a target object 3D detection method is provided, which includes: extracting feature information of point cloud data of an acquired scene; performing semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; predicting, according to the first semantic information, at least one foreground point corresponding to a target object among the multiple points; generating, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point; and determining a 3D detection frame of the target object in the scene according to the 3D initial frame.
According to still another aspect of the embodiments of the present disclosure, a vehicle intelligent control method is provided, including: obtaining a 3D detection frame of a target object by using the above target object 3D detection method; and generating, according to the 3D detection frame, an instruction or early warning prompt information for controlling a vehicle.
According to still another aspect of the embodiments of the present disclosure, an obstacle avoidance navigation method is provided, including: obtaining a 3D detection frame of a target object by using the above target object 3D detection method; and generating, according to the 3D detection frame, an instruction or early warning prompt information for performing obstacle avoidance navigation control on a robot.
According to still another aspect of the embodiments of the present disclosure, a target object 3D detection device is provided, including: a feature extraction module, configured to extract feature information of point cloud data of an acquired scene; a first semantic segmentation module, configured to perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data; a foreground point prediction module, configured to predict, according to the first semantic information, at least one foreground point corresponding to a target object among the multiple points; an initial frame generation module, configured to generate, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point; and a detection frame determination module, configured to determine a 3D detection frame of the target object in the scene according to the 3D initial frame.
According to still another aspect of the embodiments of the present disclosure, a vehicle intelligent control device is provided, including: the above target object 3D detection device, configured to obtain a 3D detection frame of a target object; and a first control module, configured to generate, according to the 3D detection frame, an instruction or early warning prompt information for controlling a vehicle.
According to still another aspect of the embodiments of the present disclosure, an obstacle avoidance navigation device is provided, including: the above target object 3D detection device, configured to obtain a 3D detection frame of a target object; and a second control module, configured to generate, according to the 3D detection frame, an instruction or early warning prompt information for performing obstacle avoidance navigation control on a robot.
According to still another aspect of the embodiments of the present disclosure, an electronic device is provided, including: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, where any method embodiment of the present disclosure is implemented when the computer program is executed.
According to still another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where any method embodiment of the present disclosure is implemented when the computer program is executed by a processor.
According to still another aspect of the embodiments of the present disclosure, a computer program is provided, including computer instructions, where any method embodiment of the present disclosure is implemented when the computer instructions are run in a processor of a device.
Based on the target object 3D detection method and device, the vehicle intelligent control method and device, the obstacle avoidance navigation method and device, the electronic device, the computer-readable storage medium, and the computer program provided in the present disclosure, the present disclosure performs feature extraction on point cloud data and performs semantic segmentation on the point cloud data based on the extracted feature information, which is equivalent to bottom-level data analysis; the present disclosure then generates and determines the 3D detection frame of the target object based on the semantic segmentation results, which is equivalent to upper-level data analysis. Therefore, in the 3D detection process of the target object, the present disclosure forms a bottom-up way of generating the 3D detection frame. In this way, the loss of original information in the point cloud data caused by first projecting the point cloud data and then performing 3D detection frame detection on the projected image can be avoided; the phenomenon that 3D detection frame detection is affected because a target object (such as a vehicle or an obstacle) in a 2D image captured by a camera device is occluded can also be avoided. As can be seen from the above description, the technical solution provided by the present disclosure is beneficial to improving the detection performance of the 3D detection frame.
The technical solutions of the present disclosure will be further described in detail below with reference to the accompanying drawings and embodiments.
Brief description of the drawings
The drawings, which form a part of the specification, describe the embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
With reference to the drawings, the present disclosure can be understood more clearly from the following detailed description, in which:
FIG. 1 is a flowchart of an embodiment of a target object 3D detection method of the present disclosure;
FIG. 2 is a flowchart of another embodiment of the target object 3D detection method of the present disclosure;
FIG. 3 is a schematic structural diagram of a first-stage neural network of the present disclosure;
FIG. 4 is another schematic structural diagram of the first-stage neural network of the present disclosure;
FIG. 5 is a schematic structural diagram of a second-stage neural network of the present disclosure;
FIG. 6 is a flowchart of an embodiment of a vehicle intelligent control method of the present disclosure;
FIG. 7 is a flowchart of an embodiment of an obstacle avoidance navigation method of the present disclosure;
FIG. 8 is a schematic structural diagram of an embodiment of a target object 3D detection device of the present disclosure;
FIG. 9 is a schematic structural diagram of an embodiment of a vehicle intelligent control device of the present disclosure;
FIG. 10 is a schematic structural diagram of an embodiment of an obstacle avoidance navigation device of the present disclosure;
FIG. 11 is a block diagram of an exemplary device for implementing an embodiment of the present disclosure.
Specific examples
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships. The following description of at least one exemplary embodiment is merely illustrative and in no way serves as any limitation on the present disclosure or its application or use. Techniques, methods, and devices known to those of ordinary skill in the related art may not be discussed in detail, but, where appropriate, such techniques, methods, and devices should be considered part of the specification.
It should be noted that similar reference numerals and letters indicate similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings. The embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with many other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Exemplary embodiment
FIG. 1 is a flowchart of an embodiment of the target object 3D detection method of the present disclosure.
S100. Extract feature information of the point cloud data of the acquired scene.
In an optional example, the scene in the present disclosure may refer to a vision-based presentation. For example, the visual picture presented by an image captured by a camera device, or by point cloud data obtained through lidar scanning, can be regarded as a scene.
In an optional example, the point cloud data in the present disclosure generally refers to scanning information recorded in the form of points, for example, point cloud data obtained through lidar scanning. Each point in the point cloud data can be described by a variety of information; that is, each point in the point cloud data usually includes a variety of information, which may include but is not limited to one or more of the following: the three-dimensional coordinates of the point, color information (such as RGB information), and reflection intensity information. In other words, a point in the point cloud data can be described by one or more types of information such as three-dimensional coordinates, color information, and reflection intensity information.
In an optional example, the present disclosure may use at least one convolutional layer in a neural network to process the point cloud data, thereby forming feature information (a feature map) of the point cloud data, for example, forming one piece of feature information for each point in the point cloud data. Since the feature information formed here is formed for each point while considering all points in the entire spatial range of the point cloud data, the feature information formed here can be called global feature information.
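As a minimal sketch of producing such per-point global feature information with shared point-wise layers (assuming PyTorch; the PointNet-style architecture, channel sizes, and input format are illustrative assumptions rather than the network actually used by the present disclosure):

```python
# Sketch of extracting one feature vector per point from raw point cloud data
# with shared point-wise 1x1 convolutions. Channel sizes are assumptions.
import torch
import torch.nn as nn

class PointFeatureExtractor(nn.Module):
    def __init__(self, in_channels=4, out_channels=256):
        super().__init__()
        # 1x1 convolutions act as shared per-point layers over (B, C, N) input.
        self.layers = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=1), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=1), nn.ReLU(),
            nn.Conv1d(128, out_channels, kernel_size=1),
        )

    def forward(self, points):
        # points: (batch, num_points, in_channels), e.g. x, y, z, intensity
        features = self.layers(points.transpose(1, 2))  # (batch, C, num_points)
        return features.transpose(1, 2)                 # one feature per point

extractor = PointFeatureExtractor()
cloud = torch.randn(2, 1024, 4)          # two clouds of 1024 points each
print(extractor(cloud).shape)            # torch.Size([2, 1024, 256])
```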
S110. Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.
In an optional example, the present disclosure may use a neural network to perform semantic segmentation on the point cloud data; the neural network may form one piece of first semantic information for some points in the point cloud data, or even for each point in the point cloud data. For example, after the point cloud data is provided to the neural network and the feature information of the point cloud data is extracted by the neural network, the neural network continues to process the feature information of the point cloud data to obtain the first semantic information of multiple points in the point cloud data.
In an optional example, the first semantic information of a point in the present disclosure generally refers to the semantic feature generated for the point while considering the entire point cloud data; therefore, the first semantic information can also be called the first semantic feature or the global semantic feature. The global semantic feature of a point in the present disclosure can generally be expressed in the form of a one-dimensional vector including multiple (e.g., 256) elements. The global semantic feature in the present disclosure may also be referred to as a global semantic feature vector.
In an optional example, the foreground points and background points in the present disclosure are defined with respect to a target object. Optionally, a point belonging to a target object is a foreground point of that target object, and a point not belonging to that target object is a background point of that target object. In the case where multiple target objects are included in the scene, for one of the target objects, a point belonging to that target object is a foreground point of that target object; however, since the point does not belong to the other target objects, it is a background point of the other target objects.
In an optional example, in the case where the points in the point cloud data include foreground points of a target object and background points of the target object, the first semantic information of the multiple points obtained by the present disclosure usually includes: the global semantic features of the foreground points of the target object and the global semantic features of the background points of the target object. The scene in the present disclosure may include one or more target objects. The target objects in the present disclosure include but are not limited to: vehicles, non-motor vehicles, pedestrians, and/or obstacles.
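As an illustration of this foreground/background definition, the following sketch labels points with respect to one object's 3D annotation frame; it assumes NumPy and, purely for brevity, an axis-aligned box with no yaw:

```python
# Sketch of labeling points as foreground or background for one target
# object's box. Axis-aligned box (no yaw) is used only for brevity.
import numpy as np

def foreground_mask(points_xyz, box_center, box_size):
    """points_xyz: (N, 3); box_center/box_size: (3,). True = foreground."""
    half = np.asarray(box_size) / 2.0
    offsets = np.abs(points_xyz - np.asarray(box_center))
    return np.all(offsets <= half, axis=1)

points = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
print(foreground_mask(points, box_center=(0, 0, 0), box_size=(4, 2, 2)))
# -> [ True False ]: the first point is a foreground point of this object.
```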
S120. Predict at least one foreground point corresponding to the target object among the multiple points according to the first semantic information.
In an optional example, the present disclosure may use a neural network to predict at least one foreground point corresponding to the target object among the multiple points; the neural network may make a prediction for some points in the point cloud data, or even for each point in the point cloud data, to generate the confidence that the point is a foreground point. The confidence of a point can express the probability that the point is a foreground point. For example, after the point cloud data is provided to the neural network, the feature information of the point cloud data is extracted by the neural network, and the semantic segmentation processing is performed by the neural network, the neural network continues to process the global semantic features to predict the confidence that each of the multiple points in the point cloud data is a foreground point of the target object; the neural network can generate a confidence for each point. The present disclosure can judge each confidence generated by the neural network and take the points whose confidence exceeds a predetermined value as the foreground points of the target object.
It should be particularly noted that the operation of judging the confidence in the present disclosure may be performed in S120 or in S130. In addition, if the confidence judgment operation is performed in S120 and the judgment result is that there is no point whose confidence exceeds the predetermined value, that is, there is no foreground point, it can be considered that there is no target object in the scene.
S130、根据第一语义信息生成至少一个前景点各自对应的3D初始框。S130. Generate a 3D initial frame corresponding to each of the at least one front sight according to the first semantic information.
在一个可选示例中,在S120未包括有对置信度进行判断的操作的情况下,本公开可以根据S110中获得每一个点的全局语义特征,为每一个点分别生成一个3D初始框。本公开可以通过对S120中获得的所有置信度进行判断,挑选出目标对象的前景点,并利用挑选出的前景点,从S130生成的3D初始框中进行挑选,从而可以获得各前景点各自对应的3D初始框。即S130生成的各3D初始框通常包括:前景点对应的3D初始框和背景点对应的3D初始框,从而S130需要从生成的所有3D初始框中,筛选出各前景点对应的3D初始框。In an optional example, in the case where S120 does not include an operation to determine the confidence, the present disclosure may obtain a global semantic feature of each point in S110, and generate a 3D initial frame for each point. In the present disclosure, all the confidences obtained in S120 can be judged to select the front attractions of the target object, and the selected front attractions can be used to select from the 3D initial frame generated by S130, so that each front attraction can be corresponding to each other 3D initial box. That is, each 3D initial frame generated by S130 usually includes: a 3D initial frame corresponding to the front sight and a 3D initial frame corresponding to the background point, so S130 needs to filter out the 3D initial frames corresponding to each front sight from all the generated 3D initial frames.
在一个可选示例中,在S120包括有对置信度进行判断的操作的情况下,本公开可以根据上述预测出的每一个前景点的全局语义特征,分别生成一个3D初始框,从而获得的各3D初始框均为前景点对应的3D初始框。即S130生成的各3D初始框均为前景点对应的3D初始框,也就是说,S130可以仅针对前景点生成3D初始框。In an optional example, in the case where S120 includes an operation to judge the confidence, the present disclosure may generate a 3D initial frame respectively according to the global semantic features of each of the predicted spots predicted above, thereby obtaining each The 3D initial frames are the 3D initial frames corresponding to the front sight. That is, each 3D initial frame generated by S130 is a 3D initial frame corresponding to the front sight, that is to say, S130 may generate a 3D initial frame only for the front sight.
In an optional example, the 3D initial frame in the present disclosure may be described by the center point position information of the 3D initial frame, the length, width, and height information of the 3D initial frame, and the direction information of the 3D initial frame. That is, the 3D initial frame in the present disclosure may include: the center point position information of the 3D initial frame, the length, width, and height information of the 3D initial frame, the direction information of the 3D initial frame, and the like. The 3D initial frame may also be referred to as 3D initial frame information.
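For illustration only, such a description of a 3D initial frame might be held in a simple structure like the one below; the field names and their ordering are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Box3D:
    # Center point position information.
    x: float
    y: float
    z: float
    # Height, width, and length information.
    h: float
    w: float
    l: float
    # Direction information: angle between the frame length and the X axis
    # in the bird's-eye view.
    theta: float
```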
In an optional example, the present disclosure may use a neural network to generate the 3D initial frames. For example, after the point cloud data is provided to the neural network, the neural network extracts the feature information of the point cloud data and performs semantic segmentation processing, and then continues to process the global semantic features so as to generate one 3D initial frame for each of the multiple points. As another example, after the point cloud data is provided to the neural network, the neural network extracts the feature information of the point cloud data, performs semantic segmentation processing, and performs prediction processing on the global semantic features so as to obtain the confidences that multiple points in the point cloud data are foreground points of the target object; the neural network may then continue to process the global semantic features of the points whose confidences exceed the predetermined value, so as to generate one 3D initial frame for each foreground point.

Since the point cloud data has a certain receptive field and the semantic segmentation is performed based on the feature information of all points in the point cloud data, the semantic features formed by the semantic segmentation include not only the semantic features of the point itself but also the semantic features of the surrounding points. Therefore, multiple foreground points in the present disclosure may semantically point to the same target object in the scene. The 3D initial frames corresponding to different foreground points that point to the same target object differ to some extent, but the difference is usually not large.

In addition, if no 3D initial frame corresponding to a foreground point exists among the 3D initial frames generated in S130 according to the first semantic information, it may be considered that no target object exists in the scene.

S140. Determine the 3D detection frame of the target object in the scene according to the 3D initial frames.

The present disclosure finally determines one 3D detection frame for each target object.
In an optional example, the present disclosure may perform redundancy removal on the 3D initial frames corresponding to all of the foreground points obtained above, thereby obtaining the 3D detection frame of the target object, that is, the 3D detection frame finally obtained by performing target object detection on the point cloud data. Optionally, the present disclosure may use the degree of overlap between 3D initial frames to remove redundant 3D initial frames, thereby obtaining the 3D detection frame of the target object. For example, the present disclosure may determine the degree of overlap between the 3D initial frames corresponding to multiple foreground points, filter the 3D initial frames whose degree of overlap is greater than a set threshold, and then determine the 3D detection frame of the target object from the filtered 3D initial frames. Optionally, the present disclosure may use an NMS (Non-Maximum Suppression) algorithm to perform redundancy removal on the 3D initial frames corresponding to all the foreground points, thereby removing redundant 3D frames that cover one another and obtaining the final 3D detection frame. In the case where the scene includes multiple target objects (such as one or more pedestrians, one or more non-motorized vehicles, one or more vehicles, etc.), the present disclosure may obtain one final 3D detection frame for each target object in the scene.
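For illustration only, the redundancy-removal step might be sketched as the greedy NMS below; the bird's-eye-view overlap is approximated with axis-aligned footprints, and the 0.7 overlap threshold is an assumption of this sketch, since the disclosure does not prescribe a particular overlap measure or threshold.

```python
import numpy as np

def bev_iou_axis_aligned(box_a, box_b):
    """Approximate bird's-eye-view IoU of two frames (x, y, z, h, w, l, theta),
    ignoring the rotation angle for simplicity of this sketch."""
    ax1, ax2 = box_a[0] - box_a[5] / 2, box_a[0] + box_a[5] / 2
    ay1, ay2 = box_a[1] - box_a[4] / 2, box_a[1] + box_a[4] / 2
    bx1, bx2 = box_b[0] - box_b[5] / 2, box_b[0] + box_b[5] / 2
    by1, by2 = box_b[1] - box_b[4] / 2, box_b[1] + box_b[4] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[5] * box_a[4] + box_b[5] * box_b[4] - inter
    return inter / union if union > 0 else 0.0

def nms_3d(boxes, scores, overlap_threshold=0.7):
    """Greedy non-maximum suppression over 3D initial frames.

    boxes:  (N, 7) array of (x, y, z, h, w, l, theta).
    scores: (N,) per-frame confidences.
    Returns the indices of the frames kept as final 3D detection frames.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        remaining = [j for j in order[1:]
                     if bev_iou_axis_aligned(boxes[i], boxes[j]) <= overlap_threshold]
        order = np.array(remaining, dtype=int)
    return keep
```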
In an optional example, the present disclosure may perform correction (also referred to as optimization) processing on the 3D initial frames corresponding to the currently obtained foreground points, and then perform redundancy removal on all the corrected 3D initial frames, thereby obtaining the 3D detection frame of the target object, that is, the 3D detection frame finally obtained by performing target object detection on the point cloud data.

In an optional example, the process of respectively correcting the 3D initial frame corresponding to each foreground point in the present disclosure may include the following step A1, step B1, and step C1:

Step A1: Acquire feature information of points in a partial region of the point cloud data, where the partial region includes at least one 3D initial frame.

Optionally, the present disclosure may set a 3D expansion frame containing the 3D initial frame, and acquire the feature information of each point of the point cloud data located in the 3D expansion frame. The 3D expansion frame in the present disclosure is one implementation of a partial region of the point cloud data. The 3D initial frame corresponding to each foreground point corresponds to one 3D expansion frame, and the spatial range occupied by the 3D expansion frame usually completely covers, and is slightly larger than, the spatial range occupied by the 3D initial frame. Usually, no face of the 3D initial frame lies in the same plane as any face of its corresponding 3D expansion frame, the center point of the 3D initial frame coincides with the center point of the 3D expansion frame, and every face of the 3D initial frame is parallel to the corresponding face of its 3D expansion frame. Since the positional relationship between such a 3D expansion frame and the 3D initial frame is relatively regular, it helps to reduce the difficulty of forming the 3D expansion frame, and thus helps to reduce the implementation difficulty of the present disclosure. Of course, the present disclosure does not exclude the case where the two center points do not coincide but every face of the 3D initial frame is still parallel to the corresponding face of its 3D expansion frame.

Optionally, the present disclosure may expand the 3D initial frame corresponding to a foreground point in 3D space according to at least one of a preset X-axis direction increment (e.g., 20 cm), a Y-axis direction increment (e.g., 20 cm), and a Z-axis direction increment (e.g., 20 cm), thereby forming a 3D expansion frame that contains the 3D initial frame, whose center point coincides with that of the 3D initial frame, and whose faces are parallel to the corresponding faces of the 3D initial frame.

Optionally, the increments in the present disclosure may be set according to actual requirements; for example, the increment in a given direction does not exceed one N-th (e.g., N greater than 4) of the corresponding side length of the 3D initial frame. Optionally, the X-axis direction increment does not exceed one tenth of the length of the 3D initial frame, the Y-axis direction increment does not exceed one tenth of the width of the 3D initial frame, and the Z-axis direction increment does not exceed one tenth of the height of the 3D initial frame. In addition, the X-axis direction increment, the Y-axis direction increment, and the Z-axis direction increment may be the same or different.
Optionally, assume that the i-th 3D initial frame b_i can be expressed as: b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i, and z_i respectively represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i, and l_i respectively represent the height, width, and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame; for example, in the bird's-eye view, the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then the 3D expansion frame b_i^e corresponding to the i-th 3D initial frame can be expressed as:

b_i^e = (x_i, y_i, z_i, h_i + η, w_i + η, l_i + η, θ_i)

where η represents the increment.
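As a sketch of the expansion just described, assuming the increment η is applied to each of the three sizes (with 0.2 m standing in for the 20 cm example above):

```python
def expand_box(box, eta=0.2):
    """Form the 3D expansion frame b_i^e from a 3D initial frame
    b_i = (x, y, z, h, w, l, theta) by enlarging each size by the
    increment eta, keeping the center point and direction unchanged."""
    x, y, z, h, w, l, theta = box
    return (x, y, z, h + eta, w + eta, l + eta, theta)
```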
Optionally, the present disclosure may use a neural network to acquire the feature information of the points in the partial region of the point cloud data. For example, all points of the partial region of the point cloud data are provided as input to the neural network, and at least one convolutional layer in the neural network processes the point cloud data in the partial region, so that feature information is formed for each point in the partial region. Since the feature information formed here is formed for each point in the partial region while considering all the points in that partial region of the point cloud data, the feature information formed here may be referred to as local feature information.

Step B1: Perform semantic segmentation on the points in the partial region according to the feature information of the points in the partial region, to obtain second semantic information of the points in the partial region.

Optionally, the second semantic information of a point in the present disclosure refers to the semantic feature vector formed for the point while considering all the points in the spatial range formed by the 3D expansion frame. The second semantic information in the present disclosure may be referred to as a second semantic feature or a local spatial semantic feature. A local spatial semantic feature may likewise take the form of a one-dimensional vector array including multiple (e.g., 256) elements.

The present disclosure may use a neural network to acquire the local spatial semantic features of all points in the 3D expansion frame, and the manner of using the neural network to acquire the local spatial semantic features of the points may include the following step a and step b:

a. First, according to a preset target position of the 3D expansion frame, coordinate transformation is performed on the coordinate information of the point cloud data located in the 3D expansion frame, so that the coordinates of the points located in the 3D expansion frame are displaced, and the 3D expansion frame is thereby translated and rotated (the direction of the 3D expansion frame is adjusted) and transformed to the preset target position of the 3D expansion frame. Optionally, the preset target position of the 3D expansion frame may include: the center point of the 3D expansion frame (that is, the center point of the 3D initial frame) is located at the coordinate origin, and the length of the 3D expansion frame is parallel to the X axis. Optionally, the above coordinate origin and X axis may be the coordinate origin and X axis of the coordinate system of the point cloud data, and of course may also be the coordinate origin and X axis of another coordinate system.
Continuing the previous example, assume that the i-th 3D initial frame b_i can be expressed as: b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i, and z_i respectively represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i, and l_i respectively represent the height, width, and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame; for example, in the bird's-eye view, the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then, after the coordinate transformation is performed on the 3D expansion frame containing the i-th 3D initial frame, the present disclosure obtains a new 3D initial frame b̃_i, which can be expressed as:

b̃_i = (0, 0, 0, h_i, w_i, l_i, 0)

That is, the center point of the new 3D initial frame b̃_i is located at the coordinate origin, and in the bird's-eye view the angle between the length of the new 3D initial frame b̃_i and the X coordinate axis is 0.
The above coordinate transformation manner of the present disclosure may be referred to as a canonical (regularized) coordinate transformation. Performing the coordinate transformation on a point usually only changes the coordinate information of the point and does not change the other information of the point. By performing the canonical coordinate transformation, the present disclosure makes the coordinates of the points in different 3D initial frames concentrate within a roughly common range, which is beneficial to the training of the neural network, that is, beneficial to improving the accuracy of the local spatial semantic features formed by the neural network, and thus beneficial to improving the accuracy of correcting the 3D initial frame. It can be understood that the above coordinate transformation is only an optional example, and those skilled in the art may also adopt other transformations that map the coordinates into a certain range.
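A minimal sketch of the canonical coordinate transformation, assuming the vertical axis of the bird's-eye view is the Z axis (the axis convention and the function name are assumptions of this sketch):

```python
import numpy as np

def canonical_transform(points, box):
    """Transform point coordinates into the canonical frame of a 3D initial
    frame b_i = (x, y, z, h, w, l, theta): translate so that the frame center
    is at the origin, then rotate about the vertical axis by -theta so that
    the frame length is parallel to the X axis.

    points: (M, 3) coordinates of the points located in the 3D expansion frame.
    Only the coordinates are changed; other per-point information is left
    untouched by this step.
    """
    x, y, z, h, w, l, theta = box
    shifted = points - np.array([x, y, z])
    cos_t, sin_t = np.cos(-theta), np.sin(-theta)
    rot = np.array([[cos_t, -sin_t, 0.0],
                    [sin_t,  cos_t, 0.0],
                    [0.0,    0.0,   1.0]])
    return shifted @ rot.T
```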
b. The coordinate-transformed point cloud data (that is, the coordinate-transformed points located in the 3D expansion frame) is provided to the neural network, and the neural network performs semantic segmentation processing on the received points, so as to generate a local spatial semantic feature for each point located in the 3D expansion frame.

Optionally, the present disclosure may form a foreground point mask according to the confidences of being a foreground point generated in the above steps (for example, a point whose confidence exceeds a predetermined value such as 0.5 is set to 1, and a point whose confidence does not exceed the predetermined value is set to 0, thereby forming the foreground point mask). The present disclosure may provide the foreground point mask together with the coordinate-transformed point cloud data to the neural network, so that the neural network can refer to the foreground point mask when performing semantic processing, which helps to improve the description accuracy of the local spatial semantic features.

Step C1: Form the corrected 3D initial frame according to the first semantic information and the second semantic information of the points in the partial region.

Optionally, the manner in which the present disclosure acquires the global semantic features of the points in the 3D expansion frame may be as follows. First, according to the coordinate information of each point in the point cloud data, it is judged whether the point belongs to the spatial range of the 3D expansion frame (that is, whether it is located in the 3D expansion frame, which may include being located on any surface of the 3D expansion frame). For a given point, if its position belongs to the spatial range of the 3D expansion frame, the point is taken as a point belonging to the 3D expansion frame; if its position does not belong to the spatial range of the 3D expansion frame, the point is not taken as a point belonging to the 3D expansion frame. Then, according to the global semantic features of multiple points (e.g., all points) in the point cloud data, the global semantic features of all points belonging to the 3D expansion frame are determined. Optionally, when the present disclosure determines that a point belongs to the 3D expansion frame, the global semantic feature of that point can be looked up from the previously obtained global semantic features of the points, and so on, so that the present disclosure can obtain the global semantic features of all points belonging to the 3D expansion frame.
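For illustration only, the membership test described above might look like the following, assuming the frame length lies along the local X axis, the width along Y, and the height along Z:

```python
import numpy as np

def points_in_expansion_box(points, box):
    """Return a boolean mask over the points that fall inside (or on a
    surface of) a 3D expansion frame b^e = (x, y, z, h, w, l, theta).

    points: (M, 3) coordinates. The test rotates the points into the
    frame-aligned coordinate system and compares against the half-sizes.
    """
    x, y, z, h, w, l, theta = box
    shifted = points - np.array([x, y, z])
    cos_t, sin_t = np.cos(-theta), np.sin(-theta)
    rot = np.array([[cos_t, -sin_t, 0.0],
                    [sin_t,  cos_t, 0.0],
                    [0.0,    0.0,   1.0]])
    local = shifted @ rot.T
    return (np.abs(local[:, 0]) <= l / 2) & \
           (np.abs(local[:, 1]) <= w / 2) & \
           (np.abs(local[:, 2]) <= h / 2)
```

The global semantic features of the points judged to belong to the 3D expansion frame can then be gathered by indexing the previously obtained per-point global semantic features with this mask.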
Optionally, the present disclosure may have the neural network process the global semantic features and the local spatial semantic features of the points, and obtain the corrected 3D initial frame according to the processing result of the neural network. For example, the neural network encodes the global semantic features and the local spatial semantic features of the points in the 3D expansion frame to obtain a feature used to describe the 3D initial frame in that 3D expansion frame; the neural network then predicts, according to the feature used to describe the 3D initial frame, the confidence that the 3D initial frame is the target object, and adjusts the 3D initial frame according to the feature used to describe the 3D initial frame, thereby obtaining the corrected 3D initial frame. Correcting the 3D initial frame is beneficial to the accuracy of the 3D initial frame, and thus beneficial to improving the accuracy of the 3D detection frame.

Optionally, the present disclosure may concatenate the global semantic feature and the local spatial semantic feature of each point in the 3D expansion frame. For example, for any point in the 3D expansion frame, the global semantic feature and the local spatial semantic feature of the point are concatenated to form a concatenated semantic feature, and the concatenated semantic features of the points are provided as input to the neural network, so that the neural network encodes the concatenated semantic features and generates the encoded feature used to describe the 3D initial frame in the 3D expansion frame (hereinafter referred to simply as the encoded feature).
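A minimal sketch of this per-point concatenation; the array shapes and names are assumptions of this sketch:

```python
import numpy as np

def concat_point_features(global_features, local_features):
    """Concatenate, per point, the global semantic feature (from the whole
    point cloud) with the local spatial semantic feature (from the 3D
    expansion frame), forming the input to the encoding network.

    global_features: (M, Cg) features of the points in one expansion frame.
    local_features:  (M, Cl) features of the same points, in the same order.
    Returns an (M, Cg + Cl) array of concatenated semantic features.
    """
    assert global_features.shape[0] == local_features.shape[0]
    return np.concatenate([global_features, local_features], axis=1)
```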
Optionally, after forming the encoded features, the neural network may, for each input encoded feature, predict the confidence that the corresponding 3D initial frame is the target object, forming one confidence for each 3D initial frame. This confidence may represent the probability that the corrected 3D initial frame is the target object. At the same time, the neural network may form one new 3D initial frame (that is, a corrected 3D initial frame) for each input encoded feature. For example, according to each input encoded feature, the neural network respectively forms the center point position information of the new 3D initial frame, the length, width, and height information of the new 3D initial frame, the direction information of the new 3D initial frame, and the like.

The process in which the present disclosure performs redundancy removal on all the corrected 3D initial frames to obtain the 3D detection frame of the target object can be found in the corresponding description above, and is not described in detail here.

As shown in FIG. 2, one embodiment of the target object 3D detection method of the present disclosure includes steps S200 and S210. Each step in FIG. 2 is described in detail below.

S200. Provide the point cloud data to a neural network; perform feature extraction processing on the points in the point cloud data via the neural network; perform semantic segmentation processing on the point cloud data according to the extracted feature information to obtain the semantic features of multiple points; and, according to the semantic features, predict the foreground points among the multiple points and generate a 3D initial frame corresponding to each of at least some of the multiple points.

In an optional example, the neural network in the present disclosure is mainly used to generate one 3D initial frame for each of multiple points (e.g., all points or most points) in the input point cloud data, so that each of the multiple points in the point cloud data corresponds to one 3D initial frame. Since the multiple points in the point cloud data usually include foreground points and background points, the 3D initial frames generated by the neural network of the present disclosure usually include: 3D initial frames corresponding to foreground points and 3D initial frames corresponding to background points.
Since the input of the neural network of the present disclosure is the point cloud data, and the neural network performs feature extraction on the point cloud data and performs semantic segmentation based on the extracted feature information, this belongs to bottom-level data analysis; and since the neural network of the present disclosure generates the 3D initial frames based on the semantic segmentation result, which amounts to upper-level data analysis, the present disclosure forms a bottom-up way of generating the 3D detection frame in the process of 3D detection of the target object. By generating the 3D initial frames in a bottom-up manner, the neural network of the present disclosure avoids the loss of original information of the point cloud data that would be caused by projecting the point cloud data and then performing 3D detection frame detection on the image obtained after projection, a loss that is not conducive to improving the performance of 3D detection frame detection. Moreover, the present disclosure also avoids the situation in which, when a 2D image captured by a camera device is used for 3D detection frame detection, a target object (such as a vehicle or an obstacle) in the 2D image is occluded and the detection of the 3D detection frame is affected, which is likewise not conducive to improving the performance of 3D detection frame detection. It can thus be seen that generating the 3D initial frames in a bottom-up manner by the neural network of the present disclosure is beneficial to improving the detection performance of the 3D detection frame.

In an optional example, the neural network in the present disclosure may be divided into multiple parts, and each part may be implemented by a small neural network (which may also be referred to as a neural network unit or a neural network module); that is, the neural network of the present disclosure is composed of multiple small neural networks. Since part of the structure of the neural network of the present disclosure may adopt the structure of an RCNN (Regions with Convolutional Neural Network), the neural network of the present disclosure may be referred to as PointRCNN (Point Regions with Convolutional Neural Network, a point-based region convolutional neural network).

In an optional example, the 3D initial frame generated by the neural network of the present disclosure may include: the center point position information of the 3D initial frame (such as the coordinates of the center point), the length, width, and height information of the 3D initial frame, the direction information of the 3D initial frame (such as the angle between the length of the 3D initial frame and the X coordinate axis), and the like. Of course, the 3D initial frame formed by the present disclosure may also include: the position information of the center point of the bottom or top face of the 3D initial frame, the length, width, and height information of the 3D initial frame, the direction information of the 3D initial frame, and the like. The present disclosure does not limit the specific representation of the 3D initial frame.

In an optional example, the neural network of the present disclosure may include: a first neural network, a second neural network, and a third neural network. The point cloud data is provided to the first neural network, which is used to: perform feature extraction processing on multiple points (e.g., all points) in the received point cloud data, thereby forming a piece of global feature information for each point in the point cloud data, and perform semantic segmentation processing according to the global feature information of the multiple points (e.g., all points), thereby forming a global semantic feature for each point; the first neural network outputs the global semantic feature of each point. Optionally, the global semantic feature of a point can usually take the form of a one-dimensional vector array including multiple (e.g., 256) elements. The global semantic feature in the present disclosure may also be referred to as a global semantic feature vector. In the case where the points in the point cloud data include foreground points and background points, the information output by the first neural network usually includes: the global semantic features of the foreground points and the global semantic features of the background points.

Optionally, the first neural network in the present disclosure may be implemented using a Point Cloud Encoder and a Point Cloud Decoder; optionally, the first neural network may adopt a network structure such as the PointNet++ or PointSIFT network model. The second neural network in the present disclosure may be implemented using an MLP (Multi-Layer Perceptron), and the output dimension of the MLP used to implement the second neural network may be 1. The third neural network in the present disclosure may also be implemented using an MLP, the output of the MLP used to implement the third neural network is multi-dimensional, and the number of dimensions is related to the information included in the 3D detection frame information.
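For illustration only, the second and third neural networks might be sketched as the two MLP heads below, operating on the per-point global semantic features produced by the first network; the hidden layer sizes and the seven-dimensional frame output are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ProposalHeads(nn.Module):
    """Sketch of the second and third neural networks: two small MLPs that
    consume the per-point global semantic feature (e.g. a 256-element
    vector) output by the first (segmentation) network."""

    def __init__(self, feature_dim: int = 256, box_dim: int = 7):
        super().__init__()
        # Second neural network: per-point foreground confidence (output dim 1).
        self.confidence_head = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )
        # Third neural network: per-point 3D initial frame
        # (center point, sizes, direction).
        self.box_head = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(),
            nn.Linear(128, box_dim),
        )

    def forward(self, point_features: torch.Tensor):
        # point_features: (N, feature_dim) global semantic features.
        confidence = self.confidence_head(point_features).squeeze(-1)  # (N,)
        boxes = self.box_head(point_features)                          # (N, box_dim)
        return confidence, boxes
```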
When the global semantic features of the points have been obtained, the present disclosure needs to use the global semantic features to realize foreground point prediction and 3D initial frame generation. The present disclosure may adopt the following two manners to realize foreground point prediction and 3D initial frame generation.

Manner 1: The global semantic features of the points output by the first neural network are provided to the second neural network and the third neural network simultaneously (as shown in FIG. 3). The second neural network is used to predict, for the global semantic feature of each input point, the confidence that the point is a foreground point, and to output a confidence for each point. The confidence predicted by the second neural network may indicate the probability that the point is a foreground point. The third neural network is used to generate and output one 3D initial frame for the global semantic feature of each input point. For example, according to the global semantic feature of each point, the third neural network outputs, for each point, the center point position information of the 3D initial frame, the length, width, and height information of the 3D initial frame, the direction information of the 3D initial frame, and the like.

Since the information output by the first neural network usually includes the global semantic features of the foreground points and the global semantic features of the background points, the 3D initial frames output by the third neural network usually include the 3D initial frames corresponding to the foreground points and the 3D initial frames corresponding to the background points; however, the third neural network itself cannot distinguish whether each 3D initial frame it outputs corresponds to a foreground point or to a background point.

Manner 2: The global semantic features of the points output by the first neural network are first provided to the second neural network, which predicts, for the global semantic feature of each input point, the confidence that the point is a foreground point. When the present disclosure determines that the confidence output by the second neural network for a point being a foreground point exceeds the predetermined value, the global semantic feature of that point is provided to the third neural network (as shown in FIG. 4). The third neural network generates one 3D initial frame for each received global semantic feature of a point judged to be a foreground point, and outputs the 3D initial frame corresponding to each foreground point. When the present disclosure determines that the confidence output by the second neural network for a point being a foreground point does not exceed the predetermined value, the global semantic feature of that point is not provided to the third neural network; therefore, all the 3D initial frames output by the third neural network are 3D initial frames corresponding to foreground points.
S210. Determine the final 3D detection frame according to the 3D detection frame information corresponding to the foreground points among the multiple points.

In an optional example, in the case where S200 adopts Manner 1, the present disclosure may determine, according to the confidences output by the second neural network, whether the 3D initial frame output by the third neural network for each point corresponds to a foreground point or to a background point. For example, when the present disclosure judges that the confidence output by the second neural network for the first point being a foreground point exceeds the predetermined value, that point is judged to be a foreground point, so the 3D initial frame output by the third neural network for the first point is judged to be a 3D initial frame corresponding to a foreground point; and so on, so that the present disclosure can, according to the confidences output by the second neural network, pick out from all the 3D initial frames output by the third neural network the 3D initial frames corresponding to all the foreground points. Afterwards, the present disclosure may perform redundancy removal on the 3D initial frames corresponding to all the picked-out foreground points, thereby obtaining the final 3D detection frame, that is, the 3D detection frame detected from the point cloud data. For example, the present disclosure may use the NMS (Non-Maximum Suppression) algorithm to perform redundancy removal on the 3D detection frame information corresponding to all the currently picked-out foreground points, thereby removing redundant 3D detection frames that cover one another and obtaining the final 3D detection frame.

In an optional example, in the case where S200 adopts Manner 2, the present disclosure may directly obtain the 3D initial frames corresponding to the foreground points according to the 3D initial frames output by the third neural network; therefore, the present disclosure may directly perform redundancy removal on all the 3D initial frames output by the third neural network, thereby obtaining the final 3D detection frame, that is, the 3D detection frame detected from the point cloud data (see the relevant description in the above embodiments). For example, the present disclosure may use the NMS algorithm to perform redundancy removal on all the 3D initial frames output by the third neural network, thereby removing redundant 3D initial frames that cover one another and obtaining the final 3D detection frame.

In an optional example, regardless of whether S200 adopts Manner 1 or Manner 2, after obtaining the 3D initial frames corresponding to the foreground points, the present disclosure may respectively correct the 3D initial frame corresponding to each foreground point, and perform redundancy removal on the corrected 3D initial frames corresponding to the foreground points, thereby obtaining the final 3D detection frame. That is, the process in which the neural network of the present disclosure generates the 3D detection frame may be divided into two stages: the 3D initial frames generated by the first-stage neural network are provided to the second-stage neural network, and the second-stage neural network corrects (e.g., optimizes the position of) the 3D initial frames generated by the first-stage neural network; afterwards, the present disclosure determines the final 3D detection frame according to the 3D initial frames corrected by the second-stage neural network. The final 3D detection frame is the 3D detection frame detected by the present disclosure based on the point cloud data. However, the process in which the neural network of the present disclosure generates the 3D initial frames may include only the first-stage neural network and not the second-stage neural network; in that case, it is also entirely feasible for the present disclosure to determine the final 3D detection frame according to the 3D initial frames generated by the first-stage neural network. Since the corrected 3D initial frames are often more accurate, determining the final 3D detection frame based on the corrected 3D initial frames is beneficial to improving the accuracy of 3D detection frame detection. Both the first-stage neural network and the second-stage neural network in the present disclosure may be implemented by neural networks that can exist independently, or may be composed of some network structural units of one complete neural network. In addition, for ease of description, the neural networks involved are referred to as the first neural network, the second neural network, the third neural network, the fourth neural network, the fifth neural network, the sixth neural network, and the seventh neural network, but it should be understood that each of the first to seventh neural networks may be an independent neural network, or may be composed of certain network structural units of one large neural network, which is not limited in the present disclosure.
In an optional example, the process in which the present disclosure uses the neural network to respectively correct the 3D initial frame corresponding to each foreground point may include the following step A2, step B2, and step C2:

Step A2: Set a 3D expansion frame containing the 3D initial frame, and acquire the global semantic features of the points in the 3D expansion frame.

Optionally, each 3D initial frame in the present disclosure corresponds to one 3D expansion frame, and the spatial range occupied by the 3D expansion frame usually completely covers the spatial range occupied by the 3D initial frame. Usually, no face of the 3D initial frame lies in the same plane as any face of its corresponding 3D expansion frame, the center point of the 3D initial frame coincides with the center point of the 3D expansion frame, and every face of the 3D initial frame is parallel to the corresponding face of its 3D expansion frame. Of course, the present disclosure does not exclude the case where the two center points do not coincide but every face of the 3D initial frame is still parallel to the corresponding face of its 3D expansion frame.

Optionally, the present disclosure may expand the 3D initial frame of a foreground point in 3D space according to at least one of a preset X-axis direction increment (e.g., 20 cm), a Y-axis direction increment (e.g., 20 cm), and a Z-axis direction increment (e.g., 20 cm), thereby forming a 3D expansion frame that contains the 3D initial frame, whose center point coincides with that of the 3D initial frame, and whose faces are parallel to the corresponding faces of the 3D initial frame.
Optionally, assume that the i-th 3D initial frame b_i can be expressed as: b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i, and z_i respectively represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i, and l_i respectively represent the height, width, and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame; for example, in the bird's-eye view, the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then the 3D expansion frame b_i^e corresponding to the i-th 3D initial frame can be expressed as:

b_i^e = (x_i, y_i, z_i, h_i + η, w_i + η, l_i + η, θ_i)

where η represents the increment.
Optionally, the local space in the present disclosure generally refers to the spatial range formed by the 3D expansion frame. The local spatial semantic feature of a point generally refers to the semantic feature vector formed for the point while considering all the points in the spatial range formed by the 3D expansion frame. A local spatial semantic feature may likewise take the form of a one-dimensional vector array including multiple (e.g., 256) elements.

Optionally, the manner in which the present disclosure acquires the global semantic features of the points in the 3D expansion frame may be as follows. First, according to the coordinate information of each point in the point cloud data, it is judged whether the point belongs to the spatial range of the 3D expansion frame (that is, whether it is located in the 3D expansion frame, which may include being located on any surface of the 3D expansion frame). For a given point, if its position belongs to the spatial range of the 3D expansion frame, the point is taken as a point belonging to the 3D expansion frame; if its position does not belong to the spatial range of the 3D expansion frame, the point is not taken as a point belonging to the 3D expansion frame. Then, according to the global semantic features of multiple points (e.g., all points) in the point cloud data, the global semantic features of all points belonging to the 3D expansion frame are determined. Optionally, when the present disclosure determines that a point belongs to the 3D expansion frame, the global semantic feature of that point can be looked up from the previously obtained global semantic features of the points, and so on, so that the present disclosure can obtain the global semantic features of all points belonging to the 3D expansion frame.

Step B2: Provide the point cloud data located in the 3D expansion frame to a fourth neural network in the neural network, and generate the local spatial semantic features of the points in the 3D expansion frame via the fourth neural network.

Optionally, the manner in which the present disclosure acquires the local spatial semantic features of all points in the 3D expansion frame may include the following steps a and b:

a. First, according to a preset target position of the 3D expansion frame, coordinate transformation is performed on the coordinate information of the point cloud data located in the 3D expansion frame, so that the coordinates of the points located in the 3D expansion frame are displaced, and the 3D expansion frame is thereby translated and rotated (the direction of the 3D expansion frame is adjusted) and transformed to the preset target position of the 3D expansion frame. Optionally, the preset target position of the 3D expansion frame may include: the center point of the 3D expansion frame (that is, the center point of the 3D initial frame) is located at the coordinate origin, and the length of the 3D expansion frame is parallel to the X axis. Optionally, the above coordinate origin and X axis may be the coordinate origin and X axis of the coordinate system of the point cloud data, and of course may also be the coordinate origin and X axis of another coordinate system.
Continuing the previous example, assume that the i-th 3D initial frame b_i can be expressed as: b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i), where x_i, y_i, and z_i respectively represent the coordinates of the center point of the i-th 3D initial frame, h_i, w_i, and l_i respectively represent the height, width, and length of the i-th 3D initial frame, and θ_i represents the direction of the i-th 3D initial frame; for example, in the bird's-eye view, the angle between the length of the i-th 3D initial frame and the X coordinate axis is θ_i. Then, after the coordinate transformation is performed on the 3D expansion frame containing the i-th 3D initial frame, the present disclosure obtains a new 3D initial frame b̃_i, which can be expressed as:

b̃_i = (0, 0, 0, h_i, w_i, l_i, 0)

That is, the center point of the new 3D initial frame b̃_i is located at the coordinate origin, and in the bird's-eye view the angle between the length of the new 3D initial frame b̃_i and the X coordinate axis is 0.
b. The coordinate-transformed point cloud data (that is, the coordinate-transformed points located in the 3D expansion frame) is provided to the fourth neural network in the neural network; the fourth neural network performs feature extraction processing on the received points and performs semantic segmentation processing based on the extracted local feature information, so as to generate a local spatial semantic feature for each point located in the 3D expansion frame.

Optionally, the present disclosure may also form a foreground point mask according to the confidences output by the second neural network (for example, a point whose confidence exceeds a predetermined value such as 0.5 is set to 1, and a point whose confidence does not exceed the predetermined value is set to 0). The present disclosure may provide the foreground point mask together with the coordinate-transformed point cloud data to the fourth neural network, so that the fourth neural network can refer to the foreground point mask when performing feature extraction and semantic processing, which helps to improve the description accuracy of the local spatial semantic features.

Optionally, the fourth neural network in the present disclosure may be implemented using an MLP, the output of the MLP used to implement the fourth neural network is usually multi-dimensional, and the number of dimensions is related to the information included in the local spatial semantic features.

Step C2: Via a fifth neural network in the neural network, encode the global semantic features and the local spatial semantic features of the points in the 3D expansion frame to obtain a feature used to describe the 3D initial frame in the 3D expansion frame; predict, via a sixth neural network in the neural network and according to the feature used to describe the 3D initial frame, the confidence that the 3D initial frame is the target object; and correct the 3D initial frame via a seventh neural network in the neural network according to the feature used to describe the 3D initial frame, which is beneficial to improving the accuracy of the 3D initial frame and thus to improving the accuracy of the 3D detection frame.

Optionally, the fifth neural network in the present disclosure may be implemented using a Point Cloud Encoder; optionally, the fifth neural network may adopt part of a network structure such as the PointNet++ or PointSIFT network model. The sixth neural network in the present disclosure may be implemented using an MLP, the output dimension of the MLP used to implement the sixth neural network may be 1, and the number of dimensions may be related to the number of categories of the target object. The seventh neural network in the present disclosure may also be implemented using an MLP, the output of the MLP used to implement the seventh neural network is multi-dimensional, and the number of dimensions is related to the information included in the 3D detection frame information. Each of the first to seventh neural networks in the present disclosure may be implemented by a neural network that can exist independently, or by part of a neural network that cannot exist independently.
Optionally, the present disclosure may concatenate the global semantic feature and the local spatial semantic feature of each point in the 3D expansion frame. For example, for any point in the 3D expansion frame, the global semantic feature and the local spatial semantic feature of the point are concatenated to form a concatenated semantic feature, and the concatenated semantic features of the points are provided as input to the fifth neural network, so that the fifth neural network encodes the concatenated semantic features and outputs the encoded feature used to describe the 3D initial frame in the 3D expansion frame (hereinafter referred to simply as the encoded feature).

Optionally, the encoded feature output by the fifth neural network is provided to the sixth neural network and the seventh neural network simultaneously (as shown in FIG. 5). The sixth neural network is used to predict, for each input encoded feature, the confidence that the corresponding 3D initial frame is the target object, and to output a confidence for each 3D initial frame. The confidence predicted by the sixth neural network may represent the probability that the corrected 3D initial frame is the target object. The target object here may be a vehicle, a pedestrian, or the like. The seventh neural network is used to form and output, for each input encoded feature, a new 3D initial frame (that is, a corrected 3D initial frame). For example, according to each input encoded feature, the seventh neural network respectively outputs the center point position information of the new 3D initial frame, the length, width, and height information of the new 3D initial frame, the direction information of the new 3D initial frame, and the like.
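For illustration only, the sixth and seventh neural networks might be sketched as follows; the input dimension, the hidden layer sizes, and the direct prediction of the corrected frame (rather than, say, a residual) are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class RefinementHeads(nn.Module):
    """Sketch of the sixth and seventh neural networks of the second stage:
    given the encoded feature describing one 3D initial frame (output by the
    fifth, encoding network), predict the confidence that the frame is the
    target object and a corrected 3D initial frame."""

    def __init__(self, encoded_dim: int = 512, box_dim: int = 7):
        super().__init__()
        # Sixth neural network: per-frame confidence (output dim 1).
        self.confidence_head = nn.Sequential(
            nn.Linear(encoded_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )
        # Seventh neural network: corrected 3D initial frame
        # (center point, sizes, direction).
        self.box_head = nn.Sequential(
            nn.Linear(encoded_dim, 256), nn.ReLU(),
            nn.Linear(256, box_dim),
        )

    def forward(self, encoded_features: torch.Tensor):
        # encoded_features: (B, encoded_dim), one row per 3D initial frame.
        confidence = self.confidence_head(encoded_features).squeeze(-1)
        corrected_boxes = self.box_head(encoded_features)
        return confidence, corrected_boxes
```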
It should be specifically noted that the neural network of the present disclosure can be implemented in multiple ways: one implementation is shown in FIG. 3; another implementation is shown in FIG. 4; yet another implementation is the combination of FIG. 3 and FIG. 5; and still another implementation is the combination of FIG. 4 and FIG. 5. The individual implementations are not described one by one in detail here.

In an optional example, the neural network of the present disclosure is obtained by training with multiple point cloud data samples carrying 3D annotation frames. For example, the present disclosure may obtain the loss corresponding to the confidence generated by the neural network to be trained, and the loss formed by the 3D initial frame generated by the neural network to be trained for a point cloud data sample relative to the 3D annotation frame of that sample, and use these two losses to adjust the network parameters of the neural network to be trained, thereby training the neural network. The network parameters in the present disclosure may include, but are not limited to, convolution kernel parameters, weight values, and the like.

In the case where the process of forming the 3D detection frame by the neural network of the present disclosure includes only one stage (that is, only the process in which the first-stage neural network forms the 3D detection frame), the present disclosure may obtain the loss corresponding to the confidence generated by the first-stage neural network and the loss corresponding to the 3D initial frame, and use these two losses to adjust the network parameters of the first-stage neural network (such as the first neural network, the second neural network, and the third neural network). After the first-stage neural network is successfully trained, the training of the entire neural network is completed.

In the case where the process of forming the 3D detection frame by the neural network of the present disclosure is divided into two stages, the present disclosure may train the first-stage neural network and the second-stage neural network separately. For example, the loss corresponding to the confidence generated by the first-stage neural network and the loss corresponding to the 3D initial frame are obtained first, and these two losses are used to adjust the network parameters of the first-stage neural network. After the first-stage neural network is successfully trained, the 3D initial frames corresponding to the foreground points output by the first-stage neural network are provided as input to the second-stage neural network; the loss corresponding to the confidence generated by the second-stage neural network and the loss corresponding to the corrected 3D initial frame are then obtained, and these two losses are used to adjust the network parameters of the second-stage neural network (such as the fourth neural network, the fifth neural network, the sixth neural network, and the seventh neural network). After the second-stage neural network is successfully trained, the training of the entire neural network is completed.
The loss corresponding to the confidence generated by the first-stage neural network in the present disclosure can be expressed by the following formula (1):

L_{focal}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)        Formula (1)

In the above formula (1), when the point p is a foreground point, p_t is the confidence of the foreground point p; when the point p is not a foreground point, p_t is the difference between 1 and the confidence of the point p; α_t and γ are both constants, and in an optional example, α_t = 0.25 and γ = 2.
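For illustration only, formula (1) can be written down directly as code. The sketch below assumes a PyTorch setting in which pred holds the predicted foreground confidences in (0, 1) and target marks foreground points with 1; these names, the framework, and the mean reduction are assumptions made for the example rather than part of the disclosure.

# Hedged sketch of formula (1): a per-point focal loss with alpha_t = 0.25, gamma = 2.
import torch

def focal_loss(pred, target, alpha_t=0.25, gamma=2.0, eps=1e-6):
    """pred: predicted foreground confidence per point, in (0, 1).
    target: 1.0 for foreground points, 0.0 otherwise."""
    # p_t is the confidence for foreground points and (1 - confidence) otherwise
    p_t = torch.where(target > 0.5, pred, 1.0 - pred)
    loss = -alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=eps))
    return loss.mean()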
The loss corresponding to the 3D initial frames generated by the first-stage neural network in the present disclosure can be expressed by the following formula (2):

L_{reg} = \frac{1}{N_{pos}} \sum_{p \in pos} \left( L_{bin}^{(p)} + L_{res}^{(p)} \right)        Formula (2)

In the above formula (2), L_reg represents the regression loss function of the 3D detection frame, and N_pos represents the number of foreground points; L_bin^(p) represents the bucket (bin) loss function of the 3D initial frame generated for the foreground point p, and can be expressed in the form of the following formula (3); L_res^(p) represents the residual loss function of the 3D initial frame generated for the foreground point p, and can be expressed in the form of the following formula (4).
L_{bin}^{(p)} = \sum_{u \in \{x, z, \theta\}} \left( F_{cls}\big(\widehat{bin}_u^{(p)}, bin_u^{(p)}\big) + F_{reg}\big(\widehat{res}_u^{(p)}, res_u^{(p)}\big) \right)        Formula (3)

In the above formula (3), L_bin^(p) represents the bucket loss function of the 3D initial frame generated for the foreground point p; x, z and θ respectively represent the x coordinate of the center point, the z coordinate of the center point, and the direction of the target object, where the target object may be a 3D initial frame generated by the neural network or a 3D annotation frame in a point cloud data sample; F_cls(*) represents a cross-entropy classification loss; \widehat{bin}_u^{(p)} represents the number of the bucket in which the parameter u of the center point of the 3D initial frame generated for the foreground point p is located; bin_u^{(p)} represents the number of the bucket in which the parameter u of the 3D annotation frame information in the point cloud data sample is located; when the parameter u is x, these bucket numbers can be expressed in the form of the following formula (5), and when the parameter u is z, they can be expressed in the form of the following formula (6); F_reg(*) represents a smooth L1 loss function (Smooth L1 Loss); \widehat{res}_u^{(p)} represents the offset, within the corresponding bucket, of the parameter u of the 3D initial frame generated for the foreground point p; res_u^{(p)} represents the offset, within the corresponding bucket, of the parameter u of the 3D annotation frame information in the point cloud data sample; when the parameter u is x or z, these offsets can be expressed in the form of the following formula (7).
For a point, a bucket in the present disclosure may refer to a value range obtained by partitioning the spatial range around that point; each such value range is called a bucket, and each bucket may have a corresponding number. Usually, the value range of a bucket is fixed. In one optional example, the value range of a bucket is a length range, in which case the bucket has a fixed length; in another optional example, the value range of a bucket is an angle range, in which case the bucket has a fixed angle interval. Optionally, for the x direction or the z direction, the bucket length may be 0.5 m, in which case the value ranges of different buckets may be 0-0.5 m, 0.5 m-1 m, and so on. Optionally, the present disclosure may evenly divide 2π into multiple angle intervals, each angle interval corresponding to one value range; in this case, the bucket size (that is, the angle interval) may be 45 degrees, 30 degrees, or the like.
L_{res}^{(p)} = \sum_{v \in \{y, h, w, l\}} F_{reg}\big( \widehat{res}_v^{(p)}, res_v^{(p)} \big)        Formula (4)

In the above formula (4), L_res^(p) represents the residual loss function of the 3D initial frame generated for the foreground point p; y, h, w and l respectively represent the y coordinate of the center point of the 3D initial frame generated for the foreground point p, and the height, width and length of that 3D initial frame; F_reg(*) represents a smooth L1 loss function. When the parameter v is y, \widehat{res}_v^{(p)} represents the offset of the y coordinate of the foreground point p relative to the y coordinate of the center point of the 3D initial frame generated for the foreground point p, as shown in formula (8); when the parameter v is h, w or l, \widehat{res}_v^{(p)} represents the offset of the height, width or length of the 3D initial frame generated for the foreground point p relative to the corresponding preset parameter. When the parameter v is y, res_v^{(p)} represents the offset of the y coordinate of the foreground point p relative to the y coordinate of the center point of the 3D annotation frame, as shown in formula (8); when the parameter v is h, w or l, res_v^{(p)} represents the offset of the height, width or length of the 3D annotation frame relative to the corresponding preset parameter. The preset parameters in the present disclosure may be the mean length, mean width and mean height obtained by statistically computing the lengths, widths and heights of the 3D annotation frames in the point cloud data samples of the training data.
bin_x^{(p)} = \left\lfloor \frac{x^p - x^{(p)} + S}{\delta} \right\rfloor        Formula (5)

bin_z^{(p)} = \left\lfloor \frac{z^p - z^{(p)} + S}{\delta} \right\rfloor        Formula (6)

In the above formula (5) and formula (6), bin_x^{(p)} represents the number of the bucket, along the X coordinate axis, in which the center point of the 3D annotation frame in the point cloud data sample is located; bin_z^{(p)} represents the number of the bucket, along the Z coordinate axis, in which the center point of the 3D annotation frame in the point cloud data sample is located; (x^{(p)}, z^{(p)}) represents the x coordinate and z coordinate of the foreground point p; (x^p, z^p) represents the x coordinate and z coordinate of the center point of the 3D initial frame generated for the foreground point p; δ represents the bucket length; and S represents the search distance around the foreground point p on the x axis or the z axis.
res_u^{(p)} = \frac{1}{C} \left( u^p - u^{(p)} + S - \left( bin_u^{(p)} \cdot \delta + \frac{\delta}{2} \right) \right)        Formula (7)

In the above formula (7), S represents the search distance around the foreground point p on the x axis or the z axis; that is, when the parameter u is x, S represents the distance, along the x-axis direction, between the center point of the 3D initial frame generated for the foreground point p and the x coordinate of the foreground point p, and when the parameter u is z, S represents the distance, along the z-axis direction, between the center point of that 3D initial frame and the z coordinate of the foreground point p; δ represents the bucket length, which is a constant value, for example δ = 0.5 m; bin_u^{(p)} is as shown in the above formula (5) and formula (6); and C is a constant value that may be related to the bucket length, for example equal to the bucket length or to half of the bucket length.
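As a simplified illustration of the bucket encoding described above, the sketch below computes a bucket index and an in-bucket residual for a scalar offset along the x or z direction; the search distance S, the bucket length δ, the normalization constant C and the example values are assumptions chosen for the example, not values fixed by the disclosure.

# Hedged sketch of the bucket (bin) encoding along x or z, in the spirit of formulas (5)-(7).
import math

def encode_bucket(u_center, u_point, S=3.0, delta=0.5, C=0.25):
    """u_center: x (or z) coordinate of the frame center;
    u_point: x (or z) coordinate of the foreground point;
    S: search distance, delta: bucket length, C: normalization constant
    (here assumed to be half the bucket length)."""
    offset = u_center - u_point + S                          # shift into [0, 2S]
    bucket_id = int(math.floor(offset / delta))              # bucket number
    residual = (offset - (bucket_id * delta + delta / 2.0)) / C  # normalized in-bucket offset
    return bucket_id, residual

# Example: a center 0.8 m ahead of the point along x falls into bucket 7
# (with S = 3.0 m and delta = 0.5 m), with a small normalized residual.
print(encode_bucket(u_center=10.8, u_point=10.0))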
\widehat{res}_y^{(p)} = y^{(p)} - y^p        Formula (8)

In the above formula (8), \widehat{res}_y^{(p)} represents the offset of the y coordinate of the foreground point within the corresponding bucket; y^p represents the y coordinate of the center point of the 3D initial frame generated for the foreground point p; and y^{(p)} represents the y coordinate of the foreground point.
In an optional example, when the training of the first to third neural networks reaches a predetermined iteration condition, the current training process ends. The predetermined iteration condition in the present disclosure may include: the difference between the 3D initial frames output by the third neural network and the 3D annotation frames of the point cloud data samples meets a predetermined difference requirement, and the confidence output by the second neural network meets a predetermined requirement. When both requirements are met, the first to third neural networks are successfully trained this time. The predetermined iteration condition in the present disclosure may also include: the number of point cloud data samples used to train the first to third neural networks reaches a predetermined number, and so on. If the number of point cloud data samples used reaches the predetermined number but the two requirements are not both met, the first to third neural networks are not successfully trained this time.

Optionally, in the case where the process of forming the 3D detection frame by the neural network of the present disclosure includes one stage, the successfully trained first to third neural networks can be used for 3D detection of the target object.

Optionally, in the case where the process of forming the 3D detection frame by the neural network of the present disclosure includes two stages, the successfully trained first to third neural networks can also be used to generate, for the point cloud data samples, the 3D initial frames corresponding to the foreground points. That is, the present disclosure may again provide the point cloud data samples to the successfully trained first neural network and store the information output by the second neural network and the third neural network separately, so as to provide the input (that is, the 3D initial frames corresponding to the foreground points) for the second-stage neural network. Afterwards, the loss corresponding to the confidence generated in the second stage and the loss corresponding to the corrected 3D initial frames are obtained, and the obtained losses are used to adjust the network parameters of the fourth to seventh neural networks. After the fourth to seventh neural networks are successfully trained, the training of the entire neural network is completed.
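The two-stage training schedule described above can be summarized in a short Python sketch. The module and loss callables below are placeholders standing in for the first-stage and second-stage networks and their combined losses; none of the names are identifiers from the disclosure, and the optimizer choice and epoch counts are assumptions.

# Hedged sketch of the two-stage schedule: train stage 1, then keep it fixed,
# generate 3D initial frames (proposals) with it, and train stage 2 on those proposals.
import torch

def train_two_stage(stage1, stage2, loader, stage1_loss, stage2_loss,
                    epochs1=10, epochs2=10, lr=1e-3):
    """stage1/stage2: the first- and second-stage networks;
    stage1_loss/stage2_loss: callables combining the confidence loss and the box loss."""
    opt1 = torch.optim.Adam(stage1.parameters(), lr=lr)
    for _ in range(epochs1):
        for points, gt_boxes, fg_mask in loader:
            conf, init_boxes = stage1(points)
            loss = stage1_loss(conf, init_boxes, gt_boxes, fg_mask)
            opt1.zero_grad(); loss.backward(); opt1.step()

    stage1.eval()                                   # stage 1 stays fixed afterwards
    opt2 = torch.optim.Adam(stage2.parameters(), lr=lr)
    for _ in range(epochs2):
        for points, gt_boxes, fg_mask in loader:
            with torch.no_grad():
                _, init_boxes = stage1(points)      # proposals provided as input
            refined_conf, refined_boxes = stage2(points, init_boxes)
            loss = stage2_loss(refined_conf, refined_boxes, init_boxes, gt_boxes)
            opt2.zero_grad(); loss.backward(); opt2.step()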
The loss function used in the present disclosure for adjusting the network parameters of the fourth to seventh neural networks in the second-stage neural network, which includes the loss corresponding to the confidence and the loss corresponding to the corrected 3D initial frames, can be expressed by the following formula (9):

\frac{1}{\|B\|} \sum_{i \in B} F_{cls}(prob_i, label_i) + \frac{1}{\|B_{pos}\|} \sum_{i \in B_{pos}} \left( \widetilde{L}_{bin}^{(i)} + \widetilde{L}_{res}^{(i)} \right)        Formula (9)

In the above formula (9), B represents the set of 3D initial frames, and ||B|| represents the number of 3D initial frames in that set; F_cls(*) represents the cross-entropy loss function used to supervise the predicted confidence, that is, F_cls(*) is a classification-based cross-entropy loss function; prob_i represents the confidence, predicted by the sixth neural network, that the corrected i-th 3D initial frame is a target object; label_i is the label indicating whether the i-th 3D initial frame is a target object, and this label can be obtained by computation, for example, the label takes the value 1 when the overlap between the i-th 3D initial frame and the corresponding 3D annotation frame exceeds a set threshold, and 0 otherwise; B_pos is a subset of B in which the overlap between each 3D initial frame and the corresponding 3D annotation frame exceeds the set threshold, and ||B_pos|| represents the number of 3D initial frames in that subset; \widetilde{L}_{bin}^{(i)} is similar to the above L_bin^(p), and \widetilde{L}_{res}^{(i)} is similar to the above L_res^(p), except that they use the coordinate-converted i-th 3D initial frame \widetilde{b}_i (replacing the i-th 3D initial frame b_i in those formulas) and the coordinate-converted i-th 3D annotation frame information (replacing the i-th 3D annotation frame information in those formulas); the two coordinate-converted frames can be expressed in the form of the following formula (10).

Formula (10), which is given as an image in the original publication, defines this coordinate conversion: it maps the i-th 3D annotation frame information to the coordinate-converted i-th 3D annotation frame information, and maps the corrected i-th 3D initial frame (x_i, y_i, z_i, h_i, w_i, l_i, θ_i) to the coordinate-converted i-th 3D initial frame \widetilde{b}_i.
When computing formula (9), the above formula (3) is used, and the bucket-number terms for the direction parameter θ in formula (3) can be replaced by the form of formula (11), which is given as an image in the original publication. In formula (11), ω represents the bucket size, that is, the angle interval of a bucket.

Likewise, when computing formula (9), the residual terms for the direction parameter θ in formula (3) can be replaced by the form of formula (12), which is also given as an image in the original publication. In formula (12), ω again represents the bucket size, that is, the angle interval of a bucket.
In an optional example, when the training of the fourth to seventh neural networks reaches a predetermined iteration condition, the current training process ends. The predetermined iteration condition in the present disclosure may include: the difference between the 3D initial frames output by the seventh neural network and the 3D annotation frames of the point cloud data samples meets a predetermined difference requirement, and the confidence output by the sixth neural network meets a predetermined requirement. When both requirements are met, the fourth to seventh neural networks are successfully trained this time. The predetermined iteration condition in the present disclosure may also include: the number of point cloud data samples used to train the fourth to seventh neural networks reaches a predetermined number, and so on. If the number of point cloud data samples used reaches the predetermined number but the two requirements are not both met, the fourth to seventh neural networks are not successfully trained this time.
FIG. 6 is a flowchart of an embodiment of the vehicle intelligent control method of the present disclosure.

As shown in FIG. 6, the method of this embodiment includes steps S600, S610, S620, S630, S640 and S650. Each step in FIG. 6 is described in detail below.

S600. Extract feature information of the acquired point cloud data of a scene.

S610. Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.

S620. Predict, according to the first semantic information, at least one foreground point corresponding to a target object among the multiple points.

S630. Generate, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point.

S640. Determine the 3D detection frame of the target object in the scene according to the 3D initial frames.

For the specific implementation of S600-S640, reference may be made to the relevant description in the above embodiments, which is not repeated here. Moreover, S600-S640 may be implemented as follows: the point cloud data is provided to a neural network, which extracts feature information from the points in the point cloud data, performs semantic segmentation according to the extracted feature information to obtain the semantic features of multiple points, predicts the foreground points among the multiple points according to the semantic features, and generates a 3D initial frame corresponding to each of at least some of the multiple points.

S650. Generate, according to the above 3D detection frame, an instruction for controlling the vehicle or early-warning prompt information.
Optionally, the present disclosure may first determine, according to the 3D detection frame, at least one of the following pieces of information about the target object: the spatial position of the target object in the scene, its size, its distance from the vehicle, and its orientation relative to the vehicle. An instruction for controlling the vehicle or early-warning prompt information is then generated according to the determined information. Instructions generated by the present disclosure include, for example, an instruction to increase speed, an instruction to decrease speed, or an emergency braking instruction. The generated early-warning prompt information includes, for example, prompt information calling attention to a target object, such as a vehicle or a pedestrian, in a certain direction. The present disclosure does not limit the specific implementation of generating instructions or early-warning prompt information according to the 3D detection frame.
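As a rough illustration only, a rule-based mapping from detected 3D frames to a control command or a warning might look like the sketch below; the distance thresholds, bearing threshold and command strings are invented for the example and are not prescribed by the disclosure.

# Hedged sketch: turning detected 3D frames into a control command or a warning.
from dataclasses import dataclass

@dataclass
class Detection3D:
    distance_m: float      # distance from the detected object to the ego vehicle
    bearing_deg: float     # relative bearing of the object (0 = straight ahead)
    label: str             # e.g. "vehicle" or "pedestrian"

def decide(detections):
    for det in detections:
        ahead = abs(det.bearing_deg) < 20.0
        if ahead and det.distance_m < 5.0:
            return "command: emergency_brake"
        if ahead and det.distance_m < 15.0:
            return "command: reduce_speed"
        if det.distance_m < 15.0:
            return f"warning: {det.label} at bearing {det.bearing_deg:.0f} deg"
    return "command: keep_speed"

print(decide([Detection3D(distance_m=12.0, bearing_deg=5.0, label="pedestrian")]))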
FIG. 7 is a flowchart of an embodiment of the obstacle avoidance navigation method of the present disclosure.

As shown in FIG. 7, the method of this embodiment includes steps S700, S710, S720, S730, S740 and S750. Each step in FIG. 7 is described in detail below.

S700. Extract feature information of the acquired point cloud data of a scene.

S710. Perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data.

S720. Predict, according to the first semantic information, at least one foreground point corresponding to a target object among the multiple points.

S730. Generate, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point.

S740. Determine the 3D detection frame of the target object in the scene according to the 3D initial frames.

For the specific implementation of S700-S740, reference may be made to the relevant description in the above embodiments, which is not repeated here. Moreover, S700-S740 may be implemented as follows: the point cloud data is provided to a neural network, which extracts feature information from the points in the point cloud data, performs semantic segmentation according to the extracted feature information to obtain the semantic features of multiple points, predicts the foreground points among the multiple points according to the semantic features, and generates a 3D initial frame corresponding to each of at least some of the multiple points.

S750. Generate, according to the above 3D detection frame, an instruction or early-warning prompt information for performing obstacle avoidance navigation control on the robot on which the lidar is located.
Optionally, the present disclosure may first determine, according to the 3D detection frame, at least one of the following pieces of information about the target object: the spatial position of the target object in the scene, its size, its distance from the robot, and its orientation relative to the robot. An instruction or early-warning prompt information for performing obstacle avoidance navigation control on the robot is then generated according to the determined information. Instructions generated by the present disclosure include, for example, an instruction to reduce the movement speed, an instruction to pause movement, or a turning instruction. The generated early-warning prompt information includes, for example, prompt information calling attention to an obstacle (that is, a target object) in a certain direction. The present disclosure does not limit the specific implementation of generating instructions or early-warning prompt information according to the 3D detection frame.
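Similarly, and purely as an illustration, an obstacle-avoidance decision for the robot could be sketched as follows; the clearance and caution thresholds, the field-of-view limit, and the returned action names are assumptions for the example.

# Hedged sketch: choosing an avoidance action from detected 3D obstacle frames.
def avoidance_action(obstacles, clearance_m=1.0, caution_m=3.0):
    """obstacles: iterable of (distance_m, bearing_deg) pairs relative to the robot."""
    front = [(d, b) for d, b in obstacles if abs(b) < 30.0]
    if any(d < clearance_m for d, _ in front):
        return "pause"                       # an obstacle is too close ahead
    if any(d < caution_m for d, _ in front):
        # turn away from the nearest obstacle in the caution zone
        d, b = min(front, key=lambda ob: ob[0])
        return "turn_left" if b > 0 else "turn_right"
    return "continue"

print(avoidance_action([(2.2, 10.0), (6.0, -40.0)]))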
FIG. 8 is a schematic structural diagram of an embodiment of the target object 3D detection apparatus of the present disclosure. The apparatus shown in FIG. 8 includes: a feature extraction module 800, a first semantic segmentation module 810, a foreground point prediction module 820, an initial frame generation module 830, and a detection frame determination module 840.

The feature extraction module 800 is mainly configured to extract feature information of the acquired point cloud data of a scene. The first semantic segmentation module 810 is mainly configured to perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of multiple points in the point cloud data. The foreground point prediction module 820 is mainly configured to predict, according to the first semantic information, at least one foreground point corresponding to a target object among the multiple points. The initial frame generation module 830 is mainly configured to generate, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point. The detection frame determination module 840 is mainly configured to determine the 3D detection frame of the target object in the scene according to the 3D initial frames.

In an optional example, the detection frame determination module 840 may include a first submodule, a second submodule and a third submodule. The first submodule is mainly configured to obtain feature information of the points in a partial region of the point cloud data, where the partial region includes at least one 3D initial frame. The second submodule is mainly configured to perform semantic segmentation on the points in the partial region according to the feature information of those points, to obtain second semantic information of the points in the partial region. The third submodule is mainly configured to determine the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points in the partial region.

In an optional example, the third submodule in the present disclosure may include a fourth submodule and a fifth submodule. The fourth submodule is mainly configured to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial region to obtain a corrected 3D initial frame. The fifth submodule is mainly configured to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame.

In an optional example, the third submodule in the present disclosure may further be configured to determine, according to the first semantic information and the second semantic information of the points in the partial region, the confidence that the 3D initial frame corresponds to a target object, and to determine the 3D detection frame of the target object in the scene according to the 3D initial frame and its confidence.

In an optional example, the third submodule in the present disclosure may include a fourth submodule, a sixth submodule and a seventh submodule. The fourth submodule is mainly configured to correct the 3D initial frame according to the first semantic information and the second semantic information of the points in the partial region to obtain a corrected 3D initial frame. The sixth submodule is mainly configured to determine, according to the first semantic information and the second semantic information of the points in the partial region, the confidence that the corrected 3D initial frame corresponds to a target object. The seventh submodule is mainly configured to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame and its confidence.
In an optional example, the partial region in the present disclosure includes a 3D expansion frame obtained by expanding the edges of the 3D initial frame according to a predetermined strategy. For example, the 3D expansion frame may be formed by expanding the 3D initial frame in 3D space according to a preset X-axis direction increment, Y-axis direction increment and/or Z-axis direction increment, so that the resulting 3D expansion frame contains the 3D initial frame.
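For illustration, expanding a 3D initial frame by preset increments along the coordinate axes can be written as below; the increment values and the assumed mapping of length, width and height to the X, Z and Y axes are choices made for the example, not values fixed by the disclosure.

# Hedged sketch: enlarging a 3D initial frame (center x, y, z; size l, w, h; direction theta)
# by preset per-axis increments to obtain a 3D expansion frame containing it.
def expand_box(box, dx=0.5, dy=0.5, dz=0.5):
    """box: dict with center coordinates x, y, z, sizes l, w, h and direction theta."""
    expanded = dict(box)
    expanded["l"] = box["l"] + 2 * dx   # grow both sides along the X axis (length, assumed)
    expanded["w"] = box["w"] + 2 * dz   # grow both sides along the Z axis (width, assumed)
    expanded["h"] = box["h"] + 2 * dy   # grow both sides along the Y axis (height, assumed)
    return expanded

print(expand_box({"x": 0.0, "y": 0.0, "z": 0.0, "l": 3.9, "w": 1.6, "h": 1.5, "theta": 0.0}))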
在一个可选示例中,本公开中的第二子模块可以包括:第八子模块和第九子模块。其中的第八子模块主要用于根据3D扩展框的预设目标位置,对点云数据中位于3D扩展框内的点的坐标信息进行坐标变换,获取坐标变换后的点的特征信息。其中的第九子模块主要用于根据坐标变换后的点的特征信息,进行基于3D扩展框的语义分割,获得3D扩展框中的点的第二语义特征。可选的,第九子模块可以根据前景点的掩膜以及坐标变换后的点的特征信息,进行基于3D扩展框的语义分割,获得点的第二语义特征。In an optional example, the second submodule in the present disclosure may include: an eighth submodule and a ninth submodule. The eighth sub-module is mainly used to perform coordinate transformation on the coordinate information of the points located in the 3D extension box in the point cloud data according to the preset target position of the 3D extension box to obtain the feature information of the point after the coordinate transformation. The ninth sub-module is mainly used to perform semantic segmentation based on the 3D extension box according to the feature information of the coordinate-transformed point, to obtain the second semantic feature of the point in the 3D extension box. Optionally, the ninth sub-module may perform semantic segmentation based on the 3D extension frame according to the mask of the front sight and the feature information of the point after coordinate transformation to obtain the second semantic feature of the point.
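One way to picture the coordinate transformation in this step is to translate the points inside the 3D expansion frame so that the frame center sits at the preset target position (assumed here to be the origin) and to rotate them about the vertical axis so that the frame direction is aligned with a fixed axis. The NumPy sketch below makes these assumptions, including the choice of y as the vertical axis and the rotation sign convention.

# Hedged sketch: transforming points inside a 3D expansion frame into the frame's own
# (canonical) coordinate system; axis and sign conventions are assumptions.
import numpy as np

def canonical_transform(points, center, theta):
    """points: (N, 3) array of x, y, z; center: (3,) frame center; theta: frame direction."""
    shifted = points - center                           # move the frame center to the origin
    c, s = np.cos(-theta), np.sin(-theta)               # rotate by -theta about the y axis
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return shifted @ rot.T

pts = np.array([[1.0, 0.2, 2.0], [1.5, 0.0, 2.5]])
print(canonical_transform(pts, center=np.array([1.0, 0.0, 2.0]), theta=np.pi / 6))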
在一个可选示例中,在前景点为多个的情况下,本公开中的确定检测框模块840可以先确定多个前景点对应的3D初始框之间的重叠度,然后,确定检测框模块840对重叠度大于设定阈值的3D初始框进行筛选;再后,确定检测框模块840根据筛选后的3D初始框确定场景中的目标对象的3D检测框。In an optional example, in the case where there are multiple front sights, the determination detection frame module 840 in the present disclosure may first determine the degree of overlap between the 3D initial frames corresponding to the multiple front sights, and then determine the detection frame module 840 screens the 3D initial frame whose overlapping degree is greater than the set threshold; then, the detection frame determination module 840 determines the 3D detection frame of the target object in the scene according to the filtered 3D initial frame.
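One common way to realize this overlap-based filtering is greedy non-maximum suppression. The sketch below uses an axis-aligned bird's-eye-view IoU as a simplified stand-in for the true oriented-frame overlap, which is an approximation chosen for brevity; the threshold and tuple layout are example choices.

# Hedged sketch: greedy overlap filtering of 3D initial frames.
# Each frame is (x, z, l, w, score); rotation is ignored in this simplified IoU.
def bev_iou(a, b):
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    az1, az2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    bz1, bz2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iz = max(0.0, min(az2, bz2) - max(az1, bz1))
    inter = iw * iz
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def filter_boxes(boxes, iou_threshold=0.7):
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)   # highest score first
    kept = []
    for box in boxes:
        if all(bev_iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept

print(filter_boxes([(0.0, 0.0, 4.0, 2.0, 0.9), (0.2, 0.1, 4.0, 2.0, 0.6)]))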
In an optional example, the feature extraction module 800, the first semantic segmentation module 810, the foreground point prediction module 820 and the initial frame generation module 830 in the present disclosure may be implemented by the first-stage neural network. In this case, the apparatus of the present disclosure may further include a first training module, configured to train the first-stage neural network to be trained with point cloud data samples carrying 3D annotation frames.

In an optional example, the process in which the first training module trains the first-stage neural network includes the following.

First, the first training module provides the point cloud data samples to the first-stage neural network; feature information of the point cloud data samples is extracted based on the first-stage neural network; the first-stage neural network performs semantic segmentation on the point cloud data samples according to the extracted feature information, predicts, according to the first semantic features of the multiple points obtained by the semantic segmentation, at least one foreground point corresponding to a target object among the multiple points, and generates, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point.

Second, the first training module obtains the loss corresponding to the foreground points and the loss formed by the 3D initial frames relative to the corresponding 3D annotation frames, and adjusts the network parameters of the first-stage neural network according to these losses.

Optionally, the first training module may determine, according to the confidence of the foreground points predicted by the first-stage neural network, a first loss corresponding to the foreground point prediction result. The first training module generates a second loss according to the numbers of the buckets in which the parameters of the 3D initial frames generated for the foreground points are located and the numbers of the buckets in which the parameters of the 3D annotation frame information in the point cloud data samples are located. The first training module generates a third loss according to the offsets, within the corresponding buckets, of the parameters of the 3D initial frames generated for the foreground points and the offsets, within the corresponding buckets, of the parameters of the 3D annotation frame information in the point cloud data samples. The first training module generates a fourth loss according to the offsets of the parameters of the 3D initial frames generated for the foreground points relative to predetermined parameters. The first training module generates a fifth loss according to the offsets of the coordinate parameters of the foreground points relative to the coordinate parameters of the 3D initial frames generated for those foreground points. The first training module adjusts the network parameters of the first-stage neural network according to the obtained first loss, second loss, third loss, fourth loss and fifth loss.
In an optional example, the first submodule, the second submodule and the third submodule in the present disclosure are implemented by the second-stage neural network. In this case, the apparatus of the present disclosure further includes a second training module, configured to train the second-stage neural network to be trained with point cloud data samples carrying 3D annotation frames.

In an optional example, the process in which the second training module trains the second-stage neural network includes the following.

First, the second training module provides the 3D initial frames obtained with the first-stage neural network to the second-stage neural network; feature information of the points in a partial region of the point cloud data samples is obtained based on the second-stage neural network, and semantic segmentation is performed on the points in the partial region according to their feature information to obtain the second semantic features of the points in the partial region; the second-stage neural network determines, according to the first semantic features and the second semantic features of the points in the partial region, the confidence that a 3D initial frame is a target object, and generates a position-corrected 3D initial frame according to the first semantic features and the second semantic features of the points in the partial region.

Second, the second training module obtains the loss corresponding to the confidence that the 3D initial frame is a target object and the loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame, and adjusts the network parameters of the second-stage neural network according to the obtained losses.

Optionally, the second training module may determine, according to the confidence, predicted by the second-stage neural network, that a 3D initial frame is a target object, a sixth loss corresponding to the prediction result. The second training module generates a seventh loss according to the numbers of the buckets in which the parameters of the position-corrected 3D initial frames, generated by the second-stage neural network and whose overlap with the corresponding 3D annotation frames exceeds the set threshold, are located, and the numbers of the buckets in which the parameters of the 3D annotation frame information in the point cloud data samples are located. The second training module generates an eighth loss according to the offsets, within the corresponding buckets, of the parameters of those position-corrected 3D initial frames and the offsets, within the corresponding buckets, of the parameters of the 3D annotation frame information in the point cloud data samples. The second training module generates a ninth loss according to the offsets of the parameters of those position-corrected 3D initial frames relative to predetermined parameters. The second training module generates a tenth loss according to the offsets of the coordinate parameters of those position-corrected 3D initial frames relative to the coordinate parameters of the center points of the 3D annotation frames. The second training module adjusts the network parameters of the second-stage neural network according to the sixth loss, the seventh loss, the eighth loss, the ninth loss and the tenth loss.
FIG. 9 is a schematic structural diagram of an embodiment of the vehicle intelligent control apparatus of the present disclosure. As shown in FIG. 9, the apparatus of this embodiment includes a target object 3D detection apparatus 900 and a first control module 910. The target object 3D detection apparatus 900 is configured to obtain the 3D detection frame of a target object based on point cloud data; its specific structure and operations are as described in the above apparatus and method embodiments and are not detailed again here. The first control module 910 is mainly configured to generate, according to the 3D detection frame, an instruction for controlling the vehicle or early-warning prompt information. For details, reference may be made to the relevant description in the above method embodiments, which is not repeated here.

FIG. 10 shows the obstacle avoidance navigation apparatus of the present disclosure. As shown in FIG. 10, the apparatus of this embodiment includes a target object 3D detection apparatus 1000 and a second control module 1010. The target object 3D detection apparatus 1000 is configured to obtain the 3D detection frame of a target object based on point cloud data; its specific structure and operations are as described in the above apparatus and method embodiments and are not detailed again here. The second control module 1010 is mainly configured to generate, according to the 3D detection frame, an instruction or early-warning prompt information for performing obstacle avoidance navigation control on the robot. For details, reference may be made to the relevant description in the above method embodiments, which is not repeated here.
Exemplary Device
FIG. 11 shows an exemplary device 1100 suitable for implementing the present disclosure. The device 1100 may be a control system/electronic system configured in a vehicle, a mobile terminal (for example, a smart mobile phone), a personal computer (PC, for example, a desktop computer or a notebook computer), a tablet computer, a server, or the like. In FIG. 11, the device 1100 includes one or more processors, a communication part, and the like. The one or more processors may be one or more central processing units (CPUs) 1101 and/or one or more graphics processors (GPUs) 1113 that perform visual tracking with a neural network. The processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1102 or executable instructions loaded from a storage section 1108 into a random access memory (RAM) 1103. The communication part 1112 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card. The processor may communicate with the ROM 1102 and/or the RAM 1103 to execute the executable instructions, is connected to the communication part 1112 through a bus 1104, and communicates with other target devices via the communication part 1112, thereby completing the corresponding steps of the present disclosure. For the operations performed by the above instructions, reference may be made to the relevant description in the above method embodiments, which is not detailed here. The RAM 1103 may also store various programs and data required for the operation of the device. The CPU 1101, the ROM 1102 and the RAM 1103 are connected to one another through the bus 1104.

When the RAM 1103 is present, the ROM 1102 is an optional module. The RAM 1103 stores executable instructions, or executable instructions are written into the ROM 1102 at runtime, and the executable instructions cause the central processing unit 1101 to perform the steps of the above target object 3D detection method. An input/output (I/O) interface 1105 is also connected to the bus 1104. The communication part 1112 may be integrated, or may be provided with multiple submodules (for example, multiple IB network cards) that are respectively connected to the bus. The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse and the like; an output section 1107 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it is installed into the storage section 1108 as needed.

It should be specially noted that the architecture shown in FIG. 11 is only one optional implementation. In practice, the number and types of the components in FIG. 11 may be selected, removed, added or replaced according to actual needs. Different functional components may also be arranged separately or integrally; for example, the GPU 1113 and the CPU 1101 may be arranged separately, or the GPU 1113 may be integrated on the CPU 1101, and the communication part 1112 may be arranged separately or integrated on the CPU 1101 or the GPU 1113. These alternative implementations all fall within the protection scope of the present disclosure. In particular, according to the embodiments of the present disclosure, the process described below with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for performing the steps shown in the flowchart, and the program code may include instructions corresponding to the steps of the method provided by the present disclosure. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1109 and/or installed from the removable medium 1111. When the computer program is executed by the central processing unit (CPU) 1101, the instructions described in the present disclosure for implementing the above corresponding steps are executed.
In one or more optional implementations, the embodiments of the present disclosure further provide a computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the target object 3D detection method described in any of the above embodiments.

The computer program product may be implemented by hardware, software, or a combination thereof. In an optional example, the computer program product is embodied as a computer storage medium; in another optional example, the computer program product is embodied as a software product, such as a software development kit (SDK).

In one or more optional implementations, the embodiments of the present disclosure further provide another target object 3D detection method and its corresponding apparatus, electronic device, computer storage medium, computer program and computer program product, where the method includes: a first apparatus sends a target object 3D detection indication to a second apparatus, the indication causing the second apparatus to perform the target object 3D detection method in any of the above possible embodiments; and the first apparatus receives the target object 3D detection result sent by the second apparatus.

In some embodiments, the target object 3D detection indication may specifically be an invocation instruction, and the first apparatus may instruct the second apparatus, by means of invocation, to perform the target object 3D detection operation; accordingly, in response to receiving the invocation instruction, the second apparatus may perform the steps and/or processes in any embodiment of the above target object 3D detection method.
应理解,本公开实施例中的“第一”、“第二”等术语仅仅是为了区分,而不应理解成对本公开实施例的限定。还应理解,在本公开中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。还应理解,对于本公开中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。It should be understood that the terms “first” and “second” in the embodiments of the present disclosure are only for distinction, and should not be construed as limiting the embodiments of the present disclosure. It should also be understood that in the present disclosure, “plurality” may refer to two or more, and “at least one” may refer to one, two, or more than two. It should also be understood that any component, data, or structure mentioned in the present disclosure can be generally understood as one or more, unless it is explicitly defined or given the opposite enlightenment in the context. It should also be understood that the description of the embodiments of the present disclosure emphasizes the differences between the embodiments, and the same or similarities can be referred to each other, and for the sake of brevity, they will not be described one by one.
The methods and apparatuses, electronic devices, and computer-readable storage media of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is for illustration only, and the steps of the methods of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the methods according to the present disclosure.
The description of the present disclosure is given for the sake of example and description and is not exhaustive, nor does it limit the present disclosure to the disclosed forms. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were selected and described in order to better explain the principles and practical applications of the present disclosure, and to enable those of ordinary skill in the art to understand the embodiments of the present disclosure and thereby design various embodiments, with various modifications, suited to particular uses.

Claims (43)

  1. A 3D detection method for a target object, characterized in that it comprises:
    extracting feature information of acquired point cloud data of a scene;
    performing semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of a plurality of points in the point cloud data;
    predicting, according to the first semantic information, at least one foreground point corresponding to the target object among the plurality of points;
    generating, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point;
    determining a 3D detection frame of the target object in the scene according to the 3D initial frame.
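For illustration only, the following minimal Python sketch shows how the first-stage prediction described in claim 1 might look in code. The per-point MLP backbone, the 0.5 foreground threshold, and the (x, y, z, h, w, l, ry) box encoding are assumptions of the example and are not prescribed by the claims.

```python
# Toy "first stage": score every point as foreground/background (first semantic
# information) and regress one 3D initial frame per foreground point.
import torch
import torch.nn as nn

class ToyStageOne(nn.Module):
    def __init__(self, in_dim=3, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU(),
                                      nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.seg_head = nn.Linear(feat_dim, 1)   # foreground score per point
        self.box_head = nn.Linear(feat_dim, 7)   # (x, y, z, h, w, l, ry) per point

    def forward(self, points):                   # points: (N, 3)
        feats = self.backbone(points)            # point-wise feature information
        fg_score = torch.sigmoid(self.seg_head(feats)).squeeze(-1)
        boxes = self.box_head(feats)             # one 3D initial frame per point
        return fg_score, boxes

points = torch.rand(1024, 3)                     # stand-in for one scene's point cloud
fg_score, boxes = ToyStageOne()(points)
fg_mask = fg_score > 0.5                         # predicted foreground points
initial_frames = boxes[fg_mask]                  # 3D initial frames to be refined later
print(initial_frames.shape)
```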
  2. The method according to claim 1, wherein determining the 3D detection frame of the target object in the scene according to the 3D initial frame comprises:
    acquiring feature information of points within a partial area of the point cloud data, wherein the partial area includes at least one said 3D initial frame;
    performing semantic segmentation on the points within the partial area according to the feature information of the points within the partial area to obtain second semantic information of the points within the partial area;
    determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points within the partial area.
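For illustration only, a minimal sketch of the pooling step in claim 2, i.e. gathering the points that fall inside the partial area around one 3D initial frame. For brevity the membership test ignores the frame's heading, which is an assumption of the example; an oriented test would first rotate the points into the frame's coordinate system.

```python
# Gather the points of a cloud that lie inside one 3D frame (axis-aligned test).
import numpy as np

def points_in_frame(points, frame):
    """points: (N, 3); frame = (cx, cy, cz, h, w, l, ry). Returns the points inside it."""
    cx, cy, cz, h, w, l, _ = frame
    half = np.array([l / 2, w / 2, h / 2])       # assumed mapping of extents to axes
    inside = np.all(np.abs(points - np.array([cx, cy, cz])) <= half, axis=1)
    return points[inside]

cloud = np.random.uniform(-10, 10, size=(2048, 3))
pooled = points_in_frame(cloud, np.array([1.0, 0.5, 0.0, 1.5, 1.6, 3.9, 0.2]))
print(pooled.shape)                              # points fed to the second segmentation
```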
  3. The method according to claim 2, wherein determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points within the partial area comprises:
    correcting the 3D initial frame according to the first semantic information and the second semantic information of the points within the partial area to obtain a corrected 3D initial frame;
    determining the 3D detection frame of the target object in the scene according to the corrected 3D initial frame.
  4. The method according to claim 2, wherein determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points within the partial area comprises:
    determining, according to the first semantic information and the second semantic information of the points within the partial area, a confidence that the 3D initial frame corresponds to the target object;
    determining the 3D detection frame of the target object in the scene according to the 3D initial frame and its confidence.
  5. The method according to claim 2, wherein determining the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points within the partial area comprises:
    correcting the 3D initial frame according to the first semantic information and the second semantic information of the points within the partial area to obtain a corrected 3D initial frame;
    determining, according to the first semantic information and the second semantic information of the points within the partial area, a confidence that the corrected 3D initial frame corresponds to the target object;
    determining the 3D detection frame of the target object in the scene according to the corrected 3D initial frame and its confidence.
  6. The method according to any one of claims 2 to 5, characterized in that the partial area comprises: a 3D extension frame obtained by performing edge extension on the 3D initial frame according to a predetermined strategy.
  7. The method according to claim 6, characterized in that the 3D extension frame comprises:
    a 3D extension frame containing the 3D initial frame, formed by performing 3D spatial extension on the 3D initial frame according to a preset X-axis direction increment, a preset Y-axis direction increment and/or a preset Z-axis direction increment.
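For illustration only, a minimal sketch of the edge extension in claim 7: the 3D initial frame is enlarged by fixed per-axis increments while keeping its center. The 0.5 m increments, the (x, y, z, h, w, l, ry) encoding, and the mapping of increments to the l/w/h extents are assumptions of the example.

```python
# Enlarge a 3D initial frame into a 3D extension frame by preset per-axis increments.
import numpy as np

def extend_frame(frame, dx=0.5, dy=0.5, dz=0.5):
    """frame = (cx, cy, cz, h, w, l, ry); returns an enlarged frame with the same center."""
    cx, cy, cz, h, w, l, ry = frame
    return np.array([cx, cy, cz, h + 2 * dz, w + 2 * dy, l + 2 * dx, ry])

initial_frame = np.array([12.0, -1.6, 0.8, 1.5, 1.6, 3.9, 0.3])
extension_frame = extend_frame(initial_frame)
print(extension_frame)                            # same center, larger extents
```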
  8. The method according to claim 6 or 7, characterized in that performing semantic segmentation on the points within the partial area according to the feature information of the points within the partial area to obtain the second semantic information of the points within the partial area comprises:
    performing, according to a preset target position of the 3D extension frame, coordinate transformation on the coordinate information of the points of the point cloud data located within the 3D extension frame, and acquiring feature information of the coordinate-transformed points;
    performing, according to the feature information of the coordinate-transformed points, semantic segmentation based on the 3D extension frame to obtain second semantic features of the points within the 3D extension frame.
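For illustration only, a minimal sketch of one possible coordinate transformation for claim 8: the points inside the 3D extension frame are moved into a frame-centered coordinate system by translating the frame center to the origin and undoing its yaw. Using this particular "canonical" pose as the preset target position is an assumption of the example; the claims do not mandate a specific transform.

```python
# Transform points inside a 3D extension frame into frame-local coordinates.
import numpy as np

def canonical_transform(points, frame):
    """points: (N, 3); frame = (cx, cy, cz, h, w, l, ry). Returns frame-local coordinates."""
    cx, cy, cz, h, w, l, ry = frame
    shifted = points - np.array([cx, cy, cz])     # translate frame center to the origin
    c, s = np.cos(ry), np.sin(ry)
    rot = np.array([[c, 0.0, -s],                 # rotation by -ry about the vertical axis
                    [0.0, 1.0, 0.0],
                    [s, 0.0, c]])
    return shifted @ rot.T

pts = np.random.rand(128, 3) + np.array([12.0, -1.6, 0.8])
local_pts = canonical_transform(pts, np.array([12.0, -1.6, 0.8, 1.5, 1.6, 3.9, 0.3]))
print(local_pts.mean(axis=0))                     # roughly centered on the origin
```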
  9. The method according to claim 8, characterized in that performing the semantic segmentation based on the 3D extension frame according to the feature information of the coordinate-transformed points comprises:
    performing the semantic segmentation based on the 3D extension frame according to a mask of the foreground points and the feature information of the coordinate-transformed points.
  10. The method according to claim 1, wherein there are a plurality of foreground points, and determining the 3D detection frame of the target object in the scene according to the 3D initial frame comprises:
    determining a degree of overlap between the 3D initial frames corresponding to the plurality of foreground points;
    filtering the 3D initial frames whose degree of overlap is greater than a set threshold;
    determining the 3D detection frame of the target object in the scene according to the filtered 3D initial frames.
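For illustration only, a minimal sketch of the overlap-based filtering in claim 10 as a simple non-maximum suppression. For brevity the degree of overlap is computed here as an axis-aligned bird's-eye-view IoU that ignores heading, and the 0.7 threshold is arbitrary; both are assumptions of the example, and an oriented-box IoU is more typical in practice.

```python
# Non-maximum suppression over 3D initial frames using a bird's-eye-view IoU.
import numpy as np

def bev_iou(a, b):
    """a, b = (cx, cz, l, w): axis-aligned footprints on the ground plane."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    az1, az2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    bz1, bz2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(az2, bz2) - max(az1, bz1))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(footprints, scores, thresh=0.7):
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = [j for j in order[1:] if bev_iou(footprints[i], footprints[j]) <= thresh]
        order = np.array(rest, dtype=int)
    return keep

frames = np.array([[10.0, 5.0, 3.9, 1.6], [10.2, 5.1, 3.9, 1.6], [20.0, 8.0, 3.9, 1.6]])
print(nms(frames, np.array([0.9, 0.8, 0.7])))     # the heavily overlapping frame is removed
```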
  11. The method according to any one of claims 1 to 10, characterized in that the extracting of the feature information of the acquired point cloud data of the scene, the performing of semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain the first semantic information of the plurality of points in the point cloud data, the predicting, according to the first semantic information, of the at least one foreground point corresponding to the target object among the plurality of points, and the generating, according to the first semantic information, of the 3D initial frame corresponding to each of the at least one foreground point are implemented by a first-stage neural network;
    the first-stage neural network is obtained by training with point cloud data samples carrying 3D annotation frames.
  12. The method according to claim 11, characterized in that the training process of the first-stage neural network comprises:
    providing point cloud data samples to the first-stage neural network, extracting feature information of the point cloud data samples based on the first-stage neural network, performing semantic segmentation on the point cloud data samples according to the feature information of the point cloud data samples, predicting, according to first semantic features of a plurality of points obtained by the semantic segmentation, at least one foreground point corresponding to the target object among the plurality of points, and generating, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point;
    acquiring a loss corresponding to the foreground points and a loss formed by the 3D initial frame relative to the corresponding 3D annotation frame, and adjusting network parameters in the first-stage neural network according to the losses.
  13. The method according to claim 12, characterized in that acquiring the loss corresponding to the foreground points and the loss formed by the 3D initial frame relative to the corresponding 3D annotation frame, and adjusting the network parameters in the first-stage neural network according to the losses, comprises:
    determining a first loss corresponding to the foreground point prediction result according to the confidence of the foreground points predicted by the first-stage neural network;
    generating a second loss according to the number of the bin in which a parameter of the 3D initial frame generated for the foreground point falls and the number of the bin in which the corresponding parameter of the 3D annotation frame information in the point cloud data sample falls;
    generating a third loss according to the offset, within the corresponding bin, of the parameter of the 3D initial frame generated for the foreground point and the offset, within the corresponding bin, of the parameter of the 3D annotation frame information in the point cloud data sample;
    generating a fourth loss according to the offset of a parameter of the 3D initial frame generated for the foreground point relative to a predetermined parameter;
    generating a fifth loss according to the offset of the coordinate parameters of the foreground point relative to the coordinate parameters of the 3D initial frame generated for that foreground point;
    adjusting the network parameters of the first-stage neural network according to the first loss, the second loss, the third loss, the fourth loss, and the fifth loss.
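For illustration only, a minimal sketch of the bin-based regression that the second and third losses describe: a ground-truth box parameter (here a single center offset) is classified into a bin and regressed within that bin. The bin width, search range, loss functions, and loss weights are assumptions of the example.

```python
# Bin classification loss + in-bin residual regression loss for one box parameter.
import torch
import torch.nn.functional as F

def bin_targets(delta, search_range=3.0, bin_size=0.5):
    """delta: (N,) ground-truth offsets. Returns (bin index, normalised in-bin residual)."""
    shifted = torch.clamp(delta + search_range, min=0.0, max=2 * search_range - 1e-4)
    bin_idx = (shifted / bin_size).long()                 # bin in which the parameter falls
    residual = shifted - (bin_idx.float() + 0.5) * bin_size
    return bin_idx, residual / bin_size

n, num_bins = 4, 12
bin_logits = torch.randn(n, num_bins, requires_grad=True)  # predicted bin scores
res_pred = torch.randn(n, num_bins, requires_grad=True)    # predicted per-bin residuals
gt_delta = torch.tensor([0.2, -1.3, 2.4, 0.0])

gt_bin, gt_res = bin_targets(gt_delta)
second_loss = F.cross_entropy(bin_logits, gt_bin)           # bin-number classification
third_loss = F.smooth_l1_loss(res_pred.gather(1, gt_bin[:, None]).squeeze(1), gt_res)
(second_loss + third_loss).backward()
print(second_loss.item(), third_loss.item())
```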
  14. The method according to any one of claims 2 to 9, characterized in that the acquiring of the feature information of the points within the partial area of the point cloud data, the performing of semantic segmentation on the points within the partial area according to the feature information of the points within the partial area to obtain the second semantic information of the points within the partial area, and the determining of the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points within the partial area are implemented by a second-stage neural network;
    the second-stage neural network is obtained by training with point cloud data samples carrying 3D annotation frames.
  15. The method according to claim 14, characterized in that the training process of the second-stage neural network comprises:
    providing the 3D initial frame to the second-stage neural network, acquiring, based on the second-stage neural network, feature information of points within a partial area of the point cloud data sample, and performing semantic segmentation on the points within the partial area of the point cloud data sample according to the feature information of the points within the partial area of the point cloud data sample to obtain second semantic features of the points within the partial area of the point cloud data sample; determining, according to the first semantic features and the second semantic features of the points within the partial area of the point cloud data sample, a confidence that the 3D initial frame is the target object, and generating a position-corrected 3D initial frame according to the first semantic features and the second semantic features of the points within the partial area of the point cloud data sample;
    acquiring a loss corresponding to the confidence that the 3D initial frame is the target object and a loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame, and adjusting network parameters in the second-stage neural network according to the losses.
  16. The method according to claim 15, characterized in that acquiring the loss corresponding to the confidence that the 3D initial frame is the target object and the loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame, and adjusting the network parameters in the second-stage neural network according to the losses, comprises:
    determining a sixth loss corresponding to the prediction result according to the confidence, predicted by the second-stage neural network, that the 3D initial frame is the target object;
    generating a seventh loss according to the number of the bin in which a parameter of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds a set threshold, falls and the number of the bin in which the corresponding parameter of the 3D annotation frame information in the point cloud data sample falls;
    generating an eighth loss according to the offset, within the corresponding bin, of the parameter of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds the set threshold, and the offset, within the corresponding bin, of the parameter of the 3D annotation frame information in the point cloud data sample;
    generating a ninth loss according to the offset of a parameter of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds the set threshold, relative to a predetermined parameter;
    generating a tenth loss according to the offset of the coordinate parameters of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds the set threshold, relative to the coordinate parameters of the center point of the 3D annotation frame;
    adjusting the network parameters of the second-stage neural network according to the sixth loss, the seventh loss, the eighth loss, the ninth loss, and the tenth loss.
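For illustration only, a minimal sketch of how the sixth loss could be formed: each 3D initial frame is labelled by whether its overlap with the corresponding 3D annotation frame exceeds a threshold, and the predicted confidence is trained against that label. The 0.6 threshold and the use of binary cross-entropy are assumptions of the example.

```python
# Confidence loss for the second-stage network from overlap-derived labels.
import torch
import torch.nn.functional as F

ious = torch.tensor([0.82, 0.45, 0.71, 0.10])         # overlap of each frame with its annotation frame
conf_logits = torch.randn(4, requires_grad=True)      # confidences predicted by stage two
labels = (ious > 0.6).float()                          # frames overlapping enough count as the target object
sixth_loss = F.binary_cross_entropy_with_logits(conf_logits, labels)
sixth_loss.backward()
print(sixth_loss.item())
```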
  17. A vehicle intelligent control method, characterized in that the method comprises:
    obtaining a 3D detection frame of a target object by using the target object 3D detection method according to any one of claims 1 to 16;
    generating, according to the 3D detection frame, an instruction for controlling a vehicle or early-warning prompt information.
  18. The method according to claim 17, wherein generating, according to the 3D detection frame, the instruction for controlling the vehicle or the early-warning prompt information comprises:
    determining, according to the 3D detection frame, at least one of the following information of the target object: the spatial position of the target object in the scene, its size, its distance to the vehicle, and its orientation relative to the vehicle;
    generating, according to the determined at least one piece of information, the instruction for controlling the vehicle or the early-warning prompt information.
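For illustration only, a minimal sketch of claim 18: the distance and relative bearing of the target object are read off its 3D detection frame (assumed here to be expressed in a vehicle-fixed coordinate system with x forward and y left) and turned into a toy warning. The coordinate convention and the 5 m threshold are assumptions of the example.

```python
# Derive distance and relative orientation from a 3D detection frame, then warn if close.
import numpy as np

def describe_detection(frame):
    """frame = (cx, cy, cz, h, w, l, ry) in vehicle coordinates."""
    cx, cy, _ = frame[:3]
    distance = float(np.hypot(cx, cy))                 # distance to the vehicle
    bearing = float(np.degrees(np.arctan2(cy, cx)))    # relative orientation to the vehicle
    return distance, bearing

distance, bearing = describe_detection(np.array([4.2, -1.0, 0.0, 1.5, 1.6, 3.9, 0.1]))
if distance < 5.0:
    print(f"warning: object {distance:.1f} m away at {bearing:.0f} deg")
```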
  19. An obstacle avoidance navigation method, characterized in that the method comprises:
    obtaining a 3D detection frame of a target object by using the target object 3D detection method according to any one of claims 1 to 16;
    generating, according to the 3D detection frame, an instruction for performing obstacle avoidance navigation control on a robot or early-warning prompt information.
  20. The method according to claim 19, wherein generating, according to the 3D detection frame, the instruction for performing obstacle avoidance navigation control on the robot or the early-warning prompt information comprises:
    determining, according to the 3D detection frame, at least one of the following information of the target object: the spatial position of the target object in the scene, its size, its distance to the robot, and its orientation relative to the robot;
    generating, according to the determined at least one piece of information, the instruction for performing obstacle avoidance navigation control on the robot or the early-warning prompt information.
  21. A target object 3D detection apparatus, characterized in that it comprises:
    a feature extraction module, configured to extract feature information of acquired point cloud data of a scene;
    a first semantic segmentation module, configured to perform semantic segmentation on the point cloud data according to the feature information of the point cloud data to obtain first semantic information of a plurality of points in the point cloud data;
    a foreground point prediction module, configured to predict, according to the first semantic information, at least one foreground point corresponding to a target object among the plurality of points;
    an initial frame generation module, configured to generate, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point;
    a detection frame determination module, configured to determine a 3D detection frame of the target object in the scene according to the 3D initial frame.
  22. The apparatus according to claim 21, wherein the detection frame determination module further comprises:
    a first sub-module, configured to acquire feature information of points within a partial area of the point cloud data, wherein the partial area includes at least one said 3D initial frame;
    a second sub-module, configured to perform semantic segmentation on the points within the partial area according to the feature information of the points within the partial area to obtain second semantic information of the points within the partial area;
    a third sub-module, configured to determine the 3D detection frame of the target object in the scene according to the first semantic information and the second semantic information of the points within the partial area.
  23. The apparatus according to claim 22, wherein the third sub-module comprises:
    a fourth sub-module, configured to correct the 3D initial frame according to the first semantic information and the second semantic information of the points within the partial area to obtain a corrected 3D initial frame;
    a fifth sub-module, configured to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame.
  24. The apparatus according to claim 22, wherein the third sub-module is further configured to:
    determine, according to the first semantic information and the second semantic information of the points within the partial area, a confidence that the 3D initial frame corresponds to the target object;
    determine the 3D detection frame of the target object in the scene according to the 3D initial frame and its confidence.
  25. The apparatus according to claim 22, wherein the third sub-module comprises:
    a fourth sub-module, configured to correct the 3D initial frame according to the first semantic information and the second semantic information of the points within the partial area to obtain a corrected 3D initial frame;
    a sixth sub-module, configured to determine, according to the first semantic information and the second semantic information of the points within the partial area, a confidence that the corrected 3D initial frame corresponds to the target object;
    a seventh sub-module, configured to determine the 3D detection frame of the target object in the scene according to the corrected 3D initial frame and its confidence.
  26. The apparatus according to any one of claims 22 to 25, characterized in that the partial area comprises: a 3D extension frame obtained by performing edge extension on the 3D initial frame according to a predetermined strategy.
  27. The apparatus according to claim 26, characterized in that the 3D extension frame comprises:
    a 3D extension frame containing the 3D initial frame, formed by performing 3D spatial extension on the 3D initial frame according to a preset X-axis direction increment, a preset Y-axis direction increment and/or a preset Z-axis direction increment.
  28. The apparatus according to claim 26 or 27, characterized in that the second sub-module comprises:
    an eighth sub-module, configured to perform, according to a preset target position of the 3D extension frame, coordinate transformation on the coordinate information of the points of the point cloud data located within the 3D extension frame, and acquire feature information of the coordinate-transformed points;
    a ninth sub-module, configured to perform, according to the feature information of the coordinate-transformed points, semantic segmentation based on the 3D extension frame to obtain second semantic features of the points within the 3D extension frame.
  29. The apparatus according to claim 28, characterized in that the ninth sub-module is further configured to:
    perform the semantic segmentation based on the 3D extension frame according to a mask of the foreground points and the feature information of the coordinate-transformed points.
  30. The apparatus according to claim 21, wherein there are a plurality of foreground points, and the detection frame determination module is further configured to:
    determine a degree of overlap between the 3D initial frames corresponding to the plurality of foreground points;
    filter the 3D initial frames whose degree of overlap is greater than a set threshold;
    determine the 3D detection frame of the target object in the scene according to the filtered 3D initial frames.
  31. The apparatus according to any one of claims 21 to 30, characterized in that the feature extraction module, the first semantic segmentation module, the foreground point prediction module, and the initial frame generation module are implemented by a first-stage neural network, and the first-stage neural network is obtained by a first training module through training with point cloud data samples carrying 3D annotation frames.
  32. The apparatus according to claim 31, characterized in that the first training module is configured to:
    provide point cloud data samples to the first-stage neural network, extract feature information of the point cloud data samples based on the first-stage neural network, perform semantic segmentation on the point cloud data samples according to the feature information of the point cloud data samples, predict, according to first semantic features of a plurality of points obtained by the semantic segmentation, at least one foreground point corresponding to the target object among the plurality of points, and generate, according to the first semantic information, a 3D initial frame corresponding to each of the at least one foreground point;
    acquire a loss corresponding to the foreground points and a loss formed by the 3D initial frame relative to the corresponding 3D annotation frame, and adjust network parameters in the first-stage neural network according to the losses.
  33. The apparatus according to claim 32, characterized in that the first training module is further configured to:
    determine a first loss corresponding to the foreground point prediction result according to the confidence of the foreground points predicted by the first-stage neural network;
    generate a second loss according to the number of the bin in which a parameter of the 3D initial frame generated for the foreground point falls and the number of the bin in which the corresponding parameter of the 3D annotation frame information in the point cloud data sample falls;
    generate a third loss according to the offset, within the corresponding bin, of the parameter of the 3D initial frame generated for the foreground point and the offset, within the corresponding bin, of the parameter of the 3D annotation frame information in the point cloud data sample;
    generate a fourth loss according to the offset of a parameter of the 3D initial frame generated for the foreground point relative to a predetermined parameter;
    generate a fifth loss according to the offset of the coordinate parameters of the foreground point relative to the coordinate parameters of the 3D initial frame generated for that foreground point;
    adjust the network parameters of the first-stage neural network according to the first loss, the second loss, the third loss, the fourth loss, and the fifth loss.
  34. The apparatus according to any one of claims 22 to 29, characterized in that the first sub-module, the second sub-module, and the third sub-module are implemented by a second-stage neural network, and the second-stage neural network is obtained by a second training module through training with point cloud data samples carrying 3D annotation frames.
  35. The apparatus according to claim 34, characterized in that the second training module is configured to:
    provide the 3D initial frame to the second-stage neural network, acquire, based on the second-stage neural network, feature information of points within a partial area of the point cloud data sample, and perform semantic segmentation on the points within the partial area of the point cloud data sample according to the feature information of the points within the partial area of the point cloud data sample to obtain second semantic features of the points within the partial area of the point cloud data sample; determine, according to the first semantic features and the second semantic features of the points within the partial area of the point cloud data sample, a confidence that the 3D initial frame is the target object, and generate a position-corrected 3D initial frame according to the first semantic features and the second semantic features of the points within the partial area of the point cloud data sample;
    acquire a loss corresponding to the confidence that the 3D initial frame is the target object and a loss formed by the position-corrected 3D initial frame relative to the corresponding 3D annotation frame, and adjust network parameters in the second-stage neural network according to the losses.
  36. The apparatus according to claim 35, characterized in that the second training module is further configured to:
    determine a sixth loss corresponding to the prediction result according to the confidence, predicted by the second-stage neural network, that the 3D initial frame is the target object;
    generate a seventh loss according to the number of the bin in which a parameter of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds a set threshold, falls and the number of the bin in which the corresponding parameter of the 3D annotation frame information in the point cloud data sample falls;
    generate an eighth loss according to the offset, within the corresponding bin, of the parameter of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds the set threshold, and the offset, within the corresponding bin, of the parameter of the 3D annotation frame information in the point cloud data sample;
    generate a ninth loss according to the offset of a parameter of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds the set threshold, relative to a predetermined parameter;
    generate a tenth loss according to the offset of the coordinate parameters of the position-corrected 3D initial frame, generated by the second-stage neural network and whose degree of overlap with the corresponding 3D annotation frame exceeds the set threshold, relative to the coordinate parameters of the center point of the 3D annotation frame;
    adjust the network parameters of the second-stage neural network according to the sixth loss, the seventh loss, the eighth loss, the ninth loss, and the tenth loss.
  37. A vehicle intelligent control apparatus, characterized in that the apparatus comprises:
    the target object 3D detection apparatus according to any one of claims 21 to 36, configured to obtain a 3D detection frame of a target object;
    a first control module, configured to generate, according to the 3D detection frame, an instruction for controlling a vehicle or early-warning prompt information.
  38. The apparatus according to claim 37, wherein the first control module is further configured to:
    determine, according to the 3D detection frame, at least one of the following information of the target object: the spatial position of the target object in the scene, its size, its distance to the vehicle, and its orientation relative to the vehicle;
    generate, according to the determined at least one piece of information, the instruction for controlling the vehicle or the early-warning prompt information.
  39. An obstacle avoidance navigation apparatus, characterized in that the apparatus comprises:
    the target object 3D detection apparatus according to any one of claims 21 to 36, configured to obtain a 3D detection frame of a target object;
    a second control module, configured to generate, according to the 3D detection frame, an instruction for performing obstacle avoidance navigation control on a robot or early-warning prompt information.
  40. The apparatus according to claim 39, wherein the second control module is further configured to:
    determine, according to the 3D detection frame, at least one of the following information of the target object: the spatial position of the target object in the scene, its size, its distance to the robot, and its orientation relative to the robot;
    generate, according to the determined at least one piece of information, the instruction for performing obstacle avoidance navigation control on the robot or the early-warning prompt information.
  41. An electronic device, comprising:
    a memory, configured to store a computer program;
    a processor, configured to execute the computer program stored in the memory, wherein when the computer program is executed, the method according to any one of claims 1-20 is implemented.
  42. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the method according to any one of claims 1-20 is implemented.
  43. A computer program, comprising computer instructions, wherein when the computer instructions run in a processor of a device, the method according to any one of claims 1-20 is implemented.
PCT/CN2019/118126 2018-11-29 2019-11-13 3d detection method and apparatus for target object, and medium and device WO2020108311A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021526222A JP2022515591A (en) 2018-11-29 2019-11-13 3D detection method, device, medium and device of target object
KR1020217015013A KR20210078529A (en) 2018-11-29 2019-11-13 Target object 3D detection method, apparatus, medium and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811446588.8A CN109635685B (en) 2018-11-29 2018-11-29 Target object 3D detection method, device, medium and equipment
CN201811446588.8 2018-11-29

Publications (1)

Publication Number Publication Date
WO2020108311A1 true WO2020108311A1 (en) 2020-06-04

Family

ID=66070171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118126 WO2020108311A1 (en) 2018-11-29 2019-11-13 3d detection method and apparatus for target object, and medium and device

Country Status (4)

Country Link
JP (1) JP2022515591A (en)
KR (1) KR20210078529A (en)
CN (1) CN109635685B (en)
WO (1) WO2020108311A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635685B (en) * 2018-11-29 2021-02-12 北京市商汤科技开发有限公司 Target object 3D detection method, device, medium and equipment
CN112101066B (en) * 2019-06-17 2024-03-08 商汤集团有限公司 Target detection method and device, intelligent driving method and device and storage medium
WO2020258218A1 (en) * 2019-06-28 2020-12-30 深圳市大疆创新科技有限公司 Obstacle detection method and device for mobile platform, and mobile platform
CN110458112B (en) * 2019-08-14 2020-11-20 上海眼控科技股份有限公司 Vehicle detection method and device, computer equipment and readable storage medium
CN112444784B (en) * 2019-08-29 2023-11-28 北京市商汤科技开发有限公司 Three-dimensional target detection and neural network training method, device and equipment
CN110751090B (en) * 2019-10-18 2022-09-20 宁波博登智能科技有限公司 Three-dimensional point cloud labeling method and device and electronic equipment
CN110991468B (en) * 2019-12-13 2023-12-19 深圳市商汤科技有限公司 Three-dimensional target detection and intelligent driving method, device and equipment
CN111179247A (en) * 2019-12-27 2020-05-19 上海商汤智能科技有限公司 Three-dimensional target detection method, training method of model thereof, and related device and equipment
CN111507973B (en) * 2020-04-20 2024-04-12 上海商汤临港智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN111539347B (en) * 2020-04-27 2023-08-08 北京百度网讯科技有限公司 Method and device for detecting target
CN111931727A (en) * 2020-09-23 2020-11-13 深圳市商汤科技有限公司 Point cloud data labeling method and device, electronic equipment and storage medium
CN112183330B (en) * 2020-09-28 2022-06-28 北京航空航天大学 Target detection method based on point cloud
CN112287939B (en) * 2020-10-29 2024-05-31 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium
CN115035359A (en) * 2021-02-24 2022-09-09 华为技术有限公司 Point cloud data processing method, training data processing method and device
CN113822146A (en) * 2021-08-02 2021-12-21 浙江大华技术股份有限公司 Target detection method, terminal device and computer storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008012635A (en) * 2006-07-07 2008-01-24 Toyota Motor Corp Personal identification system
US10733651B2 (en) * 2014-01-01 2020-08-04 Andrew S Hansen Methods and systems for identifying physical objects
CN108509820B (en) * 2017-02-23 2021-12-24 百度在线网络技术(北京)有限公司 Obstacle segmentation method and device, computer equipment and readable medium
CN108470174B (en) * 2017-02-23 2021-12-24 百度在线网络技术(北京)有限公司 Obstacle segmentation method and device, computer equipment and readable medium
US10885398B2 (en) * 2017-03-17 2021-01-05 Honda Motor Co., Ltd. Joint 3D object detection and orientation estimation via multimodal fusion
CN107622244B (en) * 2017-09-25 2020-08-28 华中科技大学 Indoor scene fine analysis method based on depth map
CN108895981B (en) * 2018-05-29 2020-10-09 南京怀萃智能科技有限公司 Three-dimensional measurement method, device, server and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150227775A1 (en) * 2012-09-11 2015-08-13 Southwest Research Institute 3-D Imaging Sensor Based Location Estimation
CN105976400A (en) * 2016-05-10 2016-09-28 北京旷视科技有限公司 Object tracking method and device based on neural network model
CN108122245A (en) * 2016-11-30 2018-06-05 华为技术有限公司 A kind of goal behavior describes method, apparatus and monitoring device
CN108171217A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of three-dimension object detection method based on converged network
CN109635685A (en) * 2018-11-29 2019-04-16 北京市商汤科技开发有限公司 Target object 3D detection method, device, medium and equipment

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860373A (en) * 2020-07-24 2020-10-30 浙江商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN111860373B (en) * 2020-07-24 2022-05-20 浙江商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
WO2022017140A1 (en) * 2020-07-24 2022-01-27 浙江商汤科技开发有限公司 Target detection method and apparatus, electronic device, and storage medium
CN111968133A (en) * 2020-07-31 2020-11-20 上海交通大学 Three-dimensional point cloud data example segmentation method and system in automatic driving scene
CN112200768A (en) * 2020-09-07 2021-01-08 华北水利水电大学 Point cloud information extraction system based on geographic position
CN116420096A (en) * 2020-09-24 2023-07-11 埃尔构人工智能有限责任公司 Method and system for marking LIDAR point cloud data
CN112598635B (en) * 2020-12-18 2024-03-12 武汉大学 Point cloud 3D target detection method based on symmetric point generation
CN112598635A (en) * 2020-12-18 2021-04-02 武汉大学 Point cloud 3D target detection method based on symmetric point generation
CN112766206B (en) * 2021-01-28 2024-05-28 深圳市捷顺科技实业股份有限公司 High-order video vehicle detection method and device, electronic equipment and storage medium
CN112766206A (en) * 2021-01-28 2021-05-07 深圳市捷顺科技实业股份有限公司 High-order video vehicle detection method and device, electronic equipment and storage medium
CN112862953A (en) * 2021-01-29 2021-05-28 上海商汤临港智能科技有限公司 Point cloud data processing method and device, electronic equipment and storage medium
CN112800971A (en) * 2021-01-29 2021-05-14 深圳市商汤科技有限公司 Neural network training and point cloud data processing method, device, equipment and medium
CN112862953B (en) * 2021-01-29 2023-11-28 上海商汤临港智能科技有限公司 Point cloud data processing method and device, electronic equipment and storage medium
CN112907760A (en) * 2021-02-09 2021-06-04 浙江商汤科技开发有限公司 Three-dimensional object labeling method and device, tool, electronic equipment and storage medium
CN112990200A (en) * 2021-03-31 2021-06-18 上海商汤临港智能科技有限公司 Data labeling method and device, computer equipment and storage medium
CN113516013A (en) * 2021-04-09 2021-10-19 阿波罗智联(北京)科技有限公司 Target detection method and device, electronic equipment, road side equipment and cloud control platform
CN113516013B (en) * 2021-04-09 2024-05-14 阿波罗智联(北京)科技有限公司 Target detection method, target detection device, electronic equipment, road side equipment and cloud control platform
CN113298163A (en) * 2021-05-31 2021-08-24 国网湖北省电力有限公司黄石供电公司 Target identification monitoring method based on LiDAR point cloud data
CN113537316B (en) * 2021-06-30 2024-04-09 南京理工大学 Vehicle detection method based on 4D millimeter wave radar point cloud
CN113537316A (en) * 2021-06-30 2021-10-22 南京理工大学 Vehicle detection method based on 4D millimeter wave radar point cloud
CN113570535A (en) * 2021-07-30 2021-10-29 深圳市慧鲤科技有限公司 Visual positioning method and related device and equipment
CN113984037A (en) * 2021-09-30 2022-01-28 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate box in any direction
CN113984037B (en) * 2021-09-30 2023-09-12 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate frame in any direction
CN113822277A (en) * 2021-11-19 2021-12-21 万商云集(成都)科技股份有限公司 Illegal advertisement picture detection method and system based on deep learning target detection
CN114298581A (en) * 2021-12-30 2022-04-08 广州极飞科技股份有限公司 Quality evaluation model generation method, quality evaluation device, electronic device, and readable storage medium
CN114241110B (en) * 2022-02-23 2022-06-03 北京邮电大学 Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation
CN114241110A (en) * 2022-02-23 2022-03-25 北京邮电大学 Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation
CN114743001A (en) * 2022-04-06 2022-07-12 合众新能源汽车有限公司 Semantic segmentation method and device, electronic equipment and storage medium
CN115880470A (en) * 2023-03-08 2023-03-31 深圳佑驾创新科技有限公司 Method, device and equipment for generating 3D image data and storage medium

Also Published As

Publication number Publication date
CN109635685A (en) 2019-04-16
CN109635685B (en) 2021-02-12
JP2022515591A (en) 2022-02-21
KR20210078529A (en) 2021-06-28

Similar Documents

Publication Publication Date Title
WO2020108311A1 (en) 3d detection method and apparatus for target object, and medium and device
CN113486796B (en) Unmanned vehicle position detection method, unmanned vehicle position detection device, unmanned vehicle position detection equipment, storage medium and vehicle
WO2019179464A1 (en) Method for predicting direction of movement of target object, vehicle control method, and device
WO2020253121A1 (en) Target detection method and apparatus, intelligent driving method and device, and storage medium
Zhou et al. Efficient road detection and tracking for unmanned aerial vehicle
US20190156144A1 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
US9147255B1 (en) Rapid object detection by combining structural information from image segmentation with bio-inspired attentional mechanisms
US20150253864A1 (en) Image Processor Comprising Gesture Recognition System with Finger Detection and Tracking Functionality
US20210117704A1 (en) Obstacle detection method, intelligent driving control method, electronic device, and non-transitory computer-readable storage medium
EP4088134A1 (en) Object size estimation using camera map and/or radar information
WO2020184207A1 (en) Object tracking device and object tracking method
WO2023116631A1 (en) Training method and training apparatus for rotating-ship target detection model, and storage medium
US20150278589A1 (en) Image Processor with Static Hand Pose Recognition Utilizing Contour Triangulation and Flattening
WO2020238008A1 (en) Moving object detection method and device, intelligent driving control method and device, medium, and apparatus
KR20210012012A (en) Object tracking methods and apparatuses, electronic devices and storage media
CN112651274A (en) Road obstacle detection device, road obstacle detection method, and recording medium
WO2020238073A1 (en) Method for determining orientation of target object, intelligent driving control method and apparatus, and device
US20220335572A1 (en) Semantically accurate super-resolution generative adversarial networks
CN116310993A (en) Target detection method, device, equipment and storage medium
CN115100741A (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
US20230087261A1 (en) Three-dimensional target estimation using keypoints
CN117115414B (en) GPS-free unmanned aerial vehicle positioning method and device based on deep learning
Petković et al. An overview on horizon detection methods in maritime video surveillance
Hashmani et al. A survey on edge detection based recent marine horizon line detection methods and their applications
CN117372928A (en) Video target detection method and device and related equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889305

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021526222

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217015013

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 24.09.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19889305

Country of ref document: EP

Kind code of ref document: A1