WO2024037552A1 - Target detection model training method and apparatus, map generation method and apparatus, and device - Google Patents

Target detection model training method and apparatus, map generation method and apparatus, and device Download PDF

Info

Publication number
WO2024037552A1
WO2024037552A1 PCT/CN2023/113197 CN2023113197W WO2024037552A1 WO 2024037552 A1 WO2024037552 A1 WO 2024037552A1 CN 2023113197 W CN2023113197 W CN 2023113197W WO 2024037552 A1 WO2024037552 A1 WO 2024037552A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
training
loss
target detection
instance
Prior art date
Application number
PCT/CN2023/113197
Other languages
French (fr)
Chinese (zh)
Inventor
廖本成
陈少宇
程天恒
张骞
Original Assignee
北京地平线信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京地平线信息技术有限公司 filed Critical 北京地平线信息技术有限公司
Publication of WO2024037552A1 publication Critical patent/WO2024037552A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present disclosure relates to autonomous driving technology, and in particular, to a training method of a target detection model, a map generation method, device and equipment.
  • Embodiments of the present disclosure provide a training method for a target detection model, a map generation method, an apparatus and a device.
  • a training method for a target detection model including: obtaining training input data and corresponding first label data, where the training input data includes training image data and/or training point cloud data.
  • the first label data includes an ordered point set corresponding to a first number of instances in the training input data, and the ordered point set includes a target number of coordinate points in the first coordinate system; based on the training Input data, the first label data, a point-to-point loss function and a direction loss function to train the pre-established target detection network to obtain a target detection model.
  • the point-to-point loss function is used to determine the training instance output by the target detection network.
  • the point-to-point loss of the point set relative to the ordered point set of the instance in the first label data is used to determine the direction between points in the training instance point set relative to the first label data The loss of the direction between points in the ordered point set of the instance.
  • a method for generating a map including: acquiring first image data and/or first point cloud data of at least one perspective; based on the first image data and/or the The first point cloud data uses a target detection model obtained by pre-training to obtain an ordered point set of target instances.
  • the target detection model is obtained by the training method of the target detection model as described in any of the above embodiments.
  • the target instance The ordered point set includes ordered point sets corresponding to the first number of instances, and the ordered point set includes a target number of coordinate points in the first coordinate system; based on the target instance ordered point set, a map is generated.
  • a training device for a target detection model including: a first acquisition module for acquiring training input data and corresponding first label data, where the training input data includes training images data and/or training point cloud data.
  • the first label data includes an ordered point set corresponding to a first number of instances in the training input data.
  • the ordered point set includes a target number of points in the first coordinate system.
  • a first processing module configured to train a pre-established target detection network based on the training input data, the first label data, a point-to-point loss function and a direction loss function to obtain a target detection model, the The point-to-point loss function is used to determine the point-to-point loss of the training instance point set output by the target detection network relative to the ordered point set of instances in the first label data, and the direction loss function is used to determine the training instance point set. The loss of the direction between points relative to the direction between points of the ordered point set of instances in the first label data.
  • a map generation device including: a second acquisition module for acquiring first image data and/or first point cloud data of at least one perspective; a second processing module, Used to obtain an ordered point set of target instances based on the first image data and/or the first point cloud data using a target detection model obtained through pre-training.
  • the target detection model is as described in any of the above embodiments.
  • the training method of the target detection model is obtained, so
  • the ordered point set of the target instance includes an ordered point set corresponding to a first number of instances, and the ordered point set includes a target number of coordinate points in the first coordinate system.
  • a computer-readable storage medium stores a computer program, the computer program is used to perform training of the target detection model described in any of the above embodiments of the present disclosure. Method; or, the computer program is used to execute the map generation method described in any of the above embodiments of the present disclosure.
  • an electronic device includes: a processor; a memory for storing instructions executable by the processor; and the processor is configured to retrieve instructions from the memory.
  • the executable instructions are read and executed to implement the training method of the target detection model described in any of the above embodiments of the present disclosure.
  • an electronic device includes: a processor; a memory for storing instructions executable by the processor; and the processor is configured to retrieve instructions from the memory.
  • the executable instructions are read and executed to implement the map generation method described in any of the above embodiments of the present disclosure.
  • a computer program product is provided.
  • the instruction processor in the computer program product is executed, the map generation method or the map generation method described in any of the above embodiments of the present disclosure is executed. How to generate a map.
  • the pre-established target detection network is trained by using the ordered point set corresponding to the instance as a label and combining point-to-point loss and direction loss.
  • the obtained target detection model can predict the ordered point set of instances for image data and/or point cloud data, that is, to achieve prediction at the map element coordinate point level, relative to the prediction at the map element instance frame level, embodiments of the present disclosure It can help improve the prediction accuracy of the model.
  • Figure 1 is an exemplary application scenario of the training method of the target detection model provided by the present disclosure
  • Figure 2 is a schematic flowchart of a training method for a target detection model provided by an exemplary embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of step 202 provided by an exemplary embodiment of the present disclosure.
  • Figure 4 is a schematic structural diagram of a target detection network provided by an exemplary embodiment of the present disclosure.
  • Figure 5 is a schematic flowchart of step 202 provided by another exemplary embodiment of the present disclosure.
  • Figure 6 is a schematic flowchart of step 2021 provided by an exemplary embodiment of the present disclosure.
  • Figure 7 is a schematic structural diagram of a decoder network provided by an exemplary embodiment of the present disclosure.
  • Figure 8 is a schematic diagram of the principle of Deformable DETR provided by an exemplary embodiment of the present disclosure.
  • Figure 9 is a schematic diagram of the determination principle of a training instance point set provided by another exemplary embodiment of the present disclosure.
  • Figure 10 is a schematic flowchart of step 202 provided by yet another exemplary embodiment of the present disclosure.
  • Figure 11 is a schematic diagram of a prediction network of a prediction type provided by an exemplary embodiment of the present disclosure.
  • Figure 12 is a schematic flowchart of step 2024 provided by an exemplary embodiment of the present disclosure.
  • Figure 13 is a schematic flowchart of a map generation method provided by an exemplary embodiment of the present disclosure.
  • Figure 14 is a schematic structural diagram of a training device for a target detection model provided by an exemplary embodiment of the present disclosure
  • Figure 15 is a schematic structural diagram of the first processing module 502 provided by an exemplary embodiment of the present disclosure.
  • Figure 16 is a schematic structural diagram of the second processing unit 5022 provided by an exemplary embodiment of the present disclosure.
  • Figure 17 is a schematic structural diagram of the first processing unit 5021 provided by an exemplary embodiment of the present disclosure.
  • Figure 18 is a schematic structural diagram of the first processing module 502 provided by another exemplary embodiment of the present disclosure.
  • Figure 19 is a schematic structural diagram of a map generation device provided by an exemplary embodiment of the present disclosure.
  • Figure 20 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure.
  • the inventor found that in autonomous driving scenarios, it is usually necessary to use vehicle-mounted surround-view cameras and/or radars to perceive road elements (such as lane lines, zebra crossings, curbs, drivable areas, etc.), using for the generation of online maps. If the detection model in related technologies is used to predict the map instances corresponding to various road elements to obtain the map instance frame position, and the map is generated based on the map instance frame position, the accuracy of the generated map will be lower.
  • road elements such as lane lines, zebra crossings, curbs, drivable areas, etc.
  • Figure 1 is an exemplary application scenario of the training method of the target detection model provided by the present disclosure.
  • the pre-collected image data can be used as training image data
  • the pre-collected point cloud data can be used as training point cloud data to form training input data
  • the training input data can be formed
  • the labels corresponding to the image data and the labels corresponding to the training point cloud data are used as the first label data for training the target detection model.
  • the network output of the target detection model may include ordered point sets corresponding to the first number of instances, that is, each Each instance can correspond to an ordered point set.
  • Each ordered point set can include a target number of coordinate points in a first coordinate system.
  • the first coordinate system can be a coordinate system corresponding to a bird's-eye view.
  • the instances can be various types of points on the road.
  • the representation of elements in an image or point cloud can be instances, that is, each element can have one or more corresponding elements in the image or point cloud. Multiple instances. Each instance can predict its corresponding ordered point set. The ordered point set can fit the elements corresponding to the instance, such as the lane line instance. The ordered point set includes 3 coordinate points. Through this 3 coordinate points can fit a lane line.
  • the network parameters of the target detection model are adjusted based on the point-to-point loss function and the direction loss function, so that the target detection model obtained through training can effectively detect the ordered point sets corresponding to each instance.
  • the prediction at the instance coordinate point level of the present disclosure helps to improve the accuracy of prediction results.
  • the target detection model obtained by training can then be deployed to the map generation device on the on-board computing platform of the autonomous vehicle for online mapping of the autonomous vehicle, which helps to improve the accuracy of the generated map.
  • the training method of the target detection model of the present disclosure is not limited to autonomous driving scenarios, but can also be applied to any other implementable scenarios according to actual needs, and can be set according to actual needs.
  • FIG. 2 is a schematic flowchart of a training method for a target detection model provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic devices, such as servers, terminals and other electronic devices. As shown in Figure 2, it includes the following steps:
  • Step 201 Obtain training input data and corresponding first label data.
  • the training input data includes training image data and/or training point cloud data.
  • the first label data includes ordered points corresponding to the first number of instances in the training input data. Set, the ordered point set includes the target number of coordinate points in the first coordinate system.
  • the training image data and training point cloud data can be obtained based on vehicle-mounted surround-view cameras and radar collection. For example, by driving a collection vehicle equipped with a surround-view camera and radar on the road, the road environment images and road point cloud data around the vehicle are collected as training image data and training point cloud data respectively.
  • the first label data may be obtained by annotating ordered point sets for instances in the training image data and/or training point cloud data.
  • the first label data includes coordinate points in a first coordinate system.
  • the first coordinate system can be a coordinate system corresponding to a bird's-eye view.
  • the training image data is data in an image coordinate system.
  • the training point cloud data can be data in a radar coordinate system.
  • the labeling result can be converted to the first coordinate system based on the camera parameters and radar parameters to obtain the corresponding first label data.
  • Instances can be the representation of various elements on the road in images or point clouds.
  • lane lines, zebra crossings, arrows, curbs, drivable areas and other elements in the image can be instances, that is, each element is represented in the image or point cloud.
  • Each instance can correspond to an ordered point set.
  • the ordered point set can be fitted to the elements corresponding to the instance, such as lane line instances.
  • the ordered point set includes 3 coordinates. point, a segment of lane line can be fitted through these three coordinate points.
  • first quantity and target quantity The amount can be set according to actual needs.
  • step 201 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the first acquisition module run by the processor.
  • Step 202 based on the training input data, the first label data, the point-to-point loss function and the direction loss function, train the pre-established target detection network to obtain the target detection model.
  • the point-to-point loss function is used to determine the training instance point output by the target detection network.
  • the point-to-point loss of the set relative to the ordered point set of the instance in the first label data, the direction loss function is used to determine the direction between the points in the training instance point set relative to the point of the ordered point set of the instance in the first label data and direction loss between points.
  • the target detection network can be set according to actual needs.
  • the target detection network can be a detection network based on a deformable detection transformer (Deformable DEtection TRansformer, referred to as: Deformable DETR) or other implementable detection networks.
  • Deformable DETR Deformable DEtection TRansformer
  • the point-to-point loss function and the direction loss function can use any implementable loss function.
  • the point-to-point loss function can use the L1 loss function.
  • the L1 loss function refers to the L1 norm loss function, also known as the least absolute deviation (LAD) or minimum Absolute error (LAE), which is to minimize the sum of the absolute differences between the target value (in this embodiment, it refers to the label value of the annotation) and the estimated value (in this embodiment, it refers to the output value of the target detection network) ization
  • the direction loss function can adopt the cosine similarity loss function of the direction vectors of two adjacent points.
  • the network parameters of the target detection model are adjusted based on the point-to-point loss function and the direction loss function, so that the target detection model obtained through training can effectively detect the ordered point sets corresponding to each instance.
  • the point-to-point loss determined by the point-to-point loss function is used to supervise the point-level prediction results of the target detection network, so that the target detection network can accurately predict the points of the instance, and the direction loss determined by the direction loss function is used to supervise the order of points.
  • This enables the target detection network to predict a more accurate ordered point set. Since the target detection model obtained by training in the embodiments of the present disclosure predicts the ordered coordinate points of the instance, compared to the prediction of the instance frame, the embodiments of the present disclosure are helpful. Improve the accuracy of prediction results.
  • step 202 may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by the first processing module run by the processor.
  • the training method of the target detection model provided by this embodiment is based on the point-to-point loss determined by the point-to-point loss function and the direction loss determined based on the direction loss function, and supervises the points and the order of the points in the instance point set output by the target detection network, so that the training The obtained target detection model can accurately and effectively predict the ordered point sets corresponding to each instance, realizing the prediction at the coordinate point level of the instance.
  • the target detection model obtained by training in the embodiment of the present disclosure It helps to improve the accuracy of prediction results, which in turn helps improve map accuracy when used for map generation.
  • Figure 3 is a schematic flowchart of step 202 provided by an exemplary embodiment of the present disclosure.
  • step 202 may specifically include the following steps:
  • Step 2021 Obtain the training instance point set based on the training input data and the target detection network.
  • the target detection network can be set according to actual needs to any one of three situations: only training image data can be input, only training point cloud data can be input, or both training image data and training point cloud data can be input.
  • FIG. 4 is a schematic structural diagram of a target detection network provided by an exemplary embodiment of the present disclosure.
  • the target detection network may include a feature extraction network, an encoder network, a decoder network and a prediction head network.
  • the feature extraction network may include a first feature extraction network and/or a second feature extraction network according to actual needs.
  • the first feature extraction network is used for feature extraction of training image data to obtain training image features
  • the second feature extraction network is used for Extract training point cloud data to obtain training point cloud features
  • the encoder network is used to encode training image features and/or training point cloud features to obtain the training feature map in the first coordinate system
  • the decoder network is used to train The feature map is decoded to obtain the training decoding results
  • the prediction head network is used to predict the training instance point set based on the training decoding results.
  • the prediction head network can be a linear neural network such as MLP (Multilayer Perceptron, or feedforward neural network), which can be set according to actual needs.
  • MLP Multilayer Perceptron, or feedforward neural network
  • different ordered point sets can be used for different instances.
  • an open-loop ordered point set can be used, that is, the starting point and the end point of the ordered point set are not the same point.
  • the ordered point set can be a polygon point set, forming a closed-loop ordered point set. After fitting, it becomes a closed-loop polygon.
  • the details can be set according to actual needs.
  • the target number of coordinate points of the ordered point sets corresponding to different instances can be the same or different.
  • the lane line instance sets an ordered point set of 3 coordinate points
  • the zebra crossing sets an ordered point set of 5 coordinate points. There is no specific limit. .
  • step 2021 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the first processing unit run by the processor.
  • Step 2022 Determine the first loss based on the training instance point set, the first label data and the point-to-point loss function.
  • the point set of each instance in the training instance point set can be compared point-to-point with the ordered point set of the instance in the first label data to determine the first loss.
  • the point-to-point point set can be compared The absolute value of the difference is used as the loss of the point, thereby obtaining the loss of each point in each instance, and then based on the loss of each point in each instance, the point-to-point loss of the entire network can be determined as the first loss.
  • the losses at each point of each instance can be summed to obtain the first loss.
  • the details can be set according to actual needs.
  • step 2022 may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by a second processing unit run by the processor.
  • Step 2023 Determine the second loss based on the training instance point set, the first label data and the direction loss function.
  • the direction loss function is used to determine the loss of the direction between the points in the training instance point set relative to the direction between the points of the ordered point set of the instance in the first label data.
  • the loss can be based on the difference between the two points. Determined by the cosine similarity of the direction vectors.
  • the first direction vector of the two adjacent points can be determined based on the coordinate values of the two adjacent points, based on the first label data and the The coordinate value labels of the two points corresponding to the two adjacent points are used to determine the second direction vector of the two adjacent points.
  • the cosine similarity of the two direction vectors is determined.
  • the direction loss of the entire network is determined as the second loss, which can be set according to actual needs.
  • Step 2022 and step 2023 are in no particular order.
  • step 2023 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third processing unit run by the processor.
  • Step 2024 Based on the first loss and the second loss, adjust the network parameters of the target detection network until the first loss and the second loss meet the preset conditions, and obtain the target detection model.
  • the first loss and the second loss can be weighted and summed by preset weights as a comprehensive loss for adjusting network parameters.
  • Preset conditions can be set according to actual needs.
  • the adjustment of network parameters can be implemented using any implementable optimizer, such as the Adam training optimizer, which can be set according to actual needs.
  • the Adam training optimizer absorbs the advantages of the adaptive learning rate gradient descent algorithm (Adagrad) and the momentum gradient descent algorithm. It can not only adapt to sparse gradients (that is, natural language and computer vision problems), but also alleviate the problem of gradient oscillation.
  • step 2024 may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by a fourth processing unit run by the processor.
  • Figure 5 is a schematic flowchart of step 202 provided by another exemplary embodiment of the present disclosure.
  • step 2022 determines the first loss based on the training instance point set, the first label data and the point-to-point loss function, including:
  • Step 20221 For each instance, based on the ordered point set corresponding to the instance in the first label data, determine the points in the ordered point set and the training instance point set of the instance in different orders of the ordered point set. The corresponding relationship between points is to obtain the point-to-point relationship corresponding to each sequence.
  • the different orders of ordered point sets refer to the order in which different endpoints of the ordered point set are used as the starting points.
  • the ordered point set of line segments such as lane lines
  • it includes three ordered coordinate points A1, A2, and A3. , which has two endpoints A1 and A3.
  • the different orders of this ordered point set include two orders, one is A1-A2-A3, and the other is A3-A2-A1.
  • the coordinates of any two coordinate points in different orders The adjacent relationship remains unchanged.
  • the ordered coordinate points of B1-B5 are included, where B5 can be equal to B1 to represent the polygon, or it can be represented by other symbols.
  • the point set corresponds to a polygon. When fitting, it needs to be connected end to end to form a closed loop. Specifically, It can be set according to actual needs, as long as it can be distinguished from the ordered point set of line segments. It also provides a basis for determining point-to-point relationships in different sequences. For the ordered point set B1-B5, take B5 not equal to B1 as an example. Since each coordinate point can be a vertex of a polygon and can be used as a starting point, the ordered point set has 5 starting points.
  • the ordered point set can correspond to 10 sequences, including B1-B5 and the reverse sequence B5-B1, B2-B3-B4-B5-B1 and its reverse sequence, B3-B4-B5-B1-B2 and Its reverse order, B4-B5-B1-B2-B3 and its reverse order, B5-B1-B2-B3-B4 and its reverse order.
  • the ordered point set of an instance in the training instance point set is C1-C5
  • the valid point set labels of the instance in the first label data are D1-D5.
  • D1-D5 are arranged in the 10 different orders of B1-B5 mentioned above. , corresponding to C1-C5 respectively, that is, forming a point-to-point relationship corresponding to different orders.
  • C1-C5 correspond to D5-D1 in order, and the specific principles will not be repeated one by one.
  • step 20221 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the first determination subunit executed by the processor.
  • Step 20222 Based on the point-to-point relationships corresponding to each sequence, determine the point-to-point losses corresponding to each sequence.
  • the point-to-point loss can be obtained based on the point-to-point loss function.
  • step 20222 may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by the second determination subunit run by the processor.
  • Step 20223 use the order with the smallest point-to-point loss as the target order of this instance.
  • the minimum point-to-point loss and the order of the minimum point-to-point loss can be determined, and this order is used as the target order of the ordered point set of the instance for subsequent point-to-point points in the entire network. Determination of loss.
  • step 20223 may be executed by the processor calling a corresponding instruction stored in the memory, or may be executed by a third determination subunit run by the processor.
  • Step 20224 Use the point-to-point loss corresponding to the target sequence as the target point-to-point loss of this instance.
  • the training instance point set includes an ordered point set corresponding to the first number of instances, when the first number is multiple, the corresponding target sequence and the corresponding target point-to-point loss can be determined for each instance.
  • step 20224 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the fourth determination subunit executed by the processor.
  • Step 20225 Determine the first loss based on the target point-to-point loss of each instance.
  • the target point-to-point losses of each instance can be combined to determine the first loss of the entire network.
  • step 20225 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the fifth determination subunit executed by the processor.
  • the embodiments of the present disclosure can determine the minimum point-to-point loss for determining the overall point-to-point loss of the network through various possible orders of the ordered point set during the training process, so that the target detection network can simulate the optimal starting point of the instance and The order of corresponding instances helps to further improve model performance and accuracy of prediction results.
  • step 2023 determines the second loss based on the training instance point set, the first label data and the direction loss function, including:
  • Step 20231 Determine the second loss based on the training instance point set, the first label data, the target order and direction loss function corresponding to each instance.
  • the second loss is a direction loss, it involves the directionality of two adjacent points. Therefore, when the point-to-point loss adopts the point-to-point relationship of the target sequence, the direction loss can also be based on the two adjacent coordinate points determined by the point-to-point relationship of the target sequence.
  • the direction vector is determined to ensure the consistency between the predicted point and the direction of the label.
  • step 20231 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third processing unit run by the processor.
  • Figure 6 is a schematic flowchart of step 2021 provided by an exemplary embodiment of the present disclosure.
  • the training input data may also include initial query features and initial reference points.
  • the initial query features include initial features of the target number corresponding to the first number of instances.
  • the initial reference points may include initial features.
  • the reference coordinate points corresponding to each feature; the target detection network is a detection network based on the deformable detection transformer; accordingly, in step 2021, based on the training input data and the target detection network, a training instance point set is obtained, including:
  • Step 20211 Extract features from the training image data based on the first feature extraction network in the target detection network to obtain the first training image features.
  • Deformable Detection Transformer is a detection network obtained by improving DETR. It uses a multi-scale variable attention module instead of the attention module in DETR to process features, which helps to solve DETR Problems such as high computational complexity and too slow convergence.
  • DETR is an end-to-end target detector that fully integrates a convolutional neural network (CNN) and a transformer (Transformer). It can achieve target detection based on the powerful modeling capabilities of the Transformer.
  • the initial query features (queries) may be randomly initialized query features, and the initial query features include initial features corresponding to the target number of the first number of instances, that is, each instance may correspond to the initial features of the target number.
  • the target number of initial features corresponding to different instances can be the same or different.
  • the initial query features are used in the decoder attention operation in the object detection network.
  • the initial reference point can be a set of initial reference coordinate points corresponding to each randomly initialized instance.
  • the first feature extraction network can use any implementable feature extraction network, such as using a convolutional neural network as the feature extraction network. The specifics can be based on actual needs. set up.
  • step 20211 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the first feature extraction subunit run by the processor.
  • Step 20212 Perform feature extraction on the training point cloud data based on the second feature extraction network in the target detection network to obtain the first training point cloud features.
  • the second feature extraction network can use any implementable feature extraction network, such as using a convolutional neural network as the feature extraction network, which can be set according to actual needs.
  • step 20211 and step 20212 are not in any order.
  • step 20212 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the second feature extraction subunit run by the processor.
  • Step 20213 Encode the first training image feature and/or the first training point cloud feature based on the encoder network in the target detection network to obtain the target training feature map in the first coordinate system.
  • the encoder network may include at least one encoder, and the encoder network may convert the first training image features and the first training point cloud features into the first coordinate system through coding to obtain the corresponding target training feature map.
  • step 20213 may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by a coding subunit executed by the processor.
  • Step 20214 Obtain the training decoding result based on the target training feature map, the initial query feature, the initial reference point, and the decoder network in the target detection network.
  • the decoder network includes at least one decoder.
  • the training decoding result may be a decoding result obtained by decoding by at least one decoder.
  • the decoder network can continuously update the initial query features based on the initial reference points and target training feature maps to obtain training decoding results.
  • step 20214 may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by a decoding subunit run by the processor.
  • Step 20215 Based on the training decoding results, determine the training instance point set.
  • the training instance point set can be obtained by continuously updating the initial reference point with the decoding result of each decoder. That is, after each decoder obtains the decoding result, the offset corresponding to each reference point can be predicted based on the decoding result, so as to Take the first decoder as an example.
  • the decoding result predicts the offset corresponding to each initial reference point.
  • Each offset is added to the corresponding initial reference point to obtain the updated reference point as the corresponding first decoder. output reference point.
  • the offset corresponding to each output reference point of the first decoder is predicted based on its decoding result, and added to each output reference point corresponding to the first decoder to obtain the second decoding
  • the output reference point corresponding to the decoder, and so on, the output reference point corresponding to the last decoder can be used as the training instance point set.
  • FIG. 7 is a schematic structural diagram of a decoder network provided by an exemplary embodiment of the present disclosure.
  • the decoder network can include N decoders
  • the initial query features can include 3 initial features corresponding to the two instances of instance 1 and instance 2, that is, each instance corresponds to 3 initial features, where, The three gray blocks corresponding to instance 1 respectively represent the three initial features of instance 1, and the three black blocks corresponding to instance 2 represent the three initial features of instance 2 respectively.
  • the initial reference points may include reference coordinate points corresponding to the two instances. Each initial feature corresponds to a reference coordinate point. Taking Example 1 as an example, 3 initial features correspond to 3 reference coordinate points (see Figure 7 The 3 gray dots corresponding to Example 1).
  • the training decoding results are obtained through decoding by N decoders, and then the training instance point set is obtained based on the training decoding results.
  • FIG. 8 is a schematic diagram of the principle of Deformable DETR provided by an exemplary embodiment of the present disclosure.
  • Query Feature represents the initial query feature
  • Reference Point represents the initial reference point
  • Input Feature Map represents the target training feature map.
  • the variable attention module of the decoder can only Focus on a part of the range near the reference point, regardless of the resolution of the entire feature map.
  • the offset is represented by three arrows respectively.
  • the feature offset refers to the position offset of the key points collected in the value vector relative to the initial reference point.
  • the Input Feature Map obtains the value vector Values through the linear layer. Each note The force head obtains the corresponding value vector, and the feature offset can be used to extract sparse values (that is, the key points in the above Values) from near the initial reference point in the value vector Values.
  • Attention Weights are aggregated with sparse values, such as through the three weights of Head1 in Attention Weights (A mqk ) , 0.5 and 0.3. , 0.2 perform a weighted sum of the values of the three key points (three stacked gray blocks) extracted from the value vector of Head1 to obtain the attention result corresponding to the attention head Head1.
  • each attention head can be obtained Corresponding attention results (Aggregated Sampled Values) respectively, Aggregated Sampled Values obtain the decoding result (Output) through the linear layer, or the Output can also be added to the Query Feature through the residual connection, and the addition result is used as the decoding result.
  • Aggregated Sampled Values obtain the decoding result (Output) through the linear layer, or the Output can also be added to the Query Feature through the residual connection, and the addition result is used as the decoding result.
  • Set according to actual needs.
  • the specific principles of Deformable DETR will not be repeated one by one.
  • step 20215 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the first processing subunit run by the processor.
  • the embodiments of the present disclosure realize coordinate point level prediction of instances through a target detection model based on Deformable DETR. Compared with prediction using segmentation combined with post-processing or autoregressive prediction, it helps to improve prediction accuracy, and due to the target detection based on Deformable DETR
  • the model, using deformable convolution, can collect only the main feature points near the reference point when performing attention operations, which helps to reduce the amount of calculation and thus helps to improve the prediction speed.
  • step 20214 obtains training decoding results based on the target training feature map, initial query features, initial reference points, and the decoder network in the target detection network, including: for each decoding in the decoder network
  • the decoding result of the decoder is obtained based on the target training feature map and the input query features and input reference points corresponding to the decoder.
  • the input query features and input reference points corresponding to the first decoder are the initial query features respectively. and the initial reference point.
  • the input query feature corresponding to any other decoder except the first decoder is the decoding result of the previous decoder of the other decoder.
  • the input reference point of the other decoder is based on the previous decoder.
  • the output reference point determined by the decoding result of the decoder; the decoding result of the last decoder is used as the training decoding result.
  • the decoding result of each decoder in the decoder network after obtaining the decoding result of each decoder in the decoder network based on the target training feature map and the input query feature and input reference point corresponding to the decoder, it also includes: Based on the decoding result of the decoder and the offset prediction network corresponding to the decoder, the first offset corresponding to the decoder is determined; based on the first offset and the input reference point corresponding to the decoder, the first offset is determined The output reference point corresponding to the decoder; accordingly, the training instance point set is determined based on the training decoding result, including: using the output reference point corresponding to the last decoder determined based on the training decoding result as the training instance point set.
  • FIG. 9 is a schematic diagram of the determination principle of the training instance point set provided by another exemplary embodiment of the present disclosure.
  • each decoder can correspond to an offset prediction network, which is used to predict the first offset of the reference point based on the decoding result of the decoder, and add it to the input reference point corresponding to the decoder.
  • the input reference point of decoder 1 is the initial reference point
  • the output reference point of decoder i-1 Continuously fine-tune the reference points through training to obtain an accurate set of instance points.
  • Figure 10 is a schematic flowchart of step 202 provided by yet another exemplary embodiment of the present disclosure.
  • the first label data also includes the type label corresponding to each instance in the training input data; in step 20214, the target training feature map, the initial query feature, the initial reference point, and the decoding in the target detection network are After obtaining the training decoding results of the machine network, it also includes:
  • Step 20216 Based on the training decoding results, determine the training type results.
  • the training type results include the prediction types corresponding to each instance.
  • the type label of the instance can be the real type of each instance obtained by pre-annotation, such as lane lines, curbs, zebra crossings, arrows, drivable areas, etc.
  • the prediction type corresponding to an instance refers to the element type to which the instance is predicted by the target detection network.
  • the element type can include lane lines, curbs, zebra crossings, arrows, drivable areas, etc. For example, predict that an instance belongs to a lane line.
  • FIG. 11 is a schematic diagram of a prediction network provided by an exemplary embodiment of the present disclosure.
  • decoder N decodes to obtain the training decoding result, and predicts the training type result through the type prediction network.
  • the type prediction network can be a prediction network based on a feedforward neural network, which can be set according to actual needs.
  • the type prediction network, each offset prediction network, and each reference point update network can be collectively referred to as a prediction head network.
  • step 20216 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the fifth processing unit run by the processor.
  • Step 20217 Determine the type loss based on the training type result and the type label in the first label data.
  • the type loss can be determined based on the preset type loss function, and the type loss function can use any implementable loss function.
  • the type loss function can use the focal loss function, which can be set according to actual needs.
  • step 20217 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the sixth processing unit run by the processor.
  • step 2024 adjusts the network parameters of the target detection network based on the first loss and the second loss until the first loss and the second loss meet the preset conditions, and obtains the target detection model, including:
  • Step 20241 Determine the comprehensive loss based on the first loss, the second loss, the type loss and the preset weight.
  • the preset weights can be set according to actual needs.
  • the weights of the first loss l1, the second loss l2, and the type loss l3 can be set to ⁇ 1, ⁇ 2, and ⁇ 3 respectively.
  • ⁇ 1, ⁇ 2, and ⁇ 3 can be set to 5, 0.1, and 2 respectively, and there are no specific limitations.
  • step 20241 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third processing subunit run by the processor.
  • Step 20242 Based on the comprehensive loss, adjust the network parameters of the target detection network until the comprehensive loss meets the preset conditions and obtain the target detection model.
  • step 20242 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the fourth processing subunit run by the processor.
  • the embodiments of the present disclosure help to further improve the performance of the target detection model and the accuracy of the prediction results by further combining type loss, point-to-point loss and direction loss to comprehensively adjust network parameters.
  • Figure 12 is a schematic flowchart of step 2024 provided by an exemplary embodiment of the present disclosure.
  • step 2024 adjusts the network parameters of the target detection network based on the first loss and the second loss. , until the first loss and the second loss meet the preset conditions, and the target detection model is obtained, including:
  • Step 20241a Determine the comprehensive loss based on the first loss and the second loss.
  • the first loss and the second loss can be weighted and summed according to a certain proportional weight to obtain the comprehensive loss.
  • the specific principle please refer to the above content and will not be repeated here.
  • step 20241a may be executed by the processor calling corresponding instructions stored in the memory, or Can be executed by a fourth processing unit 5024 executed by the processor.
  • Step 20242a Based on the comprehensive loss, adjust the network parameters of the target detection network until the comprehensive loss meets the preset conditions, and obtain the target detection model.
  • step 20242a may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by the fourth processing unit 5024 run by the processor.
  • the training method of the target detection model in the embodiment of the present disclosure uses hierarchical prediction methods of examples and corresponding ordered point sets, and combines point-to-point loss, direction loss, and type loss to conduct model training, so that the obtained target detection model can be more accurate. Predicting the ordered point set of the instance helps to further improve the prediction accuracy, and combined with the deformable DETR network, the attention operation of the target detection model during the inference process can only focus on the feature interaction of neighboring points around the reference point, which helps Reduce the computational complexity, thereby helping to reduce the amount of calculation and improve prediction efficiency.
  • the target detection model query vector in the embodiment of the present disclosure is at the point level, which is more flexible than the instance box level.
  • FIG. 13 is a schematic flowchart of a map generation method provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic devices, specifically such as vehicle-mounted computing platforms. As shown in Figure 13, it includes the following steps:
  • Step 301 Obtain first image data and/or first point cloud data of at least one viewing angle.
  • the first image data may be the image data of the current frame collected in real time by at least one camera installed on the vehicle while the vehicle is driving, and the first point cloud data may be collected in real time by a radar installed on the vehicle while the vehicle is driving.
  • the point cloud data of the current frame may be the image data of the current frame collected in real time by at least one camera installed on the vehicle while the vehicle is driving.
  • step 301 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the second acquisition module run by the processor.
  • Step 302 Based on the first image data and/or the first point cloud data, use the target detection model obtained by pre-training to obtain an ordered point set of target instances.
  • the target detection model is obtained through the training method of the target detection model provided in any of the above embodiments or optional examples.
  • the target instance ordered point set includes ordered point sets corresponding to the first number of instances, and the ordered point set includes The target number of coordinate points in the first coordinate system.
  • the specific input data required by the target detection model can be set and trained according to actual needs, and can support image data or point cloud data, or can support both image data and point cloud data.
  • the specific reasoning principle of the target detection model can be found in the foregoing embodiments and will not be described again here.
  • step 302 may be performed by the processor calling corresponding instructions stored in the memory, or may be performed by a second processing module run by the processor.
  • Step 303 Generate a map based on the ordered point set of the target instance.
  • the target instance ordered point set is the coordinate point set under the first coordinate system (such as the coordinate system corresponding to the bird's-eye view).
  • the corresponding Map elements such as lane lines, zebra crossings, curbs, etc.
  • the fitting results of each instance can be used as a local road map around the current location of the vehicle.
  • the ordered point set of the target instance can also be converted into the global coordinate system through coordinate transformation, so that a global road map can be generated according to the regional growth method, which can be set according to actual needs.
  • the global coordinate system may be, for example, a world coordinate system or a relatively stable coordinate system rigidly connected to the world coordinate system.
  • the global coordinate system may be a preset coordinate system with the starting position of the vehicle as the origin.
  • step 303 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third processing module run by the processor.
  • the map generation method in the embodiment of the present disclosure realizes prediction at the coordinate point level of the map instance based on the target detection model. Compared with the prediction at the frame level of the map instance, the method in the embodiment of the present disclosure can help to improve the accuracy of the map.
  • any method provided by the embodiments of the present disclosure can be executed by any appropriate device with data processing capabilities, including but not limited to: terminal devices and servers.
  • any method provided by the embodiments of the present disclosure can be executed by a processor.
  • the processor executes any method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in the memory. No further details will be given below.
  • the aforementioned program can be stored in a computer-readable storage medium.
  • the program When the program is executed, It includes the steps of the above method embodiment; and the aforementioned storage medium includes: ROM, RAM, magnetic disk or optical disk and other various media that can store program codes.
  • Figure 14 is a schematic structural diagram of a training device for a target detection model provided by an exemplary embodiment of the present disclosure.
  • the device of this embodiment can be used to implement the training method embodiment of the corresponding target detection model of the present disclosure.
  • the device shown in Figure 14 includes: a first acquisition module 501 and a first processing module 502.
  • the first acquisition module 501 is used to acquire training input data and corresponding first label data.
  • the training input data includes training image data and/or training point cloud data.
  • the first label data includes ordered point sets respectively corresponding to a first number of instances in the training input data, and each ordered point set includes a target number of coordinate points in the first coordinate system; the first processing module 502 is used to train the pre-established target detection network based on the training input data and first label data obtained by the first acquisition module 501, together with the point-to-point loss function and the direction loss function, to obtain the target detection model.
  • the point-to-point loss function is used to determine the point-to-point loss of the training instance point set output by the target detection network relative to the ordered point sets of the instances in the first label data; the direction loss function is used to determine the loss of the direction between points in the training instance point set relative to the direction between points in the ordered point sets of the instances in the first label data.
  • FIG. 15 is a schematic structural diagram of the first processing module 502 provided by an exemplary embodiment of the present disclosure.
  • the first processing module 502 includes: a first processing unit 5021, a second processing unit 5022, a third processing unit 5023 and a fourth processing unit 5024.
  • the first processing unit 5021 is used to obtain the training instance point set based on the training input data and the target detection network; the second processing unit 5022 is used to determine the first loss based on the training instance point set obtained by the first processing unit 5021, the first label data, and the point-to-point loss function;
  • the third processing unit 5023 is used to determine the second loss based on the training instance point set obtained by the first processing unit 5021, the first label data, and the direction loss function;
  • the fourth processing unit 5024 is used to adjust the network parameters of the target detection network based on the first loss and the second loss until the first loss and the second loss meet the preset conditions, obtaining the target detection model.
  • FIG. 16 is a schematic structural diagram of the second processing unit 5022 provided by an exemplary embodiment of the present disclosure.
  • the second processing unit 5022 includes: a first determination sub-unit 50221, a second determination sub-unit 50222, a third determination sub-unit 50223, a fourth determination sub-unit 50224 and a fifth determination sub-unit 50225.
  • the first determination subunit 50221 is used, for each instance, to determine the correspondence between each point in the ordered point set of that instance in the first label data and the points of that instance in the training instance point set, under the different orders of the ordered point set, obtaining the point-to-point relationship corresponding to each order;
  • the second determination subunit 50222 is used to determine the point-to-point loss corresponding to each order based on the point-to-point relationship corresponding to that order;
  • the third determination subunit 50223 is used to take the order with the smallest point-to-point loss as the target order of the instance;
  • the fourth determination subunit 50224 is used to take the point-to-point loss corresponding to the target order as the target point-to-point loss of the instance;
  • the fifth determination subunit 50225 is used to determine the first loss based on the target point-to-point losses of the instances.
  • the third processing unit 5023 is specifically configured to determine the second loss based on the training instance point set, the first label data, the target order corresponding to each instance, and the direction loss function.
  • Figure 17 is a schematic structural diagram of the first processing unit 5021 provided by an exemplary embodiment of the present disclosure.
  • the training input data also includes initial query features and initial reference points.
  • the initial query features include, for each of the first number of instances, a target number of initial features, and the initial reference points include a reference coordinate point corresponding to each initial feature;
  • the target detection network is a detection network based on a deformable detection transformer;
  • the first processing unit 5021 includes: a first feature extraction subunit 50211, a second feature extraction subunit 50212, an encoding subunit 50213, a decoding subunit 50214 and a first processing subunit 50215.
  • the first feature extraction subunit 50211 is used to perform feature extraction on the training image data based on the first feature extraction network in the target detection network to obtain the first training image features; the second feature extraction subunit 50212 is used to perform feature extraction on the training point cloud data based on the second feature extraction network in the target detection network to obtain the first training point cloud features; the encoding subunit 50213 is used to encode the first training image features and/or the first training point cloud features based on the encoder network in the target detection network to obtain the target training feature map in the first coordinate system; the decoding subunit 50214 is used to obtain training decoding results based on the target training feature map, the initial query features, the initial reference points, and the decoder network in the target detection network.
  • the decoder network includes at least one decoder; the first processing subunit 50215 is used to determine the training instance point set based on the training decoding results.
  • the decoding subunit 50214 is specifically configured to: for each decoder in the decoder network, obtain the decoding result of that decoder based on the target training feature map and the input query features and input reference points corresponding to that decoder, where the input query features and input reference points corresponding to the first decoder are the initial query features and the initial reference points respectively; for any decoder other than the first decoder, the input query features are the decoding result of the preceding decoder, and the input reference points are the output reference points determined based on the decoding result of the preceding decoder; and the decoding result of the last decoder is used as the training decoding result.
  • the first processing unit 5021 also includes: an offset prediction sub-unit 50216 and a second processing sub-unit 50217.
  • the offset prediction subunit 50216 is used to determine the first offset corresponding to a decoder based on the decoding result of that decoder and the offset prediction network corresponding to that decoder; the second processing subunit 50217 is used to determine the output reference points corresponding to that decoder based on the first offset and the input reference points corresponding to that decoder; accordingly, the first processing subunit 50215 is specifically used to take the output reference points corresponding to the last decoder, determined based on the training decoding results, as the training instance point set.
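  • as a hedged sketch of the iterative refinement just described, the following shows how each decoder's reference points could be updated by a per-decoder offset head, with the last decoder's output reference points taken as the training instance point set; the module interfaces are assumptions for illustration, not the disclosed implementation:

```python
import torch.nn as nn

class RefinementDecoderStack(nn.Module):
    """Chain of decoders, each followed by an offset prediction network."""

    def __init__(self, decoders, offset_heads):
        super().__init__()
        self.decoders = nn.ModuleList(decoders)          # e.g. deformable attention layers
        self.offset_heads = nn.ModuleList(offset_heads)  # one small MLP per decoder

    def forward(self, feature_map, init_queries, init_ref_points):
        queries, ref_points = init_queries, init_ref_points
        for decoder, offset_head in zip(self.decoders, self.offset_heads):
            # Decoding result from the target training feature map and the
            # decoder's input query features / input reference points.
            queries = decoder(queries, feature_map, ref_points)
            # First offset predicted from the decoding result, added to the
            # input reference points to obtain the output reference points.
            ref_points = ref_points + offset_head(queries)
        # Output reference points of the last decoder = training instance point set.
        return queries, ref_points
```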
  • Figure 18 is a schematic structural diagram of the first processing module 502 provided by another exemplary embodiment of the present disclosure.
  • the first label data also includes type labels corresponding to each instance in the training input data; the first processing module 502 also includes:
  • the fifth processing unit 5025 is used to determine the training type result based on the training decoding result.
  • the training type result includes the prediction type corresponding to each instance;
  • the sixth processing unit 5026 is used to determine the type loss based on the training type result and the type labels in the first label data;
  • the fourth processing unit 5024 includes: a third processing subunit 50241, used to determine the comprehensive loss based on the first loss, the second loss, the type loss and the preset weights; and a fourth processing subunit 50242, used to adjust the network parameters of the target detection network based on the comprehensive loss until the comprehensive loss meets the preset conditions, obtaining the target detection model.
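  • a minimal sketch of one possible comprehensive loss, assuming cross-entropy as the type loss and placeholder values for the preset weights (neither is fixed by the text):

```python
import torch.nn.functional as F

def comprehensive_loss(first_loss, second_loss, type_logits, type_labels,
                       w_point=1.0, w_dir=0.1, w_type=0.5):
    """Weighted sum of point-to-point, direction, and type losses."""
    type_loss = F.cross_entropy(type_logits, type_labels)
    return w_point * first_loss + w_dir * second_loss + w_type * type_loss
```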
  • the fourth processing unit 5024 is specifically configured to: determine the comprehensive loss based on the first loss and the second loss; and adjust the network parameters of the target detection network based on the comprehensive loss until the comprehensive loss meets the preset conditions, obtaining the target detection model.
  • Figure 19 is a schematic structural diagram of a map generation device provided by an exemplary embodiment of the present disclosure.
  • the device of this embodiment can be used to implement the corresponding map generation method embodiment of the present disclosure.
  • the device shown in Figure 19 includes: a second acquisition module 601, a second processing module 602, and a third processing module 603.
  • the second acquisition module 601 is used to acquire the first image data and/or the first point cloud data of at least one perspective; the second processing module 602 is used to obtain the target instance ordered point set based on the first image data and/or the first point cloud data acquired by the second acquisition module 601, using a target detection model obtained through pre-training.
  • the target detection model is obtained through the training method of the target detection model in any of the above embodiments or optional examples.
  • the target instance ordered point set includes ordered point sets respectively corresponding to the first number of instances, and each ordered point set includes a target number of coordinate points in the first coordinate system; the third processing module 603 is configured to generate a map based on the target instance ordered point set obtained by the second processing module 602.
  • Figure 20 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure.
  • the electronic device 10 includes one or more processors 11 and memories 12.
  • the processor 11 may be a central processing unit (CPU) or other form of processing unit with data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
  • Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc.
  • Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 11 may execute the program instructions to implement the methods of various embodiments of the present disclosure described above and/or other desired functions.
  • the electronic device 10 may further include an input device 13 and an output device 14, and these components are interconnected through a bus system and/or other forms of connection mechanisms (not shown).
  • the input device 13 may also include, for example, a keyboard, a mouse, and the like.
  • the output device 14 can output various information to the outside, including determined distance information, direction information, etc.
  • the output device 14 may include, for example, a display, a speaker, a printer, a communication network and remote output devices connected thereto, and the like.
  • the electronic device 10 may also include any other appropriate components depending on the specific application.
  • embodiments of the present disclosure may also be a computer program product, which includes computer program instructions that, when executed by a processor, cause the processor to perform the steps of the methods according to various embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
  • the computer program product may have program code for performing operations of embodiments of the present disclosure written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • embodiments of the present disclosure may also be a computer-readable storage medium having computer program instructions stored thereon.
  • the computer program instructions, when executed by a processor, cause the processor to perform the steps of the methods according to various embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
  • Computer-readable storage media can take the form of any combination of one or more computer-readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may include, for example, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

Abstract

Disclosed in embodiments of the present disclosure are a target detection model training method and apparatus, a map generation method and apparatus, and a device. The target detection model training method comprises: acquiring training input data and corresponding first label data, wherein the training input data comprises training image data and/or training point cloud data, the first label data comprises ordered point sets respectively corresponding to a first number of instances in the training input data, and each ordered point set comprises a target number of coordinate points under a first coordinate system; and on the basis of the training input data, the first label data, a point-to-point loss function and a direction loss function, training a pre-established target detection network to obtain a target detection model. The target detection model obtained in the embodiments of the present disclosure can accurately and effectively predict ordered point sets respectively corresponding to instances, realizing the coordinate point-level prediction of the instances, and compared with instance box-level prediction, the embodiments of the present disclosure are beneficial to improving the precision of a prediction result.

Description

Target detection model training method, map generation method, apparatus and device
This disclosure claims priority to the Chinese patent application filed with the State Intellectual Property Office on August 16, 2022, with application number CN202210977934.5 and invention title "Target detection model training method, map generation method, apparatus and device", the entire contents of which are incorporated into this disclosure by reference.
Technical Field
The present disclosure relates to autonomous driving technology, and in particular, to a training method for a target detection model, a map generation method, an apparatus, and a device.
Background Art
In autonomous driving scenarios, it is usually necessary to use vehicle-mounted surround-view cameras and/or radars to perceive road elements (such as lane lines, zebra crossings, curbs, and drivable areas) for online map generation.
Summary of the Invention
Embodiments of the present disclosure provide a training method for a target detection model, a map generation method, an apparatus, and a device.
According to one aspect of the embodiments of the present disclosure, a training method for a target detection model is provided, including: obtaining training input data and corresponding first label data, where the training input data includes training image data and/or training point cloud data, the first label data includes ordered point sets respectively corresponding to a first number of instances in the training input data, and each ordered point set includes a target number of coordinate points in a first coordinate system; and training a pre-established target detection network based on the training input data, the first label data, a point-to-point loss function, and a direction loss function to obtain a target detection model, where the point-to-point loss function is used to determine the point-to-point loss of the training instance point set output by the target detection network relative to the ordered point sets of the instances in the first label data, and the direction loss function is used to determine the loss of the direction between points in the training instance point set relative to the direction between points in the ordered point sets of the instances in the first label data.
According to another aspect of the embodiments of the present disclosure, a map generation method is provided, including: acquiring first image data and/or first point cloud data of at least one perspective; obtaining a target instance ordered point set based on the first image data and/or the first point cloud data using a target detection model obtained through pre-training, where the target detection model is obtained through the training method of the target detection model described in any of the above embodiments, the target instance ordered point set includes ordered point sets respectively corresponding to a first number of instances, and each ordered point set includes a target number of coordinate points in the first coordinate system; and generating a map based on the target instance ordered point set.
According to yet another aspect of the embodiments of the present disclosure, a training apparatus for a target detection model is provided, including: a first acquisition module for acquiring training input data and corresponding first label data, where the training input data includes training image data and/or training point cloud data, the first label data includes ordered point sets respectively corresponding to a first number of instances in the training input data, and each ordered point set includes a target number of coordinate points in the first coordinate system; and a first processing module for training a pre-established target detection network based on the training input data, the first label data, a point-to-point loss function, and a direction loss function to obtain a target detection model, where the point-to-point loss function is used to determine the point-to-point loss of the training instance point set output by the target detection network relative to the ordered point sets of the instances in the first label data, and the direction loss function is used to determine the loss of the direction between points in the training instance point set relative to the direction between points in the ordered point sets of the instances in the first label data.
According to a further aspect of the embodiments of the present disclosure, a map generation apparatus is provided, including: a second acquisition module for acquiring first image data and/or first point cloud data of at least one perspective; and a second processing module for obtaining a target instance ordered point set based on the first image data and/or the first point cloud data using a target detection model obtained through pre-training, where the target detection model is obtained through the training method of the target detection model described in any of the above embodiments, the target instance ordered point set includes ordered point sets respectively corresponding to a first number of instances, and each ordered point set includes a target number of coordinate points in the first coordinate system.
According to a further aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; the storage medium stores a computer program, and the computer program is used to perform the training method of the target detection model described in any of the above embodiments of the present disclosure, or the computer program is used to perform the map generation method described in any of the above embodiments of the present disclosure.
According to a further aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor, where the processor is configured to read the executable instructions from the memory and execute the instructions to implement the training method of the target detection model described in any of the above embodiments of the present disclosure.
According to a further aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor, where the processor is configured to read the executable instructions from the memory and execute the instructions to implement the map generation method described in any of the above embodiments of the present disclosure.
According to a further aspect of the embodiments of the present disclosure, a computer program product is provided; when the instructions in the computer program product are executed by a processor, the training method of the target detection model or the map generation method described in any of the above embodiments of the present disclosure is performed.
Based on the training method of the target detection model, the map generation method, the apparatus, and the device provided by the above embodiments of the present disclosure, the pre-established target detection network is trained by using the ordered point sets corresponding to the instances as labels and combining the point-to-point loss and the direction loss. The obtained target detection model can predict the ordered point sets of instances from image data and/or point cloud data, that is, achieve prediction at the coordinate-point level of map elements. Compared with prediction at the instance-frame level of map elements, the embodiments of the present disclosure help improve the prediction accuracy of the model.
Description of the Drawings
Figure 1 is an exemplary application scenario of the training method for the target detection model provided by the present disclosure;
Figure 2 is a schematic flowchart of a training method for a target detection model provided by an exemplary embodiment of the present disclosure;
Figure 3 is a schematic flowchart of step 202 provided by an exemplary embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of a target detection network provided by an exemplary embodiment of the present disclosure;
Figure 5 is a schematic flowchart of step 202 provided by another exemplary embodiment of the present disclosure;
Figure 6 is a schematic flowchart of step 2021 provided by an exemplary embodiment of the present disclosure;
Figure 7 is a schematic structural diagram of a decoder network provided by an exemplary embodiment of the present disclosure;
Figure 8 is a schematic diagram of the principle of Deformable DETR provided by an exemplary embodiment of the present disclosure;
Figure 9 is a schematic diagram of the principle of determining a training instance point set provided by another exemplary embodiment of the present disclosure;
Figure 10 is a schematic flowchart of step 202 provided by yet another exemplary embodiment of the present disclosure;
Figure 11 is a schematic diagram of a prediction network for type prediction provided by an exemplary embodiment of the present disclosure;
Figure 12 is a schematic flowchart of step 2024 provided by an exemplary embodiment of the present disclosure;
Figure 13 is a schematic flowchart of a map generation method provided by an exemplary embodiment of the present disclosure;
Figure 14 is a schematic structural diagram of a training device for a target detection model provided by an exemplary embodiment of the present disclosure;
Figure 15 is a schematic structural diagram of the first processing module 502 provided by an exemplary embodiment of the present disclosure;
Figure 16 is a schematic structural diagram of the second processing unit 5022 provided by an exemplary embodiment of the present disclosure;
Figure 17 is a schematic structural diagram of the first processing unit 5021 provided by an exemplary embodiment of the present disclosure;
Figure 18 is a schematic structural diagram of the first processing module 502 provided by another exemplary embodiment of the present disclosure;
Figure 19 is a schematic structural diagram of a map generation device provided by an exemplary embodiment of the present disclosure;
Figure 20 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure.
Detailed Description
In order to explain the present disclosure, example embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, rather than all of them. It should be understood that the present disclosure is not limited by the example embodiments.
It should be noted that, unless otherwise specifically stated, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Overview of the Disclosure
In the process of realizing the present disclosure, the inventors found that in autonomous driving scenarios it is usually necessary to use vehicle-mounted surround-view cameras and/or radars to perceive road elements (such as lane lines, zebra crossings, curbs, and drivable areas) for online map generation. If a detection model of the related art is used to predict the map instances corresponding to various road elements to obtain map instance frame positions, and a map is generated based on the map instance frame positions, the accuracy of the generated map will be low.
Exemplary Overview
Figure 1 is an exemplary application scenario of the training method for the target detection model provided by the present disclosure.
In an autonomous driving scenario, using the training method of the target detection model of the present disclosure, pre-collected image data can be used as training image data and pre-collected point cloud data as training point cloud data to form the training input data, and the labels corresponding to the training image data and the training point cloud data serve as the first label data for training the target detection model. The network output of the target detection model may include ordered point sets respectively corresponding to a first number of instances, that is, each instance may correspond to one ordered point set, and each ordered point set may include a target number of coordinate points in a first coordinate system, where the first coordinate system may be the coordinate system corresponding to a bird's-eye view. An instance may be the representation of a road element in an image or point cloud; for example, lane lines, zebra crossings, arrows, curbs, and drivable areas in an image may all be instances, that is, each kind of element may correspond to one or more instances in the image or point cloud. For each instance, its corresponding ordered point set can be predicted, and the ordered point set can be fitted to obtain the element corresponding to the instance; for example, for a lane line instance whose ordered point set includes 3 coordinate points, a segment of lane line can be fitted through these 3 coordinate points. During training, the network parameters of the target detection model are adjusted based on the point-to-point loss function and the direction loss function, so that the trained target detection model can effectively detect the ordered point sets corresponding to the instances. Since the trained target detection model predicts the coordinate points of instances, compared with the above-mentioned way of predicting map instances to obtain map instance frame positions, the instance coordinate-point-level prediction of the present disclosure helps improve the accuracy of prediction results. The trained target detection model can then be deployed to a map generation apparatus on the on-board computing platform of an autonomous vehicle for online mapping, which helps improve the accuracy of the generated map.
In practical applications, the training method of the target detection model of the present disclosure is not limited to autonomous driving scenarios, and can also be applied to any other implementable scenario according to actual needs; the specifics can be set according to actual needs.
Exemplary Methods
Figure 2 is a schematic flowchart of a training method for a target detection model provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic devices, such as servers and terminals. As shown in Figure 2, the method includes the following steps:
Step 201: Obtain training input data and corresponding first label data, where the training input data includes training image data and/or training point cloud data, the first label data includes ordered point sets respectively corresponding to a first number of instances in the training input data, and each ordered point set includes a target number of coordinate points in the first coordinate system.
The training image data and training point cloud data can be collected by vehicle-mounted surround-view cameras and radars. For example, a collection vehicle equipped with a surround-view camera and radar drives on the road, and the road environment images and road point cloud data around the vehicle are collected as training image data and training point cloud data respectively. The first label data can be obtained by annotating ordered point sets for the instances in the training image data and/or training point cloud data. The first label data includes coordinate points in the first coordinate system, which can be the coordinate system corresponding to a bird's-eye view; the training image data is data in the image coordinate system, and the training point cloud data can be data in the radar coordinate system, so the annotation results can be converted to the first coordinate system based on the camera parameters and radar parameters to obtain the corresponding first label data. The specific conversion principle will not be described again. An instance can be the representation of a road element in an image or point cloud; for example, lane lines, zebra crossings, arrows, curbs, and drivable areas in an image can all be instances, that is, each kind of element can correspond to one or more instances in the image or point cloud, and each instance can correspond to an ordered point set that can be fitted to obtain the element corresponding to the instance. For example, for a lane line instance whose ordered point set includes 3 coordinate points, a segment of lane line can be fitted through these 3 coordinate points. The first number and the target number can both be set according to actual needs.
In an optional example, step 201 may be executed by the processor calling corresponding instructions stored in the memory, or by the first acquisition module run by the processor.
Step 202: Train the pre-established target detection network based on the training input data, the first label data, the point-to-point loss function, and the direction loss function to obtain the target detection model, where the point-to-point loss function is used to determine the point-to-point loss of the training instance point set output by the target detection network relative to the ordered point sets of the instances in the first label data, and the direction loss function is used to determine the loss of the direction between points in the training instance point set relative to the direction between points in the ordered point sets of the instances in the first label data.
The target detection network can be set according to actual needs; for example, it can be a detection network based on a Deformable DEtection TRansformer (Deformable DETR) or another implementable detection network. The point-to-point loss function and the direction loss function can be any implementable loss functions. For example, the point-to-point loss function can be the L1 loss function, i.e. the L1-norm loss function, also known as least absolute deviation (LAD) or least absolute error (LAE), which minimizes the sum of the absolute differences between the target values (in the embodiments of the present disclosure, the annotated label values) and the estimated values (in the embodiments of the present disclosure, the output values of the target detection network); the direction loss function can be a cosine-similarity loss function over the direction vectors of adjacent points. During training, the network parameters of the target detection model are adjusted based on the point-to-point loss function and the direction loss function, so that the trained target detection model can effectively detect the ordered point sets corresponding to the instances. The point-to-point loss determined by the point-to-point loss function supervises the point-level prediction results of the target detection network, so that the network can accurately predict the points of an instance, and the direction loss determined by the direction loss function supervises the order of the points, so that the network can predict a more accurate ordered point set. Since the target detection model trained in the embodiments of the present disclosure predicts the ordered coordinate points of instances, compared with instance-frame prediction, the embodiments of the present disclosure help improve the accuracy of prediction results.
In an optional example, step 202 may be executed by the processor calling corresponding instructions stored in the memory, or by the first processing module run by the processor.
In the training method of the target detection model provided by this embodiment, the point-to-point loss determined by the point-to-point loss function and the direction loss determined by the direction loss function supervise the points and the point order of the instance point sets output by the target detection network, so that the trained target detection model can accurately and effectively predict the ordered point sets corresponding to the instances, realizing prediction at the coordinate-point level of instances. Compared with prediction at the map-instance-frame level, the target detection model trained in the embodiments of the present disclosure helps improve the accuracy of prediction results, and in turn, when used for map generation, helps improve map accuracy.
In an optional example, Figure 3 is a schematic flowchart of step 202 provided by an exemplary embodiment of the present disclosure. In this example, step 202 may specifically include the following steps:
Step 2021: Obtain the training instance point set based on the training input data and the target detection network.
The target detection network can be configured, according to actual needs, for any one of three cases: only training image data can be input, only training point cloud data can be input, or both training image data and training point cloud data can be input.
Exemplarily, Figure 4 is a schematic structural diagram of a target detection network provided by an exemplary embodiment of the present disclosure. As shown in Figure 4, the target detection network may include a feature extraction network, an encoder network, a decoder network, and a prediction head network. The feature extraction network may include, according to actual needs, a first feature extraction network and/or a second feature extraction network: the first feature extraction network is used for feature extraction from the training image data to obtain training image features, and the second feature extraction network is used for feature extraction from the training point cloud data to obtain training point cloud features. The encoder network is used to encode the training image features and/or training point cloud features to obtain a training feature map in the first coordinate system; the decoder network is used to decode the training feature map to obtain training decoding results; and the prediction head network is used to predict the training instance point set based on the training decoding results. For example, the prediction head network can be a linear neural network such as an MLP (Multilayer Perceptron, also called a feedforward neural network), which can be set according to actual needs. The training input data can be used as the input of the target detection network, and the output training instance point set is obtained through inference of the target detection network.
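The following skeleton illustrates, under stated assumptions, how the Figure 4 pipeline could be composed; the backbone, encoder, and decoder modules are placeholders, and only the MLP prediction head that maps decoded features to ordered coordinate points follows the text explicitly:

```python
import torch.nn as nn

class MapElementDetector(nn.Module):
    """Sketch of: feature extraction -> encoder -> decoder -> prediction head."""

    def __init__(self, img_backbone, pc_backbone, encoder, decoder,
                 embed_dim=256, num_points=20):
        super().__init__()
        self.img_backbone = img_backbone  # first feature extraction network (images)
        self.pc_backbone = pc_backbone    # second feature extraction network (point clouds)
        self.encoder = encoder            # produces the feature map in the first coordinate system
        self.decoder = decoder            # decodes instance queries against that map
        self.point_head = nn.Sequential(  # MLP prediction head -> (x, y) per point
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, num_points * 2),
        )
        self.num_points = num_points

    def forward(self, images, point_cloud, queries):
        img_feats = self.img_backbone(images)
        pc_feats = self.pc_backbone(point_cloud)
        bev_map = self.encoder(img_feats, pc_feats)  # training feature map
        decoded = self.decoder(queries, bev_map)     # training decoding results
        points = self.point_head(decoded)            # per-instance ordered points
        return points.view(*points.shape[:-1], self.num_points, 2)
```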
In an optional example, different kinds of ordered point sets can be used for different instances. For example, for a lane line instance, an open-loop ordered point set can be used, i.e. the starting point and the end point of the ordered point set are not the same point, and fitting yields a chain of line segments; for a drivable-area instance, the ordered point set can be a polygon point set, i.e. a closed-loop ordered point set, and fitting yields a closed polygon. The specifics can be set according to actual needs. The target numbers of coordinate points of the ordered point sets corresponding to different instances can be the same or different; for example, a lane line instance may use an ordered point set of 3 coordinate points while a zebra crossing uses an ordered point set of 5 coordinate points, which is not specifically limited. A toy illustration of the two representations follows.
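The closed/open flag below is only one possible encoding; the text notes that a repeated endpoint or another symbol can also mark a polygon:

```python
# Open-loop instance (e.g. a lane line): start and end points differ,
# and fitting yields a chain of line segments.
lane_line = {"points": [(0.0, 5.0), (0.3, 10.0), (0.5, 15.0)], "closed": False}

# Closed-loop instance (e.g. a drivable area): the points are polygon
# vertices, connected head-to-tail when fitting.
drivable_area = {"points": [(0, 0), (4, 0), (4, 8), (2, 10), (0, 8)], "closed": True}
```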
In an optional example, step 2021 may be executed by the processor calling corresponding instructions stored in the memory, or by the first processing unit run by the processor.
Step 2022: Determine the first loss based on the training instance point set, the first label data, and the point-to-point loss function.
After the training instance point set is obtained, the point set of each instance in the training instance point set can be compared point-to-point with the ordered point set of that instance in the first label data to determine the first loss. For example, the absolute value of each point-to-point difference can be taken as the loss of that point, thereby obtaining the loss of each point in each instance, and the network-level point-to-point loss can then be determined from the losses of the points in each instance as the first loss. As another example, the losses of the points of each instance can be summed to obtain the first loss. The specifics can be set according to actual needs.
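A minimal sketch of this point-to-point L1 loss for a single instance, assuming a fixed point correspondence (how per-point losses are aggregated across points and instances, e.g. sum or mean, is left open by the text):

```python
import torch

def point_to_point_l1(pred_points: torch.Tensor, label_points: torch.Tensor) -> torch.Tensor:
    """pred_points, label_points: (num_points, 2) tensors with matched ordering."""
    # Absolute coordinate difference of each matched pair is the per-point loss.
    per_point = (pred_points - label_points).abs().sum(dim=-1)
    # Sum the per-point losses into the instance-level point-to-point loss.
    return per_point.sum()
```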
In an optional example, step 2022 may be executed by the processor calling corresponding instructions stored in the memory, or by the second processing unit run by the processor.
Step 2023: Determine the second loss based on the training instance point set, the first label data, and the direction loss function.
The direction loss function is used to determine the loss of the direction between points in the training instance point set relative to the direction between points in the ordered point sets of the instances in the first label data; this loss can be determined based on the cosine similarity of the direction vectors between two points.
Exemplarily, for two adjacent points in any instance in the training instance point set, a first direction vector of the two adjacent points can be determined based on their coordinate values, a second direction vector can be determined based on the coordinate-value labels of the two corresponding points in the first label data, and the cosine similarity of the two direction vectors can be determined from the first direction vector and the second direction vector. The network-level direction loss is determined as the second loss based on the cosine similarities of pairs of adjacent points in each instance; the specifics can be set according to actual needs.
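A sketch of this direction loss for one instance; turning cosine similarity into a loss as (1 - cosine) is an assumption here, since the text only specifies that the loss is based on the cosine similarity of adjacent-point direction vectors:

```python
import torch
import torch.nn.functional as F

def direction_loss(pred_points: torch.Tensor, label_points: torch.Tensor) -> torch.Tensor:
    """pred_points, label_points: (num_points, 2) tensors with matched ordering."""
    pred_vecs = pred_points[1:] - pred_points[:-1]     # first direction vectors
    label_vecs = label_points[1:] - label_points[:-1]  # second direction vectors
    cos_sim = F.cosine_similarity(pred_vecs, label_vecs, dim=-1, eps=1e-6)
    return (1.0 - cos_sim).mean()  # high similarity -> low loss
```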
Steps 2022 and 2023 may be performed in either order.
In an optional example, step 2023 may be executed by the processor calling corresponding instructions stored in the memory, or by the third processing unit run by the processor.
Step 2024: Adjust the network parameters of the target detection network based on the first loss and the second loss until the first loss and the second loss meet the preset conditions, obtaining the target detection model.
The first loss and the second loss can be combined by a weighted sum with preset weights into a comprehensive loss used for adjusting the network parameters. The preset conditions can be set according to actual needs. The adjustment of network parameters can be implemented with any implementable optimizer, such as the Adam optimizer, which can be set according to actual needs. The Adam optimizer absorbs the advantages of the adaptive-learning-rate gradient descent algorithm (Adagrad) and the momentum gradient descent algorithm; it can both adapt to sparse gradients (as in natural language and computer vision problems) and alleviate the problem of gradient oscillation.
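A hedged sketch of one parameter-update step with the weighted comprehensive loss and the Adam optimizer; the weight values and learning rate are placeholders, not values from the disclosure:

```python
import torch

model = torch.nn.Linear(256, 40)  # stand-in for the target detection network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(first_loss: torch.Tensor, second_loss: torch.Tensor,
                  w1: float = 1.0, w2: float = 0.1) -> float:
    # Weighted sum of the point-to-point loss and the direction loss.
    comprehensive = w1 * first_loss + w2 * second_loss
    optimizer.zero_grad()
    comprehensive.backward()  # losses must be computed from the model's outputs
    optimizer.step()
    return comprehensive.item()
```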
In an optional example, step 2024 may be executed by the processor calling corresponding instructions stored in the memory, or by the fourth processing unit run by the processor.
Figure 5 is a schematic flowchart of step 202 provided by another exemplary embodiment of the present disclosure.
In an optional example, determining the first loss in step 2022 based on the training instance point set, the first label data, and the point-to-point loss function includes:
Step 20221: For each instance, based on the ordered point set corresponding to the instance in the first label data, determine, under each of the different orders of the ordered point set, the correspondence between each point in the ordered point set and the points of that instance in the training instance point set, obtaining the point-to-point relationship corresponding to each order.
The different orders of an ordered point set refer to the orders obtained by taking different endpoints of the ordered point set as the starting point. For example, for the ordered point set of a line segment such as a lane line, including three ordered coordinate points A1, A2, and A3 with two endpoints A1 and A3, there are two different orders: A1-A2-A3 and A3-A2-A1; in different orders, the adjacency relationship between any two coordinate points is unchanged. As another example, for the ordered point set of a polygon such as a drivable area or a zebra crossing, including ordered coordinate points B1-B5, B5 can be equal to B1 to indicate a polygon, or the fact that the ordered point set corresponds to a polygon can be indicated by another symbol; when fitting, the points need to be connected head-to-tail to form a closed loop. The specifics can be set according to actual needs, as long as polygons can be distinguished from the ordered point sets of line segments; this also provides a basis for determining the point-to-point relationships of different orders. For the ordered point set B1-B5, taking B5 not equal to B1 as an example, since every coordinate point can be a vertex of the polygon and can serve as the starting point, the ordered point set has 5 starting points; combined with direction, the ordered point set can correspond to 10 orders, including B1-B5 and its reverse B5-B1, B2-B3-B4-B5-B1 and its reverse, B3-B4-B5-B1-B2 and its reverse, B4-B5-B1-B2-B3 and its reverse, and B5-B1-B2-B3-B4 and its reverse. By adjusting the order of the coordinate points in the training instance point set or in the ordered point set of the first label data, the point-to-point relationships corresponding to the different orders between the corresponding instances of the two are determined. For example, if the ordered point set of an instance in the training instance point set is C1-C5 and the point-set labels of that instance in the first label data are D1-D5, then after D1-D5 are arranged in the 10 different orders of B1-B5 above and matched to C1-C5 respectively, point-to-point relationships corresponding to the different orders are formed; for example, C1-C5 may correspond in order to D5-D1. The specific principles will not be repeated one by one.
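A sketch of enumerating these candidate orders; the polyline/polygon distinction is passed in explicitly here for illustration:

```python
def candidate_orders(points, closed: bool):
    """Open polylines have 2 candidate orders; closed polygons with n vertices
    have 2 * n (any vertex as starting point, in either direction)."""
    n = len(points)
    if not closed:
        return [list(points), list(reversed(points))]
    orders = []
    for start in range(n):
        forward = [points[(start + i) % n] for i in range(n)]
        orders.append(forward)
        orders.append(list(reversed(forward)))
    return orders

# A 5-vertex polygon such as B1-B5 yields 10 candidate orders; the order with
# the smallest point-to-point loss against the prediction is the target order.
assert len(candidate_orders([(0, 0), (1, 0), (1, 1), (0, 2), (-1, 1)], closed=True)) == 10
```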
In an optional example, step 20221 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a first determination subunit run by the processor.
Step 20222: determine the point-to-point loss corresponding to each order based on the point-to-point relationship corresponding to that order.
Here, after the point-to-point relationship corresponding to each order has been determined, the point-to-point loss of each order can be obtained based on the point-to-point loss function.
In an optional example, step 20222 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a second determination subunit run by the processor.
Step 20223: take the order with the smallest point-to-point loss as the target order of the instance.
Here, after the point-to-point loss corresponding to each order has been determined, the smallest point-to-point loss and the order that yields it can be identified; that order is taken as the target order of the instance's ordered point set and is used in the subsequent determination of the overall point-to-point loss of the network.
In an optional example, step 20223 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third determination subunit run by the processor.
Step 20224: take the point-to-point loss corresponding to the target order as the target point-to-point loss of the instance.
Since the training instance point set includes the ordered point sets of the first number of instances, when the first number is greater than one, a corresponding target order and target point-to-point loss can be determined for each instance.
In an optional example, step 20224 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a fourth determination subunit run by the processor.
Step 20225: determine the first loss based on the target point-to-point loss of each instance.
Specifically, the target point-to-point losses of the instances can be combined to determine the first loss of the network as a whole, as sketched below.
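As a rough sketch of steps 20221 to 20225, the snippet below assumes an L1 point-to-point loss and averages the per-instance target losses; the disclosure does not fix the loss function or the way the per-instance losses are combined, so both choices here are assumptions.

```python
import numpy as np

def target_point_to_point_loss(pred, label_orders):
    # pred: (M, 2) predicted ordered point set of one instance.
    # label_orders: list of (M, 2) label arrays, one per candidate order.
    # Returns the smallest point-to-point loss over all candidate orders
    # (the target point-to-point loss) and the index of the target order.
    losses = [np.abs(pred - order).sum() for order in label_orders]
    best = int(np.argmin(losses))
    return losses[best], best

def first_loss(preds, labels_orders):
    # One possible way of combining the per-instance target losses into
    # the network-level first loss: a simple average.
    per_instance = [target_point_to_point_loss(p, o)[0]
                    for p, o in zip(preds, labels_orders)]
    return sum(per_instance) / len(per_instance)
```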
In an optional example, step 20225 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a fifth determination subunit run by the processor.
In the embodiments of the present disclosure, by considering all possible orders of an ordered point set during training, the smallest point-to-point loss can be identified and used in determining the overall point-to-point loss of the network, so that the target detection network can model the optimal starting point and the corresponding order of each instance, which helps to further improve model performance and the accuracy of the prediction results.
In an optional example, determining the second loss in step 2023 based on the training instance point set, the first label data and the direction loss function includes:
Step 20231: determine the second loss based on the training instance point set, the first label data, the target order corresponding to each instance, and the direction loss function.
Here, since the second loss is a direction loss, it involves the directionality between adjacent points. Therefore, when the point-to-point loss uses the point-to-point relationship of the target order, the direction loss can likewise be determined from the direction vectors of adjacent coordinate points under that point-to-point relationship, so as to ensure consistency between the predicted point-to-point directions and the label directions.
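One way to realise such a direction loss is sketched below, using the direction vectors between adjacent points under the target order and penalising their cosine dissimilarity; the cosine formulation is an assumption, since the disclosure does not fix the concrete direction loss function.

```python
import numpy as np

def direction_loss(pred, label):
    # pred, label: (M, 2) point sets already aligned under the target order.
    pred_dir = np.diff(pred, axis=0)    # direction vectors between adjacent points
    label_dir = np.diff(label, axis=0)
    # Assumed formulation: one minus the cosine similarity of predicted and
    # labelled direction vectors, averaged over all adjacent-point segments.
    cos = (pred_dir * label_dir).sum(axis=1) / (
        np.linalg.norm(pred_dir, axis=1) * np.linalg.norm(label_dir, axis=1) + 1e-8)
    return float((1.0 - cos).mean())
```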
In an optional example, step 20231 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third processing unit run by the processor.
Figure 6 is a schematic flowchart of step 2021 provided by an exemplary embodiment of the present disclosure.
In an optional example, the training input data may further include initial query features and initial reference points; the initial query features include the target number of initial features for each of the first number of instances, and the initial reference points may include a reference coordinate point for each initial feature. The target detection network is a detection network based on a deformable detection transformer. Accordingly, obtaining the training instance point set in step 2021 based on the training input data and the target detection network includes:
Step 20211: perform feature extraction on the training image data based on a first feature extraction network in the target detection network to obtain first training image features.
Here, the deformable detection transformer (Deformable DETR) is a detection network obtained by improving DETR: it processes features with a multi-scale deformable attention module in place of the attention module in DETR, which helps to address problems such as DETR's high computational complexity and slow convergence. DETR is an end-to-end object detector that combines a convolutional neural network (CNN) with a Transformer and can perform object detection based on the Transformer's powerful modelling capability. The initial query features (queries) may be randomly initialised; they include the target number of initial features for each of the first number of instances, that is, each instance may correspond to the target number of initial features. The target number of initial features may be the same or different for different instances. The initial query features are used in the attention operations of the decoders in the target detection network. The initial reference points may be randomly initialised sets of initial reference coordinate points for the respective instances. The first feature extraction network may be any implementable feature extraction network, for example a convolutional neural network, and may be set according to actual needs.
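The shapes involved can be illustrated as follows; the concrete sizes (50 instances, 20 points per instance, 256-dimensional features) are assumed values for illustration only.

```python
import numpy as np

num_instances = 50   # first number of instances (assumed value)
pts_per_inst = 20    # target number of points per instance (assumed value)
feat_dim = 256       # query feature dimension (assumed value)

# Randomly initialised query features: one feature per point of each instance.
init_queries = np.random.randn(num_instances, pts_per_inst, feat_dim)
# Randomly initialised reference points: one 2-D coordinate per query feature.
init_ref_points = np.random.rand(num_instances, pts_per_inst, 2)
```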
In an optional example, step 20211 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a first feature extraction subunit run by the processor.
Step 20212: perform feature extraction on the training point cloud data based on a second feature extraction network in the target detection network to obtain first training point cloud features.
Here, the second feature extraction network may be any implementable feature extraction network, for example a convolutional neural network, and may be set according to actual needs.
Steps 20211 and 20212 may be performed in any order.
In an optional example, step 20212 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a second feature extraction subunit run by the processor.
Step 20213: encode the first training image features and/or the first training point cloud features based on an encoder network in the target detection network to obtain a target training feature map in the first coordinate system.
Here, the encoder network may include at least one encoder; it can convert the first training image features and the first training point cloud features into the first coordinate system through encoding to obtain the corresponding target training feature map.
In an optional example, step 20213 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by an encoding subunit run by the processor.
Step 20214: obtain a training decoding result based on the target training feature map, the initial query features, the initial reference points and a decoder network in the target detection network, the decoder network including at least one decoder.
Here, the training decoding result may be the result obtained after decoding by the at least one decoder. The decoder network can continually update the initial query features based on the initial reference points and the target training feature map to obtain the training decoding result.
In an optional example, step 20214 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a decoding subunit run by the processor.
Step 20215: determine the training instance point set based on the training decoding result.
Here, the training instance point set can be obtained by continually updating the initial reference points with the decoding result of each decoder: after a decoder produces its decoding result, the offset of each reference point can be predicted based on that result. Taking the first decoder as an example, its decoding result is used to predict the offset of each initial reference point; each offset is added to the corresponding initial reference point, and the updated reference points are taken as the output reference points of the first decoder. After the second decoder decodes, the offsets of the output reference points of the first decoder are predicted based on the second decoder's decoding result and added to those output reference points to obtain the output reference points of the second decoder, and so on; the output reference points of the last decoder can be taken as the training instance point set.
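The per-decoder refinement described above can be sketched as follows; decoder and offset_head stand in for a decoder layer and its offset prediction network, and the dummy callables at the end exist only to make the sketch executable.

```python
import numpy as np

def refine_reference_points(decoders, offset_heads, feature_map, queries, ref_points):
    # Each decoder updates the queries; its offset head predicts per-point
    # offsets that are added to that decoder's input reference points.
    for decoder, offset_head in zip(decoders, offset_heads):
        queries = decoder(queries, ref_points, feature_map)   # decoding result
        ref_points = ref_points + offset_head(queries)        # output reference points
    return ref_points  # last decoder's outputs = training instance point set

# Dummy stand-ins, only to make the sketch runnable end to end.
dec = lambda q, r, f: q * 0.5
off = lambda q: q[..., :2] * 0.01
pts = refine_reference_points([dec, dec], [off, off], feature_map=None,
                              queries=np.random.randn(50, 20, 256),
                              ref_points=np.random.rand(50, 20, 2))
print(pts.shape)  # (50, 20, 2)
```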
As an example, Figure 7 is a schematic structural diagram of the decoder network provided by an exemplary embodiment of the present disclosure. As shown in Figure 7, the decoder network may include N decoders, and the initial query features may include 3 initial features for each of two instances, instance 1 and instance 2, that is, each instance corresponds to 3 initial features; the 3 grey blocks of instance 1 represent its 3 initial features, and the 3 black blocks of instance 2 represent its 3 initial features. The initial reference points may include the reference coordinate points of the two instances, one reference coordinate point per initial feature; taking instance 1 as an example, its 3 initial features correspond to 3 reference coordinate points (see the 3 grey dots of instance 1 in Figure 7). The training decoding result is obtained after decoding by the N decoders, and the training instance point set is then obtained based on the training decoding result.
As an example, Figure 8 is a schematic diagram of the principle of Deformable DETR provided by an exemplary embodiment of the present disclosure. As shown in Figure 8, taking the first decoder as an example, Query Feature denotes the initial query features, Reference Point denotes the initial reference points, and Input Feature Map denotes the target training feature map; at each step, the deformable attention module of the decoder may attend only to a small region around the reference point, regardless of the resolution of the whole feature map. Head denotes an attention head, m (m = 1, 2, 3) denotes the m-th attention head, Attention Weights (Amqk) denotes the attention weights of the m-th attention head, W′mx denotes the encoding of the key vector (Key) in the attention operation of the m-th attention head, Linear denotes a linear layer, Aggregate denotes the aggregation of the attention weights with the key points in the value vectors (Values), Softmax denotes the activation function, and Output denotes the decoding result. Compared with DETR, Deformable DETR may collect only the main feature points near the reference point, so each query vector (Query) may have only a very small number of key vectors (Key). The initial query features are passed through a linear layer to predict the sampling offsets Δpmqk; in this example, 3 offsets, represented by 3 arrows, are predicted for each attention head (for example Head1, i.e. m = 1 in Δpmqk). A sampling offset is the position offset, relative to the initial reference point, of a key point sampled from the value vector. The Input Feature Map is passed through a linear layer to obtain the value vectors (Values), one per attention head, and the sampling offsets are used to extract sparse values (the key points in Values) near the initial reference point from the value vectors. The Query Feature is passed through a linear layer and Softmax to obtain the attention weights Attention Weights (Amqk), which are then aggregated with the sparse values: for example, the 3 weights of Head1 in Attention Weights (Amqk), 0.5, 0.3 and 0.2, are used to compute a weighted sum of the values of the 3 key points extracted from Head1's value vector (the 3 stacked grey blocks), yielding the attention result of Head1; the attention results of the other heads (Aggregated Sampled Values) are obtained in the same way. The Aggregated Sampled Values are passed through a linear layer to obtain the decoding result (Output); alternatively, the Output may be added to the Query Feature through a residual connection and the sum taken as the decoding result, which may be set according to actual needs. The detailed principles of Deformable DETR are not repeated here.
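A drastically simplified single-head version of this sampling-and-aggregation step is sketched below; it uses nearest-neighbour sampling instead of the bilinear interpolation used in practice, and all shapes and weight matrices are illustrative assumptions rather than the actual Deformable DETR implementation.

```python
import numpy as np

def deformable_attention_1head(query, ref_point, value_map, w_off, w_attn):
    # query: (C,) query feature; ref_point: (2,) reference point in [0, 1].
    # value_map: (H, W, C) value features; w_off: (C, K*2) offset weights;
    # w_attn: (C, K) attention-logit weights, for K sampled key points.
    H, W, _ = value_map.shape
    K = w_attn.shape[1]
    offsets = (query @ w_off).reshape(K, 2)          # K sampling offsets
    logits = query @ w_attn
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax attention weights
    out = 0.0
    for k in range(K):
        # Nearest-neighbour sampling near the reference point (for brevity).
        x = int(np.clip((ref_point[0] + offsets[k, 0]) * (W - 1), 0, W - 1))
        y = int(np.clip((ref_point[1] + offsets[k, 1]) * (H - 1), 0, H - 1))
        out = out + weights[k] * value_map[y, x]
    return out  # aggregated sampled values of this head

C, H, W, K = 8, 16, 16, 3
rng = np.random.default_rng(0)
out = deformable_attention_1head(rng.normal(size=C), np.array([0.5, 0.5]),
                                 rng.normal(size=(H, W, C)),
                                 rng.normal(size=(C, K * 2)) * 0.1,
                                 rng.normal(size=(C, K)))
print(out.shape)  # (8,)
```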
In an optional example, step 20215 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a first processing subunit run by the processor.
In the embodiments of the present disclosure, instance prediction at the coordinate-point level is achieved with a target detection model based on Deformable DETR. Compared with prediction using segmentation combined with post-processing, or with autoregressive prediction, this helps to improve prediction accuracy; moreover, because the Deformable DETR-based target detection model uses deformable attention, the attention operation may collect only the main feature points near the reference points, which helps to reduce the amount of computation and thus to improve prediction speed.
In an optional example, obtaining the training decoding result in step 20214 based on the target training feature map, the initial query features, the initial reference points and the decoder network in the target detection network includes: for each decoder in the decoder network, obtaining the decoding result of that decoder based on the target training feature map and the input query features and input reference points of that decoder, where the input query features and input reference points of the first decoder are the initial query features and the initial reference points respectively, the input query features of any decoder other than the first are the decoding result of the preceding decoder, and the input reference points of that decoder are the output reference points determined based on the decoding result of the preceding decoder; and taking the decoding result of the last decoder as the training decoding result.
In an optional example, after obtaining, for each decoder in the decoder network, the decoding result of that decoder based on the target training feature map and the input query features and input reference points of that decoder, the method further includes: determining a first offset for that decoder based on its decoding result and an offset prediction network corresponding to that decoder; and determining the output reference points of that decoder based on the first offset and its input reference points. Accordingly, determining the training instance point set based on the training decoding result includes: taking the output reference points of the last decoder, determined based on the training decoding result, as the training instance point set.
As an example, Figure 9 is a schematic diagram of the principle of determining the training instance point set provided by another exemplary embodiment of the present disclosure. As shown in Figure 9, each decoder may have a corresponding offset prediction network, which predicts the first offsets of the reference points based on that decoder's decoding result; the offsets are added to the decoder's input reference points. The input reference points of decoder 1 are the initial reference points, and the input reference points of decoder i (i = 2, 3, ..., N) are the output reference points of decoder i-1. The reference points are continually refined through training to obtain an accurate instance point set.
Figure 10 is a schematic flowchart of step 202 provided by yet another exemplary embodiment of the present disclosure.
In an optional example, the first label data further includes a type label for each instance in the training input data. After obtaining the training decoding result in step 20214 based on the target training feature map, the initial query features, the initial reference points and the decoder network in the target detection network, the method further includes:
Step 20216: determine a training type result based on the training decoding result, the training type result including the predicted type of each instance.
Here, the type label of an instance may be the true type of the instance obtained by prior annotation, for example lane line, curb, zebra crossing, arrow or drivable area. The predicted type of an instance is the element type to which the target detection network predicts the instance belongs; the element types may include lane lines, curbs, zebra crossings, arrows, drivable areas and so on. For example, an instance may be predicted to be a lane line.
As an example, Figure 11 is a schematic diagram of a prediction network for the predicted type provided by an exemplary embodiment of the present disclosure. Decoder N decodes to obtain the training decoding result, and the training type result is predicted by a type prediction network. The type prediction network may be a prediction network based on a feedforward neural network and may be set according to actual needs. The type prediction network, the offset prediction networks and the reference point update networks may be collectively referred to as the prediction head network.
In an optional example, step 20216 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a fifth processing unit run by the processor.
Step 20217: determine a type loss based on the training type result and the type labels in the first label data.
Here, the type loss may be determined based on a preset type loss function; the type loss function may be any implementable loss function, for example the focal loss function, and may be chosen according to actual needs.
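For reference, a standard binary focal loss can be sketched as follows; this is the commonly used formulation of focal loss, and the disclosure only names focal loss as one possible choice.

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0):
    # probs: predicted probabilities in (0, 1); targets: 0/1 labels of the
    # same shape. Standard focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return float((-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-8)).mean())
```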
In an optional example, step 20217 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a sixth processing unit run by the processor.
Accordingly, adjusting the network parameters of the target detection network in step 2024 based on the first loss and the second loss until the first loss and the second loss meet the preset conditions, to obtain the target detection model, includes:
Step 20241: determine a comprehensive loss based on the first loss, the second loss, the type loss and preset weights.
Here, the preset weights may be set according to actual needs. For example, if the weights of the first loss l1, the second loss l2 and the type loss l3 are set to λ1, λ2 and λ3 respectively, the comprehensive loss can be expressed as:
L = λ1·l1 + λ2·l2 + λ3·l3
For example, λ1, λ2 and λ3 may be set to 5, 0.1 and 2 respectively, without limitation.
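The weighted combination can be written directly from the formula above; the default weights below are the example values 5, 0.1 and 2.

```python
def comprehensive_loss(l1, l2, l3, lambdas=(5.0, 0.1, 2.0)):
    # L = λ1·l1 + λ2·l2 + λ3·l3, with the example weights given above.
    return lambdas[0] * l1 + lambdas[1] * l2 + lambdas[2] * l3
```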
In an optional example, step 20241 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third processing subunit run by the processor.
Step 20242: adjust the network parameters of the target detection network based on the comprehensive loss until the comprehensive loss meets the preset condition, to obtain the target detection model.
In an optional example, step 20242 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a fourth processing subunit run by the processor.
In the embodiments of the present disclosure, adjusting the network parameters by further combining the type loss with the point-to-point loss and the direction loss helps to further improve the performance of the target detection model and the accuracy of its prediction results.
In an optional example, Figure 12 is a schematic flowchart of step 2024 provided by an exemplary embodiment of the present disclosure. In this example, adjusting the network parameters of the target detection network in step 2024 based on the first loss and the second loss until the first loss and the second loss meet the preset conditions, to obtain the target detection model, includes:
Step 20241a: determine a comprehensive loss based on the first loss and the second loss.
Here, the first loss and the second loss may be summed with certain proportional weights to obtain the comprehensive loss; the principle is as described above and is not repeated here.
In an optional example, step 20241a may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a fourth processing unit 5024 run by the processor.
Step 20242a: adjust the network parameters of the target detection network based on the comprehensive loss until the comprehensive loss meets the preset condition, to obtain the target detection model.
The specific principle of network parameter adjustment is as described above and is not repeated here.
In an optional example, step 20242a may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the fourth processing unit 5024 run by the processor.
In the training method of the target detection model of the embodiments of the present disclosure, the model is trained with a hierarchical prediction scheme over instances and their corresponding ordered point sets, combining the point-to-point loss, the direction loss and the type loss, so that the obtained target detection model can predict the ordered point sets of instances more accurately, which helps to further improve prediction accuracy. Moreover, combined with the deformable DETR network, the attention operations of the target detection model during inference may focus only on feature interactions among the points neighbouring the reference points, which helps to reduce computational complexity, thereby reducing the amount of computation and improving prediction efficiency. In addition, the query vectors of the target detection model of the embodiments of the present disclosure are at the point level, which is more flexible than the instance-box level.
The above embodiments and optional examples of the present disclosure may be implemented individually or combined in any manner where no conflict arises, as may be set according to actual needs; the embodiments of the present disclosure are not limited in this respect.
Figure 13 is a schematic flowchart of a map generation method provided by an exemplary embodiment of the present disclosure. This embodiment may be applied to an electronic device, for example an on-board computing platform. As shown in Figure 13, the method includes the following steps:
Step 301: acquire first image data and/or first point cloud data of at least one viewing angle.
Here, the first image data may be the image data of the current frame collected in real time by at least one camera mounted on the vehicle while the vehicle is driving, and the first point cloud data may be the point cloud data of the current frame collected in real time by a radar mounted on the vehicle while the vehicle is driving.
In an optional example, step 301 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a second acquisition module run by the processor.
Step 302: based on the first image data and/or the first point cloud data, obtain a target instance ordered point set using a target detection model obtained by pre-training.
Here, the target detection model is obtained by the training method of the target detection model provided in any of the above embodiments or optional examples; the target instance ordered point set includes the ordered point sets of the first number of instances, each ordered point set including the target number of coordinate points in the first coordinate system.
Here, the specific input data required by the target detection model may be set, and the model trained, according to actual needs: the model may support image data, point cloud data, or both; see the foregoing embodiments for details. The specific inference principle of the target detection model is as described in the foregoing embodiments and is not repeated here.
In an optional example, step 302 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a second processing module run by the processor.
Step 303: generate a map based on the target instance ordered point set.
Here, the target instance ordered point set is a set of coordinate points in the first coordinate system (for example, the coordinate system of the bird's-eye view). By fitting the ordered point set of each instance in the target instance point set, the corresponding map elements, such as lane lines, zebra crossings and curbs, can be obtained; the fitting results of the instances can serve as a local road map around the vehicle's current position.
In practical applications, the target instance ordered point set may also be converted into a global coordinate system through coordinate transformation, so that a global road map can be generated by region growing; this may be set according to actual needs. The global coordinate system may be, for example, the world coordinate system or a relatively stable coordinate system rigidly connected to the world coordinate system, for example a preset coordinate system whose origin is the vehicle's starting position.
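The coordinate transformation mentioned here can be sketched as follows, assuming the ego-to-global pose is available as a 3x3 homogeneous transform; the helper name to_global and the pose matrix are hypothetical.

```python
import numpy as np

def to_global(points_ego, ego_to_global):
    # points_ego: (M, 2) instance points in the first (e.g. bird's-eye-view)
    # coordinate system; ego_to_global: 3x3 homogeneous ego-to-global pose.
    homo = np.hstack([points_ego, np.ones((len(points_ego), 1))])
    return (homo @ ego_to_global.T)[:, :2]

T = np.array([[1.0, 0.0, 100.0],
              [0.0, 1.0, 200.0],
              [0.0, 0.0, 1.0]])  # hypothetical pose: pure translation
print(to_global(np.array([[1.0, 2.0], [3.0, 4.0]]), T))  # [[101. 202.] [103. 204.]]
```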
In an optional example, step 303 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third processing module run by the processor.
The map generation method of the embodiments of the present disclosure achieves prediction of map instances at the coordinate-point level based on the target detection model; compared with prediction at the map-instance-box level, the method of the embodiments of the present disclosure can help to improve the accuracy of the map.
Any method provided by the embodiments of the present disclosure (including the training method of the target detection model and the map generation method) may be executed by any appropriate device with data processing capability, including but not limited to a terminal device or a server. Alternatively, any method provided by the embodiments of the present disclosure may be executed by a processor; for example, the processor executes any method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. This is not repeated below.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Exemplary apparatus
Figure 14 is a schematic structural diagram of a training apparatus for a target detection model provided by an exemplary embodiment of the present disclosure. The apparatus of this embodiment may be used to implement the corresponding training method embodiments of the target detection model of the present disclosure. As shown in Figure 14, the apparatus includes a first acquisition module 501 and a first processing module 502.
The first acquisition module 501 is configured to acquire training input data and corresponding first label data, the training input data including training image data and/or training point cloud data, the first label data including the ordered point sets of a first number of instances in the training input data, each ordered point set including a target number of coordinate points in a first coordinate system. The first processing module 502 is configured to train a pre-established target detection network based on the training input data acquired by the first acquisition module 501, the first label data, a point-to-point loss function and a direction loss function, to obtain a target detection model; the point-to-point loss function is used to determine the point-to-point loss of the training instance point set output by the target detection network relative to the ordered point sets of the instances in the first label data, and the direction loss function is used to determine the loss of the point-to-point directions in the training instance point set relative to the point-to-point directions of the ordered point sets of the instances in the first label data.
In an optional example, Figure 15 is a schematic structural diagram of the first processing module 502 provided by an exemplary embodiment of the present disclosure. In this example, the first processing module 502 includes a first processing unit 5021, a second processing unit 5022, a third processing unit 5023 and a fourth processing unit 5024.
The first processing unit 5021 is configured to obtain the training instance point set based on the training input data and the target detection network; the second processing unit 5022 is configured to determine the first loss based on the training instance point set obtained by the first processing unit 5021, the first label data and the point-to-point loss function; the third processing unit 5023 is configured to determine the second loss based on the training instance point set obtained by the first processing unit 5021, the first label data and the direction loss function; and the fourth processing unit 5024 is configured to adjust the network parameters of the target detection network based on the first loss and the second loss until the first loss and the second loss meet the preset conditions, to obtain the target detection model.
In an optional example, Figure 16 is a schematic structural diagram of the second processing unit 5022 provided by an exemplary embodiment of the present disclosure. In this example, the second processing unit 5022 includes a first determination subunit 50221, a second determination subunit 50222, a third determination subunit 50223, a fourth determination subunit 50224 and a fifth determination subunit 50225.
The first determination subunit 50221 is configured to, for each instance, based on the ordered point set of the instance in the first label data and for each different order of that ordered point set, determine the correspondence between the points of the ordered point set and the points of the instance in the training instance point set, to obtain the point-to-point relationship of each order; the second determination subunit 50222 is configured to determine the point-to-point loss of each order based on the point-to-point relationship of that order; the third determination subunit 50223 is configured to take the order with the smallest point-to-point loss as the target order of the instance; the fourth determination subunit 50224 is configured to take the point-to-point loss of the target order as the target point-to-point loss of the instance; and the fifth determination subunit 50225 is configured to determine the first loss based on the target point-to-point loss of each instance.
In an optional example, the third processing unit 5023 is specifically configured to determine the second loss based on the training instance point set, the first label data, the target order of each instance and the direction loss function.
Figure 17 is a schematic structural diagram of the first processing unit 5021 provided by an exemplary embodiment of the present disclosure.
In an optional example, the training input data further includes initial query features and initial reference points; the initial query features include the target number of initial features for each of the first number of instances, and the initial reference points include a reference coordinate point for each initial feature; the target detection network is a detection network based on a deformable detection transformer. The first processing unit 5021 includes a first feature extraction subunit 50211, a second feature extraction subunit 50212, an encoding subunit 50213, a decoding subunit 50214 and a first processing subunit 50215.
The first feature extraction subunit 50211 is configured to perform feature extraction on the training image data based on the first feature extraction network in the target detection network to obtain the first training image features; the second feature extraction subunit 50212 is configured to perform feature extraction on the training point cloud data based on the second feature extraction network in the target detection network to obtain the first training point cloud features; the encoding subunit 50213 is configured to encode the first training image features and/or the first training point cloud features based on the encoder network in the target detection network to obtain the target training feature map in the first coordinate system; the decoding subunit 50214 is configured to obtain the training decoding result based on the target training feature map, the initial query features, the initial reference points and the decoder network in the target detection network, the decoder network including at least one decoder; and the first processing subunit 50215 is configured to determine the training instance point set based on the training decoding result.
In an optional example, the decoding subunit 50214 is specifically configured to: for each decoder in the decoder network, obtain the decoding result of that decoder based on the target training feature map and the input query features and input reference points of that decoder, where the input query features and input reference points of the first decoder are the initial query features and the initial reference points respectively, the input query features of any decoder other than the first are the decoding result of the preceding decoder, and the input reference points of that decoder are the output reference points determined based on the decoding result of the preceding decoder; and take the decoding result of the last decoder as the training decoding result.
In an optional example, the first processing unit 5021 further includes an offset prediction subunit 50216 and a second processing subunit 50217.
The offset prediction subunit 50216 is configured to determine the first offset of a decoder based on that decoder's decoding result and the offset prediction network corresponding to that decoder; the second processing subunit 50217 is configured to determine the output reference points of that decoder based on the first offset and its input reference points. Accordingly, the first processing subunit 50215 is specifically configured to take the output reference points of the last decoder, determined based on the training decoding result, as the training instance point set.
Figure 18 is a schematic structural diagram of the first processing module 502 provided by another exemplary embodiment of the present disclosure.
In an optional example, the first label data further includes a type label for each instance in the training input data, and the first processing module 502 further includes:
a fifth processing unit 5025 configured to determine the training type result based on the training decoding result, the training type result including the predicted type of each instance; and a sixth processing unit 5026 configured to determine the type loss based on the training type result and the type labels in the first label data. Accordingly, the fourth processing unit 5024 includes: a third processing subunit 50241 configured to determine the comprehensive loss based on the first loss, the second loss, the type loss and the preset weights; and a fourth processing subunit 50242 configured to adjust the network parameters of the target detection network based on the comprehensive loss until the comprehensive loss meets the preset condition, to obtain the target detection model.
In an optional example, the fourth processing unit 5024 is specifically configured to: determine the comprehensive loss based on the first loss and the second loss; and adjust the network parameters of the target detection network based on the comprehensive loss until the comprehensive loss meets the preset condition, to obtain the target detection model.
For the beneficial technical effects of the exemplary embodiments of this apparatus, refer to the corresponding beneficial technical effects in the exemplary method section above; they are not repeated here.
Figure 19 is a schematic structural diagram of a map generation apparatus provided by an exemplary embodiment of the present disclosure. The apparatus of this embodiment may be used to implement the corresponding map generation method embodiments of the present disclosure. As shown in Figure 19, the apparatus includes a second acquisition module 601, a second processing module 602 and a third processing module 603.
The second acquisition module 601 is configured to acquire first image data and/or first point cloud data of at least one viewing angle; the second processing module 602 is configured to obtain a target instance ordered point set based on the first image data and/or the first point cloud data acquired by the second acquisition module 601, using a target detection model obtained by pre-training, the target detection model being obtained by the training method of the target detection model of any of the above embodiments or optional examples, the target instance ordered point set including the ordered point sets of a first number of instances, each ordered point set including a target number of coordinate points in the first coordinate system; and the third processing module 603 is configured to generate a map based on the target instance ordered point set obtained by the second processing module 602.
For the beneficial technical effects of the exemplary embodiments of this apparatus, refer to the corresponding beneficial technical effects in the exemplary method section above; they are not repeated here.
Exemplary electronic device
Figure 20 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure. In this embodiment, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 11 may run the program instructions to implement the methods of the various embodiments of the present disclosure described above and/or other desired functions.
In one example, the electronic device 10 may further include an input apparatus 13 and an output apparatus 14, these components being interconnected through a bus system and/or other forms of connection mechanisms (not shown).
In addition, the input apparatus 13 may include, for example, a keyboard, a mouse and the like.
The output apparatus 14 can output various information to the outside, including determined distance information, direction information and the like. The output apparatus 14 may include, for example, a display, a speaker, a printer, and a communication network with the remote output devices connected to it, among others.
Of course, for simplicity, Figure 20 shows only some of the components of the electronic device 10 that are relevant to the present disclosure, omitting components such as buses and input/output interfaces. In addition, the electronic device 10 may include any other appropriate components depending on the specific application.
Exemplary computer program product and computer-readable storage medium
In addition to the above methods and devices, embodiments of the present disclosure may also be a computer program product including computer program instructions that, when run by a processor, cause the processor to perform the steps of the methods according to the various embodiments of the present disclosure described in the "Exemplary methods" section of this specification.
The computer program product may have program code for performing the operations of the embodiments of the present disclosure written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
In addition, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when run by a processor, cause the processor to perform the steps of the methods according to the various embodiments of the present disclosure described in the "Exemplary methods" section of this specification.
The computer-readable storage medium may be any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, the merits, advantages, effects and the like mentioned in the present disclosure are merely examples and not limitations, and must not be regarded as essential to every embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for the purposes of illustration and ease of understanding, and are not limiting; they do not restrict the present disclosure to being implemented with those specific details.
Those skilled in the art may make various changes and modifications to the present disclosure without departing from the spirit and scope of the present application. Accordingly, if such modifications and variations fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is intended to encompass them as well.

Claims (15)

  1. A method for training a target detection model, comprising:
    acquiring training input data and corresponding first label data, the training input data comprising training image data and/or training point cloud data, the first label data comprising ordered point sets respectively corresponding to a first number of instances in the training input data, each ordered point set comprising a target number of coordinate points in a first coordinate system; and
    training a pre-established target detection network based on the training input data, the first label data, a point-to-point loss function and a direction loss function to obtain a target detection model, wherein the point-to-point loss function is used to determine a point-to-point loss of a training instance point set output by the target detection network relative to the ordered point sets of the instances in the first label data, and the direction loss function is used to determine a loss of point-to-point directions in the training instance point set relative to point-to-point directions of the ordered point sets of the instances in the first label data.
  2. 根据权利要求1所述的方法,其中,所述基于所述训练输入数据、所述第一标签数据、点对点损失函数及方向损失函数,对预先建立的目标检测网络进行训练,获得目标检测模型,包括:The method according to claim 1, wherein the pre-established target detection network is trained based on the training input data, the first label data, a point-to-point loss function and a direction loss function to obtain a target detection model, include:
    基于所述训练输入数据及所述目标检测网络,获得所述训练实例点集;Based on the training input data and the target detection network, obtain the training instance point set;
    基于所述训练实例点集、所述第一标签数据及所述点对点损失函数,确定第一损失;Determine a first loss based on the training instance point set, the first label data and the point-to-point loss function;
    基于所述训练实例点集、所述第一标签数据及所述方向损失函数,确定第二损失;Determine a second loss based on the training instance point set, the first label data and the direction loss function;
    基于所述第一损失和所述第二损失,对所述目标检测网络的网络参数进行调整,直至所述第一损失和所述第二损失满足预设条件,获得所述目标检测模型。Based on the first loss and the second loss, the network parameters of the target detection network are adjusted until the first loss and the second loss meet preset conditions, and the target detection model is obtained.
  3. 根据权利要求2所述的方法,其中,所述基于所述训练实例点集、所述第一标签数据及所述点对点损失函数,确定第一损失,包括:The method of claim 2, wherein determining the first loss based on the training instance point set, the first label data and the point-to-point loss function includes:
    对于每个所述实例,基于所述第一标签数据中该实例对应的有序点集,分别以该有序点集的不同顺序,确定该有序点集中各点与所述训练实例点集中该实例的点的对应关系,获得各顺序分别对应的点对点关系;For each instance, based on the ordered point set corresponding to the instance in the first label data, determine each point in the ordered point set and the training instance point set in different orders of the ordered point set. The corresponding relationship between the points of this instance is to obtain the point-to-point relationship corresponding to each sequence;
    基于各顺序分别对应的所述点对点关系,确定各顺序分别对应的点对点损失;Based on the point-to-point relationship corresponding to each sequence, determine the point-to-point loss corresponding to each sequence;
    将点对点损失最小的顺序作为该实例的目标顺序;The order with the smallest point-to-point loss is used as the target order of this instance;
    将所述目标顺序对应的点对点损失作为该实例的目标点对点损失;Use the point-to-point loss corresponding to the target sequence as the target point-to-point loss of this instance;
    基于各所述实例的所述目标点对点损失,确定所述第一损失。The first loss is determined based on the target point-to-point loss for each of the instances.
  4. 根据权利要求3所述的方法,其中,所述基于所述训练实例点集、所述第一标签数据及所述方向损失函数,确定第二损失,包括:The method of claim 3, wherein determining the second loss based on the training instance point set, the first label data and the direction loss function includes:
    基于所述训练实例点集、所述第一标签数据、各实例分别对应的目标顺序及所述方向损失函数,确定所述第二损失。The second loss is determined based on the training instance point set, the first label data, the target order corresponding to each instance, and the direction loss function.
  5. 根据权利要求2所述的方法,其中,所述训练输入数据还包括初始查询特征及初始参考点,所述初始查询特征包括所述第一数量的实例分别对应的目标数量的初始特征,所述初始参考点包括各所述初始特征分别对应的参考坐标点;所述目标检测网络为基于可变形检测变换器的检测网络;The method according to claim 2, wherein the training input data further includes initial query features and initial reference points, the initial query features include initial features of a target number corresponding to the first number of instances, and the The initial reference points include reference coordinate points corresponding to each of the initial features; the target detection network is a detection network based on a deformable detection transformer;
    所述基于所述训练输入数据及所述目标检测网络,获得所述训练实例点集,包括:Obtaining the training instance point set based on the training input data and the target detection network includes:
    基于所述目标检测网络中的第一特征提取网络对所述训练图像数据进行特征提取,获得第一训练图像特征;Perform feature extraction on the training image data based on the first feature extraction network in the target detection network to obtain first training image features;
    基于所述目标检测网络中的第二特征提取网络对所述训练点云数据进行特征提取,获得第一训练点云特征;Perform feature extraction on the training point cloud data based on the second feature extraction network in the target detection network to obtain first training point cloud features;
    基于所述目标检测网络中的编码器网络对所述第一训练图像特征和/或所述第一训练点云特征进行编码,获得第一坐标系下的目标训练特征图;Encode the first training image features and/or the first training point cloud features based on the encoder network in the target detection network to obtain a target training feature map in the first coordinate system;
    基于所述目标训练特征图、所述初始查询特征、所述初始参考点、及所述目标检测网络中的解码器网络,获得训练解码结果,所述解码器网络包括至少一个解码器;Obtain a training decoding result based on the target training feature map, the initial query feature, the initial reference point, and a decoder network in the target detection network, the decoder network including at least one decoder;
    基于所述训练解码结果,确定所述训练实例点集。 Based on the training decoding results, the training instance point set is determined.
  6. 根据权利要求5所述的方法,其中,所述基于所述目标训练特征图、所述初始查询特征、所述初始参考点、及所述目标检测网络中的解码器网络,获得训练解码结果,包括:The method of claim 5, wherein the training decoding result is obtained based on the target training feature map, the initial query feature, the initial reference point, and the decoder network in the target detection network, include:
    对于所述解码器网络中的每个所述解码器,基于所述目标训练特征图及该解码器对应的输入查询特征和输入参考点,获得该解码器的解码结果,其中,第一个解码器对应的输入查询特征和输入参考点分别为所述初始查询特征和所述初始参考点,除所述第一个解码器之外的任一其他解码器对应的输入查询特征为该其他解码器的前一解码器的解码结果,该其他解码器的输入参考点为基于前一解码器的解码结果确定的输出参考点;For each decoder in the decoder network, the decoding result of the decoder is obtained based on the target training feature map and the input query feature and input reference point corresponding to the decoder, where the first decoder The input query features and input reference points corresponding to the decoder are the initial query features and the initial reference point respectively, and the input query features corresponding to any other decoder except the first decoder are the other decoders. The decoding result of the previous decoder, the input reference point of the other decoder is the output reference point determined based on the decoding result of the previous decoder;
    将最后一个所述解码器的解码结果作为所述训练解码结果。The decoding result of the last decoder is used as the training decoding result.
  7. 根据权利要求6所述的方法,其中,在所述对于所述解码器网络中的每个所述解码器,基于所述目标训练特征图及该解码器对应的输入查询特征和输入参考点,获得该解码器的解码结果之后,还包括:The method of claim 6, wherein for each decoder in the decoder network, based on the target training feature map and the input query feature and input reference point corresponding to the decoder, After obtaining the decoding result of the decoder, it also includes:
    基于该解码器的解码结果、及该解码器对应的偏移量预测网络,确定该解码器对应的第一偏移量;Based on the decoding result of the decoder and the offset prediction network corresponding to the decoder, determine the first offset corresponding to the decoder;
    基于所述第一偏移量及该解码器对应的所述输入参考点,确定该解码器对应的输出参考点;Based on the first offset and the input reference point corresponding to the decoder, determine the output reference point corresponding to the decoder;
    所述基于所述训练解码结果,确定所述训练实例点集,包括:Determining the training instance point set based on the training decoding result includes:
    将基于所述训练解码结果确定的所述最后一个解码器对应的输出参考点作为所述训练实例点集。The output reference point corresponding to the last decoder determined based on the training decoding result is used as the training instance point set.
  8. 根据权利要求5所述的方法,其中,所述第一标签数据还包括所述训练输入数据中各所述实例分别对应的类型标签;The method according to claim 5, wherein the first label data further includes type labels corresponding to each of the instances in the training input data;
    在所述基于所述目标训练特征图、所述初始查询特征、所述初始参考点、及所述目标检测网络中的解码器网络,获得训练解码结果之后,还包括:After obtaining the training decoding result based on the target training feature map, the initial query feature, the initial reference point, and the decoder network in the target detection network, it also includes:
    基于所述训练解码结果,确定训练类型结果,所述训练类型结果包括各所述实例分别对应的预测类型;Based on the training decoding results, determine a training type result, where the training type result includes a prediction type corresponding to each of the instances;
    基于所述训练类型结果及所述第一标签数据中的类型标签,确定类型损失;Determine a type loss based on the training type result and the type label in the first label data;
    所述基于所述第一损失和所述第二损失,对所述目标检测网络的网络参数进行调整,直至所述第一损失和所述第二损失满足预设条件,获得所述目标检测模型,包括:Based on the first loss and the second loss, the network parameters of the target detection network are adjusted until the first loss and the second loss meet preset conditions, and the target detection model is obtained. ,include:
    基于所述第一损失、所述第二损失、所述类型损失及预设权重,确定综合损失;Determine comprehensive loss based on the first loss, the second loss, the type of loss and the preset weight;
    基于所述综合损失,对所述目标检测网络的网络参数进行调整,直至所述综合损失满足所述预设条件,获得所述目标检测模型。Based on the comprehensive loss, the network parameters of the target detection network are adjusted until the comprehensive loss meets the preset condition, and the target detection model is obtained.
  9. 根据权利要求2所述的方法,其中,所述基于所述第一损失和所述第二损失,对所述目标检测网络的网络参数进行调整,直至所述第一损失和所述第二损失满足预设条件,获得所述目标检测模型,包括:The method of claim 2, wherein based on the first loss and the second loss, network parameters of the target detection network are adjusted until the first loss and the second loss Satisfy the preset conditions and obtain the target detection model, including:
    基于所述第一损失和所述第二损失,确定综合损失;determining a comprehensive loss based on the first loss and the second loss;
    基于所述综合损失,对所述目标检测网络的网络参数进行调整,直至所述综合损失满足所述预设条件,获得所述目标检测模型。Based on the comprehensive loss, the network parameters of the target detection network are adjusted until the comprehensive loss meets the preset condition, and the target detection model is obtained.
  10. 一种地图的生成方法,包括:A map generation method, including:
    获取至少一个视角的第一图像数据和/或第一点云数据;Obtaining first image data and/or first point cloud data of at least one perspective;
    基于所述第一图像数据和/或所述第一点云数据,采用预先训练获得的目标检测模型,获得目标实例有序点集,所述目标检测模型通过如权利要求1-9任一所述的目标检测模型的训练方法获得,所述目标实例有序点集包括第一数量的实例分别对应的有序点集,所述有序点集包括目标数量的第一坐标系下的坐标点;Based on the first image data and/or the first point cloud data, a target detection model obtained by pre-training is used to obtain an ordered point set of target instances, and the target detection model is passed according to any one of claims 1-9. Obtained by the training method of the target detection model described above, the ordered point set of the target instance includes an ordered point set corresponding to a first number of instances, and the ordered point set includes a target number of coordinate points in the first coordinate system ;
    基于所述目标实例有序点集,生成地图。Based on the ordered point set of the target instance, a map is generated.
  11. 一种目标检测模型的训练装置,包括:A training device for a target detection model, including:
    第一获取模块,用于获取训练输入数据及对应的第一标签数据,所述训练输入数据包括训练图像数据和/或训练点云数据,所述第一标签数据包括所述训练输入数据中第一数量的实例分别对应的有序点集,所述有序点集包括目标数量的第一坐标系下的坐标点; The first acquisition module is used to acquire training input data and corresponding first label data. The training input data includes training image data and/or training point cloud data. The first label data includes the training input data. An ordered point set corresponding to a number of instances, the ordered point set including a target number of coordinate points in the first coordinate system;
    第一处理模块,用于基于所述训练输入数据、所述第一标签数据、点对点损失函数及方向损失函数,对预先建立的目标检测网络进行训练,获得目标检测模型,所述点对点损失函数用于确定所述目标检测网络输出的训练实例点集相对于所述第一标签数据中实例的有序点集的点对点损失,所述方向损失函数用于确定所述训练实例点集中点与点之间的方向相对于所述第一标签数据中实例的有序点集的点与点之间的方向的损失。The first processing module is used to train a pre-established target detection network based on the training input data, the first label data, a point-to-point loss function and a direction loss function to obtain a target detection model. The point-to-point loss function is In order to determine the point-to-point loss of the training instance point set output by the target detection network relative to the ordered point set of the instance in the first label data, the direction loss function is used to determine the point between the training instance point concentration point and the point The loss of the direction between points relative to the direction between points of the ordered point set of instances in the first label data.
  12. 一种地图的生成装置,包括:A map generating device including:
    第二获取模块,用于获取至少一个视角的第一图像数据和/或第一点云数据;a second acquisition module, configured to acquire first image data and/or first point cloud data of at least one perspective;
    第二处理模块,用于基于所述第一图像数据和/或所述第一点云数据,采用预先训练获得的目标检测模型,获得目标实例有序点集,所述目标检测模型通过如权利要求1-9任一所述的目标检测模型的训练方法获得,所述目标实例有序点集包括第一数量的实例分别对应的有序点集,所述有序点集包括目标数量的第一坐标系下的坐标点;The second processing module is used to obtain an ordered point set of target instances based on the first image data and/or the first point cloud data using a target detection model obtained through pre-training. The target detection model is configured as follows: The training method of the target detection model according to any one of claims 1 to 9 is obtained, the ordered point set of the target instance includes an ordered point set corresponding to a first number of instances, and the ordered point set includes a first number of target instances. A coordinate point in a coordinate system;
    第三处理模块,用于基于所述目标实例有序点集,生成地图。The third processing module is used to generate a map based on the ordered point set of the target instance.
  13. 一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于执行上述权利要求1-9任一所述的目标检测模型的训练方法;或者,所述计算机程序用于执行上述权利要求10所述的地图的生成方法。A computer-readable storage medium, the storage medium stores a computer program, the computer program is used to execute the training method of the target detection model according to any one of the above claims 1-9; or, the computer program is used to Implement the map generation method described in claim 10 above.
  14. 一种电子设备,所述电子设备包括:An electronic device, the electronic device includes:
    处理器;processor;
    用于存储所述处理器可执行指令的存储器;memory for storing instructions executable by the processor;
    所述处理器,用于从所述存储器中读取所述可执行指令,并执行所述指令以实现上述权利要求1-9任一所述的目标检测模型的训练方法。The processor is configured to read the executable instructions from the memory and execute the instructions to implement the training method of the target detection model described in any one of claims 1-9.
  15. 一种电子设备,所述电子设备包括:An electronic device, the electronic device includes:
    处理器;processor;
    用于存储所述处理器可执行指令的存储器;memory for storing instructions executable by the processor;
    所述处理器,用于从所述存储器中读取所述可执行指令,并执行所述指令以实现上述权利要求10所述的地图的生成方法。 The processor is configured to read the executable instructions from the memory and execute the instructions to implement the map generation method described in claim 10.
PCT/CN2023/113197 2022-08-16 2023-08-15 Target detection model training method and apparatus, map generation method and apparatus, and device WO2024037552A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210977934.5 2022-08-16
CN202210977934.5A CN115331188A (en) 2022-08-16 2022-08-16 Training method of target detection model, map generation method, map generation device and map generation equipment

Publications (1)

Publication Number Publication Date
WO2024037552A1 true WO2024037552A1 (en) 2024-02-22

Family

ID=83923850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113197 WO2024037552A1 (en) 2022-08-16 2023-08-15 Target detection model training method and apparatus, map generation method and apparatus, and device

Country Status (2)

Country Link
CN (1) CN115331188A (en)
WO (1) WO2024037552A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331188A (en) * 2022-08-16 2022-11-11 北京地平线信息技术有限公司 Training method of target detection model, map generation method, map generation device and map generation equipment
CN117555979B (en) * 2024-01-11 2024-04-19 人民中科(北京)智能技术有限公司 Efficient bottom-up map position missing identification method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796230A (en) * 2018-08-02 2020-02-14 株式会社理光 Method, equipment and storage medium for training and using convolutional neural network
CN111460984A (en) * 2020-03-30 2020-07-28 华南理工大学 Global lane line detection method based on key point and gradient balance loss
US10803328B1 (en) * 2017-11-15 2020-10-13 Uatc, Llc Semantic and instance segmentation
CN114241313A (en) * 2021-12-21 2022-03-25 贝壳找房网(北京)信息技术有限公司 Method, apparatus, medium, and program product for extracting road boundary
CN114626437A (en) * 2022-02-17 2022-06-14 北京三快在线科技有限公司 Model training method and device, storage medium and electronic equipment
CN115331188A (en) * 2022-08-16 2022-11-11 北京地平线信息技术有限公司 Training method of target detection model, map generation method, map generation device and map generation equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10803328B1 (en) * 2017-11-15 2020-10-13 Uatc, Llc Semantic and instance segmentation
CN110796230A (en) * 2018-08-02 2020-02-14 株式会社理光 Method, equipment and storage medium for training and using convolutional neural network
CN111460984A (en) * 2020-03-30 2020-07-28 华南理工大学 Global lane line detection method based on key point and gradient balance loss
CN114241313A (en) * 2021-12-21 2022-03-25 贝壳找房网(北京)信息技术有限公司 Method, apparatus, medium, and program product for extracting road boundary
CN114626437A (en) * 2022-02-17 2022-06-14 北京三快在线科技有限公司 Model training method and device, storage medium and electronic equipment
CN115331188A (en) * 2022-08-16 2022-11-11 北京地平线信息技术有限公司 Training method of target detection model, map generation method, map generation device and map generation equipment

Also Published As

Publication number Publication date
CN115331188A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2024037552A1 (en) Target detection model training method and apparatus, map generation method and apparatus, and device
US11074481B2 (en) Environment navigation using reinforcement learning
EP3568810B1 (en) Action selection for reinforcement learning using neural networks
WO2020102733A1 (en) Learning to generate synthetic datasets for training neural networks
US10887607B2 (en) Making object-level predictions of the future state of a physical system
CN110062934A (en) The structure and movement in image are determined using neural network
KR20180065498A (en) Method for deep learning and method for generating next prediction image using the same
CN112119409A (en) Neural network with relational memory
US20170213150A1 (en) Reinforcement learning using a partitioned input state space
Han et al. Streaming object detection for 3-d point clouds
CN115630651B (en) Text generation method and training method and device of text generation model
CN113902007A (en) Model training method and device, image recognition method and device, equipment and medium
Choi et al. Hierarchical latent structure for multi-modal vehicle trajectory forecasting
WO2020225247A1 (en) Unsupervised learning of object keypoint locations in images through temporal transport or spatio-temporal transport
CN116188893A (en) Image detection model training and target detection method and device based on BEV
US20230260271A1 (en) Aligning entities using neural networks
CN114528387A (en) Deep learning conversation strategy model construction method and system based on conversation flow bootstrap
EP4200746A1 (en) Neural networks implementing attention over object embeddings for object-centric visual reasoning
CN114067371B (en) Cross-modal pedestrian trajectory generation type prediction framework, method and device
CN116012677A (en) Training data enhancement method, device, medium and equipment for track prediction
CN113119996A (en) Trajectory prediction method and apparatus, electronic device and storage medium
CN115240171B (en) Road structure sensing method and device
JP7462206B2 (en) Learning device, learning method, and learning program
CN114708471B (en) Cross-modal image generation method and device, electronic equipment and storage medium
Ocsa Sánchez et al. Attention U-Net Oriented Towards 3D Depth Estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854452

Country of ref document: EP

Kind code of ref document: A1