CN116030206A - Map generation method, training device, electronic equipment and storage medium - Google Patents

Map generation method, training device, electronic equipment and storage medium

Info

Publication number
CN116030206A
Authority
CN
China
Prior art keywords
point
sequence
feature
discrete
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211532936.XA
Other languages
Chinese (zh)
Inventor
马中行
周尧
万国伟
张晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211532936.XA
Publication of CN116030206A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure discloses a map generation method, a training device, electronic equipment and a storage medium, relates to the technical field of artificial intelligence, and particularly relates to the technical fields of automatic driving, deep learning and computer vision. The specific implementation scheme is as follows: generating a fusion feature map according to the image data and the point cloud data; extracting a first point feature sequence of the fusion feature map; processing the first point feature sequence and the fusion feature map based on the attention strategy to obtain target element information; and generating a map according to the target element information.

Description

Map generation method, training device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of automatic driving, deep learning and computer vision, and can be applied to dynamically generating a map. More particularly, the disclosure relates to a map generation method, a training device, electronic equipment and a storage medium.
Background
The high-precision map, also called an automatic driving map and a high-resolution map, is a new map data paradigm for automatic driving automobiles. The high-precision map can provide accurate and comprehensive road characteristic data for the automatic driving automobile.
With the development of artificial intelligence technology, deep learning technology and computer vision technology are widely used to generate high-precision maps.
Disclosure of Invention
The disclosure provides a map generation method, a training device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a map generation method including: generating a fusion feature map according to the image data and the point cloud data; extracting a first point feature sequence of the fusion feature map; processing the first point feature sequence and the fusion feature map based on an attention strategy to obtain target element information; and generating a map according to the target element information.
According to another aspect of the present disclosure, there is provided a training method of a deep learning model, including: obtaining a first sample point feature sequence and a second sample point feature sequence according to a sample image, wherein the first sample point feature sequence comprises S first sample points, the second sample point feature sequence comprises S second sample points, S is an integer greater than 1, the sample image represents a fusion feature map generated according to image data and point cloud data, the s-th second sample point of the second sample point feature sequence is a sample label of the s-th first sample point in the first sample point feature sequence, and s is an integer greater than or equal to 1 and less than or equal to S; and training a deep learning model by using the first sample point feature sequence and the second sample point feature sequence to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided a map generating apparatus including: the device comprises a first generation module, an extraction module, a first acquisition module and a second generation module. The first generation module is used for generating a fusion feature map according to the image data and the point cloud data. And the extraction module is used for extracting the first point feature sequence of the fusion feature map. The first obtaining module is used for processing the first point feature sequence and the fusion feature map based on the attention strategy to obtain target element information. And the second generation module is used for generating a map according to the target element information.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: a second obtaining module and a third obtaining module. The second obtaining module is configured to obtain a first sample point feature sequence and a second sample point feature sequence according to a sample image, where the first sample point feature sequence includes S first sample points, the second sample point feature sequence includes S second sample points, S is an integer greater than 1, the sample image characterizes a fusion feature map generated according to image data and point cloud data, the s-th second sample point of the second sample point feature sequence is a sample label of the s-th first sample point in the first sample point feature sequence, and s is an integer greater than or equal to 1 and less than or equal to S. And the third obtaining module is used for training the deep learning model by utilizing the first sample point feature sequence and the second sample point feature sequence to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which map generation methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a map generation method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of generating a fused feature map in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of generating a first point feature sequence in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flowchart of extracting a first sparse point sequence from a first discrete point sequence, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flowchart of extracting a sequence of target discrete points from a first sequence of sparse points, according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a flow chart for processing a first point feature sequence and a fused feature map to obtain target element information based on an attention policy according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of processing a first point feature sequence and a fused feature map based on an attention policy to obtain target feature information of the (M+n)-th point according to an embodiment of the present disclosure;
fig. 9 schematically illustrates a schematic diagram of a map generation method according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a training method flow diagram of a deep learning model in accordance with an embodiment of the present disclosure;
FIG. 11 schematically illustrates an exemplary architecture diagram of a deep learning model in accordance with an embodiment of the present disclosure;
FIG. 12 schematically illustrates a schematic diagram of a training method of a deep learning model according to an embodiment of the present disclosure;
fig. 13 schematically shows a block diagram of a map generating apparatus according to an embodiment of the present disclosure;
FIG. 14 schematically illustrates a block diagram of a deep learning model training apparatus in accordance with an embodiment of the present disclosure; and
fig. 15 schematically illustrates a block diagram of an electronic device adapted to implement a map generation method, a deep learning model training method, according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The high-precision map may provide topology information of roads for the autonomous vehicle. In the process of constructing the high-precision map, an off-line mode is generally adopted, an image comprising lane lines is collected, the lane lines are extracted from the image by utilizing an image processing technology, and then the extracted lane lines are converted into a 2D plane and then into a format of the high-precision map.
However, when converting image data into a 2D plane, a certain loss of accuracy is caused, resulting in poor accuracy of a high-precision map. In addition, in a scene with complex road conditions and high time efficiency requirements, the method for generating the map offline has the problem of poor time efficiency.
To this end, the disclosed embodiments provide a map generation scheme. For example: generating a fusion feature map according to the image data and the point cloud data; extracting a first point feature sequence of the fusion feature map; processing the first point feature sequence and the fusion feature map based on the attention strategy to obtain target element information; and generating a map according to the target element information, thereby improving the accuracy and timeliness of map generation.
FIG. 1 schematically illustrates an exemplary system architecture to which the map generation method or the training method and apparatus of the deep learning model may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the training method and apparatus of the map generation or deep learning model may be applied may include a terminal device, but the terminal device may implement the training method and apparatus of the map generation or deep learning model provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a vehicle terminal device 101, a network 102, and a server 103. The network 102 is a medium used to provide a communication link between the vehicle terminal device 101 and the server 103. Network 102 may include various connection types, such as wireless communication links, and the like.
A user can interact with the server 103 through the network 102 using the vehicle terminal device 101 to receive or send messages or the like. Various communication client applications may be installed on the vehicle terminal device 101, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, to name a few.
The vehicle terminal device 101 may be various in-vehicle electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, a laptop computer, and the like.
The server 103 may be a server that provides various services, such as a background management server (merely an example) that provides support for content browsed by a user using a vehicle terminal device. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the map generation method or the training method of the deep learning model provided by the embodiments of the present disclosure may generally be performed by a vehicle terminal device. Accordingly, the map generation apparatus or the training apparatus of the deep learning model provided by the embodiments of the present disclosure may also be provided in the vehicle terminal device 101.
Alternatively, the map generation method or the training method of the deep learning model provided by the embodiments of the present disclosure may also generally be performed by the server 103. Accordingly, the map generation apparatus or the training apparatus of the deep learning model provided by the embodiments of the present disclosure may generally be provided in the server 103. The map generation method or the training method of the deep learning model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 103 and that is capable of communicating with the vehicle terminal device 101 and/or the server 103. Accordingly, the map generation apparatus or the training apparatus of the deep learning model provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 103 and is capable of communicating with the vehicle terminal device 101 and/or the server 103.
For example, when the user vehicle is traveling, the vehicle terminal device 101 may acquire image data and point cloud data around the vehicle through an acquisition apparatus on the vehicle, and then transmit the acquired image data and point cloud data to the server 103. The server 103 generates a fusion feature map from the image data and the point cloud data, then extracts a first point feature sequence from the fusion feature map, and processes the first point feature sequence and the fusion feature map based on the attention strategy to obtain target element information. Finally, a map is generated according to the target element information and transmitted to the vehicle terminal device 101. Alternatively, the image data and the point cloud data may be processed by a server or a server cluster capable of communicating with the vehicle terminal device 101 and/or the server 103, finally achieving the generation of a map of the vicinity of the target vehicle.
It should be understood that the number of vehicle terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of vehicle terminal devices, networks and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a map generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, a fusion feature map is generated from the image data and the point cloud data.
In operation S220, a first point feature sequence of the fusion feature map is extracted.
In operation S230, the first point feature sequence and the fusion feature map are processed based on the attention policy to obtain target element information.
In operation S240, a map is generated from the target element information.
According to an embodiment of the present disclosure, the image data may be obtained by photographing a target area through a vision sensor mounted on an autonomous vehicle. The target area may be an area around the body of the autonomous vehicle during traveling. The vision sensor may be a camera or a video camera. The image data may be in any format, for example: either in RGB (Red Green Blue) format or YUV (Luminance Chrominance, luminance and chrominance) format.
According to embodiments of the present disclosure, the point cloud data may be obtained by scanning a target area with a lidar mounted on an autonomous vehicle.
According to embodiments of the present disclosure, the image data may be used to provide road information around the body of an autonomous vehicle, such as: lane line information, obstacle information, basic traffic equipment information, and the like. Although the point cloud data lacks the texture information present in the image data, it may be used to provide 3D geometric information of roads farther from the body of the autonomous vehicle, so as to supplement the information in the image data.
According to the embodiment of the disclosure, the fusion feature map can be generated from the image data and the point cloud data through a plurality of feature fusion modes. For example, the image data may first be processed based on an image-oriented deep learning method, such as a convolutional neural network, to obtain image features. The point cloud data is processed based on a point cloud deep learning method to obtain point cloud features, for example by forming a three-dimensional grid from the point cloud data at a fixed resolution, or by dividing the point cloud data into unbalanced trees and processing it according to the point density of each region. The image features and the point cloud features are then fused to obtain the fusion feature map.
According to embodiments of the present disclosure, each feature may include a location feature and an attribute feature of the map element in the fused feature map. The location features may characterize pixel coordinate features of map elements, such as: two-dimensional coordinate type features and coordinate values. The attribute features may characterize class features of map elements, such as: lane line attributes, obstacle attributes, and the like.
According to the embodiment of the disclosure, a first point feature sequence of the fusion feature map is extracted, and the position feature and the attribute feature of the map element of the fusion feature map can be extracted through the modes of target detection, semantic segmentation and the like, wherein the map element characterizes the feature point of the fusion feature map, and the first point feature sequence can comprise the position feature and the attribute feature of the feature point. For example: the pixel coordinates of the feature point on a certain lane line are (2, 3), the coordinate value of the x-type coordinate is 2, the coordinate value of the y-type coordinate is 3, the attribute feature is a lane line attribute, and the point feature of the feature point can be expressed as (x, 2, lane line), (y, 3, lane line).
According to the embodiment of the disclosure, the first point feature sequence extracted by means of target detection or semantic segmentation provides relatively simplified map element information. Therefore, the first point feature sequence and the fusion feature map can be processed based on the attention strategy by utilizing the trained deep learning model, so as to complement the map element information and obtain more complete target element information.
According to the embodiment of the disclosure, since the position features and the attribute features of the map elements are included in each piece of target element information, a data association algorithm can be utilized to perform data association between the target elements based on the position features and the attribute features of the target elements to obtain the map. The data association algorithm may be ICP (Iterative Closest Point algorithm), PPICP (Point-to-Plane ICP, point-to-plane iterative closest point algorithm), GICP (Generalized ICP, generalized iterative closest point algorithm), VGICP (Voxelized Generalized ICP, voxelized generalized iterative closest point algorithm), or the like.
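Purely as an illustrative sketch (not part of the disclosed embodiments), the following snippet shows a naive nearest-neighbour grouping of target elements by position feature and attribute feature; it is not an ICP-family algorithm, and the element structure and distance threshold are assumptions.

```python
import numpy as np

def associate_elements(elements, max_dist=0.5):
    """Greedily group target elements whose positions are close and whose attributes match.

    `elements` is assumed to be a list of dicts such as {"xy": (x, y), "attr": "lane_line"};
    this structure and the 0.5 threshold are illustrative assumptions only.
    """
    groups = []
    for elem in elements:
        placed = False
        for group in groups:
            ref = group[0]
            same_attr = ref["attr"] == elem["attr"]
            dist = np.linalg.norm(np.array(ref["xy"]) - np.array(elem["xy"]))
            if same_attr and dist <= max_dist:
                group.append(elem)
                placed = True
                break
        if not placed:
            groups.append([elem])
    return groups
```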
Operations S210 to S240 may be performed by an electronic device according to an embodiment of the present disclosure. The electronic device may comprise a server or a terminal device. The server may be the server 103 in fig. 1. The vehicle terminal device may be the vehicle terminal device 101 in fig. 1.
According to the embodiment of the disclosure, the fusion feature map is generated from the image data and the point cloud data, so that the element features identified by the visual sensor and the element features identified by the laser radar sensor can be fused, and the precision of the map elements in the generated fusion feature map is high. A first point feature sequence is then extracted from the fusion feature map, and the first point feature sequence and the fusion feature map are processed based on the attention strategy to obtain the target element information. This complements the element information of the first point feature sequence, which carries only simplified information, and improves the integrity of the map element features. Finally, a map is generated using the complete target element information. Therefore, the map of the target area around the body of the autonomous vehicle can be dynamically generated in real time without depending on an offline map; the method is suitable for map element generation requirements with high timeliness, and the map generation efficiency and map precision are improved.
The method shown in fig. 2 is further described below with reference to fig. 3-9 in conjunction with the exemplary embodiment.
Fig. 3 schematically illustrates a schematic diagram of generating a fused feature map according to an embodiment of the present disclosure.
As shown in fig. 3, in 300, an image feature map 314 is generated from a predetermined feature map 311, image features 312 of the image data, and a first parameter 313. A point cloud feature map 318 is generated from the predetermined feature map 315, point cloud features 316 of the point cloud data, and a second parameter 317. The image feature map 314 and the point cloud feature map 318 are subjected to fusion processing to obtain a fusion feature map 319.
According to embodiments of the present disclosure, image features 312 of image data may characterize the location features and attribute features of map elements in the image data. The image features 312 of the image data may be obtained by an image feature extraction method. The image feature extraction method may include: Fourier transform, windowed Fourier transform, wavelet transform, least squares, boundary direction histogram, and the like. The above image feature extraction method may be implemented by a convolutional neural network, for example the model-scaled convolutional neural network EfficientNet.
According to an embodiment of the present disclosure, the predetermined feature maps 311 and 315 may be BEV feature maps (Bird's Eye View feature maps, i.e., feature maps from a bird's eye view perspective) that contain only random features subject to a Gaussian distribution and no actual features.
According to an embodiment of the present disclosure, the first parameter 313 characterizes the intrinsic parameters and extrinsic parameters of the vision sensor. The intrinsic parameters of the vision sensor characterize parameters related to the characteristics of the vision sensor itself, such as: focal length, pixels, etc. The extrinsic parameters of the vision sensor characterize parameters related to the acquisition environment, such as: the mounting position, rotation direction, etc. of the vision sensor.
According to an embodiment of the present disclosure, the image feature map 314 characterizes a BEV feature map of the image features 312 containing image data. The image feature map 314 may be obtained by learning the image features 312 of the image data using the predetermined feature map 311 based on a deep learning technique.
According to embodiments of the present disclosure, the point cloud features 316 of the point cloud data may characterize the location features of the point cloud data. The point cloud features 316 of the point cloud data are obtained by processing the point cloud data through a point cloud feature extraction network, for example a PointPillars network.
According to an embodiment of the present disclosure, the second parameter 317 characterizes an inner parameter and an outer parameter of the lidar. Internal parameters of the laser radar represent parameters related to the characteristics of the laser radar, such as: measuring distance, scanning frequency, measuring resolution, etc. External parameters of the lidar characterize parameters related to the acquisition environment, such as: mounting position, rotation direction, etc. of the lidar.
In accordance with an embodiment of the present disclosure, the point cloud feature map 318 characterizes BEV feature maps of point cloud features 316 containing point cloud data. The point cloud features 316 of the point cloud data may be learned using a predetermined feature map 315 based on a deep learning technique, resulting in a point cloud feature map 318.
According to an embodiment of the present disclosure, the image feature map 314 and the point cloud feature map 318 may be subjected to a fusion process by means of feature stitching or feature stacking, to obtain a fused feature map 319.
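As a minimal sketch of the feature stacking described above (assuming PyTorch and two already-computed BEV tensors on the same spatial grid), the fusion can be expressed as a channel-wise concatenation followed by a 1x1 convolution; the channel counts are illustrative assumptions rather than values from the disclosure.

```python
import torch
import torch.nn as nn

class BevFusion(nn.Module):
    """Fuse an image BEV feature map and a point cloud BEV feature map by feature stacking."""
    def __init__(self, img_channels=256, pc_channels=128, out_channels=256):
        super().__init__()
        # A 1x1 convolution mixes the stacked channels back down to a single fused feature map.
        self.reduce = nn.Conv2d(img_channels + pc_channels, out_channels, kernel_size=1)

    def forward(self, img_bev, pc_bev):
        # Both inputs are (B, C, H, W) BEV maps on the same grid.
        fused = torch.cat([img_bev, pc_bev], dim=1)   # feature stacking along the channel axis
        return self.reduce(fused)                      # fusion feature map
```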
According to the embodiment of the disclosure, with the aid of the predetermined feature maps containing only random features subject to a Gaussian distribution, the image features of the image data and the point cloud features of the point cloud data are respectively learned based on a deep learning technique, so as to obtain a fusion feature map containing both the image features and the point cloud features. The advantage of this method is that the position features and attribute features of the map elements can be obtained without a conversion step from the image data to the 2D plane view, which improves the map generation accuracy.
According to an embodiment of the present disclosure, operation S220 may extract a first point feature sequence of the fusion feature map by means of object detection using Bounding Box (Bounding Box) or SME (Start-Middle-End).
For example: the target element of the fusion feature map extracted by using the bounding box may be a map element a, and the category of the map element a may be a lane line. The bounding box enclosing the map element a may be a rectangle, and the four vertex coordinates of the rectangle may be (x1, y1), (x2, y2), (x3, y3), (x4, y4). The first point feature sequence may be expressed as [(x, x1, lane line), (y, y1, lane line), (x, x2, lane line), (y, y2, lane line), (x, x3, lane line), (y, y3, lane line), (x, x4, lane line), (y, y4, lane line)].
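Purely for illustration, a first point feature sequence of the form shown above can be assembled from the four bounding-box vertices as follows; the tuple layout (coordinate type, coordinate value, attribute) follows the description, while the helper name is an assumption.

```python
def bbox_to_point_sequence(vertices, attribute="lane line"):
    """Convert bounding-box vertices [(x1, y1), ..., (x4, y4)] into a first point feature sequence."""
    sequence = []
    for x, y in vertices:
        sequence.append(("x", x, attribute))
        sequence.append(("y", y, attribute))
    return sequence

# Example: bbox_to_point_sequence([(10, 5), (40, 5), (40, 8), (10, 8)])
```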
According to an embodiment of the present disclosure, operation S220 may further extract the first point feature sequence of the fused feature map by means of semantic segmentation. Comprising the following operations:
carrying out semantic segmentation processing on the fusion feature map to obtain a semantic feature map; performing example segmentation processing on the semantic feature map to obtain an example feature map; and extracting a first point feature sequence from the example feature map.
According to the embodiment of the disclosure, before the semantic segmentation processing is performed on the fused feature map, the fused feature map may be converted into a feature map that may be used for the semantic segmentation processing by a method of interpolating, complementing points or expanding the fused feature map.
According to embodiments of the present disclosure, an image semantic segmentation model may be used to perform the semantic segmentation processing on the fused feature map, for example the fully convolutional network model FCN. The semantic feature map may be instance-segmented using a real-time instance segmentation model, for example the fully convolutional real-time instance segmentation model SparseInst.
According to an embodiment of the present disclosure, extracting a first point feature sequence from an example feature map may include the operations of:
and processing the example feature map to obtain a discrete point set, wherein the discrete point set comprises a plurality of discrete points and feature information of the plurality of discrete points. Extracting a target discrete point sequence from the plurality of discrete points according to the characteristic information of the plurality of discrete points, wherein the target discrete point sequence comprises the plurality of target discrete points and the characteristic information of the plurality of target discrete points. And generating a first point feature sequence according to the feature information of the plurality of target discrete points.
According to an embodiment of the present disclosure, processing an example feature map to obtain a set of discrete points may include the operations of:
carrying out erosion processing on the example feature map to obtain a sparse example feature map; and carrying out pooling processing on the sparse example feature map to obtain a discrete point set.
According to the embodiment of the disclosure, the pixel characteristics in the example feature map can be subjected to sparse processing by using an image processing algorithm, so as to obtain a sparse example feature map. In the sparse example feature map, the pixel width of each map element may reach a predetermined threshold, such as: the predetermined threshold value of the pixel width of the map element belonging to the lane line may be 5.
According to embodiments of the present disclosure, pooling of sparse instance feature graphs may include either or both of a maximum pooling operation and an average pooling operation to obtain a set of discrete points.
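A hedged sketch of this erosion-then-pooling step is given below, assuming OpenCV for the morphological erosion and PyTorch for the max pooling; the kernel size, pool size and coordinate rescaling are illustrative assumptions.

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

def instance_map_to_discrete_points(instance_mask, pool_size=8):
    """Erode a binary instance mask, max-pool it, and return the surviving pixel coordinates."""
    # Morphological erosion thins each map element (e.g. towards a lane line a few pixels wide).
    kernel = np.ones((3, 3), np.uint8)
    sparse = cv2.erode(instance_mask.astype(np.uint8), kernel, iterations=1)

    # Max pooling keeps at most one response per pool_size x pool_size cell.
    t = torch.from_numpy(sparse).float().unsqueeze(0).unsqueeze(0)   # (1, 1, H, W)
    pooled = F.max_pool2d(t, kernel_size=pool_size)

    ys, xs = torch.nonzero(pooled[0, 0], as_tuple=True)
    # Map pooled-grid coordinates back to approximate pixel coordinates.
    return [(int(x) * pool_size, int(y) * pool_size) for x, y in zip(xs, ys)]
```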
According to an embodiment of the present disclosure, extracting a target discrete point sequence from a plurality of discrete points according to characteristic information of the plurality of discrete points includes the operations of:
and determining a discrete point ordering strategy according to the characteristic information of the plurality of discrete points. And forward ordering the plurality of discrete points according to a discrete point ordering strategy to obtain a first discrete point sequence. A target discrete point sequence is extracted from the first discrete point sequence based on the selection policy.
According to an embodiment of the present disclosure, the feature information of the plurality of discrete points characterizes a coordinate fluctuation range feature of the plurality of discrete points. For example: for all of the discrete points, the range of fluctuation of the abscissa of the discrete points may be 1 to 200, and the range of fluctuation of the ordinate of the discrete points may be 0.5 to 100. The determined ordering strategy of the discrete points can be ordering from small to large according to the abscissa of the discrete points. And extracting partial discrete points from the first discrete point sequence based on the selection strategy to serve as a target discrete point sequence.
According to an embodiment of the present disclosure, extracting a portion of discrete points from the first discrete point sequence as a target discrete point sequence based on a selection policy may include the following operations:
and carrying out sparse processing on the first discrete point sequence to obtain a first sparse point sequence. And according to the ordering strategy, reversely ordering the first sparse point sequence to obtain a second sparse point sequence. And performing sparse processing on the second sparse point sequence to obtain a target discrete point sequence.
Fig. 4 schematically illustrates a schematic diagram of generating a first point feature sequence according to an embodiment of the present disclosure.
As shown in fig. 4, in 400, the fusion feature map 421 is subjected to semantic segmentation processing, and a semantic feature map 422 is obtained. The semantic feature map 422 is subjected to an example segmentation process to obtain an example feature map 423. A set of discrete points 424 is extracted from the example feature map 423. Based on the ranking strategy, the set of discrete points 424 is processed to obtain a first sequence of discrete points 425. Based on the selection strategy, the first sequence of discrete points 425 is processed to obtain a first sequence of sparse points 426. Based on the above-mentioned sorting strategy, the first sparse point sequence 426 is processed to obtain a second sparse point sequence 427. Based on the selection strategy, the second sparse point sequence 427 is processed to obtain a target discrete point sequence 428. From the target discrete point sequence 428, a first point feature sequence 429 is derived.
According to an embodiment of the present disclosure, the first discrete point sequence includes I discrete points, I being an integer greater than 1, and a method of extracting the target discrete point sequence from the first discrete point sequence based on the selection strategy is further described below with reference to fig. 5 and 6.
Fig. 5 schematically illustrates a flowchart of extracting a first sparse point sequence from a first discrete point sequence, according to an embodiment of the disclosure.
As shown in fig. 5, the method 500 includes operations S5510-S5540.
In operation S5510, k discrete points are deleted for the I-th discrete point of the I discrete points, resulting in a second discrete point sequence.
In operation S5520, the j-th discrete point is extracted from the second discrete point sequence as a sparse point, and i is incremented.
In operation S5530, it is determined whether i is less than I. If yes, the flow returns to operation S5510. If not, operation S5540 is performed.
In operation S5540, a first sparse point sequence is obtained.
According to an embodiment of the present disclosure, the first discrete point sequence is subjected to I-round processing to obtain a first sparse point sequence.
According to an embodiment of the present disclosure, the pixel distance between each of the k discrete points and the i-th discrete point satisfies a preset threshold range. For example: the preset threshold range may include two: 3-5 pixels apart, greater than 10 pixels apart.
For example: the first discrete point sequence may include discrete points A1, A2, …, Ai, …, AI. During the 1st round of processing of the first discrete point sequence, for the 1st discrete point A1, the discrete points whose pixel distances to A1 fall within the preset threshold range are deleted, so as to obtain a second discrete point sequence A1, A2, …, At. From the second discrete point sequence, the discrete point At with the closest pixel distance to A1 can be selected as the 1st sparse point in the first sparse point sequence. Then i is sequentially incremented until i is equal to I, so as to obtain the first sparse point sequence; for example, the first sparse point sequence may include Q discrete points.
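The thinning pass of FIG. 5 could be sketched as follows, assuming 2D pixel coordinates and treating the preset threshold range as a set of "delete bands"; the concrete bands (3-5 pixels and more than 10 pixels in the example above) are only one possible setting.

```python
import math

def thin_points(points, delete_ranges=((3, 5), (10, float("inf")))):
    """One FIG. 5 style pass over an ordered list of (x, y) discrete points.

    For the i-th point, later points whose pixel distance to it falls inside
    `delete_ranges` are removed, and the closest remaining point is kept as a
    sparse point. The ranges are illustrative assumptions.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def in_delete_range(d):
        return any(lo <= d <= hi for lo, hi in delete_ranges)

    sparse = []
    for i, p in enumerate(points):
        remaining = [q for q in points[i + 1:] if not in_delete_range(dist(p, q))]
        if remaining:
            sparse.append(min(remaining, key=lambda q: dist(p, q)))
    return sparse
```

The reverse-order pass of FIG. 6 could then be approximated by calling the same helper on the reversed first sparse point sequence.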
Operations S5510 to S5540 may be performed by an electronic device according to an embodiment of the present disclosure. The electronic device may comprise a server or a terminal device. The server may be the server 103 in fig. 1. The vehicle terminal device may be the vehicle terminal device 101 in fig. 1.
Fig. 6 schematically illustrates a flowchart of extracting a target discrete point sequence from a first sparse point sequence according to an embodiment of the present disclosure.
As shown in FIG. 6, the method includes operations S6610-S6650.
In operation S6610, according to the sorting strategy, the first sparse point sequence is reversely sorted to obtain the second sparse point sequence;
in operation S6620, deleting h discrete points for the Q-th discrete point in the Q discrete points to obtain a third sparse point sequence;
in operation S6630, extracting the p-th discrete point from the third sparse point sequence as a sparse point, and incrementing q;
in operation S6640, it is determined whether Q is smaller than Q. If yes, operation S6650 is performed. If not, the range performs operation S6610.
In operation S6650, a target discrete point sequence is obtained.
According to embodiments of the present disclosure, the ordering strategy may be to order from small to large in the discrete point abscissa. The first sparse point sequence is reversely ordered according to an ordering strategy, and the first sparse point sequence can be ordered from large to small according to the abscissa of Q discrete points, so that a second sparse point sequence is obtained.
According to an embodiment of the present disclosure, the pixel distance between each of the h discrete points and the q-th discrete point satisfies a preset threshold range. For example: the preset threshold range may include two: 3-5 pixels apart, greater than 10 pixels apart.
According to the embodiment of the present disclosure, the method for performing the sparse processing on the second sparse point sequence is the same as the method for performing the sparse processing on the first discrete point sequence to obtain the first sparse point sequence, which is not described herein.
According to the embodiment of the disclosure, the first sparse point sequence is reversely ordered according to the ordering strategy, and then sparse processing is performed to obtain the target discrete point sequence, so that the accuracy of the relative positions between the discrete points located at adjacent ordering positions in the target discrete point sequence is improved.
Operations S6610-S6650 may be performed by an electronic device according to an embodiment of the disclosure. The electronic device may comprise a server or a terminal device. The server may be the server 103 in fig. 1. The vehicle terminal device may be the vehicle terminal device 101 in fig. 1.
Fig. 7 schematically illustrates a flowchart of processing a first point feature sequence and a fused feature map to obtain target element information based on an attention policy according to an embodiment of the present disclosure.
As shown in fig. 7, the method 700 includes operations S7310 to S7340.
In operation S7310, the first point feature sequence and the fused feature map are processed based on the attention policy to obtain target feature information of the (M+n)-th point.
In operation S7320, generating a second point feature sequence according to the target feature information of the (M+n)-th point and the first point feature sequence, and incrementing n;
in operation S7330, it is determined whether n is less than N. If yes, the flow returns to operation S7310. If not, operation S7340 is performed.
In operation S7340, target element information is generated from the target feature information of the M+N points.
According to embodiments of the present disclosure, an attention policy may be used to focus on important information with high weight, ignore non-important information with low weight, and exchange important information with other information by sharing it, thereby achieving the transfer of important information. In the embodiment of the disclosure, the attention policy can extract information within the first point feature sequence itself, within the fusion feature map, and between the first point feature sequence and the fusion feature map, so as to obtain the target feature information of the (M+n)-th point according to the feature information of the M points in the first point feature sequence and the feature information in the fusion feature map.
According to an embodiment of the present disclosure, the target feature information of the (M+n)-th point is obtained by processing the first point feature sequence and the fusion feature map based on the attention policy. Therefore, the target feature information of the (M+n)-th point participates in a global attention mechanism and is coupled with the global information of the fusion feature map, so that the accuracy of the target feature information is improved.
According to embodiments of the present disclosure, attention policies may include self-attention mechanisms and interactive attention mechanisms. Operation S7310 may include the following operations:
Processing the first point feature sequence based on a self-attention mechanism to obtain first feature information of the (M+n)-th point. Generating a third point feature sequence according to the first point feature sequence and the first feature information of the (M+n)-th point. And processing the third point feature sequence and the fusion feature map based on the interaction attention mechanism to obtain the target feature information of the (M+n)-th feature point.
According to an embodiment of the present disclosure, the first feature information of the (M+n)-th point is obtained by processing the first point feature sequence based on a self-attention mechanism. Thus, the first feature information of the (M+n)-th point is obtained based on interactions of the position features and the attribute features of each feature point with the other feature points in the first point feature sequence.
For example: the first point feature sequence may include the features of M points, where the coordinate arrangement of the M points shows a certain trend; based on the self-attention mechanism, processing the first point feature sequence may predict the position feature and the attribute feature of the (M+n)-th point according to the trend of the coordinate arrangement.
According to an embodiment of the present disclosure, the third point feature sequence includes both the initial first point feature sequence and the first feature information of the predicted point (the (M+n)-th point). Based on the interaction attention mechanism, the feature information of the feature points in the third point feature sequence interacts with the feature information of the feature points in the fusion feature map to obtain the target feature information of the predicted point; the feature information of the fusion feature map is thereby coupled in, and the accuracy of the target feature information of the predicted point is improved.
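A minimal, non-authoritative decoding sketch of this per-round procedure is given below, assuming PyTorch and nn.MultiheadAttention for both the self-attention and interaction-attention steps; the module composition, tensor shapes and the choice of taking the last position as the new point are assumptions made only to show the n-round loop.

```python
import torch
import torch.nn as nn

class PointDecoderStep(nn.Module):
    """One round: self-attention over the point sequence, then interaction attention with the fused BEV features."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, point_seq, bev_tokens):
        # point_seq: (B, M+n, D) point features; bev_tokens: (B, H*W, D) flattened fusion feature map.
        x, _ = self.self_attn(point_seq, point_seq, point_seq)   # interaction inside the point sequence
        x, _ = self.cross_attn(x, bev_tokens, bev_tokens)        # couple with global BEV information
        return self.ffn(x)[:, -1:, :]                            # illustrative choice: last position is the new point

def decode_points(step, point_seq, bev_tokens, num_new_points):
    # Append one predicted point per round, as described for rounds 1..N.
    for _ in range(num_new_points):
        new_point = step(point_seq, bev_tokens)
        point_seq = torch.cat([point_seq, new_point], dim=1)
    return point_seq
```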
Operations S7310 to S7340 may be performed by an electronic apparatus according to an embodiment of the present disclosure. The electronic device may comprise a server or a terminal device. The server may be the server 103 in fig. 1. The vehicle terminal device may be the vehicle terminal device 101 in fig. 1.
Fig. 8 schematically illustrates a schematic diagram of processing a first point feature sequence and a fused feature map based on an attention policy to obtain target feature information of the (M+n)-th point according to an embodiment of the present disclosure.
As shown in fig. 8, in 800, the first point feature sequence 429 includes feature information of M points, from feature information 429_1 of the 1st point to feature information 429_M of the M-th point. The target feature information of the (M+n)-th point is obtained through n rounds of processing.
For example, during the 1st round of processing, the first feature information 801_1 of the (M+1)-th point can be obtained by processing the feature information of the M points in the first point feature sequence 429 based on the self-attention mechanism. From the first point feature sequence 429 and the first feature information 801_1 of the (M+1)-th point, a third point feature sequence 802 is obtained. The third point feature sequence 802 includes the feature information (802_1, …, 802_M) of the original M points in the first point feature sequence and the first feature information 802_M+1 of the newly generated (M+1)-th point. The third point feature sequence 802 and the fusion feature map 319 are then processed based on the interaction attention mechanism to obtain the target feature information of the (M+1)-th point.
During the 2nd round of processing, the target feature information of the (M+1)-th point and the first point feature sequence 429 may be processed based on the self-attention mechanism to obtain the first feature information of the (M+2)-th point. A third point feature sequence 802 is then generated from the first point feature sequence 429, the target feature information of the (M+1)-th point, and the first feature information of the (M+2)-th point. At this time, the third point feature sequence 802 includes the feature information of the original M points in the first point feature sequence, the target feature information of the (M+1)-th point obtained by the 1st round of processing, and the first feature information of the (M+2)-th point. The third point feature sequence 802 and the fusion feature map 319 are processed based on the interaction attention mechanism to obtain the target feature information of the (M+2)-th point. This continues by analogy until N rounds of processing have been performed, so that the target feature information of M+N points can be obtained.
Fig. 9 schematically illustrates a schematic diagram of a map generation method according to an embodiment of the present disclosure.
As shown in fig. 9, in 900, an original image 901 is subjected to feature extraction to obtain an image feature 902, and a predetermined feature map is subjected to learning of the image feature 902 to obtain a fused feature map. The raw image 901 may include an RGB image acquired by a vision sensor and a point cloud image acquired by a lidar. Image features 902 may include features of an RGB image and features of a point cloud image.
The fused feature map 903 is processed by means of object detection or semantic segmentation to obtain a set of discrete points 904. The discrete point set obtained by the target detection method in the embodiment of the present disclosure may not need to perform sparse processing, so as to obtain the first point feature sequence 905. The discrete point set obtained by the semantic segmentation method needs to be subjected to sparse processing, so as to obtain a first point feature sequence 905.
The first point feature sequence 905 is input into a trained deep learning model, the target elements 906 are output, and a map 907 is generated from the target elements 906.
For example: the first point feature sequence 905 (x1, y1, x2, y2, …, xn, yn) is input into the trained deep learning model. After the 1st round of processing, the obtained point feature sequence may be (x1, y1, x2, y2, …, xn, yn, x^1, y^1). After the 2nd round of processing, the obtained point feature sequence may be (x1, y1, x2, y2, …, xn, yn, x^1, y^1, x^2, y^2). This continues until the deep learning model outputs a stop symbol, at which point N new points have been generated and the target element information (x^1, y^1, x^2, y^2, …, x^N, y^N) is obtained.
The feature points in the first point feature sequence include not only the position feature but also the attribute feature. Since the attribute features of the first point feature sequence obtained for the same instance feature map are generally the same during the instance segmentation, the attribute features are not labeled in the above examples.
Fig. 10 schematically illustrates a training method flowchart of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 10, the method 1000 includes operations S1010 to S1020.
In operation S1010, a first sample point feature sequence and a second sample point feature sequence are obtained from a sample image.
In operation S1020, a deep learning model is trained using the first sample point feature sequence and the second sample point feature sequence, resulting in a trained deep learning model.
According to an embodiment of the disclosure, the first sample point feature sequence includes S first sample points, the second sample point feature sequence includes S second sample points, S is an integer greater than 1, the sample image characterizes a fusion feature map generated according to the image data and the point cloud data, the s-th second sample point of the second sample point feature sequence is a sample label of the s-th first sample point in the first sample point feature sequence, and s is an integer greater than or equal to 1 and less than or equal to S.
According to an embodiment of the present disclosure, operation S1020 may include the following operations:
and processing the first sample point feature sequence and the second sample point feature sequence based on the attention strategy to obtain a third sample point feature sequence, wherein the third sample point feature sequence comprises S third sample points. And generating an s-th loss value according to the s-th third sample point of the third sample point characteristic sequence and the s-th second sample point of the second sample point characteristic sequence based on the loss function. And adjusting model parameters of the deep learning model according to the loss values corresponding to the S third sample points to obtain a trained deep learning model.
FIG. 11 schematically illustrates an exemplary architecture diagram of a deep learning model in accordance with an embodiment of the present disclosure.
As shown in fig. 11, the deep learning model 1100 includes a self-attention mechanism based processing module 1101, an interactive attention mechanism based processing module 1102, and a feed forward neural network 1103.
According to an embodiment of the present disclosure, the deep learning model 1100 may be a Transformer decoder. The first sample point feature sequence is processed by the self-attention mechanism based processing module 1101 to complete the interaction of the feature information within the first sample point feature sequence itself. The first sample point feature sequence and the second sample point feature sequence are processed by the interaction attention mechanism based processing module 1102, so as to realize interaction between the feature information of the first sample point feature sequence and the feature information of the second sample point feature sequence. Feature mapping is then performed using the feedforward neural network 1103 to obtain the third sample point feature sequence.
According to an embodiment of the present disclosure, the self-attention mechanism based processing module 1101 may be constructed according to the following equation (1):
Attention(X) = softmax(X X^T / √d_k) X    (1)
wherein X represents the feature matrix of the first sample point feature sequence; X^T represents the transpose of X; and d_k represents the dimension of the point features.
According to an embodiment of the present disclosure, the interactive attention mechanism based processing module 1102 may be constructed according to the following equation (2):
Attention(Q, K, V) = softmax(Q K^T / √d_k) V    (2)
wherein Q represents the feature matrix obtained by processing the first sample point feature sequence with the self-attention mechanism based processing module 1101; K and V represent feature matrices obtained from the second sample point feature sequence through different mapping functions; and d_k represents the dimension of the point features.
According to an embodiment of the present disclosure, the feedforward neural network 1103 may be constructed according to any mapping function that expands the dimension of the point feature first and then contracts, which is not specifically limited herein.
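Reading equations (1) and (2) as standard scaled dot-product attention, the three sub-modules might be sketched as follows; this is a non-authoritative illustration, and the hidden dimensions are assumptions.

```python
import math
import torch
import torch.nn as nn

def self_attention(X, d_k):
    # Equation (1): softmax(X X^T / sqrt(d_k)) X
    scores = X @ X.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ X

def interaction_attention(Q, K, V, d_k):
    # Equation (2): softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ V

class FeedForward(nn.Module):
    """Expand the point feature dimension and then contract it, as described for module 1103."""
    def __init__(self, d_model=256, d_hidden=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.net(x)
```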
Fig. 12 schematically illustrates a schematic diagram of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 12, in 1200, a first sample point feature sequence 1221 and a second sample point feature sequence 1222 are input together into the deep learning model 1201. The second sample point feature sequence is a sample tag of the first sample point feature sequence. The first sample point feature sequence 1221 and the second sample point feature sequence 1222 are processed through the deep learning model 1201 to obtain a third sample point feature sequence. The third sample point feature sequence 1223 is in one-to-one correspondence with the second sample point feature sequence 1222, resulting in a loss value 1224 based on the loss function. And determines whether the loss value 1224 converges, and if the loss value 1224 does not converge, updates the model parameters. The deep learning model 1201 is adjusted using the updated model parameters 1225 and the deep learning model 1201 is retrained until the loss values 1224 converge to obtain a trained deep learning model.
According to an embodiment of the disclosure, in a training process of the deep learning model, each sample point of the third sample point feature sequence has a corresponding relationship with the first sample point feature sequence and the second sample point feature sequence.
For example: may be in the first sample point feature sequence 1221To include the sample point M 1 Features of (1), sample points M 2 Is a characteristic of … sample point M n Is characterized by (3). Sample point M may be included in the second sample point feature sequence 1222 1 Tag features, sample points M 2 Tag feature of … sample point M n Is a label feature of (a).
According to an embodiment of the present disclosure, during the training of the deep learning model, the first sample point feature sequence 1221 is processed by the deep learning model 1201 to obtain the features of sample point T1 in the third sample point feature sequence. The first sample point feature sequence 1221 and the label features of sample point M1 in the second sample point feature sequence 1222 are then processed by the deep learning model 1201 to obtain the features of sample point T2 in the third sample point feature sequence. By analogy, when all sample point features of the first sample point feature sequence 1221 and the second sample point feature sequence 1222 have been processed using the deep learning model, a stop symbol is output. The number of sample points in the third sample point feature sequence 1223 is the same as the number of sample points in the first sample point feature sequence 1221 and the second sample point feature sequence 1222.
According to an embodiment of the present disclosure, deriving the loss value based on the loss function may be generating an average loss value from the loss values corresponding to the S third sample points. Alternatively, different weights may be configured for different third sample points, and the loss value may be obtained from the loss values corresponding to the S third sample points and the weights corresponding to those third sample points.
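For illustration only, a short sketch of the two aggregation options, assuming the per-point loss values for the S third sample points are already available as a tensor; the example values and weights are illustrative assumptions.

import torch

per_point_loss = torch.tensor([0.8, 0.5, 0.3, 0.9])  # loss values for S third sample points (example values)

# Option 1: plain average over the S third sample points
average_loss = per_point_loss.mean()

# Option 2: weighted combination, with a different weight configured per third sample point
weights = torch.tensor([0.1, 0.2, 0.3, 0.4])          # assumed weights, summing to 1
weighted_loss = (per_point_loss * weights).sum()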
According to the embodiment of the present disclosure, the loss function may be configured according to actual service requirements, which is not limited herein. For example, the loss function may include at least one of: cross entropy loss function, exponential loss function, and square loss function.
Fig. 13 schematically shows a block diagram of a map generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 13, the map generating apparatus 1300 includes: a first generation module 1301, an extraction module 1302, a first obtaining module 1303 and a second generation module 1304.
The first generation module 1301 is configured to generate a fusion feature map according to the image data and the point cloud data.
An extraction module 1302 is configured to extract a first point feature sequence of the fused feature map.
The first obtaining module 1303 is configured to process the first point feature sequence and the fusion feature map based on the attention policy, so as to obtain target element information.
The second generation module 1304 is configured to generate a map according to the target element information.
According to an embodiment of the present disclosure, the first generation module 1301 includes a first generation sub-module, a second generation sub-module, and a third generation sub-module.
A first generation sub-module for generating an image feature map according to a predetermined feature map, a first parameter and image features of the image data, the predetermined feature map comprising random features subject only to gaussian distribution, the first parameter characterizing device parameters used to acquire the image data.
And the second generation submodule is used for generating a point cloud characteristic diagram according to the preset characteristic diagram, the point cloud characteristics of the point cloud data and second parameters, and the second parameters characterize equipment parameters for acquiring the point cloud data.
And the third generation sub-module is used for carrying out fusion processing on the image feature map and the point cloud feature map to generate a fusion feature map.
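For illustration only, a minimal sketch of one possible fusion step, assuming both feature maps share the same spatial resolution; channel concatenation followed by a 1x1 convolution is an assumed choice, not the fusion operation prescribed by this disclosure.

import torch
from torch import nn

image_feat = torch.randn(1, 64, 128, 128)        # image feature map (assumed shape)
point_cloud_feat = torch.randn(1, 64, 128, 128)  # point cloud feature map (assumed shape)

fuse = nn.Conv2d(128, 64, kernel_size=1)          # mixes the concatenated channels
fused_feature_map = fuse(torch.cat([image_feat, point_cloud_feat], dim=1))  # (1, 64, 128, 128)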
According to an embodiment of the present disclosure, the extraction module 1302 includes: the device comprises a first obtaining sub-module, a second obtaining sub-module and a third obtaining sub-module.
The first obtaining sub-module is used for carrying out semantic segmentation processing on the fusion feature images to obtain semantic feature images.
And the second obtaining submodule is used for carrying out example segmentation processing on the semantic feature images to obtain example feature images.
And a third obtaining sub-module for extracting the first point feature sequence from the example feature map.
According to an embodiment of the present disclosure, the third obtaining submodule includes: the device comprises a first obtaining unit, an extracting unit and a first generating unit.
The first obtaining unit is used for processing the example feature map to obtain a discrete point set, wherein the discrete point set comprises a plurality of discrete points and feature information of the plurality of discrete points.
And the extraction unit is used for extracting a target discrete point sequence from the plurality of discrete points according to the characteristic information of the plurality of discrete points, wherein the target discrete point sequence comprises the plurality of target discrete points and the characteristic information of the plurality of target discrete points.
The first generation unit is used for generating a first point characteristic sequence according to the characteristic information of the plurality of target discrete points.
According to an embodiment of the present disclosure, the first obtaining unit includes a first obtaining subunit and a second obtaining subunit.
And the first obtaining subunit is used for carrying out erosion processing on the example feature map to obtain a sparse example feature map.
And the second obtaining subunit is used for carrying out pooling treatment on the sparse instance feature map to obtain a discrete point set.
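For illustration only, a minimal sketch of the erosion-then-pooling idea, assuming the example feature map is a single-channel array; the kernel size, pooling size and toy data are illustrative assumptions.

import numpy as np
from scipy import ndimage

instance_map = (np.random.rand(64, 64) > 0.6).astype(np.float32)  # toy example feature map

# Erosion thins the instance regions, yielding a sparse example feature map
sparse_map = ndimage.grey_erosion(instance_map, size=(3, 3))

# Max pooling over non-overlapping 4x4 blocks; surviving non-zero cells act as discrete points
h, w = sparse_map.shape
pooled = sparse_map.reshape(h // 4, 4, w // 4, 4).max(axis=(1, 3))
discrete_points = np.argwhere(pooled > 0)  # (row, col) indices forming the discrete point set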
According to an embodiment of the present disclosure, the extraction unit includes a first determination subunit, a third obtaining subunit, and an extraction subunit.
And the first determination subunit is used for determining a discrete point ordering strategy according to the characteristic information of the plurality of discrete points.
And the third obtaining subunit is used for forward ordering the plurality of discrete points according to the discrete point ordering strategy to obtain a first discrete point sequence.
An extraction subunit for extracting a target discrete point sequence from the first discrete point sequence based on the selection policy.
According to an embodiment of the disclosure, the extraction subunit is configured to perform sparse processing on the first discrete point sequence to obtain a first sparse point sequence, reversely order the first sparse point sequence according to the ordering strategy to obtain a second sparse point sequence, and perform the sparse processing on the second sparse point sequence to obtain the target discrete point sequence.
According to an embodiment of the present disclosure, the first discrete point sequence comprises I discrete points, I being an integer greater than 1. Performing sparse processing on the first discrete point sequence to obtain a first sparse point sequence includes the following steps: deleting, for the i-th discrete point of the I discrete points, k discrete points to obtain a second discrete point sequence, wherein the pixel distance between each of the k discrete points and the i-th discrete point meets a preset threshold range, i is greater than or equal to 1 and less than or equal to I, and k is an integer greater than or equal to 1; extracting the j-th discrete point from the second discrete point sequence as a sparse point and increasing i, wherein j is an integer greater than or equal to 1; returning to the operation of deleting k discrete points in the case that i is determined to be less than I; and obtaining the first sparse point sequence in the case that i is determined to be equal to I.
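For illustration only, a simplified sketch of this deletion loop in Python; the pixel-distance threshold and the choice of always extracting the first remaining point (j = 1) are assumptions made for the sake of a concrete example.

import numpy as np

def sparsify(points, min_dist=4.0):
    """Keeps a point, deletes all later points whose pixel distance to it is below the
    threshold, and repeats until every point has been visited."""
    remaining = [np.asarray(p, dtype=float) for p in points]
    sparse = []
    while remaining:
        current = remaining.pop(0)
        sparse.append(current)  # the extracted sparse point
        # delete the k discrete points whose pixel distance to the current point is within the threshold
        remaining = [p for p in remaining if np.linalg.norm(p - current) >= min_dist]
    return sparse

first_sparse_sequence = sparsify([(0, 0), (1, 1), (10, 0), (11, 1), (30, 30)])
# keeps (0, 0), (10, 0) and (30, 30); the two near-duplicates are deleted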
According to an embodiment of the present disclosure, the first point feature sequence includes feature information of M points, M is an integer greater than 1, the target element information includes feature information of M+N points, and N is an integer greater than 1. The first obtaining module comprises a fourth obtaining sub-module, a fourth generation sub-module and a fifth generation sub-module.
And a fourth obtaining sub-module, configured to process the first point feature sequence and the fusion feature map based on the attention policy to obtain target feature information of the (M+n)-th point, where n is an integer greater than or equal to 1 and less than or equal to N.
And the fourth generation sub-module is used for generating a second point feature sequence according to the target feature information of the (M+n)-th point and the first point feature sequence, and increasing n.
And a fifth generation sub-module, configured to, in the case where n is determined to be less than N, return to performing the processing operation based on the attention policy for the second point feature sequence and the fusion feature map; and, in the case where n is determined to be equal to N, generate the target element information from the target feature information of the M+N points.
According to an embodiment of the present disclosure, the fourth obtaining sub-module includes a second obtaining unit, a second generating unit, and a third obtaining unit.
And the second obtaining unit is used for processing the first point feature sequence based on the self-attention mechanism to obtain the first feature information of the (M+n)-th point.
And the second generation unit is used for generating a third point feature sequence according to the first point feature sequence and the first feature information of the (M+n)-th point.
And the third obtaining unit is used for processing the third point feature sequence and the fusion feature map based on the interaction attention mechanism to obtain the target feature information of the (M+n)-th feature point.
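For illustration only, a minimal sketch of one such decoding step, assuming PyTorch multi-head attention layers; the dimensions, the use of nn.MultiheadAttention, and taking the last output token as the (M+n)-th point are illustrative assumptions rather than the disclosed architecture.

import torch
from torch import nn

class DecodeStep(nn.Module):
    """One illustrative decoding step: self-attention over the current point feature
    sequence, then interaction (cross-) attention against the fused feature map."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, point_seq, fused_tokens):
        # point_seq: (1, M+n-1, dim) current point feature sequence
        # fused_tokens: (1, H*W, dim) flattened fusion feature map
        first_info, _ = self.self_attn(point_seq, point_seq, point_seq)
        next_point = first_info[:, -1:, :]                     # first feature information of the (M+n)-th point
        third_seq = torch.cat([point_seq, next_point], dim=1)  # third point feature sequence
        target_info, _ = self.cross_attn(third_seq, fused_tokens, fused_tokens)
        # sequence extended by the target feature information of the (M+n)-th point
        return torch.cat([point_seq, target_info[:, -1:, :]], dim=1)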
Fig. 14 schematically illustrates a block diagram of a deep learning model training apparatus according to an embodiment of the present disclosure.
As shown in fig. 14, the deep learning model training apparatus 1400 includes a second obtaining module 1401 and a third obtaining module 1402.
A second obtaining module 1401, configured to obtain a first sample point feature sequence and a second sample point feature sequence according to a sample image, where the first sample point feature sequence includes S first sample points, the second sample point feature sequence includes S second sample points, S is an integer greater than 1, the sample image represents a fusion feature map generated according to image data and point cloud data, the s-th second sample point of the second sample point feature sequence is the sample label of the s-th first sample point in the first sample point feature sequence, and s is an integer greater than or equal to 1 and less than or equal to S; and
a third obtaining module 1402 is configured to train the deep learning model using the first sample point feature sequence and the second sample point feature sequence to obtain a trained deep learning model.
According to an embodiment of the present disclosure, the third obtaining module 1402 includes a fifth obtaining sub-module, a sixth obtaining sub-module, and a seventh obtaining sub-module.
And a fifth obtaining sub-module, configured to process the first sample point feature sequence and the second sample point feature sequence based on the attention policy, to obtain a third sample point feature sequence, where the third sample point feature sequence includes S third sample points.
And a sixth obtaining submodule, configured to generate an s-th loss value according to the s-th third sample point of the third sample point feature sequence and the s-th second sample point of the second sample point feature sequence based on the loss function.
And a seventh obtaining sub-module, configured to adjust model parameters of the deep learning model according to the loss values corresponding to the S third sample points, to obtain a trained deep learning model.
According to an embodiment of the present disclosure, the seventh obtaining submodule includes a third generating unit and an adjusting unit.
And a third generation unit for generating an average loss value according to the loss values corresponding to the S third sample points.
And the adjusting unit is used for adjusting the model parameters of the deep learning model according to the average loss value.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the present disclosure, a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
Fig. 15 illustrates a schematic block diagram of an example electronic device 1500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 15, the device 1500 includes a computing unit 1501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1502 or a computer program loaded from a storage unit 1508 into a Random Access Memory (RAM) 1503. In the RAM 1503, various programs and data required for the operation of the device 1500 may also be stored. The computing unit 1501, the ROM 1502, and the RAM 1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
Various components in device 1500 are connected to I/O interface 1505, including: an input unit 1506 such as a keyboard, mouse, etc.; an output unit 1507 such as various types of displays, speakers, and the like; a storage unit 1508 such as a magnetic disk, an optical disk, or the like; and a communication unit 1509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1509 allows the device 1500 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1501 performs the respective methods and processes described above, such as a map generation method or a training method of a deep learning model. For example, in some embodiments, the map generation method or training method of the deep learning model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1500 via the ROM 1502 and/or the communication unit 1509. When the computer program is loaded into the RAM 1503 and executed by the computing unit 1501, one or more steps of the map generation method or the training method of the deep learning model described above may be performed. Alternatively, in other embodiments, the computing unit 1501 may be configured to perform the map generation method or the training method of the deep learning model by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (29)

1. A map generation method, comprising:
generating a fusion feature map according to the image data and the point cloud data;
extracting a first point feature sequence in the fusion feature map;
processing the first point feature sequence and the fusion feature map based on an attention strategy to obtain target element information; and
and generating a map according to the target element information.
2. The method of claim 1, wherein the generating a fused feature map from the image data and the point cloud data comprises:
Generating an image feature map from a predetermined feature map, a first parameter and image features of the image data, the predetermined feature map comprising random features subject only to gaussian distribution, the first parameter characterizing device parameters used to acquire the image data;
generating a point cloud feature map according to the predetermined feature map, the point cloud features of the point cloud data and a second parameter, wherein the second parameter characterizes equipment parameters for acquiring the point cloud data; and
and carrying out fusion processing on the image feature map and the point cloud feature map to generate the fusion feature map.
3. The method of claim 1, wherein the extracting a first point feature sequence in the fused feature map comprises:
carrying out semantic segmentation processing on the fusion feature map to obtain a semantic feature map;
performing instance segmentation processing on the semantic feature map to obtain an instance feature map; and
the first point feature sequence is extracted from the example feature map.
4. A method according to claim 3, wherein said extracting said first point feature sequence from said example feature map comprises:
processing the example feature map to obtain a discrete point set, wherein the discrete point set comprises a plurality of discrete points and feature information of the plurality of discrete points;
Extracting a target discrete point sequence from a plurality of discrete points according to the characteristic information of the plurality of discrete points, wherein the target discrete point sequence comprises a plurality of target discrete points and the characteristic information of the plurality of target discrete points; and
and generating a first point characteristic sequence according to the characteristic information of the target discrete points.
5. The method of claim 4, wherein the processing the example feature map to obtain a set of discrete points comprises:
carrying out erosion processing on the example feature map to obtain a sparse example feature map; and
and carrying out pooling treatment on the sparse example feature map to obtain the discrete point set.
6. The method of claim 4, wherein extracting a target discrete point sequence from a plurality of the discrete points according to the characteristic information of the discrete points comprises:
determining a discrete point ordering strategy according to the characteristic information of the plurality of discrete points;
forward ordering a plurality of discrete points according to the discrete point ordering strategy to obtain a first discrete point sequence; and
a target discrete point sequence is extracted from the first discrete point sequence based on a selection policy.
7. The method of claim 6, wherein the extracting the target discrete point sequence from the first discrete point sequence based on the selection policy comprises:
Performing sparse processing on the first discrete point sequence to obtain a first sparse point sequence;
according to the sorting strategy, the first sparse point sequence is reversely sorted to obtain a second sparse point sequence; and
and carrying out the sparse processing on the second sparse point sequence to obtain the target discrete point sequence.
8. The method of claim 7, wherein the first sequence of discrete points comprises I discrete points, I being an integer greater than 1; the sparse processing is performed on the first discrete point sequence to obtain a first sparse point sequence, which comprises the following steps:
deleting, for the i-th discrete point in the I discrete points, k discrete points to obtain a second discrete point sequence, wherein the pixel distance between each discrete point in the k discrete points and the i-th discrete point meets a preset threshold range, i is greater than or equal to 1 and less than or equal to I, and k is an integer greater than or equal to 1;
extracting a j-th discrete point from the second discrete point sequence as a sparse point, and increasing i, wherein j is an integer greater than or equal to 1; and
returning to the operation of deleting k discrete points in the case that i is determined to be smaller than I; and under the condition that i is determined to be equal to I, obtaining the first sparse point sequence.
9. The method of claim 1, wherein the first point feature sequence includes feature information of M points, M is an integer greater than 1, the target element information includes feature information of M+N points, and N is an integer greater than 1; the processing the first point feature sequence and the fusion feature map based on the attention policy to obtain target element information includes:
processing the first point feature sequence and the fusion feature map based on the attention strategy to obtain target feature information of the (M+n)-th point, wherein n is an integer greater than or equal to 1 and less than or equal to N;
generating a second point feature sequence according to the target feature information of the (M+n)-th point and the first point feature sequence, and increasing n;
in the case where n is determined to be less than N, returning to performing the processing operation based on the attention policy for the second point feature sequence and the fused feature map; and
under the condition that n is determined to be equal to N, generating the target element information according to the target feature information of the M+N points.
10. The method of claim 9, wherein the processing the first point feature sequence and the fused feature map based on the attention policy to obtain target feature information of the (M+n)-th point includes:
processing the first point feature sequence based on a self-attention mechanism to obtain first feature information of the (M+n)-th point;
generating a third point feature sequence according to the first point feature sequence and the first feature information of the (M+n)-th point; and
processing the third point feature sequence and the fusion feature map based on an interaction attention mechanism to obtain the target feature information of the (M+n)-th feature point.
11. A training method of a deep learning model, comprising:
obtaining a first sample point feature sequence and a second sample point feature sequence according to a sample image, wherein the first sample point feature sequence comprises S first sample points, the second sample point feature sequence comprises S second sample points, S is an integer greater than 1, the sample image represents a fusion feature map generated according to image data and point cloud data, the s-th second sample point in the second sample point feature sequence is the sample label of the s-th first sample point in the first sample point feature sequence, and s is an integer greater than or equal to 1 and less than or equal to S; and
and training a deep learning model by using the first sample point characteristic sequence and the second sample point characteristic sequence to obtain a trained deep learning model.
12. The method of claim 11, wherein the training a deep learning model using the first sample point feature sequence and the second sample point feature sequence results in a trained deep learning model, comprising:
processing the first sample point feature sequence and the second sample point feature sequence based on an attention strategy to obtain a third sample point feature sequence, wherein the third sample point feature sequence comprises S third sample points;
generating an s-th loss value according to an s-th third sample point in the third sample point feature sequence and an s-th second sample point in the second sample point feature sequence based on a loss function; and
and adjusting model parameters of the deep learning model according to the loss values corresponding to the S third sample points to obtain the trained deep learning model.
13. The method of claim 12, wherein the adjusting model parameters of the deep learning model according to the loss values corresponding to the S third sample points comprises:
generating an average loss value according to the loss values corresponding to the S third sample points; and
and adjusting model parameters of the deep learning model according to the average loss value.
14. A map generation apparatus comprising:
the first generation module is used for generating a fusion feature map according to the image data and the point cloud data;
the extraction module is used for extracting a first point feature sequence of the fusion feature map;
the first obtaining module is used for processing the first point feature sequence and the fusion feature map based on the attention strategy to obtain target element information; and
and the second generation module is used for generating a map according to the target element information.
15. The apparatus of claim 14, wherein the first generation module comprises:
a first generation sub-module for generating an image feature map from a predetermined feature map, a first parameter, and image features of the image data, the predetermined feature map comprising random features subject only to gaussian distribution, the first parameter characterizing device parameters for acquiring the image data;
the second generation sub-module is used for generating a point cloud characteristic diagram according to the preset characteristic diagram, the point cloud characteristics of the point cloud data and second parameters, and the second parameters characterize equipment parameters for acquiring the point cloud data; and
and the third generation sub-module is used for carrying out fusion processing on the image feature map and the point cloud feature map to generate the fusion feature map.
16. The apparatus of claim 14, wherein the extraction module comprises:
the first obtaining submodule is used for carrying out semantic segmentation processing on the fusion feature images to obtain semantic feature images;
the second obtaining submodule is used for carrying out example segmentation processing on the semantic feature images to obtain example feature images; and
and a third obtaining sub-module, configured to extract the first point feature sequence from the example feature map.
17. The apparatus of claim 16, the third obtaining submodule comprising:
the first obtaining unit is used for processing the example feature map to obtain a discrete point set, wherein the discrete point set comprises a plurality of discrete points and feature information of the plurality of discrete points;
an extracting unit, configured to extract a target discrete point sequence from a plurality of discrete points according to feature information of the plurality of discrete points, where the target discrete point sequence includes a plurality of target discrete points and feature information of the plurality of target discrete points; and
the first generation unit is used for generating a first point characteristic sequence according to the characteristic information of the target discrete points.
18. The apparatus of claim 16, the first obtaining unit comprising:
The first obtaining subunit is used for carrying out erosion processing on the example feature map to obtain a sparse example feature map; and
and the second obtaining subunit is used for carrying out pooling treatment on the sparse instance feature map to obtain the discrete point set.
19. The apparatus of claim 16, the first extraction unit comprising:
a first determining subunit, configured to determine a discrete point ordering policy according to the feature information of the plurality of discrete points;
the third obtaining subunit is configured to forward sort the plurality of discrete points according to the discrete point sorting strategy to obtain a first discrete point sequence; and
an extraction subunit, configured to extract a target discrete point sequence from the first discrete point sequence based on a selection policy.
20. The apparatus of claim 17, the extraction subunit being configured to:
performing sparse processing on the first discrete point sequence to obtain a first sparse point sequence;
according to the sorting strategy, the first sparse point sequence is reversely sorted to obtain a second sparse point sequence; and
and carrying out the sparse processing on the second sparse point sequence to obtain the target discrete point sequence.
21. The device of claim 20, the first sequence of discrete points comprising I discrete points, I being an integer greater than 1; the sparse processing is performed on the first discrete point sequence to obtain a first sparse point sequence, which comprises the following steps:
deleting, for the i-th discrete point of the I discrete points, k discrete points to obtain a second discrete point sequence, wherein the pixel distance between each discrete point of the k discrete points and the i-th discrete point meets a preset threshold range, i is greater than or equal to 1 and less than or equal to I, and k is an integer greater than or equal to 1;
extracting a j-th discrete point from the second discrete point sequence as a sparse point, and increasing i, wherein j is an integer greater than or equal to 1; and
returning to the operation of deleting k discrete points in the case that i is determined to be smaller than I; and under the condition that i is determined to be equal to I, obtaining the first sparse point sequence.
22. The apparatus of claim 14, the first point feature sequence comprising feature information of M points, M being an integer greater than 1, the target element information comprising feature information of M+N points, N being an integer greater than 1; the first obtaining module includes:
a fourth obtaining sub-module, configured to process the first point feature sequence and the fusion feature map based on the attention policy, to obtain target feature information of the (M+n)-th point, where n is an integer greater than or equal to 1 and less than or equal to N;
a fourth generation sub-module, configured to generate a second point feature sequence according to the target feature information of the (M+n)-th point and the first point feature sequence, and increment n;
a fifth generation sub-module, configured to, in the case where n is determined to be less than N, return to performing the processing operation based on the attention policy for the second point feature sequence and the fusion feature map; and, under the condition that n is equal to N, generate the target element information according to the target feature information of the M+N points.
23. The apparatus of claim 22, wherein the fourth obtaining submodule comprises:
the second obtaining unit is used for processing the first point feature sequence based on a self-attention mechanism to obtain first feature information of the (M+n)-th point;
the second generating unit is used for generating a third point feature sequence according to the first point feature sequence and the first feature information of the (M+n)-th point; and
the third obtaining unit is used for processing the third point feature sequence and the fusion feature map based on an interaction attention mechanism to obtain the target feature information of the (M+n)-th feature point.
24. A training device for a deep learning model, comprising:
The second obtaining module is used for obtaining a first sample point feature sequence and a second sample point feature sequence according to a sample image, wherein the first sample point feature sequence comprises S first sample points, the second sample point feature sequence comprises S second sample points, S is an integer greater than 1, the sample image represents a fusion feature map generated according to image data and point cloud data, the s-th second sample point of the second sample point feature sequence is the sample label of the s-th first sample point in the first sample point feature sequence, and s is an integer greater than or equal to 1 and less than or equal to S; and
and the third obtaining module is used for training the deep learning model by utilizing the first sample point characteristic sequence and the second sample point characteristic sequence to obtain a trained deep learning model.
25. The apparatus of claim 24, wherein the third obtaining module comprises:
a fifth obtaining sub-module, configured to process the first sample point feature sequence and the second sample point feature sequence based on an attention policy, to obtain a third sample point feature sequence, where the third sample point feature sequence includes S third sample points;
A sixth obtaining submodule, configured to generate an s-th loss value according to an s-th third sample point of the third sample point feature sequence and an s-th second sample point of the second sample point feature sequence based on a loss function; and
and a seventh obtaining sub-module, configured to adjust model parameters of the deep learning model according to the loss values corresponding to the S third sample points, so as to obtain the trained deep learning model.
26. The apparatus of claim 25, wherein the seventh obtaining submodule comprises:
a third generating unit, configured to generate an average loss value according to the loss values corresponding to the S third sample points; and
and the adjusting unit is used for adjusting the model parameters of the deep learning model according to the average loss value.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10 or 11-13.
28. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-10 or 11-13.
29. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-10 or 11-13.
CN202211532936.XA 2022-11-29 2022-11-29 Map generation method, training device, electronic equipment and storage medium Pending CN116030206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211532936.XA CN116030206A (en) 2022-11-29 2022-11-29 Map generation method, training device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116030206A true CN116030206A (en) 2023-04-28

Family

ID=86074945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211532936.XA Pending CN116030206A (en) 2022-11-29 2022-11-29 Map generation method, training device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116030206A (en)

Similar Documents

Publication Publication Date Title
US20190156144A1 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
KR20200015611A (en) Training methods and devices, electronic devices, and storage media for semantic split models
CN110622177B (en) Instance partitioning
US10346996B2 (en) Image depth inference from semantic labels
CN110632608B (en) Target detection method and device based on laser point cloud
US11875424B2 (en) Point cloud data processing method and device, computer device, and storage medium
US11657291B2 (en) Spatio-temporal embeddings
US20220156483A1 (en) Efficient three-dimensional object detection from point clouds
WO2022236824A1 (en) Target detection network construction optimization method, apparatus and device, and medium and product
CN113361710B (en) Student model training method, picture processing device and electronic equipment
CN113724388B (en) High-precision map generation method, device, equipment and storage medium
CN113902061A (en) Point cloud completion method and device
CN113781493A (en) Image processing method, image processing apparatus, electronic device, medium, and computer program product
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN116188893A (en) Image detection model training and target detection method and device based on BEV
CN110633717A (en) Training method and device for target detection model
CN110633716A (en) Target object detection method and device
US11734799B2 (en) Point cloud feature enhancement and apparatus, computer device and storage medium
US20220351495A1 (en) Method for matching image feature point, electronic device and storage medium
CN114511862B (en) Form identification method and device and electronic equipment
CN116030206A (en) Map generation method, training device, electronic equipment and storage medium
CN115861755A (en) Feature fusion method and device, electronic equipment and automatic driving vehicle
CN114266879A (en) Three-dimensional data enhancement method, model training detection method, three-dimensional data enhancement equipment and automatic driving vehicle
CN116168442B (en) Sample image generation method, model training method and target detection method
CN115829898B (en) Data processing method, device, electronic equipment, medium and automatic driving vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination