CN114674328B - Map generation method, map generation device, electronic device, storage medium, and vehicle - Google Patents


Info

Publication number
CN114674328B
CN114674328B
Authority
CN
China
Prior art keywords
images
target
local feature
local
feature point
Prior art date
Legal status
Active
Application number
CN202210352745.9A
Other languages
Chinese (zh)
Other versions
CN114674328A (en)
Inventor
陈术义
王玉斌
湛波
曾清喻
李友浩
常松涛
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210352745.9A priority Critical patent/CN114674328B/en
Priority to CN202310279874.4A priority patent/CN116295466A/en
Publication of CN114674328A publication Critical patent/CN114674328A/en
Application granted granted Critical
Publication of CN114674328B publication Critical patent/CN114674328B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30: Map- or contour-matching
    • G01C21/32: Structuring or formatting of map data
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38: Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804: Creation or updating of map data
    • G01C21/3807: Creation or updating of map data characterised by the type of data
    • G01C21/3815: Road data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a map generation method, a map generation device, an electronic device, a storage medium, a program product, and an autonomous driving vehicle, and relates to the technical field of computer vision, in particular to the technical field of autonomous driving. The specific implementation scheme is as follows: determining a local feature point descriptor set of each of a plurality of images; for each image in the plurality of images, performing aggregation processing on the local feature point descriptor set of the image to determine a global descriptor of each image; calculating the global matching degree between any two images in the plurality of images based on the respective global descriptors of the plurality of images to obtain a plurality of global matching degrees; and determining a plurality of target images from the plurality of images based on the plurality of global matching degrees, wherein the plurality of target images are starting images for generating the target map.

Description

Map generation method, map generation device, electronic device, storage medium, and vehicle
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to the field of autonomous driving technologies, and more particularly, to a map generation method, apparatus, electronic device, storage medium, program product, and autonomous driving vehicle.
Background
Autonomous driving technology based on high-precision maps plays an important role in realizing safe and reliable operation of autonomous vehicles. High-precision maps are important for high-precision positioning, environment perception, path planning, and simulation experiments. Techniques for generating high-precision maps, which demand both high accuracy and wide coverage, are receiving increasing attention.
Disclosure of Invention
The present disclosure provides a map generation method, apparatus, electronic device, storage medium, program product, and autonomous vehicle.
According to an aspect of the present disclosure, there is provided a map generation method including: determining a local feature point descriptor set of each of a plurality of images; for each image in the plurality of images, performing aggregation processing on the local feature point descriptor set of the image to determine a global descriptor of each image; calculating the global matching degree between any two images in the plurality of images based on the respective global descriptors of the plurality of images to obtain a plurality of global matching degrees; and determining a plurality of target images from the plurality of images based on the plurality of global matching degrees, wherein the plurality of target images are starting images used for generating a target map.
According to another aspect of the present disclosure, there is provided a map generation apparatus including: a first determining module, configured to determine a local feature point descriptor set of each of a plurality of images; an aggregation module, configured to perform, for each image in the plurality of images, aggregation processing on the local feature point descriptor set of the image to determine a global descriptor of each image; a second determining module, configured to calculate the global matching degree between any two images in the plurality of images based on the respective global descriptors of the plurality of images to obtain a plurality of global matching degrees; and a third determining module, configured to determine a plurality of target images from the plurality of images based on the plurality of global matching degrees, where the plurality of target images are starting images used for generating a target map.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform a method as disclosed herein.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method as disclosed herein.
According to another aspect of the present disclosure, there is provided an autonomous vehicle comprising an electronic device as disclosed herein.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the map generation methods and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow diagram of a map generation method according to an embodiment of the disclosure;
FIG. 3 schematically shows a schematic diagram of determining a plurality of target images according to an embodiment of the present disclosure;
FIG. 4 schematically shows a flowchart of determining a target local feature point pair according to an embodiment of the present disclosure;
FIG. 5 schematically shows a block diagram of a map generation apparatus according to an embodiment of the present disclosure; and
FIG. 6 schematically shows a block diagram of an electronic device adapted to implement a map generation method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a map generation method, apparatus, electronic device, storage medium, program product, and autonomous vehicle.
According to an embodiment of the present disclosure, there is provided a map generation method, which may include: determining a local feature point descriptor set of each of a plurality of images; for each image in the plurality of images, performing aggregation processing on the local feature point descriptor set of the image and determining the respective global descriptors of the plurality of images; calculating the global matching degree between any two images in the plurality of images based on the respective global descriptors of the plurality of images to obtain a plurality of global matching degrees; and determining a plurality of target images from the plurality of images based on the plurality of global matching degrees, wherein the plurality of target images are starting images used for generating the target map.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information involved all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
Fig. 1 schematically illustrates an exemplary system architecture to which the map generation method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include sensors 101, 102, 103, a network 104, and a server 105. Network 104 is used to provide a medium for communication links between sensors 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The sensors 101, 102, 103 may interact with a server 105 over a network 104 to receive or send messages or the like.
The sensors 101, 102, 103 may be functional elements integrated on the autonomous vehicle 106, such as infrared sensors, ultrasonic sensors, millimeter wave radar, information acquisition devices, and the like. The sensors 101, 102, 103 may be used to collect environmental information around the autonomous vehicle 106 as well as surrounding road information.
The server 105 may be integrated on the autonomous vehicle 106, but is not limited thereto; it may also be disposed at a remote end capable of establishing communication with the vehicle-mounted terminal, and may be implemented as a distributed server cluster composed of a plurality of servers or as a single server.
The server 105 may be a server that provides various services. For example, a map navigation application, a map generation application, and the like may be installed on the server 105. Taking the server 105 running the map generation application as an example: it receives the images transmitted from the sensors 101, 102, 103 over the network 104; determines a local feature point descriptor set of each of the plurality of images; for each image in the plurality of images, performs aggregation processing on the local feature point descriptor set of the image and determines the respective global descriptors of the plurality of images; calculates the global matching degree between any two images in the plurality of images based on the respective global descriptors to obtain a plurality of global matching degrees; and determines a plurality of target images from the plurality of images based on the plurality of global matching degrees, so as to generate a target map with the plurality of target images as starting images.
It should be noted that the map generation method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the map generation device provided by the embodiment of the present disclosure may also be disposed in the server 105.
It should be understood that the number of sensors, networks, and servers in fig. 1 is merely illustrative. There may be any number of sensors, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 2 schematically shows a flow chart of a map generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, a local feature point descriptor set of each of the plurality of images is determined.
In operation S220, for each of the plurality of images, an aggregation process is performed on the local feature point descriptor set of the image, and a global descriptor of each of the plurality of images is determined.
In operation S230, a global matching degree between any two images of the plurality of images is calculated based on the respective global descriptors of the plurality of images, resulting in a plurality of global matching degrees.
In operation S240, a plurality of target images are determined from the plurality of images based on the plurality of global matching degrees, wherein the plurality of target images are starting images for generating a target map.
According to an embodiment of the present disclosure, the local feature point descriptor set may refer to a set of a plurality of local feature point descriptors in one-to-one correspondence with a plurality of local feature points. Each of the local feature point descriptors may be a feature point descriptor with scale or rotation invariance. Each local feature point descriptor may be extracted from the image with a rule-based algorithm such as SIFT (Scale-Invariant Feature Transform) or ORB (Oriented FAST and Rotated BRIEF), but is not limited thereto; local feature point descriptors may also be extracted from the image with a deep learning model, for example SuperPoint or ASLFeat (Learning Local Features of Accurate Shape and Localization).
According to an optional embodiment of the disclosure, local features can be extracted from the image using a deep learning model to obtain the local feature point descriptors. This approach improves the scene adaptability of the local feature point descriptors and thus broadens the application range of the map generation method provided by the embodiments of the present disclosure.
According to an embodiment of the present disclosure, for each of the plurality of images, performing aggregation processing on the local feature point descriptor set of the image and determining the global descriptor of each of the plurality of images may include: for each image in the plurality of images, performing aggregation processing on the plurality of local feature point descriptors in the local feature point descriptor set of the image, so as to obtain a plurality of global descriptors in one-to-one correspondence with the plurality of images.
According to other embodiments of the present disclosure, a global descriptor may be extracted from an image using a global feature extraction algorithm. For example, global descriptors in an image are extracted by DELF (DEep Local Features).
According to the embodiment of the disclosure, compared with extracting the global descriptor from the image with a global feature extraction algorithm, determining the global descriptor from the local feature point descriptors of the image builds the global descriptor on the basis of the local feature point descriptor set, which improves the calculation speed and also improves the extraction precision of the global descriptor.
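As a concrete illustration of the aggregation step, the following is a minimal sketch of hard-assignment VLAD aggregation in Python/NumPy. It assumes descriptors and a small pre-trained codebook of visual words are already available as arrays; all names are illustrative, and learned variants such as NetVLAD use soft assignment instead of the hard assignment shown here:

```python
import numpy as np

def vlad_aggregate(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Aggregate N local descriptors (N x D) into one global descriptor
    using hard-assignment VLAD with a K x D codebook of visual words."""
    # Assign each descriptor to its nearest codebook centre.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    K, D = codebook.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assignments == k]
        if len(members):
            # Accumulate residuals to the assigned centre.
            vlad[k] = (members - codebook[k]).sum(axis=0)
    vlad = vlad.ravel()
    # Power- and L2-normalisation, as is common for VLAD descriptors.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

The resulting fixed-length vector can be compared across images regardless of how many local feature points each image has.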
According to the embodiment of the disclosure, the target image can be determined from the plurality of images based on the global matching degree between any two images in the plurality of images. The global matching degree may refer to vector similarity, and may be calculated based on respective global descriptors of any two images. The global matching degree may include cosine similarity, euclidean distance, or manhattan distance, and so on, which are not described herein again.
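As one option among those listed above, the global matching degree can be computed as the cosine similarity between two global descriptors; a minimal sketch with illustrative names:

```python
import numpy as np

def global_matching_degree(g1: np.ndarray, g2: np.ndarray) -> float:
    """Cosine similarity between two global descriptors. This is one of the
    options the text lists; Euclidean or Manhattan distance could be
    substituted with only the ranking direction changing."""
    return float(np.dot(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2)))
```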
According to an embodiment of the present disclosure, the target image may refer to a starting image used to generate the target map. The target image may be an initialized positioning image for constructing the target map. The number of target images is not limited; it may be, for example, 2, but is not limited thereto, and may also be 3, 4, or 5, set according to the actual situation.
According to the embodiment of the disclosure, the image information shared by the plurality of target images is clearer and more definite than that of the other images among the plurality of images, so constructing the map from the image information in the plurality of target images makes the generated target map more consistent with the actual environment it expresses and accelerates the construction of the co-visibility relations among associated frames of the target map.
According to other embodiments of the present disclosure, a target image may instead be determined from the plurality of images based on the local feature point descriptors in the local feature point descriptor set, and that target image used as a starting image of the target map.
According to the embodiment of the disclosure, compared with determining the target images directly from the local feature point descriptors, determining the target images from the global descriptors by comparing global matching degrees is computationally simple and accurate, supports mapping from unordered images, and further improves the generation speed of the target map.
In summary, the global descriptors are used to determine the global matching degree between any two images, and the global matching degrees are used to determine the target images from the plurality of images; this enables target map construction from a plurality of unordered images and speeds up the search for associated frames. Meanwhile, the global descriptor is determined on the basis of the local feature point descriptor set, which improves both the calculation speed and the calculation precision of the global descriptor.
Fig. 3 schematically shows a schematic diagram of determining a plurality of target images according to an embodiment of the present disclosure.
As shown in fig. 3, the unordered plurality of images may include an image P_A, an image P_B, an image P_C, and an image P_D. The local feature point descriptor set of any one of the plurality of images may include the local feature point confidence, local feature point position information, and local feature point descriptor of each of M local feature points. M is an integer greater than or equal to 2. The numbers of local feature points of any two images in the plurality of images may be the same or different, which will not be described again here.
As shown in fig. 3, taking the image P_A as an example, the image P_A includes 6 local feature points, namely local feature points p1, p2, p3, p4, p5, and p6. The local feature points are in one-to-one correspondence with the local feature point confidences. A confidence threshold may be predetermined; local feature points whose confidence is greater than the confidence threshold are used as target local feature points, and local feature points whose confidence is less than or equal to the confidence threshold are deleted. As shown in fig. 3, the confidences of the local feature points p1 and p2 are smaller than the confidence threshold, and the confidences of the local feature points p3, p4, p5, and p6 are larger than it, so p1 and p2 may be deleted, and p3, p4, p5, and p6 may be retained as target local feature points. An image P_A' retaining the target local feature points p3, p4, p5, and p6 is thereby obtained. Similarly, the confidence threshold can be used to screen the images P_B, P_C, and P_D, that is, to filter out the local feature points whose confidences are less than the confidence threshold, so as to obtain images P_B', P_C', and P_D'.
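The confidence-threshold screening that reduces P_A to P_A' above can be sketched as follows; this is an illustrative NumPy snippet with hypothetical names, not code from the patent:

```python
import numpy as np

def filter_by_confidence(positions, confidences, descriptors, threshold):
    """Keep only the local feature points whose confidence exceeds the
    threshold, returning their positions and descriptors (the retained
    points correspond to the target local feature points p3..p6)."""
    keep = np.asarray(confidences) > threshold
    return np.asarray(positions)[keep], np.asarray(descriptors)[keep]
```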
According to the embodiment of the disclosure, the M local feature points in the image can be screened based on the confidence threshold, and Q target local feature points in the image are reserved. The local feature point descriptors of the Q target local feature points and the position information of the Q target local feature points, for example, two-dimensional coordinate information (x, y), may be input into the local aggregation vector model, so as to obtain a global descriptor of the image.
According to an embodiment of the present disclosure, the local aggregation vector model may include VLAD (Vector of Locally Aggregated Descriptors), but is not limited thereto; it may also include a BoF (bag-of-features) model, an attention mechanism model, or an improved model based on VLAD, such as NetVLAD.
According to the embodiment of the disclosure, the local aggregation vector model processes the position information and the local feature point descriptors of the plurality of target local feature points, and the attention mechanism in the local aggregation vector model can aggregate the plurality of target local feature points, thereby improving the precision of the global descriptor.
As shown in fig. 3, the global matching degree between any two images of the plurality of images may be determined based on the respective global descriptors of the two images. This yields the global matching degrees between the image pairs (P_A', P_B'), (P_A', P_C'), (P_A', P_D'), (P_B', P_C'), (P_B', P_D'), and (P_C', P_D'). These global matching degrees are then sorted, for example from high to low, to obtain a ranking result, and the plurality of target images are determined from the plurality of images based on the ranking result. For example, the images P_B' and P_C' with the highest global matching degree are used as the target images.
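The pairwise ranking and selection of starting images described above can be sketched as follows, again assuming cosine similarity as the global matching degree (one of the options the text lists):

```python
import numpy as np
from itertools import combinations

def best_image_pair(global_descriptors):
    """Score every pair of images by the cosine similarity of their global
    descriptors, rank the pairs from high to low, and return the index
    pair with the highest global matching degree as candidate starting
    images for the target map."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [((i, j), cos(global_descriptors[i], global_descriptors[j]))
              for i, j in combinations(range(len(global_descriptors)), 2)]
    scored.sort(key=lambda t: t[1], reverse=True)  # ranking result
    return scored[0][0]
```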
According to the embodiment of the disclosure, a confidence threshold can be used to screen the plurality of local feature points in an image: local feature points with low confidence are filtered out, and target local feature points with high confidence are retained, which improves the accuracy of determining the global descriptor while keeping its calculation fast. Furthermore, the index positions of the target map can be quickly locked, and the plurality of target images serve as the starting images of the target map.
Fig. 4 schematically shows a flow chart of determining a target local feature point pair according to an embodiment of the present disclosure.
As shown in fig. 4, determining the target local feature point pair may include operations S410 to S450.
In operation S410, for any two images of the plurality of images, N local feature point pairs and N local matching degrees are determined from the two images based on a plurality of local feature point descriptors of the respective two images.
According to an embodiment of the present disclosure, N is an integer greater than or equal to 2. The N local feature point pairs correspond to the N local matching degrees one by one. The local feature point descriptor set includes local feature point descriptors of the M local feature points. M is an integer greater than or equal to 2. The number of the respective local feature points of any two images in the plurality of images may be the same or different, and is not described herein again.
According to an embodiment of the present disclosure, determining N local feature point pairs and N local matching degrees from the two images based on the respective local feature point descriptors of the two images may include: calculating the vector similarity between two local feature point descriptors to determine the local matching degree between them. The vector similarity may include cosine similarity, Euclidean distance, or Manhattan distance, which will not be described again here.
According to an embodiment of the present disclosure, a plurality of initial local feature point pairs may be determined from the two images based on the vector similarity. For these initial local feature point pairs, the fundamental matrix between the two images is computed using the epipolar geometry principle, and N local feature point pairs are selected from the initial pairs through geometric verification (e.g., RANSAC). The N initial local matching degrees in one-to-one correspondence with the N local feature point pairs may be normalized to obtain N local matching degrees.
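A simplified sketch of the local matching step: mutual nearest-neighbour matching by descriptor distance, with distances converted to matching degrees in [0, 1]. The fundamental-matrix estimation and RANSAC geometric verification described above are omitted for brevity, and the normalisation shown is one illustrative choice:

```python
import numpy as np

def match_local_descriptors(desc_a: np.ndarray, desc_b: np.ndarray):
    """Mutual nearest-neighbour matching of two descriptor sets
    (N_a x D and N_b x D) by Euclidean distance. Returns index pairs
    (initial local feature point pairs) and matching degrees in [0, 1]."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = d.argmin(axis=1)            # best match in B for each point of A
    nn_ba = d.argmin(axis=0)            # best match in A for each point of B
    pairs, dists = [], []
    for i, j in enumerate(nn_ab):
        if nn_ba[j] == i:               # keep only mutual matches
            pairs.append((i, int(j)))
            dists.append(d[i, j])
    # Convert distances to similarities normalised to (0, 1].
    degrees = 1.0 / (1.0 + np.asarray(dists))
    return pairs, degrees
```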
According to other embodiments of the present disclosure, the image sequence may be generated according to the sorting results of the plurality of global matching degrees arranged from high to low, and the local feature point pairs between two adjacent images in the image sequence may be calculated, so as to reduce the data processing amount.
In operation S420, N initial target local matching degrees are obtained based on the global matching degree between the two images and the respective local matching degrees of the N local feature point pairs of the two images.
According to an embodiment of the present disclosure, obtaining N initial target local matching degrees based on the global matching degree between the two images and the respective local matching degrees of the N local feature point pairs of the two images may include: for each of the N local feature point pairs, using the global matching degree as a weight and weighting the local matching degree to obtain an initial target local matching degree. But it is not limited thereto. A conversion weight between the two images may also be determined from the global matching degree according to a predetermined conversion rule, and the local matching degree weighted by this conversion weight to obtain the initial target local matching degree. For example, based on the global matching degree, a value between 0 and 1 is determined as the conversion weight according to the predetermined conversion rule.
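Operations S420 through S440 (weighting the local matching degrees and thresholding them) can be sketched as follows, using the global matching degree directly as the weight (the first of the two options described):

```python
def target_local_degrees(local_degrees, global_degree, threshold):
    """Weight each local matching degree by the images' global matching
    degree to get initial target local matching degrees, then keep only
    those above the predetermined threshold, returning (index, degree)
    tuples for the surviving target local feature point pairs."""
    weighted = [global_degree * d for d in local_degrees]
    return [(i, w) for i, w in enumerate(weighted) if w > threshold]
```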
According to the embodiment of the disclosure, the global matching degree of the global descriptor is weighted on the local matching degree, so that the local feature point pair can be determined by combining the global feature information and the local feature information, and the determination is more accurate and effective.
In operation S430, it is determined whether the initial target local matching degree is greater than a predetermined local matching degree threshold. If the initial target local matching degree is greater than the predetermined local matching degree threshold, operation S440 is performed; if it is less than or equal to the threshold, operation S450 is performed.
In operation S440, the initial target local matching degree that is greater than the predetermined local matching degree threshold is taken as a target local matching degree, and the local feature point pair corresponding to the target local matching degree is taken as a target local feature point pair.
According to an embodiment of the present disclosure, operation S440 may be repeatedly performed, resulting in a plurality of target local feature point pairs of a plurality of images.
In operation S450, subsequent operations on the local feature point pair are stopped.
According to an embodiment of the present disclosure, the target local feature point pairs are determined from the plurality of local feature point pairs using the predetermined local matching degree threshold, which speeds up the determination of key positions in the target images.
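Operations S430 to S450 amount to a simple threshold filter over the weighted scores. A minimal sketch (names are illustrative):

```python
def select_target_pairs(initial_scores, point_pairs, threshold):
    """Operations S430-S450: keep only the local feature point pairs whose
    initial target local matching degree exceeds the predetermined
    threshold; all other pairs are dropped from further processing."""
    kept = [(s, p) for s, p in zip(initial_scores, point_pairs) if s > threshold]
    target_scores = [s for s, _ in kept]
    target_pairs = [p for _, p in kept]
    return target_scores, target_pairs
```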
According to an embodiment of the present disclosure, the map generation method may further include: triangulating the target local feature point pairs of any two of the plurality of target images to generate spatial three-dimensional points, and generating a map model based on the spatial three-dimensional points.
According to an embodiment of the present disclosure, triangulation may be understood as recovering, by means of a triangle relationship, the depth information of a spatial three-dimensional point from its known two-dimensional projection points observed at different positions. By triangulating the position information of each target local feature point pair of any two of the plurality of target images, the spatial three-dimensional point corresponding to that pair can be recovered.
According to an embodiment of the present disclosure, the spatial three-dimensional points obtained by triangulation serve as initial estimates; they may be optimized by minimizing the reprojection error using Bundle Adjustment (BA) to obtain target spatial three-dimensional points, and the map model may then be generated based on the target spatial three-dimensional points.
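A minimal sketch of the linear (DLT) triangulation step, assuming known 3x4 projection matrices for the two views; the patent's pipeline would additionally refine the result with bundle adjustment, which is omitted here:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one matched target local feature
    point pair observed in two views.

    P1, P2: 3x4 camera projection matrices of the two target images.
    x1, x2: 2D image coordinates of the matched pair.
    Returns the spatial three-dimensional point (an initial estimate,
    before any bundle-adjustment refinement).
    """
    # Each observation contributes two homogeneous linear constraints.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point is the null vector of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```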
According to an embodiment of the present disclosure, the map generation method may further include: registering the other images into the map model based on the other target local feature point pairs to generate a target map.
According to an embodiment of the present disclosure, the other target local feature point pairs include the target local feature point pairs, among the plurality of target local feature point pairs, other than those of the any two target images, and the other images include the images other than the target images among the plurality of images.
According to an embodiment of the present disclosure, generating the map model based on the spatial three-dimensional points is the initial stage of three-dimensional reconstruction based on the target images. The map model may be referred to as an initial point cloud model. The map model may be a three-dimensional map portion generated by treating the plurality of target images as a co-visibility association map, and the target map may be generated, with the map model as its starting position, based on the other target local feature point pairs in the other images.
According to an embodiment of the present disclosure, registration may be understood as fusing the other target local feature point pairs in the other images into the map model according to the ordering result obtained from the global matching degrees.
According to an embodiment of the present disclosure, the other images are registered into the map model one by one based on the other target local feature point pairs, and the fusion of the other target local feature point pairs with the map model continues until the registration of all of the plurality of images is completed, thereby obtaining the target map.
According to an embodiment of the present disclosure, the target map is generated by combining the local feature point descriptors with the global descriptors, and the global descriptors are obtained by aggregating the local feature point descriptors, so that map building is fast and the generated target map has high precision. Moreover, because the target map is generated through association search over the global and local matching degrees and through incremental registration, the generation of the target map is robust and simple to operate.
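The incremental registration described above can be sketched as a loop that fuses the remaining images in the order given by the global matching degrees; `register_fn` is a hypothetical stand-in for the (unspecified) per-image fusion step:

```python
def incremental_registration(map_model, other_images, global_scores, register_fn):
    """Fuse the remaining images into the map model one by one, following
    the ordering result derived from the global matching degrees."""
    order = sorted(other_images, key=lambda img: global_scores[img], reverse=True)
    for img in order:
        # register_fn fuses the image's target local feature point pairs
        # into the current model and returns the updated model.
        map_model = register_fn(map_model, img)
    return map_model
```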
Fig. 5 schematically shows a block diagram of a map generation apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the map generating apparatus 500 may include a first determining module 510, an aggregating module 520, a second determining module 530, and a third determining module 540.
A first determining module 510, configured to determine a local feature point descriptor set of each of a plurality of images.
An aggregation module 520, configured to, for each of the plurality of images, aggregate the local feature point descriptor set of the image to determine a global descriptor of the image.
A second determining module 530, configured to calculate a global matching degree between any two images in the multiple images based on the respective global descriptors of the multiple images, so as to obtain multiple global matching degrees.
A third determining module 540, configured to determine a plurality of target images from the plurality of images based on the plurality of global matching degrees, where the plurality of target images are starting images for generating a target map.
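To illustrate modules 530 and 540, the following sketch scores every image pair and picks the best-scoring pair as the starting images. The patent does not fix the matching metric, so cosine similarity is assumed here; all names are illustrative:

```python
import math

def cosine_similarity(d1, d2):
    """One plausible global matching degree between two global descriptors
    (cosine similarity; the metric itself is an assumption)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    n1 = math.sqrt(sum(a * a for a in d1))
    n2 = math.sqrt(sum(b * b for b in d2))
    return dot / (n1 * n2)

def select_target_images(descriptors):
    """Compute the global matching degree of every image pair, sort the
    pairs, and take the top-scoring pair as the starting target images."""
    pairs = [
        (cosine_similarity(descriptors[i], descriptors[j]), i, j)
        for i in range(len(descriptors))
        for j in range(i + 1, len(descriptors))
    ]
    pairs.sort(reverse=True)  # the ordering result of the global matching degrees
    _, i, j = pairs[0]
    return i, j
```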
According to an embodiment of the present disclosure, the local feature point descriptor set includes local feature point descriptors of each of a plurality of local feature points.
According to an embodiment of the present disclosure, the map generating apparatus may further include a fourth determining module, and a fifth determining module.
A fourth determining module, configured to determine, for any two images of the plurality of images, N local feature point pairs and N local matching degrees from the two images based on a plurality of local feature point descriptors of the two images, where N is an integer greater than or equal to 2.
A fifth determining module, configured to determine target local feature point pairs from the two images based on the global matching degree between the two images and the N local matching degrees, so as to obtain a plurality of target local feature point pairs of the plurality of images.
According to an embodiment of the present disclosure, the fifth determining module may include a local matching unit, a first filtering unit, and a point pair determining unit.
A local matching unit, configured to obtain N initial target local matching degrees based on the global matching degree between the two images and the respective local matching degrees of the N local feature point pairs of the two images.
A first screening unit, configured to determine a target local matching degree from the N initial target local matching degrees, where the target local matching degree is greater than a predetermined local matching degree threshold.
A point pair determining unit, configured to take the local feature point pair corresponding to the target local matching degree as a target local feature point pair.
According to an embodiment of the present disclosure, the local feature point descriptor set includes M local feature points, local feature point confidences of the M local feature points, and local feature point position information of the M local feature points, where M is an integer greater than or equal to 2.
According to an embodiment of the present disclosure, the aggregation module may include a second screening unit, and an aggregation unit.
A second screening unit, configured to determine, for any one of the plurality of images, Q target local feature points from the M local feature points in the image according to the M local feature point confidences, where the local feature point confidence of any one of the Q target local feature points is greater than a confidence threshold.
An aggregation unit, configured to input the local feature point descriptors of the Q target local feature points and the position information of the Q target local feature points into a local aggregation vector model to obtain the global descriptor of the image.
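A simplified sketch of the second screening unit followed by a VLAD-style aggregation (one common instance of a "local aggregation vector" model). The real model would typically be learned, and it additionally consumes the feature point position information, both of which are omitted in this toy version; all names and the centroid codebook are illustrative assumptions:

```python
import math

def aggregate_global_descriptor(descriptors, confidences, centroids, threshold=0.5):
    """Confidence-filter the M local feature points, then aggregate the
    surviving Q descriptors into one global descriptor: for each local
    descriptor, accumulate its residual to the nearest centroid, then
    concatenate and L2-normalize the per-centroid sums (VLAD-style)."""
    kept = [d for d, c in zip(descriptors, confidences) if c > threshold]
    dim = len(centroids[0])
    vlad = [[0.0] * dim for _ in centroids]
    for d in kept:
        # Assign the descriptor to its nearest centroid.
        k = min(range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(d, centroids[i])))
        for j in range(dim):
            vlad[k][j] += d[j] - centroids[k][j]
    flat = [v for row in vlad for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]
```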
According to an embodiment of the present disclosure, the map generating device may further include a triangularization module, and a generation module.
A triangulation module, configured to triangulate the target local feature point pairs of any two of the plurality of target images to generate spatial three-dimensional points.
A generating module, configured to generate a map model based on the spatial three-dimensional points.
According to an embodiment of the present disclosure, the map generating apparatus may further include a registration module.
A registration module, configured to register the other images into the map model based on the other target local feature point pairs to generate a target map, where the other target local feature point pairs include the target local feature point pairs, among the plurality of target local feature point pairs, other than those of the any two target images, and the other images include the images other than the target images among the plurality of images.
According to an embodiment of the present disclosure, the third determining module may include a sorting unit, and a result determining unit.
A sorting unit, configured to sort the plurality of global matching degrees to obtain a sorting result.
A result determining unit, configured to determine the plurality of target images from the plurality of images based on the sorting result.
The present disclosure also provides an electronic device, a readable storage medium, a computer program product, and an autonomous vehicle according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform a method according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program that, when executed by a processor, implements a method according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, an autonomous vehicle is configured with the electronic device, and when the processor of the electronic device executes the stored instructions, the electronic device can implement the map generation method described in the above embodiments.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 comprises a computing unit 601, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the map generation method. For example, in some embodiments, the map generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the map generation method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the map generation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A map generation method, comprising:
determining a local feature point descriptor set of each of a plurality of images;
for each image in the plurality of images, carrying out aggregation processing on a local feature point descriptor set of the image, and determining a global descriptor of the image;
calculating the global matching degree between any two images in the plurality of images based on the respective global descriptors of the plurality of images to obtain a plurality of global matching degrees; and
determining a plurality of target images from the plurality of images based on the plurality of global matching degrees, wherein the plurality of target images are starting images used for generating a target map;
wherein the carrying out of the aggregation processing on the local feature point descriptor set and the determining of the global descriptor comprise:
determining Q target local feature points from M local feature points in the local feature point descriptor set based on respective local feature point confidences of the M local feature points, wherein M is an integer greater than or equal to 2; and
and inputting the local feature point descriptors of the Q target local feature points and the position information of the Q target local feature points into a local aggregation vector model to obtain a global descriptor of the image.
2. The method of claim 1, wherein the set of local feature point descriptors includes local feature point descriptors for each of a plurality of local feature points;
the method further comprises the following steps:
for any two images in the plurality of images, determining N local feature point pairs and N local matching degrees from the two images based on a plurality of local feature point descriptors of the two images, wherein N is an integer greater than or equal to 2; and
and determining target local feature point pairs from the two images based on the global matching degree between the two images and the N local matching degrees to obtain a plurality of target local feature point pairs of the plurality of images.
3. The method of claim 2, wherein the determining target local feature point pairs from the two images based on the global degree of match and the N local degrees of match between the two images comprises:
obtaining N initial target local matching degrees based on the global matching degree between the two images and the respective local matching degrees of the N local feature point pairs of the two images;
determining a target local matching degree from the N initial target local matching degrees, wherein the target local matching degree is greater than a predetermined local matching degree threshold value; and
and taking the local feature point pairs corresponding to the target local matching degree as the target local feature point pairs.
4. The method of claim 1, wherein,
the local feature point confidence of any one of the Q target local feature points is greater than a confidence threshold.
5. The method of any of claims 1 to 4, further comprising:
triangulating target local feature point pairs of any two target images in the plurality of target images to generate spatial three-dimensional points; and
generating a map model based on the spatial three-dimensional points.
6. The method of claim 5, further comprising:
and registering other images into the map model based on other target local feature point pairs to generate the target map, wherein the other target local feature point pairs comprise the target local feature point pairs, among the plurality of target local feature point pairs, other than those of the any two target images, and the other images comprise images other than the target images among the plurality of images.
7. The method of claim 1, wherein the determining a plurality of target images from the plurality of images based on the plurality of global degrees of matching comprises:
sequencing the global matching degrees to obtain a sequencing result; and
determining the plurality of target images from the plurality of images based on the ranking result.
8. A map generation apparatus comprising:
a first determining module, configured to determine a local feature point descriptor set of each of the plurality of images;
an aggregation module, configured to perform aggregation processing on a local feature point descriptor set of each of the plurality of images, and determine a global descriptor of each of the plurality of images;
a second determining module, configured to calculate a global matching degree between any two images in the multiple images based on respective global descriptors of the multiple images to obtain multiple global matching degrees; and
a third determining module, configured to determine a plurality of target images from the plurality of images based on the plurality of global matching degrees, where the plurality of target images are starting images used for generating a target map;
wherein the aggregation module is to:
determining Q target local feature points from M local feature points in the local feature point descriptor set based on the local feature point confidence of each of the M local feature points, wherein M is an integer greater than or equal to 2; and
and inputting the local feature point descriptors of the Q target local feature points and the position information of the Q target local feature points into a local aggregation vector model to obtain a global descriptor of the image.
9. The apparatus of claim 8, wherein the set of local feature point descriptors includes local feature point descriptors for each of a plurality of local feature points;
the device further comprises:
a fourth determining module, configured to determine, for any two images of the multiple images, N local feature point pairs and N local matching degrees from the two images based on a plurality of local feature point descriptors of the two images, where N is an integer greater than or equal to 2; and
a fifth determining module, configured to determine a target local feature point pair from the two images based on the global matching degree between the two images and the N local matching degrees, so as to obtain a plurality of target local feature point pairs of the multiple images.
10. The apparatus of claim 9, wherein the fifth determining means comprises:
the local matching unit is used for obtaining N initial target local matching degrees based on the global matching degree between the two images and the respective local matching degrees of the N local feature point pairs of the two images;
a first screening unit, configured to determine a target local matching degree from the N initial target local matching degrees, where the target local matching degree is greater than a predetermined local matching degree threshold; and
a point pair determination unit configured to use a local feature point pair corresponding to the target local matching degree as the target local feature point pair.
11. The apparatus of claim 8, wherein,
the local feature point confidence of any one of the Q target local feature points is greater than a confidence threshold.
12. The apparatus of any of claims 8 to 11, further comprising:
a triangulation module, used for triangulating target local feature point pairs of any two target images in the plurality of target images to generate spatial three-dimensional points; and
a generating module, used for generating a map model based on the spatial three-dimensional points.
13. The apparatus of claim 12, further comprising:
a registering module, configured to register other images in the map model based on other target local feature point pairs to generate the target map, wherein the other target local feature point pairs comprise the target local feature point pairs, among the plurality of target local feature point pairs, other than those of the any two target images, and the other images comprise the images other than the target images.
14. The apparatus of claim 8, wherein the third determining means comprises:
the sorting unit is used for sorting the global matching degrees to obtain a sorting result; and
a result determination unit configured to determine the plurality of target images from the plurality of images based on the sorting result.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 7.
17. An autonomous vehicle comprising: the electronic device of claim 15.
CN202210352745.9A 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium, and vehicle Active CN114674328B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210352745.9A CN114674328B (en) 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium, and vehicle
CN202310279874.4A CN116295466A (en) 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210352745.9A CN114674328B (en) 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium, and vehicle

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310279874.4A Division CN116295466A (en) 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium and vehicle

Publications (2)

Publication Number Publication Date
CN114674328A CN114674328A (en) 2022-06-28
CN114674328B true CN114674328B (en) 2023-04-18

Family

ID=82077893

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310279874.4A Pending CN116295466A (en) 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium and vehicle
CN202210352745.9A Active CN114674328B (en) 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium, and vehicle

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310279874.4A Pending CN116295466A (en) 2022-03-31 2022-03-31 Map generation method, map generation device, electronic device, storage medium and vehicle

Country Status (1)

Country Link
CN (2) CN116295466A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576494A (en) * 2022-08-08 2024-02-20 腾讯科技(深圳)有限公司 Feature map generation method, device, storage medium and computer equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111627065A (en) * 2020-05-15 2020-09-04 Oppo广东移动通信有限公司 Visual positioning method and device and storage medium
CN112270710A (en) * 2020-11-16 2021-01-26 Oppo广东移动通信有限公司 Pose determination method, pose determination device, storage medium, and electronic apparatus

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
CN108225348B * 2017-12-29 2021-08-24 Baidu Online Network Technology (Beijing) Co Ltd Map creation and moving entity positioning method and device
CN111191596B * 2019-12-31 2023-06-02 Wuhan Zhonghaiting Data Technology Co Ltd Closed area drawing method, device and storage medium
CN111522986B * 2020-04-23 2023-10-10 Beijing Baidu Netcom Science and Technology Co Ltd Image retrieval method, device, equipment and medium
US20230245341A1 * 2020-06-19 2023-08-03 Nec Corporation Positioning device, estimation method, and non-transitory computer-readable medium
CN111967515A * 2020-08-14 2020-11-20 Guangdong Oppo Mobile Telecommunications Corp Ltd Image information extraction method, training method and device, medium and electronic equipment
CN112562081B * 2021-02-07 2021-05-11 Zhejiang Lab Visual map construction method for visual layered positioning
CN113810665A * 2021-09-17 2021-12-17 Beijing Baidu Netcom Science and Technology Co Ltd Video processing method, device, equipment, storage medium and product
CN114187344A * 2021-11-15 2022-03-15 Hangzhou Hikvision Digital Technology Co Ltd Map construction method, device and equipment


Also Published As

Publication number Publication date
CN114674328A (en) 2022-06-28
CN116295466A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US8442307B1 (en) Appearance augmented 3-D point clouds for trajectory and camera localization
JP7273129B2 (en) Lane detection method, device, electronic device, storage medium and vehicle
CN111612852A (en) Method and apparatus for verifying camera parameters
CN111459269B (en) Augmented reality display method, system and computer readable storage medium
CN111784776B (en) Visual positioning method and device, computer readable medium and electronic equipment
CN110926478B (en) AR navigation route deviation rectifying method and system and computer readable storage medium
CN114993328B (en) Vehicle positioning evaluation method, device, equipment and computer readable medium
CN111832579B (en) Map interest point data processing method and device, electronic equipment and readable medium
CN112241716A (en) Training sample generation method and device
CN114674328B (en) Map generation method, map generation device, electronic device, storage medium, and vehicle
CN111368860B (en) Repositioning method and terminal equipment
CN113592015B (en) Method and device for positioning and training feature matching network
CN112258647B (en) Map reconstruction method and device, computer readable medium and electronic equipment
CN114111813A (en) High-precision map element updating method and device, electronic equipment and storage medium
CN113932796A (en) High-precision map lane line generation method and device and electronic equipment
CN113610702A (en) Picture construction method and device, electronic equipment and storage medium
CN112784102A (en) Video retrieval method and device and electronic equipment
CN117132649A (en) Ship video positioning method and device for artificial intelligent Beidou satellite navigation fusion
CN115578432B (en) Image processing method, device, electronic equipment and storage medium
CN111400537A (en) Road element information acquisition method and device and electronic equipment
CN114187509B (en) Object positioning method and device, electronic equipment and storage medium
CN111292365B (en) Method, apparatus, electronic device and computer readable medium for generating depth map
CN111968030B (en) Information generation method, apparatus, electronic device and computer readable medium
CN113654548A (en) Positioning method, positioning device, electronic equipment and storage medium
CN116012624B (en) Positioning method, positioning device, electronic equipment, medium and automatic driving equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant