WO2023216251A1

WO2023216251A1 - Map generation method, model training method, readable medium, and electronic device

Info

Publication number: WO2023216251A1
Application number: PCT/CN2022/092810
Authority: WO
Inventors: 王磊; 黄经纬; 何佳男; 刘吉哲
Original assignee: 华为技术有限公司
Priority date: 2022-05-13
Filing date: 2022-05-13
Publication date: 2023-11-16
Also published as: CN118613792A

Abstract

Disclosed in the present application are a map generation method, a model training method, a readable medium, and an electronic device. When a vector map is generated with a neural network model, the contour mask of a map element is converted into a vector map by learning the geometric features of map elements in sample images of a target area, rather than by vectorization rules set by technicians. This improves the precision of the resulting vector map. In addition, with the method provided by the present application, the neural network model can be retrained on sample images of each target area, and the retrained model can then be used to obtain the vector map of that area without complex parameter tuning or setting vectorization rules. This improves vector map generation efficiency while ensuring precision, which makes the method better suited to generating large-scale maps.

Description

Map generation method, model training method, readable medium and electronic device

Technical field

This application relates to the field of image processing, and in particular to a map generation method, a model training method, a readable medium and an electronic device.

Background art

With the development of artificial intelligence (AI) technology, neural network models are being applied ever more widely. For example, a neural network model can be used to obtain a vector map of an area from remote sensing images of that area. At present, neural network inference is usually used to obtain the initial outlines of map elements (such as houses, lakes, roads and rivers) in remote sensing images. The initial outlines are then adjusted according to vectorization rules set by developers (for example, by adjusting the angles between lines) to convert the outlines of the map elements into a vector map.

However, geographical environments and map elements are diverse: geographical environments differ between regions, and the geometric features of houses and roads vary widely. It is therefore difficult for vectorization rules set by developers to suit the geographical environments and map elements of every region. If images of different areas are vectorized with the same vectorization rules, the resulting vector maps have low accuracy. If instead different contour acquisition methods and vectorization rules have to be set for each area, the process becomes complicated and is unsuitable for large-scale vector map modeling.

Summary of the invention

In view of this, embodiments of the present application provide a map generation method, a model training method, a readable medium, and an electronic device. A neural network model learns the geometric characteristics of the map elements in a given area and converts the outlines of the map elements into the corresponding vector map. This helps improve the accuracy of the resulting vector map and is better suited to large-scale vector map modeling.

In a first aspect, embodiments of the present application provide a map generation method applied to an electronic device. The method includes: obtaining an image of an area, where the image includes map elements, and the map elements are the elements in the image that are to be converted into a vector map; performing inference on the image with a first model to obtain a first geometric figure corresponding to a map element, where the first geometric figure includes geometric primitives; inputting the first geometric figure into a second model to obtain the direction of each geometric primitive and, based on the first geometric figure, a second geometric figure corresponding to the map element, where the second geometric figure includes the same geometric primitives as the first geometric figure but the positions of the geometric primitives in the second geometric figure are arranged differently from those in the first geometric figure; using a third model to obtain the topological relationships between the geometric primitives based on the directions of the geometric primitives and the second geometric figure; and obtaining the vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive, and the second geometric figure.

In this embodiment of the present application, the electronic device can first use the first model (such as the shape initialization network described below) to infer the outline of a map element in the image of an area and obtain the first geometric figure corresponding to the map element (such as the initial shape described below). It then uses the second model (such as the shape regression network described below) to adjust the first geometric figure and obtain a more accurate and more regular second geometric figure (such as the regressed shape described below). Next, it uses the third model (such as the topology reconstruction network described below) to infer the topological relationships between the geometric primitives in the second geometric figure (for example, when the second geometric figure is a polyline, inferring the connection relationships between the points that make up the polyline). Finally, it obtains the vector map corresponding to the image based on these topological relationships. In this way, the electronic device converts map elements into a vector map with the pre-trained first, second and third models rather than with vectorization rules set by technicians, which helps improve the accuracy of the resulting vector map. Moreover, in large-scale vector map modeling, retraining at least one of the first, second and third models for each area lets the models adapt to the geometric characteristics of the map elements in that area without setting vectorization rules or tuning complex parameters, which helps improve the efficiency of vector map modeling.
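The data flow of the first aspect can be summarized as a chain of four stages. The Python sketch below is illustrative only: the callables `shape_init`, `shape_regress`, `topo_reconstruct` and `post_process`, and the `RegressionOutput` container, are hypothetical stand-ins for the first model, second model, third model and post-processing step, not interfaces defined by the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class RegressionOutput:
    """Hypothetical container for what the second model returns."""
    directions: List[float]  # one predicted direction (radians) per geometric primitive
    second_figure: Any       # same primitives as the first figure, rearranged positions


def generate_vector_map(
    image: Any,
    shape_init: Callable[[Any], Any],                      # first model (hypothetical)
    shape_regress: Callable[[Any], RegressionOutput],      # second model (hypothetical)
    topo_reconstruct: Callable[[List[float], Any], Any],   # third model (hypothetical)
    post_process: Callable[[Any, List[float], Any], Any],  # topology -> vector map
) -> Any:
    # 1. Infer the first geometric figure (initial shape) from the image.
    first_figure = shape_init(image)
    # 2. Regress per-primitive directions and the second geometric figure.
    reg = shape_regress(first_figure)
    # 3. Infer topological relationships from the directions and the second figure.
    topology = topo_reconstruct(reg.directions, reg.second_figure)
    # 4. Combine topology, directions and the second figure into the vector map.
    return post_process(topology, reg.directions, reg.second_figure)
```

Because each stage is passed in as a callable, a model retrained for a different area can be swapped in without touching the other stages, which mirrors the retraining argument in the text.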

It can be understood that a geometric primitive is a basic building unit of a geometric figure. For example, when the first geometric figure is a polygon, the geometric primitives can be the line segments that make up the polygon; when the first geometric figure is a polyline, the geometric primitives can be the points (vertices) of the line segments that make up the polyline.

In a possible implementation of the first aspect, at least one of the first model, the second model, and the third model is trained based on geometric features of map elements in a certain area.

In this embodiment, at least one of the first model, the second model and the third model may be trained based on the geometric features of the map elements in a given area. That is, the electronic device vectorizes map elements according to the geometric characteristics of the map elements of that area, which helps improve the accuracy of the resulting vector map.

In a possible implementation of the first aspect, when the geometric primitives are line segments, the second geometric figure also includes the connection order of the geometric primitives, and obtaining the vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive and the second geometric figure includes: adjusting the direction of a first geometric primitive in the second geometric figure to match the direction obtained for the first geometric primitive, where the direction of the first geometric primitive in the second geometric figure differs from the direction obtained for it; and connecting the first geometric primitive and a second geometric primitive to obtain a polygon corresponding to the second geometric figure, where the second geometric primitive is adjacent to the first geometric primitive in the connection order.
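One way to picture this step: rotate each line segment about its midpoint until it points along its predicted direction, then take the intersection of each segment's supporting line with that of the next segment in the connection order as a polygon vertex. The Python sketch below is a hypothetical illustration. Rotating about the midpoint, treating directions as undirected orientations, and bridging parallel neighbours at the midpoint of the gap are assumptions, not details given in the application.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def align_segment(seg: Segment, angle: float, tol: float = 1e-6) -> Segment:
    """Rotate seg about its midpoint to the predicted orientation `angle` (radians)."""
    (x1, y1), (x2, y2) = seg
    current = math.atan2(y2 - y1, x2 - x1)
    diff = math.remainder(current - angle, 2 * math.pi)
    if abs(diff) > math.pi / 2:  # treat `angle` as undirected: keep the segment's traversal sense
        angle += math.pi
        diff = math.remainder(current - angle, 2 * math.pi)
    if abs(diff) <= tol:  # already matches the predicted direction
        return seg
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    half = math.hypot(x2 - x1, y2 - y1) / 2  # keep the original length
    dx, dy = half * math.cos(angle), half * math.sin(angle)
    return ((mx - dx, my - dy), (mx + dx, my + dy))


def line_intersection(a: Segment, b: Segment, eps: float = 1e-12) -> Optional[Point]:
    """Intersection of the infinite lines through a and b, or None if they are parallel."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < eps:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def segments_to_polygon(segments: List[Segment], angles: List[float]) -> List[Point]:
    """Vertex i joins segment i to segment i+1 in the (cyclic) connection order."""
    aligned = [align_segment(s, a) for s, a in zip(segments, angles)]
    vertices = []
    for i, cur in enumerate(aligned):
        nxt = aligned[(i + 1) % len(aligned)]
        p = line_intersection(cur, nxt)
        if p is None:  # parallel neighbours: bridge the gap at its midpoint
            (ex, ey), (sx, sy) = cur[1], nxt[0]
            p = ((ex + sx) / 2, (ey + sy) / 2)
        vertices.append(p)
    return vertices
```

For a square whose bottom edge is slightly tilted, aligning that edge to a predicted direction of 0 makes it horizontal, and the intersections with its neighbours give clean right-angle corners.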

In a possible implementation of the first aspect, the polygon corresponding to the second geometric figure includes a first line segment, a second line segment and a third line segment connected in sequence, and obtaining the vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive and the second geometric figure further includes: deleting the second line segment when its length is less than a preset side-length threshold; when the topological relationship between the first line segment and the third line segment is collinear or parallel, merging the first line segment and the third line segment into one line segment; and when the topological relationship between the first line segment and the third line segment is neither collinear nor parallel, extending the first line segment and/or the third line segment so that they intersect.
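The short-edge rule above can be sketched as follows, assuming the polygon is a list of vertices in connection order. The function name `remove_short_edges`, the 5-degree tolerance used to decide "parallel or collinear", and merging by joining the outer endpoints of the first and third segments are hypothetical choices for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _intersect(p1: Point, p2: Point, p3: Point, p4: Point, eps: float = 1e-12) -> Optional[Point]:
    """Intersection of line p1-p2 with line p3-p4, or None if they are parallel."""
    den = (p1[0] - p2[0]) * (p3[1] - p4[1]) - (p1[1] - p2[1]) * (p3[0] - p4[0])
    if abs(den) < eps:
        return None
    t = ((p1[0] - p3[0]) * (p3[1] - p4[1]) - (p1[1] - p3[1]) * (p3[0] - p4[0])) / den
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def _parallel(p1: Point, p2: Point, p3: Point, p4: Point, angle_tol: float) -> bool:
    """True if segment p1-p2 and segment p3-p4 are parallel or collinear within angle_tol."""
    a = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    b = math.atan2(p4[1] - p3[1], p4[0] - p3[0])
    return abs(math.remainder(a - b, math.pi)) < angle_tol  # compare orientations modulo 180 degrees


def remove_short_edges(poly: List[Point], min_len: float,
                       angle_tol: float = math.radians(5)) -> List[Point]:
    v = list(poly)
    changed = True
    while changed and len(v) > 3:
        changed = False
        for i in range(len(v)):
            r = v[i:] + v[:i]  # rotate so the candidate (second) segment is r[0] -> r[1]
            prev_pt, a, b, next_pt = r[-1], r[0], r[1], r[2]
            if math.dist(a, b) >= min_len:
                continue
            p = None
            if not _parallel(prev_pt, a, b, next_pt, angle_tol):
                p = _intersect(prev_pt, a, b, next_pt)  # extend first/third segments to meet
            # Drop the short segment's endpoints; if p is None, the first and third
            # segments are merged into a single segment prev_pt -> next_pt.
            new_v = ([p] if p is not None else []) + r[2:]
            if len(new_v) < 3:  # never collapse the polygon below a triangle
                continue
            v, changed = new_v, True
            break  # restart the scan on the modified polygon
    return v
```

A small jog between two parallel edges is merged away, while a short chamfer between two perpendicular edges is replaced by the corner where the extended edges meet.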

In a possible implementation of the first aspect, when the geometric primitives are points, obtaining the vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive and the second geometric figure includes: connecting the points whose topological relationship is "connected" to obtain the corresponding vectorized polyline.
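For point primitives, this post-processing amounts to walking a graph whose edges are the predicted "connected" relationships. The sketch below is a hypothetical illustration (the function name and traversal order are assumptions): it starts chains at endpoints and junctions, then picks up any closed loops.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float]


def points_to_polylines(points: List[Point],
                        connected_pairs: Iterable[Tuple[int, int]]) -> List[List[Point]]:
    adj: Dict[int, Set[int]] = defaultdict(set)
    for i, j in connected_pairs:
        adj[i].add(j)
        adj[j].add(i)
    used: Set[frozenset] = set()  # undirected edges already emitted

    def unused_neighbour(node: int):
        return next((n for n in sorted(adj[node]) if frozenset((node, n)) not in used), None)

    def walk(start: int) -> List[int]:
        line, cur = [start], start
        while (nxt := unused_neighbour(cur)) is not None:
            used.add(frozenset((cur, nxt)))
            line.append(nxt)
            cur = nxt
            if len(adj[cur]) != 2:  # endpoint or junction: end this polyline
                break
        return line

    chains = []
    # Start from endpoints/junctions first, then from nodes on closed loops.
    starts = sorted(n for n in adj if len(adj[n]) != 2) + sorted(n for n in adj if len(adj[n]) == 2)
    for s in starts:
        while unused_neighbour(s) is not None:
            chains.append(walk(s))
    return [[points[k] for k in chain] for chain in chains]
```

Points with no predicted connections are dropped, and a closed loop is returned with its first point repeated at the end.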

In a possible implementation of the first aspect, performing inference on the image with the first model to obtain the first geometric figure corresponding to the map element includes: performing semantic segmentation on the image to obtain a contour mask of the map element, where the contour mask indicates the area of the image occupied by the map element; extracting the mask edges of the contour mask; and simplifying the mask edges to obtain the first geometric figure.

For example, the electronic device can use the semantic segmentation network described below to obtain the contour mask of the area where a map element is located in the image, use the edge extraction network described below to extract the edges of the contour mask, and then simplify the resulting mask edges, using the Douglas-Peucker (DP) algorithm for polygons or the non-maximum suppression (NMS) algorithm for polylines. This yields a first geometric figure with fewer geometric primitives, and reducing the number of geometric primitives in the first geometric figure helps speed up the subsequent inference that the electronic device performs on it.
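The Douglas-Peucker step named above can be sketched in Python as follows. This is the textbook recursive version for an open polyline, with a tolerance `epsilon` chosen here for illustration; how the application applies it to closed mask edges is not specified, and the NMS-based polyline simplification is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / math.dist(a, b)


def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Keep only vertices that deviate more than epsilon from the simplified shape."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the two endpoints.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Otherwise split at that point and simplify both halves recursively.
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point
```

A jittery L-shaped edge such as `[(0, 0), (1, 0.05), (2, -0.03), (3, 0), (3, 1), (3.02, 2), (3, 3)]` with `epsilon=0.1` reduces to its three corner points.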

In a possible implementation of the first aspect, the map elements include at least one of a house, a road, a lake, an ocean, a river, a forest and a desert. The first geometric figure corresponding to a house, lake, ocean, forest or desert is a polygon, and the first geometric figure corresponding to a road or river is a polyline.

In this embodiment of the present application, an image may include one map element or multiple map elements. Electronic devices can represent map elements that need to be represented by specific shapes such as houses, lakes, oceans, forests, deserts, etc. as polygons, and represent roads, rivers, etc. as polylines.

In a possible implementation of the first aspect, the method further includes training the first model as follows:

Obtaining sample data, where the sample data includes a sample image set of an area and a reference outline corresponding to the map element in each sample image in the set; using the first model to identify image features of each sample image and obtaining, based on the image features, a contour mask of the map element in each sample image, where the contour mask indicates the area of the map element in the corresponding sample image; obtaining, based on the contour mask, a first predicted geometric figure corresponding to the map element in each sample image; and training the first model based on a first loss function value and a second loss function value, where the first loss function indicates the accuracy of the contour mask and the second loss function indicates the similarity between the first predicted geometric figure and the reference outline.
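A minimal sketch of combining the two loss values for the first model. It assumes a per-pixel foreground probability map for the contour mask, pixel-wise binary cross-entropy for the first loss, and a mean squared point-to-point (L2) distance for the second loss. The point correspondence between the predicted geometry and the reference outline, and the weights `w_mask` and `w_geo`, are assumptions for illustration.

```python
import numpy as np


def mask_cross_entropy(prob: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Pixel-wise binary cross-entropy between predicted mask probabilities and a 0/1 mask."""
    prob = np.clip(prob, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


def geometry_l2(pred_pts: np.ndarray, ref_pts: np.ndarray) -> float:
    """Mean squared Euclidean distance between corresponding points, shape (N, 2)."""
    return float(np.mean(np.sum((pred_pts - ref_pts) ** 2, axis=-1)))


def first_model_loss(prob, target, pred_pts, ref_pts, w_mask=1.0, w_geo=1.0) -> float:
    """Weighted sum of the (assumed) first and second loss values."""
    return w_mask * mask_cross_entropy(prob, target) + w_geo * geometry_l2(pred_pts, ref_pts)
```

A mask that predicts 0.5 everywhere costs ln 2 per pixel regardless of the target, which is a handy sanity check for the cross-entropy term.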

That is, in this embodiment of the present application, the first model is trained on the sample image set of an area so that both the contour mask and the first predicted geometric figure it extracts for a map element in a sample image are highly similar to the reference outline of that map element. In this way the model learns the geometric characteristics of the map elements in the sample images. As a result, the first geometric figure that the first model infers for a map element in an image of that area conforms to the geometric characteristics of the area's map elements, which helps improve the accuracy of the first geometric figure and hence of the resulting vector map.

For example, in some embodiments, the first loss function may be the cross-entropy loss L_11-12-CEL described below, and the second loss function may be the L2 loss L_13-L2 described below.

In a possible implementation of the first aspect, the method further includes training the second model as follows: obtaining sample data, where the sample data includes the reference outline corresponding to the map element in each sample image in the sample image set of an area, the reference direction of each geometric primitive in the reference outline, and a third geometric figure corresponding to the map element in each sample image, obtained with the first model; using the second model to obtain a second predicted geometric figure corresponding to the map element in each sample image and the predicted directions of the geometric primitives in the third geometric figure, where the second predicted geometric figure includes the same geometric primitives as the third geometric figure but arranged in different positions; and training the second model based on a third loss function and a fourth loss function, where the third loss function indicates the similarity between the predicted directions of the geometric primitives in the third geometric figure and the corresponding reference directions, and the fourth loss function indicates the similarity between the second predicted geometric figure and the corresponding reference outline.
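The third loss function compares predicted and reference directions. A hypothetical sketch of an L2 direction loss is shown below. Encoding each angle as a unit vector (cos θ, sin θ) is an assumption made here so that angles such as 0 and 2π are treated as identical; the relative shape loss used for the fourth loss function is not sketched because it is only defined later in the description.

```python
import numpy as np


def direction_l2_loss(pred_angles: np.ndarray, ref_angles: np.ndarray) -> float:
    """Mean squared L2 distance between unit direction vectors (angles in radians)."""
    # Unit-vector encoding avoids the 2*pi wrap-around of raw angle differences.
    pred = np.stack([np.cos(pred_angles), np.sin(pred_angles)], axis=-1)
    ref = np.stack([np.cos(ref_angles), np.sin(ref_angles)], axis=-1)
    return float(np.mean(np.sum((pred - ref) ** 2, axis=-1)))
```

With this encoding, identical directions cost 0 and opposite directions cost 4, the maximum squared distance between two unit vectors.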

For example, in some embodiments, the third loss function may be the L2 loss L_23-L2 described below, and the fourth loss function may be the relative shape loss described below.

In a possible implementation of the first aspect, the method further includes training the third model as follows: obtaining sample data, where the sample data includes a sample image set of an area, the reference topological relationships between the geometric primitives in the reference outline corresponding to the map element in each sample image, a fourth geometric figure corresponding to the map element in each sample image obtained with the first model, and the directions of the geometric primitives in the fourth geometric figure; using the third model to determine the latent space features of each geometric primitive in the fourth geometric figure and, based on these latent space features, the predicted topological relationships between the geometric primitives in the fourth geometric figure; and training the third model based on a fifth loss function and a sixth loss function, where the fifth loss function indicates how well the predicted topological relationships between the geometric primitives in the fourth geometric figure match the corresponding reference topological relationships, and the sixth loss function indicates the similarity of the latent space features of geometric primitives whose predicted topological relationship is parallel, collinear or connected.

In this embodiment of the present application, when the third model is trained, the sixth loss function (such as the supervised contrastive loss described below) measures the similarity of the latent space features of geometric primitives whose predicted topological relationship is parallel, collinear or connected. Consequently, when the third model performs inference on the second geometric figure, parallel, collinear or connected geometric primitives also have highly similar latent space features. This helps improve the accuracy of the topological relationships obtained between the geometric primitives in the second geometric figure and, in turn, the accuracy of the vector map derived from those relationships. For example, in some embodiments, the fifth loss function may be the cross-entropy loss L_CEL described below, and the sixth loss function may be the supervised contrastive loss L_SCL described below.
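A generic form of supervised contrastive loss can be sketched as below, with positives defined by a pairwise mask rather than by class labels. Treating pairs predicted as parallel, collinear or connected as positives, L2-normalizing the features, and the temperature of 0.1 are assumptions; the exact L_SCL used by the application is defined later in the description and may differ.

```python
import numpy as np


def supervised_contrastive_loss(z: np.ndarray, positive_mask: np.ndarray,
                                temperature: float = 0.1) -> float:
    """z: (N, D) latent features; positive_mask[i, j] True if i and j form a positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # cosine similarity via unit vectors
    logits = z @ z.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability; cancels in log-softmax
    not_self = ~np.eye(z.shape[0], dtype=bool)
    # Log-softmax of each pair over all other primitives a != i.
    denom = np.log((np.exp(logits) * not_self).sum(axis=1, keepdims=True))
    log_prob = logits - denom
    pos = positive_mask.astype(bool) & not_self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0                                    # anchors with at least one positive
    mean_log_prob_pos = (log_prob * pos).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())
```

The loss is small when positive pairs already have near-identical features and large when positives are orthogonal, which is the pressure that pulls the latent features of related primitives together. If no anchor has a positive, the mean is taken over an empty array and the result is NaN, so a real training loop would need to guard that case.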

In a second aspect, embodiments of the present application provide a model training method applied to an electronic device. The method includes:

Obtaining sample data, where the sample data includes the reference outline corresponding to the map element in each sample image in the sample image set of an area, a fifth geometric figure or a sixth geometric figure corresponding to each map element, the directions of the geometric primitives in the fifth geometric figure, and the image features of the geometric primitives in the fifth geometric figure. The image features of the geometric primitives in the fifth geometric figure are generated when the fifth geometric figure of each map element is inferred with a fourth model; the similarity between the fifth geometric figure and the corresponding reference outline is lower than the similarity between the sixth geometric figure and the corresponding reference outline; and the fifth geometric figure and the sixth geometric figure have the same geometric primitives;

Inputting the fifth or sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure into a fifth model with first network parameters to obtain the latent space features corresponding to each geometric primitive, and inferring the predicted topological relationships between the geometric primitives based on these latent space features;

Determining a seventh loss function and an eighth loss function based on the predicted topological relationships between the geometric primitives in the fifth geometric figure and the corresponding reference topological relationships, where the reference topological relationships can be determined from the reference outline corresponding to the map element in each sample image, the seventh loss function indicates how well the predicted topological relationships between the geometric primitives in the fifth geometric figure match the corresponding reference topological relationships, and the eighth loss function indicates the similarity of the latent space features of geometric primitives whose predicted topological relationship is parallel, collinear or connected; when the seventh loss function and the eighth loss function satisfy a termination condition, saving the fifth model with the first network parameters; and when they do not satisfy the termination condition, adjusting the network parameters of the fifth model to second network parameters for the next round of training.

In this embodiment of the present application, the fifth model may be a model that obtains the topological relationships between geometric primitives from the geometric figure corresponding to a map element and the directions of the geometric primitives in that figure, such as the third model in the first aspect or the topology reconstruction network described below. During training, the lower-accuracy fifth geometric figure can be used as input, so that the predicted topological relationships that the fifth model produces for the geometric primitives in the fifth geometric figure agree closely with the corresponding reference topological relationships. The fifth model thus learns to produce high-accuracy output from lower-accuracy input, which improves its robustness to noise. Even when the second geometric figure obtained by the second model is less accurate, relatively accurate topological relationships between geometric primitives can still be obtained, which improves the accuracy of the vector map derived from those relationships.

For example, in some embodiments, the seventh loss function may be the cross-entropy loss L_CEL described below, and the eighth loss function may be the supervised contrastive loss L_SCL described below.

In a possible implementation of the second aspect, when the geometric primitives in the fifth geometric figure are line segments, whether the seventh loss function and the eighth loss function satisfy the termination condition is determined as follows: a ninth loss function is determined from the directional relationships between the geometric primitives, derived from the directions of the geometric primitives in the fifth geometric figure, and the reference directional relationships corresponding to the predicted topological relationships, where the ninth loss function indicates the consistency between the predicted topological relationship of each geometric primitive and its direction;

The termination condition is determined to be satisfied when any of the following holds: the seventh, eighth and ninth loss functions all converge; the seventh, eighth and ninth loss functions are all smaller than their corresponding preset loss values; the total loss function converges; or the total loss function is smaller than a corresponding preset total loss value. The total loss function is a weighted sum of the seventh, eighth and ninth loss functions.
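The four alternative termination conditions can be sketched as a single check. The convergence test (relative change over a fixed window of rounds), the loss names, the weights and the thresholds are all hypothetical choices; the application does not specify how convergence is measured.

```python
from typing import Dict, List, Sequence


def has_converged(history: Sequence[float], window: int = 5, rel_tol: float = 1e-3) -> bool:
    """Hypothetical test: relative change over the last `window` rounds is below rel_tol."""
    if len(history) < window + 1:
        return False
    old, new = history[-window - 1], history[-1]
    return abs(old - new) <= rel_tol * max(abs(old), 1e-12)


def termination_met(histories: Dict[str, List[float]], thresholds: Dict[str, float],
                    weights: Dict[str, float], total_threshold: float) -> bool:
    """histories maps each loss name (e.g. seventh/eighth/ninth) to its per-round values."""
    names = list(histories)
    rounds = min(len(histories[k]) for k in names)
    # Weighted total loss for every completed round.
    total = [sum(weights[k] * histories[k][t] for k in names) for t in range(rounds)]
    return (
        all(has_converged(histories[k]) for k in names)          # each loss converged
        or all(histories[k][-1] < thresholds[k] for k in names)  # each loss below its preset value
        or has_converged(total)                                   # weighted total converged
        or total[-1] < total_threshold                            # weighted total below its preset value
    )
```

In a training loop, `termination_met` would be evaluated after each round; when it returns True the current parameters are saved, otherwise they are updated for the next round.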

For example, in some embodiments, the ninth loss function may be the geometric attribute and relationship consistency loss L_C described below.

In a possible implementation of the second aspect, obtaining the latent space features corresponding to each geometric primitive based on the fifth or sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure includes: when the geometric primitives of the fifth geometric figure are points, obtaining the latent space features of each geometric primitive based on the fifth geometric figure, the image features of its geometric primitives and the directions of its geometric primitives; and when the geometric primitives of the fifth geometric figure are line segments, obtaining the latent space features of each geometric primitive based on the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure and the directions of the geometric primitives in the fifth geometric figure.

In this embodiment of the present application, polylines are less complex than polygons. Therefore, when the fifth model is trained, the fifth geometric figure is used as input when it is a polyline, and the more accurate sixth geometric figure is used as input when the figure is a polygon. This preserves the accuracy with which the fifth model infers the topological relationships between the geometric primitives of the more complex polygons, while improving its robustness to noise in the simpler polyline input data.
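The input-selection rule can be expressed as a small helper. This is a hypothetical sketch: the function name, the string labels `"point"` and `"line_segment"`, and the dictionary layout are assumptions; only the rule itself (fifth figure for points, sixth figure for line segments, image features and directions always from the fifth figure) comes from the text.

```python
from typing import Any, Dict


def build_topology_inputs(primitive_type: str, fifth_figure: Any, sixth_figure: Any,
                          fifth_image_features: Any, fifth_directions: Any) -> Dict[str, Any]:
    """Hypothetical helper assembling the fifth model's training inputs."""
    if primitive_type == "point":           # polyline: noisy fifth figure, for noise robustness
        figure = fifth_figure
    elif primitive_type == "line_segment":  # polygon: more accurate sixth figure
        figure = sixth_figure
    else:
        raise ValueError(f"unknown primitive type: {primitive_type!r}")
    # Image features and directions always come from the fifth figure.
    return {"figure": figure, "image_features": fifth_image_features,
            "directions": fifth_directions}
```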

In a third aspect, embodiments of the present application provide a map generation apparatus, including: a data acquisition unit, configured to acquire an image of an area, where the image includes map elements, and the map elements are the elements in the image that are to be converted into a vector map; an initial shape generation unit, configured to perform inference on the image with a first model to obtain a first geometric figure corresponding to a map element, where the first geometric figure includes geometric primitives; a shape regression unit, configured to input the first geometric figure into a second model to obtain the direction of each geometric primitive and, based on the first geometric figure, a second geometric figure corresponding to the map element, where the second geometric figure includes the same geometric primitives as the first geometric figure but the positions of the geometric primitives in the second geometric figure are arranged differently from those in the first geometric figure; a topology reconstruction unit, configured to use a third model to obtain the topological relationships between the geometric primitives based on the directions of the geometric primitives and the second geometric figure; and a post-processing unit, configured to obtain a vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive, and the second geometric figure.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium including instructions that, when executed by an electronic device, cause the electronic device to implement the method provided by the first aspect or any possible implementation of the first aspect, or by the second aspect or any possible implementation of the second aspect.

In a fifth aspect, embodiments of the present application provide an electronic device, including: a memory, configured to store instructions executed by one or more processors of the electronic device; and a processor, which is one of the one or more processors of the electronic device and is configured to execute the instructions stored in the memory to implement the method provided by the first aspect or any possible implementation of the first aspect, or by the second aspect or any possible implementation of the second aspect.

In a sixth aspect, embodiments of the present application provide a computer program product including a computer program/instructions that, when executed by a processor, implement the method provided by the first aspect or any possible implementation of the first aspect, or by the second aspect or any possible implementation of the second aspect.

Description of the drawings

Figure 1A shows a schematic diagram of a process of obtaining a vector map through images according to some embodiments of the present application;

Figure 1B shows a schematic image diagram including only one map element according to some embodiments of the present application;

Figure 1C shows a schematic image diagram including multiple map elements according to some embodiments of the present application;

Figure 2 shows a schematic diagram of a process of generating a map using a neural network model according to some embodiments of the present application;

Figure 3 shows a schematic structural diagram of a shape initialization network 1 according to some embodiments of the present application;

Figure 4 shows a schematic diagram of the training process of the shape initialization network 1 according to some embodiments of the present application;

Figure 5 shows a schematic diagram of a house and a corresponding contour mask in image IM2 according to some embodiments of the present application;

Figure 6A shows a schematic diagram of the coordinate distance between points in the edge area of a contour mask and a reference contour according to some embodiments of the present application;

Figure 6B shows a schematic diagram of the coordinate distance between a point in the edge area of the contour mask and the outermost pixel of the contour mask according to some embodiments of the present application;

Figure 7 shows a schematic structural diagram of a shape regression network 2 according to some embodiments of the present application;

Figure 8 shows a schematic diagram of the training process of the shape regression network 2 according to some embodiments of the present application;

Figure 9 shows a schematic structural diagram of a topology reconstruction network 3 according to some embodiments of the present application;

Figure 10 shows a schematic diagram of the training process of the topology reconstruction network 3 according to some embodiments of the present application;

Figure 11 shows a schematic diagram of the calculation process of a topological relationship and cross-entropy loss according to some embodiments of the present application;

Figure 12 shows a schematic process diagram of a training process and an inference process according to some embodiments of the present application;

Figure 13 shows a schematic flow chart of a map generation method according to some embodiments of the present application;

Figure 14 shows a schematic diagram of post-processing polygons according to some embodiments of the present application;

Figure 15 shows a schematic diagram of the results of vectorizing houses in some remote sensing images using neural network model 0 according to some embodiments of the present application;

Figure 16 shows a schematic diagram of the results of vectorizing roads in remote sensing images using neural network model 0 according to some embodiments of the present application;

Figure 17A shows a schematic diagram of the reconstruction effect of a road in a relatively complex remote sensing image using neural network model 0 according to some embodiments of the present application;

Figure 17B shows a schematic diagram of the reconstruction effect of a road in another complex remote sensing image using neural network model 0 according to some embodiments of the present application;

Figure 18 shows a schematic structural diagram of a map generation device according to some embodiments of the present application;

Figure 19 shows a schematic structural diagram of an electronic device 100 for executing the embodiments of the present application, according to some embodiments of the present application.

Detailed description of the embodiments

Illustrative embodiments of the present application include, but are not limited to, map generation methods, model training methods, readable media, program products, apparatus, and electronic devices.

To facilitate understanding, the terms involved in this application are first explained.

(1) Loss function

During training of a neural network model, the goal is to make the model's output as close as possible to the value that is actually to be predicted. The predicted value of the current network can therefore be compared with the desired target value, and the weight vector of each layer of the network can be updated according to the difference between the two (an initialization process, that is, pre-configuring parameters for each layer of the neural network model, is usually performed before the first update). For example, if the network's predicted value is too high, the weight vector is adjusted to lower it, and the adjustment continues until the neural network model can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured. This is the role of the loss function (or objective function): an important equation that measures the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a greater difference, so training the neural network model becomes a process of reducing the loss as much as possible. Whether the loss function is chosen appropriately therefore directly affects the quality of neural network model training.
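As a minimal illustration of the loop described above (not the training procedure of the present application), the following Python sketch fits a single weight w of a hypothetical model y = w·x by gradient descent on a mean-squared-error loss. When the prediction is too high, the gradient is positive and the update lowers w, which matches the adjustment described in the text; the data, learning rate, and step count are arbitrary assumptions.

```python
def mse_loss(w, xs, targets):
    """Mean squared error between the predictions w*x and the desired targets."""
    return sum((w * x - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


def mse_grad(w, xs, targets):
    """Derivative of mse_loss with respect to the weight w."""
    return sum(2 * (w * x - t) * x for x, t in zip(xs, targets)) / len(xs)


def train(xs, targets, w=0.0, lr=0.05, steps=200):
    """Gradient descent: repeatedly move w against the gradient to reduce the loss."""
    history = [mse_loss(w, xs, targets)]
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, targets)  # too-high predictions -> positive gradient -> smaller w
        history.append(mse_loss(w, xs, targets))
    return w, history


# Targets generated by the "true" weight 2.0.
w, history = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The recorded `history` shows the loss shrinking round by round, which is the "reduce the loss as much as possible" process in miniature.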

The technical solutions of the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

As mentioned above, vectorization rules are used to convert the outlines of map elements inferred by a neural network model into a vector map. If the same vectorization rules are applied to images of different areas, the accuracy of the resulting vector map may be low. For example, architectural styles differ between cities and regions, so the outlines of their buildings also differ considerably. If a vectorization rule optimizes the corners of house outlines to right angles, applying it to areas of an image whose houses have circular or irregular polygonal outlines produces incorrect house outlines. Conversely, setting different contour acquisition methods and vectorization rules for each area makes the process complicated, so vector maps are generated inefficiently, and this approach is unsuitable for large-scale vector map modeling.

It can be understood that vectorization rules are rules set by technical personnel to adjust the relationships between the lines or points in the outline of a map element so as to obtain a more reasonable vector map. For example, two lines whose included angle is greater than a preset value may be set to be parallel or collinear, two lines whose included angle lies within a certain range may be set to be perpendicular, and all points whose distance from a straight line is less than a preset value may be moved onto that line. The effectiveness of such rules depends on the technician's experience and on the outlines of the map elements used as reference, and tuning them is complicated. When the geometric characteristics of the reference map elements differ greatly from those of the map elements in the area actually being vectorized, the resulting vector map has low accuracy. For example, if most of the corners of the house edges in the reference map elements are right angles while the houses in a certain area are mainly circular, applying the vectorization rules may produce a vector map whose shapes differ considerably from the actual map elements.
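The heuristic rules described above can be sketched as follows. This is a minimal illustration with assumed thresholds, not rules from the present application: `snap_points_to_line` and `classify_angle` are hypothetical names, the angle is measured here between the direction vectors of two lines, and the tolerances are arbitrary. It shows why such rules are fragile, since every threshold is a hand-picked constant that encodes an assumption about the local building style.

```python
import math


def snap_points_to_line(points, a, b, threshold):
    """Heuristic rule: project every point closer than `threshold` to the
    infinite line through a and b onto that line; leave other points as-is."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    snapped = []
    for px, py in points:
        # Parameter of the orthogonal projection of p onto the line.
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        qx, qy = ax + t * dx, ay + t * dy
        if math.hypot(px - qx, py - qy) < threshold:
            snapped.append((qx, qy))
        else:
            snapped.append((px, py))
    return snapped


def classify_angle(u, v, tol_deg=10.0):
    """Heuristic rule: label two direction vectors 'parallel' when their angle is
    within tol_deg of 0 or 180 degrees, 'perpendicular' when within tol_deg of 90."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    angle = abs(math.degrees(math.atan2(cross, dot)))  # in [0, 180]
    if angle < tol_deg or angle > 180 - tol_deg:
        return "parallel"
    if abs(angle - 90) < tol_deg:
        return "perpendicular"
    return "unchanged"
```

For example, a vertex at (1, 0.1) is snapped onto the x-axis when the threshold is 0.2, whether or not the building really has a straight wall there.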

To solve the above problems, embodiments of the present application provide a map generation method implemented based on a neural network model. In the neural network model, map elements are represented by geometric figures (for example, polylines represent roads and polygons represent houses). By learning the geometric features of the map elements in sample images of a target area, the neural network model ensures that the geometric figures it derives from the outlines of map elements match the geometric features of the map elements in the target area. The trained neural network model then performs inference on the images of the target area to be predicted, obtaining the geometric figures corresponding to the map elements in the target area and thus the vector map. In other words, in the embodiments of the present application, map elements in an image are vectorized based on learned geometric characteristics of the map elements in the target area, rather than by vectorization rules set by developers, which can improve the accuracy of the vector map. In addition, when inferring vector maps for different areas, it is only necessary to retrain the neural network model with the sample images of each area so that it adapts well to the geometric characteristics of the map elements there, without complex setting and tuning of vectorization rules. This improves the efficiency of vector map generation while ensuring accuracy in large-scale map construction scenarios, such as vectorizing maps of areas that span multiple regions, cities, or countries.

For example, referring to Figure 1A, in some embodiments, a semantic segmentation network can be used to obtain the outline masks of map elements (such as houses and roads) in an image, and heuristic rules set by developers (that is, vectorization rules) are then used to convert the outlines of the map elements into a vector map. In some embodiments of the present application, the heuristic rules are replaced by geometric feature learning and topological reconstruction. First, the neural network model learns the geometric features of the map elements in sample images of the target area, so that the geometric figures of map elements obtained by the model more accurately reflect the geometric characteristics of the map elements in the target area. Second, the neural network model determines the topological relationships between the geometric primitives in the geometric figures, and the geometric primitives are then connected according to these relationships to obtain a vector map; for example, points are connected into polylines to represent roads or rivers, and line segments are connected into polygons to represent houses or lakes.

It can be understood that the geometric primitives refer to the basic constituent elements of each geometric figure. For example, the geometric primitives of polylines can be points, and the geometric primitives of polygons can be ordered line segments.
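One way to hold these primitives in code is sketched below, assuming points are (x, y) tuples. The class names `Polyline` and `Polygon` and the closure check are hypothetical illustrations, not structures defined in the present application.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


@dataclass
class Polyline:
    """Open shape such as a road: its primitives are ordered points."""
    points: List[Point]

    def segments(self) -> List[Segment]:
        # Consecutive points are joined into segments.
        return list(zip(self.points[:-1], self.points[1:]))


@dataclass
class Polygon:
    """Closed shape such as a house: its primitives are ordered line segments,
    each starting where the previous one ends."""
    edges: List[Segment]

    def is_closed(self) -> bool:
        n = len(self.edges)
        return n >= 3 and all(
            self.edges[i][1] == self.edges[(i + 1) % n][0] for i in range(n)
        )
```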

It can be understood that map elements may include, but are not limited to, houses, lakes, oceans, roads, rivers, forests, and deserts. Map elements that need to be described with specific shapes on the map, such as houses, lakes, oceans, forests, and deserts, can be represented by polygons; map elements that do not need to be described with specific shapes, such as roads and rivers, can be represented by polylines. For convenience of description, the following embodiments take houses as the example of map elements represented by polygons and roads as the example of map elements represented by polylines.

It can be understood that an image may include at least one map element. For example, referring to Figure 1B, the image IM11 includes only the road RD1, which can be represented by a polyline in the vector map; as another example, referring to Figure 1C, the image IM12 includes the house HE1, the road RD2, and the river RR1, where the house HE1 can be represented by a polygon and the road RD2 and the river RR1 can be represented by polylines.

For ease of understanding, the process of converting remote sensing images into vector maps using neural network models is first introduced.

Figure 2 shows a schematic diagram of a process of generating a map using a neural network model according to some embodiments of the present application. As shown in Figure 2, using neural network model 0 to convert remote sensing images into vector maps usually includes the following steps:

S21: Annotation. Map elements such as houses, roads, and lakes are annotated in part of the remote sensing images of the target area to obtain the reference outlines of the map elements in these images (such as the vectorized outline of a house or the vectorized center line of a road). These images, together with the reference outlines of their map elements, can be used as a sample image set;

S22: Model training. Use the sample image set to train the neural network model 0, so that the neural network model 0 can vectorize the map elements in each sample image in the sample image set, and obtain a predicted shape that is highly similar to the reference outline of each map element;

S23: Map inference. The trained neural network model 0 performs inference on the predicted image set (that is, the images in the remote sensing images of the target area other than the sample image set) to obtain the predicted shapes of the map elements in each predicted image, where a predicted shape includes its geometric primitives and the topological relationships between them;

S24: Post-processing. Post-process the predicted shapes output by the neural network model 0, such as connecting the geometric primitives in each predicted shape to obtain the vectorized shape of the map elements, splicing the vectorized shapes of the map elements in different remote sensing images, etc., to obtain the predicted vector map;

S25: Correction. The predicted vector map is corrected by surveying and mapping personnel to obtain a vector map of the target area to ensure the accuracy of the vector map.

Continuing to refer to Figure 2, in some embodiments, the above-mentioned neural network model 0 may include a shape initialization network 1, a shape regression network 2, and a topology reconstruction network 3.

The shape initialization network 1 is used to extract the initial shape of each map element in a remote sensing image. For example, the shape initialization network 1 can extract the houses in the remote sensing image IM2 as polygons and the roads as polylines. In some embodiments, the shape initialization network 1 is also used to determine key points of roads and rivers, such as road intersections and the divergence and convergence points of rivers.

The shape regression network 2 is used to optimize the initial shapes obtained by the shape initialization network 1, obtaining the regression shape of each map element and the direction data of each geometric primitive in the regression shape, so as to improve the geometric accuracy of each map element.

The topology reconstruction network 3 is used to infer the topological relationships between the geometric primitives in the regression shape of each map element. For example, the geometric primitives of map elements described by polygons can be line segments, and the topological relationships between line segments can include, but are not limited to, collinearity and parallelism. As another example, the geometric primitives of map elements described by polylines can be points, and the topological relationships between points can include connected and not connected.

After the topological relationships between the geometric primitives of each map element are obtained, the post-processing module 4 can perform post-processing operations such as connecting the geometric primitives and splicing the map elements according to these topological relationships, producing the vector map. It can be understood that in some embodiments, the post-processing module 4 can be implemented as a neural network or other processing logic, which is not limited here.
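For polylines, the connecting step can be sketched as a graph walk. This minimal Python sketch assumes that the topology reconstruction network 3 outputs the set of point-index pairs predicted as "connected"; the function name `assemble_polylines` is hypothetical, and splitting chains at points whose degree is not 2 (endpoints and intersection points) is a design choice of this sketch rather than a rule stated in the present application.

```python
from collections import defaultdict


def assemble_polylines(num_points, connected_pairs):
    """Group point indices into polylines from predicted 'connected' pairs.
    Chains are broken at endpoints and junctions (points of degree != 2)."""
    adj = defaultdict(set)
    for i, j in connected_pairs:
        adj[i].add(j)
        adj[j].add(i)
    visited_edges = set()
    polylines = []

    def walk(start, first_step):
        # Follow degree-2 points until an endpoint, a junction, or a visited edge.
        path = [start, first_step]
        visited_edges.add(frozenset((start, first_step)))
        prev, cur = start, first_step
        while len(adj[cur]) == 2:
            (nxt,) = adj[cur] - {prev}
            edge = frozenset((cur, nxt))
            if edge in visited_edges:
                break
            visited_edges.add(edge)
            path.append(nxt)
            prev, cur = cur, nxt
        return path

    # First start from endpoints and junctions.
    for node in range(num_points):
        if len(adj[node]) != 2:
            for nb in sorted(adj[node]):
                if frozenset((node, nb)) not in visited_edges:
                    polylines.append(walk(node, nb))
    # Remaining unvisited edges form closed loops of degree-2 points.
    for node in range(num_points):
        for nb in sorted(adj[node]):
            if frozenset((node, nb)) not in visited_edges:
                polylines.append(walk(node, nb))
    return polylines
```

With this choice, a T-shaped road comes out as three polylines meeting at the intersection point, and a closed loop repeats its starting point.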

It can be understood that each network of the neural network model 0 may include one or more neural network layers, including but not limited to semantic segmentation network, convolutional network, pooling network, classification network, activation network, attention mechanism network, fully connected network, recurrent neural network, batch normalization (Batch Normalization, BN) network, etc.

It can be understood that the structure of the neural network model 0 shown in Figure 2 is just an example. In other embodiments, the neural network model 0 can also include more or less networks, and some networks can also be combined or split. No limitation is made here. For example, in some embodiments, the post-processing module 4 implemented in the form of a neural network may be included in the neural network model 0.

The following introduces the training process of each network in neural network model 0.

First, the training process of shape initialization network 1 is introduced.

Figure 3 shows a schematic structural diagram of a shape initialization network 1 according to some embodiments of the present application.

As shown in Figure 3, the shape initialization network 1 includes a semantic segmentation network 11, a mask generation network 12, an edge extraction network 13 and a shape generation network 14.

The semantic segmentation network 11 is used to extract the image features (image embeddings) of each sample image in the sample image set. In some embodiments, the semantic segmentation network 11 may include a feature pyramid network (FPN).

The mask generation network 12 is used to obtain the contour mask of each map element based on the image features of each sample image in the sample image set. For example, in some embodiments, the mask generation network 12 may include a series of convolutional networks, batch normalization (BN) networks, and activation networks (such as a rectified linear unit, ReLU).

The edge extraction network 13 is used to extract the edges of the contour mask of each map element according to the contour mask and the image features, obtaining the mask edge of each map element, where the mask edge describes the contour of the map element. In some embodiments, the edge extraction network 13 may include cascaded convolutional networks, batch normalization (BN) networks, and activation networks (such as ReLU).

The shape generation network 14 is used to simplify the mask edges of each map element to obtain the initial shape of each map element. This reduces the number of geometric primitives in the initial shape, for example the number of line segments in a polygon or the number of points in a polyline, and thereby increases the speed of inference on input images using neural network model 0. In some embodiments, algorithms such as the Douglas-Peucker (DP) algorithm can be used to simplify polygons, and algorithms such as non-maximum suppression (NMS) can be used to simplify polylines.
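The Douglas-Peucker simplification named above can be sketched as follows. This is the textbook recursive algorithm written for an open chain of (x, y) points, with the tolerance `epsilon` as a free parameter; it measures distance to the chord segment rather than to its infinite line, a common variant. For a closed polygon ring, a typical approach (assumed here, not specified in the present application) is to split the ring at two far-apart vertices and simplify each half.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep both endpoints; recursively keep the farthest interior point
    whenever it deviates from the chord by more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point
```

A noisy L-shaped edge such as `[(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3, 1), (3, 2)]` collapses to its three corner points with `epsilon=0.1`.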

It can be understood that the structure of the shape initialization network 1 shown in Figure 3 is just an example. In other embodiments, the shape initialization network 1 can also adopt other structures, and each network can also be implemented using other types of neural networks, which is not limited here.

The training process of shape initialization network 1 is introduced below based on the structure of shape initialization network 1 shown in Figure 3.

Specifically, Figure 4 shows a schematic diagram of the training process of the shape initialization network 1 according to some embodiments of the present application. The process is executed by an electronic device and, as shown in Figure 4, includes the following steps:

S401: Obtain a sample image set.

The electronic device acquires a sample image set in the target area, and the sample image set includes the reference outline of the map element in each sample image.

In some embodiments, the sample image set may include N sample images. The size of each sample image is H×W (that is, H pixels high and W pixels wide), and each pixel in a sample image may include n channels, where n is the number of color channels of the sample image (for example, n=3 for an RGB image). The sample image set can therefore be expressed as a 4-dimensional matrix P of size N×n×H×W, whose element P(i, j, k, m) represents the value of the j-th color channel of the pixel in the k-th row and m-th column of the i-th sample image.
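The layout of P can be illustrated with NumPy. The sketch below assumes, hypothetically, that images are loaded channels-last (H×W×n), as many image libraries return them, and transposes them into the N×n×H×W layout described above; `stack_samples` is an illustrative name. Note that NumPy indices are 0-based, whereas the text counts from 1.

```python
import numpy as np


def stack_samples(images):
    """Stack a list of H x W x n (channels-last) images into an N x n x H x W array P."""
    # (H, W, n) -> (n, H, W) for each image, then stack along a new first axis.
    return np.stack([np.transpose(img, (2, 0, 1)) for img in images], axis=0)
```

With this layout, `P[i, j, k, m]` is the value of channel j at row k, column m of sample image i (all 0-based).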

It can be understood that in some embodiments, the reference outlines of the map elements in the sample image set may include manually determined vector data of each map element, such as roads represented by polylines and houses represented by polygons. During training of the neural network model, the reference outlines can then be used to evaluate the quality of the results inferred by the model, and the model's network parameters can be adjusted based on the evaluation results.

S402: Use the semantic segmentation network 11 to obtain the image features of the map elements in the sample image set.

The electronic device uses a semantic segmentation network 11, such as an FPN network, to extract features from the sample image set to obtain the image features of the sample image set.

In some embodiments, the electronic device can input the matrix P into the semantic segmentation network 11 to obtain the image feature matrix F of the sample image set. The size of the matrix F is N×C×H×W, where C is the number of image features that the semantic segmentation network 11 extracts for each sample image; C is determined by the type of the semantic segmentation network 11 or preset by the developer. The element F(i, j, k, m) of the matrix F represents the value of the j-th feature of the pixel in the k-th row and m-th column of the i-th sample image.

S403: Based on the image features of the sample image set, use the mask generation network 12 to obtain the contour masks of each map element, and use the edge extraction network 13 to obtain the mask edges of each contour mask.

In some embodiments, the electronic device can input the image feature matrix F of the sample images into the mask generation network 12 to obtain the contour mask of each map element. For example, in some embodiments, the contour masks of the map elements can be expressed as a matrix M of size N×p×H×W, where p is the number of map element categories. The following takes two categories as an example: map elements represented by polygons (j=1) and map elements represented by polylines (j=2). For a given sample index i and category index j, the 1×1×H×W submatrix of M represents the contour mask of the j-th category of map elements in the i-th sample image. In such a submatrix, elements belonging to the same type of map element can take the same value. For example, referring to Figure 5, for the sample image IM2, in the 1×1×H×W submatrix for houses, the pixels where houses are located can all have the value 1, and the other pixels can have the value 0.

After obtaining the contour mask of a map element, in some embodiments, the electronic device can input the contour mask and the corresponding reference contour into the edge extraction network 13 to obtain the mask edge of the contour mask. For example, referring to Figure 6A, the edge extraction network 13 can infer the coordinate distance DT between each pixel in the edge area of the contour mask and the reference contour. The size of DT is N×2×H×W, and its element DT(i, j, k, m) represents the coordinate distance between the reference contour and the pixel in the k-th row and m-th column of the j-th type map element in the i-th sample image. DT(i, j, k, m) may include two elements, dx and dy, which represent the coordinate distances in the H direction and the W direction respectively. For example, if B1 is a point with coordinates (x, y) in the edge area of the contour mask shown in Figure 6A, and the distance DT from B1 to the reference contour is (dx, dy), then the point on the mask edge corresponding to B1 has coordinates (x+dx, y+dy). In this way, by adding the coordinates of every point in the edge area to its corresponding coordinate distance, the coordinates of the points on the mask edge of the contour mask are obtained. Assuming that the edge area of the contour mask contains L pixels, the mask edge obtained from these L pixels can be expressed as the point set $\{(x_i+dx_i,\ y_i+dy_i)\}$, $i = 1, 2, \ldots, L$.
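Given an edge-area mask and per-pixel offsets, the shift to (x+dx, y+dy) is a single vectorized step. This NumPy sketch assumes, for one sample image and one map-element type, an offset array of shape 2×H×W whose first slice holds dx (H direction) and whose second slice holds dy (W direction); the name `edge_points_from_offsets` and this layout are illustrative assumptions.

```python
import numpy as np


def edge_points_from_offsets(edge_area, offsets):
    """Shift each edge-area pixel by its predicted offset.

    edge_area: H x W boolean array marking the edge area of one contour mask.
    offsets:   2 x H x W array; offsets[0] holds dx, offsets[1] holds dy.
    Returns an L x 2 array of points (x + dx, y + dy), one per edge-area pixel.
    """
    xs, ys = np.nonzero(edge_area)  # row (H direction) and column (W direction) indices
    return np.stack([xs + offsets[0, xs, ys], ys + offsets[1, xs, ys]], axis=1)
```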

It can be understood that the size of the edge area of the outline mask can be preset. For example, the edge area can be an area composed of pixels whose distance from the outermost pixel of the outline mask is less than a preset edge distance threshold.
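The edge area can be computed directly from a binary contour mask. In this sketch, which reflects one reasonable reading of the text rather than the definitive method, "outermost pixels" are mask pixels with at least one 4-neighbour outside the mask, and the edge area contains every pixel, inside or outside the mask, whose Euclidean distance to an outermost pixel is below the threshold. The function names are hypothetical, and the brute-force distance computation is for clarity only; for large images, a distance transform such as `scipy.ndimage.distance_transform_edt` would be the usual choice.

```python
import numpy as np


def outermost_pixels(mask):
    """Mask pixels that have at least one 4-neighbour outside the mask or image."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all in the mask.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior


def compute_edge_area(mask, threshold):
    """Pixels (inside or outside the mask) whose Euclidean distance to the
    nearest outermost pixel is less than `threshold`."""
    m = np.asarray(mask, dtype=bool)
    boundary = np.argwhere(outermost_pixels(m))  # B x 2 array of (row, col)
    if boundary.size == 0:
        return np.zeros_like(m)
    H, W = m.shape
    rows, cols = np.mgrid[0:H, 0:W]
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1)  # HW x 2
    # Brute-force distances from every pixel to every boundary pixel.
    d = np.sqrt(((grid[:, None, :] - boundary[None, :, :]) ** 2).sum(axis=-1))
    return (d.min(axis=1) < threshold).reshape(H, W)
```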

It can be understood that in some embodiments, the coordinate distance from a pixel point in the edge area to the reference outline may be the difference between the coordinates of the point closest to the pixel point on the reference outline and the coordinates of the pixel point.
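The training target (dx, dy) for each edge-area pixel can be sketched as a nearest-neighbour search. For simplicity, this sketch assumes the reference outline has been sampled densely into a set of points, so the nearest sampled point stands in for the exact nearest point on the outline; `offsets_to_reference` is a hypothetical name.

```python
import numpy as np


def offsets_to_reference(pixels, reference_points):
    """For each pixel (x, y), return (dx, dy) = nearest reference point - pixel.

    pixels:           L x 2 array of edge-area pixel coordinates.
    reference_points: R x 2 array of points sampled along the reference outline.
    """
    pixels = np.asarray(pixels, dtype=float)
    ref = np.asarray(reference_points, dtype=float)
    d2 = ((pixels[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)  # L x R squared distances
    nearest = ref[d2.argmin(axis=1)]
    return nearest - pixels
```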

It can be understood that when the trained shape initialization network 1 is used to extract the mask edge corresponding to the contour mask of a map element, no reference contour is available, so the above DT can instead be the coordinate distance from each point in the edge area of the contour mask to the contour formed by the outermost pixels of the contour mask. For example, referring to Figure 6B, if the coordinate distance from the point E1 (x, y) in the edge area of the contour mask to the contour formed by the outermost pixels of the contour mask is (dx, dy), then the point E1 corresponds to the point E1' on the mask edge of the contour mask, with coordinates (x+dx, y+dy).

It can be understood that in other embodiments, the mask edge of the contour mask can also be obtained in other ways, for example, by directly using the outermost points of the contour mask as its mask edge, which is not limited here.

S404: Use the shape generation network 14 to simplify the mask edges and obtain the initial shape.

The electronic device simplifies the mask edge of the contour mask of each map element to obtain the initial shape of each map element, reducing the number of geometric primitives in the initial shape and thereby increasing the speed at which the electronic device performs inference on remote sensing images using neural network model 0. For example, a polygon simplification algorithm such as the DP algorithm can be used to simplify polygons into initial polygon shapes with fewer line segments; as another example, a line simplification algorithm such as the NMS algorithm can be used to thin out the points on polylines to obtain initial polyline shapes with fewer points.
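Non-maximum suppression needs a score for each candidate point, which the text does not specify. The sketch below therefore assumes a hypothetical per-point confidence score and a suppression radius, and greedily keeps the highest-scoring points while discarding any point within the radius of one already kept; `nms_thin_points` is an illustrative name. The kept points are returned in their original order along the polyline.

```python
import math


def nms_thin_points(points, scores, radius):
    """Greedy non-maximum suppression on polyline vertices.

    Visit points from highest to lowest score; keep a point only if no
    already-kept point lies within `radius` of it.
    """
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(math.dist(points[i], points[k]) > radius for k in kept):
            kept.append(i)
    return [points[i] for i in sorted(kept)]  # restore polyline order
```

A larger radius thins the polyline more aggressively, so, like `epsilon` in DP, it trades geometric fidelity for fewer primitives.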

It can be understood that the above simplification of the mask edge using the DP algorithm or the NMS algorithm is just an example. In other embodiments, other algorithms can also be used for simplification, which will not be described again here.

It can be understood that in some embodiments, the initial shape obtained by the electronic device may also include image features of each geometric primitive in the initial shape.

S405: Calculate the loss function and determine whether the termination condition is met based on the loss function.

The electronic device calculates the loss function based on the coordinates of each predicted point in the initial shape and the coordinates of the corresponding reference point on the reference contour, and determines based on the loss function whether the termination condition is met. If it is met, the initial shapes obtained by the shape initialization network 1 meet the requirements, and the process goes to step S406; otherwise, the shape initialization network 1 cannot obtain initial shapes that meet the requirements with its current network parameters, and the process goes to step S407.

It can be understood that in some embodiments, different loss functions may be used for each network in the shape initialization network 1.

In some embodiments, for the semantic segmentation network 11 and the mask generation network 12, the loss function may be a cross-entropy loss function, a focal loss function, a 0-1 loss, an entropy loss, a softmax loss, etc.

For example, assume that an image includes N1 pixels and that the semantic segmentation network 11 and the mask generation network 12 classify the N1 pixels into M1 categories (that is, M1 types of map elements). The cross-entropy loss $L_{11\text{-}12\text{-}CEL}$ of the semantic segmentation network 11 and the mask generation network 12 can then be expressed as the following formula (1):

$$L_{11\text{-}12\text{-}CEL} = -\frac{1}{N_1}\sum_{i=1}^{N_1}\sum_{j=1}^{M_1} y_{ij}\log p_{ij} \qquad (1)$$

In formula (1), $y_{ij}$ is a 0-1 variable: $y_{ij}=1$ when the i-th pixel lies within the contour mask area of the j-th type of map element, and $y_{ij}=0$ otherwise; $p_{ij}$ is the probability, determined by the mask generation network 12, that the i-th pixel lies within the contour mask area of the j-th type of map element. It can be seen from formula (1) that the cross-entropy loss $L_{11\text{-}12\text{-}CEL}$ indicates the accuracy of the contour masks obtained by the mask generation network 12: the smaller $L_{11\text{-}12\text{-}CEL}$ is, the higher the accuracy of the contour masks.

It can be understood that the cross-entropy loss $L_{11\text{-}12\text{-}CEL}$ of the semantic segmentation network 11 and the mask generation network 12 reflects the accuracy of the contour masks obtained using these two networks: the smaller $L_{11\text{-}12\text{-}CEL}$ is, the higher the accuracy.
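A direct transcription of the pixel-wise cross entropy, using the $y_{ij}$ and $p_{ij}$ defined for formula (1), is sketched below. Averaging over the N1 pixels and clipping probabilities at a small epsilon to avoid log(0) are assumptions of this sketch; some formulations use a plain sum instead.

```python
import math


def cross_entropy(y, p, eps=1e-12):
    """Mean cross entropy over N1 pixels and M1 map-element types.

    y: N1 x M1 one-hot labels (y[i][j] = 1 if pixel i lies in type j's mask).
    p: N1 x M1 predicted probabilities.
    """
    n1 = len(y)
    total = 0.0
    for yi, pi in zip(y, p):
        for yij, pij in zip(yi, pi):
            total += yij * math.log(max(pij, eps))  # only the true class contributes
    return -total / n1
```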

In some embodiments, corresponding to the edge extraction network 13, the loss function may include mean squared error (MSE, also known as L2 loss).

Assume that an initial shape includes N2 predicted points, the coordinates of the i-th predicted point are $(x_i, y_i)$, and the coordinates of the corresponding reference point on the reference contour are $(x_{si}, y_{si})$. Then the L2 loss $L_{13\text{-}L2}$ of the edge extraction network 13 can be expressed as the following formula (2):

$$L_{13\text{-}L2} = \frac{1}{N_2}\sum_{i=1}^{N_2}\left[(x_i - x_{si})^2 + (y_i - y_{si})^2\right] \qquad (2)$$

It can be understood that the L2 loss $L_{13\text{-}L2}$ of the edge extraction network 13 reflects the similarity between the mask edges obtained by the edge extraction network 13 and the corresponding reference contours: the smaller $L_{13\text{-}L2}$ is, the higher the similarity and thus the higher the accuracy of the edge extraction network 13.
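The L2 loss can likewise be computed directly from matched point pairs. This sketch assumes the predicted points and reference points are already paired one-to-one, and it averages over the N2 points, as "mean squared error" implies; `l2_loss` is an illustrative name.

```python
def l2_loss(pred, ref):
    """Mean over N2 points of the squared Euclidean distance between each
    predicted point (x_i, y_i) and its reference point (x_si, y_si)."""
    n2 = len(pred)
    return sum((x - xs) ** 2 + (y - ys) ** 2
               for (x, y), (xs, ys) in zip(pred, ref)) / n2
```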

It can be understood that in other embodiments, other types of loss functions can also be used to determine whether the termination condition is met.

It can be understood that the termination condition may include at least one of the following: the loss function corresponding to each network converges, or the value of the loss function corresponding to each network is less than a corresponding preset value. For example, the termination condition is determined to be met when both the cross-entropy loss and the L2 loss converge; as another example, it is determined to be met when the cross-entropy loss is less than a corresponding first preset value and the L2 loss is less than a corresponding second preset value.

It can be understood that in other embodiments, the termination condition may also include other conditions, which are not limited here. For example, in some embodiments, the loss functions of the networks can be weighted and summed (that is, each network's loss function is multiplied by its corresponding weight and the products are added) to obtain a total loss function, and the termination condition is determined to be met when the total loss function converges or is less than a preset total loss value. For example, when the loss functions include the aforementioned $L_{11\text{-}12\text{-}CEL}$ and $L_{13\text{-}L2}$, the total loss function can be expressed as $\lambda_1 L_{11\text{-}12\text{-}CEL} + \lambda_2 L_{13\text{-}L2}$, where $\lambda_1$ is the weight of the cross-entropy loss, $\lambda_2$ is the weight of the L2 loss, and both can be preset by the developer.
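The termination logic can be sketched as follows. This is one possible combination of the conditions listed above, not the definitive rule: "converged" is approximated here, as an assumption, by the loss changing less than `tol` over each of the last `window` training rounds, and λ1, λ2 (`lam1`, `lam2`) and the thresholds are developer-chosen values. The function names are hypothetical.

```python
def has_converged(history, window=3, tol=1e-4):
    """Treat a loss as converged when it changed by less than `tol`
    in each of the last `window` training rounds."""
    if len(history) <= window:
        return False
    recent = history[-(window + 1):]
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))


def termination_met(ce_history, l2_history, lam1=1.0, lam2=1.0,
                    total_threshold=None):
    """Met when both losses converge, or when the weighted total
    lam1 * L_11-12-CEL + lam2 * L_13-L2 falls below `total_threshold`."""
    if has_converged(ce_history) and has_converged(l2_history):
        return True
    if total_threshold is not None:
        total = lam1 * ce_history[-1] + lam2 * l2_history[-1]
        return total < total_threshold
    return False
```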

S406: Store network parameters and obtain shape initialization network 1.

The electronic device stores the network parameters currently used by the shape initialization network 1 to obtain the trained shape initialization network 1.

S407: Adjust network parameters and conduct the next round of training.

When the electronic device determines that the termination condition is not met, it adjusts the network parameters of the shape initialization network 1 and performs the next round of training. For example, when none of the networks' loss functions meets its termination condition, the network parameters of all networks are adjusted before the next round of training; as another example, when the loss functions of some networks meet their termination conditions but those of other networks do not, only the network parameters of the networks whose loss functions do not meet their termination conditions are adjusted before the next round; as a further example, when the total loss function does not meet its termination condition, the network parameters of at least some of the networks are adjusted before the next round.

It can be seen from the above training process that the shape initialization network 1 performs semantic segmentation on images and obtains the initial shape of each map element based on learning the reference contours of the map elements in the sample images of the target area, that is, based on learning the geometric characteristics of the map elements of the target area. It can therefore better adapt to the geometric characteristics of the map elements in the target area and improve the accuracy of the initial shape of each map element.

For the trained shape initialization network 1, the electronic device can input the remote sensing image into the network to obtain the initial shape and image features of the map elements in the remote sensing image.

The training process of shape regression network 2 is introduced below.

Figure 7 shows a schematic structural diagram of a shape regression network 2 according to some embodiments of the present application.

As shown in FIG. 7, the shape regression network 2 includes a pooling network 21, a feature encoding network 22, a direction generation network 23, and a shape adjustment network 24.

The pooling network 21 pools and interpolates the features of each geometric primitive in the initial shape to obtain its pooled features. The image features of the sample image extracted by the semantic segmentation network 11 are defined per pixel, but the coordinates of the geometric primitives in the initial shape produced by the shape generation network 14 do not correspond one-to-one with those pixels. The pooling network 21 therefore interpolates the feature of each geometric primitive in the initial shape from the image features of the pixels adjacent to it. For example, for an initial shape represented by polygons, the features of the geometric primitives can be interpolated with the line-of-interest (LOI) method; for an initial shape represented by polylines, the point-of-interest (POI) method can be used. Specific examples are given below.

The feature encoding network 22 re-encodes the pooled features of each geometric primitive in the initial shape to obtain its regression encoding features, which are used to infer the direction data of each geometric primitive, to adjust the initial shape, and so on. In some embodiments, the feature encoding network 22 may include a multi-head attention network.

The direction generation network 23 obtains the direction data of each geometric primitive from its regression encoding features. When the geometric primitive is a point, its direction may be the tangent direction at that point; when the geometric primitive is a line segment, its direction may be the direction of the segment. In some embodiments, the direction generation network 23 may include a convolutional network, a batch normalization (BN) network, and an activation network connected in series.

It can be understood that in some embodiments, while the shape regression network 2 is being trained, the direction data obtained by the direction generation network 23 can be used to calculate an angle constraint loss and an L2 loss, and the network parameters of the shape regression network 2 are adjusted according to these two losses to improve the accuracy of the direction data of the geometric primitives produced by the direction generation network 23. The calculation is described below.

The shape adjustment network 24 adjusts the position of each geometric primitive in the initial shape according to its regression encoding features to obtain a more accurate regression shape, and calculates the coordinate residual of each predicted point of each geometric primitive in the regression shape relative to the corresponding point on the reference contour; these residuals are used to calculate the loss function. In some embodiments, the shape adjustment network 24 may include a convolutional network, a BN network, and an activation network connected in series.

It can be understood that in some embodiments, the regression shape generated by the shape adjustment network 24 can be used to calculate a relative shape loss, according to which the network parameters of the shape regression network 2 are adjusted to improve the accuracy of the regression shape produced by the shape adjustment network 24. The calculation is described below.

Specifically, FIG. 8 shows a schematic diagram of the training process of the shape regression network 2 according to some embodiments of the present application. The process is executed by the electronic device and, as shown in FIG. 8, includes the following steps.

S801: Perform feature pooling on the image features and classification features of the geometric primitives in the initial shape to obtain the pooled features of the initial shape.

As mentioned above, the image features and classification features of the sample image obtained by the shape initialization network 1 are defined per pixel, and the points in the initial shape do not correspond one-to-one with the pixels. The image features and classification features of the geometric primitives in the initial shape are therefore pooled, interpolated, and so on, to obtain the pooled features of each geometric primitive.

For example, for an initial shape represented by a polyline, whose geometric primitives are points, the POI algorithm can be used to obtain the pooled features of each geometric primitive in the initial shape. Specifically, suppose the semantic segmentation network 11 gives the image feature vector c0 for point A and c1 for point B. Then, in the initial shape, the image feature vector of a point C on segment AB is c0 + (c1 − c0)·l_AC/l_AB, where l_AC is the length of segment AC and l_AB is the length of segment AB.
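The POI formula above is linear interpolation along a segment. A minimal Python sketch, assuming features are plain lists of floats and that C lies on AB; the function name poi_interpolate is hypothetical and not taken from the application:

```python
import math


def poi_interpolate(c0, c1, a, b, c):
    """Feature of point C on segment AB by linear interpolation.

    c0, c1: feature vectors (lists of floats) at endpoints A and B.
    a, b, c: (x, y) coordinates of A, B and C; C is assumed to lie on AB.
    Returns c0 + (c1 - c0) * l_AC / l_AB.
    """
    l_ab = math.dist(a, b)
    if l_ab == 0:
        return list(c0)  # degenerate segment: A and B coincide
    t = math.dist(a, c) / l_ab  # relative position of C along AB
    return [f0 + (f1 - f0) * t for f0, f1 in zip(c0, c1)]
```

For example, poi_interpolate([0.0, 2.0], [4.0, 6.0], (0, 0), (4, 0), (1, 0)) gives [1.0, 3.0], because C lies a quarter of the way from A to B.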

For another example, for an initial shape represented by a polygon, whose geometric primitives are line segments, the LOI algorithm can be used to obtain the pooled features of each geometric primitive in the initial shape. Specifically, multiple points (for example, 32) are sampled on the line segment and divided into several groups (for example, 4 groups of 8 points). The pooled feature of each point is obtained with the POI algorithm, the pooled features of the points in each group are averaged to obtain the pooled feature of the group (for example, vectors n1, n2, n3, and n4 for the 4 groups), and the group features are then concatenated to obtain the pooled feature of the line segment (for example, n1, n2, n3, and n4 are concatenated into the vector n5 = [n1 n2 n3 n4]).
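The LOI steps above (sample, group, average, concatenate) can be sketched in Python as follows. The callable feature_at, the even spacing of the samples, and the inclusion of both endpoints are assumptions for illustration; the text only gives 32 points and 4 groups as example values.

```python
def loi_pool(p0, p1, feature_at, num_points=32, num_groups=4):
    """Line-of-interest pooling for the segment p0 -> p1 (a sketch).

    feature_at: hypothetical callable mapping an (x, y) point to a feature
    vector, for example a POI interpolation of neighbouring pixel features.
    """
    if num_points < 2 or num_points % num_groups != 0:
        raise ValueError("need num_points >= 2 and divisible by num_groups")
    # Evenly spaced sample points along the segment, endpoints included.
    samples = []
    for k in range(num_points):
        t = k / (num_points - 1)
        samples.append((p0[0] + (p1[0] - p0[0]) * t,
                        p0[1] + (p1[1] - p0[1]) * t))
    feats = [feature_at(p) for p in samples]

    size = num_points // num_groups
    pooled = []
    for g in range(num_groups):
        group = feats[g * size:(g + 1) * size]
        # Average the point features within the group (n1, n2, ...).
        avg = [sum(f[d] for f in group) / size for d in range(len(group[0]))]
        pooled.extend(avg)  # concatenation: n5 = [n1 n2 n3 n4]
    return pooled
```

With D-dimensional point features the segment feature has num_groups × D entries, so 4 groups of 2-D features give a vector of length 8. Grouping before concatenating keeps some positional information along the segment that a single global average would lose.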

It can be understood that in other embodiments, the pooled features of the geometric primitives in the initial shape can also be determined in other ways, which is not limited here.

S802: Use the feature encoding network 22 to encode the pooling features of the initial shape to obtain regression encoding features.

The electronic device uses the feature encoding network 22 to re-encode the pooled features of the initial shape, for example discarding features that have little influence on the shape adjustment and the direction data and re-extracting features that have a greater influence on them, to obtain the regression encoding features.

In some embodiments, the feature encoding network 22 may be an encoding network based on a global attention mechanism, such as the aforementioned multi-head attention network.

S803: Based on the regression encoding features of the initial shape, use the direction generation network 23 to obtain the predicted direction data of the initial shape, and use the shape adjustment network 24 to adjust the initial shape to obtain the regression shape.

The electronic device inputs the regression encoding features of each geometric primitive of the initial shape into the direction generation network 23 and the shape adjustment network 24 to obtain, respectively, the predicted direction data of each geometric primitive and the regression shape.

It can be understood that in some embodiments, when the initial shape is a polygon and the geometric primitive is a line segment, the direction of the geometric primitive is the direction of the line segment; when the initial shape is a polyline and the geometric primitive is a point, the direction of the geometric primitive is the tangent direction at that point.
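To make the two kinds of direction concrete, here is a small Python sketch. Two choices are assumptions rather than statements from the application: directions are folded into [0, π) so that a segment and its reverse count as the same direction, and the tangent at an interior polyline vertex is approximated by the chord between its two neighbours.

```python
import math


def segment_direction(p, q):
    """Direction of segment p -> q as an angle in [0, pi) radians.

    Folding modulo pi treats p->q and q->p as the same line direction.
    """
    return math.atan2(q[1] - p[1], q[0] - p[0]) % math.pi


def vertex_tangent(prev_pt, next_pt):
    """Approximate tangent direction at an interior polyline vertex,
    taken as the direction of the chord between its two neighbours."""
    return segment_direction(prev_pt, next_pt)
```

For example, segment_direction((0, 0), (1, 1)) and segment_direction((1, 1), (0, 0)) both return approximately π/4.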

It can be understood that in some embodiments, the obtained regression shape includes the direction data of each geometric primitive, the coordinates of the points in each geometric primitive, the order of the geometric primitives, and so on.

S804: Calculate the loss function based on the regression shape and the direction data of the geometric primitives.

The electronic device calculates the loss function based on the regression shape and the direction data of each geometric primitive in the initial shape.

For example, in some embodiments, the loss function may include an L2 loss between the predicted direction data of each geometric primitive and the direction data of the corresponding point or line in the reference contour. This loss indicates the accuracy of the primitive directions obtained by the direction generation network 23: the smaller the L2 loss, the more accurate the direction data of the geometric primitives. Specifically, suppose an initial shape includes N3 primitives, the direction data of the i-th primitive is dr_i, and the direction data of the point or line corresponding to the i-th primitive in the reference contour is dr_si. The L2 loss L_23-L2 of the direction generation network 23 can then be expressed by formula (3).
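Formula (3) itself is not reproduced in this text. Assuming it is the usual squared L2 distance averaged over the N3 primitives (the averaging, rather than a plain sum, is an assumption), it would read:

```latex
L_{23\text{-}L2} = \frac{1}{N_3} \sum_{i=1}^{N_3} \left\lVert dr_i - dr_{si} \right\rVert_2^2
```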

For another example, in some embodiments, the loss function may include a relative shape loss, obtained from the coordinate residuals between each geometric primitive in the regression shape and the corresponding point or line in the reference contour, which is used to evaluate the accuracy of the regression shape. The relative shape loss can be expressed as the average, sum, or a similar aggregate of the shape losses of all points in the initial shape. For a non-intersection point of a polyline, the shape loss can be the projection distance from the point to the reference contour; for an intersection point of a polyline, it can be the distance from the point to the corresponding reference intersection point in the reference contour; for a point of a polygon, it can be the projection distance from the point to the reference contour. The relative shape loss thus indicates the similarity between the regression shape of a map element and its reference contour: the lower the relative shape loss, the more similar the regression shape obtained by the shape regression network 2 is to the corresponding reference contour, and the more accurate the regression shape.
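A minimal Python sketch of the relative shape loss described above, using the average of the per-point losses. The function names and the intersections mapping (predicted point index to its reference intersection point) are hypothetical input conventions chosen for illustration.

```python
import math


def point_to_segment(p, a, b):
    """Distance from point p to its (clamped) projection onto segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)  # degenerate segment
    # Projection parameter, clamped so the foot point stays on the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def point_to_contour(p, contour, closed=False):
    """Projection distance from p to a reference contour given as a vertex list."""
    edges = list(zip(contour, contour[1:]))
    if closed:
        edges.append((contour[-1], contour[0]))
    return min(point_to_segment(p, a, b) for a, b in edges)


def relative_shape_loss(points, ref_contour, closed=False, intersections=None):
    """Mean per-point shape loss of a predicted shape.

    intersections: optional {point index: reference intersection point};
    those points use the point-to-point distance, all other points use
    the projection distance to the reference contour.
    """
    intersections = intersections or {}
    losses = [
        math.dist(p, intersections[k]) if k in intersections
        else point_to_contour(p, ref_contour, closed)
        for k, p in enumerate(points)
    ]
    return sum(losses) / len(losses)
```

For a 4 × 4 square reference contour, the predicted points (2, -1), (5, 2), (2, 4.5) and (-0.5, 2) lie 1, 1, 0.5 and 0.5 away from it, so the loss is 0.75.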

For another example, in some embodiments, the loss function may include an angle constraint loss to improve the regularity of the regression shape. In some embodiments, the angle constraint loss L_TV can be expressed by formula (4).

In formula (4), N4 is the number of corners in the initial shape, ᾱ is the average angle of the corners in the initial shape, and α_k is the angle of the k-th corner. It can be understood that the smaller L_TV is, the more regular the regression shape.
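Formula (4) is not reproduced in this text. As a sketch only, assume L_TV is the mean absolute deviation of the corner angles from their average, L_TV = (1/N4) Σ_k |α_k − ᾱ|; the exact form in the application may differ. The Python below computes the unsigned angle between the two edges at each corner, which equals the interior angle for convex polygons but does not distinguish reflex corners (a simplification).

```python
import math


def corner_angles(polygon):
    """Unsigned angle (radians, in [0, pi]) at each vertex of a closed polygon."""
    n = len(polygon)
    angles = []
    for k in range(n):
        prev_pt, cur, nxt = polygon[k - 1], polygon[k], polygon[(k + 1) % n]
        v1 = (prev_pt[0] - cur[0], prev_pt[1] - cur[1])
        v2 = (nxt[0] - cur[0], nxt[1] - cur[1])
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        angles.append(math.acos(max(-1.0, min(1.0, cos_a))))  # clamp rounding
    return angles


def angle_constraint_loss(polygon):
    """Assumed L_TV: mean absolute deviation of corner angles from their mean."""
    alphas = corner_angles(polygon)
    mean = sum(alphas) / len(alphas)
    return sum(abs(a - mean) for a in alphas) / len(alphas)
```

A square or rectangle gives a loss of 0, while an isosceles trapezoid with corners of about 71.6° and 108.4° gives roughly 0.32 rad, matching the statement that a smaller L_TV means a more regular shape.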

It can be understood that in other embodiments, the loss function may also include other losses, such as a smoothness constraint loss that improves the smoothness of the regression shape, which are not limited here.

S805: Determine whether the termination condition is met based on the loss function.

The electronic device determines, based on the loss function, whether the termination condition is met. If it is, the regression shape meets the requirements and the process goes to step S806; otherwise, the regression shape does not meet the requirements and the process goes to step S807.

It can be understood that the termination condition may include at least one of the following conditions: the loss function of each network converges, or the loss function value of each network is less than the corresponding preset value.

It can be understood that the termination condition may also be that the total loss function is less than a total loss threshold or that the total loss function converges, where the total loss function can be obtained as a weighted sum of the loss functions calculated in step S804.
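The termination conditions above can be sketched in Python as follows. The convergence test (the last few values varying by less than a tolerance), the window size, and all dictionary keys are hypothetical choices; the application does not say how convergence is detected.

```python
def converged(history, window=5, tol=1e-4):
    """Assumed criterion: the last `window` loss values differ by less than tol."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol


def termination_met(histories, thresholds, weights, total_threshold):
    """histories: {loss name: [value per training round]}, all the same length.
    thresholds: {loss name: preset value}; weights: {loss name: weight}.

    Met if every loss converged or fell below its preset value, or if the
    weighted total loss converged or fell below total_threshold.
    """
    per_loss_ok = all(
        converged(h) or h[-1] < thresholds[name] for name, h in histories.items()
    )
    rounds = len(next(iter(histories.values())))
    # Weighted total loss for every round, so its convergence can be checked too.
    total = [sum(weights[name] * h[r] for name, h in histories.items())
             for r in range(rounds)]
    total_ok = converged(total) or total[-1] < total_threshold
    return per_loss_ok or total_ok
```

If the condition is not met, the caller would adjust the parameters (of all networks, or only those whose losses failed) and run another round, as in steps S807 and S407.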

It can be understood that in other embodiments, the termination condition may also include other conditions, which are not limited here.

S806: Store network parameters and obtain shape regression network 2.

The electronic device stores the network parameters of the shape regression network 2, thereby obtaining the trained shape regression network 2.

S807: Adjust network parameters and conduct the next round of training.

When the electronic device determines that the termination condition is not met, it adjusts the network parameters of the shape regression network 2 and performs the next round of training. For example, when none of the loss functions of the networks in the shape regression network 2 meets its termination condition, the network parameters of all those networks can be adjusted. For another example, when only some of the loss functions do not meet their termination conditions, only the network parameters of the corresponding networks need be adjusted before the next round of training. For another example, when the total loss function does not meet its termination condition, the network parameters of at least some of the networks can be adjusted before the next round of training.

The training process of topology reconstruction network 3 is introduced below.

Figure 9 shows a schematic structural diagram of a topology reconstruction network 3 according to some embodiments of the present application. As shown in Figure 9, the topology reconstruction network 3 includes a pooling network 31, a feature encoding network 32 and a relationship reasoning network 33.

The pooling network 31 interpolates the image features and direction data of each geometric primitive in the initial shape to obtain the pooled features of each geometric primitive. For details, refer to the description of the pooling network 21 above, which is not repeated here.

The feature encoding network 32 re-encodes the pooled features of each geometric primitive in the initial shape, for example discarding features that have little influence on topological relationship reasoning and extracting those that have a greater influence, to obtain the inference encoding features of each geometric primitive. In some embodiments, the feature encoding network 32 may include a multi-head attention network.

The relational reasoning network 33 obtains the topological relationships between the geometric primitives in the initial shape from their inference encoding features. When the initial shape is a polygon, the geometric primitives are line segments, and the topological relationship between two geometric primitives may be collinear, parallel, and so on; when the initial shape is a polyline, the geometric primitives are points, and the relationship between two geometric primitives is connected or not connected. In some embodiments, the relational reasoning network 33 may include a convolutional network, a BN network, an activation network, and so on.

It can be understood that in some embodiments, while the topology reconstruction network 3 is being trained, the predicted topological relationships between geometric primitives obtained by the relational reasoning network 33 can be used to calculate a cross-entropy loss and a supervised contrastive loss, and can be combined with the direction data of the geometric primitives to calculate a geometric attribute and relationship consistency loss. The network parameters of the topology reconstruction network 3 are adjusted according to these losses to improve the accuracy of the predicted topological relationships between the geometric primitives. The calculation is described below.

Specifically, FIG. 10 shows a schematic diagram of the training process of the topology reconstruction network 3 according to some embodiments of the present application. The process is executed by the electronic device and, as shown in FIG. 10, includes the following steps.

S1001: Perform feature pooling on the image features and direction data of the geometric primitives to obtain the pooled features of the geometric primitives.

The electronic device performs feature pooling on the image features and direction data of the geometric primitives in the initial shape to obtain the pooled features of each geometric primitive. For details, please refer to the relevant description of step S801, which will not be described again here.

It can be understood that for map elements represented by polygons, the electronic device can instead perform feature pooling on the image features and direction data of the regression shape to obtain the pooled features of the geometric primitives.

S1002: Use the feature encoding network 32 to encode the pooled features of the geometric primitives to obtain inference encoding features.

The electronic device uses the feature encoding network 32 to encode the pooling features of the initial shape to obtain the inference encoding features of each geometric primitive. For details, please refer to the aforementioned step S802, which will not be described again here.

S1003: Based on the inference coding features, use the relational inference network 33 to obtain the predicted topological relationship between geometric primitives.

The electronic device inputs the inference encoding features of each geometric primitive into the relational reasoning network 33 to obtain the predicted topological relationships between the geometric primitives in the initial shape.

In some embodiments, the predicted topological relationships between the geometric primitives in the initial shape can be represented by a K×K matrix R, where K is the number of geometric primitives in the initial shape and the element R(i,j) indicates the topological relationship between the i-th and the j-th geometric primitives, such as connected, collinear, or parallel.

For example, referring to Figure 11, the predicted topological relationships between the geometric primitives of an initial shape consisting of the 8 points P1, P2, P3, P4, P5, P6, P7, and P8 can be an 8×8 matrix whose element R(i,j) in row i, column j represents the topological relationship between point Pi and point Pj. For example, R(3,5)=0 means that P3 and P5 are not connected, R(3,4)=1 means that P3 and P4 are connected, and R(1,1)=-1 means that P1 has no topological relationship with itself.
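The encoding of R described above (1 connected, 0 not connected, -1 on the diagonal) can be built from a list of connected pairs. In this Python sketch the edge list is hypothetical, because the text only states R(3,4)=1 and R(3,5)=0 for Figure 11; treating connectivity as symmetric is also an assumption.

```python
def topology_matrix(num_points, connected_pairs):
    """K x K relation matrix: 1 = connected, 0 = not connected, and -1 on the
    diagonal (a primitive has no relationship with itself).

    connected_pairs uses 1-based indices, as in P1..P8; the matrix is 0-based.
    """
    R = [[0] * num_points for _ in range(num_points)]
    for i in range(num_points):
        R[i][i] = -1
    for i, j in connected_pairs:
        R[i - 1][j - 1] = R[j - 1][i - 1] = 1  # assume symmetric connectivity
    return R


# Hypothetical edges; the full connectivity of Figure 11 is not given in the text.
edges = [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (7, 8)]
R = topology_matrix(8, edges)
```

With these edges, R[2][3] (R(3,4)) is 1, R[2][4] (R(3,5)) is 0 and R[0][0] (R(1,1)) is -1, matching the values quoted for Figure 11.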

It can be understood that in some embodiments, for an initial shape expressed as a polyline, the relational reasoning network 33 can perform further feature extraction on the inference encoding features of each geometric primitive to obtain its hidden-space features (the hidden space is also called the latent feature space), hereinafter referred to as latent space features. For a given geometric primitive, the distances between its latent space feature and those of the other geometric primitives are calculated, and the topological relationship between this primitive and each of the preset number of primitives with the smallest distances is set to connected.
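The nearest-neighbour rule above can be sketched in Python. Euclidean distance in the latent space and the parameter name k (for the preset number) are assumptions for illustration.

```python
import math


def knn_connections(latent, k=2):
    """Mark, for each primitive, the k primitives whose latent space features
    are closest (Euclidean distance) as connected to it.

    Returns a set of 0-based (i, j) pairs with i != j. The result need not be
    symmetric: j may be among i's nearest neighbours without the reverse.
    """
    n = len(latent)
    connected = set()
    for i in range(n):
        dists = sorted(
            (math.dist(latent[i], latent[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            connected.add((i, j))
    return connected
```

For latent features [(0, 0), (1, 0), (10, 0), (11, 0)] and k=1, the result is {(0, 1), (1, 0), (2, 3), (3, 2)}: two separate connected pairs. If a symmetric matrix R is needed, one option is to keep a pair only when both directions appear.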

S1004: Calculate the loss function based on the predicted topological relationship between each geometric primitive in the initial shape.

The electronic device calculates the loss function based on the topological relationship between each geometric primitive obtained by the relational reasoning network 33 .

For example, referring to Figure 11, in some embodiments, the loss function may include a cross-entropy loss L_CEL between the predicted and the reference topological relationships of the geometric primitives, determined from the predicted topological relationships between the geometric primitives in the initial shape and the reference topological relationships of those primitives. In some embodiments, L_CEL can be calculated by formula (5).

In formula (5), N5 is the number of geometric primitives in the initial shape, R_ij is the predicted topological relationship value of the i-th and j-th geometric primitives (for example, the element in row i, column j of the matrix R above), and R0_ij is the reference topological relationship between the i-th and j-th geometric primitives. For example, based on formula (5), the cross-entropy loss of the polyline in Figure 11 is L_CEL = -18.
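Formula (5) is not reproduced in this text. A standard binary cross-entropy is never negative, so it would not reproduce the example value of -18, and the Python sketch below should be read only as a common reference form and not as formula (5). It assumes the network outputs connection probabilities and skips the -1 diagonal entries.

```python
import math


def topology_cross_entropy(pred, ref, eps=1e-7):
    """Mean binary cross-entropy over the entries of a relation matrix.

    pred[i][j]: predicted probability that primitives i and j are connected.
    ref[i][j]:  reference relation, 1 or 0; entries equal to -1 (no
    relationship, e.g. the diagonal) are skipped.
    """
    total, count = 0.0, 0
    for i, row in enumerate(ref):
        for j, r in enumerate(row):
            if r == -1:
                continue
            p = min(max(pred[i][j], eps), 1 - eps)  # clamp to avoid log(0)
            total -= r * math.log(p) + (1 - r) * math.log(1 - p)
            count += 1
    return total / count
```

A prediction of 0.5 everywhere gives ln 2 ≈ 0.693, and a prediction equal to the reference drives the loss toward 0.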

For another example, in some embodiments, when the initial shape is a polygon, the loss function may also include a geometric attribute and relationship consistency loss L_C, which characterizes the consistency between the attributes of the geometric primitives and the topological relationships between them. Reducing L_C during training improves the accuracy of the predicted topological relationships determined by the relational reasoning network 33. In some embodiments, L_C can be calculated by formula (6).

In formula (6), N6 is the number of geometric primitives in the initial shape; c_i and c_j are the attributes of the i-th and j-th geometric primitives; and t_r is the ideal distance between the attributes of two geometric primitives whose topological relationship is r. For example, when r indicates that two geometric primitives are parallel or collinear, the attribute of a geometric primitive can be its direction data, such as its tangent direction; since two parallel or collinear line segments have the same tangent direction, t_r should be 0.
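Formula (6) is also not reproduced. A possible reading, sketched in Python under stated assumptions: the attribute c_i is a line direction in radians, the attribute distance is the smallest angle between two undirected directions, and the loss averages |distance − t_r| over the pairs that have a relationship r. The dictionary layouts are hypothetical.

```python
import math


def angle_diff(a, b):
    """Smallest difference between two undirected line directions (mod pi)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def consistency_loss(directions, relations, ideal):
    """directions: direction (radians) of each primitive, its attribute c_i.
    relations: {(i, j): r} for pairs with a predicted relationship r.
    ideal: {r: t_r}, the ideal attribute distance for relationship r.
    Assumed form: mean of |angle_diff(c_i, c_j) - t_r| over related pairs.
    """
    if not relations:
        return 0.0
    total = sum(abs(angle_diff(directions[i], directions[j]) - ideal[r])
                for (i, j), r in relations.items())
    return total / len(relations)
```

With t_r = 0 for parallel and collinear pairs, two segments pointing in opposite directions (0 and π) contribute nothing, while a pair 0.1 rad apart contributes 0.1.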

For another example, in some embodiments, the loss function may also include a supervised contrastive loss. When the relational reasoning network 33 determines the topological relationships between geometric primitives from their inference encoding features, this loss improves the consistency between the latent space features it extracts for each geometric primitive and the topological relationships it infers. That is, by making the supervised contrastive loss satisfy the termination condition, for example falling below a preset value or converging, geometric primitives that have topological relationships such as connection or collinearity are also made to have similar latent space features. Consequently, when the inference encoding features corresponding to the geometric primitives of the map elements in the predicted image set are input to the relational reasoning network 33, the latent space features it extracts for each geometric primitive are more consistent with its reasoning results, which improves the accuracy of the predicted topological relationships. Specifically, in some embodiments, the supervised contrastive loss L_SCL can be calculated by formula (7).

In formula (7), I is the set of geometric primitives; P(i) is the set of geometric primitives that are connected or collinear with the i-th geometric primitive, and |P(i)| is its cardinality (the number of elements in P(i)); A(i) is the set of geometric primitives that are neither connected nor collinear with the i-th geometric primitive; z_i, z_p, and z_α are the vectors of the latent space features of geometric primitives i, p, and α; τ is a scalar temperature parameter, a positive real hyperparameter (τ ∈ R⁺) that can be preset by developers; and · denotes the vector dot product. Formula (7) shows that the smaller the supervised contrastive loss, the more similar the latent space feature vectors of geometric primitives that are connected or collinear, and the less similar those of geometric primitives that are not.
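The symbols listed above match the usual supervised contrastive loss. The Python sketch assumes formula (7) has that standard form, with the denominator summed over A(i) exactly as A(i) is defined here. In the original supervised contrastive formulation the denominator runs over all other samples, which keeps the loss non-negative; with only A(i) in the denominator the value can be negative, but smaller is still better.

```python
import math


def supervised_contrastive_loss(z, positives, tau=0.1):
    """z: latent feature vectors z_i (commonly L2-normalised beforehand).
    positives: {i: set of indices in P(i)}, the primitives connected or
    collinear with i. A(i) follows the text: every other primitive not in P(i).
    Anchors with an empty P(i) or A(i) are skipped.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    loss = 0.0
    for i in range(len(z)):
        p_set = positives.get(i, set())
        a_set = [a for a in range(len(z)) if a != i and a not in p_set]
        if not p_set or not a_set:
            continue
        # log of the denominator: sum over A(i) of exp(z_i . z_a / tau)
        log_denom = math.log(sum(math.exp(dot(z[i], z[a]) / tau) for a in a_set))
        # -1/|P(i)| * sum over P(i) of log(exp(z_i . z_p / tau) / denominator)
        loss -= sum(dot(z[i], z[p]) / tau - log_denom for p in p_set) / len(p_set)
    return loss
```

As a check, placing connected primitives on the same latent vector and unrelated ones opposite gives a lower loss than the reversed arrangement.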

It can be understood that in other embodiments, the supervised contrastive loss can also be calculated in other ways, which is not limited here.

It can be understood that in other embodiments, the loss function may also include other losses, which is not limited here.

It can be understood that in some embodiments, the reference topological relationships of the geometric primitives can be calculated dynamically: the reference points in the reference contour that correspond to the points of each geometric primitive are determined first, and the topological relationships between those reference points are then used as the reference topological relationships of the geometric primitives.
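The dynamic computation above can be sketched in Python. Matching each predicted point to its nearest reference point is an assumption about how the reference points are determined, and the input formats are hypothetical.

```python
import math


def reference_topology(pred_points, ref_points, ref_edges):
    """Reference relation matrix R0 for predicted points.

    pred_points, ref_points: lists of (x, y).
    ref_edges: iterable of (a, b) index pairs connected in the reference contour.
    Returns R0 with -1 on the diagonal and 1/0 (connected / not) elsewhere.
    """
    # Assumed matching rule: each predicted point takes its nearest reference point.
    match = [min(range(len(ref_points)), key=lambda k: math.dist(p, ref_points[k]))
             for p in pred_points]
    edges = {frozenset(e) for e in ref_edges}
    n = len(pred_points)
    R0 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                R0[i][j] = -1
            elif frozenset((match[i], match[j])) in edges:
                R0[i][j] = 1  # inherit the connection of the matched reference points
    return R0
```

Because R0 is recomputed from the current predictions, it stays aligned with the predicted points even when their number or order differs from the reference contour.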

S1005: Determine whether the termination condition is met based on the loss function.

Based on the loss function, the electronic device determines whether the termination condition is met. If it is, the predicted topological relationships obtained by the topology reconstruction network 3 meet the requirements and the process goes to step S1006; otherwise, they do not meet the requirements and the process goes to step S1007.

It can be understood that in some embodiments, the electronic device may determine that the termination condition is met when each loss function converges or each loss function value is less than the corresponding preset loss function value.

In other embodiments, when the electronic device calculates multiple loss functions in step S1004, it may determine that the termination condition is met when the total loss function, obtained as a weighted sum of those loss functions, converges or is less than a preset total loss value. For example, when the loss function includes the cross-entropy loss L_CEL, the geometric attribute and relationship consistency loss L_C, and the supervised contrastive loss L_SCL, the total loss function can be expressed as λ₃L_CEL + λ₄L_C + λ₅L_SCL, where λ₃, λ₄, and λ₅ are the weights of L_CEL, L_C, and L_SCL respectively and can be preset by the developer.

S1006: Store network parameters and obtain topology reconstruction network 3.

When the electronic device determines that the termination condition is met, it stores the network parameters of the topology reconstruction network 3, thereby obtaining the trained topology reconstruction network 3.

S1007: Adjust network parameters and conduct the next round of training.

When the electronic device determines that the termination conditions are not met, the electronic device adjusts the network parameters of the topology reconstruction network 3 and performs the next round of training.

Through the training process of the embodiments shown in FIG. 3 to FIG. 11, the network parameters of the neural network model 0 can be trained, and the predicted images in the predicted image set of the target area can then be inferred with these parameters to obtain a vectorized map of each predicted image. Specifically, the shape initialization network 1 first obtains the initial shapes of the map elements in a predicted image; the shape regression network 2 then adjusts the initial shapes to obtain more accurate regression shapes; the topology reconstruction network 3 then obtains the topological relationships between the geometric primitives in the regression shapes; and finally the post-processing module 4 connects the geometric primitives in the regression shapes to obtain a vectorized map of the map elements.

It follows that, referring to Figure 12, the training process of the neural network model 0 and the process of inferring images with the trained neural network model 0 may be asymmetric. For map elements represented by polygons, which are more complex than polylines, the regression shape is used as the input of the topology reconstruction network 3 during training. For map elements represented by polylines, which are relatively simple, the polyline of the initial shape is used as the input of the topology reconstruction network 3 during training. Because the initial shape is less accurate than the regression shape, the topology reconstruction network 3 learns to produce correct predictions even when its input is less accurate, which improves its robustness to noise and its stability.

The following introduces the process of generating vector maps using the previously trained neural network model 0.

Figure 13 shows a schematic flowchart of a map generation method according to some embodiments of the present application. The process is executed by the electronic device and, as shown in Figure 13, includes the following steps.

S1301: Use the shape initialization network 1 to obtain the initial shape of the map element in the predicted image.

The electronic device inputs the predicted image into the shape initialization network 1, uses the semantic segmentation network 11 to extract image features of the predicted image, then uses the mask generation network 12 to obtain the contour masks of the map elements in the predicted image, then uses the edge extraction network 13 to extract the mask edges of the contour masks, and finally uses the shape generation network 14 to simplify the mask edges into the initial shapes of the map elements. For example, referring to FIG. 2 above, inputting image IM2 into the shape initialization network 1 yields the initial shapes of the houses and roads in image IM2.

S1302: Use the shape regression network 2 to perform inference on the initial shape to obtain the regression shape of the initial shape and the direction data of the geometric primitives.

The electronic device inputs the initial shape of the map element into the shape regression network 2. It uses the pooling network 21 and the feature encoding network 22 to obtain the regression encoding features of the geometric primitives in the initial shape, then uses the direction generation network 23 to obtain the direction data of the geometric primitives, and uses the shape adjustment network 24 to adjust the initial shape to obtain a regression shape that is more accurate and more regular.

S1303: Use the topology reconstruction network 3 to obtain the topological relationships between the geometric primitives in the regression shape.

The electronic device inputs the regression shape into the topology reconstruction network 3. It uses the pooling network 31 and the feature encoding network 32 to obtain the inference encoding features of each geometric primitive in the regression shape, and then uses the relational reasoning network 33 to obtain the topological relationships between the geometric primitives in the regression shape. For example, after the regression shape of the road in the aforementioned image IM2 is input into the topology reconstruction network 3, the topological relationships shown in Figure 11 can be obtained.
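The relational reasoning network 33 is described only in terms of its inputs and outputs. As a rough illustration of turning per-primitive encoding features into pairwise topological relationships, the sketch below scores every pair of primitives with the sigmoid of the dot product of their feature vectors and thresholds the result into an adjacency matrix. The scoring function, the `threshold` value and the name `predict_topology` are hypothetical; a trained network would learn its own pairwise scoring.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_topology(features, threshold=0.5):
    """Predict a symmetric 0/1 adjacency matrix between geometric primitives.

    features[i] is the (hypothetical) encoding vector of primitive i; primitives
    i and j are marked as connected when sigmoid(<f_i, f_j>) >= threshold.
    """
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = sum(a * b for a, b in zip(features[i], features[j]))
            if _sigmoid(score) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj
```

Scoring pairs rather than individual primitives is what lets the output express relationships; the quadratic loop over pairs is acceptable for the tens of primitives in a single map element.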

S1304: Based on the regression shape, the topological relationship between the geometric primitives, and the direction data of the geometric primitives, use the post-processing module 4 to obtain the vector map.

The electronic device uses the post-processing module 4 to obtain the vector map based on the regression shape, the topological relationship between the geometric primitives, and the direction data of the geometric primitives.

In some embodiments, when the regression shape is a polygon, the post-processing module 4 may first rotate each line segment in the regression shape so that its direction matches the direction data of that line segment. For example, referring to Figure 14, the directions of line segments S1S2 and S2S3 in the regression shape are inconsistent with the direction (horizontal) generated by the aforementioned shape regression network 2. The post-processing module 4 can rotate line segment S1S2 clockwise to horizontal and rotate line segment S2S3 counterclockwise to horizontal.
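The direction-alignment step can be sketched as follows. This minimal Python example assumes that direction data is an undirected angle in degrees (so 0° and 180° are the same direction) and that each segment is rotated about its start point, which matches the labels in Figure 14 (S1S2 becomes S1S2'). Both are assumptions, and the function name is hypothetical.

```python
import math


def rotate_to_direction(segment, target_deg):
    """Rotate a segment about its start point so that it points along target_deg.

    Directions are treated as undirected (target_deg and target_deg + 180 are the
    same direction); of the two headings, the one closer to the segment's current
    heading is used so the segment is not flipped. Length is preserved.
    """
    (x0, y0), (x1, y1) = segment
    length = math.hypot(x1 - x0, y1 - y0)
    current = math.atan2(y1 - y0, x1 - x0)
    target = math.radians(target_deg)
    # math.remainder wraps the angular difference into [-pi, pi].
    heading = min((target, target + math.pi),
                  key=lambda t: abs(math.remainder(t - current, 2 * math.pi)))
    return (x0, y0), (x0 + length * math.cos(heading), y0 + length * math.sin(heading))
```

A segment from (0, 0) to (1, 0.1) with target direction 0° ends up lying on the x-axis with its original length; its end point moves, which is why the next step must reconnect neighbouring segments.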

Then, the post-processing module 4 connects the line segments that became disconnected when their directions were adjusted (that is, it connects the endpoint of each line segment to the nearest endpoint of the adjacent line segment) to obtain a closed polygon. For example, referring to Figure 14, after line segments S1S2 and S2S3 are rotated to horizontal, line segments S1S2', S2S3' and S3S4 are no longer connected. The post-processing module 4 connects the endpoint S2' of line segment S1S2' to the endpoint S2 of the adjacent line segment S2S3', and connects the endpoint S3' of line segment S2S3' to the endpoint S3 of the adjacent line segment S3S4.
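A minimal sketch of the reconnection step, assuming the segments are given in their connection order around the polygon as pairs of (x, y) points. Each segment's end is linked to the nearer endpoint of the next segment, so every gap left by the rotation becomes a short bridge edge such as S2'S2 or S3S3' in Figure 14. The function name and data layout are hypothetical.

```python
import math


def close_polygon(segments):
    """Link segments, given in connection order, into one closed vertex ring.

    The end point of each segment is joined to the nearer endpoint of the next
    segment, which is re-oriented to start there; each gap thus becomes a short
    bridge edge. Consecutive duplicate vertices (segments that already touch)
    are dropped, treating the ring as cyclic.
    """
    oriented = [tuple(segments[0])]
    for a, b in segments[1:]:
        end = oriented[-1][1]
        # Start the next segment at whichever endpoint is closer to `end`.
        oriented.append((a, b) if math.dist(end, a) <= math.dist(end, b) else (b, a))
    ring = [p for seg in oriented for p in seg]
    # ring[i - 1] wraps to the last vertex for i == 0, closing the cycle.
    return [p for i, p in enumerate(ring) if p != ring[i - 1]]
```

The returned ring may start at a different vertex than the input, which is harmless for a closed polygon.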

Finally, the post-processing module 4 can delete line segments of the resulting closed polygon whose length is less than a preset side length threshold. When deleting a line segment, it determines whether the two line segments connected to its two ends are parallel or collinear. If they are, the two line segments are merged into one line segment; otherwise, the two line segments are extended until they intersect. The simplicity of the output polygon can therefore be tuned by setting different preset side length thresholds. For example, referring to Figure 14, the lengths of line segments S2'S2, S3S3', S6S7 and S9S10 are less than the preset side length threshold, so these line segments can be deleted. Because line segments S1S2', S2S3' and S3S4 have the same direction, they can be merged into one line segment S1S4. Because line segments S5S6 and S7S8 are neither parallel nor collinear, line segment S7S8 is extended to intersect line segment S5S6, yielding line segment S6S8. Likewise, because line segments S8S9 and S10S11 are not parallel, line segment S10S11 is extended to intersect line segment S8S9, yielding line segment S9S11. The result is a vector map with a regular shape.
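The short-edge deletion rule can be sketched in Python as below. The polygon is assumed to be a list of (x, y) vertices in ring order, and `min_len` plays the role of the preset side length threshold. The angular tolerance `angle_tol_deg` used to decide "parallel/collinear" is a hypothetical parameter, since the text does not say how parallelism is tested. Following the description, parallel neighbours are merged into one segment and non-parallel neighbours are extended to their intersection.

```python
import math


def _direction(p, q):
    """Undirected direction of segment p->q in radians, in [0, pi)."""
    return math.atan2(q[1] - p[1], q[0] - p[0]) % math.pi


def _intersection(a, b, c, d):
    """Intersection of infinite lines a-b and c-d, or None if they are parallel."""
    den = (a[0] - b[0]) * (c[1] - d[1]) - (a[1] - b[1]) * (c[0] - d[0])
    if abs(den) < 1e-12:
        return None
    t = ((a[0] - c[0]) * (c[1] - d[1]) - (a[1] - c[1]) * (c[0] - d[0])) / den
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


def remove_short_edges(ring, min_len, angle_tol_deg=5.0):
    """Delete polygon edges shorter than min_len (the side length threshold).

    For a short edge b->c between edges a->b and c->d: if a->b and c->d are
    parallel/collinear within angle_tol_deg, they are merged into a->d;
    otherwise both are extended and b, c are replaced by their intersection.
    """
    pts = list(ring)
    tol = math.radians(angle_tol_deg)
    changed = True
    while changed and len(pts) > 3:
        changed = False
        n = len(pts)
        for i in range(n):
            j = (i + 1) % n
            b, c = pts[i], pts[j]
            if math.dist(b, c) >= min_len:
                continue
            a, d = pts[i - 1], pts[(i + 2) % n]
            diff = abs(_direction(a, b) - _direction(c, d))
            diff = min(diff, math.pi - diff)
            x = None if diff <= tol else _intersection(a, b, c, d)
            if x is None:
                if n < 5:
                    continue  # merging would leave fewer than 3 vertices
                pts = [p for k, p in enumerate(pts) if k not in (i, j)]
            else:
                pts = [x if k == i else p for k, p in enumerate(pts) if k != j]
            changed = True
            break
    return pts
```

Each change removes at least one vertex, so the loop terminates; raising `min_len` removes more edges and produces a simpler polygon, mirroring the role of the preset side length threshold.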

In some embodiments, when the regression shape is a polyline, the post-processing module 4 can connect the points that have a connection relationship, based on the points in the regression shape and the topological relationships between them, to obtain a vectorized polyline.
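For polylines, connecting points according to their topological relationships amounts to walking a graph. The sketch below, which is an assumption rather than the application's implementation, takes the number of points and the predicted connected pairs of point indices, and traces maximal chains between line ends or junctions (points whose number of connections is not 2). Closed loops are emitted separately. The function name is hypothetical.

```python
from collections import defaultdict


def trace_polylines(num_points, edges):
    """Turn predicted point connections into polylines of point indices.

    Chains start at points whose number of connections is not 2 (line ends or
    junctions); loops in which every point has exactly 2 connections are
    emitted afterwards as closed chains (first index repeated at the end).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    used = set()  # undirected edges already assigned to a polyline

    def walk(start, nxt):
        line, prev, cur = [start, nxt], start, nxt
        used.add(frozenset((start, nxt)))
        # Continue straight through points that have exactly two connections.
        while len(adj[cur]) == 2:
            step = adj[cur][1] if adj[cur][0] == prev else adj[cur][0]
            if frozenset((cur, step)) in used:
                break
            used.add(frozenset((cur, step)))
            line.append(step)
            prev, cur = cur, step
        return line

    lines = []
    for only_ends in (True, False):
        for p in range(num_points):
            if only_ends and len(adj[p]) == 2:
                continue
            for q in adj[p]:
                if frozenset((p, q)) not in used:
                    lines.append(walk(p, q))
    return lines
```

Breaking chains at junctions keeps each road between intersections as one vector polyline, which is a common convention for road networks.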

With the method provided by the embodiments of this application, the neural network model 0 is obtained by learning the geometric characteristics of the map elements of the target area, so a more accurate vector map can be obtained from a remote sensing image of the target area. In addition, for different target areas, the reference contours of map elements can be annotated in remote sensing images of those areas and the neural network model 0 retrained. The retrained neural network model 0 can then produce vector maps of the different target areas from their remote sensing images, without complex heuristic rule settings or parameter tuning. In large-scale map construction scenarios, such as vectorizing maps that cover multiple regions, cities or countries, this improves the efficiency of vector map generation while ensuring its accuracy.

In order to further verify the accuracy of the map generation method provided in this application, remote sensing images in public data sets were used for verification.

First, on the open source data set CrowdAI, the house vectorization results of the map generation method provided in the embodiments of this application were compared with those of a current high-accuracy state-of-the-art (SOTA) algorithm. The results are shown in Table 1.

Table 1 Test results on the CrowdAI data set

Method	Mean max tangent angle error
SOTA algorithm	31.9°
This application	26.7°

As can be seen from Table 1, the mean max tangent angle error of the house vectorization results of the SOTA algorithm is 31.9°, while that of the map generation method provided by this application is 26.7°, a reduction of 16.3%. The mean max tangent angle error is the average, over different remote sensing images, of the tangent angle error between the lines in the inferred vector map and the corresponding reference lines; the lower the value, the more accurately the model vectorizes houses. For example, suppose a model is used to vectorize $N_6$ remote sensing images, and the maximum direction difference between a line segment in the i-th remote sensing image and its corresponding reference line segment is $d_{tan}(i)$. Then the mean max tangent angle error of the model over the $N_6$ remote sensing images can be written as $\frac{1}{N_6}\sum_{i=1}^{N_6} d_{tan}(i)$.
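The metric described above can be computed with the short Python sketch below. It follows the definition in the text: for each image, d_tan(i) is the largest direction difference between a predicted segment and its reference segment, and the result is the mean over all images. Treating directions as undirected (modulo 180°) and assuming the pairing of predicted and reference segments is already given are assumptions; the function names are hypothetical. The last function reproduces the 16.3% figure quoted above.

```python
import math


def direction_diff_deg(seg, ref):
    """Undirected angle between two segments, in degrees within [0, 90]."""
    def heading(s):
        (x0, y0), (x1, y1) = s
        return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
    d = abs(heading(seg) - heading(ref))
    return min(d, 180.0 - d)


def mean_max_tangent_angle_error(images):
    """images[i] is a list of (predicted_segment, reference_segment) pairs.

    d_tan(i) is the largest direction difference in image i; the result is
    (1 / N) * sum_i d_tan(i) over the N images.
    """
    d_tan = [max(direction_diff_deg(p, r) for p, r in pairs) for pairs in images]
    return sum(d_tan) / len(d_tan)


def relative_reduction_pct(old, new):
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100.0
```

For Table 1, `relative_reduction_pct(31.9, 26.7)` gives (31.9 - 26.7) / 31.9 × 100 ≈ 16.3.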

Furthermore, FIG. 15 shows a schematic diagram of the results of vectorizing houses in some remote sensing images using the neural network model 0 according to some embodiments of the present application. As can be seen from Figure 15, the contour mask of the map element obtained by the semantic segmentation network 11 differs considerably from the actual contour of the house, whereas the regression shape obtained through the shape regression network 2 closely matches the actual contour, and the houses in the resulting vector map also closely match the actual house shapes.

In addition, on the open source data set SpaceNet3_Road, the road vectorization results of the map generation method provided in the embodiments of this application were compared with those of Sat2Graph, a current high-accuracy algorithm. The results are shown in Table 2.

Table 2 Test results on SpaceNet3_Road data set

Method	Model size	Topological similarity	Average path length similarity (APLS)
Sat2Graph algorithm	200M	80.97	64.43
This application	100M	86.63	67.67

As can be seen from Table 2, the neural network model of this application occupies less space than the model of the Sat2Graph algorithm, and the vector map it produces has both a higher topological similarity (Topology Similarity) and a higher average path length similarity (Average Path Length Similarity, APLS) with respect to the reference vector map. Topological similarity refers to the similarity between the topological structure of the vector road network inferred by the model and that of the reference vector road network. APLS indicates the similarity between the lines in the inferred vector road network and the lines in the reference vector road network. For both metrics, a higher score indicates a more accurate vector road network.
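APLS is commonly defined, for the SpaceNet road challenge, as 1 - (1/N) Σ min{1, |L(a,b) - L(a',b')| / L(a,b)}, where L(a,b) is the shortest-path length between nodes a and b in the reference graph, L(a',b') is the length between the corresponding nodes in the proposed graph, and a missing path receives the maximum penalty of 1. The sketch below implements only a simplified, one-directional version that assumes both graphs share node IDs. The official metric additionally inserts control nodes, snaps proposal nodes to the reference graph and combines both directions, none of which is reproduced here; the function names are hypothetical.

```python
import heapq


def shortest_lengths(graph, source):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, edge_length)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def apls_one_way(ref, pred):
    """Simplified one-directional APLS from a reference graph to a prediction.

    Both graphs are undirected adjacency dicts listing every node as a key and
    are assumed to share node IDs (no node snapping is performed).
    """
    nodes = sorted(ref)
    penalties = []
    for i, a in enumerate(nodes):
        ref_dist = shortest_lengths(ref, a)
        pred_dist = shortest_lengths(pred, a)
        for b in nodes[i + 1:]:
            l_ref = ref_dist.get(b)
            if not l_ref:
                continue  # unreachable (or zero-length) in the reference: skip
            l_pred = pred_dist.get(b)
            if l_pred is None:
                penalties.append(1.0)  # path missing in the prediction
            else:
                penalties.append(min(1.0, abs(l_ref - l_pred) / l_ref))
    return 1.0 - sum(penalties) / len(penalties) if penalties else 0.0
```

Because the score is built from path lengths rather than pixel overlap, a single missing road segment that disconnects the network is penalized heavily, which is why APLS reflects topological correctness.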

Further, Figure 16 shows a schematic diagram of the result of vectorizing roads in remote sensing images using neural network model 0 according to some embodiments of the present application. It can be seen from Figure 16 that the direction of the points in the polyline obtained by the neural network model 0 is consistent with the direction of the reference road.

Figures 17A and 17B are schematic diagrams showing the reconstruction of some relatively complex roads in remote sensing images using the neural network model 0 according to some embodiments of the present application. As can be seen from Figures 17A and 17B, the vector road network obtained by the map generation method provided by the embodiments of the present application closely coincides with the road centers in the remote sensing images, indicating that the resulting vector map is highly accurate.

It can be understood that using remote sensing images to introduce the technical solutions of the present application in the foregoing embodiments is only an example. The technical solutions of the embodiments of the present application can also be applied to vectorizing the map elements in any other image that includes map elements (such as photos, aerial images, etc.).

Furthermore, embodiments of the present application also provide a map generation device for implementing the map generation method provided by the foregoing embodiments.

Specifically, FIG. 18 shows a schematic structural diagram of the map generation device 200 according to some embodiments of the present application. As shown in FIG. 18, the map generation device 200 includes: a data acquisition unit 201, an initial shape generation unit 202, a shape regression unit 203, a topology reconstruction unit 204 and a post-processing unit 205.

The data acquisition unit 201 is configured to acquire an image of a certain area, the image including map elements, where the map elements are the elements in the image to be converted into a vector map.

The initial shape generation unit 202 is configured to use a first model (such as the aforementioned shape initialization network 1) to perform inference on the image to obtain a first geometric figure corresponding to the map element, where the first geometric figure includes geometric primitives. For details, reference may be made to the relevant description of step S1301, which will not be described again here.

The shape regression unit 203 is configured to input the first geometric figure into a second model (such as the aforementioned shape regression network 2) to obtain the direction of each geometric primitive, and to obtain, based on the first geometric figure, a second geometric figure corresponding to the map element. The second geometric figure includes the same geometric primitives as the first geometric figure, but the positions of the geometric primitives in the second geometric figure are arranged differently from those in the first geometric figure. For details, reference may be made to the relevant description of step S1302, which will not be described again here.

The topology reconstruction unit 204 is configured to use a third model (such as the aforementioned topology reconstruction network 3) to obtain the topological relationship between each geometric primitive based on the direction of the geometric primitive and the second geometric figure. For details, reference may be made to the relevant description of step S1303, which will not be described again here.

The post-processing unit 205 obtains a vector map corresponding to the image based on the topological relationship between each geometric primitive, the direction of each geometric primitive, and the second geometric figure. For example, in some embodiments, the post-processing unit 205 may be used to perform related operations of the aforementioned post-processing module 4. For details, reference may be made to the related description of the aforementioned step S1304, which will not be described again here.

It can be understood that the structure of the map generation device 200 shown in Figure 18 is only schematic. In other embodiments, the map generation device 200 may include more or fewer units, or some units may be merged or split. No limitation is made here.

It can be understood that, in the above embodiments, the electronic device used to train the neural network model 0 or to perform inference with it can be any electronic device capable of training a neural network model or performing inference with one, including but not limited to laptops, desktops, tablets, servers, etc., without limitation here. The following uses the electronic device 100 as an example to illustrate the structure of such an electronic device. Specifically, FIG. 19 shows a schematic structural diagram of an electronic device 100 for executing the embodiments of the present application according to some embodiments of the present application. The electronic device 100 may include one or more processors 101, system memory 102, non-volatile memory (NVM) 103, input/output (I/O) devices 104, a communication interface 105, and system control logic 106 that couples the processor 101, the system memory 102, the non-volatile memory 103, the input/output (I/O) devices 104 and the communication interface 105, where:

The processor 101 may include one or more processing units. For example, the processor 101 may include a central processing unit (CPU), an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors. In some embodiments, the processor 101 can be used to execute instructions for training the aforementioned neural network model 0 or for using the trained neural network model 0 to perform inference on remote sensing images.

In particular, in some embodiments, the NPU can be used to run the instructions related to the neural network model 0, so as to perform semantic segmentation of the image, generate the contour mask of the map element, extract the mask edges of the contour mask, generate the initial shape/regression shape of the map element, and generate the direction data/topological relationships of the geometric primitives.

System memory 102 is a volatile memory, such as random access memory (RAM) or double data rate synchronous dynamic random access memory (DDR SDRAM). The system memory is used to temporarily store data and/or instructions. For example, in some embodiments, the system memory 102 can be used to temporarily store the network parameters of the neural network model 0, the sample image set, intermediate data generated while training the neural network model 0 or performing inference with it, the vector map, and so on.

Non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory 103 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as a hard disk drive (HDD), compact disc (CD), digital versatile disc (DVD), or solid-state drive (SSD). In some embodiments, the non-volatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card. In other embodiments, the non-volatile memory 103 can be used to permanently store the network parameters of the neural network model 0, the sample image set, intermediate data generated while training the neural network model 0 or performing inference with it, the vector map, and so on.

In particular, the system memory 102 and/or the non-volatile memory 103 may include a copy of instructions 107. When executed by at least one of the processors 101, the instructions 107 cause the electronic device 100 to train all or part of the neural network model 0 using the method provided by the embodiments of the present application, or to perform inference using the neural network model 0.

Input/output (I/O) device 104 may include a user interface that enables a user to interact with electronic device 100, such as selecting or inputting a sample image set, marking map elements in the sample image set, etc.

The communication interface 105 may include a transceiver that provides a wired or wireless communication interface for the electronic device 100 to communicate with any other suitable device over one or more networks. In some embodiments, the electronic device 100 can establish a communication connection with other electronic devices through the communication interface 105 to obtain sample image sets, predicted image sets, etc. from them.

System control logic 106 may include any suitable interface controller to provide any suitable interface to other modules of the electronic device 100. For example, in some embodiments, the system control logic 106 may include one or more memory controllers to provide an interface between the processor 101 and the system memory 102 and non-volatile memory 103. For another example, in other embodiments, the system control logic 106 may include at least one Peripheral Component Interconnect (PCI) controller that provides the processor 101 with a PCI-bus interface to devices/modules (such as graphics cards, sound cards, etc.) connected to the electronic device 100 through a PCI interface.

In some embodiments, at least one of the processors 101 may be packaged together with the logic of one or more controllers of the system control logic 106 to form a system in package (SiP). In other embodiments, at least one of the processors 101 may also be integrated on the same chip with the logic of one or more controllers of the system control logic 106 to form a system-on-chip (SoC).

It can be understood that the electronic device 100 can be any electronic device capable of deep learning model training, including but not limited to laptop computers, desktop computers, tablet computers, servers, etc., which are not limited here.

It can be understood that the structure of the electronic device 100 shown in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 . In other embodiments, the electronic device 100 may include more or fewer components than illustrated, some components may be combined, some components may be separated, or components may be arranged differently. The components illustrated may be implemented in hardware, software, or a combination of software and hardware.

Various embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods. Embodiments of the present application may be implemented as a computer program or program code executing on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device and at least one output device.

Program code may be applied to input instructions to perform the functions described herein and to generate output information. Output information can be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as a digital signal processor (DSP), microcontroller, application specific integrated circuit (ASIC), or microprocessor.

Program code may be implemented in a high-level procedural language or an object-oriented programming language to communicate with the processing system. When necessary, assembly language or machine language can also be used to implement program code. In fact, the mechanisms described in this application are not limited to the scope of any particular programming language. In either case, the language may be a compiled or interpreted language.

In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried on or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which can be read and executed by one or more processors. For example, instructions may be distributed over a network or through other computer-readable media. Thus, machine-readable media may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random-access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet by means of electrical, optical, acoustic or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, machine-readable media include any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).

In the drawings, some structural or methodological features may be shown in specific arrangements and/or orders. However, it should be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of structural or methodological features in a particular figure is not meant to imply that such features are required in all embodiments, and in some embodiments these features may not be included or may be combined with other features.

It should be noted that each unit/module mentioned in the device embodiments of this application is a logical unit/module. Physically, a logical unit/module may be one physical unit/module, part of a physical unit/module, or a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not what matters most; it is the combination of functions they implement that solves the technical problem raised by this application. In addition, to highlight the innovative part of this application, the above device embodiments do not introduce units/modules that are not closely related to solving the technical problem raised by this application; this does not mean that no other units/modules exist in the above device embodiments.

It should be noted that in the examples and description of this patent, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprises a" does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person familiar with the technical field can readily conceive of within the technical scope disclosed in the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A map generation method, applied to electronic devices, characterized in that the method includes:

Obtain an image of a certain area, where the image includes a map element, where the map element is an element in the image to be converted into a vector map;

Using a first model to perform inference on the image, obtain a first geometric figure corresponding to the map element, where the first geometric figure includes geometric primitives;

Input the first geometric figure into a second model to obtain the direction of each geometric primitive, and obtain, based on the first geometric figure, a second geometric figure corresponding to the map element, wherein the second geometric figure includes the same geometric primitives as the first geometric figure, and the position arrangement of the geometric primitives in the second geometric figure is different from the position arrangement of the geometric primitives in the first geometric figure;

Using a third model, the topological relationship between each of the geometric primitives is obtained based on the direction of the geometric primitives and the second geometric figure;

Based on the topological relationship between each of the geometric primitives, the direction of each of the geometric primitives, and the second geometric figure, a vector map corresponding to the image is obtained.
The method according to claim 1, characterized in that at least one of the first model, the second model, and the third model is trained based on the geometric characteristics of the map elements of the certain area.
The method according to claim 1, characterized in that, when the geometric primitive is a line segment, the second geometric figure also includes a connection sequence of each geometric primitive; and,

Obtaining the vector map corresponding to the image based on the topological relationship between the geometric primitives, the direction of each geometric primitive, and the second geometric figure includes:

Adjust the direction of a first geometric primitive in the second geometric figure to be the same as the direction corresponding to the first geometric primitive, wherein the direction of the first geometric primitive in the second geometric figure is different from the direction corresponding to the first geometric primitive;

Connect the first geometric primitive and a second geometric primitive to obtain a polygon corresponding to the second geometric figure, wherein the second geometric primitive is adjacent to the first geometric primitive in the connection sequence.
The method according to claim 3, characterized in that the polygon corresponding to the second geometric figure includes a first line segment, a second line segment and a third line segment connected in sequence; and obtaining the vector map corresponding to the image based on the topological relationship between the geometric primitives, the direction of each geometric primitive, and the second geometric figure further includes:

If the length of the second line segment is less than the preset side length threshold, delete the second line segment; and

When the topological relationship between the first line segment and the third line segment is collinear or parallel, merge the first line segment and the third line segment into one line segment;

When the topological relationship between the first line segment and the third line segment is not collinear or parallel, extend the first line segment and/or the third line segment so that the first line segment and the The third line segment intersects.
The method according to claim 1, characterized in that, when the geometric primitives are points, obtaining the vector map corresponding to the image based on the topological relationship between the geometric primitives, the direction of each geometric primitive, and the second geometric figure includes:

Connect the points whose topological relationship is connected to obtain the corresponding vectorized polyline.
The method according to claim 1, characterized in that using the first model to perform inference on the image to obtain the first geometric figure corresponding to the map element includes:

Perform semantic segmentation on the image to obtain a contour mask of the map element, where the contour mask is used to indicate the area where the map element is located in the image;

Extract the mask edge of the contour mask;

Simplify the mask edge to obtain the first geometric figure.
The method of claim 1, wherein the map elements include at least one of houses, roads, lakes, oceans, rivers, forests, and deserts; and

The first geometric figure corresponding to houses, lakes, oceans, forests, and deserts is a polygon;

The first geometric figure corresponding to roads and rivers is a polyline.
The method according to any one of claims 1 to 7, characterized in that the method further includes:

The first model is trained by:

Obtain sample data, which includes a sample image set of a certain area and a reference outline corresponding to a map element in each sample image in the sample image set;

Use the first model to extract the image features of each sample image, and obtain, based on the image features, a contour mask of the map element in each sample image, the contour mask indicating the area occupied by the map element in the corresponding sample image;

Based on the contour mask, obtain the first predicted geometry corresponding to the map element in each of the sample images;

The first model is trained based on a first loss function value and a second loss function value, wherein the first loss function is used to indicate the accuracy of the contour mask, and the second loss function is used to indicate The similarity between the first predicted geometry and the reference contour.
The method according to any one of claims 1 to 7, characterized in that the method further includes:

The second model is trained by:

Obtain sample data, the sample data including the reference contour corresponding to the map element in each sample image in the sample image set of a certain area, the reference direction corresponding to each geometric primitive in the reference contour, and the third geometric figure, obtained using the first model, corresponding to the map element in each sample image;

Using the second model, obtain the second predicted geometric figure corresponding to each map element in each sample image and the predicted direction of the geometric primitives in the third geometric figure, wherein the second predicted geometric figure contains the same geometric primitives as the third geometric figure, but the geometric primitives are arranged differently in the second predicted geometric figure than in the third geometric figure;

The second model is trained based on a third loss function and a fourth loss function, wherein the third loss function indicates the similarity between the predicted direction of each geometric primitive in the third geometric figure and the corresponding reference direction, and the fourth loss function indicates the similarity between the second predicted geometric figure and the corresponding reference contour.
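One possible form of the third (direction) loss is sketched below. It assumes that directions are given as angles in radians and that a segment and its reverse count as the same direction. Under these assumptions, the mean of sin² of the angular error gives 0 for parallel directions and 1 for perpendicular ones. This is an illustrative choice and not the loss stated in the source.

```python
import math


def direction_loss(pred_angles, ref_angles):
    """Third-loss sketch: mean sin^2 of the angular error.

    sin^2 has period pi, so opposite orientations of the same line give zero loss.
    """
    if len(pred_angles) != len(ref_angles):
        raise ValueError("predicted and reference angle lists must have equal length")
    return sum(math.sin(p - r) ** 2 for p, r in zip(pred_angles, ref_angles)) / len(pred_angles)
```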
The method according to any one of claims 1 to 7, characterized in that the method further includes:

Obtain sample data, the sample data including the sample image set of a certain area, the reference topological relationships between the geometric primitives in the reference contour corresponding to the map element in each sample image, the fourth geometric figure, obtained using the first model, corresponding to the map element in each sample image, and the direction of the geometric primitives in the fourth geometric figure;

Using the third model, determine the latent space features of each geometric primitive in the fourth geometric figure, and determine, based on the latent space features, the predicted topological relationships between the geometric primitives in the fourth geometric figure;

The third model is trained based on a fifth loss function and a sixth loss function, wherein the fifth loss function indicates the degree of matching between the predicted topological relationships of the geometric primitives in the fourth geometric figure and the corresponding reference topological relationships, and the sixth loss function indicates the similarity of the latent space features between geometric primitives whose predicted topological relationship is parallel, collinear, or connected.
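A minimal sketch of the fifth and sixth losses follows, under these assumptions. The third model outputs one logit vector per primitive pair over a hypothetical label set `RELATIONS`. The fifth loss is taken as mean cross-entropy against the reference relation, and unlisted pairs default to "none". The sixth loss is taken as the mean squared distance between latent features of pairs predicted as parallel, collinear, or connected. The label names and the choice of each loss are illustrative only.

```python
import math

RELATIONS = ("none", "parallel", "collinear", "connected")  # hypothetical label order


def _log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def topology_match_loss(pair_logits, ref_relations):
    """Fifth-loss sketch: mean cross-entropy between predicted and reference pair relations."""
    total = 0.0
    for pair, logits in pair_logits.items():
        target = RELATIONS.index(ref_relations.get(pair, "none"))
        total -= _log_softmax(logits)[target]
    return total / len(pair_logits)


def latent_consistency_loss(features, pair_logits):
    """Sixth-loss sketch: pull together latent features of pairs predicted parallel/collinear/connected."""
    dists = []
    for (i, j), logits in pair_logits.items():
        pred = RELATIONS[max(range(len(logits)), key=logits.__getitem__)]
        if pred != "none":
            dists.append(sum((a - b) ** 2 for a, b in zip(features[i], features[j])))
    return sum(dists) / len(dists) if dists else 0.0
```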
A model training method, applied to an electronic device, characterized in that the method includes:

Obtain sample data, the sample data including: the reference contour corresponding to the map element in each sample image in the sample image set of a certain area; the fifth geometric figure or the sixth geometric figure corresponding to each map element; the direction of the geometric primitives in the fifth geometric figure; and the image features of the geometric primitives in the fifth geometric figure, wherein the image features of the geometric primitives in the fifth geometric figure are generated when the fourth model performs inference to obtain the fifth geometric figure of each map element, the similarity between the fifth geometric figure and the corresponding reference contour is lower than the similarity between the sixth geometric figure and the corresponding reference contour, and the fifth geometric figure and the sixth geometric figure have the same geometric primitives;

Input the fifth geometric figure or the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the direction of the geometric primitives in the fifth geometric figure into a fifth model having first network parameters to obtain the latent space features corresponding to each geometric primitive, and infer the predicted topological relationships between the geometric primitives based on their latent space features;

Determine a seventh loss function and an eighth loss function based on the predicted topological relationships between the geometric primitives in the fifth geometric figure and the corresponding reference topological relationships, wherein the reference topological relationships can be determined from the reference contour corresponding to the map element in each sample image, the seventh loss function indicates the degree of matching between the predicted topological relationships of the geometric primitives in the fifth geometric figure and the corresponding reference topological relationships, and the eighth loss function indicates the similarity of the latent space features between geometric primitives whose predicted topological relationship is parallel, collinear, or connected;

When the seventh loss function and the eighth loss function satisfy the termination condition, save the fifth model with the first network parameters;

When the seventh loss function and the eighth loss function do not satisfy the termination condition, adjust the network parameters of the fifth model to second network parameters and perform the next round of training.
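The save-or-adjust loop described in this claim can be sketched as follows. The sketch makes two assumptions of its own. The termination condition is taken as both losses falling below fixed thresholds. "Adjusting the network parameters" is modelled as one forward-difference gradient step on the sum of the two losses. The function name, the thresholds, and the toy quadratic losses in the test are all hypothetical, and a real fifth model would use automatic differentiation.

```python
def train_fifth_model(params, loss7, loss8, thresholds=(1e-3, 1e-3), lr=0.1, max_rounds=1000, h=1e-6):
    """Return (params, terminated).

    Termination condition (assumed): both losses below their thresholds.
    Parameter adjustment (assumed): one forward-difference gradient step on loss7 + loss8.
    """
    def total(p):
        return loss7(p) + loss8(p)

    for _ in range(max_rounds):
        if loss7(params) < thresholds[0] and loss8(params) < thresholds[1]:
            return params, True  # save the model with the current network parameters
        base = total(params)
        grad = []
        for k in range(len(params)):
            shifted = list(params)
            shifted[k] += h
            grad.append((total(shifted) - base) / h)
        # Adjust to new network parameters and run the next round of training.
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, False
```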
The method according to claim 11, characterized in that, when the geometric primitives in the fifth geometric figure are line segments, whether the seventh loss function and the eighth loss function satisfy the termination condition is determined in the following manner:

Based on the directions of the geometric primitives in the fifth geometric figure, determine the directional relationships between the geometric primitives, and determine a ninth loss function from these directional relationships and the reference directional relationships corresponding to the predicted topological relationships, wherein the ninth loss function indicates the degree to which the predicted topological relationship of each geometric primitive is consistent with its direction;

When the seventh loss function, the eighth loss function, and the ninth loss function all converge; or the seventh loss function, the eighth loss function, and the ninth loss function are each smaller than the corresponding preset loss function value; or the total loss function converges; or the total loss function is smaller than the corresponding preset total loss function value, it is determined that the termination condition is satisfied, wherein the total loss function is the weighted sum of the seventh loss function, the eighth loss function, and the ninth loss function.
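The ninth loss and the four alternative termination conditions can be sketched as below, under several assumptions. First, segments predicted as parallel or collinear should share a direction, so the sin² angular gap is weighted by the predicted probability of either relation. Second, a loss "converges" when its last two recorded values differ by less than a tolerance `tol`. Third, the weights of the total loss default to 1. All names, thresholds, and the convergence test are illustrative.

```python
import math


def direction_consistency_loss(angles, pair_probs):
    """Ninth-loss sketch.

    angles: segment direction (radians) per primitive index.
    pair_probs: (i, j) -> {"parallel": p, "collinear": q, ...} predicted probabilities.
    Assumption: parallel or collinear segments should share a direction, so the
    sin^2 angular gap is weighted by how strongly either relation is predicted.
    """
    if not pair_probs:
        return 0.0
    total = 0.0
    for (i, j), probs in pair_probs.items():
        weight = probs.get("parallel", 0.0) + probs.get("collinear", 0.0)
        total += weight * math.sin(angles[i] - angles[j]) ** 2
    return total / len(pair_probs)


def converged(history, tol):
    """Assumed convergence test: the last two recorded values differ by less than tol."""
    return len(history) >= 2 and abs(history[-1] - history[-2]) < tol


def termination_met(h7, h8, h9, thresholds, total_threshold, weights=(1.0, 1.0, 1.0), tol=1e-4):
    """True if any of the four alternative conditions holds (histories must be non-empty)."""
    totals = [weights[0] * a + weights[1] * b + weights[2] * c for a, b, c in zip(h7, h8, h9)]
    histories = (h7, h8, h9)
    return (
        all(converged(h, tol) for h in histories)
        or all(h[-1] < t for h, t in zip(histories, thresholds))
        or converged(totals, tol)
        or totals[-1] < total_threshold
    )
```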
The method according to claim 11, characterized in that obtaining the latent space features corresponding to each geometric primitive based on the fifth geometric figure or the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the direction of the geometric primitives in the fifth geometric figure includes:

In the case where the geometric primitives of the fifth geometric figure are points, obtain the latent space features corresponding to each geometric primitive based on the fifth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the direction of each geometric primitive in the fifth geometric figure;

In the case where the geometric primitives of the fifth geometric figure are line segments, obtain the latent space features corresponding to each geometric primitive based on the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the direction of the geometric primitives in the fifth geometric figure.
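The input selection in this claim could be assembled per primitive as sketched below. The sketch assumes that index `i` in the sixth geometric figure corresponds to primitive `i` in the fifth. It also assumes that points are `(x, y)` tuples and line segments are pairs of such tuples. Each input vector here is simply the concatenation of coordinates, image features, and `(cos θ, sin θ)` of the direction. The function name and this encoding are hypothetical, and the fifth model that would consume these vectors is not shown.

```python
import math


def build_primitive_inputs(primitive_type, fifth, sixth, image_feats, directions):
    """Hypothetical per-primitive input vectors for the fifth model.

    Points take coordinates from the fifth figure; line segments take them from
    the sixth figure. Image features and directions always come from the fifth figure.
    """
    if primitive_type == "point":
        geometry = fifth
    elif primitive_type == "segment":
        geometry = sixth
    else:
        raise ValueError(f"unknown primitive type: {primitive_type}")
    inputs = []
    for geom, feat, theta in zip(geometry, image_feats, directions):
        # Flatten a segment ((x1, y1), (x2, y2)) into [x1, y1, x2, y2].
        coords = [v for pt in geom for v in pt] if primitive_type == "segment" else list(geom)
        inputs.append(coords + list(feat) + [math.cos(theta), math.sin(theta)])
    return inputs
```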
A computer-readable storage medium, characterized in that the computer-readable storage medium includes instructions that, when executed by an electronic device, cause the electronic device to implement the method according to any one of claims 1 to 13.
An electronic device, characterized by comprising:

a memory, configured to store instructions to be executed by one or more processors of the electronic device; and

a processor, which is one of the processors of the electronic device and is configured to execute the instructions stored in the memory to implement the method according to any one of claims 1 to 13.
A computer program product, characterized by comprising a computer program or instructions that, when executed by a processor, implement the method according to any one of claims 1 to 13.