WO2023216251A1 - Map generation method, model training method, readable medium, and electronic device - Google Patents

Info

Publication number
WO2023216251A1
Authority
WO
WIPO (PCT)
Prior art keywords
geometric
loss function
map
primitives
primitive
Prior art date
Application number
PCT/CN2022/092810
Other languages
French (fr)
Chinese (zh)
Inventor
王磊
黄经纬
何佳男
刘吉哲
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN202280090751.XA (CN118613792A)
Priority to PCT/CN2022/092810 (WO2023216251A1)
Publication of WO2023216251A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/56 Information retrieval; Database structures therefor; File system structures therefor of still image data having vectorial format

Definitions

  • This application relates to the field of image processing, and in particular to a map generation method, a model training method, a readable medium and an electronic device.
  • A neural network model can be used to obtain a vector map of an area from remote sensing images of that area.
  • Typically, neural network inference is used to obtain the initial outlines of map elements (such as houses, lakes, roads, and rivers) in remote sensing images, and these initial outlines are then adjusted with vectorization rules set by developers, for example by adjusting the angles between lines, to convert the outlines of the map elements into a vector map.
  • Embodiments of the present application provide a map generation method, a model training method, a readable medium, and an electronic device.
  • In these embodiments, a neural network model learns the geometric characteristics of the map elements in a given area and uses them to convert the outlines of the map elements into the corresponding vector map. This helps improve the accuracy of the resulting vector map and makes the approach better suited to large-scale vector map modeling.
  • In a first aspect, embodiments of the present application provide a map generation method applied to an electronic device.
  • The method includes: obtaining an image of an area, where the image includes map elements, that is, elements in the image that are to be converted into a vector map; using a first model to perform inference on the image to obtain a first geometric figure corresponding to each map element, where the first geometric figure includes geometric primitives; and inputting the first geometric figure into a second model to obtain the direction of each geometric primitive and a second geometric figure corresponding to the map element.
  • The second geometric figure includes the same geometric primitives as the first geometric figure, but the positional arrangement of the geometric primitives in the second geometric figure differs from that in the first geometric figure.
  • The method further includes: using a third model to obtain the topological relationships between the geometric primitives based on their directions and the second geometric figure; and obtaining the vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive, and the second geometric figure.
  • That is, the electronic device can first use the first model (for example, the shape initialization network described below) to infer the outline of a map element in the image of an area and obtain the corresponding first geometric figure (for example, the initial shape described below). It then uses the second model (for example, the shape regression network described below) to adjust the first geometric figure, obtaining a more accurate and more regular second geometric figure (for example, the regression shape described below). Next, it uses the third model (for example, the topology reconstruction network described below) to infer the topological relationships between the geometric primitives in the second geometric figure (for example, when the second geometric figure is a polyline, which of the points that make up the polyline are connected). Finally, it obtains the vector map corresponding to the image from these topological relationships.
  • Because the electronic device converts map elements into vector maps using the pre-trained first, second, and third models rather than vectorization rules set by technicians, the accuracy of the resulting vector map can be improved.
  • In addition, at least one of the first, second, and third models can be retrained for different areas so as to adapt to the geometric characteristics of the map elements in each area, without setting up vectorization rules or tuning complex parameters, which helps improve the efficiency of vector map modeling.
  • A geometric primitive is the basic building unit of a geometric figure.
  • For example, when the geometric figure is a polygon, the geometric primitives can be the line segments that make up the polygon.
  • Alternatively, the geometric primitives can be the points (endpoints) of the line segments that make up the polygon.
  • In some possible implementations, at least one of the first model, the second model, and the third model is trained based on the geometric features of the map elements in a given area.
  • In this way, the electronic device vectorizes the map elements according to the geometric characteristics of the map elements of that area, which helps improve the accuracy of the resulting vector map.
  • In some possible implementations, when the geometric primitives are line segments, the second geometric figure further includes the connection order of the geometric primitives, and obtaining the vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive, and the second geometric figure includes: when the direction of a first geometric primitive in the second geometric figure differs from the direction obtained for that primitive, adjusting its direction in the second geometric figure to match the obtained direction; and connecting the first geometric primitive with a second geometric primitive to obtain a polygon corresponding to the second geometric figure, where the second geometric primitive is adjacent to the first geometric primitive in the connection order.
  • In some possible implementations, the polygon corresponding to the second geometric figure includes a first line segment, a second line segment, and a third line segment connected in sequence, and obtaining the vector map corresponding to the image further includes: deleting the second line segment when its length is less than a preset side-length threshold; then, when the topological relationship between the first and third line segments is collinear or parallel, merging the first and third line segments into one line segment; and when the relationship is neither collinear nor parallel, extending the first line segment and/or the third line segment so that they intersect.
  • In some possible implementations, when the geometric primitives are points, obtaining the vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive, and the second geometric figure includes: connecting the points whose topological relationship is "connected" to obtain the corresponding vectorized polyline.
  • In some possible implementations, using the first model to perform inference on the image to obtain the first geometric figure corresponding to the map element includes: performing semantic segmentation on the image to obtain a contour mask of the map element, where the contour mask indicates the area of the image in which the map element is located; extracting the mask edges of the contour mask; and simplifying the mask edges to obtain the first geometric figure.
  • For example, the electronic device can use the semantic segmentation network described below to obtain the contour mask of the area where the map element is located in the image, use the edge extraction network described below to extract the edges of the contour mask, and then use the Douglas-Peucker (DP) algorithm to simplify polygons or the non-maximum suppression (NMS) algorithm to simplify polylines. The resulting first geometric figure contains fewer geometric primitives, which helps the electronic device perform inference on the first geometric figure faster.
  • In some possible implementations, the map elements include at least one of a house, a road, a lake, an ocean, a river, a forest, and a desert; the first geometric figure corresponding to a house, lake, ocean, forest, or desert is a polygon, and the first geometric figure corresponding to a road or river is a polyline.
  • an image may include one map element or multiple map elements.
  • The electronic device can represent map elements that need a specific shape, such as houses, lakes, oceans, forests, and deserts, as polygons, and represent roads, rivers, and similar elements as polylines.
  • In some possible implementations, the method further includes training the first model as follows:
  • obtaining sample data, which includes a sample image set of an area and a reference outline corresponding to the map element in each sample image; using the first model to identify the image features of each sample image and obtaining, from those features, a contour mask of the map element in each sample image, where the contour mask indicates the area of the map element in the corresponding sample image; obtaining, from the contour mask, a first predicted geometric figure corresponding to the map element in each sample image; and training the first model based on the value of a first loss function and the value of a second loss function, where the first loss function indicates the accuracy of the contour mask and the second loss function indicates the similarity between the first predicted geometric figure and the reference outline.
  • Training the first model on the sample image set of an area makes both the contour mask and the first predicted geometric figure that it extracts for the map elements in the sample images highly similar to the corresponding reference outlines, so the first model learns the geometric characteristics of those map elements. As a result, the first geometric figure that the first model infers for map elements in an image of that area conforms to the geometric characteristics of the area's map elements, which helps improve the accuracy of the first geometric figure and, in turn, of the resulting vector map.
  • For example, the first loss function may be the cross-entropy loss L_11-12-CEL described below.
  • The second loss function may be the L2 loss L_13-L2 described below.
  • In some possible implementations, the method further includes training the second model as follows: obtaining sample data, which includes the reference outline corresponding to the map element in each sample image of a sample image set of an area, the reference direction of each geometric primitive in the reference outline, and a third geometric figure, obtained using the first model, corresponding to the map element in each sample image; using the second model to obtain, for the map element in each sample image, a second predicted geometric figure and the predicted directions of the geometric primitives in the third geometric figure, where the second predicted geometric figure includes the same geometric primitives as the third geometric figure but with a different positional arrangement; and training the second model based on a third loss function and a fourth loss function, where the third loss function indicates the similarity between the predicted directions of the geometric primitives in the third geometric figure and the corresponding reference directions, and the fourth loss function indicates the similarity between the second predicted geometric figure and the corresponding reference outline.
  • For example, the third loss function may be the L2 loss L_23-L2 described below, and the fourth loss function may be the relative shape loss described below.
  • In some possible implementations, the method further includes training the third model as follows: obtaining sample data, which includes a sample image set of an area, the reference topological relationships between the geometric primitives in the reference outline corresponding to the map element of each sample image, a fourth geometric figure, obtained using the first model, corresponding to the map element in each sample image, and the directions of the geometric primitives in the fourth geometric figure; using the third model to determine the latent space features of each geometric primitive in the fourth geometric figure and, from those features, the predicted topological relationships between the geometric primitives in the fourth geometric figure; and training the third model based on a fifth loss function and a sixth loss function, where the fifth loss function indicates the degree of matching between the predicted topological relationships between the geometric primitives in the fourth geometric figure and the corresponding reference topological relationships, and the sixth loss function indicates the similarity between the latent space features of geometric primitives whose predicted topological relationship is parallel, collinear, or connected.
  • The sixth loss function (for example, the supervised contrastive loss described below) measures the similarity of the latent space features of geometric primitives whose predicted topological relationship is parallel, collinear, or connected. Consequently, when the third model performs inference on the second geometric figure, parallel, collinear, or connected geometric primitives also have highly similar latent space features. This helps improve the accuracy of the topological relationships obtained between the geometric primitives in the second geometric figure, and hence the accuracy of the vector map built from those relationships.
  • For example, the fifth loss function may be the cross-entropy loss L_CEL described below.
  • The sixth loss function may be the supervised contrastive loss L_SCL described below.
  • In a second aspect, embodiments of the present application provide a model training method applied to an electronic device.
  • The method includes:
  • obtaining sample data, which includes the reference outline corresponding to the map element in each sample image of a sample image set of an area, a fifth or sixth geometric figure corresponding to each map element, the directions of the geometric primitives in the fifth geometric figure, and the image features of the geometric primitives in the fifth geometric figure, where these image features are generated when a fourth model is used to infer the fifth geometric figure of each map element;
  • where the fifth geometric figure is less similar to the corresponding reference outline than the sixth geometric figure is, and the fifth and sixth geometric figures contain the same geometric primitives;
  • using a fifth model to obtain the latent space features corresponding to each geometric primitive and, based on those features, to infer the predicted topological relationships between the geometric primitives;
  • determining a seventh loss function and an eighth loss function from the predicted topological relationships and the reference topological relationships between the geometric primitives, where the reference topological relationships can be determined from the reference outline corresponding to the map element in each sample image, the seventh loss function indicates the degree of matching between the predicted topological relationships between the geometric primitives in the fifth geometric figure and the corresponding reference topological relationships, and the eighth loss function indicates the similarity between the latent space features of geometric primitives whose predicted topological relationship is parallel, collinear, or connected; and adjusting the parameters of the fifth model until the seventh and eighth loss functions satisfy a termination condition.
  • The fifth model may be a model that, based on the geometric figures corresponding to map elements, obtains the features of the geometric figures and the directions of their geometric primitives, such as the third model in the first aspect above or the topology reconstruction network described below.
  • The lower-accuracy fifth geometric figure can be used as input to train the fifth model so that the predicted topological relationships it produces for the geometric primitives in the fifth geometric figure closely match the corresponding reference topological relationships. The fifth model thus learns to produce high-accuracy output from lower-accuracy input, which improves its robustness to noise: even when the second geometric figure obtained by the second model is not very accurate, relatively accurate topological relationships between the geometric primitives can still be obtained, improving the accuracy of the vector map derived from them.
  • For example, the seventh loss function may be the cross-entropy loss L_CEL described below, and the eighth loss function may be the supervised contrastive loss L_SCL described below.
  • In some possible implementations, whether the seventh and eighth loss functions satisfy the termination condition is determined as follows: determining, from the directions of the geometric primitives in the fifth geometric figure, the directional relationships between the geometric primitives, and determining a ninth loss function from these directional relationships and the reference directional relationships corresponding to the predicted topological relationships.
  • The ninth loss function indicates the consistency between the predicted topological relationship of each geometric primitive and its direction.
  • The termination condition is determined to be met when the seventh, eighth, and ninth loss functions all converge, when each of them is less than its corresponding preset value, when a total loss function converges, or when the total loss function is less than its corresponding preset value, where the total loss function is the weighted sum of the seventh, eighth, and ninth loss functions.
  • For example, the ninth loss function may be the geometric attribute and relationship consistency loss L_C described below.
  • In some possible implementations, obtaining the latent space features corresponding to each geometric primitive includes: when the geometric primitives of the fifth geometric figure are points, obtaining the latent space features of each geometric primitive from the image features of the fifth geometric figure, the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure; and when the geometric primitives of the fifth geometric figure are line segments, obtaining the latent space features of each geometric primitive from the geometric primitives in the sixth geometric figure, the image features of the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure.
  • That is, when training the fifth model, if the fifth geometric figure is a polygon, the higher-accuracy sixth geometric figure is used as the input of the fifth model, which ensures that the fifth model accurately infers the topological relationships of the geometric primitives in relatively complex polygons.
  • Because polylines are less complex than polygons, if the fifth geometric figure is a polyline, the fifth geometric figure itself is used as the input of the fifth model, which improves the model's robustness to noise in the simpler polyline input data.
  • Embodiments of the present application further provide a map generation device, which includes: a data acquisition unit, configured to acquire an image of an area, where the image includes map elements, that is, elements in the image that are to be converted into a vector map; an initial shape generation unit, configured to use a first model to perform inference on the image to obtain a first geometric figure corresponding to each map element, where the first geometric figure includes geometric primitives; and a shape regression unit, configured to input the first geometric figure into a second model to obtain the direction of each geometric primitive and a second geometric figure corresponding to the map element.
  • The second geometric figure includes the same geometric primitives as the first geometric figure, but the positional arrangement of the geometric primitives in the second geometric figure differs from that in the first geometric figure;
  • a topology reconstruction unit, configured to use a third model to obtain the topological relationships between the geometric primitives based on their directions and the second geometric figure; and
  • a post-processing unit, configured to obtain a vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive, and the second geometric figure.
  • Embodiments of the present application further provide a computer-readable storage medium.
  • The computer-readable storage medium includes instructions.
  • When the instructions are executed by an electronic device, they cause the electronic device to implement the method provided by the first aspect or any possible implementation of the first aspect, or the method provided by the second aspect or any possible implementation of the second aspect.
  • Embodiments of the present application further provide an electronic device.
  • The electronic device includes: a memory, configured to store instructions executed by one or more processors of the electronic device; and a processor, which is one of the one or more processors of the electronic device.
  • Embodiments of the present application further provide a computer program product.
  • The computer program product includes a computer program or instructions.
  • When the computer program or instructions are executed, the method provided by the first aspect or any possible implementation of the first aspect, or the method provided by the second aspect or any possible implementation of the second aspect, is implemented.
  • Figure 1A shows a schematic diagram of a process of obtaining a vector map from an image according to some embodiments of the present application.
  • Figure 1B shows a schematic diagram of an image that includes only one map element according to some embodiments of the present application.
  • Figure 1C shows a schematic diagram of an image that includes multiple map elements according to some embodiments of the present application.
  • Figure 2 shows a schematic diagram of a process of generating a map using a neural network model according to some embodiments of the present application.
  • Figure 3 shows a schematic structural diagram of a shape initialization network 1 according to some embodiments of the present application.
  • Figure 4 shows a schematic diagram of the training process of the shape initialization network 1 according to some embodiments of the present application.
  • Figure 5 shows a schematic diagram of a house in image IM2 and the corresponding contour mask according to some embodiments of the present application.
  • Figure 6A shows a schematic diagram of the coordinate distance between points in the edge area of a contour mask and a reference contour according to some embodiments of the present application.
  • Figure 6B shows a schematic diagram of the coordinate distance between a point in the edge area of the contour mask and the outermost pixels of the contour mask according to some embodiments of the present application.
  • Figure 7 shows a schematic structural diagram of a shape regression network 2 according to some embodiments of the present application.
  • Figure 8 shows a schematic diagram of the training process of the shape regression network 2 according to some embodiments of the present application.
  • Figure 9 shows a schematic structural diagram of a topology reconstruction network 3 according to some embodiments of the present application.
  • Figure 10 shows a schematic diagram of the training process of the topology reconstruction network 3 according to some embodiments of the present application.
  • Figure 11 shows a schematic diagram of the process of calculating topological relationships and the cross-entropy loss according to some embodiments of the present application.
  • Figure 12 shows a schematic diagram of a training process and an inference process according to some embodiments of the present application.
  • Figure 13 shows a schematic flowchart of a map generation method according to some embodiments of the present application.
  • Figure 14 shows a schematic diagram of post-processing polygons according to some embodiments of the present application.
  • Figure 15 shows a schematic diagram of the results of vectorizing houses in some remote sensing images using neural network model 0 according to some embodiments of the present application.
  • Figure 16 shows a schematic diagram of the results of vectorizing roads in remote sensing images using neural network model 0 according to some embodiments of the present application.
  • Figure 17A shows a schematic diagram of the reconstruction effect for a road in a relatively complex remote sensing image using neural network model 0 according to some embodiments of the present application.
  • Figure 17B shows a schematic diagram of the reconstruction effect for a road in another complex remote sensing image using neural network model 0 according to some embodiments of the present application.
  • Figure 18 shows a schematic structural diagram of a map generation device according to some embodiments of the present application.
  • Figure 19 shows a schematic structural diagram of an electronic device 100 for performing the embodiments of the present application according to some embodiments of the present application.
  • Illustrative embodiments of the present application include, but are not limited to, map generation methods, model training methods, readable media, program products, apparatus, and electronic devices.
  • When a neural network model is trained, the goal is to make its output as close as possible to the value that is actually to be predicted. The predicted value of the current network is therefore compared with the desired target value, and the weight vector of each layer of the network is updated according to the difference between them (there is usually an initialization step before the first update, in which parameters are pre-configured for each layer of the model). For example, if the network's predicted value is too high, the weight vector is adjusted to lower it, and the adjustment continues until the model can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured.
  • This is the role of the loss function, also called the objective function: it measures the difference between the predicted value and the target value.
  • Vectorization rules are used to convert the outlines of map elements inferred by a neural network model into vector maps. If the same vectorization rules are used to vectorize images of different areas, the accuracy of the resulting vector maps may be low. For example, building outlines differ considerably between cities or regions because of differences in architectural style: if a vectorization rule optimizes the corners of house outlines to right angles, applying it to images of areas where house outlines are circular or irregular polygons produces incorrect house outlines. Setting different outline-acquisition methods and vectorization rules for each area, however, is complicated, which makes vector map generation inefficient and unsuitable for large-scale vector map modeling.
  • Vectorization rules are rules set by technical personnel to adjust the relationships between lines or points in the outline of a map element so as to obtain a more reasonable vector map. For example, two lines whose included angle is greater than a preset value are set to be parallel or collinear, two lines whose included angle falls within a certain range are set to be perpendicular, and all points whose distance from a certain straight line is smaller than a preset value are moved onto that line. The effectiveness of such rules depends on the technician's experience and on the outlines of the map elements used as references, and the adjustment process is complicated.
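  • The sketch below makes the idea of a hand-written vectorization rule concrete. It implements only the last rule listed above: points closer than a preset distance to a straight line are moved onto that line by orthogonal projection. The function names and the threshold value are hypothetical. They illustrate the kind of rule the present application aims to avoid and are not part of its method.

```python
import math


def project_onto_line(p, a, b):
    """Orthogonal projection of point p onto the infinite line through a and b."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    return (ax + t * dx, ay + t * dy)


def snap_points_to_line(points, a, b, threshold):
    """Hypothetical hand-tuned rule: move points closer than `threshold` to line ab onto it."""
    snapped = []
    for p in points:
        q = project_onto_line(p, a, b)
        # Keep the original point if it is too far from the line for the rule to apply.
        snapped.append(q if math.dist(p, q) < threshold else p)
    return snapped
```

For example, `snap_points_to_line([(1, 0.3), (2, 5)], (0, 0), (10, 0), 0.5)` moves the first point onto the x-axis and leaves the second unchanged. Choosing such a threshold region by region is the manual tuning described above.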
  • In the embodiments of the present application, map elements are represented by geometric figures (for example, polylines represent roads and polygons represent houses). The neural network model learns the geometric features of the map elements in sample images of the target area, so that the geometric figures it derives from the outlines of map elements match the geometric features of the map elements of the target area. The trained neural network model is then used to infer on the prediction images of the target area to obtain the geometric figures corresponding to the map elements in the target area, thereby obtaining the vector map.
  • Because the map elements in the image are vectorized based on learned geometric characteristics of the map elements in the target area, rather than on vectorization rules set by developers, the accuracy of the vector map can be improved. Because no rules need to be set or adjusted, the efficiency of vector map generation can also be improved while accuracy is maintained in large-scale map construction scenarios, such as vectorizing maps for areas that span multiple regions, cities, or countries.
  • In one approach, a semantic segmentation network is used to obtain the outline masks of map elements (such as houses and roads) in an image, and heuristic rules set by the developer (that is, vectorization rules) are then used to convert the outlines of the map elements into a vector map.
  • In the present application, the heuristic rules can be replaced by geometric feature learning and topological reconstruction. The neural network model learns the geometric features of the map elements in the sample images of the target area, so that the geometric figures of map elements it obtains more accurately reflect the geometric characteristics of the map elements in the target area.
  • The neural network model is also used to determine the topological relationships between the geometric primitives in the geometric figures. The geometric primitives are then topologically connected to obtain a vector map, for example, by connecting points into polylines to represent roads or rivers and connecting line segments into polygons to represent houses or lakes.
  • geometric primitives refer to the basic constituent elements of each geometric figure.
  • the geometric primitives of polylines can be points, and the geometric primitives of polygons can be ordered line segments.
  • map elements may include but are not limited to houses, lakes, oceans, roads, rivers, forests, deserts, etc.
  • Map elements that need to be described with specific shapes on the map, such as houses, lakes, oceans, forests, and deserts, can be represented by polygons.
  • Map elements that do not need to be described with specific shapes on the map, such as roads and rivers, can be represented by polylines.
  • In the following description, houses are used as the example of map elements represented by polygons, and roads as the example of map elements represented by polylines.
  • an image may include at least one map element.
  • For example, image IM11 includes only road RD1, which can be represented by a polyline in the vector map. For another example, referring to Figure 1C, image IM12 includes house HE1, road RD2, and river RR1. House HE1 can be represented by a polygon, and road RD2 and river RR1 can be represented by polylines.
  • FIG. 2 shows a schematic diagram of a process of generating a map using a neural network model according to some embodiments of the present application.
  • using neural network model 0 to convert remote sensing images into vector maps usually includes the following steps:
  • S21 Marking features. Mark the map elements, such as houses, roads, and lakes, in part of the remote sensing images of the target area to obtain reference outlines of the map elements in these images (such as the vectorized outline of a house or the vectorized center line of a road). These images and the reference outlines of the corresponding map elements can be used as a sample image set;
  • S22 Model training. Use the sample image set to train the neural network model 0, so that the neural network model 0 can vectorize the map elements in each sample image in the sample image set, and obtain a predicted shape that is highly similar to the reference outline of each map element;
  • Use the trained neural network model 0 to infer on the prediction image set (that is, the images of the target area other than the sample image set) to obtain the predicted shapes of the map elements in each prediction image. A predicted shape includes its geometric primitives and the topological relationships between them;
  • Post-processing. Post-process the predicted shapes output by neural network model 0 to obtain the predicted vector map. For example, connect the geometric primitives in each predicted shape to obtain the vectorized shapes of the map elements, and splice the vectorized shapes of map elements from different remote sensing images;
  • the predicted vector map is corrected by surveying and mapping personnel to obtain a vector map of the target area to ensure the accuracy of the vector map.
  • the above-mentioned neural network model 0 may include a shape initialization network 1 , a shape regression network 2 and a topology reconstruction network 3 .
  • the shape initialization network 1 is used to extract the initial shape of each map element in the remote sensing image.
  • the shape initialization network 1 can extract the houses in the remote sensing image IM2 as polygons and the roads as polylines.
  • the shape initialization network 1 is also used to determine key points in roads and rivers, such as intersection points (intersection points of roads) in roads, divergence points and convergence points in rivers, etc.
  • Shape regression network 2 is used to optimize the initial shapes obtained by shape initialization network 1. It produces the regression shape of each map element and the direction data of each geometric primitive in the regression shape, thereby improving the geometric accuracy of each map element.
  • Topology reconstruction network 3 is used to infer the topological relationship between geometric primitives in the regression shape of each map element.
  • For map elements described by polygons, the geometric primitives can be line segments, and the topological relationships between line segments can include but are not limited to collinearity and parallelism.
  • For map elements described by polylines, the geometric primitives can be points, and the topological relationships between points can include connected and not connected.
  • Post-processing module 4 can produce a vector map by performing post-processing operations, such as connecting geometric primitives and splicing map elements, according to the topological relationships between the geometric primitives of each map element. It can be understood that in some embodiments, post-processing module 4 can be implemented as a neural network or other processing logic, which is not limited here.
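  • The sketch below illustrates the kind of polyline post-processing described above. It chains point primitives into polylines from pairwise "connected / not connected" predictions. Several details are assumptions for illustration and are not given by the present application: the input format (a dictionary mapping index pairs to connection probabilities), the 0.5 threshold, and the rule of splitting polylines at points with more than two connections (such as road intersections).

```python
from collections import defaultdict


def connect_points(num_points, link_probs, threshold=0.5):
    """Chain point primitives into polylines from pairwise connection probabilities.

    link_probs maps an index pair (i, j) to the predicted probability that
    points i and j are connected (a hypothetical output format).
    """
    adj = defaultdict(set)
    for (i, j), prob in link_probs.items():
        if prob >= threshold and i != j:
            adj[i].add(j)
            adj[j].add(i)

    used = set()  # undirected edges already assigned to a polyline
    polylines = []
    # Walks start at end points and junctions (degree other than 2), so each
    # chain between them is traced once. Isolated closed loops are not handled.
    starts = [v for v in range(num_points) if len(adj[v]) not in (0, 2)]
    for s in starts:
        for n in sorted(adj[s]):
            if frozenset((s, n)) in used:
                continue
            used.add(frozenset((s, n)))
            line, prev, cur = [s, n], s, n
            # Continue through degree-2 points until an end point or junction.
            while len(adj[cur]) == 2:
                nxt = next(x for x in adj[cur] if x != prev)
                if frozenset((cur, nxt)) in used:
                    break
                used.add(frozenset((cur, nxt)))
                line.append(nxt)
                prev, cur = cur, nxt
            polylines.append(line)
    return polylines
```

With links 0-1, 1-2, 2-3, and 2-4 predicted as connected, point 2 acts as a junction, and the sketch returns the three polylines `[0, 1, 2]`, `[2, 3]`, and `[2, 4]`.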
  • each network of the neural network model 0 may include one or more neural network layers, including but not limited to semantic segmentation network, convolutional network, pooling network, classification network, activation network, attention mechanism network, fully connected network, recurrent neural network, batch normalization (Batch Normalization, BN) network, etc.
  • the structure of the neural network model 0 shown in Figure 2 is just an example.
  • the neural network model 0 can also include more or less networks, and some networks can also be combined or split. No limitation is made here.
  • the post-processing module 4 implemented in the form of a neural network may be included in the neural network model 0.
  • Figure 3 shows a schematic structural diagram of a shape initialization network 1 according to some embodiments of the present application.
  • the shape initialization network 1 includes a semantic segmentation network 11, a mask generation network 12, an edge extraction network 13 and a shape generation network 14.
  • the semantic segmentation network 11 is used to extract the image features (Image Embedding) of each sample image in the sample image set.
  • For example, semantic segmentation network 11 may include a feature pyramid network (Feature Pyramid Networks, FPN).
  • the mask generation network 12 is used to obtain the outline mask of each map element based on the image characteristics of each sample image in the sample image set.
  • Mask generation network 12 may include a convolutional network, a batch normalization (Batch Normalization, BN) network, and an activation network (such as a rectified linear unit, ReLU) connected in series.
  • the edge extraction network 13 is used to extract the edges of the contour mask according to the contour mask and image features of each map element to obtain the mask edge of each map element, where the mask edge is used to describe the contour of the map element.
  • Edge extraction network 13 may include a convolutional network, a batch normalization (Batch Normalization, BN) network, and an activation network (such as a rectified linear unit) connected in series.
  • Shape generation network 14 simplifies the mask edges of each map element to obtain its initial shape. This reduces the number of geometric primitives in the initial shape (for example, the number of line segments in a polygon or the number of points in a polyline) and thereby speeds up inference on input images with neural network model 0.
  • For example, an algorithm such as Douglas-Peucker (DP) can be used to simplify polygons, and an algorithm such as non-maximum suppression (NMS) can be used to simplify polylines.
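  • For reference, the sketch below is a minimal recursive version of the Douglas-Peucker algorithm for an open polyline given as a list of (x, y) vertices. The tolerance `epsilon` is a hypothetical parameter. Applying the algorithm to a closed polygon usually requires splitting the ring first (for example, at two distant vertices), which is omitted here.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polyline, keeping vertices that deviate more than epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior vertex farthest from the chord a-b.
    idx, max_d = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > max_d:
            idx, max_d = i, d
    if max_d <= epsilon:
        return [a, b]
    # Recurse on both halves and drop the duplicated split vertex when joining.
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right
```

An L-shaped polyline `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]` simplifies to its three corner vertices, while small wiggles below `epsilon` collapse into a single segment.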
  • the structure of the shape initialization network 1 shown in Figure 3 is just an example.
  • The shape initialization network 1 can also adopt other structures, and each network can also be implemented using other types of neural networks. This is not limited here.
  • The training process of shape initialization network 1 is introduced below based on the structure shown in Figure 3.
  • FIG. 4 shows a schematic diagram of the training process of the shape initialization network 1 according to some embodiments of the present application.
  • The process is executed by an electronic device. As shown in Figure 4, the process includes the following steps:
  • the electronic device acquires a sample image set in the target area, and the sample image set includes the reference outline of the map element in each sample image.
  • For example, the sample image set may include N sample images, each of size H×W (that is, H pixels high and W pixels wide). Each pixel in a sample image may include n channels, where n is the number of color channels of the sample image.
  • the sample image set can be expressed as a 4-dimensional matrix P.
  • The size of matrix P is N×n×H×W.
  • The element P(i, j, k, m) of P represents the value of the j-th color channel of the pixel in the k-th row and m-th column of the i-th sample image.
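  • The sketch below illustrates the index convention for P. It assumes each image is stored as nested lists in H×W×n (row, column, channel) order and that indices are 0-based. The helper name `to_nchw` is hypothetical. It reorders the data so that `P[i][j][k][m]` is channel j of the pixel in row k, column m of image i.

```python
def to_nchw(images):
    """Reorder N images stored as [row][col][channel] nested lists into P[i][j][k][m]."""
    H, W, n = len(images[0]), len(images[0][0]), len(images[0][0][0])
    return [
        # For each image: channel j, then row k, then column m.
        [[[img[k][m][j] for m in range(W)] for k in range(H)] for j in range(n)]
        for img in images
    ]
```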
  • the reference outline of the map elements in the sample image set may include manually determined vector data of each map element, such as roads represented by polylines, houses represented by polygons, etc., so as to facilitate training the neural network.
  • the reference profile can be used to evaluate the quality of the results inferred by the neural network model, and the network parameters of the neural network model can be adjusted based on the evaluation results.
  • S402 Use the semantic segmentation network 11 to obtain the image features of the map elements in the sample image set.
  • the electronic device uses a semantic segmentation network 11, such as an FPN network, to extract features from the sample image set to obtain the image features of the sample image set.
  • the electronic device can input the matrix P into the semantic segmentation network 11 to obtain the image feature matrix F of the sample image set.
  • The size of matrix F is N×C×H×W, where C is the number of feature channels output by semantic segmentation network 11. C is determined by the type of semantic segmentation network 11 or set in advance by the developer.
  • The element F(i, j, k, m) of matrix F represents the value of the j-th feature of the pixel in the k-th row and m-th column of the i-th sample image.
  • S403 Based on the image features of the sample image set, use the mask generation network 12 to obtain the contour masks of each map element, and use the edge extraction network 13 to obtain the mask edges of each contour mask.
  • the electronic device can input the image feature matrix F of the sample image into the mask generation network 12 to obtain the outline mask of each map element.
  • The outline masks of the map elements can be expressed as a matrix M.
  • The size of matrix M is N×p×H×W, where p is the number of classes of map elements (the following example divides map elements into 2 classes).
  • A submatrix of size 1×1×H×W in matrix M, namely M(i, j, :, :), represents the contour mask of the j-th class of map elements in the i-th sample image.
  • Within such a submatrix, elements belonging to the same class of map element can take the same value. For example, referring to Figure 5, in the 1×1×H×W submatrix for the houses in sample image IM2, the pixels where houses are located can all have the value 1, and the other pixels can have the value 0.
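  • One plausible way to obtain a 0/1 mask like the one described above from a reference polygon is to rasterize the polygon, testing each pixel center with the even-odd rule. The present application does not state how reference masks are produced. The sketch assumes polygon vertices are given as (x, y) = (column, row) pixel coordinates, and the function names are hypothetical.

```python
def point_in_polygon(x, y, poly):
    """Even-odd rule test: cast a ray toward +x and count edge crossings."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(poly, H, W):
    """Return an H x W list of 0/1 values: 1 where the pixel center lies inside poly."""
    return [
        [1 if point_in_polygon(m + 0.5, k + 0.5, poly) else 0 for m in range(W)]
        for k in range(H)
    ]
```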
  • the electronic device can input the aforementioned contour mask and the corresponding reference contour into the edge extraction network 13 to obtain the mask edge of the contour mask.
  • the edge extraction network 13 can infer the coordinate distance DT between each pixel in the edge area of the contour mask and the reference contour.
  • The size of DT is N×2×H×W.
  • The element DT(i, j, k, m) of DT represents the coordinate distance between the reference outline and the pixel in the k-th row and m-th column of the j-th type of map element in the i-th sample image. DT(i, j, k, m) can include two elements, dx and dy, which represent the coordinate distance in the H direction and the W direction respectively.
  • For example, as shown in Figure 6A, B1 is a point in the edge area of the contour mask with coordinates (x, y). If the distance DT from B1 to the reference contour is (dx, dy), the point on the mask edge corresponding to B1 has coordinates (x+dx, y+dy).
  • the size of the edge area of the outline mask can be preset.
  • the edge area can be an area composed of pixels whose distance from the outermost pixel of the outline mask is less than a preset edge distance threshold.
  • the coordinate distance from a pixel point in the edge area to the reference outline may be the difference between the coordinates of the point closest to the pixel point on the reference outline and the coordinates of the pixel point.
  • Alternatively, DT can be the coordinate distance from a point in the edge area of the contour mask to the outline formed by the outermost pixels of the contour mask. For example, if the coordinate distance from a point E1 (x, y) to that outline is (dx, dy), the point E1' on the mask edge of the contour mask corresponding to E1 has coordinates (x+dx, y+dy).
  • The mask edge of the contour mask can also be obtained in other ways, for example, by directly using the outermost points of the contour mask as its mask edge. This is not limited here.
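  • The coordinate distance from an edge-area pixel to the reference outline is defined above as the difference between the coordinates of the nearest outline point and those of the pixel. The sketch below computes it, assuming the reference outline is a closed polygon given as a list of (x, y) vertices. The function names are hypothetical. Adding the returned (dx, dy) to the pixel coordinates gives the corresponding mask-edge point (x+dx, y+dy).

```python
def closest_point_on_segment(p, a, b):
    """Point on segment ab nearest to p."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return (a[0] + t * dx, a[1] + t * dy)


def offset_to_contour(p, contour):
    """(dx, dy) from pixel p to the nearest point on a closed reference contour."""
    best, best_d2 = None, float("inf")
    n = len(contour)
    for i in range(n):
        q = closest_point_on_segment(p, contour[i], contour[(i + 1) % n])
        d2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = q, d2
    return (best[0] - p[0], best[1] - p[1])
```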
  • S404 Use the shape generation network 14 to simplify the mask edges and obtain the initial shape.
  • The electronic device simplifies the mask edge of the outline mask of each map element to obtain its initial shape. This reduces the number of geometric primitives in the initial shape and increases the speed at which the electronic device uses neural network model 0 to infer on remote sensing images.
  • For example, a polygon simplification algorithm such as the DP algorithm can be used to simplify polygons into initial polygon shapes with fewer line segments, and a line simplification algorithm such as the NMS algorithm can be used to simplify polylines.
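  • The present application names NMS for simplifying polylines but does not detail how it is applied. One common reading, sketched below as an assumption, is point-level NMS: candidate points along a polyline carry confidence scores, and any point within a hypothetical `radius` of a higher-scoring kept point is suppressed. The surviving indices are returned in their original order so that the polyline ordering is preserved.

```python
import math


def point_nms(points, scores, radius):
    """Return indices of points kept by greedy NMS, in their original order."""
    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a point only if it is farther than `radius` from every kept point.
        if all(math.dist(points[i], points[j]) > radius for j in kept):
            kept.append(i)
    return sorted(kept)
```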
  • the initial shape obtained by the electronic device may also include image features of each geometric primitive in the initial shape.
  • S405 Calculate the loss function and determine whether the termination condition is met based on the loss function.
  • The electronic device calculates the loss function based on the coordinates of each predicted point in the initial shape and the coordinates of the corresponding reference point on the reference contour, and determines from the loss function whether the termination condition is met. If it is met, the initial shape obtained by shape initialization network 1 meets the requirements, and the process goes to step S406. Otherwise, shape initialization network 1 cannot obtain an initial shape that meets the requirements with its current network parameters, and the process goes to step S407.
  • different loss functions may be used for each network in the shape initialization network 1.
  • For example, the loss function may be a cross-entropy loss function (Cross Entropy Loss Function), a focal loss function (Focal Loss Function), a 0-1 loss, an entropy or cross-entropy loss, a softmax loss, etc.
  • For example, the cross-entropy loss of semantic segmentation network 11 and mask generation network 12, denoted L_{11-12-CEL}, is given by formula (1). In formula (1), y_ij is a 0-1 variable, and p_ij is the probability, determined by mask generation network 12, that the i-th pixel lies within the contour mask area of the j-th type of map element.
  • It can be seen from formula (1) that L_{11-12-CEL} reflects the accuracy of the contour masks obtained using semantic segmentation network 11 and mask generation network 12: the smaller L_{11-12-CEL}, the higher the accuracy.
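  • Formula (1) itself is not reproduced in this text. Suppose it takes the standard multi-class cross-entropy form averaged over pixels, L_{11-12-CEL} = -(1/N_pix) Σ_i Σ_j y_ij·log(p_ij). It could then be computed as in the sketch below. The averaging over pixels and the small `eps` guard against log(0) are assumptions.

```python
import math


def mask_cross_entropy(y, p, eps=1e-12):
    """Mean over pixels of -sum_j y_ij * log(p_ij).

    y[i][j] is 1 if pixel i belongs to map-element class j and 0 otherwise;
    p[i][j] is the predicted probability for that pixel and class.
    """
    total = 0.0
    for yi, pi in zip(y, p):
        for yij, pij in zip(yi, pi):
            total -= yij * math.log(max(pij, eps))  # eps avoids log(0)
    return total / len(y)
```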
  • the loss function may include mean squared error (MSE, also known as L2 loss).
  • For example, the L2 loss L_{13-L2} of edge extraction network 13 can be expressed as formula (2).
  • L_{13-L2} reflects the similarity between the mask edges obtained by edge extraction network 13 and the corresponding reference contours: the smaller L_{13-L2}, the greater the similarity.
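  • Formula (2) is likewise not reproduced here. The sketch below assumes L_{13-L2} is the mean, over edge-area pixels, of the squared difference between the predicted offset (dx, dy) and the reference offset. The function name and the choice to average per pixel rather than per coordinate are assumptions.

```python
def offset_mse(pred, ref):
    """Mean over pixels of the squared distance between predicted and reference (dx, dy)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and of equal length")
    total = sum(
        (px - rx) ** 2 + (py - ry) ** 2 for (px, py), (rx, ry) in zip(pred, ref)
    )
    return total / len(pred)
```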
  • The termination condition may include at least one of the following: the loss function of each network converges, or the loss function value of each network is less than the corresponding preset value. For example, the termination condition is met when both the cross-entropy loss and the L2 loss converge. For another example, it is met when the cross-entropy loss is less than a first preset value and the L2 loss is less than a second preset value.
  • the termination condition may also include other conditions, which are not limited here.
  • In some embodiments, the loss functions of the networks can also be weighted and summed (that is, each network's loss is multiplied by its corresponding weight and the products are added) to obtain a total loss function. The termination condition is met when the total loss function converges or is less than a preset total loss value.
  • For example, the total loss function can be expressed as λ1·L_{11-12-CEL} + λ2·L_{13-L2}, where λ1 is the weight of the cross-entropy loss and λ2 is the weight of the L2 loss. λ1 and λ2 can be preset by the developer.
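  • The sketch below illustrates the weighted total loss and the two kinds of termination check described above. Several details are assumptions for illustration only: the convergence test (the last few changes in the loss all fall below a tolerance), the default `tol` and `window` values, and the function names.

```python
def total_loss(l_cel, l_l2, lambda1=1.0, lambda2=1.0):
    """Weighted sum lambda1 * L_{11-12-CEL} + lambda2 * L_{13-L2}."""
    return lambda1 * l_cel + lambda2 * l_l2


def termination_met(history, threshold, tol=1e-4, window=3):
    """Check a loss history (oldest first) against the two example conditions.

    Met if the latest value is below `threshold`, or if the last `window`
    changes between consecutive values are all smaller than `tol` (treated
    here as convergence).
    """
    if not history:
        return False
    if history[-1] < threshold:
        return True
    if len(history) <= window:
        return False
    recent = history[-(window + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))
```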
  • S406 Store network parameters and obtain shape initialization network 1.
  • the electronic device stores the network parameters currently used by the shape initialization network 1 to obtain the shape initialization network 1 .
  • If the electronic device determines that the termination condition is not met, it adjusts the network parameters of shape initialization network 1 and performs the next round of training. For example:
    • If none of the networks' loss functions meets its termination condition, the network parameters of all networks are adjusted before the next round.
    • If the loss functions of some networks meet their termination conditions but those of other networks do not, only the parameters of the networks whose loss functions do not meet their termination conditions are adjusted.
    • If the total loss function does not meet its termination condition, the parameters of at least some of the networks are adjusted before the next round of training.
  • Shape initialization network 1 learns the reference contours of the map elements in sample images of the target area, that is, the geometric characteristics of the target area's map elements. As a result, the semantic segmentation it performs and the initial shapes it obtains for the map elements better fit those geometric characteristics, which improves the accuracy of the initial shapes.
  • After training, the electronic device can input a remote sensing image into shape initialization network 1 to obtain the initial shapes and image features of the map elements in the image.
  • Figure 7 shows a schematic structural diagram of a shape regression network 2 according to some embodiments of the present application.
  • the shape regression network 2 includes a pooling network 21 , a feature encoding network 22 , a direction generation network 23 and a shape adjustment network 24 .
  • the pooling network 21 is used to pool and interpolate the characteristic parameters of each geometric primitive in the initial shape to obtain the pooled characteristics of each geometric primitive.
  • The image features of the sample image extracted by semantic segmentation network 11 are pixel-based. After shape generation network 14 simplifies the shapes, however, the coordinates of the geometric primitives in the initial shape no longer correspond one-to-one with those pixel-level image features.
  • the characteristics of each geometric primitive in the initial shape can be obtained through interpolation through the pooling network 21 according to the image features of adjacent pixels of each geometric primitive in the initial shape.
  • For example, the image features of the geometric primitives can be interpolated through the line feature interpolation (Line of Interest, LOI) method or the point feature interpolation (Point of Interest, POI) method.
  • the feature encoding network 22 is used to re-encode the pooling features of each geometric primitive in the initial shape to obtain the regression coding features of each geometric primitive.
  • The regression coding features can be used to infer the direction data of each geometric primitive, to adjust the initial shape, and so on.
  • For example, feature encoding network 22 may include a multi-head attention network (Multi-Head-Attention Network).
  • the direction generation network 23 is used to obtain the direction data of each geometric primitive based on the regression encoding characteristics of each geometric primitive.
  • When the geometric primitive is a point, its direction may be the tangent direction at the point. When the geometric primitive is a line segment, its direction may be the direction of the line segment.
  • the direction generation network 23 may include a convolutional network, a BN network, and an activation network in series.
  • The direction data obtained by direction generation network 23 can be used to calculate the angle constraint loss and the L2 loss of the direction data. The network parameters of shape regression network 2 are then adjusted according to these losses to improve the accuracy of the direction data of the geometric primitives obtained by direction generation network 23.
  • the specific calculation method will be introduced below and will not be described in detail here.
  • Shape adjustment network 24 adjusts the position of each geometric primitive in the initial shape according to its regression encoding features to obtain a more accurate regression shape. It also calculates the coordinate residual of each predicted point of the geometric primitives in the regression shape relative to the corresponding point on the reference contour, which is used to calculate the loss function.
  • The regression shape generated by shape adjustment network 24 can be used to calculate a relative shape loss (Relative Shape Loss). The network parameters of shape regression network 2 are adjusted according to this loss to improve the accuracy of the regression shape obtained by shape adjustment network 24. The specific calculation method is introduced below and is not described in detail here.
  • FIG. 8 shows a schematic diagram of the training process of the shape regression network 2 according to some embodiments of the present application.
  • The process is executed by an electronic device. As shown in Figure 8, the process includes the following steps.
  • S801 Perform feature pooling on the image features and classification features of the geometric primitives in the initial shape to obtain the pooled features of the initial shape.
  • The image features and classification features of the sample image obtained by shape initialization network 1 are pixel-based, but the points in the initial shape do not correspond one-to-one with pixels. Therefore, the image features and classification features of the geometric primitives in the initial shape can be pooled, interpolated, and so on to obtain the pooled features of each geometric primitive.
  • the POI algorithm can be used to obtain the pooling features of each geometric primitive in the initial shape.
  • For example, let c0 be the vector of the image feature of point A obtained by semantic segmentation network 11, and c1 that of point B. In the initial shape, the vector of the image feature of point C on line segment AB is then c0 + (c1 - c0)·l_AC/l_AB, where l_AC is the length of line segment AC and l_AB is the length of line segment AB.
  • In some embodiments, the LOI algorithm can be used to obtain the pooled features of each geometric primitive in the initial shape. For example, multiple points (for example, 32 points) are sampled on a line segment and divided into several groups (for example, 4 groups of 8 points). The features of these points, obtained through the POI algorithm, are pooled to obtain the pooled feature of the line segment.
  • the pooling characteristics of each geometric primitive in the initial shape can also be determined in other ways, which is not limited here.
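  • The sketch below covers the POI interpolation formula above and the LOI procedure of sampling points on a segment and pooling them in groups. Features are plain lists of floats. The function names are assumptions, and so is mean pooling within each group, since the text does not say which pooling operator is used.

```python
def poi_feature(c0, c1, l_ac, l_ab):
    """Feature at C on segment AB: c0 + (c1 - c0) * l_ac / l_ab, element-wise."""
    r = l_ac / l_ab
    return [a + (b - a) * r for a, b in zip(c0, c1)]


def loi_feature(c0, c1, num_points=32, num_groups=4):
    """Sample evenly spaced points on AB, interpolate them with POI, mean-pool each group."""
    if num_points % num_groups != 0:
        raise ValueError("num_points must be divisible by num_groups")
    # Point t lies at fraction t / (num_points - 1) of the way from A to B.
    feats = [poi_feature(c0, c1, t, num_points - 1) for t in range(num_points)]
    size = num_points // num_groups
    pooled = []
    for g in range(num_groups):
        group = feats[g * size:(g + 1) * size]
        # Average each feature dimension over the points in this group.
        pooled.append([sum(vals) / len(group) for vals in zip(*group)])
    return pooled
```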
  • S802 Use the feature encoding network 22 to encode the pooling features of the initial shape to obtain regression encoding features.
  • the electronic device uses the feature encoding network 22 to re-encode the pooled features of the initial shape, for example, discarding the features in the pooled features that have a small impact on the shape adjustment and orientation data, re-extracting the features that have a greater impact on the shape adjustment and orientation data, etc. , get the regression coding features.
  • the feature encoding network 22 may be an encoding network based on a global attention mechanism, such as the aforementioned multi-head attention network.
  • The electronic device inputs the regression encoding features of each geometric primitive of the initial shape into direction generation network 23 and shape adjustment network 24. These networks output, respectively, the predicted direction data of each geometric primitive and the regression shape.
  • When the initial shape is a polygon and the geometric primitive is a line segment, the direction of the geometric primitive is the direction of the line segment. When the initial shape is a polyline and the geometric primitive is a point, the direction of the geometric primitive is the tangent direction at the point.
  • the obtained regression shape includes the direction data of each geometric primitive, the coordinate data of the points in each geometric primitive, the order of each geometric primitive, etc.
  • the electronic device calculates the loss function based on the direction data of each geometric primitive in the initial shape and the regression shape.
  • For example, the loss function may include an L2 loss between the predicted direction data of each geometric primitive and the direction data of the corresponding point or line in the reference outline. This loss indicates the accuracy of the directions obtained by direction generation network 23: the smaller the L2 loss, the more accurate the direction data.
  • Here, the direction data of the i-th primitive is denoted dr_i, and the direction data of the corresponding point or line in the reference outline is denoted dr_si.
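  • Formula (3) for this direction loss is not reproduced in the text. The sketch below assumes directions are 2D vectors dr_i and dr_si, normalized to unit length before comparison so that only orientation matters, and computes a mean squared L2 loss. The normalization step and the function names are assumptions. For undirected line segments, a real implementation might also need to treat opposite vectors as equivalent, which is not handled here.

```python
import math


def _unit(v):
    """Scale a non-zero 2D vector to unit length."""
    norm = math.hypot(v[0], v[1])
    if norm == 0:
        raise ValueError("direction vector must be non-zero")
    return (v[0] / norm, v[1] / norm)


def direction_l2_loss(dr, dr_s):
    """Mean squared distance between unit-normalized predicted and reference directions."""
    if len(dr) != len(dr_s) or not dr:
        raise ValueError("need equally many, non-empty predicted and reference directions")
    total = 0.0
    for d, s in zip(dr, dr_s):
        u, v = _unit(d), _unit(s)
        total += (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    return total / len(dr)
```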
  • The loss function may include a relative shape loss (Relative Shape Loss), obtained from the coordinate residuals between each geometric primitive in the regression shape and the corresponding point or line in the reference contour, which is used to evaluate the accuracy of the regression shape.
  • the relative shape loss can be expressed as the average, sum, etc. of the shape losses of all points in the initial shape. For a polyline, the shape loss of a non-intersection point can be the projection distance from the point to the reference contour, and the shape loss of an intersection point can be the distance from the point to the corresponding reference intersection point in the reference contour.
  • the shape loss of a point in the polygon can be the projection distance from the point to the reference contour.
  • the relative shape loss is used to indicate the similarity between the regression shape of the map element and the reference contour.
  • the lower the relative shape loss, the higher the similarity between the regression shape obtained by the shape regression network 2 and the corresponding reference contour, and the higher the accuracy of the regression shape.
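A minimal sketch of the relative shape loss as described: each point contributes its projection distance to the reference contour, except polyline intersection points, which contribute their distance to the matched reference intersection point. Averaging (rather than summing) and the `matched_refs` mapping are assumptions; all function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to segment ab (projection clamped to the segment)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def projection_distance(p: Point, contour: Sequence[Point], closed: bool = True) -> float:
    """Smallest distance from p to any segment of the reference contour."""
    n = len(contour)
    n_segments = n if closed else n - 1
    return min(
        point_segment_distance(p, contour[i], contour[(i + 1) % n])
        for i in range(n_segments)
    )


def relative_shape_loss(
    points: Sequence[Point],
    contour: Sequence[Point],
    closed: bool = True,
    matched_refs: Optional[Dict[int, Point]] = None,
) -> float:
    """Average per-point shape loss over the regression shape."""
    matched_refs = matched_refs or {}
    losses = []
    for i, p in enumerate(points):
        if i in matched_refs:
            # Polyline intersection point: distance to its reference intersection point
            rx, ry = matched_refs[i]
            losses.append(math.hypot(p[0] - rx, p[1] - ry))
        else:
            # Ordinary point: projection distance to the reference contour
            losses.append(projection_distance(p, contour, closed))
    return sum(losses) / len(losses)
```

For a polygon, pass the reference outline with `closed=True`; for a polyline, use `closed=False` and supply the intersection matches.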
  • an angle constraint loss may be included in the loss function to improve the regularity of the regression shape.
  • the angle constraint loss L_TV can be expressed as the following formula (4).
  • where N4 is the number of corners in the initial shape, θ̄ is the average angle of the corners in the initial shape, and θ_k is the angle of the k-th corner. The smaller L_TV is, the more regular the regression shape.
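Formula (4) is not reproduced in this text. For illustration only, the sketch below assumes L_TV is the mean squared deviation of each corner angle θ_k from the mean corner angle, which is zero when all corners are equal (for example, a rectangle). Corner angles are measured as unsigned angles in [0°, 180°], so a 270° reflex corner is measured as 90°; this is also an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def corner_angles(polygon: Sequence[Tuple[float, float]]) -> List[float]:
    """Unsigned angle (degrees, in [0, 180]) at each vertex of a closed polygon."""
    n = len(polygon)
    angles = []
    for k in range(n):
        px, py = polygon[k - 1]
        cx, cy = polygon[k]
        nx, ny = polygon[(k + 1) % n]
        ux, uy = px - cx, py - cy  # vector to the previous vertex
        vx, vy = nx - cx, ny - cy  # vector to the next vertex
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        angles.append(math.degrees(math.atan2(abs(cross), dot)))
    return angles


def angle_constraint_loss(angles: Sequence[float]) -> float:
    """Assumed form: mean squared deviation of each corner angle from the mean angle."""
    n = len(angles)
    mean = sum(angles) / n
    return sum((a - mean) ** 2 for a in angles) / n
```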
  • the loss function may also include other losses, such as a smoothness constraint loss that improves the smoothness of the regression shape, which are not limited here.
  • S805 Determine whether the termination condition is met based on the loss function.
  • the electronic device determines, based on the loss function, whether the termination condition is met. If it is met, the regression shape meets the requirements and the process goes to step S806; otherwise, the regression shape does not meet the requirements and the process goes to step S807.
  • the termination condition may include at least one of the following: the loss function of each network converges; the loss function value of each network is less than the corresponding preset loss function value.
  • the termination condition may also be that the total loss function is less than a total loss function threshold or converges, where the total loss function can be obtained as a weighted sum of the loss functions described above.
  • the termination condition may also include other conditions, which are not limited here.
  • if the termination condition is met (step S806), the electronic device stores the network parameters of the shape regression network 2, thereby obtaining the trained shape regression network 2.
  • if the termination condition is not met (step S807), the electronic device adjusts the network parameters of the shape regression network 2 and performs the next round of training. For example, when the loss functions of all networks in the shape regression network 2 fail to meet their termination conditions, the network parameters of every network can be adjusted; when the loss functions of only some networks fail to meet their termination conditions, only the network parameters of those networks may be adjusted before the next round of training; and when the total loss function fails to meet its termination condition, at least some of the network parameters of each network can be adjusted before the next round of training.
  • topology reconstruction network 3: the training process of the topology reconstruction network 3 is introduced below.
  • FIG 9 shows a schematic structural diagram of a topology reconstruction network 3 according to some embodiments of the present application.
  • the topology reconstruction network 3 includes a pooling network 31, a feature encoding network 32 and a relationship reasoning network 33.
  • the pooling network 31 is used to interpolate the image features and direction data of each geometric primitive in the initial shape to obtain the pooling features of each geometric primitive.
  • the feature encoding network 32 is used to re-encode the pooled features of each geometric primitive in the initial shape (for example, discarding features that have little influence on topological relationship reasoning and extracting features that have a greater influence on it) to obtain the inference encoding features of each geometric primitive.
  • feature encoding network 32 may include a multi-head attention network.
  • the relational reasoning network 33 is used to obtain the topological relationships between the geometric primitives in the initial shape based on the inference encoding features of each geometric primitive.
  • when the initial shape is a polygon, the geometric primitives are line segments, and the topological relationships between two geometric primitives include collinear, parallel, etc.; when the initial shape is a polyline, the geometric primitives are points, and the relationship between geometric primitives is connected/disconnected.
  • the relational reasoning network 33 may include a convolutional network, a batch normalization (BN) network, an activation network, etc.
  • the predicted topological relationships between geometric primitives obtained by the relational reasoning network 33 can be used to calculate a cross-entropy loss and a supervised contrastive loss, and can be combined with the direction data of the geometric primitives to calculate a consistency loss of geometric attributes and relationships. The network parameters of the topology reconstruction network 3 are adjusted based on these losses to improve the accuracy of the predicted topological relationships between geometric primitives obtained by the topology reconstruction network 3.
  • the specific calculation method will be introduced below and will not be described in detail here.
  • FIG. 10 shows a schematic diagram of the training process of the topology reconstruction network 3 according to some embodiments of the present application.
  • this process is executed by an electronic device and, as shown in Figure 10, includes the following steps.
  • S1001 Perform feature pooling on the image features and direction data of the geometric primitives to obtain the pooled features of the geometric primitives.
  • the electronic device performs feature pooling on the image features and direction data of the geometric primitives in the initial shape to obtain the pooled features of each geometric primitive. For details, please refer to the relevant description of step S801, which will not be described again here.
  • alternatively, the electronic device can perform feature pooling on the image features and direction data of the regression shape to obtain the pooled features of the geometric primitives.
  • S1002 Use the feature encoding network 32 to encode the pooled features of the geometric primitives to obtain inference encoding features.
  • the electronic device uses the feature encoding network 32 to encode the pooled features of the initial shape to obtain the inference encoding features of each geometric primitive. For details, please refer to the aforementioned step S802, which will not be described again here.
  • the electronic device inputs the inference encoding features of each geometric primitive into the relational reasoning network 33 to obtain the predicted topological relationships between the geometric primitives in the initial shape.
  • the predicted topological relationship between each geometric primitive in the initial shape can be represented by a matrix R.
  • the size of the matrix R is K × K, where K is the number of geometric primitives in the initial shape, and the element R(i,j) of the matrix R indicates the topological relationship between the i-th and the j-th geometric primitives, such as connection, collinearity, or parallelism.
  • for example, the predicted topological relationship between the geometric primitives of an initial shape consisting of 8 points P1, P2, P3, P4, P5, P6, P7, P8 can be an 8 × 8 matrix, in which the element R(i,j) in row i, column j represents the topological relationship between point Pi and point Pj.
  • the relational reasoning network 33 can extract features again from the inference encoding features of each geometric primitive to obtain the latent space (also called hidden space) features of each geometric primitive (hereinafter, latent space features). For a given geometric primitive, it calculates the distance between the latent space features of each other geometric primitive and those of this primitive, and sets the topological relationship between this primitive and the preset number of geometric primitives with the smallest latent space distance to connected.
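The nearest-neighbor rule for setting connections can be illustrated as follows, assuming Euclidean distance in latent space and a preset neighbor count `k`. The function name is hypothetical, and the network that produces the latent features is not modeled.

```python
import math
from typing import List, Sequence


def knn_connections(latent: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """0/1 matrix R with R[i][j] = 1 if j is among the k nearest latent neighbors of i."""
    n = len(latent)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        # Distances from primitive i to every other primitive in latent space
        dists = sorted(
            (math.dist(latent[i], latent[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            R[i][j] = 1  # mark the k closest primitives as "connected" to i
    return R
```

The resulting matrix need not be symmetric (j can be among i's nearest neighbors without the reverse holding); the text does not say whether or how it is symmetrized.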
  • S1004 Calculate the loss function based on the predicted topological relationship between each geometric primitive in the initial shape.
  • the electronic device calculates the loss function based on the topological relationship between each geometric primitive obtained by the relational reasoning network 33 .
  • the loss function may include a cross-entropy loss L_CEL, which is determined from the predicted topological relationships between the geometric primitives in the initial shape and the reference topological relationships of those geometric primitives.
  • L_CEL can be calculated by the following formula (5).
  • where N5 is the number of geometric primitives in the initial shape;
  • R_ij is the predicted topological relationship value between the i-th and the j-th geometric primitives (for example, the value of the element in row i, column j of the aforementioned matrix R);
  • R0_ij is the reference topological relationship between the i-th and the j-th geometric primitives.
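Formula (5) is not reproduced in this text. One common way to realize such a cross-entropy loss, assumed here for illustration, treats each R_ij as a predicted connection probability and each R0_ij ∈ {0, 1} as a binary label, and averages the binary cross-entropy over all N5 × N5 entries. Clipping by `eps` avoids log(0); the function name is hypothetical.

```python
import math
from typing import Sequence


def topology_cross_entropy(R: Sequence[Sequence[float]],
                           R0: Sequence[Sequence[int]],
                           eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted R (probabilities) and reference R0 (0/1)."""
    n = len(R)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = min(max(R[i][j], eps), 1.0 - eps)  # clip to avoid log(0)
            y = R0[i][j]
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / (n * n)
```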
  • the loss function may also include a consistency loss L_C of geometric attributes and relationships, which characterizes the consistency between the attributes of the geometric primitives and the topological relationships between them. During training, reducing L_C improves the accuracy of the predicted topological relationships determined by the relational reasoning network 33.
  • L_C can be calculated by the following formula (6).
  • the attributes of a geometric primitive can include its direction data, such as the tangent direction.
  • for example, when two line segments are collinear, their tangent directions should be the same, so tr should be 0.
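Because formula (6) and the quantity tr are not reproduced here, the following is only a hypothetical illustration of a consistency loss of this kind: for every ordered pair (i, j), the predicted relationship strength R_ij weights a direction-mismatch penalty 1 − |cos(θ_i − θ_j)|, which is zero when the two tangent directions are parallel (including opposite orientation). The angle representation, the penalty form and the function name are assumptions.

```python
import math
from typing import Sequence


def consistency_loss(R: Sequence[Sequence[float]], directions: Sequence[float]) -> float:
    """Average relationship-weighted direction mismatch over ordered pairs (i, j), i != j.

    R[i][j]: predicted relationship strength in [0, 1] (e.g. collinear/parallel).
    directions[i]: tangent direction of primitive i, in radians.
    """
    n = len(R)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # 0 when the directions are parallel (same or opposite), 1 when perpendicular
            mismatch = 1.0 - abs(math.cos(directions[i] - directions[j]))
            total += R[i][j] * mismatch
    return total / (n * (n - 1))
```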
  • the loss function may also include a supervised contrastive loss. When the relational reasoning network 33 determines the topological relationships between geometric primitives from their inference encoding features, this loss improves the consistency between the latent space features it extracts for each geometric primitive and the topological relationships it infers. That is, by making the supervised contrastive loss satisfy the termination condition (for example, being less than a preset supervised contrastive loss value, or converging), geometric primitives that have topological relationships such as connection and collinearity also have similar latent space features.
  • the supervised contrastive loss L_SCL can be calculated by the following formula (7).
  • where I represents the set of geometric primitives;
  • P(i) represents the set of geometric primitives that have a connection or collinear relationship with the i-th geometric primitive;
  • |P(i)| represents the cardinality of the set P(i) (that is, the number of elements in P(i));
  • A(i) represents the set of geometric primitives that do not have a connection or collinear relationship with the i-th geometric primitive;
  • τ ∈ R+ is a parameter that can be preset by developers, and · denotes the vector dot product. It can be seen from formula (7) that the smaller the supervised contrastive loss, the more similar the latent space feature vectors of geometric primitives that have connected or collinear relationships, and the less similar the latent space feature vectors of geometric primitives that do not.
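A sketch of a supervised contrastive loss built from the sets defined above (I, P(i), |P(i)|, A(i), τ and the dot product). Formula (7) itself is not reproduced, so this follows the general form of the published supervised contrastive loss, with the assumption that the denominator for each positive p sums exp(z_i · z_p / τ) and the terms over A(i). The latent vectors z are assumed to be L2-normalized, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def supervised_contrastive_loss(z: Sequence[Sequence[float]],
                                positives: List[Set[int]],
                                tau: float = 0.1) -> float:
    """z[i]: latent vector of primitive i; positives[i]: indices in P(i)."""
    n = len(z)
    total = 0.0
    for i in range(n):
        P = positives[i]
        if not P:
            continue  # no connected/collinear partner, so no positive pairs for anchor i
        # A(i): primitives with no connection or collinear relationship to i
        A = [a for a in range(n) if a != i and a not in P]
        neg = sum(math.exp(dot(z[i], z[a]) / tau) for a in A)
        acc = 0.0
        for p in P:
            pos = math.exp(dot(z[i], z[p]) / tau)
            acc += math.log(pos / (pos + neg))
        total += -acc / len(P)  # average over P(i), i.e. divide by |P(i)|
    return total
```

Latent vectors that agree with the topology (connected primitives close together, unconnected ones apart) give a lower loss than vectors that contradict it.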
  • the supervised contrast loss can also be calculated in other ways, which is not limited here.
  • the loss function may also include other losses, which are not limited here.
  • the reference topological relationships of the geometric primitives can be calculated dynamically: first determine, in the reference outline, the reference point of each point in each geometric primitive, and then use the topological relationships between these reference points as the reference topological relationships of the geometric primitives.
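The dynamic construction of the reference topology can be sketched as follows, assuming each predicted point is matched to its nearest reference point by Euclidean distance and the reference topology is given as a 0/1 adjacency matrix over the reference points. Both representations, and the function name, are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def reference_topology(points: Sequence[Point],
                       ref_points: Sequence[Point],
                       ref_adj: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build R0 for the predicted points from the topology of their matched reference points."""
    # Index of the nearest reference point for each predicted point
    match = [min(range(len(ref_points)), key=lambda r: math.dist(p, ref_points[r]))
             for p in points]
    n = len(points)
    return [[0 if i == j else ref_adj[match[i]][match[j]] for j in range(n)]
            for i in range(n)]
```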
  • S1005 Determine whether the termination condition is met based on the loss function.
  • the electronic device determines whether the termination condition is met. If so, the predicted topological relationships obtained by the topology reconstruction network 3 meet the requirements, and the process goes to step S1006; otherwise, they do not meet the requirements, and the process goes to step S1007.
  • the electronic device may determine that the termination condition is met when each loss function converges or each loss function value is less than the corresponding preset loss function value.
  • alternatively, the electronic device may determine that the termination condition is met when the total loss function, obtained as a weighted sum of the multiple loss functions, converges or is less than a preset total loss function value.
  • the total loss function can be expressed as λ3·L_CEL + λ4·L_C + λ5·L_SCL, where λ3, λ4 and λ5 are the weights of the cross-entropy loss L_CEL, the consistency loss of geometric attributes and relationships L_C, and the supervised contrastive loss L_SCL, respectively, and can be preset by the developer.
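The weighted total loss and the termination check of step S1005 could be sketched as below. The default weights, the threshold, and the convergence test (loss change below `tol` over the last `window` rounds) are hypothetical choices; the text leaves them to the developer.

```python
from typing import Sequence


def total_loss(l_cel: float, l_c: float, l_scl: float,
               w3: float = 1.0, w4: float = 1.0, w5: float = 1.0) -> float:
    """Weighted sum of the three losses; w3..w5 stand in for the developer-set weights."""
    return w3 * l_cel + w4 * l_c + w5 * l_scl


def termination_met(history: Sequence[float], threshold: float,
                    tol: float = 1e-4, window: int = 3) -> bool:
    """history holds the total loss of each training round, oldest first."""
    if history and history[-1] < threshold:
        return True  # loss is below the preset value
    if len(history) > window:
        recent = history[-(window + 1):]
        # Treat the loss as converged if it barely changed over the last rounds
        return max(recent) - min(recent) < tol
    return False
```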
  • if the electronic device determines that the termination conditions are met (step S1006), it stores the network parameters of the topology reconstruction network 3 to obtain the trained topology reconstruction network 3.
  • if the electronic device determines that the termination conditions are not met (step S1007), it adjusts the network parameters of the topology reconstruction network 3 and performs the next round of training.
  • through the foregoing training, the network parameters of the neural network model 0 are obtained; based on these parameters, inference can be performed on the predicted images in the predicted image set of the target area to obtain a vectorized map of each predicted image.
  • specifically, the shape initialization network 1 first obtains the initial shape of each map element in the predicted image; the shape regression network 2 then adjusts the initial shape to obtain a more accurate regression shape; the topology reconstruction network 3 then obtains the topological relationships between the geometric primitives in the regression shape; and finally the post-processing module 4 connects the geometric primitives in the regression shape to obtain a vectorized map of the map elements.
  • the training process of the neural network model 0 and the process of inferring the image using the trained neural network model 0 may be asymmetric.
  • in some embodiments, the regression shape is used as the input of the topology reconstruction network 3 to train the topology reconstruction network 3.
  • in other embodiments, because polylines are relatively simple, the initial-shape polyline is used as the input for training the topology reconstruction network 3. Since the initial shape is less accurate than the regression shape, the topology reconstruction network 3 learns to produce correct predictions even when the input data is less accurate, which improves its robustness to noise and its stability.
  • the following introduces the process of generating vector maps using the previously trained neural network model 0.
  • FIG 13 shows a schematic flowchart of a map generation method according to some embodiments of the present application.
  • this process is executed by an electronic device and, as shown in Figure 13, includes the following steps.
  • S1301 Use the shape initialization network 1 to obtain the initial shape of the map element in the predicted image.
  • the electronic device inputs the predicted image into the shape initialization network 1: it uses the semantic segmentation network 11 to extract image features of the predicted image, the mask generation network 12 to obtain the contour mask of the map elements in the predicted image, and the edge extraction network 13 to extract the mask edges of the contour mask, and finally uses the shape generation network 14 to simplify the mask edges to obtain the initial shape of the map element.
  • for example, the initial shapes of the houses and roads in the image IM2 can be obtained in this way.
  • S1302 Use the shape regression network 2 to infer the initial shape and obtain the regression shape of the initial shape and the direction data of the geometric primitives.
  • the electronic device inputs the initial shape of the map element into the shape regression network 2, uses the pooling network 21 and the feature encoding network 22 to obtain the regression encoding features of the geometric primitives in the initial shape, and then uses the direction generation network 23 to obtain the direction of the geometric primitives.
  • the shape adjustment network 24 is used to adjust the initial shape to obtain a regression shape with higher accuracy and more regular shape.
  • S1303 Use topological reconstruction network 3 to obtain the topological relationship between geometric primitives in the regression shape.
  • the electronic device inputs the regression shape into the topology reconstruction network 3, uses the pooling network 31 and the feature encoding network 32 to obtain the inference encoding features of each geometric primitive in the regression shape, and then uses the relational reasoning network 33 to obtain the topological relationships between the geometric primitives in the regression shape. For example, after the regression shape of the road in the aforementioned image IM2 is input into the topology reconstruction network 3, the topological relationship shown in Figure 11 can be obtained.
  • the electronic device uses the post-processing module 4 to obtain the vector map based on the regression shape, the topological relationship between the geometric primitives, and the direction data of the geometric primitives.
  • the post-processing module 4 may first rotate each line segment in the regression shape so that its direction matches the direction data of that line segment.
  • for example, referring to Figure 14, the directions of line segments S1S2 and S2S3 in the regression shape are inconsistent with the direction (horizontal) generated by the aforementioned shape regression network 2.
  • the post-processing module can rotate line segment S1S2 clockwise to horizontal and rotate line segment S2S3 counterclockwise to horizontal.
  • the post-processing module 4 then reconnects the line segments that are no longer connected after their directions were adjusted (that is, it connects the end point of each line segment to the closest point of the line segment adjacent to that end point) to obtain a closed polygon. For example, referring to Figure 14, after line segments S1S2 and S2S3 are rotated to horizontal, line segments S1S2', S2S3' and S3S4 are no longer connected; the post-processing module 4 connects the end point S2' of line segment S1S2' to the end point S2 of the adjacent line segment S2S3', and connects the end point S3' of line segment S2S3' to the end point S3 of the adjacent line segment S3S4.
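A minimal sketch of the rotation and reconnection steps, assuming each segment is rotated about its midpoint (the text does not specify the rotation center) and that a gap is closed by joining a segment's end point to the start point of the next segment in the polygon, as in the Figure 14 example where S2' is joined to S2. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def rotate_to_direction(a: Point, b: Point, theta: float) -> Segment:
    """Rotate segment ab about its midpoint so that it lies along angle theta (radians).

    The length is preserved; the a->b orientation is kept by flipping the target
    direction when it would point backwards.
    """
    mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
    half = math.dist(a, b) / 2
    dx, dy = math.cos(theta) * half, math.sin(theta) * half
    if (b[0] - a[0]) * dx + (b[1] - a[1]) * dy < 0:
        dx, dy = -dx, -dy
    return (mx - dx, my - dy), (mx + dx, my + dy)


def reconnect(segments: Sequence[Segment]) -> List[Segment]:
    """Close the gaps of a cyclic chain of segments with short connector segments."""
    out: List[Segment] = []
    n = len(segments)
    for k in range(n):
        start, end = segments[k]
        out.append((start, end))
        next_start = segments[(k + 1) % n][0]
        if end != next_start:
            out.append((end, next_start))  # connector, e.g. S2' -> S2
    return out
```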
  • the post-processing module 4 can delete the line segments in the obtained closed polygon whose length is less than the preset side length threshold.
  • when deleting a line segment, the post-processing module 4 determines whether the two line segments connected to its two ends are parallel/collinear. If they are, the two line segments are merged into one line segment; otherwise, the two line segments are extended until they intersect.
  • the simplicity of the output polygon can be adjusted by setting different preset side length thresholds. For example, referring to Figure 14, the lengths of line segments S2'S2, S3S3', S6S7, and S9S10 are less than the preset side length threshold, and the line segments S2'S2, S3S3', S6S7, and S9S10 can be deleted.
  • after the deletion, line segments S1S2', S2S3' and S3S4 have the same direction and can be merged into one line segment S1S4. Because line segments S5S6 and S7S8 are neither parallel nor collinear, and line segments S8S9 and S10S11 are not parallel, line segment S7S8 is extended to intersect line segment S5S6, giving line segment S6S8, and line segment S10S11 is extended to intersect line segment S8S9, giving line segment S9S11. The result is a vector map with a regular shape.
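The deletion rule can be sketched for a closed polygon stored as a vertex list: a short edge is removed, and its two neighboring edges are either merged (if parallel or collinear) or extended to their intersection. The side length threshold `min_len`, the parallelism tolerance `eps`, processing one short edge per pass, and the function names are assumptions. In the parallel-but-offset case this sketch simply joins the outer end points of the two neighboring edges.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def line_intersection(p1: Point, p2: Point, q1: Point, q2: Point,
                      eps: float = 1e-9) -> Optional[Point]:
    """Intersection of the infinite lines p1p2 and q1q2, or None if (nearly) parallel."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (q2[0] - q1[0], q2[1] - q1[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product d1 x d2
    if abs(denom) < eps:
        return None
    t = ((q1[0] - p1[0]) * d2[1] - (q1[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


def remove_short_edges(polygon: Sequence[Point], min_len: float) -> List[Point]:
    """Delete edges shorter than min_len from a closed polygon, one per pass.

    For a short edge (a, b), the neighboring edges prev->a and b->next are merged
    if parallel/collinear (a and b are dropped), otherwise extended to meet at
    their intersection (a and b are replaced by that point).
    """
    pts = list(polygon)
    changed = True
    while changed and len(pts) > 3:
        changed = False
        n = len(pts)
        for k in range(n):
            a, b = pts[k], pts[(k + 1) % n]
            if math.dist(a, b) >= min_len:
                continue
            prev_pt, next_pt = pts[(k - 1) % n], pts[(k + 2) % n]
            x = line_intersection(prev_pt, a, b, next_pt)
            drop = {k, (k + 1) % n}
            new_pts = []
            for idx in range(n):
                if idx == k and x is not None:
                    new_pts.append(x)  # extended neighbors meet here
                if idx not in drop:
                    new_pts.append(pts[idx])
            pts = new_pts
            changed = True
            break
    return pts
```

A larger `min_len` removes more edges and yields a simpler polygon, matching the remark that the preset side length threshold controls the simplicity of the output.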
  • when the regression shape is a polyline, the post-processing module 4 can connect the points that have a connection relationship, based on the points in the regression shape and the topological relationships between them, to obtain a vector polyline.
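For the polyline case, a hypothetical sketch that turns a predicted point-to-point relationship matrix R into undirected edges (treating a pair as connected if either R(i,j) or R(j,i) reaches a threshold, which is an assumption) and then walks the edges into ordered vector polylines, assuming most points have at most two neighbors.

```python
from typing import Dict, List, Sequence, Tuple


def polyline_edges(R: Sequence[Sequence[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Undirected edges (i, j), i < j, whose predicted relationship reaches the threshold."""
    n = len(R)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if max(R[i][j], R[j][i]) >= threshold]


def assemble_polylines(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Walk connected points into ordered index lists, starting from end points."""
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = set()
    lines = []
    # Prefer starting at end points (degree != 2); remaining points cover closed loops
    starts = [i for i in range(n) if len(adj[i]) != 2] + list(range(n))
    for s in starts:
        if s in seen:
            continue
        line, cur = [s], s
        seen.add(s)
        while True:
            nxt = [v for v in adj[cur] if v not in seen]
            if not nxt:
                break
            cur = nxt[0]
            line.append(cur)
            seen.add(cur)
        lines.append(line)
    return lines
```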
  • because the neural network model 0 is obtained by learning the geometric characteristics of the map elements of the target area, a more accurate vector map can be obtained from the remote sensing image of the target area.
  • the trained neural network model 0 can be used to obtain vector maps of different target areas from the remote sensing images of those areas, without complex heuristic rule settings or parameter tuning.
  • in large-scale map vectorization scenarios, such as mapping areas that span multiple regions, cities or countries, the efficiency of vector map generation can be improved while the accuracy of the vector maps is ensured.
  • for example, the mean max tangent angle error of the house vectorization results of the SOTA algorithm is 31.9°, whereas that of the map generation method provided by this application is 26.7°, a relative improvement (error reduction) of 16.3%.
  • the mean max tangent angle error measures the tangent angle error between the lines of the vector maps inferred from different remote sensing images and the corresponding reference lines. The lower the error, the more accurately the model vectorizes houses.
  • for example, suppose the model vectorizes N6 remote sensing images.
  • the average tangent angle error of remote sensing images can be recorded as
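The formula for this metric is not reproduced in the text. A minimal sketch of how a mean max tangent angle error over N6 images could be computed, assuming each image's error is the maximum angle between sampled predicted tangents and their matched reference tangents, and treating tangents as undirected lines (angles modulo 180°). The sampling and matching of tangents, and the function names, are assumptions.

```python
from typing import Sequence, Tuple


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest angle (degrees) between two undirected tangent lines."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def mean_max_tangent_angle_error(images: Sequence[Sequence[Tuple[float, float]]]) -> float:
    """images[n]: (predicted, reference) tangent angle pairs, in degrees, sampled in image n."""
    per_image_max = [max(angle_diff_deg(p, r) for p, r in pairs) for pairs in images]
    return sum(per_image_max) / len(per_image_max)
```

For the figures reported above, (31.9° − 26.7°) / 31.9° ≈ 0.163, which is the 16.3% relative reduction.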
  • FIG. 15 shows a schematic diagram of the results of vectorizing houses in some remote sensing images using neural network model 0 according to some embodiments of the present application. As Figure 15 shows, the contour mask of the map element obtained by the semantic segmentation network 11 differs considerably from the actual contour of the house; after the shape regression network 2, the regression shape is highly similar to the actual contour of the house, and so are the houses in the resulting vector map.
| Method | Model size | Topological similarity | Average path length similarity (APLS) |
|---|---|---|---|
| Sat2Graph algorithm | 200M | 80.97 | 64.43 |
| This application | 100M | 86.63 | 67.67 |
  • topological similarity refers to the similarity between the topological structure of the vector road network obtained by model inference and the reference vector road network.
  • APLS (average path length similarity) indicates the similarity between the lines in the vector road network inferred by the model and the lines in the reference vector road network. The higher the score, the more accurate the vector road network obtained by the model.
  • Figure 16 shows a schematic diagram of the result of vectorizing roads in remote sensing images using neural network model 0 according to some embodiments of the present application. It can be seen from Figure 16 that the direction of the points in the polyline obtained by the neural network model 0 is consistent with the direction of the reference road.
  • FIG. 17A and 17B are schematic diagrams showing the reconstruction effect of some relatively complex roads in remote sensing images using neural network model 0 according to some embodiments of the present application. It can be seen from Figure 17A and Figure 17B that the vector road network obtained by the map generation method provided by the embodiment of the present application has a high degree of coincidence with the road center in the remote sensing map, indicating that the accuracy of the obtained vector map is high.
  • embodiments of the present application also provide a map generation device for implementing the map generation method provided by the foregoing embodiments.
  • FIG. 18 shows a schematic structural diagram of the map generation device 200 according to some embodiments of the present application.
  • the map generation device 200 includes: a data acquisition unit 201 , an initial shape generation unit 202 , a shape regression unit 203 , a topology reconstruction unit 204 and a post-processing unit 205 .
  • the data acquisition unit 201 is used to acquire an image of a certain area, and the image includes map elements, where the map elements are elements in the image to be converted into vector maps.
  • the initial shape generation unit 202 is configured to use a first model (such as the aforementioned shape initialization network 1) to perform inference on the image to obtain a first geometric figure corresponding to the map element, where the first geometric figure includes geometric primitives.
  • the shape regression unit 203 is configured to input the first geometric figure into a second model (such as the aforementioned shape regression network 2) to obtain the direction of each geometric primitive, and to obtain, based on the first geometric figure, a second geometric figure corresponding to the map element.
  • the second geometric figure includes the same geometric primitives as the first geometric figure, and the position arrangement of the geometric primitives in the second geometric figure is different from the position arrangement of the geometric primitives in the first geometric figure. For details, reference may be made to the relevant description of step S1302, which will not be described again here.
  • the topology reconstruction unit 204 is configured to use a third model (such as the aforementioned topology reconstruction network 3) to obtain the topological relationship between each geometric primitive based on the direction of the geometric primitive and the second geometric figure.
  • the post-processing unit 205 obtains a vector map corresponding to the image based on the topological relationship between each geometric primitive, the direction of each geometric primitive, and the second geometric figure.
  • the post-processing unit 205 may be used to perform related operations of the aforementioned post-processing module 4.
  • it can be understood that the map generation device 200 shown in Figure 18 is only an example. In other embodiments, the map generation device 200 may include more or fewer units, or some units may be merged or split; this is not limited here.
  • the electronic device used to train the neural network model 0, or to perform inference using the neural network model 0, can be any electronic device capable of training a neural network model or performing inference with it, including but not limited to laptops, desktops, tablets and servers; this is not limited here.
  • the following uses the electronic device 100 as an example to illustrate the structure of an electronic device used to train the neural network model 0 or to perform inference using the neural network model 0.
  • FIG. 19 shows a schematic structural diagram of an electronic device 100 for executing embodiments of the present application according to some embodiments of the present application.
  • the electronic device 100 may include one or more processors 101, system memory 102, non-volatile memory (NVM) 103, input/output (I/O) devices 104, a communication interface 105, and system control logic 106 that couples the processor 101, system memory 102, non-volatile memory 103, input/output (I/O) devices 104, and communication interface 105.
  • the processor 101 may include one or more processing units.
  • the processor 101 may include a central processing unit (CPU), an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units can be independent devices or integrated in one or more processors.
  • the processor 101 can be used to execute relevant instructions for training the aforementioned neural network model 0 or using the trained neural network model 0 to perform inference on remote sensing images.
  • for example, the NPU can be used to run the instructions of the neural network model 0 to perform semantic segmentation of the image, generate the contour mask of a map element and the mask edges of the contour mask, generate the initial shape/regression shape of the map element, and generate the direction data/topological relationships of the geometric primitives.
  • System memory 102 is a volatile memory, such as random access memory (Random-Access Memory, RAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), etc.
  • the system memory is used to temporarily store data and/or instructions.
  • for example, the system memory 102 can be used to temporarily store the network parameters of the neural network model 0, the sample image set, intermediate data generated while training the neural network model 0 or performing inference with it, the vector maps, etc.
  • Non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions.
  • For example, the non-volatile memory 103 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as a hard disk drive (HDD), a compact disc (CD), a digital versatile disc (DVD), or a solid-state drive (SSD).
  • The non-volatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card.
  • For example, the non-volatile memory 103 can be used to persistently store the network parameters of the neural network model 0, sample image sets, intermediate data generated while training the neural network model 0 or while using it for inference, vector maps, etc.
  • The system memory 102 and/or the non-volatile memory 103 may include copies of instructions 107.
  • When executed by at least one of the processors 101, the instructions 107 cause the electronic device 100 to train all or part of the neural network model 0 using the method provided in the embodiments of this application, or to use the neural network model 0 to perform inference.
  • The I/O device 104 may include a user interface that enables a user to interact with the electronic device 100, for example to select or input a sample image set or to label map elements in the sample image set.
  • The communication interface (network interface) 105 may include a transceiver that provides a wired or wireless communication interface through which the electronic device 100 communicates with any other suitable device over one or more networks.
  • For example, the electronic device 100 can establish a communication connection with other electronic devices through the network interface 105 to obtain sample image sets, image sets for prediction, etc. from those devices.
  • The system control logic 106 may include any suitable interface controller to provide any suitable interface to the other modules of the electronic device 100.
  • For example, the system control logic 106 may include one or more memory controllers that provide the processor 101 with an interface to the system memory 102 and the non-volatile memory 103.
  • The system control logic 106 may include at least one Peripheral Component Interconnect (PCI) controller that gives the processor 101 an interface, over the PCI bus, to devices and modules (such as graphics cards and sound cards) connected to the electronic device 100 through a PCI interface.
  • PCI: Peripheral Component Interconnect
  • In some embodiments, at least one of the processors 101 may be packaged together with the logic of one or more controllers of the system control logic 106 to form a system in package (SiP). In other embodiments, at least one of the processors 101 may instead be integrated on the same chip as the logic of one or more controllers of the system control logic 106 to form a system-on-chip (SoC).
  • SiP: system in package
  • SoC: system-on-chip
  • The electronic device 100 can be any electronic device capable of training deep learning models, including but not limited to laptop computers, desktop computers, tablet computers, and servers; no limitation is imposed here.
  • It can be understood that the structure of the electronic device 100 shown in the embodiments of this application does not constitute a specific limitation on the electronic device 100.
  • The electronic device 100 may include more or fewer components than illustrated, combine or split some components, or arrange the components differently.
  • The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • Embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
  • Embodiments of this application may be implemented as a computer program or program code executing on a programmable system that includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • Program code may be applied to input instructions to perform the functions described herein and to generate output information.
  • Output information can be applied to one or more output devices in a known manner.
  • For the purposes of this application, a processing system includes any system that has a processor such as a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
  • DSP: digital signal processor
  • ASIC: application-specific integrated circuit
  • Program code may be implemented in a high-level procedural language or an object-oriented programming language to communicate with the processing system.
  • Assembly language or machine language can also be used to implement the program code if needed.
  • The mechanisms described in this application are not limited in scope to any particular programming language. In either case, the language may be compiled or interpreted.
  • In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof.
  • The disclosed embodiments may also be implemented as instructions carried on or stored in one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which can be read and executed by one or more processors.
  • For example, instructions may be distributed over a network or through other computer-readable media.
  • Machine-readable media may therefore include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including but not limited to floppy disks, optical discs, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random-access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet by means of electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, or digital signals).
  • Machine-readable media therefore include any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
  • It should be noted that each unit/module mentioned in the device embodiments of this application is a logical unit/module.
  • Physically, a logical unit/module may be one physical unit/module, part of a physical unit/module, or a combination of multiple physical units/modules.
  • The physical implementation of these logical units/modules is not what matters most;
  • rather, the combination of functions implemented by these logical units/modules is the key to solving the technical problems raised by this application.
  • In addition, the above device embodiments of this application do not introduce units/modules that are not closely related to solving the technical problems raised by this application. This does not mean that those device embodiments contain no other units/modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in this application are a map generation method, a model training method, a readable medium, and an electronic device. When a neural network model is used to generate a vector map, the contour mask of a map element is converted into a vector map by learning the geometric features of map elements in sample images of the target area, rather than by applying vectorization rules set by technicians. This improves the precision of the resulting vector map. In addition, with the method provided by this application, the neural network model can be retrained on sample images of each new target area, and the retrained network can then produce the vector map of that area without complex parameter tuning or vectorization-rule setting. This improves vector map generation efficiency while preserving precision, making the method better suited to large-scale map generation.

Description

Map generation method, model training method, readable medium, and electronic device

Technical field
This application relates to the field of image processing, and in particular to a map generation method, a model training method, a readable medium, and an electronic device.
Background
With the development of artificial intelligence (AI) technology, neural network models are used more and more widely. For example, a neural network model can be used to obtain a vector map of an area from remote sensing images of that area. Currently, neural network inference is typically used first to obtain the initial contours of map elements (such as houses, lakes, roads, and rivers) in a remote sensing image. The initial contours are then adjusted according to vectorization rules set by developers, for example by adjusting the angles between lines, to convert the contours of the map elements into a vector map.
However, geographic environments and map elements are diverse. For example, geographic environments differ from area to area, and the geometric features of houses and roads vary widely. As a result, vectorization rules set by developers struggle to match the geographic environments and map elements of different areas, and vectorizing images of different areas with the same rules yields vector maps of low accuracy. Setting different contour extraction methods and vectorization rules for each area, on the other hand, is complicated and unsuitable for large-scale vector map modeling.
Summary
In view of this, embodiments of this application provide a map generation method, a model training method, a readable medium, and an electronic device. A neural network model learns the geometric characteristics of map elements in an area and converts the contours of those map elements into the corresponding vector map. This helps improve the accuracy of the resulting vector map and is better suited to large-scale vector map modeling.
In a first aspect, embodiments of this application provide a map generation method applied to an electronic device. The method includes the following steps:
- Obtain an image of an area. The image includes map elements, that is, elements in the image that are to be converted into a vector map.
- Perform inference on the image with a first model to obtain a first geometric figure corresponding to a map element. The first geometric figure includes geometric primitives.
- Input the first geometric figure into a second model to obtain the direction of each geometric primitive, and obtain, based on the first geometric figure, a second geometric figure corresponding to the map element. The second geometric figure includes the same geometric primitives as the first geometric figure, but the primitives are arranged in different positions.
- Use a third model to obtain the topological relationships between the geometric primitives, based on the directions of the geometric primitives and the second geometric figure.
- Obtain the vector map corresponding to the image based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure.
In the embodiments of this application, the electronic device first uses the first model (for example, the shape initialization network described below) to infer the contour of a map element in an image of an area and obtain the first geometric figure corresponding to the map element (for example, the initial shape described below). It then uses the second model (for example, the shape regression network described below) to adjust the first geometric figure, which yields a more accurate and more regular second geometric figure (for example, the regression shape described below). Next, it uses the third model (for example, the topology reconstruction network described below) to infer the topological relationships between the geometric primitives in the second geometric figure. For example, when the second geometric figure is a polyline, the third model infers the topological relationships between the points that make up the polyline. Finally, the device obtains the vector map corresponding to the image from those topological relationships. In this way, the electronic device converts map elements into a vector map using the pre-trained first, second, and third models rather than vectorization rules set by technicians, which helps improve the accuracy of the resulting vector map.
Moreover, in large-scale vector map modeling, retraining at least one of the first, second, and third models for each area is enough to adapt to the geometric characteristics of that area's map elements. No vectorization rules need to be set and no complex parameters need to be tuned, which improves the efficiency of vector map modeling.
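The three-stage flow described above can be sketched as follows. This is a minimal illustration, not the application's implementation. The names `shape_init`, `shape_regress`, `topo_reconstruct`, and `assemble` are hypothetical stand-ins for the first model, the second model, the third model, and the final vectorization step, and they are passed in as plain callables.

```python
from typing import Any, Callable, List


def generate_vector_map(
    image: Any,
    shape_init: Callable,        # hypothetical first model: image -> list of initial shapes
    shape_regress: Callable,     # hypothetical second model: shape -> (directions, regressed shape)
    topo_reconstruct: Callable,  # hypothetical third model: (directions, shape) -> relations
    assemble: Callable,          # hypothetical post-processing: (relations, directions, shape) -> geometry
) -> List[Any]:
    """Run the initial-shape -> regression -> topology -> vectorization chain per map element."""
    vector_map = []
    for initial_shape in shape_init(image):
        directions, regressed = shape_regress(initial_shape)
        relations = topo_reconstruct(directions, regressed)
        vector_map.append(assemble(relations, directions, regressed))
    return vector_map
```

Passing the stages as callables keeps the sketch independent of any particular network framework. Stub functions are enough to exercise the control flow.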
It can be understood that a geometric primitive is the basic building unit of a geometric figure. For example, when the first geometric figure is a polygon, the geometric primitives may be the line segments that make up the polygon. When the first geometric figure is a polyline, the geometric primitives may be the points of the line segments in the polyline.
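A minimal sketch of how the two kinds of geometric primitives could be represented. It assumes that a figure is stored as an ordered list of vertices; the helper names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def polygon_primitives(vertices: List[Point]) -> List[Segment]:
    """Line-segment primitives of a closed polygon, in connection order (last edge closes the ring)."""
    n = len(vertices)
    return [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]


def polyline_primitives(vertices: List[Point]) -> List[Point]:
    """Point primitives of an open polyline: its vertices, in order."""
    return list(vertices)
```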
In a possible implementation of the first aspect, at least one of the first model, the second model, and the third model is trained on the geometric features of map elements in the area.
In this embodiment, at least one of the first, second, and third models may be trained on the geometric features of the map elements in the area. That is, the electronic device vectorizes the map elements using the geometric features of that area's map elements, which helps improve the accuracy of the resulting vector map.
In a possible implementation of the first aspect, when the geometric primitives are line segments, the second geometric figure also includes the connection order of the geometric primitives. In this case, obtaining the vector map corresponding to the image (based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure) includes two steps. First, adjust the direction of a first geometric primitive in the second geometric figure to match the direction predicted for that primitive, where the primitive's direction in the second geometric figure differs from its predicted direction. Second, connect the first geometric primitive with a second geometric primitive to obtain the polygon corresponding to the second geometric figure, where the second geometric primitive is adjacent to the first geometric primitive in the connection order.
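One way to realize the two steps above (re-orienting segments to their predicted directions, then joining neighbours in connection order) is sketched below. It assumes, for illustration only, that each segment is rotated about its midpoint and that adjacent segments are joined at the intersection of their supporting lines. The function names are hypothetical, and directions are given as angles in radians.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Tuple[float, float]]  # (point on the line, unit direction)


def reorient(seg: Tuple[Point, Point], angle: float) -> Line:
    """Keep the segment's midpoint but replace its direction with the predicted angle."""
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2, (y1 + y2) / 2), (math.cos(angle), math.sin(angle))


def intersect(a: Line, b: Line) -> Optional[Point]:
    """Intersection of two infinite lines, or None if they are (nearly) parallel."""
    (px, py), (dx, dy) = a
    (qx, qy), (ex, ey) = b
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:
        return None
    t = ((qx - px) * ey - (qy - py) * ex) / denom
    return (px + t * dx, py + t * dy)


def connect_segments(segments: Sequence[Tuple[Point, Point]], angles: Sequence[float]) -> List[Point]:
    """Join consecutive re-oriented segments (in connection order) into a closed polygon."""
    lines = [reorient(s, a) for s, a in zip(segments, angles)]
    vertices = []
    for i in range(len(lines)):
        p = intersect(lines[i - 1], lines[i])  # i == 0 pairs the last segment with the first
        if p is None:  # parallel neighbours: fall back to the segment's own start point
            p = segments[i][0]
        vertices.append(p)
    return vertices
```

For example, four slightly skewed sides of a square, with predicted angles 0, π/2, 0, π/2, come back as an axis-aligned quadrilateral.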
In a possible implementation of the first aspect, the polygon corresponding to the second geometric figure includes a first line segment, a second line segment, and a third line segment connected in sequence. Obtaining the vector map corresponding to the image (based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure) then also includes the following. When the length of the second line segment is less than a preset side-length threshold, delete the second line segment. If the topological relationship between the first line segment and the third line segment is collinear or parallel, merge the first line segment and the second line segment into one line segment. If the relationship between the first and third line segments is neither collinear nor parallel, extend the first line segment and/or the third line segment so that they intersect.
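The short-edge cleanup above can be sketched as follows. This is an illustrative sketch with several assumptions: polygons are stored as vertex lists, and the merge joins the two neighbours of the deleted edge (the first and third segments) into one segment. The threshold `min_len` and all names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _line_intersection(p1: Point, p2: Point, p3: Point, p4: Point) -> Optional[Point]:
    """Intersection of the infinite lines p1-p2 and p3-p4, or None if they are parallel."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p4[0] - p3[0], p4[1] - p3[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None
    t = ((p3[0] - p1[0]) * d2[1] - (p3[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


def remove_short_edges(vertices: List[Point], min_len: float) -> List[Point]:
    """Repeatedly delete polygon edges a-b shorter than min_len.

    - If the neighbouring (first/third) edges are parallel or collinear, drop a and b
      so the two neighbours merge into one edge.
    - Otherwise replace a and b by the intersection of the extended neighbours.
    """
    v = list(vertices)
    changed = True
    while changed and len(v) > 3:
        changed = False
        n = len(v)
        for i in range(n):
            j = (i + 1) % n
            a, b = v[i], v[j]
            if math.dist(a, b) >= min_len:
                continue
            p = _line_intersection(v[i - 1], a, b, v[(i + 2) % n])
            new: List[Point] = []
            for k in range(n):
                if k == i:
                    if p is not None:
                        new.append(p)
                elif k != j:
                    new.append(v[k])
            if len(new) < 3:
                continue  # never collapse the polygon below a triangle
            v, changed = new, True
            break
    return v
```

In the first case below, a tiny chamfer at a corner is replaced by the extended corner. In the second case, a small step between two parallel edges is removed.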
In a possible implementation of the first aspect, when the geometric primitives are points, obtaining the vector map corresponding to the image (based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure) includes: connecting the points whose topological relationship is "connected" to obtain the corresponding vectorized polylines.
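A minimal sketch of chaining points into polylines. It assumes the third model's output has already been reduced to a list of index pairs whose relationship is "connected", and that each point has at most two such neighbours. The function and parameter names are hypothetical. A closed loop comes out as an open chain whose last point connects back to the first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def link_points(num_points: int, connected_pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group point indices into polylines by following 'connected' relations."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for i, j in connected_pairs:
        adj[i].add(j)
        adj[j].add(i)
    # Visit chain endpoints (degree <= 1) first so open polylines are traced end to end.
    order = sorted(range(num_points), key=lambda k: len(adj[k]) > 1)
    visited: Set[int] = set()
    polylines: List[List[int]] = []
    for start in order:
        if start in visited:
            continue
        chain, cur = [start], start
        visited.add(start)
        while True:
            unvisited = [n for n in adj[cur] if n not in visited]
            if not unvisited:
                break
            cur = min(unvisited)  # deterministic choice if several neighbours remain
            visited.add(cur)
            chain.append(cur)
        if len(chain) > 1:  # isolated points do not form a polyline
            polylines.append(chain)
    return polylines
```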
In a possible implementation of the first aspect, performing inference on the image with the first model to obtain the first geometric figure corresponding to the map element includes: performing semantic segmentation on the image to obtain a contour mask of the map element, where the contour mask indicates the region of the image in which the map element is located; extracting the mask edge of the contour mask; and simplifying the mask edge to obtain the first geometric figure.
For example, the electronic device can use the semantic segmentation network described below to obtain the contour mask of the region in which the map element is located. It can then use the edge extraction network described below to extract the mask edge from the contour mask. Finally, it simplifies polygon edges with the Douglas-Peucker (DP) algorithm, or simplifies polylines with the non-maximum suppression (NMS) algorithm, to obtain a first geometric figure with fewer geometric primitives. Reducing the number of geometric primitives in the first geometric figure helps speed up the inference that the electronic device performs on it.
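For reference, the Douglas-Peucker (DP) simplification mentioned above can be written as follows. This is the textbook algorithm applied to an open point sequence. How to apply it to a closed mask contour (for example, by splitting the ring at two vertices) and how to choose `epsilon` are left as assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b (distance to a if a == b)."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / math.dist(a, b)


def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Keep only vertices that deviate from the endpoint chord by more than epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord joining the two endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Recurse on both halves and drop the duplicated split point.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right
```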
In a possible implementation of the first aspect, the map elements include at least one of a house, road, lake, ocean, river, forest, or desert. The first geometric figure corresponding to a house, lake, ocean, forest, or desert is a polygon, and the first geometric figure corresponding to a road or river is a polyline.
In the embodiments of this application, an image may include one map element or multiple map elements. The electronic device can represent map elements that need a specific shape, such as houses, lakes, oceans, forests, and deserts, as polygons, and represent roads, rivers, and similar elements as polylines.
In a possible implementation of the first aspect, the method further includes training the first model as follows:
- Obtain sample data. The sample data includes a sample image set of an area and the reference contours corresponding to the map elements in each sample image of the set.
- Use the first model to extract image features of each sample image and, based on those features, obtain contour masks of the map elements in each sample image. A contour mask indicates the region occupied by a map element in the corresponding sample image.
- Based on the contour masks, obtain a first predicted geometric figure corresponding to each map element in each sample image.
- Train the first model based on the value of a first loss function and the value of a second loss function. The first loss function indicates the accuracy of the contour masks, and the second loss function indicates the similarity between the first predicted geometric figures and the reference contours.
That is, in the embodiments of this application, the first model is trained on the sample image set of the area. As a result, the contour masks and first predicted geometric figures that it extracts for the map elements in the sample images closely resemble the corresponding reference contours. Having learned the geometric characteristics of the map elements in the sample images, the first model infers first geometric figures for map elements in images of that area that better match the area's geometric characteristics. This helps improve the accuracy of the first geometric figures and, in turn, the accuracy of the resulting vector map.
For example, in some embodiments, the first loss function may be the cross-entropy loss L_{11-12-CEL} described below, and the second loss function may be the L2 loss L_{13-L2} described below.
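A minimal sketch of combining the two losses for the first model. It assumes the first loss is a per-pixel binary cross-entropy over mask probabilities, and the second is a mean squared vertex distance between a predicted shape and a reference contour whose vertices are already matched one-to-one. The weighting `w`, the vertex matching, and all names are assumptions; the document defines L_{11-12-CEL} and L_{13-L2} later.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def binary_cross_entropy(pred_probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean per-pixel BCE between predicted mask probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(pred_probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def l2_vertex_loss(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Mean squared distance between corresponding vertices (pred and ref assumed matched)."""
    return sum((px - rx) ** 2 + (py - ry) ** 2 for (px, py), (rx, ry) in zip(pred, ref)) / len(ref)


def first_model_loss(mask_probs, mask_labels, pred_shape, ref_shape, w: float = 1.0) -> float:
    """Hypothetical combined objective: mask term plus weighted geometry term."""
    return binary_cross_entropy(mask_probs, mask_labels) + w * l2_vertex_loss(pred_shape, ref_shape)
```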
In a possible implementation of the first aspect, the method further includes training the second model as follows:
- Obtain sample data. For a sample image set of an area, the sample data includes the reference contours corresponding to the map elements in each sample image, the reference direction of each geometric primitive in the reference contours, and a third geometric figure, obtained with the first model, for each map element in each sample image.
- Use the second model to obtain a second predicted geometric figure for each map element in each sample image, together with the predicted directions of the geometric primitives in the third geometric figure. The second predicted geometric figure includes the same geometric primitives as the third geometric figure, arranged differently.
- Train the second model based on a third loss function and a fourth loss function. The third loss function indicates the similarity between the predicted directions of the geometric primitives in the third geometric figure and the corresponding reference directions. The fourth loss function indicates the similarity between the second predicted geometric figure and the corresponding reference contour.
For example, in some embodiments, the third loss function may be the L2 loss L_{23-L2} described below, and the fourth loss function may be the relative shape loss described below.
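For the direction term, a plain L2 loss on raw angles would penalize θ against θ + π even though both describe the same undirected segment. The sketch below therefore compares doubled-angle unit vectors (cos 2θ, sin 2θ). This encoding is an assumption for illustration and is not necessarily how L_{23-L2} is defined. The relative shape loss is not sketched because this section does not define it.

```python
import math
from typing import Sequence


def direction_l2_loss(pred_angles: Sequence[float], ref_angles: Sequence[float]) -> float:
    """Mean squared L2 distance between doubled-angle unit vectors (cos 2θ, sin 2θ).

    Doubling the angle maps θ and θ + π (the same undirected line) to the same vector.
    """
    total = 0.0
    for p, r in zip(pred_angles, ref_angles):
        total += (math.cos(2 * p) - math.cos(2 * r)) ** 2 + (math.sin(2 * p) - math.sin(2 * r)) ** 2
    return total / len(ref_angles)
```

Opposite orientations of one line cost nothing, while perpendicular directions give the maximum penalty of 4.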
In a possible implementation of the first aspect, the method further includes the following steps:
- Obtain sample data. For a sample image set of an area, the sample data includes the reference topological relationships between the geometric primitives in the reference contours of the map elements in each sample image. It also includes a fourth geometric figure, obtained with the first model, for each map element in each sample image, and the directions of the geometric primitives in that fourth geometric figure.
- Use the third model to determine the latent-space feature of each geometric primitive in the fourth geometric figure and, based on those features, the predicted topological relationships between the geometric primitives.
- Train the third model based on a fifth loss function and a sixth loss function. The fifth loss function indicates how well the predicted topological relationships between the geometric primitives in the fourth geometric figure match the corresponding reference topological relationships. The sixth loss function indicates the similarity of the latent-space features of geometric primitives whose predicted topological relationship is parallel, collinear, or connected.
In the embodiments of this application, when the third model is trained, the sixth loss function (for example, the supervised contrastive loss described below) measures the similarity of the latent-space features of geometric primitives whose predicted topological relationship is parallel, collinear, or connected. As a result, when the third model performs inference on the second geometric figure, primitives that are parallel, collinear, or connected also have highly similar latent-space features. This helps improve the accuracy of the topological relationships between the geometric primitives in the second geometric figure and, in turn, the accuracy of the vector map obtained from those relationships. For example, in some embodiments, the fifth loss function may be the cross-entropy loss L_{CEL} described below, and the sixth loss function may be the supervised contrastive loss L_{SCL} described below.
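The supervised contrastive loss can be illustrated in pure Python with the standard formulation (Khosla et al., 2020). The sketch assumes that each primitive carries a group label, and that primitives in the same parallel/collinear/connected group are positives for one another. The labeling scheme, the temperature `tau`, and the names are assumptions.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def supcon_loss(features: Sequence[Sequence[float]], labels: Sequence[int], tau: float = 0.1) -> float:
    """Supervised contrastive loss over L2-normalized latent features.

    For each anchor i, positives are the other samples with the same label, and the
    softmax denominator runs over every sample except i itself.
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / tau for j in range(n)] for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives do not contribute
        log_denom = math.log(sum(math.exp(sim[i][a]) for a in range(n) if a != i))
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Features that cluster by group give a loss near zero. Features that mix groups give a large loss, which is the pressure the sixth loss function is described as applying.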
In a second aspect, embodiments of this application provide a model training method applied to an electronic device. The method includes:
obtaining sample data, where the sample data includes: the reference contours corresponding to the map elements in each sample image of a sample image set of an area; a fifth geometric figure or a sixth geometric figure corresponding to each map element; the directions of the geometric primitives in the fifth geometric figure; and the image features of the geometric primitives in the fifth geometric figure. The image features of the geometric primitives in the fifth geometric figure are generated when a fourth model performs inference to obtain the fifth geometric figure of each map element. The fifth geometric figure is less similar to the corresponding reference contour than the sixth geometric figure is, and the fifth and sixth geometric figures have the same geometric primitives;
inputting the fifth or sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure into a fifth model that has first network parameters, to obtain the latent-space feature of each geometric primitive, and inferring the predicted topological relationships between the geometric primitives from those latent-space features;
determining a seventh loss function and an eighth loss function based on the predicted topological relationships between the geometric primitives in the fifth geometric figure and the corresponding reference topological relationships. The reference topological relationships can be determined from the reference contours of the map elements in each sample image. The seventh loss function indicates how well the predicted topological relationships between the geometric primitives in the fifth geometric figure match the corresponding reference topological relationships. The eighth loss function indicates the similarity of the latent-space features of geometric primitives whose predicted topological relationship is parallel, collinear, or connected. When the seventh and eighth loss functions satisfy a termination condition, the fifth model with the first network parameters is saved. When they do not satisfy the termination condition, the network parameters of the fifth model are adjusted to second network parameters and the next round of training is performed.
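The save-or-continue logic above amounts to a loop like the one below. It is a sketch under assumptions: `compute_losses` and `update` are hypothetical callables standing in for the loss evaluation and the parameter adjustment, and "both losses below `tol`" is only one possible termination condition.

```python
from typing import Any, Callable, Tuple


def train_until_converged(
    params: Any,
    compute_losses: Callable[[Any], Tuple[float, float]],  # hypothetical: returns (loss7, loss8)
    update: Callable[[Any], Any],                          # hypothetical: one parameter adjustment
    tol: float = 1e-3,
    max_rounds: int = 1000,
) -> Tuple[Any, int]:
    """Return (parameters to save, number of adjustment rounds performed)."""
    for round_idx in range(max_rounds):
        loss7, loss8 = compute_losses(params)
        if loss7 < tol and loss8 < tol:  # assumed termination condition
            return params, round_idx      # keep ("save") the current parameters
        params = update(params)           # move to the next set of network parameters
    return params, max_rounds
```

With a toy scalar "model" whose losses are x² and x²/2 and whose update halves x, training starting from 1.0 stops after five rounds at x = 0.03125.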
In this embodiment of the present application, the fifth model may be a model that infers the topological relationships between geometric primitives from the geometric figures corresponding to map elements, the features of those geometric figures, and the directions of the geometric primitives in them, for example the third model in the first aspect above or the topology reconstruction network described below. During training, the lower-accuracy fifth geometric figure can be used as the input, so that the predicted topological relationships between the geometric primitives in the fifth geometric figure closely match the corresponding reference topological relationships. In other words, the fifth model learns to produce high-accuracy output from low-accuracy input, which improves its robustness to noise. As a result, even when the second geometric figure produced by the second model is of low accuracy, relatively accurate topological relationships between geometric primitives can still be obtained, which in turn improves the accuracy of the vector map derived from those relationships.
For example, in some embodiments, the seventh loss function may be the cross-entropy loss L_CEL described below, and the eighth loss function may be the supervised contrastive loss L_SCL described below.
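L_CEL and L_SCL are defined later in the specification and are not reproduced in this section. As a rough illustration only, the sketch below assumes their common textbook forms: a per-pair softmax cross-entropy over relation classes, and a supervised contrastive loss in the style of Khosla et al. (2020) that pulls together the latent features of primitives whose relationship is parallel, collinear, or connected. The function names, the relation-class encoding, and the temperature value 0.1 are assumptions, and latent vectors are assumed to be non-zero.

```python
import math


def cross_entropy_loss(logits, labels):
    """Mean cross-entropy over primitive pairs (assumed form of L_CEL).

    logits: one list of scores per primitive pair, one score per relation class.
    labels: reference class index per pair (e.g. 0 = none, 1 = parallel, ...).
    """
    total = 0.0
    for scores, y in zip(logits, labels):
        m = max(scores)  # subtract the max score for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[y]
    return total / len(labels)


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def supervised_contrastive_loss(features, positives, temperature=0.1):
    """SupCon-style loss (assumed form of L_SCL).

    features: latent vector of each primitive.
    positives: positives[i] is the set of indices j != i whose relation with
               primitive i is parallel, collinear or connected.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        if not positives[i]:
            continue  # anchors without positives contribute nothing
        sims = {j: _cosine(features[i], features[j]) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives[i]) / len(positives[i])
        anchors += 1
    return total / anchors if anchors else 0.0
```

With aligned features for related primitives the contrastive term is close to zero. With misaligned features it grows, which is the behaviour the eighth loss function is meant to reward and penalize.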
In a possible implementation of the second aspect, when the geometric primitives in the fifth geometric figure are line segments, whether the seventh and eighth loss functions satisfy the termination condition is determined as follows: the directional relationships between the geometric primitives are determined from their directions in the fifth geometric figure, and a ninth loss function is determined from these directional relationships and the reference directional relationships corresponding to the topological relationships, where the ninth loss function indicates the consistency between the predicted topological relationships of the geometric primitives and their directions;
the termination condition is determined to be satisfied when the seventh, eighth, and ninth loss functions all converge, or are each smaller than their corresponding preset loss values, or when the total loss function converges, or when the total loss function is smaller than a corresponding preset total loss value, where the total loss function includes a weighted sum of the seventh, eighth, and ninth loss functions.
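The four alternative termination conditions can be written as a single predicate. This is a minimal sketch under stated assumptions: the text does not say how convergence is detected, so "converged" is assumed here to mean that the last few round-to-round changes are all below a tolerance, and the function names, `tol`, and `window` are hypothetical.

```python
def is_converged(history, tol=1e-4, window=3):
    """Assumed criterion: the last `window` changes in the loss are all below tol."""
    if len(history) <= window:
        return False
    recent = history[-(window + 1):]
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))


def termination_satisfied(histories, thresholds, weights, total_threshold):
    """histories: per-round values of the seventh, eighth and ninth losses.

    The total loss is the weighted sum of the three losses in each round.
    """
    latest = [h[-1] for h in histories]
    total_history = [sum(w * v for w, v in zip(weights, vals))
                     for vals in zip(*histories)]
    return (
        all(is_converged(h) for h in histories)                 # all three converge
        or all(v < t for v, t in zip(latest, thresholds))       # all below presets
        or is_converged(total_history)                          # total loss converges
        or total_history[-1] < total_threshold                  # total below preset
    )
```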
For example, in some embodiments, the ninth loss function may be the consistency loss between geometric attributes and relationships, L_C, described below.
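The exact definition of L_C is given later in the specification. Purely as an illustrative assumption, one simple way to penalize inconsistency between predicted relationships and segment directions is to charge each pair predicted as parallel or collinear in proportion to how far its directions are from parallel. The function name and the use of |sin(Δθ)| are hypothetical choices, not the patent's formula.

```python
import math


def consistency_loss(directions, relation_probs):
    """Hypothetical direction/relationship consistency penalty.

    directions: angle in radians of each line segment.
    relation_probs: dict mapping (i, j) -> predicted probability that
                    segments i and j are parallel or collinear.
    |sin(angle difference)| is 0 for parallel or antiparallel segments and
    1 for perpendicular ones, so confident but inconsistent predictions cost most.
    """
    if not relation_probs:
        return 0.0
    total = 0.0
    for (i, j), p in relation_probs.items():
        total += p * abs(math.sin(directions[i] - directions[j]))
    return total / len(relation_probs)
```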
In a possible implementation of the second aspect, obtaining the latent-space feature of each geometric primitive from the fifth or sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of those geometric primitives includes: when the geometric primitives of the fifth geometric figure are points, obtaining the latent-space features from the fifth geometric figure, the image features of its geometric primitives, and their directions; and when the geometric primitives of the fifth geometric figure are line segments, obtaining the latent-space features from the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and their directions.
In this embodiment of the present application, because polylines are less complex than polygons, the fifth model is trained as follows: when the fifth geometric figure is a polyline, the fifth geometric figure itself is used as the input; when it is a polygon, the higher-accuracy sixth geometric figure is used instead. This preserves the accuracy with which the fifth model infers the topological relationships between geometric primitives in the more complex polygons, while improving its robustness to noise in the simpler polyline inputs.
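The input-selection rule reduces to a branch on the primitive type. The sketch below is illustrative; the function name and the string labels `"point"` and `"segment"` are assumptions.

```python
def select_topology_input(primitive_type, fifth_figure, sixth_figure):
    """Pick the geometric input of the fifth model during training.

    "point"   (polyline): use the lower-accuracy fifth figure for noise robustness.
    "segment" (polygon):  use the higher-accuracy sixth figure.
    """
    if primitive_type == "point":
        return fifth_figure
    if primitive_type == "segment":
        return sixth_figure
    raise ValueError(f"unknown primitive type: {primitive_type!r}")
```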
In a third aspect, an embodiment of the present application provides a map generation apparatus, including: a data acquisition unit, configured to acquire an image of an area, where the image includes map elements, a map element being an element in the image that is to be converted into a vector map; an initial shape generation unit, configured to perform inference on the image using a first model to obtain a first geometric figure corresponding to the map element, the first geometric figure including geometric primitives; a shape regression unit, configured to input the first geometric figure into a second model to obtain the direction of each geometric primitive, and to obtain, from the first geometric figure, a second geometric figure corresponding to the map element, where the second geometric figure includes the same geometric primitives as the first geometric figure but with a different positional arrangement; a topology reconstruction unit, configured to use a third model to obtain the topological relationships between the geometric primitives based on their directions and the second geometric figure; and a post-processing unit, configured to obtain a vector map corresponding to the image based on the topological relationships between the geometric primitives, the direction of each geometric primitive, and the second geometric figure.
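The data flow between the units can be pictured as a chain of callables. The sketch below is hypothetical: the class and attribute names are illustrative, the models are stand-in functions rather than trained networks, and for simplicity it assumes the second model returns both the primitive directions and the second geometric figure.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class MapGenerationPipeline:
    """Hypothetical wiring of the units; each model is a plain callable."""
    first_model: Callable[[Any], Any]             # image -> first geometric figure
    second_model: Callable[[Any], tuple]          # first figure -> (directions, second figure)
    third_model: Callable[[Any, Any], Any]        # (directions, second figure) -> topology
    post_process: Callable[[Any, Any, Any], Any]  # (topology, directions, figure) -> vector map

    def run(self, image):
        first_figure = self.first_model(image)                         # initial shape generation unit
        directions, second_figure = self.second_model(first_figure)    # shape regression unit
        topology = self.third_model(directions, second_figure)         # topology reconstruction unit
        return self.post_process(topology, directions, second_figure)  # post-processing unit
```

Keeping each stage as an independent callable mirrors the point made below that any one of the three models can be retrained for a new area without touching the others.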
In this embodiment of the present application, the electronic device may first use the first model (for example, the shape initialization network described below) to infer the contours of the map elements in an image of an area and obtain the first geometric figures corresponding to those map elements (for example, the initial shapes described below). It then uses the second model (for example, the shape regression network described below) to adjust the first geometric figures and obtain second geometric figures that are more accurate and more regular in shape (for example, the regression shapes described below). Next, it uses the third model (for example, the topology reconstruction network described below) to infer the topological relationships between the geometric primitives in the second geometric figures (for example, when a second geometric figure is a polyline, the topological relationships between the points that make up the polyline). Finally, it obtains the vector map corresponding to the image from these topological relationships. In this way, the electronic device converts map elements into a vector map using the pre-trained first, second, and third models rather than vectorization rules set by technicians, which helps improve the accuracy of the resulting vector map. Moreover, in large-scale vector map modeling, retraining at least one of the first, second, and third models for each area lets the models adapt well to the geometric characteristics of the map elements in different areas without setting vectorization rules or tuning complex parameters, which helps improve the efficiency of vector map modeling.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions that, when executed by an electronic device, cause the electronic device to implement any method provided by the first aspect, the possible implementations of the first aspect, the second aspect, or the possible implementations of the second aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a memory, configured to store instructions to be executed by one or more processors of the electronic device; and a processor, which is one of the processors of the electronic device and is configured to execute the instructions stored in the memory to implement any method provided by the first aspect, the possible implementations of the first aspect, the second aspect, or the possible implementations of the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product including a computer program/instructions that, when executed by a processor, implement any method provided by the first aspect, the possible implementations of the first aspect, the second aspect, or the possible implementations of the second aspect.
Description of the Drawings
Figure 1A shows a schematic diagram of a process of obtaining a vector map from an image according to some embodiments of the present application;
Figure 1B shows a schematic diagram of an image that includes only one map element according to some embodiments of the present application;
Figure 1C shows a schematic diagram of an image that includes multiple map elements according to some embodiments of the present application;
Figure 2 shows a schematic diagram of a process of generating a map using a neural network model according to some embodiments of the present application;
Figure 3 shows a schematic structural diagram of shape initialization network 1 according to some embodiments of the present application;
Figure 4 shows a schematic diagram of the training process of shape initialization network 1 according to some embodiments of the present application;
Figure 5 shows a schematic diagram of a house in image IM2 and its corresponding contour mask according to some embodiments of the present application;
Figure 6A shows a schematic diagram of the coordinate distance between a point in the edge region of a contour mask and the reference contour according to some embodiments of the present application;
Figure 6B shows a schematic diagram of the coordinate distance between a point in the edge region of a contour mask and the outermost pixels of the contour mask according to some embodiments of the present application;
Figure 7 shows a schematic structural diagram of shape regression network 2 according to some embodiments of the present application;
Figure 8 shows a schematic diagram of the training process of shape regression network 2 according to some embodiments of the present application;
Figure 9 shows a schematic structural diagram of topology reconstruction network 3 according to some embodiments of the present application;
Figure 10 shows a schematic diagram of the training process of topology reconstruction network 3 according to some embodiments of the present application;
Figure 11 shows a schematic diagram of the computation of topological relationships and the cross-entropy loss according to some embodiments of the present application;
Figure 12 shows a schematic diagram of a training process and an inference process according to some embodiments of the present application;
Figure 13 shows a schematic flowchart of a map generation method according to some embodiments of the present application;
Figure 14 shows a schematic diagram of post-processing a polygon according to some embodiments of the present application;
Figure 15 shows a schematic diagram of the results of vectorizing houses in some remote sensing images using neural network model 0 according to some embodiments of the present application;
Figure 16 shows a schematic diagram of the results of vectorizing roads in a remote sensing image using neural network model 0 according to some embodiments of the present application;
Figure 17A shows a schematic diagram of the reconstruction of roads in a relatively complex remote sensing image using neural network model 0 according to some embodiments of the present application;
Figure 17B shows a schematic diagram of the reconstruction of roads in another complex remote sensing image using neural network model 0 according to some embodiments of the present application;
Figure 18 shows a schematic structural diagram of a map generation apparatus according to some embodiments of the present application;
Figure 19 shows a schematic structural diagram of an electronic device 100 for executing embodiments of the present application according to some embodiments of the present application.
Detailed Description of Embodiments
Illustrative embodiments of the present application include, but are not limited to, map generation methods, model training methods, readable media, program products, apparatuses, and electronic devices.
For ease of understanding, the terms used in this application are explained first.
(1) Loss function
During the training of a neural network model, the goal is to make the model's output as close as possible to the value that is actually to be predicted. The current network's predicted value can therefore be compared with the desired target value, and the weight vector of each layer of the network updated according to the difference between the two (there is usually an initialization step before the first update, in which parameters are preconfigured for each layer of the model). For example, if the network's predicted value is too high, the weight vector is adjusted to lower it, and this adjustment continues until the model can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured. This is the role of the loss function or objective function, an important equation for measuring that difference. Taking the loss function as an example, a higher output value (loss) indicates a greater difference, so training a neural network model becomes a process of reducing this loss as much as possible. Whether the loss function is well designed therefore directly affects the quality of the training.
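The loop described above (compare prediction with target, then adjust the weights in the direction that reduces the loss) can be shown on the smallest possible model, a single weight fitted by gradient descent on a mean squared error loss. This is a generic illustration, not a component of the patented networks; the learning rate and step count are arbitrary choices.

```python
def mse(predictions, targets):
    """Mean squared error: a larger value means a larger gap to the targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def train_single_weight(xs, ys, w=0.0, lr=0.05, steps=200):
    """Fit y ≈ w * x by gradient descent on the MSE loss."""
    n = len(xs)
    for _ in range(steps):
        # dL/dw for L = mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # prediction too high -> positive gradient -> w decreases
    return w
```

For example, `train_single_weight([1, 2, 3], [2, 4, 6])` converges to a weight of about 2, at which point the MSE is essentially zero.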
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
As mentioned above, when vectorization rules are used to convert the contours of map elements inferred by a neural network model into a vector map, applying the same rules to images of different areas may yield a low-accuracy vector map. For example, building styles differ between cities and regions, so building contours vary widely. If a vectorization rule forces every corner of a house contour to be a right angle, then in areas where house contours in the image are circular or irregular polygons, the resulting house contours will be wrong. Setting different contour extraction methods and vectorization rules for each area is complicated and makes vector map generation inefficient, so this approach is unsuitable for large-scale vector map modeling.
It can be understood that vectorization rules are rules, set by technicians, that adjust the relationships between the lines or points in the contours of map elements in order to obtain a reasonable vectorized map. Examples include setting two lines whose included angle is greater than a preset value as parallel or collinear, setting two lines whose included angle falls within a certain range as perpendicular, and moving every point whose distance from a straight line is smaller than a preset value onto that line. The effectiveness of such rules depends on the technician's experience and on the contours of the map elements used as references, and tuning them is complicated. When the geometric characteristics of the reference map elements differ greatly from those of the map elements in the area actually being vectorized, the resulting vector map will have low accuracy. For example, if the edges of the reference houses mostly meet at right angles while the houses in a certain area are mainly circular, the vectorization rules may yield a vector map whose shapes differ greatly from the actual map elements.
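To make the third example rule concrete, the sketch below moves every point closer than a hand-set threshold onto a given line by orthogonal projection. It is a hypothetical illustration of the kind of rule being criticized, not code from the application; the function name and the `max_dist` parameter are assumptions. Its dependence on a manually chosen `max_dist` is exactly the kind of area-specific tuning the text describes.

```python
import math


def snap_points_to_line(points, a, b, max_dist):
    """Hypothetical vectorization rule: move each point closer than max_dist
    to the infinite line through a and b (a != b assumed) onto that line."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    length_sq = dx * dx + dy * dy
    snapped = []
    for px, py in points:
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        qx, qy = ax + t * dx, ay + t * dy  # foot of the perpendicular
        if math.hypot(px - qx, py - qy) < max_dist:
            snapped.append((qx, qy))       # close enough: snap onto the line
        else:
            snapped.append((px, py))       # too far: leave unchanged
    return snapped
```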
To solve the above problems, embodiments of the present application provide a map generation method based on a neural network model. The model represents map elements as geometric figures (for example, roads as polylines and houses as polygons) and learns the geometric characteristics of the map elements in sample images of a target area, so that the geometric figures it infers from the contours of map elements match the geometric characteristics of the map elements in that area. The trained model then performs inference on prediction images of the target area to obtain the geometric figures corresponding to the map elements in that area, and from them the vector map. In other words, in the embodiments of the present application, map elements in an image are vectorized based on learned geometric characteristics of the map elements in the target area rather than on vectorization rules set by developers, which can improve the accuracy of the vector map. In addition, when vector maps are inferred for different areas, the neural network model only needs to be retrained with sample images of each area to adapt well to the geometric characteristics of that area's map elements, without complex setting and tuning of vectorization rules. In large-scale map construction, such as vectorizing maps for regions that span multiple districts, cities, or countries, this can improve the efficiency of vector map generation while maintaining its accuracy.
For example, referring to Figure 1A, in some embodiments a semantic segmentation network can be used to obtain contour masks of map elements (such as houses and roads) in an image, and heuristic rules set by developers (that is, vectorization rules) are then applied to convert the contours of the map elements into a vector map. In some embodiments of the present application, the heuristic rules are instead replaced by geometric feature learning and topology reconstruction. First, a neural network model learns the geometric characteristics of the map elements in sample images of the target area, so that the geometric figures it produces for map elements more accurately reflect the geometric characteristics of the map elements in that area. Next, a neural network model determines the topological relationships between the geometric primitives in the geometric figures, and the geometric primitives are then topologically connected to obtain the vector map, for example by connecting points into polylines to represent roads or rivers, or connecting line segments into polygons to represent houses or lakes.
It can be understood that geometric primitives are the basic building blocks of geometric figures. For example, the geometric primitives of a polyline may be points, and those of a polygon may be ordered line segments.
It can be understood that map elements may include, but are not limited to, houses, lakes, oceans, roads, rivers, forests, and deserts. Map elements that must be depicted with a specific shape on the map, such as houses, lakes, oceans, forests, and deserts, can be represented by polygons; map elements that do not need a specific shape, such as roads and rivers, can be represented by polylines. For ease of description, the following embodiments use houses as the example of map elements represented by polygons and roads as the example of map elements represented by polylines.
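The two kinds of geometric figure and their primitives can be modeled with small data classes. In this sketch (class names and the element-to-geometry table are illustrative assumptions), a polyline's primitives are its points, and a polygon's primitives are the ordered segments between consecutive vertices, including the closing segment back to the first vertex.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Polyline:
    """Roads, rivers: the geometric primitives are points."""
    points: List[Point]

    def primitives(self) -> List[Point]:
        return list(self.points)


@dataclass
class Polygon:
    """Houses, lakes: the geometric primitives are ordered line segments."""
    vertices: List[Point]

    def primitives(self) -> List[Tuple[Point, Point]]:
        n = len(self.vertices)
        # The closing segment connects the last vertex back to the first.
        return [(self.vertices[i], self.vertices[(i + 1) % n]) for i in range(n)]


# Illustrative mapping from map element type to geometry type.
GEOMETRY_FOR_ELEMENT = {
    "house": Polygon, "lake": Polygon, "ocean": Polygon,
    "forest": Polygon, "desert": Polygon,
    "road": Polyline, "river": Polyline,
}
```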
It can be understood that an image may include one or more kinds of map elements. For example, referring to Figure 1B, image IM11 includes only road RD1, which can be represented by a polyline in the vector map. As another example, referring to Figure 1C, image IM12 includes house HE1, road RD2, and river RR1, where house HE1 can be represented by a polygon and road RD2 and river RR1 can be represented by polylines.
For ease of understanding, the process of converting remote sensing images into a vector map using a neural network model is introduced first.
Figure 2 shows a schematic diagram of a process of generating a map using a neural network model according to some embodiments of the present application. As shown in Figure 2, converting remote sensing images into a vector map using neural network model 0 typically includes the following steps:
S21: Label features. Map elements such as houses, roads, and lakes are labeled in a subset of the remote sensing images of the target area to obtain reference contours of those map elements (for example, vectorized house outlines and vectorized road centerlines). These images and the reference contours of their map elements can serve as the sample image set;
S22: Train the model. Neural network model 0 is trained with the sample image set so that it can vectorize the map elements in each sample image and produce predicted shapes that closely resemble the reference contours of those map elements;
S23: Infer the map. The trained neural network model 0 performs inference on the prediction image set of the target area (that is, the remote sensing images outside the sample image set) to obtain predicted shapes for the map elements in each prediction image, where each predicted shape includes its geometric primitives and the topological relationships between them;
S24: Post-process. The predicted shapes output by neural network model 0 are post-processed, for example by connecting the geometric primitives of each predicted shape to obtain the vectorized shapes of the map elements and by stitching together the vectorized shapes of map elements from different remote sensing images, to obtain a predicted vector map;
S25: Correct. Surveying and mapping personnel correct the predicted vector map to obtain the vector map of the target area and ensure its accuracy.
Continuing to refer to Figure 2, in some embodiments, neural network model 0 may include shape initialization network 1, shape regression network 2, and topology reconstruction network 3.
Shape initialization network 1 is used to extract the initial shape of each map element in a remote sensing image. For example, shape initialization network 1 can extract the houses in remote sensing image IM2 as polygons and the roads as polylines. In some embodiments, shape initialization network 1 is also used to determine key points of roads and rivers, such as road intersection points and the divergence and convergence points of rivers.
Shape regression network 2 is used to refine the initial shapes obtained by shape initialization network 1, producing the regression shape of each map element and the direction data of each geometric primitive in the regression shape, thereby improving the accuracy of the geometric figures of the map elements.
Topology reconstruction network 3 is used to infer the topological relationships between the geometric primitives in the regression shape of each map element. For example, the geometric primitives of a map element described by a polygon may be line segments, and the topological relationships between line segments may include, but are not limited to, collinear and parallel. As another example, the geometric primitives of a map element described by a polyline may be points, and the topological relationships between points may include connected and not connected.
After the topological relationships between the geometric primitives of each map element are obtained, post-processing module 4 can use them to perform post-processing operations, such as connecting the geometric primitives and stitching the map elements together, to obtain the vector map. It can be understood that in some embodiments post-processing module 4 may be implemented as a neural network or as other processing logic, which is not limited here.
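The text leaves the internals of post-processing module 4 open. One plausible non-neural way to connect point primitives is sketched below as an assumption: treat every point pair predicted as "connected" as a graph edge, then walk the graph into chains that break at endpoints and junctions (points with other than two connections, as at road intersections). Function and parameter names are hypothetical.

```python
from collections import defaultdict


def build_polylines(num_points, connected_pairs):
    """Group point primitives into polylines from predicted 'connected' pairs.

    Chains are split at endpoints and junctions (degree != 2); a closed loop
    is returned as a chain that repeats its starting point; isolated points
    become single-point polylines.
    """
    adj = defaultdict(set)
    for i, j in connected_pairs:
        adj[i].add(j)
        adj[j].add(i)
    used = set()  # undirected edges already emitted, stored as frozensets

    def walk(start, nxt):
        chain, prev, cur = [start], start, nxt
        used.add(frozenset((start, nxt)))
        while True:
            chain.append(cur)
            if len(adj[cur]) != 2:
                break  # endpoint or junction ends the chain
            (following,) = adj[cur] - {prev}
            edge = frozenset((cur, following))
            if edge in used:
                break  # closed loop has returned to its start
            used.add(edge)
            prev, cur = cur, following
        return chain

    polylines = []
    # Start from endpoints and junctions first, then sweep any pure loops.
    for only_non_chain_nodes in (True, False):
        for node in sorted(adj):
            if only_non_chain_nodes and len(adj[node]) == 2:
                continue
            for nb in sorted(adj[node]):
                if frozenset((node, nb)) not in used:
                    polylines.append(walk(node, nb))
    isolated = [[p] for p in range(num_points) if p not in adj]
    return polylines + isolated
```

For instance, `build_polylines(5, [(0, 1), (1, 2), (2, 3)])` yields one road polyline `[0, 1, 2, 3]` plus the unconnected point `[4]`, and a T-junction at point 1 yields three separate chains meeting at that point.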
Each network of the neural network model 0 may include one or more kinds of neural network layers. These include, but are not limited to, semantic segmentation, convolutional, pooling, classification, activation, attention, fully connected, recurrent, and batch normalization (BN) networks.
The structure of the neural network model 0 shown in Figure 2 is only an example. In other embodiments, the neural network model 0 may include more or fewer networks, and some networks may be combined or split, which is not limited here. For example, in some embodiments, a post-processing module 4 implemented as a neural network may be part of the neural network model 0.
The training process of each network in the neural network model 0 is described below.
The training process of the shape initialization network 1 is described first.
Figure 3 is a schematic structural diagram of the shape initialization network 1 according to some embodiments of this application.
As shown in Figure 3, the shape initialization network 1 includes a semantic segmentation network 11, a mask generation network 12, an edge extraction network 13, and a shape generation network 14.
The semantic segmentation network 11 extracts the image features (image embedding) of each sample image in the sample image set. In some embodiments, it may include a feature pyramid network (FPN).
The mask generation network 12 computes the contour mask of each map element from the image features of each sample image in the sample image set. In some embodiments, it may consist of a convolutional network, a batch normalization (BN) network, and an activation network (for example, a rectified linear unit, ReLU) connected in series.
The edge extraction network 13 extracts the edge of each contour mask from the contour mask and the image features of each map element. The result is the mask edge of each map element, which describes that element's contour. In some embodiments, the edge extraction network 13 may consist of a convolutional network, a BN network, and an activation network (for example, ReLU) connected in series.
The shape generation network 14 simplifies the mask edge of each map element to obtain its initial shape. Simplification reduces the number of geometric primitives in the initial shape, such as the line segments in a polygon or the points in a polyline, which speeds up inference on input images with the neural network model 0. In some embodiments, polygons can be simplified with algorithms such as Douglas-Peucker (DP), and polylines with algorithms such as non-maximum suppression (NMS).
The structure of the shape initialization network 1 shown in Figure 3 is only an example. In other embodiments, the shape initialization network 1 may use other structures, and each network may be implemented with other types of neural networks, which is not limited here.
The training process of the shape initialization network 1 is described below with reference to the structure shown in Figure 3.
Specifically, Figure 4 is a schematic flowchart of training the shape initialization network 1 according to some embodiments of this application. The process is executed by an electronic device. As shown in Figure 4, it includes the following steps:
S401: Obtain a sample image set.
The electronic device obtains a sample image set of the target area. The sample image set includes the reference contours of the map elements in each sample image.
In some embodiments, the sample image set may include N sample images of size H×W each (H pixels high and W pixels wide). Each pixel may have n channels, where n is the number of color channels of the sample images (for example, n = 3 for RGB images). The sample image set can therefore be represented as a 4-dimensional matrix P of size N×n×H×W. Its element P(i, j, k, m) is the value of the j-th color channel of the pixel in row k, column m of the i-th sample image.
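The sketch below makes the N×n×H×W layout concrete in plain Python. It assumes 0-based indices and row-major storage, neither of which the text specifies. `build_sample_set` and `flat_index` are hypothetical helpers for illustration only.

```python
def build_sample_set(images):
    """Stack N images into an N x n x H x W nested list P (hypothetical helper).

    images[i][k][m] is the n-channel pixel (a tuple) in row k, column m of image i,
    so that P[i][j][k][m] is the value of channel j of that pixel.
    """
    N, H, W = len(images), len(images[0]), len(images[0][0])
    n = len(images[0][0][0])
    return [[[[images[i][k][m][j] for m in range(W)] for k in range(H)]
             for j in range(n)]
            for i in range(N)]


def flat_index(i, j, k, m, n, H, W):
    """Offset of P[i][j][k][m] if the matrix is stored contiguously in row-major order."""
    # The image index i varies slowest and the column index m varies fastest.
    return ((i * n + j) * H + k) * W + m


# Two 2x3 RGB images whose channel values encode their own position.
images = [[[tuple(1000 * i + 100 * c + 10 * k + m for c in range(3))
            for m in range(3)] for k in range(2)] for i in range(2)]
P = build_sample_set(images)
print(P[1][2][0][1])                          # 1201
print(flat_index(1, 2, 1, 2, n=3, H=2, W=3))  # 35, the last of 2*3*2*3 = 36 elements
```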
In some embodiments, the reference contours of the map elements in the sample image set may consist of manually determined vector data for each map element, such as roads represented by polylines and houses represented by polygons. During training of the neural network model 0, these reference contours serve to evaluate the quality of the model's inference results. The network parameters of the neural network model 0 are then adjusted according to that evaluation.
S402: Use the semantic segmentation network 11 to obtain the image features of the map elements in the sample image set.
The electronic device uses the semantic segmentation network 11, for example an FPN, to extract the image features of the sample image set.
In some embodiments, the electronic device can feed the matrix P into the semantic segmentation network 11 to obtain the image feature matrix F of the sample image set. F has size N×C×H×W, where C is the number of feature channels extracted for each sample image. C is either determined by the type of the semantic segmentation network 11 or preset by the developer. The element F(i, j, k, m) is the value of the j-th feature of the pixel in row k, column m of the i-th sample image.
S403: Based on the image features of the sample image set, use the mask generation network 12 to obtain the contour mask of each map element, and use the edge extraction network 13 to obtain the mask edge of each contour mask.
In some embodiments, the electronic device can feed the image feature matrix F into the mask generation network 12 to obtain the contour mask of each map element. The contour masks can be represented as a matrix M of size N×p×H×W, where p is the number of map element classes. The following description uses two classes as an example: map elements represented by polygons (class 1) and map elements represented by polylines (class 2). For a given sample image i and class j, the 1×1×H×W submatrix M(i, j, :, :) is the contour mask of the class-j map elements in the i-th sample image. Within such a submatrix, elements belonging to the same class of map element can share the same value. Referring to Figure 5, for the sample image IM2, the pixels covered by houses in the corresponding 1×1×H×W submatrix can all have the value 1, and all other pixels the value 0.
In some embodiments, once the contour masks of the map elements are available, the electronic device can feed them and the corresponding reference contours into the edge extraction network 13 to obtain the mask edges. Referring to Figure 6A, the edge extraction network 13 can infer the coordinate distance DT between each pixel in the edge region of a contour mask and the reference contour. DT has size N×2×H×W, where 2 is the number of map element classes in this example. Its element DT(i, j, k, m) is the coordinate distance from the pixel in row k, column m of the i-th sample image to the reference contour of the class-j map elements. Each DT(i, j, k, m) has two components, dx and dy, which are the coordinate distances along the H and W directions respectively. For example, let B1 be a point with coordinates (x, y) in the edge region of the contour mask in Figure 6A, and let its distance DT to the reference contour be (dx, dy). The point on the mask edge corresponding to B1 then has coordinates (x+dx, y+dy). Adding each edge-region point's coordinate distance to its coordinates in this way gives the points on the mask edge. If the edge region of a contour mask contains L pixels, the resulting mask edge is the point set $\{(x_i + dx_i,\; y_i + dy_i)\}$, $i = 1, 2, \ldots, L$.
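The sketch below shows the offset step for one image and one class, that is, one slice of DT, in plain Python. It follows the convention above, with x along H (the row) and y along W (the column). The function name `mask_edge_from_offsets` and the nested-list inputs are assumptions for illustration. In practice DT would come from the edge extraction network 13.

```python
def mask_edge_from_offsets(edge_region, dt):
    """Shift every edge-region pixel by its offset (hypothetical helper).

    edge_region: H x W nested list of bools marking the edge region of one contour mask.
    dt:          H x W nested list of (dx, dy) offsets for the same image and class.
    Returns the mask-edge points (x + dx, y + dy), x being the row (H direction)
    and y the column (W direction).
    """
    points = []
    for x, row in enumerate(edge_region):
        for y, inside in enumerate(row):
            if inside:
                dx, dy = dt[x][y]
                points.append((x + dx, y + dy))
    return points


edge_region = [[True, False], [False, True]]
dt = [[(0.5, -0.25), (0.0, 0.0)], [(0.0, 0.0), (-1.0, 0.5)]]
print(mask_edge_from_offsets(edge_region, dt))  # [(0.5, -0.25), (0.0, 1.5)]
```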
The size of the edge region of a contour mask can be preset. For example, the edge region can consist of the pixels whose distance from the outermost pixels of the contour mask is less than a preset edge distance threshold.
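The following is one possible way to compute such an edge region in plain Python, under several stated assumptions:

- The "outermost" pixels are mask pixels with at least one 4-neighbour outside the mask.
- Distance is Euclidean.
- Pixels on both sides of the boundary are included.

The text does not fix any of these choices, and the function names are hypothetical. The brute-force search is meant for illustration, not speed.

```python
import math


def outermost_pixels(mask):
    """Mask pixels with at least one 4-neighbour outside the mask or outside the image."""
    H, W = len(mask), len(mask[0])
    border = []
    for k in range(H):
        for m in range(W):
            if not mask[k][m]:
                continue
            for dk, dm in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                kk, mm = k + dk, m + dm
                if not (0 <= kk < H and 0 <= mm < W) or not mask[kk][mm]:
                    border.append((k, m))
                    break
    return border


def edge_region(mask, threshold):
    """H x W bool map of pixels closer than `threshold` to some outermost mask pixel."""
    H, W = len(mask), len(mask[0])
    border = outermost_pixels(mask)
    # Brute force over all border pixels: fine for a sketch, slow for large images.
    return [[any(math.hypot(k - bk, m - bm) < threshold for bk, bm in border)
             for m in range(W)]
            for k in range(H)]


# A 3x3 square mask in the middle of a 5x5 image.
mask = [[1 <= k <= 3 and 1 <= m <= 3 for m in range(5)] for k in range(5)]
print(len(outermost_pixels(mask)))            # 8: the ring around the centre pixel
print(sum(map(sum, edge_region(mask, 1.0))))  # 8: only the ring itself
```

A larger threshold widens the band on both sides of the boundary; with 1.5, every pixel of this 5x5 image falls inside it.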
In some embodiments, the coordinate distance from a pixel in the edge region to the reference contour is the difference between two coordinates: those of the point on the reference contour closest to that pixel, minus those of the pixel itself.
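Assume the reference contour is stored as a list of vertices: a closed polygon for a house, or an open polyline for a road. The training target (dx, dy) for a pixel can then be computed by projecting the pixel onto every segment and keeping the nearest projection. This is a minimal sketch under that assumption; the function names are hypothetical.

```python
def closest_point_on_segment(p, a, b):
    """Point of segment ab closest to p (orthogonal projection clamped to the segment)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    seg_len2 = vx * vx + vy * vy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return (float(ax), float(ay))
    t = ((px - ax) * vx + (py - ay) * vy) / seg_len2
    t = max(0.0, min(1.0, t))
    return (ax + t * vx, ay + t * vy)


def offset_to_reference(pixel, contour, closed=True):
    """(dx, dy) from pixel to the nearest point of a contour with at least two vertices."""
    segments = list(zip(contour, contour[1:]))
    if closed:
        segments.append((contour[-1], contour[0]))  # polygon: close the ring
    best_d2, best_q = None, None
    for a, b in segments:
        q = closest_point_on_segment(pixel, a, b)
        d2 = (q[0] - pixel[0]) ** 2 + (q[1] - pixel[1]) ** 2
        if best_d2 is None or d2 < best_d2:
            best_d2, best_q = d2, q
    # Nearest contour point minus the pixel's own coordinates.
    return (best_q[0] - pixel[0], best_q[1] - pixel[1])


square = [(0, 0), (0, 4), (4, 4), (4, 0)]
print(offset_to_reference((1, 2), square))  # (-1.0, 0.0)
```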
When the trained shape initialization network 1 extracts the mask edge of a map element's contour mask, no reference contour is available. DT can then instead be the coordinate distance from a point in the edge region to the contour formed by the outermost pixels of the contour mask. Referring to Figure 6B, suppose the coordinate distance from a point E1(x, y) in the edge region to that outermost-pixel contour is (dx, dy). E1 then corresponds to the point E1' with coordinates (x+dx, y+dy) on the mask edge.
In other embodiments, the mask edge of a contour mask can be obtained in other ways, for example by taking the outermost points of the contour mask directly as its mask edge, which is not limited here.
S404: Use the shape generation network 14 to simplify the mask edges and obtain the initial shapes.
The electronic device simplifies the mask edge of each map element's contour mask to obtain the initial shape of that map element. This reduces the number of geometric primitives in the initial shape and speeds up inference on remote sensing images with the neural network model 0. For example, a polygon simplification algorithm such as DP can produce an initial polygon with fewer line segments. A line simplification algorithm such as NMS can thin out the points of a polyline to produce an initial polyline with fewer points.
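The textbook recursive Douglas-Peucker algorithm is sketched below as one example of the polygon simplification mentioned above. The tolerance `epsilon` (in pixels) is a free parameter the text does not specify, and the NMS-based polyline thinning is not shown. For a closed polygon, one common approach is to pass the ring with its first vertex repeated at the end. The sketch handles the resulting zero-length chord by falling back to point distance.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b (point distance if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    length = math.hypot(vx, vy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(vx * (py - ay) - vy * (px - ax)) / length


def douglas_peucker(points, epsilon):
    """Keep the endpoints; recursively keep the farthest vertex while it deviates by > epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # every intermediate vertex is within tolerance
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split vertex


print(douglas_peucker([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))  # [(0, 0), (2, 0), (2, 2)]
```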
Simplifying the mask edges with the DP or NMS algorithm is only an example. Other simplification algorithms can be used in other embodiments and are not described here.
In some embodiments, the initial shape obtained by the electronic device may also include the image features of each of its geometric primitives.
S405: Compute the loss function and determine from it whether the termination condition is met.
The electronic device computes the loss function from the coordinates of each predicted point in the initial shape and the coordinates of the corresponding reference point on the reference contour. It then determines from the loss function whether the termination condition is met. If so, the initial shapes produced by the shape initialization network 1 meet the requirements, and the process goes to step S406. Otherwise, the network cannot produce satisfactory initial shapes with its current network parameters, and the process goes to step S407.
In some embodiments, different loss functions can be used for the different networks in the shape initialization network 1.
In some embodiments, the loss function for the semantic segmentation network 11 and the mask generation network 12 can be the cross-entropy loss, the focal loss, the 0-1 loss, entropy and cross-entropy losses, the softmax loss, and so on.
For example, suppose an image contains N1 pixels, and the semantic segmentation network 11 and the mask generation network 12 classify these pixels into M1 classes (that is, M1 classes of map elements). The cross-entropy loss $L_{11\text{-}12\text{-}CEL}$ of the two networks can then be expressed as formula (1):
$$L_{11\text{-}12\text{-}CEL} = -\frac{1}{N_1}\sum_{i=1}^{N_1}\sum_{j=1}^{M_1} y_{ij}\log p_{ij} \qquad (1)$$
In formula (1), $y_{ij}$ is a 0-1 variable. It equals 1 when the i-th pixel lies within the contour mask region of the class-j map elements, and 0 otherwise. $p_{ij}$ is the probability, as determined by the mask generation network 12, that the i-th pixel lies within that region.
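Below is a minimal plain-Python sketch of formula (1). It assumes the usual form averaged over the N1 pixels, $-\frac{1}{N_1}\sum_i\sum_j y_{ij}\log p_{ij}$, with one-hot labels per pixel. Probabilities are clamped at a small `eps` to avoid log(0), which is an implementation choice, not something stated in the text. The name `pixel_cross_entropy` is hypothetical.

```python
import math


def pixel_cross_entropy(labels, probs, eps=1e-12):
    """labels[i][j] is y_ij (0 or 1); probs[i][j] is p_ij. Returns the mean cross-entropy."""
    n_pixels = len(labels)
    total = 0.0
    for y_row, p_row in zip(labels, probs):
        for y, p in zip(y_row, p_row):
            # Only the true class (y = 1) contributes; clamp p to keep log finite.
            total -= y * math.log(max(p, eps))
    return total / n_pixels


labels = [[1, 0], [0, 1]]
probs = [[0.5, 0.5], [0.25, 0.75]]
print(pixel_cross_entropy(labels, probs))  # (ln 2 + ln(4/3)) / 2, roughly 0.49
```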
The cross-entropy loss $L_{11\text{-}12\text{-}CEL}$ thus reflects the accuracy of the contour masks produced by the semantic segmentation network 11 and the mask generation network 12. The smaller $L_{11\text{-}12\text{-}CEL}$, the higher the accuracy.
In some embodiments, the loss function for the edge extraction network 13 can include the mean squared error (MSE, also called the L2 loss).
Suppose an initial shape includes N2 predicted points. The i-th predicted point has coordinates $(x_i, y_i)$, and its corresponding reference point on the reference contour has coordinates $(x_{si}, y_{si})$. The L2 loss $L_{13\text{-}L2}$ of the edge extraction network 13 can then be expressed as formula (2):
$$L_{13\text{-}L2} = \frac{1}{N_2}\sum_{i=1}^{N_2}\left[(x_i - x_{si})^2 + (y_i - y_{si})^2\right] \qquad (2)$$
The L2 loss $L_{13\text{-}L2}$ reflects how similar the mask edges produced by the edge extraction network 13 are to the corresponding reference contours. The smaller $L_{13\text{-}L2}$, the higher the similarity and the more accurate the edge extraction network 13.
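A plain-Python sketch of formula (2) follows. It assumes the squared distances are averaged over the N2 predicted points and that predicted and reference points are already paired in the same order. `l2_point_loss` is a hypothetical name.

```python
def l2_point_loss(pred, ref):
    """Mean over points of (x_i - x_si)^2 + (y_i - y_si)^2 for paired point lists."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return sum((x - xs) ** 2 + (y - ys) ** 2
               for (x, y), (xs, ys) in zip(pred, ref)) / len(pred)


print(l2_point_loss([(0, 0), (1, 1)], [(0, 1), (3, 1)]))  # (1 + 4) / 2 = 2.5
```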
In other embodiments, other types of loss functions can also be used to determine whether the termination condition is met.
The termination condition may include at least one of the following:

- The loss function of each network converges.
- The loss function value of each network is below a corresponding preset value.

For example, the termination condition is met when both the cross-entropy loss and the L2 loss have converged. Alternatively, it is met when the cross-entropy loss is below a first preset value and the L2 loss is below a second preset value.
In other embodiments, the termination condition may also include other conditions, which are not limited here. For example, in some embodiments, the networks' loss functions are combined into a total loss function by weighted summation, that is, by multiplying each network's loss by its weight and adding the results. The termination condition is then met when the total loss converges or falls below a preset total loss value. When the loss functions are $L_{11\text{-}12\text{-}CEL}$ and $L_{13\text{-}L2}$ above, the total loss can be expressed as $\lambda_1 L_{11\text{-}12\text{-}CEL} + \lambda_2 L_{13\text{-}L2}$. Here $\lambda_1$ is the weight of the cross-entropy loss and $\lambda_2$ the weight of the L2 loss, and both can be preset by the developer.
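The sketch below shows the weighted total loss and the two termination checks described above. The text does not define "convergence". Here it is assumed, as a simple heuristic, to mean that each of the last `window` changes in a loss is below `tol`. The loss names ("ce", "l2"), the weights, and the helper names are hypothetical.

```python
def total_loss(losses, weights):
    """Weighted sum of per-network losses, e.g. lambda_1 * L_CE + lambda_2 * L_L2."""
    return sum(weights[name] * value for name, value in losses.items())


def has_converged(history, tol=1e-4, window=3):
    """Heuristic: converged if each of the last `window` changes is smaller than `tol`."""
    if len(history) <= window:
        return False
    recent = history[-(window + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))


def termination_met(histories, thresholds=None, tol=1e-4, window=3):
    """histories maps a loss name to its per-round values.

    With thresholds, stop when every latest loss is below its preset value;
    otherwise stop when every loss has converged.
    """
    if thresholds is not None:
        return all(h[-1] < thresholds[name] for name, h in histories.items())
    return all(has_converged(h, tol, window) for h in histories.values())


loss = total_loss({"ce": 0.5, "l2": 2.0}, {"ce": 1.0, "l2": 0.1})  # 0.5 * 1.0 + 2.0 * 0.1
stop = termination_met({"ce": [0.9, 0.2], "l2": [3.0, 0.05]}, {"ce": 0.3, "l2": 0.1})
```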
S406: Store the network parameters to obtain the shape initialization network 1.
The electronic device stores the network parameters currently used by the shape initialization network 1, which yields the trained shape initialization network 1.
S407: Adjust the network parameters and perform the next round of training.
When the electronic device determines that the termination condition is not met, it adjusts the network parameters of the shape initialization network 1 and performs the next round of training. Which parameters are adjusted depends on which conditions failed:

- If no network's loss function meets its termination condition, the parameters of all networks are adjusted.
- If some networks' loss functions meet their termination conditions and others do not, only the parameters of the networks that fail are adjusted.
- If the total loss function does not meet its termination condition, the parameters of at least some of the networks are adjusted.
As this training process shows, the shape initialization network 1 learns from the reference contours of the map elements in sample images of the target area. In other words, it learns the geometric characteristics of that area's map elements and uses them to semantically segment images and obtain the initial shape of each map element. It therefore adapts better to the geometric characteristics of the target area's map elements and improves the accuracy of their initial shapes.
The electronic device can feed a remote sensing image into the trained shape initialization network 1 to obtain the initial shapes and image features of the map elements in that image.
The training process of the shape regression network 2 is described below.
Figure 7 is a schematic structural diagram of the shape regression network 2 according to some embodiments of this application.
As shown in Figure 7, the shape regression network 2 includes a pooling network 21, a feature encoding network 22, a direction generation network 23, and a shape adjustment network 24.
The pooling network 21 pools and interpolates the feature parameters of each geometric primitive in the initial shape to obtain that primitive's pooled features. This step is needed because the image features extracted by the semantic segmentation network 11 are per pixel. After the shape generation network 14, the coordinates of the geometric primitives in the initial shape no longer correspond one-to-one with those per-pixel features. The pooling network 21 therefore obtains each geometric primitive's features by interpolating the image features of the pixels adjacent to it. For an initial shape represented by a polygon, the image features of the geometric primitives can be interpolated with the line-of-interest (LOI) method. For an initial shape represented by a polyline, they can be interpolated with the point-of-interest (POI) method. Both methods are illustrated below.
The feature encoding network 22 re-encodes the pooled features of each geometric primitive in the initial shape into regression-encoded features. These features can be used to infer the direction data of each geometric primitive, to adjust the initial shape, and so on. In some embodiments, the feature encoding network 22 may include a multi-head attention network.
The direction generation network 23 computes the direction data of each geometric primitive from its regression-encoded features. When a geometric primitive is a point, its direction can be the tangent direction at that point. When it is a line segment, its direction can be the direction of the segment. In some embodiments, the direction generation network 23 may consist of a convolutional network, a BN network, and an activation network connected in series.
In some embodiments, during training of the shape regression network 2, the direction data from the direction generation network 23 are used to compute an angle constraint loss and an L2 loss on the direction data. The network parameters of the shape regression network 2 are then adjusted according to these losses to make the direction data of the geometric primitives more accurate. The specific computation is described below.
The shape adjustment network 24 adjusts the position of each geometric primitive in the initial shape according to its regression-encoded features, which yields a more accurate regression shape. It also computes, for use in the loss function, the residuals between the coordinates of each geometric primitive's predicted points in the regression shape and the coordinates of the corresponding points on the reference contour. In some embodiments, the shape adjustment network 24 may consist of a convolutional network, a BN network, and an activation network connected in series.
In some embodiments, the regression shape generated by the shape adjustment network 24 is used to compute a relative shape loss. The network parameters of the shape regression network 2 are then adjusted according to this loss to make the regression shapes more accurate. The specific computation is described below.
Specifically, Figure 8 is a schematic flowchart of training the shape regression network 2 according to some embodiments of this application. The process is executed by an electronic device. As shown in Figure 8, it includes the following steps.
S801: Pool the image features and classification features of the geometric primitives in the initial shape to obtain the pooled features of the initial shape.
As mentioned above, the image features and classification features of the sample images from the shape initialization network 1 are per pixel, while the points of an initial shape do not correspond one-to-one with pixels. The image and classification features of the geometric primitives in the initial shape are therefore pooled, interpolated, and so on, to obtain each geometric primitive's pooled features.
For example, for an initial shape represented by a polyline, the geometric primitives are points, and their pooled features can be obtained with the POI algorithm. Suppose the semantic segmentation network 11 gives the image feature vector c0 at point A and c1 at point B. The image feature vector of a point C of the initial shape lying on segment AB is then $c_0 + (c_1 - c_0)\, l_{AC} / l_{AB}$, where $l_{AC}$ is the length of segment AC and $l_{AB}$ the length of segment AB.
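The following plain-Python sketch implements the interpolation $c_0 + (c_1 - c_0)\, l_{AC} / l_{AB}$ directly. It assumes C actually lies on segment AB, which the function does not check. Feature vectors are plain lists, and `poi_feature` is a hypothetical name.

```python
import math


def poi_feature(a, b, c, feat_a, feat_b):
    """Feature of point c on segment ab: c0 + (c1 - c0) * l_AC / l_AB, element-wise."""
    l_ab = math.dist(a, b)
    if l_ab == 0:  # degenerate segment: A and B coincide
        return list(feat_a)
    ratio = math.dist(a, c) / l_ab
    return [f0 + (f1 - f0) * ratio for f0, f1 in zip(feat_a, feat_b)]


print(poi_feature((0, 0), (4, 0), (1, 0), [0.0, 10.0], [4.0, 20.0]))  # [1.0, 12.5]
```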
As another example, for an initial shape represented by a polygon, the geometric primitives are line segments, and their pooled features can be obtained with the LOI algorithm:

1. Take multiple points on the line segment (for example, 32) and divide them into several groups (for example, 4 groups of 8).
2. Obtain the pooled feature of each point with the POI algorithm.
3. Average the pooled features of the points in each group to get that group's pooled feature (for example, vectors n1, n2, n3, and n4 for the 4 groups).
4. Concatenate the group features to form the pooled feature of the segment (for example, n5 = [n1, n2, n3, n4]).
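A minimal sketch of the LOI steps above follows. It assumes the sample points are evenly spaced along the segment, endpoints included, since the text does not specify how they are placed. `feature_at` is a hypothetical callable that returns a point's feature vector, for instance via POI interpolation over the per-pixel features.

```python
def loi_feature(a, b, feature_at, num_points=32, num_groups=4):
    """Sample points on ab, average their features per group, and concatenate the group means."""
    if num_points < 2 or num_points % num_groups:
        raise ValueError("need at least 2 points that split into equal groups")
    step = 1.0 / (num_points - 1)
    samples = []
    for s in range(num_points):
        t = s * step  # evenly spaced, endpoints included (assumption)
        point = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        samples.append(feature_at(point))
    size = num_points // num_groups
    pooled = []
    for g in range(num_groups):
        group = samples[g * size:(g + 1) * size]
        # Average the group's features dimension by dimension, then concatenate.
        pooled.extend(sum(f[d] for f in group) / size for d in range(len(group[0])))
    return pooled


# Toy feature: the point's first coordinate plus a constant channel.
out = loi_feature((0.0, 0.0), (31.0, 0.0), lambda p: [p[0], 1.0])
# out is approximately [3.5, 1.0, 11.5, 1.0, 19.5, 1.0, 27.5, 1.0]
```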
It can be understood that in other embodiments the pooled feature of each geometric primitive in the initial shape may also be determined in other ways, which is not limited here.
S802: Use the feature encoding network 22 to encode the pooled features of the initial shape to obtain regression encoding features.
The electronic device uses the feature encoding network 22 to re-encode the pooled features of the initial shape, for example by discarding pooled features that have little influence on the shape adjustment and the direction data and re-extracting features that have a large influence on them, to obtain the regression encoding features.
In some embodiments, the feature encoding network 22 may be an encoding network based on a global attention mechanism, such as the aforementioned multi-head attention network.
S803: Based on the regression encoding features of the initial shape, use the direction generation network 23 to obtain the predicted direction data of the initial shape, and use the shape adjustment network 24 to adjust the initial shape to obtain the regression shape.
The electronic device inputs the encoding features of each geometric primitive of the initial shape into the direction generation network 23 and the shape adjustment network 24, obtaining, respectively, the predicted direction data of each geometric primitive and the regression shape.
It can be understood that in some embodiments, when the initial shape is a polygon, the geometric primitives are line segments and the direction of a geometric primitive is the direction of its line segment; when the initial shape is a polyline, the geometric primitives are points and the direction of a geometric primitive is the tangent direction at that point.
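As a rough illustration of these two kinds of direction data, the Python sketch below computes a unit direction for a segment and estimates a tangent at each polyline vertex. The text does not say how the tangent at a point is defined numerically; the central-difference estimate here (one-sided at the two ends) is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def _unit(dx: float, dy: float) -> Vec:
    """Normalise (dx, dy); a zero vector is returned unchanged."""
    n = math.hypot(dx, dy)
    return (0.0, 0.0) if n == 0.0 else (dx / n, dy / n)


def segment_direction(p: Vec, q: Vec) -> Vec:
    """Polygon case: the primitive is the segment p->q; its direction is the unit vector along it."""
    return _unit(q[0] - p[0], q[1] - p[1])


def vertex_tangents(polyline: List[Vec]) -> List[Vec]:
    """Polyline case (assumed estimator): tangent at each vertex from its neighbours,
    central difference inside the polyline, one-sided difference at the two ends."""
    n = len(polyline)
    out = []
    for i in range(n):
        a = polyline[max(i - 1, 0)]
        b = polyline[min(i + 1, n - 1)]
        out.append(_unit(b[0] - a[0], b[1] - a[1]))
    return out


print(segment_direction((0, 0), (3, 4)))
print(vertex_tangents([(0, 0), (1, 0), (2, 0)]))
```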
It can be understood that in some embodiments the obtained regression shape includes the direction data of each geometric primitive, the coordinates of the points in each geometric primitive, the order of the geometric primitives, and so on.
S804: Calculate the loss function based on the regression shape and the direction data of the geometric primitives.
The electronic device calculates the loss function based on the direction data of each geometric primitive in the initial shape and on the regression shape.
For example, in some embodiments the loss function may include an L2 loss between the predicted direction data of each geometric primitive and the direction data of the corresponding point or line in the reference contour. This loss indicates how accurate the directions produced by the direction generation network 23 are: the smaller the L2 loss, the more accurate the direction data of the geometric primitives. Specifically, assuming that an initial shape contains N3 primitives, that the direction data of the i-th primitive is dr_i, and that the direction data of the corresponding point or line in the reference contour is dr_si, the L2 loss L_23-L2 of the direction generation network 23 can be expressed as formula (3):
[Formula (3): equation image PCTCN2022092810-appb-000003, not reproduced in this text]
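Formula (3) is only available as an image in the source, so the Python sketch below assumes one common form of such a loss: the mean, over the N3 primitives, of the squared Euclidean distance ‖dr_i − dr_si‖². The original may normalise or sum differently; the function name is hypothetical.

```python
from typing import Sequence


def direction_l2_loss(pred: Sequence[Sequence[float]],
                      ref: Sequence[Sequence[float]]) -> float:
    """Assumed form of L_23-L2: (1/N3) * sum_i ||dr_i - dr_si||^2."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need one reference direction per primitive")
    total = 0.0
    for dr_i, dr_si in zip(pred, ref):
        total += sum((a - b) ** 2 for a, b in zip(dr_i, dr_si))
    return total / len(pred)


# One primitive matches its reference exactly, the other is rotated by 90 degrees.
print(direction_l2_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]))
```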
As another example, in some embodiments the loss function may include a relative shape loss computed from the coordinate residuals between each geometric primitive of the regression shape and the corresponding point or line in the reference contour; it is used to evaluate the accuracy of the regression shape. The relative shape loss can be expressed as the average, the sum, or a similar aggregate of the shape losses of all points in the initial shape, where the shape loss of a non-intersection point of a polyline can be the projection distance from the point to the reference contour, the shape loss of an intersection point of a polyline can be the distance from the point to the corresponding reference intersection point in the reference contour, and the shape loss of a point of a polygon can be the projection distance from the point to the reference contour. The relative shape loss indicates how similar the regression shape of a map element is to the reference contour: the lower the relative shape loss, the more similar the regression shape obtained with the shape regression network 2 is to the corresponding reference contour, and the more accurate the regression shape.
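The per-point shape losses described above can be sketched in Python as follows, using the average as the aggregate. Assumptions: the reference contour is given as a vertex list (a polygon contour should repeat its first vertex at the end so the closing edge is included), "projection distance" is taken as the distance to the nearest point on the contour, and `intersections` is a hypothetical mapping from a point index to its matched reference intersection point.

```python
import math
from typing import Dict, List, Optional, Tuple

Pt = Tuple[float, float]


def point_to_segment(p: Pt, a: Pt, b: Pt) -> float:
    """Distance from p to its orthogonal projection onto segment ab, clamped to the segment."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0.0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def projection_distance(p: Pt, contour: List[Pt]) -> float:
    """Distance from p to the reference contour (a polyline given by its vertices)."""
    return min(point_to_segment(p, contour[k], contour[k + 1]) for k in range(len(contour) - 1))


def relative_shape_loss(points: List[Pt], contour: List[Pt],
                        intersections: Optional[Dict[int, Pt]] = None) -> float:
    """Average per-point shape loss: distance to the matched reference intersection point
    for intersection points, projection distance to the contour for all other points."""
    intersections = intersections or {}
    losses = []
    for idx, p in enumerate(points):
        if idx in intersections:
            q = intersections[idx]
            losses.append(math.hypot(p[0] - q[0], p[1] - q[1]))
        else:
            losses.append(projection_distance(p, contour))
    return sum(losses) / len(losses)


print(relative_shape_loss([(0, 1), (2, 3)], [(0, 0), (4, 0)]))
```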
As a further example, in some embodiments the loss function may include an angle constraint loss to improve the regularity of the regression shape. In some embodiments, the angle constraint loss L_TV can be expressed as formula (4).
[Formula (4): equation image PCTCN2022092810-appb-000004, not reproduced in this text]
In formula (4), N4 is the number of corners in the initial shape, ᾱ (rendered as image PCTCN2022092810-appb-000005 in the original) is the average angle of the corners in the initial shape, and α_k is the angle of the k-th corner. It can be understood that the smaller L_TV is, the more regular the regression shape.
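Because formula (4) is only available as an image, the Python sketch below assumes the variance-style form L_TV = (1/N4)·Σ_k (α_k − ᾱ)², which uses exactly the symbols defined above; the original could, for example, use absolute deviations instead. The corner-angle helper returns the unsigned angle in [0, π] between the two edges at each vertex, so it does not distinguish reflex corners of concave polygons. All names are hypothetical.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]


def corner_angles(polygon: List[Pt]) -> List[float]:
    """Unsigned angle (radians, in [0, pi]) at each vertex of a closed polygon."""
    n = len(polygon)
    angles = []
    for k in range(n):
        px, py = polygon[k - 1]          # previous vertex (wraps around for k = 0)
        cx, cy = polygon[k]
        nx, ny = polygon[(k + 1) % n]    # next vertex
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        cross = abs(v1[0] * v2[1] - v1[1] * v2[0])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        angles.append(math.atan2(cross, dot))
    return angles


def angle_constraint_loss(angles: List[float]) -> float:
    """Assumed form of L_TV: (1/N4) * sum_k (alpha_k - mean_alpha)^2."""
    mean = sum(angles) / len(angles)
    return sum((a - mean) ** 2 for a in angles) / len(angles)


# A square has four equal corners, so its loss is (numerically) zero.
print(angle_constraint_loss(corner_angles([(0, 0), (1, 0), (1, 1), (0, 1)])))
```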
It can be understood that in other embodiments the loss function may also include other losses, such as a smoothness constraint loss that improves the smoothness of the regression shape, which is not limited here.
S805: Determine, based on the loss function, whether the termination condition is met.
The electronic device determines, based on the loss function, whether the termination condition is met. If it is, the regression shape meets the requirements and the process proceeds to step S806; otherwise, the regression shape does not meet the requirements and the process proceeds to step S807.
It can be understood that the termination condition may include at least one of the following: the loss function of every network has converged; the loss function value of every network is less than the corresponding preset loss function value.
It can be understood that the termination condition may also be that the total loss function is less than a total loss threshold or that the total loss function has converged, where the total loss function can be obtained as a weighted sum of the loss functions evaluated in step S805.
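A minimal Python sketch of the total-loss variant of the termination check follows. The loss names, the weights and the definition of "converged" are assumptions for illustration only: here a loss history counts as converged when each of its last `patience` changes is smaller than `tol`.

```python
from typing import Dict, List


def total_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of individual losses (e.g. direction L2, relative shape, angle constraint)."""
    return sum(weights[name] * value for name, value in losses.items())


def should_stop(history: List[float], threshold: float,
                tol: float = 1e-4, patience: int = 3) -> bool:
    """True if the latest total loss is below `threshold`, or if the loss has 'converged'
    (assumed here: each of the last `patience` changes is smaller than `tol`)."""
    if not history:
        return False
    if history[-1] < threshold:
        return True
    if len(history) <= patience:
        return False
    recent = history[-(patience + 1):]
    return all(abs(recent[k + 1] - recent[k]) < tol for k in range(patience))


# Hypothetical loss names and weights.
print(total_loss({"dir_l2": 1.0, "shape": 2.0}, {"dir_l2": 0.5, "shape": 0.25}))
print(should_stop([1.0, 0.5, 0.5, 0.5, 0.5], threshold=0.1))
```

Stopping on either criterion mirrors the "less than a threshold or converged" wording; a real training loop would call `should_stop` once per round, then either store the parameters (S806) or update them (S807).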
It can be understood that in other embodiments the termination condition may also include other conditions, which are not limited here.
S806: Store the network parameters to obtain the shape regression network 2.
The electronic device stores the network parameters of the shape regression network 2, thereby obtaining the trained shape regression network 2.
S807: Adjust the network parameters and perform the next round of training.
When the electronic device determines that the termination condition is not met, it adjusts the network parameters of the shape regression network 2 and performs the next round of training. For example, when none of the networks in the shape regression network 2 meets its termination condition, the parameters of all the networks can be adjusted; when only some of the networks fail to meet their termination conditions, only the parameters of those networks may be adjusted before the next round of training; and when the total loss function does not meet its termination condition, at least some of the parameters of each network can be adjusted before the next round of training.
The training process of the topology reconstruction network 3 is described below.
Figure 9 is a schematic structural diagram of a topology reconstruction network 3 according to some embodiments of this application. As shown in Figure 9, the topology reconstruction network 3 includes a pooling network 31, a feature encoding network 32 and a relational reasoning network 33.
The pooling network 31 interpolates the image features and direction data of each geometric primitive in the initial shape to obtain the pooled feature of each geometric primitive. For details, refer to the description of the pooling network 21 above; they are not repeated here.
The feature encoding network 32 re-encodes the pooled features of the geometric primitives in the initial shape, for example by discarding features that have little influence on topological-relationship reasoning and extracting features that have a large influence on it, to obtain the inference encoding feature of each geometric primitive. In some embodiments, the feature encoding network 32 may include a multi-head attention network.
The relational reasoning network 33 derives, from the inference encoding features of the geometric primitives, the pairwise topological relationships between the geometric primitives in the initial shape. When the initial shape is a polygon, the geometric primitives are line segments, and the pairwise topological relationships include collinear, parallel, and so on; when the initial shape is a polyline, the geometric primitives are points, and the relationships between them include connected and not connected. In some embodiments, the relational reasoning network 33 may include a convolutional network, a batch normalization (BN) network, an activation network, and the like.
It can be understood that in some embodiments, during training of the topology reconstruction network 3, the predicted topological relationships between geometric primitives output by the relational reasoning network 33 can be used to calculate a cross-entropy loss and a supervised contrastive loss, and can be combined with the direction data of the geometric primitives to calculate loss functions such as the geometric attribute and relationship consistency loss. The network parameters of the topology reconstruction network 3 are adjusted based on these loss functions to improve the accuracy of the predicted topological relationships it produces. The specific calculation methods are described below.
Specifically, Figure 10 is a schematic diagram of a training process of the topology reconstruction network 3 according to some embodiments of this application. The process is executed by an electronic device and, as shown in Figure 10, includes the following steps.
S1001: Perform feature pooling on the image features and direction data of the geometric primitives to obtain the pooled features of the geometric primitives.
The electronic device performs feature pooling and related operations on the image features and direction data of the geometric primitives in the initial shape to obtain the pooled feature of each geometric primitive. For details, refer to the description of step S801; they are not repeated here.
It can be understood that, for map elements represented as polygons, the electronic device can instead perform feature pooling on the image features and direction data of the regression shape to obtain the pooled features of the geometric primitives.
S1002: Use the feature encoding network 32 to encode the pooled features of the geometric primitives to obtain inference encoding features.
The electronic device uses the feature encoding network 32 to encode the pooled features of the initial shape to obtain the inference encoding feature of each geometric primitive. For details, refer to step S802 above; they are not repeated here.
S1003: Based on the inference encoding features, use the relational reasoning network 33 to obtain the predicted topological relationships between the geometric primitives.
The electronic device inputs the encoding features of the geometric primitives into the relational reasoning network 33 to obtain the predicted topological relationships between the geometric primitives in the initial shape.
In some embodiments, the predicted topological relationships between the geometric primitives in the initial shape can be represented by a K×K matrix R, where K is the number of geometric primitives in the initial shape and the element R(i,j) indicates the topological relationship between the i-th and the j-th geometric primitive, such as connected, collinear or parallel.
For example, referring to Figure 11, the predicted topological relationships between the geometric primitives of an initial shape consisting of the 8 points P1 to P8 can be an 8×8 matrix in which the element R(i,j) in row i and column j represents the topological relationship between points Pi and Pj. For example, R(3,5)=0 indicates that points P3 and P5 are not connected, R(3,4)=1 indicates that points P3 and P4 are connected, and R(1,1)=−1 indicates that point P1 has no topological relationship with itself.
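The matrix encoding in this example can be reproduced with a short Python sketch. The connectivity used below, a simple chain P1-P2-…-P8, is a hypothetical stand-in because Figure 11 is not reproduced here; treating "connected" as symmetric is also an assumption. Only the value convention (1 connected, 0 not connected, −1 on the diagonal) is taken from the text.

```python
from typing import Iterable, List, Tuple


def relation_matrix(k: int, connected_pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """K x K matrix R: 1 = connected, 0 = not connected, -1 on the diagonal.
    Pairs use 1-based indices to match the names P1..Pk."""
    R = [[0] * k for _ in range(k)]
    for i in range(k):
        R[i][i] = -1                   # a primitive has no relation with itself
    for i, j in connected_pairs:
        R[i - 1][j - 1] = 1
        R[j - 1][i - 1] = 1            # assumption: connection is symmetric
    return R


# Hypothetical chain P1-P2-...-P8.
R = relation_matrix(8, [(i, i + 1) for i in range(1, 8)])
print(R[2][3], R[2][4], R[0][0])       # R(3,4), R(3,5), R(1,1) in 1-based notation
```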
It can be understood that in some embodiments, when the initial shape is represented as a polyline, the relational reasoning network 33 can extract features again from the inference encoding features of the geometric primitives to obtain each primitive's feature in a hidden space (also called the feature space; these are referred to below as hidden-space features). It then computes the distances between the hidden-space feature of a given geometric primitive and those of the other geometric primitives, and sets the topological relationship between that primitive and the preset number of primitives whose hidden-space features are closest to it as connected.
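The nearest-neighbour rule just described can be sketched in Python as below. Assumptions: hidden-space distance is Euclidean (`math.dist`, Python 3.8+), `k` plays the role of the "preset number", and the result is not symmetrised, so i may list j as connected without the reverse holding; the text does not say how such cases are resolved.

```python
import math
from typing import List, Sequence


def knn_connections(features: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Mark, for each primitive, its k nearest primitives in hidden space as connected (1).
    Other entries are 0; the diagonal is -1."""
    n = len(features)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = -1
        # Sort the other primitives by hidden-space distance to primitive i.
        others = sorted((math.dist(features[i], features[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            R[i][j] = 1
    return R


# Two well-separated pairs of 1-D hidden features.
print(knn_connections([[0.0], [1.0], [5.0], [6.0]], k=1))
```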
S1004: Calculate the loss function based on the predicted topological relationships between the geometric primitives in the initial shape.
The electronic device calculates the loss function based on the topological relationships between the geometric primitives obtained by the relational reasoning network 33.
For example, referring to Figure 11, in some embodiments the loss function may include a cross-entropy loss L_CEL between the predicted and the reference topological relationships of the geometric primitives, determined from the predicted topological relationships between the geometric primitives in the initial shape and their reference topological relationships. For example, in some embodiments L_CEL can be calculated with formula (5).
[Formula (5): equation image PCTCN2022092810-appb-000006, not reproduced in this text]
In formula (5), N5 is the number of geometric primitives in the initial shape, R_ij is the predicted topological relationship value between the i-th and the j-th geometric primitive (for example, the element in row i, column j of the matrix R above), and R0_ij is the reference topological relationship between the i-th and the j-th geometric primitive. For example, based on formula (5), the cross-entropy loss of the polyline in Figure 11 is L_CEL = −18.
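Formula (5) is only available as an image, and the value L_CEL = −18 quoted above indicates that it is not the usual non-negative form, so the Python sketch below does not reproduce it. It shows, as an assumption, a conventional binary cross-entropy between hypothetical predicted connection probabilities and the reference relations R0, averaged over all off-diagonal pairs (the −1 diagonal entries are skipped).

```python
import math
from typing import Sequence


def pairwise_bce(prob: Sequence[Sequence[float]], ref: Sequence[Sequence[int]],
                 eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over off-diagonal pairs (i, j).
    prob[i][j]: predicted probability that primitives i and j are connected.
    ref[i][j]:  reference relation R0, 1 = connected, 0 = not connected."""
    n = len(prob)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                                  # no self-relation
            p = min(max(prob[i][j], eps), 1.0 - eps)      # clamp to avoid log(0)
            y = ref[i][j]
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


# An uninformative prediction (0.5 everywhere) costs ln 2 per pair.
print(pairwise_bce([[0.0, 0.5], [0.5, 0.0]], [[-1, 1], [1, -1]]))
```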
As another example, in some embodiments, when the initial shape is a polygon, the loss function may also include a geometric attribute and relationship consistency loss L_C, which characterizes how consistent the attributes of the geometric primitives are with the topological relationships between them. Reducing L_C during training improves the accuracy of the predicted topological relationships determined by the relational reasoning network 33. In some embodiments, L_C can be calculated with formula (6).
[Formula (6): equation image PCTCN2022092810-appb-000007, not reproduced in this text]
In formula (6), N6 is the number of geometric primitives in the initial shape; c_i and c_j are the attributes of the i-th and the j-th geometric primitive; and t_r is the ideal distance between the attributes of two geometric primitives whose topological relationship is r. For example, when r indicates that two geometric primitives are parallel or collinear, the attributes may include their direction data, such as the tangent direction: two parallel or collinear line segments should have the same tangent direction, so t_r should be 0.
As a further example, in some embodiments the loss function may also include a supervised contrastive loss. It improves the consistency between the hidden-space features that the relational reasoning network 33 extracts for the geometric primitives, while determining their topological relationships from the inference encoding features, and the inferred topological relationships between those primitives. In other words, by making the supervised contrastive loss satisfy the termination condition, for example by making it smaller than a preset value or making it converge, geometric primitives that are connected, collinear or otherwise related acquire similar hidden-space features. As a result, when the inference encoding features of the geometric primitives of the map elements in the prediction image set are input into the relational reasoning network 33, the hidden-space features it extracts agree better with its inference results, which improves the accuracy of the predicted topological relationships. Specifically, in some embodiments, the supervised contrastive loss L_SCL can be calculated with formula (7).
[Formula (7): equation image PCTCN2022092810-appb-000008, not reproduced in this text]
In formula (7), I is the set of geometric primitives; P(i) is the set of geometric primitives that are connected to or collinear with the i-th geometric primitive, and |P(i)| is its cardinality (the number of elements in P(i)); A(i) is the set of geometric primitives that are neither connected to nor collinear with the i-th geometric primitive; z_i, z_p and z_α are the vectors of the hidden-space features of geometric primitives i, p and α; τ is a scalar temperature parameter, a positive real hyperparameter (τ ∈ R+) that can be preset by developers; and · denotes the vector dot product. Formula (7) implies that the smaller the supervised contrastive loss, the more similar the hidden-space feature vectors of connected or collinear geometric primitives, and the less similar those of geometric primitives that are neither connected nor collinear.
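For illustration, the Python sketch below implements the standard supervised contrastive loss from "Supervised Contrastive Learning" (Khosla et al., 2020), written with the P(i) notation above. Because formula (7) is only available as an image, it is an assumption that it matches this form; in particular, the standard version sums the denominator over all primitives other than i, which may differ from the exact set A(i) used in the original. The function name and the choice to skip anchors with an empty P(i) are also assumptions.

```python
import math
from typing import Sequence, Set


def supervised_contrastive_loss(z: Sequence[Sequence[float]],
                                positives: Sequence[Set[int]],
                                tau: float = 0.1) -> float:
    """Standard SupCon loss, summed over anchors i.
    z[i]: hidden-space feature vector of primitive i (usually L2-normalised beforehand).
    positives[i]: P(i), indices of primitives connected to / collinear with i."""
    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(u, v))

    n = len(z)
    total = 0.0
    for i in range(n):
        if not positives[i]:
            continue                                   # anchor without positives: skipped
        logits = {a: dot(z[i], z[a]) / tau for a in range(n) if a != i}
        m = max(logits.values())                       # log-sum-exp with max subtraction
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives[i]) / len(positives[i])
    return total


z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
aligned = supervised_contrastive_loss(z, [{1}, {0}, {3}, {2}], tau=1.0)
misaligned = supervised_contrastive_loss(z, [{2}, {3}, {0}, {1}], tau=1.0)
print(aligned < misaligned)   # features agreeing with the relations give a lower loss
```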
It can be understood that in other embodiments the supervised contrastive loss may also be calculated in other ways, which is not limited here.
It can be understood that in other embodiments the loss function may also include further loss terms, which are not limited here.
It can be understood that in some embodiments the reference topological relationships of the geometric primitives can be computed dynamically: a reference point in the reference contour is first determined for each point of each geometric primitive, and the topological relationships between these reference points are then used as the topological relationships of the geometric primitives.
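One way the dynamic computation could look is sketched below in Python. The text does not specify how a point's reference point is chosen; nearest-vertex matching by Euclidean distance is an assumption, as are the function name and the representation of the reference contour's connections as a set of vertex-index pairs.

```python
import math
from typing import List, Sequence, Set, Tuple

Pt = Tuple[float, float]


def dynamic_reference_relations(points: Sequence[Pt], ref_points: Sequence[Pt],
                                ref_edges: Set[Tuple[int, int]]) -> List[List[int]]:
    """Match each predicted point to its nearest reference point (assumed rule), then copy
    the connection relation of the matched reference points.
    Returns R0 with 1 = connected, 0 = not connected, -1 on the diagonal."""
    match = [min(range(len(ref_points)), key=lambda r: math.dist(p, ref_points[r]))
             for p in points]
    n = len(points)
    R0 = [[0] * n for _ in range(n)]
    for i in range(n):
        R0[i][i] = -1
        for j in range(n):
            if i != j and ((match[i], match[j]) in ref_edges or (match[j], match[i]) in ref_edges):
                R0[i][j] = 1
    return R0


# Noisy predicted points near a three-vertex reference polyline.
print(dynamic_reference_relations([(1, 1), (9, -1), (19, 2)],
                                  [(0, 0), (10, 0), (20, 0)], {(0, 1), (1, 2)}))
```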
S1005: Determine, based on the loss function, whether the termination condition is met.
Based on the loss function, the electronic device determines whether the termination condition is met. If it is, the predicted topological relationships obtained by the topology reconstruction network 3 meet the requirements, and the process proceeds to step S1006; otherwise, they do not meet the requirements, and the process proceeds to step S1007.
It can be understood that in some embodiments the electronic device may determine that the termination condition is met when every loss function has converged or every loss function value is less than the corresponding preset loss function value.
In other embodiments, when multiple loss functions are evaluated in step S1005, the electronic device may determine that the termination condition is met when the total loss function, obtained as a weighted sum of these loss functions, converges or falls below a preset total loss value. For example, when the loss function includes the cross-entropy loss L_CEL, the geometric attribute and relationship consistency loss L_C and the supervised contrastive loss L_SCL, the total loss function can be expressed as λ3·L_CEL + λ4·L_C + λ5·L_SCL, where λ3, λ4 and λ5 are the weights of L_CEL, L_C and L_SCL respectively and can be preset by developers.
S1006: Store the network parameters to obtain the topology reconstruction network 3.
When the electronic device determines that the termination condition is met, it stores the network parameters of the topology reconstruction network 3, thereby obtaining the trained topology reconstruction network 3.
S1007: Adjust the network parameters and perform the next round of training.
When the electronic device determines that the termination condition is not met, it adjusts the network parameters of the topology reconstruction network 3 and performs the next round of training.
Through the training processes of the embodiments shown in Figures 3 to 11, the network parameters of the neural network model 0 can be trained, and, based on these parameters, inference can be performed on the prediction images in the prediction image set of the target area to obtain a vectorized map for each prediction image. Specifically, the shape initialization network 1 first obtains the initial shapes of the map elements in a prediction image; the shape regression network 2 then adjusts the initial shapes to obtain more accurate regression shapes; the topology reconstruction network 3 then obtains the topological relationships between the geometric primitives in each regression shape; and finally the post-processing module 4 connects the geometric primitives of the regression shapes to obtain the vectorized map of the map elements.
Thus, referring to Figure 12, the training process of the neural network model 0 and the process of performing inference on images with the trained neural network model 0 can be asymmetric. For map elements represented as polygons, which are more complex than polylines, the regression shape is used as the input of the topology reconstruction network 3 during training. For map elements represented as polylines, which are relatively simple, the polyline of the initial shape is used as the input of the topology reconstruction network 3 during training. Because the initial shape is less accurate than the regression shape, this lets the topology reconstruction network 3 still produce correct predictions when its input is less accurate, which improves its robustness to noise and its stability.
The following describes the process of generating a vector map with the trained neural network model 0.
Figure 13 is a schematic flowchart of a map generation method according to some embodiments of this application. The process is executed by an electronic device and, as shown in Figure 13, includes the following steps.
S1301: Use the shape initialization network 1 to obtain the initial shapes of the map elements in the prediction image.
The electronic device inputs the prediction image into the shape initialization network 1. The semantic segmentation network 11 extracts the image features of the prediction image, the mask generation network 12 obtains the contour masks of the map elements in the prediction image, the edge extraction network 13 extracts the mask edges of the contour masks, and finally the shape generation network 14 simplifies the mask edges to obtain the initial shapes of the map elements. For example, referring to Figure 2 above, after image IM2 is input into the shape initialization network 1, the initial shapes of the houses and roads in image IM2 are obtained.
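The text does not state how the shape generation network 14 simplifies the mask edges. Purely as an illustration of what edge simplification means, the Python sketch below applies the classic Ramer-Douglas-Peucker algorithm to a traced edge; it is a swapped-in, well-known technique, not the patent's method, and `epsilon` is a hypothetical tolerance in pixels.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]


def _perp_dist(p: Pt, a: Pt, b: Pt) -> float:
    """Distance from p to the infinite line through a and b (or to a if a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / norm


def rdp(points: List[Pt], epsilon: float) -> List[Pt]:
    """Ramer-Douglas-Peucker: keep the point farthest from the chord if it deviates
    by more than epsilon and recurse on both halves; otherwise keep only the endpoints."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for k in range(1, len(points) - 1):
        d = _perp_dist(points[k], points[0], points[-1])
        if d > dmax:
            idx, dmax = k, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = rdp(points[:idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right          # drop the duplicated split point


edge = [(0.0, 0.0), (1.0, 0.01), (2.0, 0.0), (3.0, 5.0), (4.0, 0.0)]
print(rdp(edge, 0.1))                 # the tiny wiggle at (1.0, 0.01) is removed
```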
S1302:利用形状回归网络2对初始形状进行推理,得到初始形状的回归形状、几何基元的方向数据。S1302: Use the shape regression network 2 to infer the initial shape and obtain the regression shape of the initial shape and the direction data of the geometric primitives.
The electronic device inputs the initial shape of each map element into shape regression network 2. Pooling network 21 and feature encoding network 22 produce regression encoding features for the geometric primitives in the initial shape; direction generation network 23 then produces the direction data of the geometric primitives, and shape adjustment network 24 adjusts the initial shape to obtain a regression shape that is more accurate and more regular.
S1303: Use topology reconstruction network 3 to obtain the topological relationships between the geometric primitives in the regression shapes.
The electronic device inputs the regression shape into topology reconstruction network 3. Pooling network 31 and feature encoding network 32 produce inference encoding features for each geometric primitive in the regression shape, and relational reasoning network 33 then derives the topological relationships between those primitives. For example, after the regression shapes of the roads in image IM2 are input into topology reconstruction network 3, the topological relationships shown in Figure 11 are obtained.
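The text does not specify the output format of relational reasoning network 33. Assuming, purely for illustration, that it produces an N×N matrix of connection scores between the N geometric primitives, the topological relationships could be read out as an undirected edge list by symmetrizing and thresholding that matrix. The function name and the 0.5 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def scores_to_edges(
    scores: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Turn an assumed N x N connection-score matrix into undirected edges (i < j)."""
    n = len(scores)
    edges: List[Tuple[int, int]] = []
    for i in range(n):
        for j in range(i + 1, n):
            # Average both directions so the relation is treated as undirected.
            p = 0.5 * (scores[i][j] + scores[j][i])
            if p >= threshold:
                edges.append((i, j))
    return edges
```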
S1304: Based on the regression shapes, the topological relationships between the geometric primitives, and the direction data of the geometric primitives, use post-processing module 4 to obtain the vector map.
The electronic device uses post-processing module 4 to obtain the vector map from the regression shapes, the topological relationships between the geometric primitives, and the direction data of the geometric primitives.
In some embodiments, when the regression shape is a polygon, post-processing module 4 may first rotate each line segment in the regression shape so that its direction matches the segment's direction data. For example, referring to Figure 14, the directions of line segments S1S2 and S2S3 in the regression shape do not match the direction (horizontal) generated by shape regression network 2, so the post-processing module rotates segment S1S2 clockwise to horizontal and segment S2S3 counterclockwise to horizontal.
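A minimal sketch of the rotation step, assuming the direction data is an angle in radians and that each segment is rotated about its start point (consistent with Figure 14, where S1 stays fixed and S2 moves to S2'). Because a line direction is undirected, the sketch picks whichever of the two equivalent target orientations requires the smaller rotation. All names are illustrative.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def _angle_gap(a: float, b: float) -> float:
    """Smallest rotation (radians, in [0, pi]) between two oriented angles."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def align_segment(start: Point, end: Point, target_angle: float) -> Tuple[Point, Point]:
    """Rotate segment start->end about `start` so it lies along `target_angle`."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)  # rotation preserves the segment length
    current = math.atan2(dy, dx)
    # target_angle and target_angle + pi describe the same line; take the closer one.
    best = min((target_angle, target_angle + math.pi), key=lambda a: _angle_gap(a, current))
    new_end = (start[0] + length * math.cos(best), start[1] + length * math.sin(best))
    return start, new_end
```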
Next, post-processing module 4 reconnects the segments that became disconnected when their directions were adjusted (that is, the endpoint of each segment is connected to the point on the adjacent segment that is nearest to that endpoint) to obtain a closed polygon. For example, referring to Figure 14, after segments S1S2 and S2S3 are rotated to horizontal, segments S1S2', S2S3', and S3S4 are no longer connected, so post-processing module 4 connects endpoint S2' of segment S1S2' to endpoint S2 of the adjacent segment S2S3', and connects endpoint S3' of segment S2S3' to endpoint S3 of the adjacent segment S3S4.
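The reconnection step can be sketched as follows, assuming the segments of one polygon are given in ring order. Each segment's start is replaced by the point on it nearest to the previous segment's end, so a connecting edge appears wherever a gap opened up. This is a simplified, hypothetical implementation; in particular, it trims any part of a segment that lies before that nearest point.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Point on segment ab nearest to p."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return (a[0] + t * dx, a[1] + t * dy)


def reconnect_ring(segments: Sequence[Segment], tol: float = 1e-9) -> List[Point]:
    """Join an ordered ring of possibly disconnected segments into a closed polygon.

    Returns the vertex list; the last vertex implicitly connects back to the first.
    """
    ring: List[Point] = []
    for k, (a, b) in enumerate(segments):
        prev_end = segments[k - 1][1]  # for k == 0 this wraps to the last segment
        start = closest_point_on_segment(prev_end, a, b)
        for v in (start, b):
            if not ring or math.dist(ring[-1], v) > tol:  # skip duplicate vertices
                ring.append(v)
    if len(ring) > 1 and math.dist(ring[0], ring[-1]) <= tol:
        ring.pop()  # closing vertex duplicates the first one
    return ring
```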
Finally, post-processing module 4 may delete the segments of the closed polygon whose length is below a preset edge-length threshold. When a segment is deleted, the module checks whether the two segments attached to its ends are parallel or collinear: if so, they are merged into one segment; otherwise, the two segments are extended until they intersect. The simplicity of the output polygon can therefore be tuned by choosing different edge-length thresholds. For example, referring to Figure 14, segments S2'S2, S3S3', S6S7, and S9S10 are shorter than the threshold and are deleted. Because segments S1S2', S2S3', and S3S4 have the same direction, they are merged into a single segment S1S4. Because segments S5S6 and S7S8 are neither parallel nor collinear, segment S7S8 is extended to intersect segment S5S6, yielding segment S6S8; likewise, because segments S8S9 and S10S11 are not parallel, segment S10S11 is extended to intersect segment S8S9, yielding segment S9S11. The result is a vector map with regular shapes.
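A minimal sketch of the short-edge removal rule, operating on the closed polygon's vertex list. For each edge shorter than the threshold it applies the two cases described above: merge when the neighboring edges are parallel or collinear, otherwise extend them to their intersection. As a simplification, merging here just joins the outer endpoints of the two parallel neighbors; the function name and the `parallel_tol` tolerance are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(u: Point, v: Point) -> float:
    return u[0] * v[1] - u[1] * v[0]


def remove_short_edges(
    vertices: Sequence[Point], min_len: float, parallel_tol: float = 1e-6
) -> List[Point]:
    """Delete edges shorter than min_len from a closed polygon (vertex ring)."""
    pts = list(vertices)
    if len(pts) < 3:
        return pts
    changed = True
    while changed:  # each accepted change removes at least one vertex, so this terminates
        changed = False
        for i in range(len(pts)):
            ring = pts[i:] + pts[:i]  # rotate so the candidate edge is ring[0] -> ring[1]
            a, b, p_next, p_prev = ring[0], ring[1], ring[2], ring[-1]
            if math.dist(a, b) >= min_len:
                continue
            d1 = (a[0] - p_prev[0], a[1] - p_prev[1])  # direction of previous edge
            d2 = (p_next[0] - b[0], p_next[1] - b[1])  # direction of next edge
            denom = _cross(d1, d2)
            scale = math.hypot(*d1) * math.hypot(*d2)
            if scale == 0.0 or abs(denom) <= parallel_tol * scale:
                candidate = ring[2:]  # parallel/collinear: merge the two neighbors
            else:
                # Extend both neighbors: solve p_prev + t*d1 == b + s*d2 for t.
                t = _cross((b[0] - p_prev[0], b[1] - p_prev[1]), d2) / denom
                candidate = [(p_prev[0] + t * d1[0], p_prev[1] + t * d1[1])] + ring[2:]
            if len(candidate) >= 3:  # never collapse below a triangle
                pts = candidate
                changed = True
                break
    return pts
```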
In some embodiments, when the regression shape is a polyline, post-processing module 4 connects the points in the regression shape according to the topological relationships between them (points that have a connection relationship are joined) to obtain a vector polyline.
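For polylines, this connection step amounts to walking a graph whose nodes are the regression-shape points and whose edges are the predicted connections. The sketch below is one hypothetical way to do it: it starts chains at endpoints and junctions (nodes whose degree is not 2), follows each chain to the next endpoint or junction, and finally emits any remaining closed loops. The point and edge formats are assumptions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def trace_polylines(
    points: Sequence[Point], edges: Sequence[Tuple[int, int]]
) -> List[List[Point]]:
    """Group connected points into vector polylines split at junctions."""
    adj: Dict[int, List[int]] = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    used: Set[FrozenSet[int]] = set()

    def walk(start: int, nxt: int) -> List[int]:
        chain = [start, nxt]
        used.add(frozenset((start, nxt)))
        prev, cur = start, nxt
        # Keep going while the current point is a simple pass-through (degree 2).
        while len(adj[cur]) == 2:
            a, b = adj[cur]
            step = b if a == prev else a
            key = frozenset((cur, step))
            if key in used:
                break  # closed loop completed
            used.add(key)
            chain.append(step)
            prev, cur = cur, step
        return chain

    chains: List[List[int]] = []
    for node in sorted(adj):  # first pass: chains starting at endpoints or junctions
        if len(adj[node]) != 2:
            for nb in adj[node]:
                if frozenset((node, nb)) not in used:
                    chains.append(walk(node, nb))
    for node in sorted(adj):  # second pass: whatever is left forms closed loops
        for nb in adj[node]:
            if frozenset((node, nb)) not in used:
                chains.append(walk(node, nb))
    return [[points[k] for k in chain] for chain in chains]
```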
With the method provided in the embodiments of this application, because neural network model 0 is obtained by learning the geometric features of map elements in the target area, a more accurate vector map can be generated from remote sensing images of that area. In addition, for a different target area, it suffices to label the reference contours of map elements in remote sensing images of that area and retrain neural network model 0; the retrained model can then generate vector maps of that area from its remote sensing images, without complex heuristic rules or parameter tuning. In large-scale map construction, such as vectorizing maps that cover multiple regions, cities, or countries, this improves the efficiency of vector map generation while maintaining its accuracy.
To further verify the accuracy of the map generation method provided in this application, it was evaluated on remote sensing images from public datasets.
First, on the open-source CrowdAI dataset, the house vectorization results of the map generation method provided in the embodiments of this application were compared with those of a current high-accuracy state-of-the-art (SOTA) algorithm. The results are shown in Table 1.
Table 1 Test results on the CrowdAI dataset
Method              Mean max tangent angle error
SOTA algorithm      31.9°
This application    26.7°
As Table 1 shows, the mean max tangent angle error of the SOTA algorithm's house vectorization results is 31.9°, while that of the map generation method provided in this application is 26.7°, a relative reduction of 16.3%. The mean max tangent angle error is the average, over the remote sensing images, of the maximum tangent angle error between the lines in each inferred vector map and the corresponding reference lines; the lower the value, the more accurately the model vectorizes houses. For example, if a model vectorizes N6 remote sensing images and the maximum direction difference between a line segment in image i and its corresponding reference segment is $d_{\tan}(i)$, the model's mean max tangent angle error over the N6 images can be written as

$$\frac{1}{N_6}\sum_{i=1}^{N_6} d_{\tan}(i)$$
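The metric above can be computed as in the following sketch. It assumes that each predicted line has already been matched to its reference line and that directions are given as angles in radians; since line directions are undirected, angle differences are taken modulo 180°. The matching step, the input format, and the function names are assumptions not described in the text.

```python
import math
from typing import Sequence, Tuple


def undirected_angle_diff(a: float, b: float) -> float:
    """Difference between two line directions in radians, in [0, pi/2]."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def mean_max_tangent_angle_error(
    per_image: Sequence[Sequence[Tuple[float, float]]]
) -> float:
    """per_image[i] holds (predicted, reference) angle pairs for matched lines of image i.

    Returns (1 / N6) * sum_i d_tan(i) in degrees, where d_tan(i) is the
    largest direction error within image i.
    """
    d_tan = [max(undirected_angle_diff(p, r) for p, r in pairs) for pairs in per_image]
    return math.degrees(sum(d_tan) / len(d_tan))
```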
Further, Figure 15 shows the results of vectorizing houses in some remote sensing images using neural network model 0, according to some embodiments of the present application. As Figure 15 shows, the contour masks of the map elements obtained by semantic segmentation network 11 differ considerably from the actual contours of the houses, whereas the regression shapes obtained by shape regression network 2 closely match the actual contours, so the houses in the resulting vector map also closely match the actual house shapes.
In addition, on the open-source SpaceNet3_Road dataset, the road vectorization results of the map generation method provided in the embodiments of this application were compared with those of Sat2Graph, a current high-accuracy algorithm. The results are shown in Table 2.
Table 2 Test results on the SpaceNet3_Road dataset
Method                 Model size   Topology similarity   Average path length similarity (APLS)
Sat2Graph algorithm    200M         80.97                 64.43
This application       100M         86.63                 67.67
As Table 2 shows, the neural network model of this application occupies less space than the Sat2Graph model, and the vector maps it produces achieve higher Topology Similarity and higher Average Path Length Similarity (APLS) with respect to the reference vector maps. Topology similarity measures how similar the topological structure of the vector road network inferred by the model is to that of the reference vector road network; APLS measures how similar the lines of the inferred vector road network are to the lines of the reference vector road network. For both metrics, a higher score indicates a more accurate vector road network.
Further, Figure 16 shows the results of vectorizing roads in remote sensing images using neural network model 0, according to some embodiments of the present application. As Figure 16 shows, the directions of the points on the polylines obtained by neural network model 0 agree well with the directions of the reference roads.
Figures 17A and 17B show the reconstruction of roads in some more complex remote sensing images using neural network model 0, according to some embodiments of the present application. As Figures 17A and 17B show, the vector road network obtained by the map generation method provided in the embodiments of this application closely coincides with the road centerlines in the remote sensing images, indicating that the resulting vector map is highly accurate.
It can be understood that using remote sensing images to describe the technical solutions in the foregoing embodiments is only an example; the technical solutions of the embodiments of this application can also be applied to vectorizing map elements in any other image that contains map elements (such as photos or aerial images).
Further, an embodiment of the present application also provides a map generation apparatus for implementing the map generation method provided in the foregoing embodiments.
Specifically, Figure 18 is a schematic structural diagram of a map generation apparatus 200 according to some embodiments of the present application. As shown in Figure 18, map generation apparatus 200 includes a data acquisition unit 201, an initial shape generation unit 202, a shape regression unit 203, a topology reconstruction unit 204, and a post-processing unit 205.
Data acquisition unit 201 is configured to acquire an image of an area, where the image includes map elements, that is, the elements in the image that are to be converted into a vector map.
Initial shape generation unit 202 is configured to perform inference on the image using a first model (for example, shape initialization network 1 described above) to obtain a first geometric figure corresponding to each map element, where the first geometric figure includes geometric primitives. For details, refer to the description of step S1301; they are not repeated here.
Shape regression unit 203 is configured to input the first geometric figure into a second model (for example, shape regression network 2 described above) to obtain the direction of each geometric primitive, and to obtain, based on the first geometric figure, a second geometric figure corresponding to the map element. The second geometric figure includes the same geometric primitives as the first geometric figure, but the primitives are positioned differently. For details, refer to the description of step S1302; they are not repeated here.
Topology reconstruction unit 204 is configured to use a third model (for example, topology reconstruction network 3 described above) to obtain the topological relationships between the geometric primitives based on the directions of the geometric primitives and the second geometric figure. For details, refer to the description of step S1303; they are not repeated here.
Post-processing unit 205 is configured to obtain the vector map corresponding to the image based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure. For example, in some embodiments, post-processing unit 205 performs the operations of post-processing module 4 described above; for details, refer to the description of step S1304, which is not repeated here.
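How units 201 to 205 hand data to one another can be summarized in a small skeleton. Each unit is modeled as an arbitrary callable, so any implementation (for example, networks 1 to 3 and post-processing module 4) could be plugged in; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class MapGenerationApparatus:
    """Hypothetical wiring of units 201-205; each unit is any callable."""

    acquire_image: Callable[[Any], Any]                 # data acquisition unit 201
    initial_shape: Callable[[Any], Any]                 # unit 202: first model
    shape_regression: Callable[[Any], Tuple[Any, Any]]  # unit 203: -> (directions, second figure)
    topology: Callable[[Any, Any], Any]                 # unit 204: third model
    post_process: Callable[[Any, Any, Any], Any]        # unit 205

    def generate(self, area: Any) -> Any:
        image = self.acquire_image(area)
        first_figure = self.initial_shape(image)
        directions, second_figure = self.shape_regression(first_figure)
        topology = self.topology(directions, second_figure)
        return self.post_process(topology, directions, second_figure)
```

Keeping the units as injected callables mirrors the note that units may be merged, split, or replaced without changing the overall data flow.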
It can be understood that the structure of map generation apparatus 200 shown in Figure 18 is merely illustrative. In other embodiments, map generation apparatus 200 may include more or fewer units, and some units may be combined or split; this is not limited here.
It can be understood that, in the above embodiments, the electronic device used to train neural network model 0 or to perform inference with it may be any electronic device capable of training or running neural network models, including but not limited to laptop computers, desktop computers, tablet computers, and servers; this is not limited here. The following uses electronic device 100 as an example to describe the structure of such an electronic device. Specifically, Figure 19 is a schematic structural diagram of an electronic device 100 for executing the embodiments of the present application, according to some embodiments. Electronic device 100 may include one or more processors 101, system memory 102, non-volatile memory (NVM) 103, input/output (I/O) devices 104, a communication interface 105, and system control logic 106 that couples processor 101, system memory 102, non-volatile memory 103, I/O devices 104, and communication interface 105. Specifically:
Processor 101 may include one or more processing units. For example, processor 101 may include a central processing unit (CPU), an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors. In some embodiments, processor 101 may execute instructions for training neural network model 0 or for performing inference on remote sensing images with the trained neural network model 0.
In particular, in some embodiments, the NPU may run the instructions of neural network model 0 to perform semantic segmentation on images, generate contour masks of map elements, extract the edges of the contour masks, generate the initial and regression shapes of map elements, and generate the direction data and topological relationships of geometric primitives.
System memory 102 is volatile memory, such as random-access memory (RAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM). System memory temporarily stores data and/or instructions. For example, in some embodiments, system memory 102 may temporarily store the network parameters of neural network model 0, the sample image set, intermediate data produced while training neural network model 0 or running inference with it, and the generated vector maps.
Non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, non-volatile memory 103 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as a hard disk drive (HDD), compact disc (CD), digital versatile disc (DVD), or solid-state drive (SSD). In some embodiments, non-volatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card. In other embodiments, non-volatile memory 103 may persistently store the network parameters of neural network model 0, the sample image set, intermediate data produced while training neural network model 0 or running inference with it, and the generated vector maps.
In particular, system memory 102 and/or non-volatile memory 103 may include a copy of instructions 107. When executed by at least one of processors 101, instructions 107 cause electronic device 100 to train all or part of neural network model 0, or to perform inference with neural network model 0, using the methods provided in the embodiments of the present application.
Input/output (I/O) devices 104 may include a user interface that enables a user to interact with electronic device 100, for example, to select or input a sample image set and to label the map elements in it.
Network interface 105 may include a transceiver that provides electronic device 100 with a wired or wireless communication interface for communicating with any other suitable device over one or more networks. In some embodiments, electronic device 100 may establish communication connections with other electronic devices through network interface 105 to obtain sample image sets, prediction image sets, and the like from them.
System control logic 106 may include any suitable interface controller to provide any suitable interface to the other modules of electronic device 100. For example, in some embodiments, system control logic 106 may include one or more memory controllers that provide the interface between processor 101 and system memory 102 and non-volatile memory 103. As another example, in other embodiments, system control logic 106 may include at least one Peripheral Component Interconnect (PCI) controller that allows processor 101 to use the PCI bus to interface with devices, components, or modules (such as graphics cards or sound cards) connected to electronic device 100 through a PCI interface.
In some embodiments, at least one of processors 101 may be packaged together with the logic of one or more controllers of system control logic 106 to form a system in package (SiP). In other embodiments, at least one of processors 101 may be integrated on the same chip as the logic of one or more controllers of system control logic 106 to form a system-on-chip (SoC).
It can be understood that electronic device 100 may be any electronic device capable of training deep learning models, including but not limited to laptop computers, desktop computers, tablet computers, and servers; this is not limited here.
It can be understood that the structure of electronic device 100 shown in the embodiments of the present application does not constitute a specific limitation on electronic device 100. In other embodiments, electronic device 100 may include more or fewer components than shown, combine or split some components, or arrange components differently. The components shown may be implemented in hardware, software, or a combination of software and hardware.
The embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these approaches. Embodiments of the present application may be implemented as computer programs or program code executed on programmable systems that include at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described in this application and to generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
Program code may be implemented in a high-level procedural or object-oriented programming language to communicate with the processing system. Program code may also be implemented in assembly or machine language, if desired. In fact, the mechanisms described in this application are not limited in scope to any particular programming language. In either case, the language may be a compiled or an interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including but not limited to floppy disks, optical disks, compact disc read-only memory (CD-ROM), magneto-optical disks, read-only memory (ROM), random-access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet by means of electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals). Accordingly, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some structural or method features may be shown in a specific arrangement and/or order. However, it should be understood that such a specific arrangement and/or order may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of a structural or method feature in a particular figure does not imply that the feature is required in all embodiments; in some embodiments, the feature may be omitted or combined with other features.
It should be noted that the units/modules mentioned in the apparatus embodiments of this application are logical units/modules. Physically, a logical unit/module may be one physical unit/module, part of a physical unit/module, or a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not what matters most; the combination of functions they implement is the key to solving the technical problem addressed by this application. In addition, to highlight the innovative parts of this application, the above apparatus embodiments do not introduce units/modules that are less closely related to solving the technical problem addressed by this application; this does not mean that the above apparatus embodiments contain no other units/modules.
It should be noted that, in the examples and description of this patent, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the present invention. should be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (16)

  1. A map generation method, applied to an electronic device, characterized in that the method comprises:
    obtaining an image of a certain area, the image including a map element, where the map element is an element in the image that is to be converted into a vector map;
    performing inference on the image using a first model to obtain a first geometric figure corresponding to the map element, the first geometric figure including geometric primitives;
    inputting the first geometric figure into a second model to obtain a direction of each of the geometric primitives, and obtaining, based on the first geometric figure, a second geometric figure corresponding to the map element, where the second geometric figure includes the same geometric primitives as the first geometric figure, and the positional arrangement of the geometric primitives in the second geometric figure differs from that in the first geometric figure;
    obtaining, using a third model, topological relationships between the geometric primitives based on the directions of the geometric primitives and the second geometric figure; and
    obtaining a vector map corresponding to the image based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure.
  2. The method according to claim 1, characterized in that at least one of the first model, the second model, and the third model is trained based on geometric features of map elements of the certain area.
  3. The method according to claim 1, characterized in that, when the geometric primitives are line segments, the second geometric figure further includes a connection order of the geometric primitives; and
    the obtaining a vector map corresponding to the image based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure comprises:
    adjusting the direction of a first geometric primitive in the second geometric figure to be the same as the direction obtained for the first geometric primitive, where the direction of the first geometric primitive in the second geometric figure differs from the direction obtained for it; and
    connecting the first geometric primitive and a second geometric primitive to obtain a polygon corresponding to the second geometric figure, where the second geometric primitive is adjacent to the first geometric primitive in the connection order.
  4. The method according to claim 3, characterized in that the polygon corresponding to the second geometric figure includes a first line segment, a second line segment, and a third line segment connected in sequence; and the obtaining a vector map corresponding to the image based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure further comprises:
    deleting the second line segment when the length of the second line segment is less than a preset side-length threshold; and
    when the topological relationship between the first line segment and the third line segment is collinear or parallel, merging the first line segment and the second line segment into one line segment;
    when the topological relationship between the first line segment and the third line segment is neither collinear nor parallel, extending the first line segment and/or the third line segment so that the first line segment and the third line segment intersect.
  5. The method according to claim 1, characterized in that, when the geometric primitives are points, the obtaining a vector map corresponding to the image based on the topological relationships between the geometric primitives, the directions of the geometric primitives, and the second geometric figure comprises:
    connecting the points whose topological relationship is "connected" to obtain a corresponding vectorized polyline.
  6. The method according to claim 1, characterized in that the performing inference on the image using the first model to obtain the first geometric figure corresponding to the map element comprises:
    performing semantic segmentation on the image to obtain a contour mask of the map element, the contour mask indicating the region of the image in which the map element is located;
    extracting a mask edge of the contour mask; and
    simplifying the mask edge to obtain the first geometric figure.
  7. The method according to claim 1, characterized in that the map element includes at least one of a house, a road, a lake, an ocean, a river, a forest, and a desert; and
    the first geometric figure corresponding to a house, lake, ocean, forest, or desert is a polygon;
    the first geometric figure corresponding to a road or river is a polyline.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    training the first model by:
    obtaining sample data, the sample data including a sample image set of the certain area and reference contours corresponding to the map elements in each sample image of the sample image set;
    extracting image features of each sample image using the first model, and obtaining, based on the image features, a contour mask of the map elements in each sample image, the contour mask indicating the region of the corresponding sample image in which the map element is located;
    obtaining, based on the contour mask, a first predicted geometric figure corresponding to the map elements in each sample image; and
    training the first model based on a first loss function value and a second loss function value, where the first loss function indicates the accuracy of the contour mask, and the second loss function indicates the similarity between the first predicted geometric figure and the reference contour.
  9. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    training the second model by:
    obtaining sample data, the sample data including: reference contours corresponding to the map elements in each sample image of the sample image set of the certain area; reference directions corresponding to the geometric primitives in the reference contours; and third geometric figures, obtained using the first model, corresponding to the map elements in each sample image;
    obtaining, using the second model, a second predicted geometric figure corresponding to each map element in each sample image and predicted directions of the geometric primitives in the third geometric figure, where the second predicted geometric figure includes the same geometric primitives as the third geometric figure, and the arrangement of the geometric primitives in the second predicted geometric figure differs from that in the third geometric figure; and
    training the second model based on a third loss function and a fourth loss function, where the third loss function indicates the similarity between the predicted directions of the geometric primitives in the third geometric figure and the corresponding reference directions, and the fourth loss function indicates the similarity between the second predicted geometric figure and the corresponding reference contour.
  10. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    obtaining sample data, the sample data including, for the sample image set of the certain area, reference topological relationships between the geometric primitives in the reference contours corresponding to the map elements of each sample image, as well as fourth geometric figures, obtained using the first model, corresponding to the map elements in each sample image, and the directions of the geometric primitives in the fourth geometric figures;
    determining, using the third model, latent-space features of each geometric primitive in the fourth geometric figure, and determining, based on the latent-space features, predicted topological relationships between the geometric primitives in the fourth geometric figure; and
    training the third model based on a fifth loss function and a sixth loss function, where the fifth loss function indicates the degree of match between the predicted topological relationships between the geometric primitives in the fourth geometric figure and the corresponding reference topological relationships, and the sixth loss function indicates the similarity of the latent-space features between geometric primitives whose predicted topological relationship is parallel, collinear, or connected.
  11. A model training method, applied to an electronic device, characterized in that the method comprises:
    obtaining sample data, the sample data including: reference contours corresponding to map elements in each sample image of a sample image set of a certain area; a fifth geometric figure or a sixth geometric figure corresponding to each map element; directions of the geometric primitives in the fifth geometric figure; and image features of the geometric primitives in the fifth geometric figure, where the image features of the geometric primitives in the fifth geometric figure are generated when the fifth geometric figure of each map element is obtained by inference with a fourth model, the similarity between the fifth geometric figure and the corresponding reference contour is lower than the similarity between the sixth geometric figure and the corresponding reference contour, and the fifth geometric figure and the sixth geometric figure have the same geometric primitives;
    inputting the fifth geometric figure or the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure into a fifth model having first network parameters to obtain latent-space features corresponding to each of the geometric primitives, and inferring predicted topological relationships between the geometric primitives based on their latent-space features;
    determining a seventh loss function and an eighth loss function based on the predicted topological relationships between the geometric primitives in the fifth geometric figure and the corresponding reference topological relationships, where the reference topological relationships may be determined based on the reference contours corresponding to the map elements in each sample image, the seventh loss function indicates the degree of match between the predicted topological relationships between the geometric primitives in the fifth geometric figure and the corresponding reference topological relationships, and the eighth loss function indicates the similarity of the latent-space features between geometric primitives whose predicted topological relationship is parallel, collinear, or connected;
    when the seventh loss function and the eighth loss function satisfy a termination condition, saving the fifth model having the first network parameters; and
    when the seventh loss function and the eighth loss function do not satisfy the termination condition, adjusting the network parameters of the fifth model to second network parameters and performing the next round of training.
  12. The method according to claim 11, characterized in that, when the geometric primitives in the fifth geometric figure are line segments, whether the seventh loss function and the eighth loss function satisfy the termination condition is determined by:
    determining, based on the directions of the geometric primitives in the fifth geometric figure, the directional relationships between the geometric primitives, and determining, from these and the reference directional relationships corresponding to the topological relationships, a ninth loss function, the ninth loss function indicating the consistency between the predicted topological relationships of the geometric primitives and their directions; and
    determining that the termination condition is satisfied when the seventh, eighth, and ninth loss functions all converge; or the seventh, eighth, and ninth loss functions are each less than a corresponding preset loss value; or a total loss function converges; or the total loss function is less than a corresponding preset total loss value, where the total loss function includes a weighted sum of the seventh loss function, the eighth loss function, and the ninth loss function.
  13. The method according to claim 11, characterized in that the obtaining latent-space features corresponding to each of the geometric primitives based on the fifth geometric figure or the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure comprises:
    when the geometric primitives of the fifth geometric figure are points, obtaining the latent-space features corresponding to each of the geometric primitives based on the fifth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure; and
    when the geometric primitives of the fifth geometric figure are line segments, obtaining the latent-space features corresponding to each of the geometric primitives based on the sixth geometric figure, the image features of the geometric primitives in the fifth geometric figure, and the directions of the geometric primitives in the fifth geometric figure.
  14. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises instructions that, when executed by an electronic device, cause the electronic device to implement the method according to any one of claims 1 to 13.
  15. An electronic device, characterized by comprising:
    a memory, configured to store instructions to be executed by one or more processors of the electronic device; and
    a processor, being one of the processors of the electronic device and configured to execute the instructions stored in the memory to implement the method according to any one of claims 1 to 13.
  16. A computer program product, characterized by comprising a computer program/instructions that, when executed by a processor, implement the method according to any one of claims 1 to 13.
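Claim 6 recites semantic segmentation, mask-edge extraction, and edge simplification, but does not name a simplification algorithm. As an illustration only, the sketch below assumes Ramer-Douglas-Peucker (the method behind OpenCV's `cv2.approxPolyDP`) applied to an open edge polyline. The function names and the `epsilon` tolerance are hypothetical, and a closed mask contour would first need to be split into two open chains.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate chord: plain distance
    # |cross(b - a, a - p)| / |b - a| is the height of triangle (a, b, p)
    return abs(dx * (ay - py) - dy * (ax - px)) / length


def simplify_edge(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker simplification of an open polyline (assumed method)."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord first -> last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]  # every interior vertex is within tolerance
    # Otherwise keep the farthest vertex and recurse on both halves.
    left = simplify_edge(points[: index + 1], epsilon)
    right = simplify_edge(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split vertex
```

For a jagged, staircase-like mask edge such as `[(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3, 1), (3.1, 2), (3, 3)]` with `epsilon=0.5`, this reduces the edge to the three corners `(0, 0)`, `(3, 0)`, `(3, 3)`, which is the kind of compact primitive set a first geometric figure needs.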
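The clean-up of three consecutive polygon sides in claim 4 can be sketched as follows. The topological relationship between the first and third segments is passed in as a string, standing in for the output of the third model; `regularize_triplet`, `min_len`, and the relation labels are hypothetical. The claim text speaks of merging the first and second segments. Because the second segment has just been deleted, this sketch assumes the intended result is a single segment running from the start of the first segment to the end of the third.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _line_intersection(s1: Segment, s2: Segment) -> Optional[Point]:
    """Intersection of the infinite lines through s1 and s2, or None if parallel."""
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)


def regularize_triplet(s1: Segment, s2: Segment, s3: Segment,
                       relation: str, min_len: float) -> List[Segment]:
    """Remove a too-short middle side and reconnect its neighbours.

    `relation` is a hypothetical stand-in for the predicted topology
    between s1 and s3, e.g. "collinear", "parallel", "perpendicular".
    """
    if math.dist(s2[0], s2[1]) >= min_len:
        return [s1, s2, s3]  # middle side is long enough; keep as-is
    if relation in ("collinear", "parallel"):
        # Assumed reading of the merge step: one side from s1's start to s3's end.
        return [(s1[0], s3[1])]
    corner = _line_intersection(s1, s3)
    if corner is None:
        # Geometry says parallel although the model did not; fall back to merging.
        return [(s1[0], s3[1])]
    # Extend s1 and/or s3 so that they meet at a shared corner.
    return [(s1[0], corner), (corner, s3[1])]
```

Extending to the intersection of the two supporting lines, rather than simply joining endpoints, is what restores a sharp right-angle corner on a building outline when a short diagonal "bevel" side has been removed.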
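For point primitives, claim 5 only requires that points whose predicted relationship is "connected" be joined into polylines. A minimal sketch, assuming the model's output is a list of connected index pairs and that points mostly have at most two neighbours: chains are walked from degree-1 endpoints, and any points left over are treated as closed loops. The helper name `points_to_polylines`, and the choice to drop isolated points, are assumptions of this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def points_to_polylines(points: Sequence[Point],
                        connected: Sequence[Tuple[int, int]]) -> List[List[Point]]:
    """Join point primitives whose predicted relationship is "connected"."""
    adj: Dict[int, List[int]] = defaultdict(list)
    for i, j in connected:
        adj[i].append(j)
        adj[j].append(i)
    visited: Set[int] = set()

    def walk(start: int) -> List[int]:
        # Follow unvisited neighbours until the chain cannot be extended.
        chain = [start]
        visited.add(start)
        current = start
        while True:
            nxt = [n for n in adj[current] if n not in visited]
            if not nxt:
                return chain
            current = nxt[0]
            visited.add(current)
            chain.append(current)

    polylines: List[List[Point]] = []
    # Open polylines start at endpoints, i.e. points with a single neighbour.
    for idx in sorted(adj):
        if len(adj[idx]) == 1 and idx not in visited:
            polylines.append([points[k] for k in walk(idx)])
    # Anything left over lies on a closed loop; repeat the first point to close it.
    for idx in sorted(adj):
        if idx not in visited:
            loop = walk(idx)
            polylines.append([points[k] for k in loop + [loop[0]]])
    return polylines
```

At a junction with three or more neighbours, the walk simply takes the first unvisited branch; the remaining branch is picked up later from its own endpoint, so a road fork becomes two polylines sharing a point.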
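The sixth loss function of claim 10 (and the eighth of claim 11) is described only as measuring the similarity of latent-space features between primitives predicted to be parallel, collinear, or connected. One plausible form, assumed here rather than taken from the source, is the mean of 1 minus cosine similarity over those pairs. The plain-Python version below shows only the arithmetic; a real training loop would compute the same quantity in an autodiff framework so that gradients reach the model. All names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Relations whose primitives the claims ask to have similar latent features.
SIMILAR_RELATIONS = {"parallel", "collinear", "connected"}


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar
    return dot / (norm_u * norm_v)


def latent_similarity_loss(features: Sequence[Sequence[float]],
                           predicted: Dict[Tuple[int, int], str]) -> float:
    """Mean (1 - cosine similarity) over pairs predicted parallel/collinear/connected."""
    terms = [1.0 - cosine_similarity(features[i], features[j])
             for (i, j), relation in predicted.items()
             if relation in SIMILAR_RELATIONS]
    return sum(terms) / len(terms) if terms else 0.0
```

Pairs with any other predicted relation contribute nothing, so the loss only pulls together features of primitives the model already believes are related, which is how the claims phrase it.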
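Claim 12 lists four alternative termination conditions. The sketch below checks them in order against per-round loss histories. The claims do not define "converge", so `has_converged` assumes that the last `window` round-to-round changes all fall below `tol`; the thresholds, weights, and function names are likewise hypothetical inputs.

```python
from typing import Sequence, Tuple


def has_converged(history: Sequence[float], window: int = 3, tol: float = 1e-4) -> bool:
    # Hypothetical convergence test: the last `window` round-to-round
    # changes of the loss are all smaller than `tol`.
    if len(history) <= window:
        return False
    recent = history[-(window + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))


def termination_satisfied(
    l7: Sequence[float],
    l8: Sequence[float],
    l9: Sequence[float],
    thresholds: Tuple[float, float, float],
    weights: Tuple[float, float, float],
    total_threshold: float,
    window: int = 3,
    tol: float = 1e-4,
) -> bool:
    histories = (l7, l8, l9)
    if not all(histories):
        return False  # need at least one recorded round per loss
    # (1) the seventh, eighth and ninth losses all converge
    if all(has_converged(h, window, tol) for h in histories):
        return True
    # (2) each latest loss is below its own preset value
    if all(h[-1] < t for h, t in zip(histories, thresholds)):
        return True
    # Total loss per round = weighted sum of the three losses
    # (assumes the three histories have equal length).
    total = [sum(w * v for w, v in zip(weights, step)) for step in zip(*histories)]
    # (3) the total loss converges, or (4) it is below its preset value
    return has_converged(total, window, tol) or total[-1] < total_threshold
```

If this returns `True`, the fifth model with the current network parameters would be saved; otherwise the parameters are updated and another round runs, as claim 11 describes.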
PCT/CN2022/092810 2022-05-13 2022-05-13 Map generation method, model training method, readable medium, and electronic device WO2023216251A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280090751.XA CN118613792A (en) 2022-05-13 2022-05-13 Map generation method, model training method, readable medium, and electronic device
PCT/CN2022/092810 WO2023216251A1 (en) 2022-05-13 2022-05-13 Map generation method, model training method, readable medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/092810 WO2023216251A1 (en) 2022-05-13 2022-05-13 Map generation method, model training method, readable medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2023216251A1 true WO2023216251A1 (en) 2023-11-16

Family

ID=88729550

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092810 WO2023216251A1 (en) 2022-05-13 2022-05-13 Map generation method, model training method, readable medium, and electronic device

Country Status (2)

Country Link
CN (1) CN118613792A (en)
WO (1) WO2023216251A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457512A (en) * 2018-05-08 2019-11-15 腾讯科技(深圳)有限公司 A kind of map-indication method, device, server, terminal and storage medium
CN110517334A (en) * 2018-05-21 2019-11-29 北京四维图新科技股份有限公司 A kind of method and device that map vector data obtains
CN110991452A (en) * 2019-12-03 2020-04-10 深圳市捷顺科技实业股份有限公司 Parking stall frame detection method, device, equipment and readable storage medium
CN112066997A (en) * 2020-08-25 2020-12-11 海南太美航空股份有限公司 Method and system for exporting high-definition route map
WO2021087985A1 (en) * 2019-11-08 2021-05-14 深圳市欢太科技有限公司 Model training method and apparatus, storage medium, and electronic device
WO2022000469A1 (en) * 2020-07-03 2022-01-06 Nokia Technologies Oy Method and apparatus for 3d object detection and segmentation based on stereo vision

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422787A (en) * 2023-12-18 2024-01-19 中国人民解放军国防科技大学 Remote sensing image map conversion method integrating discriminant and generative model
CN117422787B (en) * 2023-12-18 2024-03-08 中国人民解放军国防科技大学 Remote sensing image map conversion method integrating discriminant and generative model
CN118051576A (en) * 2024-02-27 2024-05-17 兰州交通大学 Micro map direction distance system
CN118034070A (en) * 2024-04-15 2024-05-14 青岛杰瑞工控技术有限公司 Active and passive calibration compensation method for ocean monitoring equipment parameters combined with mechanical structure
CN118097432A (en) * 2024-04-18 2024-05-28 厦门理工学院 Remote sensing image model estimation method based on second-order space consistency constraint
CN118521581A (en) * 2024-07-22 2024-08-20 南京万玺科技有限公司 Processing method and system for steel production line for automobile

Also Published As

Publication number Publication date
CN118613792A (en) 2024-09-06

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22941203; country of ref document: EP; kind code of ref document: A1)