CN113129362B - Method and device for acquiring three-dimensional coordinate data

Method and device for acquiring three-dimensional coordinate data

Info

Publication number: CN113129362B
Application number: CN202110441111.6A
Authority: CN (China)
Prior art keywords: pixel, pixel point, semantic, dimensional, sampling
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN113129362A (en)
Inventor: 赵珊珊 (Zhao Shanshan)
Current Assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority and filing date: 2021-04-23
Publication of CN113129362A: 2021-07-16
Grant and publication of CN113129362B: 2024-05-10

Classifications

    • G: PHYSICS; G06: COMPUTING, CALCULATING OR COUNTING; G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/70: Image analysis; determining position or orientation of objects or cameras
    • G06T 7/50: Image analysis; depth or shape recovery
    • G06T 2200/04: Indexing scheme for image data processing or generation, in general; involving 3D image data
    • G06T 2207/10028: Image acquisition modality; range image, depth image, 3D point clouds
    • G06T 2207/30196: Subject of image; human being, person

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

Disclosed are a method and apparatus for acquiring three-dimensional coordinate data, a computer-readable storage medium, and an electronic device, the method comprising: determining at least one first semantic region in an image to be annotated; determining a plurality of first sampling pixel points in a second semantic region and at least one texture map of a reference three-dimensional model corresponding to the second semantic region based on first event information corresponding to each first semantic region, wherein the second semantic region is any one of the at least one first semantic region; acquiring a second sampling pixel point based on second event information corresponding to the second semantic region, wherein the second sampling pixel point is any one of the plurality of first sampling pixel points; and acquiring a first three-dimensional coordinate of the second sampling pixel point corresponding to the surface of the reference three-dimensional model based on third event information corresponding to the at least one texture map. With this technical solution, the three-dimensional coordinate data corresponding to a two-dimensional image can be obtained.

Description

Method and device for acquiring three-dimensional coordinate data
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and apparatus for acquiring three-dimensional coordinate data.
Background
In the field of computer vision, three-dimensional modeling from two-dimensional images is an important research topic. Three-dimensional modeling requires three-dimensional coordinate data derived from the two-dimensional images, and such data can help improve and optimize a three-dimensional model. Therefore, how to generate the three-dimensional coordinate data corresponding to a two-dimensional image is a problem to be solved.
Disclosure of Invention
The present application has been made to solve the above-mentioned technical problems. Embodiments of the application provide a method and apparatus for acquiring three-dimensional coordinate data, a computer-readable storage medium, and an electronic device, which can acquire the three-dimensional coordinate data corresponding to a two-dimensional image.
According to an aspect of the present application, there is provided a method of acquiring three-dimensional coordinate data, including:
Determining at least one first semantic region in the image to be annotated;
Determining a plurality of first sampling pixel points in a second semantic region and at least one texture map of a reference three-dimensional model corresponding to the second semantic region based on first event information corresponding to each first semantic region, wherein the second semantic region is any one of the at least one first semantic region;
Acquiring a second sampling pixel point based on second event information corresponding to the second semantic region, wherein the second sampling pixel point is any one of the plurality of first sampling pixel points;
and acquiring a first three-dimensional coordinate of the second sampling pixel point corresponding to the surface of the reference three-dimensional model based on third event information corresponding to the at least one texture map.
According to a second aspect of the present application, there is provided an acquisition apparatus of three-dimensional coordinate data, comprising:
The first region determining module is used for determining at least one first semantic region in the image to be annotated;
The second area determining module is used for determining a plurality of first sampling pixel points in a second semantic area and at least one texture map of a reference three-dimensional model corresponding to the second semantic area based on first event information corresponding to each first semantic area, wherein the second semantic area is any one of the first semantic areas;
The pixel point determining module is used for acquiring a second sampling pixel point based on second event information corresponding to the second semantic region, wherein the second sampling pixel point is any one of the first sampling pixel points in the second semantic region;
And the coordinate determining module is used for acquiring the first three-dimensional coordinate of the second sampling pixel point corresponding to the reference three-dimensional model surface based on the third event information corresponding to the at least one texture map.
According to a third aspect of the present application, there is provided a computer-readable storage medium storing a computer program for executing the above-described three-dimensional coordinate data acquisition method.
According to a fourth aspect of the present application, there is provided an electronic device comprising:
A processor;
A memory for storing the processor-executable instructions;
The processor is configured to read the executable instruction from the memory, and execute the instruction to implement the method for acquiring three-dimensional coordinate data.
According to the method and apparatus for acquiring three-dimensional coordinate data, the computer-readable storage medium, and the electronic device described above, the three-dimensional coordinate data corresponding to a two-dimensional image is acquired through the texture maps corresponding to the semantic regions in the two-dimensional image, so that a dense correspondence from the two-dimensional image to a three-dimensional model is established and the problem of associating the two-dimensional image with the three-dimensional model is solved.
Drawings
The above and other objects, features, and advantages of the present application will become more apparent from the following detailed description of its embodiments with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application, are incorporated in and constitute a part of this specification, and serve to explain the application together with its embodiments without limiting it. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a diagram of an image to be annotated and a second semantic region according to an exemplary embodiment of the present application.
FIG. 2A is a diagram of the back-looking texture map corresponding to the second semantic region of FIG. 1.
FIG. 2B is a diagram of the front-looking texture map corresponding to the second semantic region of FIG. 1.
FIG. 2C is a diagram of the right-looking texture map corresponding to the second semantic region of FIG. 1.
FIG. 2D is a diagram of the left-looking texture map corresponding to the second semantic region of FIG. 1.
FIG. 2E is a diagram of the top-looking texture map corresponding to the second semantic region of FIG. 1.
Fig. 3 is a flowchart of a method for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
Fig. 4 is a flowchart of step 32 in a method for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
Fig. 5 is a flowchart illustrating step 34 in a method for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
Fig. 6 is a flowchart illustrating step 341 in a method for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
Fig. 7 is a schematic structural diagram of an apparatus for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
Fig. 8 is a schematic diagram of a three-dimensional coordinate data acquisition device according to an exemplary embodiment of the present application.
Fig. 9 is a schematic structural diagram of the third coordinate determining unit 741 in the apparatus for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
Fig. 10 is a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Summary of the application
In the field of computer vision, three-dimensional modeling from two-dimensional images is an important research topic. Three-dimensional modeling requires accurate three-dimensional coordinate data, which can help improve and optimize a three-dimensional model. Therefore, how to generate the three-dimensional coordinate data corresponding to a two-dimensional image is a problem to be solved.
The present application acquires the three-dimensional coordinate data corresponding to a two-dimensional image through the texture maps corresponding to the semantic regions in the image, thereby establishing a dense correspondence between the two-dimensional image and a three-dimensional model and solving the problem of associating the two-dimensional image with the three-dimensional model.
Exemplary method
Fig. 3 is a flowchart of a method for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
The present embodiment is applicable to electronic devices, and in particular, to various personal computers or servers. As shown in fig. 3, the method for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application at least includes the following steps:
Step 31, at least one first semantic region in the image to be annotated is determined.
In an embodiment, the image to be annotated can be understood as a two-dimensional image containing a two-dimensional target object. The two-dimensional target object may be a person, a building, a landscape, or the like, which is not limited herein; the image to be annotated is correspondingly a person image, a building image, a landscape image, or the like, to be determined according to the actual scene, with a person image being preferred. The image to be annotated includes a number of first semantic regions and may be opened with an image processing software tool; one example is shown in fig. 1.
In an embodiment, the first semantic region can be understood as a two-dimensional region corresponding to a three-dimensional target object, where the three-dimensional target object may be a part of the two-dimensional target object and may be seen from different angles. In one example, if the two-dimensional target object is a woman, the three-dimensional target object may be the woman's face, and the first semantic region may show that face from the side or from the front. To distinguish the semantics of the multiple first semantic regions, each first semantic region carries a semantic tag that describes the three-dimensional target object to which the region corresponds. The number of first semantic regions is not limited here; since a two-dimensional image is a projection of three-dimensional target objects under particular viewing angles, the plurality of first semantic regions generally correspond to multiple parts of the two-dimensional target object and need to be determined according to the actual scene. In one example, the two-dimensional target object is a person, and the first semantic regions include a face region, a hand region, a foot region, an arm region, a thigh region, a calf region, a back region, a chest region, and the like.
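The embodiments above do not prescribe how the first semantic regions are obtained. As an illustration only, the following sketch assumes a per-pixel semantic label mask is available and extracts each region's closed boundary line with OpenCV; the mask, the background label 0, and the function name are hypothetical.

```python
import cv2
import numpy as np

def first_semantic_regions(label_mask: np.ndarray) -> dict:
    """Extract the closed boundary line of every labeled region in a
    per-pixel semantic mask (hypothetical input, not from the patent)."""
    regions = {}
    for label in np.unique(label_mask):
        if label == 0:  # assume label 0 marks the background
            continue
        binary = (label_mask == label).astype(np.uint8)
        # Each external contour is one candidate first semantic region,
        # represented by the pixel points on its closed line.
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        regions[int(label)] = [c.reshape(-1, 2) for c in contours]
    return regions
```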
Step 32, determining a plurality of first sampling pixels in a second semantic region and at least one texture map of a reference three-dimensional model corresponding to the second semantic region based on the first event information corresponding to each first semantic region, wherein the second semantic region is any one of the at least one first semantic region.
In an embodiment, the first event information may be understood as external input information, specifically, a click position of an external input device within a certain first semantic area, where the external input device may be a mouse.
In an embodiment, the second semantic region is any one of the first semantic regions, i.e. one of the first semantic regions is selected as the second semantic region.
In an embodiment, the plurality of first sampling pixel points in the second semantic region can be understood as key points selected from the pixel points of the second semantic region, and these key points can be understood as projections of the surface points used to construct the three-dimensional target object corresponding to the second semantic region. The number of first sampling pixel points is not limited in the embodiments of the application and needs to be determined according to the actual scene. For example, in the prior art, generating a three-dimensional human model with the real-time human pose estimation system DensePose requires 336 key points (24 body parts, 14 points per part), and those key points correspond to the first sampling pixel points here.
In an embodiment, the reference three-dimensional model can be understood as a convex-polygon representation of the three-dimensional target object corresponding to the second semantic region, i.e., an object composed of many convex polygons, which may be triangles or quadrilaterals and are typically quadrilaterals. A convex polygon here specifically refers to a polygon in which no interior angle is a reflex angle (an angle larger than 180 degrees and smaller than 360 degrees).
In one embodiment, the texture map can be understood as a view texture map, i.e., a two-dimensional color image obtained by projecting the reference three-dimensional model into a two-dimensional texture coordinate system at a certain viewing angle and filling the corresponding map of the reference three-dimensional model into the corresponding texture coordinate domain. A map here can be understood as the surface color image of the reference three-dimensional model. The two-dimensional texture coordinate system represents the texture coordinates of pixel points, and a texture coordinate domain can be understood as the coordinate region framed by four texture coordinates. Texture coordinates, also called actual texture coordinates, are generally written as (u, v), where u runs in the horizontal direction, i.e., along the width of the texture, and v runs in the vertical direction, i.e., along the height of the texture.
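For concreteness, the usual convention for turning normalized (u, v) texture coordinates into pixel positions in a texture image might look like the sketch below; the rounding convention is an assumption, since the embodiments do not fix one.

```python
def uv_to_pixel(u: float, v: float, tex_w: int, tex_h: int) -> tuple:
    """Map normalized texture coordinates (u, v) in [0, 1] to integer
    pixel indices; u runs along the texture width, v along its height."""
    return round(u * (tex_w - 1)), round(v * (tex_h - 1))
```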
In an embodiment, the reference three-dimensional model is determined based on the semantic tag carried by the second semantic region, and identical semantic tags correspond to the same reference three-dimensional model, which saves memory on the electronic device and improves data processing efficiency.
In an embodiment, the texture maps include any one or more of a left-looking texture map, a right-looking texture map, a front-looking texture map, a back-looking texture map, a top-looking texture map, and a bottom-looking texture map of the reference three-dimensional model. The front-looking texture map is understood as the color image obtained by projecting the reference three-dimensional model into the two-dimensional texture coordinate system from the front and filling the corresponding map of the reference three-dimensional model into the corresponding texture coordinate domain; the other texture maps are analogous and are not described again here. For example, if the semantic tag is a male head, the reference three-dimensional model is a male head, and the texture maps may be the left-looking, right-looking, front-looking, back-looking, and top-looking texture maps of the male head; see figs. 2A to 2E.
In an embodiment, the second semantic region is displayed as an enlarged view, which makes it easier to inspect. Different first sampling pixel points may be distinguished by display color, by numbering, or the like; that is, the first sampling pixel points have different display colors or different numbers. It should be understood that the ways of distinguishing the first sampling pixel points are not limited to display colors and numbers.
For example, referring to fig. 1, the image to be annotated is a back view of a person, and region 1 and region 2 each represent a first semantic region. Clicking the mouse at any position in region 1 yields the second semantic region, 16 first sampling pixel points labeled A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, the back-looking texture map shown in fig. 2A, the front-looking texture map shown in fig. 2B, the right-looking texture map shown in fig. 2C, the left-looking texture map shown in fig. 2D, and the top-looking texture map shown in fig. 2E. Here, the second semantic region is an enlarged view of region 1, and the 16 letters are one way of distinguishing the first sampling pixel points; numeric labels or different display colors could be used instead.
Step 33, obtaining a second sampling pixel point based on the second event information corresponding to the second semantic region, where the second sampling pixel point is any one of the plurality of first sampling pixel points.
In an embodiment, the second event information may be understood as external input information, specifically, a click position of the external input device within the second semantic region, where the click position may be understood as a screen position of the selected first sampling pixel point, and the external input device may be a mouse.
In an embodiment, the second sampling pixel point is any one of the first sampling pixel points, i.e. any one of the first sampling pixel points is selected as the second sampling pixel point. Referring to fig. 1, if the first sampling pixel point J is in the selected state, J is the second sampling pixel point.
And step 34, acquiring a first three-dimensional coordinate corresponding to the second sampling pixel point on the surface of the reference three-dimensional model based on third event information corresponding to the at least one texture map.
In an embodiment, the third event information may be understood as external input information, specifically referring to a click position of the external input device on the texture map, where the external input device may be a mouse.
Based on operations on the texture maps, the first three-dimensional coordinate corresponding to the image to be annotated is obtained, thereby establishing a dense correspondence between the two-dimensional image and the reference three-dimensional model. The first three-dimensional coordinate can be understood as a three-dimensional texture coordinate, generally written as (u, v, w), where w runs in the vertical direction, i.e., along the depth of the texture. The dense correspondence can be understood as a correspondence between pixel points in a two-dimensional image and surface points of a three-dimensional model, and specifically refers to the correspondence between pixel points in a semantic region of the image to be annotated and surface points of the reference three-dimensional model. The determined three-dimensional coordinates of the semantic region can then be used for three-dimensional modeling of that region.
In this embodiment, given the plurality of sampling pixel points in a semantic region and the texture maps of the corresponding reference three-dimensional model, the three-dimensional coordinates of any selected sampling pixel point are determined based on event information on the texture maps. A dense correspondence from the two-dimensional image to the three-dimensional model is thereby established; based on this correspondence, the position on the three-dimensional model of the pixel points in the semantic region can be determined, which associates the two-dimensional image with the three-dimensional model and solves the problem of associating the two.
FIG. 4 is a flow chart illustrating the step of determining a plurality of first sampling pixels within a second semantic region in the embodiment illustrated in FIG. 3.
As shown in fig. 4, in an exemplary embodiment of the present application based on the embodiment shown in fig. 3, the step of determining a plurality of first sampling pixels in the second semantic region shown in step 32 may specifically include the following steps:
step 321, obtaining second pixel coordinates corresponding to the plurality of pixel points on the closed line of the second semantic region.
In an embodiment, the second semantic region can be understood as a region enclosed by a closed line formed by connecting a plurality of pixel points.
In an embodiment, the plurality of pixel points on the closed line may be understood as all the pixel points forming the closed line, so as to ensure that the second pixel coordinates corresponding to the plurality of pixel points respectively can accurately reflect the position of the second semantic region.
In one example, referring to fig. 1, the second semantic region is enclosed by a closed line connecting at least 7 pixel points. The pixel points on the closed line are not shown in fig. 1, but from the line segments and their intersections it can be inferred that there are at least 7 of them. Here, the second pixel coordinate of a pixel point can be understood as the position of that pixel point in the image to be annotated.
Step 322, determining a plurality of third pixel coordinates in the closed line based on the second pixel coordinates corresponding to the plurality of pixel points on the closed line.
Specifically, the second pixel coordinates of the pixel points on the closed line determine a two-dimensional coordinate space, and coordinate points are sampled within this space; the sampling scheme is not limited. A plurality of third pixel coordinates are thus determined. In one example, the pixel coordinates in the image to be annotated of the 16 first sampling pixel points in the second semantic region of fig. 1 are the third pixel coordinates.
Step 323, determining a plurality of first sampling pixel points based on the third pixel coordinates.
The pixel point corresponding to each third pixel coordinate is a first sampling pixel point, and all of the first sampling pixel points lie within the closed line. In one example, the first sampling pixel points are uniformly distributed within the second semantic region. A sketch of this sampling procedure is given below.
In this embodiment, the pixel coordinates of the pixel points on the closed line are used to sample pixel points within the closed line, determining the plurality of first sampling pixel points and thereby obtaining the key points for three-dimensional modeling. Because the semantic region is characterized by these sampling pixel points, the position on the three-dimensional model of every single pixel point in the region does not need to be determined, and the association between the semantic region and the three-dimensional model is established quickly.
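A minimal sketch of steps 321 to 323, assuming uniform grid sampling over the region's bounding box (the embodiments leave the sampling scheme open); `boundary_xy` stands for the second pixel coordinates of the pixel points on the closed line, and `n = 4` yields up to 16 points, as in the example of fig. 1.

```python
import numpy as np
from matplotlib.path import Path

def sample_pixels_in_region(boundary_xy: np.ndarray, n: int = 4) -> np.ndarray:
    """Return pixel coordinates sampled inside a closed boundary line."""
    region = Path(boundary_xy)                    # the closed line (step 321)
    x_min, y_min = boundary_xy.min(axis=0)
    x_max, y_max = boundary_xy.max(axis=0)
    xs = np.linspace(x_min, x_max, n + 2)[1:-1]   # interior grid positions
    ys = np.linspace(y_min, y_max, n + 2)[1:-1]
    grid = np.array([(x, y) for y in ys for x in xs])  # third pixel coordinates (step 322)
    inside = region.contains_points(grid)         # keep only points inside the closed line
    return grid[inside]                           # first sampling pixel points (step 323)
```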
FIG. 5 is a flowchart illustrating the step of obtaining the first three-dimensional coordinates of the second sampling pixel point on the reference three-dimensional model surface based on the third event information corresponding to the at least one texture map in the embodiment shown in FIG. 3.
As shown in fig. 5, in an exemplary embodiment of the present application based on the embodiment shown in fig. 3, the step of obtaining, based on the third event information corresponding to the at least one texture map, the first three-dimensional coordinate of the second sampling pixel point corresponding to the reference three-dimensional model surface shown in step 34 may specifically include the following steps:
Step 341, based on third event information corresponding to the at least one texture map, obtaining first pixel coordinates of at least one target pixel corresponding to the second sampling pixel, where each of the texture maps where the target pixel is located is different.
In one embodiment, the target pixel points can be understood as the pixel points on the texture maps that correspond to the second sampling pixel point: one target pixel point per texture map, with different texture maps holding different target pixel points. The first pixel coordinate of a target pixel point can be understood as its two-dimensional texture coordinate on the texture map.
Specifically, by inspecting the second sampling pixel point and the plurality of texture maps, the one or more texture maps on which the second sampling pixel point appears are determined, and a click operation on each of these texture maps yields a pixel point corresponding to the second sampling pixel point. If the position of such a pixel point is unsuitable, it can be moved, and the moved pixel point is taken as a target pixel point. A plurality of target pixel points are thus determined, and the surface points of the reference three-dimensional model to which they correspond may differ.
Step 342, determining a second three-dimensional coordinate of the at least one target pixel point on the surface of the reference three-dimensional model based on the first pixel coordinate of the at least one target pixel point and a preset data mapping relationship between the pixel point in the at least one texture map and the surface point of the reference three-dimensional model.
In an embodiment, the preset data mapping relationship can be understood as a mapping between pixel points in the texture maps and surface points of the reference three-dimensional model; that is, the three-dimensional texture coordinate of each surface point of the reference three-dimensional model corresponds one-to-one with a two-dimensional texture coordinate in a texture map, indicating the position of each surface point of the reference three-dimensional model in the texture map. For example, when the image to be annotated is a person image, the preset data mapping relationship may be given by DensePose-COCO, a large-scale annotated dataset used in the prior art to map all human pixels of an RGB image to the three-dimensional surface of the human body, where the RGB image can be understood as the texture map.
In one embodiment, the surface points of the reference three-dimensional model may be understood as vertices of convex polygons on the reference three-dimensional model.
Specifically, the correspondence between two-dimensional texture coordinates and three-dimensional texture coordinates in the preset data mapping relationship is traversed to obtain the second three-dimensional coordinates, on the reference three-dimensional model, of the vertices corresponding to the plurality of target pixel points.
In some possible implementations, the three-dimensional coordinates of the vertices corresponding to the plurality of target pixel points on the reference three-dimensional model are determined, and a surface point is sampled within the surface area spanned by those coordinates, for example the center point, thereby determining the second three-dimensional coordinate.
And step 343, taking the second three-dimensional coordinate as the first three-dimensional coordinate of the second sampling pixel point.
In this embodiment, the event information of the texture maps determines the pixel coordinates of the plurality of target pixel points corresponding to a sampling pixel point, and the three-dimensional coordinates of the sampling pixel point are determined from those pixel coordinates together with the preset data mapping between two-dimensional and three-dimensional texture coordinates. A dense correspondence from the two-dimensional image to the three-dimensional model is thereby established, the positions on the three-dimensional model of the pixel points in the semantic region are determined from it, and the semantic region is associated with the three-dimensional model. A sketch of this lookup follows.
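To make steps 341 to 343 concrete, the sketch below assumes the preset data mapping is stored, per texture map, as parallel arrays of two-dimensional texture coordinates and the three-dimensional coordinates of the corresponding surface points; this storage layout and the nearest-neighbor lookup are assumptions, not the patent's prescription.

```python
import numpy as np

def surface_coordinate(view: str, pixel_uv, mapping: dict) -> np.ndarray:
    """Look up the second three-dimensional coordinate for a target pixel
    point's first pixel coordinate on one texture map (step 342).

    mapping[view]["uv"] is an (N, 2) array of two-dimensional texture
    coordinates; mapping[view]["xyz"] is the matching (N, 3) array of
    surface-point coordinates on the reference three-dimensional model.
    """
    table = mapping[view]
    dists = np.linalg.norm(table["uv"] - np.asarray(pixel_uv), axis=1)
    nearest = int(np.argmin(dists))   # closest stored correspondence
    return table["xyz"][nearest]      # used as the first 3-D coordinate (step 343)
```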
Fig. 6 is a flowchart illustrating a step of obtaining a first pixel coordinate of at least one target pixel corresponding to the second sampling pixel based on third event information corresponding to the at least one texture map in the embodiment shown in fig. 5.
As shown in fig. 6, in an exemplary embodiment of the present application based on the embodiment shown in fig. 5, the step of obtaining, based on the third event information corresponding to the at least one texture map, the first pixel coordinate of the at least one target pixel corresponding to the second sampled pixel shown in step 341 may specifically include the following steps:
Step 3411, determining at least two candidate pixel points based on the third event information corresponding to the at least one texture map, where the at least two candidate pixel points respectively correspond to the same surface points on the reference three-dimensional model, and the texture map where each candidate pixel point is located is different.
Specifically, a surface point of the reference three-dimensional model may correspond to pixel points in several different texture maps; the plurality of candidate pixel points all correspond to the same surface point on the reference three-dimensional model while each lies on a different texture map, which allows the target pixel points corresponding to the second sampling pixel point to be determined more accurately. For example, the target or candidate pixel point corresponding to the first sampling pixel point A in fig. 1 is the pixel point A in fig. 2A; that is, a first sampling pixel point and its corresponding target or candidate pixel points carry the same letter. Fig. 2A shows the target or candidate pixel points corresponding to the first sampling pixel points A, B, C, D, E, F, G, H, I, J of fig. 1. If a mouse click is performed at the position of G in fig. 2A, three candidate pixel points are determined: the candidate pixel point G in fig. 2A, the candidate pixel point G in fig. 2D, and the candidate pixel point G in fig. 2E.
Specifically, by inspecting the second semantic region and the plurality of texture maps, the texture map whose view is closest to the second semantic region is determined, and a mouse click is performed on that texture map. The clicked position is the position of a candidate pixel point, and the other candidate pixel points corresponding to the same surface point of the reference three-dimensional model are displayed on the other texture maps.
Step 3412, based on the fourth event information corresponding to the at least two candidate pixel points, acquiring the first pixel coordinates of at least one target pixel point, where two or more target pixel points correspond to the same surface point on the reference three-dimensional model.
In an embodiment, the fourth event information can be understood as external input information, specifically the position of a candidate pixel point after it has been moved by an external input device. The external input device may be a keyboard: the candidate pixel point is moved with the up, down, left, and right arrow keys, one pixel at a time, so that the target pixel points corresponding to the second sampling pixel point are determined more accurately.
In an embodiment, two or more target pixel points correspond to the same surface point on the reference three-dimensional model, so that the positions of the sampling pixel points corresponding to the surface point on the reference three-dimensional model can be reflected more accurately.
In an embodiment, the candidate pixel points, the target pixel points, and the second sampling pixel point are displayed in the same color, which reduces the possibility of misoperation. For example, the first sampling pixel point J in the second semantic region of fig. 1 and the target pixel points J in figs. 2A, 2D, and 2E have the same display color.
It should be noted that moving a candidate pixel point changes the surface point of the reference three-dimensional model to which it corresponds, which may in turn change the number of candidate pixel points; the number of candidate pixel points may therefore be the same as or different from the number of target pixel points. For example, when the candidate pixel point G in fig. 2A is moved, suppose that the surface point now corresponding to the moved G appears only in fig. 2A; the candidate pixel points G in fig. 2D and fig. 2E then both disappear.
In this embodiment, the moving operation on the plurality of candidate pixel points corresponding to a sampling pixel point on the texture maps determines the first pixel coordinates of the plurality of target pixel points corresponding to that sampling pixel point. Based on these first pixel coordinates, the position on the three-dimensional model of a pixel point in the semantic region can be determined more accurately, so the association between pixel points in the semantic region and the three-dimensional model is established more accurately. A sketch of determining the candidate pixel points follows.
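Building on the lookup sketch above, step 3411 could be realized as follows: a click on one texture map is mapped to its surface point, and every other texture map is searched for a pixel corresponding to the same surface point. The visibility test via a distance threshold is an assumption for illustration.

```python
import numpy as np

def candidate_pixels(clicked_view: str, clicked_uv, mapping: dict,
                     tol: float = 1e-6) -> dict:
    """Return {texture map name: pixel uv} for every candidate pixel point
    corresponding to the same surface point as the clicked pixel."""
    target_xyz = surface_coordinate(clicked_view, clicked_uv, mapping)
    candidates = {clicked_view: tuple(clicked_uv)}
    for view, table in mapping.items():
        if view == clicked_view:
            continue
        dists = np.linalg.norm(table["xyz"] - target_xyz, axis=1)
        i = int(np.argmin(dists))
        if dists[i] <= tol:               # surface point visible on this map
            candidates[view] = tuple(table["uv"][i])
    return candidates
```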
Exemplary apparatus
According to the same concept as the method embodiment of the application, the embodiment of the application also provides a device for acquiring three-dimensional coordinate data.
Fig. 7 is a schematic structural diagram of an apparatus for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
As shown in fig. 7, an apparatus for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application includes:
a first region determining module 71, configured to determine at least one first semantic region in the image to be annotated;
A second region determining module 72, configured to determine, based on first event information corresponding to each first semantic region, a plurality of first sampling pixels in a second semantic region and at least one texture map of a reference three-dimensional model corresponding to the second semantic region, where the second semantic region is any one of the first semantic regions;
a pixel determining module 73, configured to obtain a second sampling pixel based on second event information corresponding to the second semantic region, where the second sampling pixel is any one of the first sampling pixels in the second semantic region;
The coordinate determining module 74 is configured to obtain, based on third event information corresponding to the at least one texture map, a first three-dimensional coordinate of the second sampling pixel point corresponding to the reference three-dimensional model surface.
Fig. 8 is a schematic diagram of a three-dimensional coordinate data acquisition device according to an exemplary embodiment of the present application.
As shown in fig. 8, in an exemplary embodiment, the second area determining module 72 includes:
a first coordinate determining unit 721, configured to obtain second pixel coordinates corresponding to a plurality of pixel points on a closed line of the second semantic region, respectively;
a second coordinate determining unit 722, configured to determine a plurality of third pixel coordinates in the closed line based on second pixel coordinates corresponding to the plurality of pixel points on the closed line, respectively;
The pixel point determining unit 723 is configured to determine a plurality of first sampling pixel points based on the plurality of third pixel coordinates.
As shown in fig. 8, in one exemplary embodiment, the coordinate determination module 74 includes:
A third coordinate determining unit 741, configured to obtain, based on third event information corresponding to the at least one texture map, a first pixel coordinate of at least one target pixel corresponding to the second sampling pixel, where the texture map where each target pixel is located is different;
A fourth coordinate determining unit 742, configured to determine a second three-dimensional coordinate of the at least one target pixel point on the surface of the reference three-dimensional model based on the first pixel coordinate of the at least one target pixel point and a preset data mapping relationship between the pixel point in the at least one texture map and the surface point of the reference three-dimensional model;
And a fifth coordinate determining unit 743, configured to take the second three-dimensional coordinate as the first three-dimensional coordinate of the second sampling pixel point.
Fig. 9 is a schematic structural diagram of a third coordinate determining unit 741 in a schematic structural diagram of an apparatus for acquiring three-dimensional coordinate data according to an exemplary embodiment of the present application.
As shown in fig. 9, in an exemplary embodiment, the third coordinate determination unit 741 includes:
A pixel point determining subunit 7411, configured to determine at least two candidate pixel points based on third event information corresponding to the at least one texture map, where the at least two candidate pixel points respectively correspond to the same surface points on the reference three-dimensional model, and the texture maps where each candidate pixel point is located are different;
The coordinate determining subunit 7412 is configured to obtain, based on fourth event information corresponding to the at least two candidate pixel points, a first pixel coordinate of at least one target pixel point, where two or more target pixel points correspond to the same surface point on the reference three-dimensional model.
Exemplary electronic device
Fig. 10 illustrates a block diagram of an electronic device according to an embodiment of the application.
As shown in fig. 10, the electronic device 100 includes one or more processors 101 and memory 102.
The processor 101 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities and may control other components in the electronic device 100 to perform desired functions.
Memory 102 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 101 to implement the methods of acquiring three-dimensional coordinate data of the various embodiments of the present application described above and/or other desired functions.
In one example, the electronic device 100 may further include: an input device 103 and an output device 104, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
Of course, only some of the components of the electronic device 100 relevant to the present application are shown in fig. 10 for simplicity, components such as buses, input/output interfaces, etc. being omitted. In addition, the electronic device 100 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method of acquiring three-dimensional coordinate data according to the various embodiments of the application described in the "exemplary methods" section of this specification.
The computer program product may write program code for performing operations of embodiments of the present application in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium, on which computer program instructions are stored, which, when being executed by a processor, cause the processor to perform the steps in the method of acquiring three-dimensional coordinate data according to the various embodiments of the present application described in the "exemplary method" section above in the present specification.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be construed as necessarily possessed by the various embodiments of the application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in the present application are only illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown. As will be appreciated by those skilled in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including", "comprising", and "having" are open-ended and mean "including but not limited to"; they may be used interchangeably. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or", unless the context clearly indicates otherwise. The term "such as" refers to, and is used interchangeably with, the phrase "such as, but not limited to".
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent aspects of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (9)

1. A method for acquiring three-dimensional coordinate data comprises the following steps:
Determining at least one first semantic region in the image to be annotated;
Determining a plurality of first sampling pixel points in a second semantic region and at least one texture map of a reference three-dimensional model corresponding to the second semantic region based on first event information corresponding to each first semantic region, wherein the second semantic region is any one of the at least one first semantic region;
Acquiring a second sampling pixel point based on second event information corresponding to the second semantic region, wherein the second sampling pixel point is any one of the plurality of first sampling pixel points;
Acquiring a first three-dimensional coordinate of the second sampling pixel point corresponding to the surface of the reference three-dimensional model based on third event information corresponding to the at least one texture map;
Wherein the determining the plurality of first sampling pixels in the second semantic region includes:
Acquiring second pixel coordinates corresponding to a plurality of pixel points on a closed line of a second semantic region respectively;
Determining a plurality of third pixel coordinates in the closed line based on second pixel coordinates corresponding to the plurality of pixel points on the closed line respectively;
a plurality of first sampled pixel points is determined based on the plurality of third pixel coordinates.
2. The method of claim 1, wherein the obtaining, based on the third event information corresponding to the at least one texture map, the first three-dimensional coordinates of the second sampled pixel point corresponding to the reference three-dimensional model surface comprises:
Acquiring first pixel coordinates of at least one target pixel point corresponding to the second sampling pixel point based on third event information corresponding to the at least one texture map, wherein the texture map of each target pixel point is different;
determining a second three-dimensional coordinate of the at least one target pixel point on the surface of the reference three-dimensional model based on a first pixel coordinate of the at least one target pixel point and a preset data mapping relationship between the pixel point in the at least one texture map and the surface point of the reference three-dimensional model;
and taking the second three-dimensional coordinate as a first three-dimensional coordinate of the second sampling pixel point.
3. The method according to claim 2, wherein the obtaining, based on the third event information corresponding to the at least one texture map, the first pixel coordinates of the at least one target pixel corresponding to the second sampling pixel includes:
Determining at least two candidate pixel points based on third event information corresponding to the at least one texture map, wherein the at least two candidate pixel points respectively correspond to the same surface points on the reference three-dimensional model, and the texture map where each candidate pixel point is located is different;
and acquiring first pixel coordinates of at least one target pixel point based on fourth event information corresponding to the at least two candidate pixel points, wherein two or more target pixel points correspond to the same surface point on the reference three-dimensional model.
4. A method according to claim 3, wherein the candidate pixel point, the target pixel point and the second sampling pixel point are displayed in the same color.
5. The method of claim 1, wherein the first sampling pixels are uniformly distributed within the second semantic region;
and the display colors corresponding to the first sampling pixel points in the second semantic region are different.
6. The method of any of claims 1 to 5, wherein the second semantic region is an enlarged view;
The at least one texture map comprises any one or more of a left-looking texture map, a right-looking texture map, a front-looking texture map, a back-looking texture map, a top-looking texture map, and a bottom-looking texture map of the reference three-dimensional model;
And determining the reference three-dimensional model based on the semantic tags carried by the second semantic region, wherein the same semantic tags correspond to the same reference three-dimensional model.
7. An apparatus for acquiring three-dimensional coordinate data, comprising:
The first region determining module is used for determining at least one first semantic region in the image to be annotated;
The second area determining module is used for determining a plurality of first sampling pixel points in a second semantic area and at least one texture map of a reference three-dimensional model corresponding to the second semantic area based on first event information corresponding to each first semantic area, wherein the second semantic area is any one of the first semantic areas;
The pixel point determining module is used for acquiring a second sampling pixel point based on second event information corresponding to the second semantic region, wherein the second sampling pixel point is any one of the first sampling pixel points in the second semantic region;
The coordinate determining module is used for acquiring a first three-dimensional coordinate of the second sampling pixel point corresponding to the reference three-dimensional model surface based on third event information corresponding to the at least one texture map;
The second region determining module includes: a first coordinate determination unit, a second coordinate determination unit, and a pixel point determination unit;
the first coordinate determination unit is used for: acquiring second pixel coordinates corresponding to a plurality of pixel points on a closed line of a second semantic region respectively;
The second coordinate determination unit is used for: determining a plurality of third pixel coordinates in the closed line based on second pixel coordinates corresponding to the plurality of pixel points on the closed line respectively;
The pixel point determining unit is used for: a plurality of first sampled pixel points is determined based on the plurality of third pixel coordinates.
8. A computer readable storage medium storing a computer program for execution by a processor to implement the method of acquiring three-dimensional coordinate data of any one of claims 1-6.
9. An electronic device, the electronic device comprising:
A processor;
A memory for storing the processor-executable instructions;
The processor is configured to read the executable instructions from the memory and execute the instructions to implement the method for acquiring three-dimensional coordinate data according to any one of claims 1 to 6.
CN202110441111.6A 2021-04-23 2021-04-23 Method and device for acquiring three-dimensional coordinate data Active CN113129362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110441111.6A CN113129362B (en) 2021-04-23 2021-04-23 Method and device for acquiring three-dimensional coordinate data

Publications (2)

Publication Number Publication Date
CN113129362A (en) 2021-07-16
CN113129362B (en) 2024-05-10

Family

ID=76779267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110441111.6A Active CN113129362B (en) 2021-04-23 2021-04-23 Method and device for acquiring three-dimensional coordinate data

Country Status (1)

Country Link
CN (1) CN113129362B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827711B (en) * 2022-06-24 2022-09-20 如你所视(北京)科技有限公司 Image information display method and device
CN115049532B (en) * 2022-08-15 2022-11-22 南京砺算科技有限公司 Graphic processor, texture coordinate sampling method, texture coordinate compiling device, and texture coordinate compiling medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101452582A (en) * 2008-12-18 2009-06-10 北京中星微电子有限公司 Method and device for implementing three-dimensional video specific action
CN109767487A (en) * 2019-01-04 2019-05-17 北京达佳互联信息技术有限公司 Face three-dimensional rebuilding method, device, electronic equipment and storage medium
CN112132739A (en) * 2019-06-24 2020-12-25 北京眼神智能科技有限公司 3D reconstruction and human face posture normalization method, device, storage medium and equipment
CN112287820A (en) * 2020-10-28 2021-01-29 广州虎牙科技有限公司 Face detection neural network, face detection neural network training method, face detection method and storage medium

Also Published As

Publication number Publication date
CN113129362A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
US6333749B1 (en) Method and apparatus for image assisted modeling of three-dimensional scenes
US9495802B2 (en) Position identification method and system
JP7337104B2 (en) Model animation multi-plane interaction method, apparatus, device and storage medium by augmented reality
US7193633B1 (en) Method and apparatus for image assisted modeling of three-dimensional scenes
CA2040273C (en) Image displaying system
CN113129362B (en) Method and device for acquiring three-dimensional coordinate data
CN110956695B (en) Information processing apparatus, information processing method, and storage medium
US8793108B2 (en) Three-dimensional model determination from two-dimensional sketch with two-dimensional refinement
US9892485B2 (en) System and method for mesh distance based geometry deformation
US11989900B2 (en) Object recognition neural network for amodal center prediction
CN112766027A (en) Image processing method, device, equipment and storage medium
CN110428504B (en) Text image synthesis method, apparatus, computer device and storage medium
US11074747B2 (en) Computer-aided techniques for designing detailed three-dimensional objects
US9734616B1 (en) Tetrahedral volumes from segmented bounding boxes of a subdivision
Vlasov et al. Haptic rendering of volume data with collision determination guarantee using ray casting and implicit surface representation
CN114820980A (en) Three-dimensional reconstruction method and device, electronic equipment and readable storage medium
JP6564259B2 (en) Image processing apparatus and image processing method
Merckel et al. Multi-interfaces approach to situated knowledge management for complex instruments: First step toward industrial deployment
EP3779878A1 (en) Method and device for combining a texture with an artificial object
JP2023527438A (en) Geometry Recognition Augmented Reality Effect Using Real-time Depth Map
KR20210023663A (en) Image processing method and image processing apparatus for generating 3d content using 2d images
JP2021039563A (en) Program, device, and method for applying label to depth image as teacher data
JP2020013390A (en) Information processing apparatus, information processing program, and information processing method
CN117456550B (en) MR-based CAD file viewing method, device, medium and equipment
US20220284667A1 (en) Image processing method and image processing device for generating 3d content by means of 2d images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant