CN114089836A - Labeling method, terminal, server and storage medium - Google Patents

Labeling method, terminal, server and storage medium

Info

Publication number
CN114089836A
Authority
CN
China
Prior art keywords
labeling
terminal
target
connecting line
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210063444.4A
Other languages
Chinese (zh)
Other versions
CN114089836B (en)
Inventor
张增杰
施文哲
夏宏飞
周琴芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Nanjing Co., Ltd.
ZTE Corp
Original Assignee
ZTE Nanjing Co., Ltd.
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Nanjing Co., Ltd., ZTE Corp
Priority to CN202210063444.4A
Publication of CN114089836A
Application granted
Publication of CN114089836B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04845: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range, for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/006: Mixed reality

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the invention provide a labeling method, a terminal, a server and a storage medium. A positioning image is acquired and sent to the server, so that the server obtains pose data of the terminal in a virtual scene corresponding to a point cloud map according to the positioning image; the pose data sent by the server is received, and a virtual connecting line between the terminal and the virtual scene is generated and displayed according to the pose data; a target labeling object is then determined in the virtual scene according to the virtual connecting line and labeled. Because the virtual connecting line provides interactive feedback between vision and the environment during labeling, the target labeling object can be labeled accurately and quickly, and the labeling efficiency is improved.

Description

Labeling method, terminal, server and storage medium
Technical Field
The present invention relates to, but is not limited to, the field of digital twin technologies, and in particular to an annotation method, a terminal, a server, and a storage medium.
Background
With the development of AR (Augmented Reality) cloud technology, large-scale three-dimensional point cloud maps of the environment have been constructed with laser scanners or panoramic cameras. These point cloud maps support centimeter-level positioning and are used to build digital twins at the business-district, scenic-spot and even city level. When people explore the real environment through a mobile phone or MR (Mixed Reality) glasses, digital content highly relevant to the environment can be matched automatically, so the whole space becomes a screen and an information entrance shared by everyone, which will profoundly change how information is obtained.
However, practical applications involve arranging and mapping massive amounts of digital content, which needs to be updated and maintained in real time, continuously corrected and extended, and labeled collaboratively by many users.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the invention provides a labeling method, a terminal, a server and a storage medium, which can improve the labeling efficiency.
In a first aspect, an embodiment of the present invention provides a labeling method, which is applied to a terminal, where the labeling method includes:
acquiring a positioning image, and sending the positioning image to a server so that the server can obtain pose data of the terminal in a virtual scene corresponding to a point cloud map according to the positioning image;
receiving the pose data sent by the server, and generating and displaying a virtual connecting line between the terminal and the virtual scene according to the pose data;
and determining a target labeling object in the virtual scene according to the virtual connecting line, and labeling the target labeling object.
In a second aspect, an embodiment of the present invention further provides a labeling method, applied to a server, where the labeling method includes:
receiving a positioning image sent by a terminal, and obtaining pose data of the terminal in a virtual scene corresponding to a point cloud map according to the positioning image;
and sending the pose data to the terminal so that the terminal generates and displays a virtual connecting line between the terminal and the virtual scene according to the pose data, and the terminal determines a target labeling object in the virtual scene according to the virtual connecting line and labels the target labeling object.
In a third aspect, an embodiment of the present invention further provides a terminal, including a memory and a processor, where the memory stores a computer program, and the processor implements the annotation method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a server, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the annotation method according to the second aspect when executing the computer program.
In a fifth aspect, the present invention further provides a computer-readable storage medium, where the storage medium stores a program, and the program, when executed by a processor, implements the annotation method according to the first aspect or the second aspect.
The embodiments of the invention have at least the following beneficial effects: a positioning image is acquired and sent to a server, so that the server obtains pose data of the terminal in a virtual scene corresponding to a point cloud map according to the positioning image; the pose data sent by the server is received, and a virtual connecting line between the terminal and the virtual scene is generated and displayed according to the pose data; a target labeling object is then determined in the virtual scene according to the virtual connecting line and labeled. Because the virtual connecting line provides interactive feedback between vision and the environment during labeling, the target labeling object can be labeled accurately and quickly, and the labeling efficiency is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without limiting it.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention;
fig. 2 is a flowchart of a labeling method applied to a terminal according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a method for determining a target annotation object according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of determining a location of a label according to an embodiment of the present invention;
fig. 5 is a schematic diagram of magnifying a local area corresponding to an intersection point between a virtual connecting line and a target annotation object according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an annotation process provided in an embodiment of the present invention;
FIG. 7 is a flowchart of an annotation method applied to a server according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a complete labeling process according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the orientation description, such as "up", "down", etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, but does not indicate or imply that the device or element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms "first," "second," and the like in the description, in the claims, or in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that unless otherwise specifically limited, the terms "mounted" and "connected" are used in a broad sense, and those skilled in the art can reasonably determine the specific meaning of the above terms in the present invention by combining the specific contents of the technical solutions.
With the development of AR cloud technology, large-scale three-dimensional point cloud maps of the environment have been constructed with laser scanners or panoramic cameras. These point cloud maps support centimeter-level positioning and are used to build digital twins at the business-district, scenic-spot and even city level. When people explore the real environment through mobile phones or MR glasses, digital content highly relevant to the environment can be matched automatically, and the whole space becomes a screen and an information entrance shared by everyone, which will profoundly change how information is obtained.
However, practical applications involve arranging and mapping massive amounts of digital content, which needs to be updated and maintained in real time, continuously corrected and extended, and labeled collaboratively by many users. The current implementations mainly fall into the following two modes:
one method is that a three-dimensional dense point cloud map is loaded by using 3D editing software at a PC terminal for editing, because the point cloud map lacks geographical and semantic information, an operator needs to continuously search deployment points and adjust content positions, although the point cloud map is accurate, the arrangement efficiency is low, the use threshold is high, and the co-construction requirements of common users cannot be met;
the other method is that the marked content is placed at an approximate position by clicking a screen through fingers at a moving end, then the posture of the marked content is adjusted through visual inspection by using several gestures, for example, a single finger drags up and down, left and right to adjust the space XY axis coordinate of the marked content, a double finger rotates to adjust the orientation of the marked content, a three finger drags up and down to adjust the Z axis coordinate, and the size of the marked content is adjusted through kneading and zooming, the adjustment process is complicated and inaccurate, fine adjustment is usually performed through inputting coordinate values, rotation values and amplification values, the numerical values need to be tried repeatedly, and the method is very inefficient. In addition, when the method is used in a large space, due to the distance, a annotator cannot judge the orientation of the annotation content in the three-dimensional space as a whole by direct visual observation as when facing a small scene, and can only observe the target from various angles by continuously moving back and forth in a large range to adjust, so that the annotation is more inefficient. Therefore, although the current mobile end labeling mode can support crowdsourcing standards, the current mobile end labeling mode is only suitable for labeling a small amount of content in a desktop and room level small space and cannot meet the requirements of accurate and efficient deployment in a large space.
Based on this, the embodiment of the invention provides a labeling method, a terminal, a server and a storage medium, which can improve the labeling efficiency.
The terminal in the embodiment of the present invention may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, and the like, but is not limited thereto.
Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture provided in an embodiment of the present invention. The network architecture includes a user side and a cloud side, where the user side is the terminal side and the cloud side is the server side. Before the embodiment of the invention is implemented, environmental data needs to be collected on a large scale with a laser scanner or a panoramic camera; the server generates a sparse point cloud map, a dense point cloud map and its Mesh (surface mesh) data from the environmental data through algorithms, and performs panoptic segmentation on the dense point cloud map. The sparse point cloud map is used for positioning the device; the Mesh data of the dense point cloud map is used for fitting labeled content to targets and for occluding the content; the panoptic segmentation is implemented by performing semantic segmentation and target detection on the environment by combining RGB images with the dense point cloud map.
Specifically, the sparse point cloud map, i.e., the Sparse 3D Point-cloud Map, is a three-dimensional spatial structure obtained by back-projecting a small number of two-dimensional feature points of images into three-dimensional space; the dense point cloud map, i.e., the Dense 3D Point-cloud Map, is generated by performing depth fusion and texture mapping for every pixel on the basis of the sparse point cloud map; and the Mesh data is a representation of the environment surface covered by triangular meshes.
The server is provided with a spatial positioning module, a Mesh generation module, a panoptic segmentation module, a labeling resource template module, a labeling auditing module and a labeling rule configuration module. The spatial positioning module acquires the image frames uploaded by the terminal, extracts image feature points, finds a similar target reference image by comparing the cosine similarity between the received image feature vector and the feature vectors of the candidate reference images in the cloud, calculates the 6-DOF (Six Degrees Of Freedom) pose of the current terminal in the point cloud map through feature matching between images and the sparse point cloud structure, and returns the pose to the terminal through an interface. The 6-DOF pose combines the 3 translational degrees of freedom and the 3 rotational degrees of freedom of the object in space.
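For illustration only (not part of the embodiments themselves), the reference-image retrieval step of the spatial positioning module could be sketched as follows in Python; the descriptor shapes and the function name are assumptions made for the example, and a production system would typically use a dedicated retrieval index.

```python
import numpy as np

def find_target_reference(query_desc: np.ndarray, candidate_descs: np.ndarray) -> int:
    """Return the index of the candidate reference image whose feature
    vector has the highest cosine similarity to the query feature vector.

    query_desc:      (D,) feature vector of the uploaded image frame
    candidate_descs: (N, D) feature vectors of the cloud-side reference images
    """
    q = query_desc / np.linalg.norm(query_desc)
    c = candidate_descs / np.linalg.norm(candidate_descs, axis=1, keepdims=True)
    similarities = c @ q            # cosine similarity of each candidate
    return int(np.argmax(similarities))
```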
The Mesh generation module returns, when the spatial positioning of the terminal is completed, the Mesh data within a 180-degree range in the direction the terminal camera is facing, according to the 6-DOF pose of the terminal; the data consist of a custom array of Mesh vertex coordinates and an index array.
The panoptic segmentation module returns, when the spatial positioning of the terminal is completed, the target bbox data within a 180-degree range in the direction the terminal camera is facing, according to the 6-DOF pose of the terminal; the data include the target class label and the coordinate information of the three-dimensional target frame. bbox, short for bounding box, is the bounding box of a detected object in computer vision.
The labeling resource template module stores labeling templates of different target labeling objects by category, such as POI (Point Of Interest) labels of buildings and voice explanation labels of scenic spots; after the user recognizes the environment, the corresponding template is matched automatically to allow quick editing.
The labeling rule configuration module is used for defining rules for the size, position and fitting relation between the labeled content and the labeled object, so that the labeled content is automatically snapped onto the labeled object and adjusted to a proper state without further manual adjustment by the user.
The labeling auditing module is used for verifying the labeled content to ensure that it is accurate and does not violate the rules.
Based on the network architecture shown in fig. 1, referring to fig. 2, fig. 2 is a flowchart of a labeling method applied to a terminal according to an embodiment of the present invention, where the labeling method includes, but is not limited to, the following steps 201 to 203.
Step 201: acquiring a positioning image, and sending the positioning image to a server so that the server can obtain pose data of the terminal in a virtual scene corresponding to the point cloud map according to the positioning image;
step 202: receiving pose data sent by a server, and generating and displaying a virtual connecting line between the terminal and a virtual scene according to the pose data;
step 203: and determining a target labeling object in the virtual scene according to the virtual connecting line, and labeling the target labeling object.
The positioning image can be acquired through a camera of the terminal, the terminal sends the positioning image to the server after acquiring the positioning image, the server acquires pose data of the terminal in a virtual scene corresponding to the point cloud map according to the positioning image, and the pose data can be 6-DOF pose data.
And then, the terminal generates a virtual connecting line according to the pose data, and the virtual connecting line is used for scanning and detecting a virtual scene to assist a user in marking. In specific application, a terminal is provided with a labeling application program, after the user starts the labeling application program, a virtual scene in a point cloud map is displayed in a program interface of the labeling application program, the virtual scene with the terminal as a main view angle is displayed according to the current pose data of the terminal, and a target labeling object is also displayed in the virtual scene. It can be understood that the terminal can continuously acquire the positioning images and send the positioning images to the server, so that the virtual scene displayed in the current program interface can change along with the change of the pose of the terminal.
After the terminal generates the virtual connecting line, the virtual connecting line is further displayed in a program interface, so that the determination of the target labeling object is visually assisted by a user, and the specific implementation form of the virtual connecting line can include but is not limited to a virtual searchlight, a detection ray or other invisible/transparent connecting lines.
By acquiring the positioning image and sending it to the server, the server obtains pose data of the terminal in the virtual scene corresponding to the point cloud map according to the positioning image; the terminal receives the pose data sent by the server, generates and displays a virtual connecting line between the terminal and the virtual scene according to the pose data, determines a target labeling object in the virtual scene according to the virtual connecting line, and labels the target labeling object. The virtual connecting line thus provides interactive feedback between vision and the virtual scene during labeling, so the target labeling object can be labeled accurately and quickly, and the labeling efficiency is improved.
In an embodiment, in step 202, a virtual connection line between the terminal and the virtual scene is generated and displayed according to the pose data, and specifically, the virtual connection line between the terminal and the virtual scene is drawn from the terminal to the virtual scene corresponding to the point cloud map according to the pose data, and the virtual connection line is displayed.
In an embodiment, the virtual connecting line may start from the center of the screen of the terminal and be perpendicular to the screen, so that when the virtual connecting line is subsequently used for scanning and probing the virtual scene, the positional relationship between the terminal and the virtual scene can be depicted accurately, improving the accuracy of the subsequent determination of the target annotation object. It is understood that the virtual connecting line is displayed on the screen of the terminal, so from the perspective of a user holding the terminal it extends perpendicularly from the screen into the scene. It can also be understood that the starting point of the virtual connecting line and its angle relative to the screen center can be chosen according to actual requirements.
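For illustration, a minimal sketch of how such a virtual connecting line could be constructed from the 6-DOF pose is given below in Python; the camera forward axis (-Z), the default length and the function name are assumptions of the example, not requirements of the embodiment.

```python
import numpy as np

def build_virtual_line(position: np.ndarray, rotation: np.ndarray, length: float = 100.0):
    """Construct the virtual connecting line as a ray in map coordinates.

    position: (3,) translation part of the terminal's 6-DOF pose
    rotation: (3, 3) rotation part of the terminal's 6-DOF pose
    The line starts at the screen centre (approximated by the camera position)
    and points along the camera's forward axis, here assumed to be -Z.
    """
    forward_cam = np.array([0.0, 0.0, -1.0])       # assumed camera forward axis
    direction = rotation @ forward_cam
    direction /= np.linalg.norm(direction)
    start = position
    end = position + length * direction            # far end drawn into the scene
    return start, end
```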
In an embodiment, in step 203, the target annotation object is determined in the virtual scene according to the virtual connecting line. Specifically, bounding box data of candidate annotation objects in the virtual scene sent by the server is received; the intersection state between the virtual connecting line and the bounding box of a candidate annotation object is determined according to the bounding box data; and when the intersection state indicates that the virtual connecting line intersects any boundary surface of the candidate annotation object, that candidate annotation object is taken as the target annotation object.
After generating the point cloud map, the server can perform panoptic segmentation and generate a bbox for each building, which is rendered and displayed as a cube or cuboid; the cube or cuboid corresponding to each building can serve as a candidate labeling object, and any boundary surface of the candidate labeling object is one face of the corresponding cube or cuboid.
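A minimal sketch of the intersection test is shown below, assuming an axis-aligned bounding box and Python with numpy; if the three-dimensional target frame returned by the server is oriented, the line would first be transformed into the box's local frame. The function name and tolerance are illustrative.

```python
import numpy as np

def ray_intersects_bbox(origin, direction, bbox_min, bbox_max) -> bool:
    """Slab test: does the virtual connecting line hit any boundary surface
    of the candidate labeling object's bounding box?"""
    t_min, t_max = 0.0, np.inf
    for axis in range(3):
        if abs(direction[axis]) < 1e-9:            # line parallel to this slab
            if origin[axis] < bbox_min[axis] or origin[axis] > bbox_max[axis]:
                return False
        else:
            t1 = (bbox_min[axis] - origin[axis]) / direction[axis]
            t2 = (bbox_max[axis] - origin[axis]) / direction[axis]
            t_min = max(t_min, min(t1, t2))
            t_max = min(t_max, max(t1, t2))
            if t_min > t_max:
                return False
    return True
```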
For example, referring to fig. 3, fig. 3 is a schematic diagram of determining a target annotation object according to an embodiment of the present invention, where the virtual connecting line intersects one boundary surface of a candidate annotation object 301, so the candidate annotation object 301 is the target annotation object. On this basis, the target annotation object intersected by the virtual connecting line may be highlighted, for example by highlight rendering or by displaying a label. Highlighting the target annotation object shows the current target annotation object more intuitively, so that the user can determine the target annotation object more accurately when labeling, which improves the labeling efficiency.
In an embodiment, in step 203, the target annotation object is determined in the virtual scene according to the virtual connecting line. Specifically, the dwell time of the virtual connecting line on a candidate annotation object may also be determined, and when the dwell time is greater than or equal to a preset first threshold, the candidate annotation object is taken as the target annotation object.
For example, the first threshold may be set to 2 seconds; when the dwell time of the virtual connecting line on the candidate annotation object reaches 2 seconds, the candidate annotation object becomes the target annotation object. Determining the target annotation object automatically from the dwell time requires no manual selection by the user and improves the labeling efficiency. It is understood that the first threshold may be set according to the actual situation, for example 2.5 seconds or 3 seconds instead of 2 seconds, and the embodiment of the present invention is not limited in this respect.
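The dwell-time selection can be sketched as follows (illustrative Python only; the class name and the per-frame update scheme are assumptions of the example):

```python
import time

class DwellSelector:
    """Select a candidate as the target once the virtual connecting line
    has stayed on it for at least `threshold_s` seconds."""

    def __init__(self, threshold_s: float = 2.0):
        self.threshold_s = threshold_s
        self.current_id = None
        self.enter_time = 0.0

    def update(self, hovered_id):
        """Call once per frame with the id of the candidate currently hit by
        the virtual connecting line (or None); returns the target id once
        the dwell threshold is reached, otherwise None."""
        now = time.monotonic()
        if hovered_id != self.current_id:
            self.current_id = hovered_id
            self.enter_time = now
            return None
        if hovered_id is not None and now - self.enter_time >= self.threshold_s:
            return hovered_id
        return None
```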
It should be added that when the target annotation object is determined automatically from the dwell time, it can also be highlighted in the same way.
It can be understood that the two manners described above, i.e., determining the target annotation object through the intersection state with the bounding box of a candidate annotation object and determining it automatically through the dwell time, may be combined, that is, the target annotation object may be determined according to both the intersection state and the dwell time, which is not limited in the embodiment of the present invention.
In an embodiment, in step 203, the target annotation object is annotated. Specifically, environment grid data of the virtual scene sent by the server is received; the intersection point between the virtual connecting line and the target annotation object is determined according to the environment grid data; the intersection point is highlighted; and the target annotation object is annotated according to the highlighted intersection point.
Specifically, the environment grid data may be the Mesh data described above, and the intersection point between the virtual connecting line and the target annotation object is used to determine the specific annotation position; that is, the bounding box data is used to determine the target annotation object, while the environment grid data is used to determine the specific annotation position. At the same time, the intersection point is highlighted so that the specific annotation position is shown more intuitively and the user can determine the position to be annotated more accurately, which improves the labeling efficiency.
Similarly, the intersection point may be highlighted by highlight rendering, label display and the like, and the embodiment of the present invention is not limited thereto. For example, referring to fig. 4, fig. 4 is a schematic diagram of determining a labeling position according to an embodiment of the present invention, where the virtual connecting line intersects the Mesh of a local environment surface in the virtual scene, and the intersection point of the virtual connecting line and the Mesh may be highlighted so that the user can determine the labeling position more accurately.
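For illustration, one way to compute the intersection point from the custom vertex and index arrays is a per-triangle ray test (Moller-Trumbore); the sketch below is an assumption of the example rather than the embodiment's prescribed algorithm, and a real implementation would normally add a spatial acceleration structure.

```python
import numpy as np

def ray_mesh_intersection(origin, direction, vertices, indices):
    """Return the nearest intersection point between the virtual connecting
    line and the environment Mesh (vertex array + triangle index array),
    or None if there is no hit."""
    best_t = np.inf
    hit = None
    for i0, i1, i2 in indices:
        v0, v1, v2 = vertices[i0], vertices[i1], vertices[i2]
        e1, e2 = v1 - v0, v2 - v0
        p = np.cross(direction, e2)
        det = e1 @ p
        if abs(det) < 1e-9:                 # line parallel to triangle plane
            continue
        inv_det = 1.0 / det
        s = origin - v0
        u = (s @ p) * inv_det
        if u < 0.0 or u > 1.0:
            continue
        q = np.cross(s, e1)
        v = (direction @ q) * inv_det
        if v < 0.0 or u + v > 1.0:
            continue
        t = (e2 @ q) * inv_det
        if 1e-6 < t < best_t:               # keep the closest hit in front
            best_t = t
            hit = origin + t * direction
    return hit
```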
In one embodiment, when the terminal moves, the virtual connecting line moves with it, and the intersection point of the virtual connecting line with the candidate annotation object and the intersection point with the local environment surface Mesh in the virtual scene also follow the movement of the terminal. For example, the tracking display may be implemented by VIO (Visual-Inertial Odometry).
In an embodiment, when the target annotation object is annotated according to the highlighted intersection point between the virtual connecting line and the target annotation object, a distance value between the terminal and the target annotation object may first be determined; when the distance value is greater than or equal to a preset second threshold, a local area corresponding to the highlighted intersection point is determined and magnified, and the target annotation object is annotated according to the intersection point in the magnified local area.
For example, referring to fig. 5, fig. 5 is a schematic diagram of magnifying a local area corresponding to the intersection point between the virtual connecting line and the target annotation object according to an embodiment of the present invention. The second threshold may be 50 meters, i.e., when the distance between the terminal and the target annotation object is greater than or equal to 50 meters, the target annotation object is considered far from the terminal, and the local area corresponding to the intersection point may then be magnified so that the user can confirm the annotation position more accurately. It is to be understood that the second threshold may also be 100 meters, 150 meters and so on, and the embodiment of the present invention is not limited thereto. In addition, the size of the local area corresponding to the intersection point can be set according to the actual situation, as long as the intersection point between the virtual connecting line and the target annotation object can be displayed clearly. When the virtual connecting line moves away, the magnification effect is automatically cancelled until the virtual connecting line intersects the next target annotation object that meets the distance condition.
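The distance check itself is straightforward; a sketch under the 50-meter threshold mentioned above (the function name is illustrative):

```python
import numpy as np

def should_magnify(terminal_pos, hit_point, threshold_m: float = 50.0) -> bool:
    """Magnify the local area around the intersection point only when the
    target annotation object is at least `threshold_m` meters from the terminal."""
    distance = np.linalg.norm(np.asarray(hit_point) - np.asarray(terminal_pos))
    return distance >= threshold_m
```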
In an embodiment, referring to fig. 6, fig. 6 is a schematic diagram of a labeling process provided in an embodiment of the present invention. When the target labeled object is labeled according to the highlighted intersection point, the labeled content is first obtained; in response to a projection operation instruction for the labeled content, the labeled content is transmitted along the virtual connecting line to the highlighted intersection point, a first transmission animation is displayed while the labeled content is being transmitted, and the labeled content is labeled onto the surface of the target labeled object.
Specifically, the projection operation instruction confirms that the target labeled object is to be labeled with the labeled content. That is, after the user confirms the labeling, the labeled content is transmitted along the virtual connecting line in the virtual scene to the intersection point between the virtual connecting line and the target labeled object, and the first transmission animation is displayed during the transmission, so that the labeling process is shown more intuitively, the user's sense of labeling is enhanced, and the user can easily confirm that the labeling is completed.
In an embodiment, when obtaining the annotation content, the bounding box data of the candidate annotation object in the virtual scene sent by the server may be received; the category of the annotation content is determined according to the bounding box data, and the corresponding annotation template is displayed according to the category; an annotation operation instruction input according to the annotation template is then received, and the annotation content is obtained according to the annotation operation instruction.
Specifically, different categories of labeled content, such as a building POI or a store IP (intellectual property) image, may be preset for different bboxes, and the embodiment of the present invention is not limited thereto. The category of the labeled content is determined from the bounding box data during labeling, and the corresponding labeling template is then displayed according to the category, which makes it more convenient for the user to label the target labeling object and makes labeling more efficient. Moreover, the labeled content can be one or a combination of text, pictures, video and audio, which enriches the labeled content.
In an embodiment, a preset labeling rule may also be obtained, and the labeling manner of the labeled content is adjusted according to the labeling rule. The labeling manner includes, but is not limited to, the size of the labeled content, the labeling position, and the rule for attaching the labeled content to the Mesh; the labeling rule may be defined in the annotation content management of the back-end server. For example, a building POI is attached to the building surface, and a store IP image stands 0.5 m in front of the store, with its size automatically scaled in proportion to the bbox. By obtaining the labeling rule, the labeling manner of the labeled content can be adjusted automatically according to the rule, making the labeled content more standardized.
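A sketch of how such rules could be applied automatically is given below; the rule table, category keys and numeric values are illustrative assumptions (only the building-POI and store-IP examples above come from the description):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LabelRule:
    offset_m: float       # distance to stand in front of the labeled surface
    scale_factor: float   # label size as a fraction of the bbox height

# Illustrative rule table; the values are assumptions of the example.
RULES = {
    "building_poi": LabelRule(offset_m=0.0, scale_factor=0.05),  # attach to surface
    "store_ip":     LabelRule(offset_m=0.5, scale_factor=0.10),  # stand 0.5 m in front
}

def apply_label_rule(category, hit_point, surface_normal, bbox_height_m):
    """Place and size the labeled content automatically from the configured rule."""
    rule = RULES[category]
    position = np.asarray(hit_point) + rule.offset_m * np.asarray(surface_normal)
    size = rule.scale_factor * bbox_height_m      # scale in proportion to the bbox
    return position, size
```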
In an embodiment, after the annotation is completed, the user may also view the annotated content, specifically, in response to a viewing operation instruction for the annotated content, transmit the annotated content on the surface of the target annotated object to the starting point of the virtual connecting line along the virtual connecting line, and display a second transmission animation in the process of transmitting the annotated content. The principle of the second transmission animation is similar to that of the first transmission animation, and the transmission directions are different. Similarly, when the annotation content is viewed, the annotation content can be more intuitively displayed by displaying the second transmission animation.
On the basis, besides the checking of the marked content, the marked content can be edited, so that the marked content can be modified conveniently and quickly, and the reliability of the marked content is improved.
In addition, during the transfer of the annotation content, its size can be adjusted according to its distance from the terminal screen: when the annotation content is transferred from the terminal screen along the virtual connecting line to the surface of the target annotation object, it is enlarged as the transfer distance increases, and when it is transferred back from the target annotation object to the terminal screen along the virtual connecting line, it is reduced as the transfer distance increases.
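A minimal sketch of this size adjustment, assuming a simple linear interpolation along the line (the scale values are illustrative defaults, not taken from the embodiment):

```python
def label_scale_during_transfer(travelled_m: float, total_m: float,
                                near_scale: float = 0.2, far_scale: float = 1.0) -> float:
    """Interpolate the size of the annotation content while it travels along
    the virtual connecting line: it grows on the way out to the target
    annotation object and shrinks on the way back to the screen."""
    t = max(0.0, min(1.0, travelled_m / total_m))
    return near_scale + t * (far_scale - near_scale)
```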
After the target labeling object has been labeled, the labeled content can be verified automatically according to the verification rules. Once the labeled content passes verification, it is persisted in the virtual scene, and other users can quickly obtain all the labeled information when they access it with the application, thereby realizing crowdsourcing and sharing of digital twin content in large scenes.
In addition, referring to fig. 7, fig. 7 is a flowchart of a labeling method applied to a server according to an embodiment of the present invention, where the labeling method includes, but is not limited to, the following steps 701 to 702.
Step 701: receiving a positioning image sent by a terminal, and obtaining pose data of the terminal in a virtual scene corresponding to a point cloud map according to the positioning image;
step 702: and sending the pose data to the terminal so that the terminal generates and displays a virtual connecting line between the terminal and the virtual scene according to the pose data, and the terminal determines a target labeling object in the virtual scene according to the virtual connecting line so as to label the target labeling object.
The labeling method applied to the server and the labeling method applied to the terminal are based on the same inventive concept, so that interactive feedback of vision and a virtual scene can be realized through a virtual connecting line during labeling, accurate and rapid labeling of a target labeling object is achieved, and the labeling efficiency is improved.
In an embodiment, in step 701, the pose data of the terminal in the virtual scene corresponding to the point cloud map is obtained according to the positioning image. Specifically, a first image feature of the positioning image may be extracted; the first image feature is compared with the second image features of multiple candidate reference images, and a target reference image is determined from the candidate reference images according to the comparison result; the pose data of the terminal in the point cloud map is then obtained according to the second image feature of the target reference image and the sparse point cloud structure.
Specifically, the first image feature and the second image features may be represented as feature vectors, and comparing the first image feature with the second image features of the candidate reference images may consist of comparing the similarity of the corresponding feature vectors, for example the cosine similarity or the Euclidean distance. By comparing the similarity between the first image feature and the second image features, a target reference image similar to the positioning image is found among the candidate reference images, so the pose information of the terminal can be calculated quickly.
It should be added that the principle of calculating the pose information from the matched features and the sparse point cloud structure is known to those skilled in the art and is not described herein again.
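For illustration only, one common way to recover the 6-DOF pose once 2D feature points of the positioning image have been matched to 3D points of the sparse point cloud is PnP with RANSAC; the embodiment does not mandate this particular approach, and the sketch below assumes Python with OpenCV and numpy.

```python
import numpy as np
import cv2

def estimate_pose_pnp(points_3d, points_2d, camera_matrix):
    """Recover the terminal's 6-DOF pose from matched 2D-3D correspondences.

    points_3d:     (N, 3) matched points of the sparse point cloud
    points_2d:     (N, 2) corresponding pixel coordinates in the positioning image
    camera_matrix: (3, 3) intrinsic matrix of the terminal camera
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        np.asarray(camera_matrix, dtype=np.float64),
        distCoeffs=None,
    )
    if not ok:
        return None
    rotation, _ = cv2.Rodrigues(rvec)    # 3 rotational degrees of freedom
    return rotation, tvec.reshape(3)     # plus 3 translational degrees of freedom
```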
The labeling method provided by the embodiment of the invention is described in a complete flow.
Referring to fig. 8, fig. 8 is a schematic diagram of a complete annotation process provided by an embodiment of the present invention. On the administrator (server) side, the point cloud map is uploaded to the server, the spatial positioning service is started, then the Mesh JSON (JavaScript Object Notation) data service is started, and the target bbox data service is started. In addition, the annotation resources need to be maintained, the annotation display rules need to be configured, and annotation verification needs to be performed.
On the user (terminal) side, the camera of the terminal is first opened, and a picture frame of the positioning image is acquired and uploaded to the server; the server returns the corresponding 6-DOF pose to the terminal, and spatial positioning is performed according to the 6-DOF pose. If the spatial positioning fails, a picture frame of the positioning image is acquired and uploaded again; if it succeeds, the detection connecting line is opened to scan the environment, the surrounding Mesh data are loaded through the Mesh JSON data service of the server, and the surrounding bbox data are loaded through the target bbox data service of the server. Collision and occlusion detection is then performed: if no collision object (target labeling object) is detected, the environment continues to be scanned; if a collision object is detected, distance detection is performed, no magnification is applied if the distance is less than 50 meters, and magnification is applied if the distance is greater than 50 meters. The labeling template is then loaded, the user edits the labeled content, labeling projection is performed after the labeled content is confirmed, and the attached target labeled object is displayed according to the preconfigured labeling display rule.
It will be understood that, although the steps in the respective flowcharts described above are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not performed in a strict order unless explicitly stated in the present embodiment, and may be performed in other orders. Moreover, at least a part of the steps in the above-mentioned flowcharts may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of performing the steps or the stages is not necessarily performed in sequence, but may be performed alternately or alternately with other steps or at least a part of the steps or the stages in other steps.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present invention. The terminal 900 includes: a first memory 901, a first processor 902 and a computer program stored on the first memory 901 and executable on the first processor 902, the computer program being operative to perform the above-mentioned labeling method.
The first processor 902 and the first memory 901 may be connected by a bus or other means.
The first memory 901 is a non-transitory computer readable storage medium, and can be used for storing a non-transitory software program and a non-transitory computer executable program, such as the annotation method described in the embodiment of the present invention. The first processor 902 implements the above-described labeling method by running a non-transitory software program and instructions stored in the first memory 901.
The first memory 901 may include a storage program area and a storage data area, where the storage program area may store an operating system and the application program required for at least one function, and the storage data area may store data for performing the labeling method described above. Further, the first memory 901 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state memory device. In some embodiments, the first memory 901 may optionally include memory located remotely from the first processor 902, which may be connected to the terminal 900 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions needed to implement the annotation methods described above are stored in the first memory 901 and, when executed by the one or more first processors 902, perform the annotation methods described above.
The embodiment of the invention also provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions are used for executing the labeling method.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 1000 includes: a second memory 1001, a second processor 1002 and a computer program stored on the second memory 1001 and executable on the second processor 1002, the computer program being operable to perform the above-mentioned labeling method.
The second processor 1002 and the second memory 1001 may be connected by a bus or other means.
The second memory 1001 is a non-transitory computer readable storage medium, and can be used for storing a non-transitory software program and a non-transitory computer executable program, such as the annotation method described in the embodiment of the present invention. The second processor 1002 implements the above-described labeling method by running a non-transitory software program and instructions stored in the second memory 1001.
The second memory 1001 may include a storage program area and a storage data area, where the storage program area may store an operating system and the application program required for at least one function, and the storage data area may store data for performing the labeling method described above. In addition, the second memory 1001 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state memory device. In some embodiments, the second memory 1001 optionally includes memory located remotely from the second processor 1002, and such remote memory may be connected to the server 1000 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions needed to implement the annotation methods described above are stored in the second memory 1001 and, when executed by the one or more second processors 1002, perform the annotation methods described above.
The embodiment of the invention also provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions are used for executing the labeling method.
In one embodiment, the computer-readable storage medium stores computer-executable instructions that are executed by one or more control processors to implement the tagging method described above.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems and methods disclosed above may be implemented as software, firmware, hardware and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media, as known to those skilled in the art.
It should also be appreciated that the various implementations provided by the embodiments of the present invention can be combined arbitrarily to achieve different technical effects.
While the preferred embodiments of the present invention have been described in detail, it will be understood by those skilled in the art that the foregoing and various other changes, omissions and deviations in the form and detail thereof may be made without departing from the scope of this invention.

Claims (16)

1. A labeling method is applied to a terminal, and comprises the following steps:
acquiring a positioning image, and sending the positioning image to a server so that the server can obtain pose data of the terminal in a virtual scene corresponding to a point cloud map according to the positioning image;
receiving the pose data sent by the server, and generating and displaying a virtual connecting line between the terminal and the virtual scene according to the pose data;
and determining a target labeling object in the virtual scene according to the virtual connecting line, and labeling the target labeling object.
2. The labeling method of claim 1, wherein the generating and displaying a virtual connecting line between the terminal and the virtual scene according to the pose data comprises:
according to the pose data, drawing a virtual connecting line between the terminal and a virtual scene from the terminal to the virtual scene corresponding to the point cloud map;
and displaying the virtual connecting line.
3. The labeling method of claim 1, wherein the determining a target labeling object in the virtual scene according to the virtual connecting line comprises:
receiving the bounding box data of the candidate annotation object in the virtual scene sent by the server;
determining the intersection state between the virtual connecting line and the boundary box of the candidate labeling object according to the boundary box data;
and when the intersection state represents that the virtual connecting line intersects any boundary surface of the candidate labeling object, taking the candidate labeling object as a target labeling object.
4. The labeling method of claim 1, wherein the determining a target labeling object in the virtual scene according to the virtual connecting line comprises:
determining the stay time of the virtual connecting line on the candidate marking object;
and when the stay time is greater than or equal to a preset first threshold, taking the candidate labeling object as a target labeling object.
5. The labeling method according to any one of claims 1 to 4, wherein after determining a target labeling object in the virtual scene according to the virtual connecting line, the labeling method further comprises:
and highlighting the target labeling object.
6. The labeling method according to any one of claims 1 to 4, wherein the labeling the target labeling object comprises:
receiving environment grid data of the virtual scene sent by the server;
determining an intersection point between the virtual connecting line and the target labeling object according to the environment grid data;
and highlighting the intersection point, and labeling the target labeling object according to the intersection point subjected to highlighting.
7. The labeling method of claim 6, wherein the labeling the target labeling object according to the intersection point after the highlighting process comprises:
determining a distance value between the terminal and the target labeling object;
when the distance value is larger than or equal to a preset second threshold value, determining a local area corresponding to the intersection point after the highlight processing, and amplifying the local area;
and marking the target marking object in the enlarged local area according to the intersection point.
8. The labeling method of claim 6, wherein the labeling the target labeling object according to the intersection point after the highlighting process comprises:
acquiring the marked content;
responding to a projection operation instruction of the marked content, transmitting the marked content to the intersection point subjected to highlighting processing along the virtual connecting line, and displaying a first transmission animation in the process of transmitting the marked content;
and marking the marking content to the surface of the target marking object.
9. The annotation method of claim 8, wherein the obtaining of the annotation content comprises:
receiving the bounding box data of the candidate annotation object in the virtual scene sent by the server;
determining the type of the labeled content according to the data of the boundary box, and displaying a corresponding labeling template according to the type;
and receiving a labeling operation instruction input according to the labeling template, and obtaining labeling content according to the labeling operation instruction.
10. The annotation method according to claim 8 or 9, further comprising:
acquiring a preset labeling rule;
and adjusting the labeling mode of the labeled content according to the labeling rule.
11. The annotation method according to claim 8 or 9, further comprising:
and responding to a viewing operation instruction of the marked content, transmitting the marked content on the surface of the target marked object to the starting point of the virtual connecting line along the virtual connecting line, and displaying a second transmission animation in the process of transmitting the marked content.
12. An annotation method applied to a server, the annotation method comprising:
receiving a positioning image sent by a terminal, and obtaining pose data of the terminal in a virtual scene corresponding to a point cloud map according to the positioning image;
and sending the pose data to the terminal so that the terminal generates and displays a virtual connecting line between the terminal and the virtual scene according to the pose data, and the terminal determines a target labeling object in the virtual scene according to the virtual connecting line so as to label the target labeling object.
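A minimal server-side sketch of claim 12, assuming an HTTP interface (Flask is used here purely as an example framework) and a localization routine like the one sketched under claim 13; the endpoint path, function names, and JSON layout are assumptions.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def localize_against_point_cloud(image_bytes):
    """Placeholder for the feature-matching + PnP pipeline sketched under claim 13."""
    return [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]   # rotation, translation in the map frame

@app.route("/pose", methods=["POST"])
def pose():
    # Receive the positioning image captured and uploaded by the terminal.
    image_bytes = request.files["image"].read()
    # Localize the image against the point cloud map to recover the terminal pose.
    rotation, translation = localize_against_point_cloud(image_bytes)
    # Send the pose data back so the terminal can draw the virtual connecting line.
    return jsonify({"rotation": rotation, "translation": translation})

if __name__ == "__main__":
    app.run()
```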
13. The labeling method according to claim 12, wherein the obtaining pose data of the terminal in the virtual scene corresponding to the point cloud map according to the positioning image comprises:
extracting a first image feature of the positioning image;
comparing the first image feature with second image features of a plurality of candidate reference images, and determining a target reference image from the plurality of candidate reference images according to a comparison result;
and obtaining pose data of the terminal in the point cloud map according to the second image features of the target reference image and the sparse point cloud structure.
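Claim 13 describes retrieval-style localization: extract features from the positioning image, compare them with the features of candidate reference images, and solve for the pose against the sparse point cloud of the best-matching reference. The sketch below uses ORB features and OpenCV's RANSAC PnP solver; the data layout `ref_points_3d[i][j]` (the map-frame 3D point behind keypoint j of reference image i) is an assumption, not something the patent specifies.

```python
import cv2
import numpy as np

def localize_terminal(query_img, reference_images, ref_points_3d, camera_matrix):
    """Estimate the terminal pose (rvec, tvec) in the point cloud map, or None on failure."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    kp_q, des_q = orb.detectAndCompute(query_img, None)
    if des_q is None:
        return None

    # Compare the first image features against the second image features of each candidate
    # reference image; keep the reference with the most matches as the target reference.
    best_i, best_matches = None, []
    for i, ref_img in enumerate(reference_images):
        _, des_r = orb.detectAndCompute(ref_img, None)
        if des_r is None:
            continue
        matches = matcher.match(des_q, des_r)
        if len(matches) > len(best_matches):
            best_i, best_matches = i, matches

    if best_i is None or len(best_matches) < 4:
        return None  # not enough correspondences to solve for a pose

    # 2D points come from the positioning image, 3D points from the sparse point cloud
    # attached to the target reference image.
    pts_2d = np.float32([kp_q[m.queryIdx].pt for m in best_matches])
    pts_3d = np.float32([ref_points_3d[best_i][m.trainIdx] for m in best_matches])
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts_3d, pts_2d, camera_matrix, None)
    return (rvec, tvec) if ok else None
```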
14. A terminal comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the labeling method of any one of claims 1 to 11.
15. A server comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the labeling method of any one of claims 12 to 13.
16. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the labeling method of any one of claims 1 to 13.
CN202210063444.4A 2022-01-20 2022-01-20 Labeling method, terminal, server and storage medium Active CN114089836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210063444.4A CN114089836B (en) 2022-01-20 2022-01-20 Labeling method, terminal, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210063444.4A CN114089836B (en) 2022-01-20 2022-01-20 Labeling method, terminal, server and storage medium

Publications (2)

Publication Number Publication Date
CN114089836A true CN114089836A (en) 2022-02-25
CN114089836B CN114089836B (en) 2023-02-28

Family

ID=80308852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210063444.4A Active CN114089836B (en) 2022-01-20 2022-01-20 Labeling method, terminal, server and storage medium

Country Status (1)

Country Link
CN (1) CN114089836B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110279478A1 (en) * 2008-10-23 2011-11-17 Lokesh Bitra Virtual Tagging Method and System
CN102968809A (en) * 2012-12-07 2013-03-13 成都理想境界科技有限公司 Method for realizing virtual information marking and drawing marking line in enhanced practical field
CN104656903A (en) * 2015-03-04 2015-05-27 联想(北京)有限公司 Processing method for display image and electronic equipment
US20190332182A1 (en) * 2017-04-25 2019-10-31 Tencent Technology (Shenzhen) Company Limited Gesture display method and apparatus for virtual reality scene
CN110058684A (en) * 2019-03-21 2019-07-26 海南诺亦腾海洋科技研究院有限公司 A kind of geography information exchange method, system and storage medium based on VR technology
WO2021114884A1 (en) * 2019-12-11 2021-06-17 杭州海康威视数字技术股份有限公司 Point cloud labeling method, apparatus, and system, device, and storage medium
CN111522442A (en) * 2020-04-09 2020-08-11 中国电子科技集团公司第三十八研究所 interaction method and device for ARKit augmented reality environment on iOS device
CN112286362A (en) * 2020-11-16 2021-01-29 Oppo广东移动通信有限公司 Method, system and storage medium for displaying virtual prop in real environment picture
CN113741698A (en) * 2021-09-09 2021-12-03 亮风台(上海)信息科技有限公司 Method and equipment for determining and presenting target mark information

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093105A (en) * 2023-10-17 2023-11-21 先临三维科技股份有限公司 Label display method, device, equipment and storage medium
CN117093105B (en) * 2023-10-17 2024-04-16 先临三维科技股份有限公司 Label display method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114089836B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
WO2019242262A1 (en) Augmented reality-based remote guidance method and device, terminal, and storage medium
CN111983635B (en) Pose determination method and device, electronic equipment and storage medium
US9699375B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
US9161168B2 (en) Personal information communicator
WO2023093217A1 (en) Data labeling method and apparatus, and computer device, storage medium and program
US20140192055A1 (en) Method and apparatus for displaying video on 3d map
CN108509621B (en) Scenic spot identification method, device, server and storage medium for scenic spot panoramic image
Gomez-Jauregui et al. Quantitative evaluation of overlaying discrepancies in mobile augmented reality applications for AEC/FM
CN109978753B (en) Method and device for drawing panoramic thermodynamic diagram
US11922568B2 (en) Finite aperture omni-directional stereo light transport
CN112714266B (en) Method and device for displaying labeling information, electronic equipment and storage medium
CN108597034B (en) Method and apparatus for generating information
CN109801354B (en) Panorama processing method and device
CN114089836B (en) Labeling method, terminal, server and storage medium
Kim et al. IMAF: in situ indoor modeling and annotation framework on mobile phones
CN113496503B (en) Point cloud data generation and real-time display method, device, equipment and medium
WO2023088127A1 (en) Indoor navigation method, server, apparatus and terminal
Gomes Jr et al. Semi-automatic methodology for augmented panorama development in industrial outdoor environments
CN109887078B (en) Sky drawing method, device, equipment and medium
US20190392594A1 (en) System and method for map localization with camera perspectives
Wang et al. Real‐time fusion of multiple videos and 3D real scenes based on optimal viewpoint selection
CN108062786B (en) Comprehensive perception positioning technology application system based on three-dimensional information model
US20220012462A1 (en) Systems and Methods for Remote Measurement using Artificial Intelligence
Fan et al. Augmented Reality Asset Tracking Using Hololens
CN112991542B (en) House three-dimensional reconstruction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant