WO2023093217A1 - Data labeling method and apparatus, computer device, storage medium and program

Data labeling method and apparatus, computer device, storage medium and program

Info

Publication number
WO2023093217A1
WO2023093217A1 PCT/CN2022/117915 CN2022117915W WO2023093217A1 WO 2023093217 A1 WO2023093217 A1 WO 2023093217A1 CN 2022117915 W CN2022117915 W CN 2022117915W WO 2023093217 A1 WO2023093217 A1 WO 2023093217A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
dimensional
information
model
target scene
Prior art date
Application number
PCT/CN2022/117915
Other languages
English (en)
Chinese (zh)
Inventor
侯欣如
姜翰青
刘浩敏
陈东生
甄佳楠
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023093217A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/08 Indexing scheme involving all processing steps from image acquisition to 3D model generation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection

Definitions

  • The present disclosure relates to the field of computer technology, and in particular, but not exclusively, to a data labeling method and apparatus, a computer device, a storage medium, and a computer program.
  • In the related art, the target scene can be photographed and data annotation can then be performed on the captured two-dimensional images.
  • However, such two-dimensional annotation cannot intuitively display the data labeling results corresponding to the objects in the image, and it is not convenient to subsequently manage, control, and maintain the objects based on those data labeling results.
  • Embodiments of the present disclosure at least provide a data labeling method, device, computer equipment, storage medium, and computer program.
  • An embodiment of the present disclosure provides a data labeling method applied to a server. The method includes: acquiring image data to be processed, where the image data carries tag information and includes a video or images obtained by performing image acquisition on a target scene; performing three-dimensional reconstruction on the target scene based on the image data to obtain a three-dimensional model of the target scene; determining, based on the two-dimensional labeling position of the tag information in the image data, a target three-dimensional position of the tag information in a model coordinate system corresponding to the three-dimensional model; and adding, based on the target three-dimensional position, the tag information corresponding to the target three-dimensional position to the three-dimensional model.
  • In this way, by constructing a three-dimensional model of the target scene and using the two-dimensional labeling position of the tag information in the image data, the target three-dimensional position of that tag information in the model coordinate system corresponding to the three-dimensional model is determined; the tag information is then added to the three-dimensional model at that position, so that the generated three-dimensional model of the target scene carries tag information. This makes it convenient to intuitively display, through the tagged three-dimensional model, the data labeling results corresponding to each target object in the target scene, and facilitates subsequent management, control, and maintenance of those target objects based on the data labeling results.
  • the tag information carried by the image data is obtained according to the following steps: performing semantic segmentation processing on images in the image data, and generating the tag information based on a result of the semantic segmentation processing.
  • The tag information carried by the image data may also be obtained by receiving the tag information sent by a terminal device, where the tag information is generated by the terminal device in response to a labeling operation on an image in the image data.
  • Determining the target three-dimensional position of the tag information in the model coordinate system corresponding to the three-dimensional model based on the two-dimensional labeling position of the tag information in the image data includes: determining, in at least one frame of image included in the image data, a target image marked with the tag information; determining the target pixel corresponding to the two-dimensional labeling position of the tag information in the target image; performing three-dimensional position restoration on the target image based on the target image and the pose of the image acquisition device when it captured the target image, to obtain the three-dimensional position of the target pixel in the model coordinate system; and determining, based on the three-dimensional position of the target pixel in the model coordinate system, the target three-dimensional position of the tag information in the model coordinate system corresponding to the three-dimensional model.
  • In this way, the target pixel marked with tag information in the two-dimensional image can be accurately mapped to its corresponding three-dimensional position in the coordinate system of the target scene, which enables tag information to be added accurately to the three-dimensional model of the target scene based on that target three-dimensional position and improves the accuracy of the subsequently generated three-dimensional model carrying the tag information.
  • The three-dimensional model may include three-dimensional sub-models respectively corresponding to each target object in the target scene; adding, based on the target three-dimensional position, the tag information corresponding to the target three-dimensional position to the three-dimensional model includes: determining the target object to which the tag information is to be added based on the target three-dimensional position and the poses, in the model coordinate system, of the three-dimensional sub-models corresponding to the target objects; and establishing an association between the target object to which the tag information is to be added and the tag information.
  • In this way, the target object to which the tag information is to be added can be accurately determined from the multiple target objects contained in the target scene, so that the tag information is added to that target object, which improves the accuracy of the generated three-dimensional model of the target scene carrying the tag information.
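  • As a rough illustration of the association step above, choosing the target object to which a tag belongs can be pictured as a nearest-object lookup: given the tag's 3D position in the model coordinate system and the per-object 3D sub-models, pick the sub-model closest to that position. The sketch below is a minimal Python illustration under that assumption; the SubModel structure and the centroid-distance criterion are illustrative, not the publication's prescribed implementation.
```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SubModel:
    """Hypothetical per-object 3D sub-model: an object id plus its points in model coordinates."""
    object_id: str
    points: np.ndarray  # (N, 3) positions in the model coordinate system

def associate_tag(tag_position, sub_models):
    """Return the object id whose sub-model centroid is nearest to the tag's target 3D position."""
    tag_position = np.asarray(tag_position, dtype=float)
    best_id, best_dist = None, np.inf
    for sm in sub_models:
        centroid = sm.points.mean(axis=0)                 # crude stand-in for the sub-model's pose
        dist = np.linalg.norm(centroid - tag_position)
        if dist < best_dist:
            best_id, best_dist = sm.object_id, dist
    return best_id

# Toy usage: two cabinets; the tag lands on the second one.
cabinets = [
    SubModel("cabinet_01", np.random.rand(100, 3)),
    SubModel("cabinet_02", np.random.rand(100, 3) + np.array([5.0, 0.0, 0.0])),
]
print(associate_tag([5.1, 0.4, 0.5], cabinets))           # -> cabinet_02
```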
  • Establishing the association between the target object to which the tag information is to be added and the tag information includes: determining, based on the target three-dimensional position, the three-dimensional labeling position of the tag information on the target object to which the tag information is to be added; and establishing an association between the three-dimensional labeling position and the tag information.
  • In this way, the corresponding tag information can be accurately added to the three-dimensional sub-model of the target object to be labeled in the target scene, improving the accuracy of the generated three-dimensional model of the target scene carrying the tag information.
  • The method may further include: acquiring display material; generating a tag instance based on the display material and the tag information; and, in response to a tag display event being triggered, displaying the three-dimensional model of the target scene and the tag instance.
  • In this way, the user can more intuitively learn the spatial structure of the target scene, the pose information of each target object in the target scene, the structure of each target object, and the tag information carried by each target object, which facilitates subsequent management, control, and maintenance of the target objects in the target scene based on the data labeling results.
  • The tag display event may include the target object to which the tag information has been added being triggered; displaying the three-dimensional model of the target scene and the tag instance includes displaying the three-dimensional model of the target scene and the tag instance corresponding to the triggered target object.
  • The tag display event may include the three-dimensional labeling position associated with the tag information being displayed in a graphical user interface; displaying the three-dimensional model of the target scene and the tag instance includes displaying the three-dimensional model of the target scene and the tag instance associated with the three-dimensional labeling position.
  • The tag information includes at least one of the following: tag attribute information and tag content information; the tag attribute information includes at least one of tag size information, tag color information, and tag shape information; the tag content information includes at least one of attribute information of the corresponding target object, defect inspection result information of the corresponding target object, and fault maintenance information of the corresponding target object.
  • The three-dimensional model may include a three-dimensional point cloud model; performing three-dimensional reconstruction on the target scene based on the image data to obtain the three-dimensional model of the target scene includes: performing three-dimensional point cloud reconstruction on the target scene based on the image data and the pose of the image acquisition device when collecting the image data, to obtain point cloud data of the target scene, where the point cloud data includes point cloud points corresponding to multiple target objects in the target scene and position information corresponding to each of the point cloud points; performing semantic segmentation processing on the point cloud data to obtain semantic information corresponding to each of the point cloud points; and generating a three-dimensional point cloud model of the target scene based on the point cloud data and the semantic information, where the three-dimensional point cloud model of the target scene includes three-dimensional sub-point-cloud models respectively corresponding to each target object.
  • In this way, three-dimensional point cloud reconstruction is performed on the target scene and the obtained point cloud data is semantically segmented, generating a three-dimensional point cloud model that reflects the real spatial structure of each target object in the target scene and the pose information corresponding to each target object, which provides a more accurate data basis for subsequent labeling of the three-dimensional model of the target scene.
  • The three-dimensional model may include a three-dimensional dense model; performing three-dimensional reconstruction on the target scene based on the image data to obtain the three-dimensional model of the target scene includes: performing three-dimensional dense reconstruction on the target scene based on the image data and the pose of the image acquisition device when collecting the image data, to obtain three-dimensional dense data of the target scene, where the three-dimensional dense data includes multiple dense points located on the surfaces of multiple target objects in the target scene and position information corresponding to each of the dense points; performing semantic segmentation processing on the three-dimensional dense data to obtain semantic information corresponding to the patches composed of the dense points; and generating a three-dimensional dense model of the target scene based on the three-dimensional dense data and the semantic information, where the three-dimensional dense model of the target scene includes three-dimensional sub-dense models respectively corresponding to each target object.
  • In this way, three-dimensional dense reconstruction is performed on the target scene and the obtained three-dimensional dense data is semantically segmented, generating a three-dimensional dense model that reflects the real spatial structure of the target objects in the target scene and the pose information corresponding to each target object, which provides a more accurate data basis for subsequent labeling of the three-dimensional model of the target scene.
  • the target object includes at least one of the following: a building located in the target scene, and a device deployed in the target scene.
  • An embodiment of the present disclosure further provides a data labeling apparatus applied to a server. The apparatus includes: an acquisition part configured to acquire image data to be processed, where the image data carries tag information and includes a video or images obtained by performing image acquisition on a target scene; a first processing part configured to perform three-dimensional reconstruction on the target scene based on the image data to obtain a three-dimensional model of the target scene; a determination part configured to determine, based on the two-dimensional labeling position of the tag information in the image data, the target three-dimensional position of the tag information in the model coordinate system corresponding to the three-dimensional model; and a second processing part configured to add, based on the target three-dimensional position, the tag information corresponding to the target three-dimensional position to the three-dimensional model.
  • When acquiring the tag information carried by the image data, the acquisition part is configured to: perform semantic segmentation processing on the images in the image data and generate the tag information based on the result of the semantic segmentation processing.
  • When acquiring the tag information carried by the image data, the acquisition part may alternatively be configured to: receive the tag information sent by a terminal device, where the tag information is generated by the terminal device in response to a labeling operation on an image in the image data.
  • When determining the target three-dimensional position of the tag information in the model coordinate system corresponding to the three-dimensional model based on the two-dimensional labeling position of the tag information in the image data, the determination part is configured to: determine, in at least one frame of image included in the image data, the target image marked with the tag information; determine the target pixel corresponding to the two-dimensional labeling position of the tag information in the target image; perform three-dimensional position restoration on the target image based on the target image and the pose of the image acquisition device when it captured the target image, to obtain the three-dimensional position of the target pixel in the model coordinate system; and determine, based on the three-dimensional position of the target pixel in the model coordinate system, the target three-dimensional position of the tag information in the model coordinate system corresponding to the three-dimensional model.
  • The three-dimensional model may include three-dimensional sub-models respectively corresponding to the target objects in the target scene; when adding the tag information to the three-dimensional model, the second processing part is configured to: determine the target object to which the tag information is to be added based on the target three-dimensional position and the poses, in the model coordinate system, of the three-dimensional sub-models corresponding to the target objects; and establish an association between the target object to which the tag information is to be added and the tag information.
  • When establishing the association between the target object to which the tag information is to be added and the tag information, the second processing part is configured to: determine, based on the target three-dimensional position, the three-dimensional labeling position of the tag information on the target object to which the tag information is to be added; and establish an association between the three-dimensional labeling position and the tag information.
  • The apparatus may further include a presentation part configured to: obtain display material; generate a tag instance based on the display material and the tag information; and, in response to a tag display event being triggered, display the three-dimensional model of the target scene and the tag instance.
  • The tag display event may include the target object to which the tag information has been added being triggered; when displaying the three-dimensional model of the target scene and the tag instance, the presentation part is configured to: display the three-dimensional model of the target scene and the tag instance corresponding to the triggered target object.
  • The tag display event may include the three-dimensional labeling position associated with the tag information being displayed in a graphical user interface; when displaying the three-dimensional model of the target scene and the tag instance, the presentation part is configured to: display the three-dimensional model of the target scene and the tag instance associated with the three-dimensional labeling position.
  • The tag information includes at least one of the following: tag attribute information and tag content information; the tag attribute information includes at least one of tag size information, tag color information, and tag shape information; the tag content information includes at least one of attribute information of the corresponding target object, defect inspection result information of the corresponding target object, and fault maintenance information of the corresponding target object.
  • The three-dimensional model may include a three-dimensional point cloud model; when performing three-dimensional reconstruction on the target scene based on the image data to obtain the three-dimensional model of the target scene, the first processing part is configured to: perform three-dimensional point cloud reconstruction on the target scene based on the image data and the pose of the image acquisition device when collecting the image data, to obtain point cloud data of the target scene, where the point cloud data includes point cloud points corresponding to multiple target objects in the target scene and position information corresponding to each of the point cloud points; perform semantic segmentation processing on the point cloud data to obtain semantic information corresponding to each of the point cloud points; and generate a three-dimensional point cloud model of the target scene based on the point cloud data and the semantic information, where the three-dimensional point cloud model of the target scene includes three-dimensional sub-point-cloud models respectively corresponding to each target object.
  • The three-dimensional model may include a three-dimensional dense model; when performing three-dimensional reconstruction on the target scene based on the image data to obtain the three-dimensional model of the target scene, the first processing part is configured to: perform three-dimensional dense reconstruction on the target scene based on the image data and the pose of the image acquisition device when collecting the image data, to obtain three-dimensional dense data of the target scene, where the three-dimensional dense data includes multiple dense points located on the surfaces of multiple target objects in the target scene and position information corresponding to each of the dense points; perform semantic segmentation processing on the three-dimensional dense data to obtain semantic information corresponding to the patches composed of the dense points; and generate a three-dimensional dense model of the target scene based on the three-dimensional dense data and the semantic information, where the three-dimensional dense model of the target scene includes three-dimensional sub-dense models respectively corresponding to each target object.
  • the target object includes at least one of the following: a building located in the target scene, and a device deployed in the target scene.
  • An embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, any one of the above data labeling methods is performed.
  • an embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is run, any one of the above-mentioned data labeling methods is executed.
  • An embodiment of the present disclosure further provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device performs any one of the above data labeling methods.
  • FIG. 1 shows a flow chart of a data labeling method provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of a graphical user interface indicating image acquisition in the data labeling method provided by an embodiment of the present disclosure
  • FIG. 3 shows a flow chart of a specific method of generating a three-dimensional dense model in the data labeling method provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of a display interface showing a three-dimensional model of a target scene and a label instance in the data labeling method provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of a data labeling device provided by an embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • In the related art, the target scene can be photographed and data annotation can be performed on the captured two-dimensional images.
  • However, such annotation cannot intuitively display the data labeling results corresponding to the objects in the two-dimensional image, and it is not convenient to subsequently manage, control, and maintain the objects based on the data labeling results.
  • In some schemes, the Unity engine can be used to determine the 3D model corresponding to the collected images, and labels are then added manually in the 3D model.
  • This solution has at least the following disadvantages: 1) adding labels to the 3D model is a manual operation with low efficiency, and the workload becomes too large when a large number of labels need to be added; 2) manually adding labels is prone to errors caused by subjective factors; 3) when using the Unity engine to add labels manually in the 3D model, the presence of many similar devices may make it difficult to determine the specific position of a label.
  • embodiments of the present disclosure provide a data labeling method, device, computer equipment, and storage medium.
  • By constructing a three-dimensional model of the target scene and determining, based on the two-dimensional labeling position of the tag information in the image data, the target three-dimensional position of that tag information in the model coordinate system corresponding to the three-dimensional model, tag information can be added to the three-dimensional model at the target three-dimensional position. The generated three-dimensional model of the target scene thus carries tag information, which makes it convenient to intuitively display, through that model, the data labeling results corresponding to each target object in the target scene and facilitates subsequent management, control, and maintenance of the target objects based on the data labeling results.
  • the execution subject of the data labeling method provided in the embodiments of the present disclosure is generally a computer device with a certain computing power.
  • The computer device includes, for example, a terminal device, a server, or other processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
  • the data labeling method may be implemented by calling a computer-readable instruction stored in a memory by a processor.
  • Referring to FIG. 1, a flow chart of a data labeling method provided by an embodiment of the present disclosure is shown.
  • the method is applied to a server, and the method includes steps S101 to S104, wherein:
  • the image data carries label information
  • the image data includes the video or image obtained by image acquisition of the target scene by an image acquisition device
  • the video may include, for example, a panoramic video
  • The image acquisition device may include, for example, at least one of a mobile phone, a camera, a video camera, a panoramic camera, an unmanned aerial vehicle, and the like. Since the image acquisition device can obtain image data containing multiple video frame images when shooting the target scene, it is suitable for shooting target scenes with a large space, such as a computer room or a factory building.
  • Taking the computer room as an example, computing equipment, data storage equipment, and signal receiving equipment can be housed in it; in a factory building, production equipment, loading and unloading equipment, and transportation equipment can be housed. Both the computer room and the factory building are physical spaces.
  • the target scene may include, for example, a computer room with a large floor area, for example, a computer room with an area of 20 square meters, 30 square meters, or 50 square meters.
  • an image acquisition device may be used to shoot the scene therein.
  • The target scene can also be an outdoor scene. For example, in order to monitor the surrounding environment of a tower used for communication or power transmission, and to prevent the vegetation around the tower from affecting its normal operation as it grows, the tower and its surrounding environment can be taken as the target scene, image data can be acquired, and the tower, the vegetation near it, and any buildings that may exist nearby can be modeled.
  • the target scene for data collection and data labeling may include multiple areas, for example, multiple computer rooms may be included in a large target scene. Since the areas for data collection are similar, the data labeling method provided by the embodiments of the present disclosure may be applied in multiple different areas included in the target scene accordingly.
  • the target object may include but not limited to: at least one of buildings located in the target scene, equipment deployed in the target scene, and vegetation located in the target scene;
  • The buildings located in the target scene may include, but are not limited to, at least one of the ceiling of the computer room, the floor of the computer room, the walls of the computer room, the pillars of the computer room, and the like; the equipment deployed in the target scene may include, but is not limited to, at least one of towers and outdoor cabinets installed on the roof of the computer room, cable racks connected to the towers, and indoor cabinets installed inside the computer room.
  • When collecting images of the target scene, a robot equipped with the image acquisition device can be controlled to move through the target site to obtain the image data corresponding to the target scene; alternatively, images of the target scene can be collected directly with the image acquisition device to obtain the corresponding image data; or an unmanned aerial vehicle equipped with the image acquisition device can be controlled to fly in the target scene to collect its image data.
  • the image collection device when collecting images of the target scene, in order to model the target scene more completely, the image collection device can be controlled to collect images in different poses to form image data corresponding to the target scene.
  • the image data obtained by image acquisition by the image acquisition device is used for, for example, three-dimensional reconstruction during data processing, it is necessary to determine the pose of the image acquisition device in the target scene.
  • In some embodiments, before the image acquisition device is used to acquire images of the target scene, its gyroscope may be calibrated to determine the pose of the image acquisition device in the target scene; for example, the optical axis of the image acquisition device can be adjusted to be parallel to the ground of the target scene.
  • image data acquisition can be performed by selecting an image data acquisition mode of the image acquisition device, and image data corresponding to the target scene can be obtained.
  • the image acquisition device installed on the server can be used to acquire images of the target scene.
  • FIG. 2 is a schematic diagram of a graphical user interface indicating image acquisition provided by an embodiment of the present disclosure.
  • the captured image data is shown.
  • the image data includes two objects in the target scene, including a cabinet and an integrated cabinet.
  • instruction information prompting the user to take pictures may also be displayed on the GUI, such as the prompt information 21 of "front of the cabinet” and "side of the cabinet” shown below the GUI.
  • the prompt information 21 may be displayed in the form of operation buttons, for example, and in response to the user triggering the corresponding operation button, an acquisition button 22 indicating to acquire an image may also be displayed at a corresponding position. In response to triggering the capture button 22, the image capture can be started accordingly, and further image capture can be performed according to the instruction information.
  • In addition, a return control 23 and an edit control 24 may also be displayed. In response to triggering the return control 23, the display returns to the previous operation step, and the graphical user interface correspondingly shows the image corresponding to that previous step.
  • the corresponding data annotation page may also be displayed to the user accordingly.
  • the tag information carried by the image data may also be correspondingly acquired.
  • The tag information may include, but is not limited to, tag attribute information and tag content information. The tag attribute information is used for displaying the tag and may include, but is not limited to, tag size information, tag color information, and tag shape information. The tag content information is used to describe the target object of the real target scene and may include, but is not limited to, at least one of the following: attribute information of the corresponding target object, defect inspection result information of the corresponding target object, and fault maintenance information of the corresponding target object.
  • At least one of the following A1-A3 can be used, but not limited to, to obtain the tag information carried by the image data:
  • A1. Perform semantic segmentation processing on the images in the image data, and generate label information based on the results of the semantic segmentation processing; or, perform image recognition on the images in the image data, and generate label information based on the results of image recognition.
  • A pre-trained neural network can be used to perform semantic segmentation processing on the images in the image data, and the tag information is generated based on the result of the semantic segmentation processing.
  • The pre-trained neural network may include, but is not limited to, at least one of the following: a convolutional neural network (CNN) and a self-attention network (Transformer).
  • After the terminal device collects the image data of the target scene, it transmits the image data to the server; the server uses a pre-trained neural network to perform semantic segmentation processing on the images in the received image data and generates tag information based on the result of the semantic segmentation processing.
  • For example, a pre-trained neural network is used to perform semantic segmentation processing on the images in the image data to identify the cabinet and the integrated cabinet; the cabinet and the integrated cabinet are then labeled at the corresponding positions in the image data.
  • image recognition refers to the technology of using computers to process, analyze and understand images to identify targets and objects in various patterns.
  • For example, deep learning algorithms can be applied to perform image recognition on the images in the image data, so that objects such as cabinets and integrated cabinets are identified; the cabinets and integrated cabinets can then be marked at the corresponding positions in the image data.
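  • As a rough sketch of the semantic-segmentation route in A1, a pre-trained 2D segmentation network can be run on a frame and each detected region turned into a tag record (class name plus a 2D labeling position). The snippet below uses torchvision's publicly available DeepLabV3 model purely as a stand-in for "a pre-trained CNN", and the tag-record format is an assumption; the publication does not prescribe either.
```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()
class_names = weights.meta["categories"]

def generate_tag_info(image):
    """image: a PIL.Image frame. Returns a list of {class, 2D position} tag records."""
    with torch.no_grad():
        out = model(preprocess(image).unsqueeze(0))["out"][0]    # (num_classes, H, W) logits
    pred = out.argmax(dim=0)                                     # per-pixel class ids
    tags = []
    for cls_id in pred.unique():
        name = class_names[int(cls_id)]
        if name == "__background__":
            continue
        ys, xs = torch.nonzero(pred == cls_id, as_tuple=True)
        tags.append({
            "tag_content": name,                                 # e.g. the object category
            # mask centroid, expressed in the preprocessed image's pixel coordinates
            "2d_position": (xs.float().mean().item(), ys.float().mean().item()),
        })
    return tags
```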
  • A2. Tag information may also be generated in response to the user's labeling operation on the GUI.
  • the collected image data may be correspondingly displayed on the graphical user interface.
  • Users such as staff can view the image data displayed on the GUI by dragging, sliding, and other operations on the GUI, or by triggering controls such as playing and pausing.
  • While viewing the image data, one or more frames can also be selected for labeling; for example, a clearer image, or an image that clearly shows the target object, can be chosen, and after the image is determined, data labeling is performed by triggering any pixel in it.
  • a corresponding data labeling control may be triggered to fill in the labeling data at a corresponding position in the graphical user interface.
  • At least one corresponding labeling control may be displayed on the GUI for filling in attribute information such as the object's device name, service life, specific functions, person in charge of the device, device manufacturer, device size and specifications, and relevant text notes.
  • an indicator label can be displayed at the position triggered by the user, such as the indicator label 25 corresponding to the integrated cabinet and the indicator label 26 corresponding to the cabinet in FIG. 2 .
  • Both the indicator tag and the tag data can be used as the tag information obtained by labeling the images in the image data; in some embodiments, the corresponding tag data can be viewed by triggering the indicator tag.
  • A3. Receive the tag information sent by the terminal device; wherein, the tag information is generated by the terminal device in response to the labeling operation on the image in the image data.
  • For example, the collected image data can be displayed on the graphical user interface of the terminal device; the user can view the image data by dragging, sliding, and other operations on the GUI, or by triggering controls such as play and pause; while viewing the image data, one or more frames can also be selected for manual annotation. For the specific annotation process, refer to the implementation shown in A2.
  • After responding to the labeling operation on the image in the image data, the terminal device generates tag information and sends it to the server, so that the server receives the tag information obtained by labeling the images in the image data. In this way, by labeling the images in the image data, tag information that meets the actual needs of users can be generated.
  • the data labeling method provided by the embodiment of the present disclosure further includes:
  • the 3D model may include, for example: at least one of a 3D point cloud model and a 3D dense model.
  • At least one of the following B1-B2 can be used, but not limited to, to perform 3D reconstruction of the target scene based on image data to obtain a 3D model of the target scene:
  • The 3D model may include a 3D point cloud model; the 3D point cloud model of the target scene includes the three-dimensional sub-point-cloud models respectively corresponding to each target object.
  • the point cloud data includes: point cloud points corresponding to multiple target objects belonging to the target scene, and position information corresponding to each point cloud point.
  • At least one of the following C1-C2 can be used, but not limited to, to reconstruct the 3D point cloud of the target scene based on the image data to obtain the point cloud data of the target scene:
  • In one case, each pixel in each frame of the acquired image data corresponding to the target scene does not carry a depth value. In this case, the specific position of each point in the target scene can be calculated from multiple images taken from different angles, so that the point cloud points corresponding to the target scene can be constructed and the point cloud data of the target scene obtained.
  • In another case, each pixel in each frame of the acquired image data corresponding to the target scene carries a depth value. The images containing depth values can then be used to determine the position coordinates of each point of the target scene, that is, to determine the point cloud data of the target scene.
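  • For the depth-carrying case, each pixel with a valid depth value can be back-projected through the camera intrinsics into a 3D point, and the points of every frame can be transformed by the capture pose to build the point cloud data. Below is a minimal sketch of the standard pinhole back-projection; the intrinsics and the camera-to-world pose convention are illustrative assumptions.
```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy, T_world_cam):
    """Turn a depth image (H, W, metres) into an (N, 3) point cloud in world/model coordinates.
    T_world_cam: 4x4 camera-to-world pose of the image acquisition device for this frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                                      # pixels without a depth value are skipped
    z = depth[valid]
    x = (u[valid] - cx) * z / fx                           # pinhole model: X = (u - cx) * Z / fx
    y = (v[valid] - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1) # homogeneous camera-frame points
    return (T_world_cam @ pts_cam.T).T[:, :3]

# Toy usage: a flat synthetic depth map and an identity pose.
depth = np.full((480, 640), 2.0)
cloud = backproject_depth(depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0, T_world_cam=np.eye(4))
print(cloud.shape)                                         # (307200, 3)
```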
  • the semantic information of each point cloud point can be determined by means of semantic segmentation.
  • Since semantic segmentation on point cloud data is more complicated than semantic segmentation in two-dimensional space, the problem can, for example, be transformed into a semantic segmentation problem on two-dimensional images.
  • the point cloud points can be projected into a virtual two-dimensional image, and then the pre-trained neural network can be used for semantic segmentation processing.
  • For the pre-trained neural network, refer to the relevant description of implementation A1 in step S101 of the embodiments of the present disclosure.
  • In this way, a virtual semantic segmentation image can be obtained; each virtual pixel in the virtual semantic segmentation image corresponds to a score for each category, where the score for a category represents the confidence that the virtual pixel belongs to that category, and the semantic information of the virtual pixel can be determined according to the scores for the different categories.
  • the semantic information determined in the virtual two-dimensional image can be mapped to the point cloud points according to the corresponding relationship between the virtual pixel points and the point cloud points, that is, the semantic information of each point cloud point can be determined.
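  • The projection trick can be sketched as: render the point cloud into a virtual image while remembering which point produced each virtual pixel, run 2D semantic segmentation on that image, and copy each pixel's predicted class back to its source point. The snippet below only shows this bookkeeping; segment_2d stands in for any pre-trained 2D segmentation network and, like the intrinsics, is an assumption.
```python
import numpy as np

def project_points(points, fx, fy, cx, cy, width, height):
    """Project (N, 3) camera-frame points to pixel coordinates; returns (u, v) and a validity mask."""
    z = points[:, 2]
    valid = z > 0
    z_safe = np.where(valid, z, 1.0)                       # avoid division by zero for invalid points
    u = np.round(points[:, 0] * fx / z_safe + cx).astype(int)
    v = np.round(points[:, 1] * fy / z_safe + cy).astype(int)
    valid &= (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return u, v, valid

def label_point_cloud(points, colors, segment_2d, fx, fy, cx, cy, width=640, height=480):
    """Assign a semantic class to each point by segmenting a virtual image rendered from the cloud."""
    u, v, valid = project_points(points, fx, fy, cx, cy, width, height)
    virtual = np.zeros((height, width, 3), dtype=np.uint8)
    virtual[v[valid], u[valid]] = colors[valid]            # splat point colours into the virtual image
    class_map = segment_2d(virtual)                        # (H, W) per-pixel class ids from a 2D network
    labels = np.full(len(points), -1, dtype=int)           # -1 = point not visible in the virtual view
    labels[valid] = class_map[v[valid], u[valid]]          # map each pixel's class back to its source point
    return labels
```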
  • the semantic information of the point cloud points may include at least one of the following: three-dimensional coordinates, color, classification value, intensity value, and time.
  • the 3D point cloud model of the target scene can be determined according to the position information of each point cloud point in the target scene and the corresponding semantic information of each point cloud point.
  • the point cloud model includes three-dimensional sub-point cloud models respectively corresponding to a plurality of target objects.
  • point cloud points with adjacent positions and the same semantic information can be used as point cloud points corresponding to the same target object.
  • In some embodiments, the three-dimensional sub-point-cloud model corresponding to each target object can be generated from the point cloud points corresponding to that target object; the sub-point-cloud model of each target object can carry the semantic information corresponding to each of its point cloud points.
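  • Grouping adjacent point cloud points that share the same semantic information into per-object sub-point-cloud models can be approximated with a per-class spatial clustering pass; the DBSCAN choice below is an assumption for illustration, not something the publication specifies.
```python
import numpy as np
from sklearn.cluster import DBSCAN

def split_into_sub_models(points, labels, eps=0.3, min_points=20):
    """points: (N, 3) positions; labels: (N,) semantic class ids.
    Returns (class_id, point_subset) pairs, one per detected object instance."""
    sub_models = []
    for cls in np.unique(labels):
        if cls < 0:                                        # skip points without semantic information
            continue
        cls_pts = points[labels == cls]
        clustering = DBSCAN(eps=eps, min_samples=min_points).fit(cls_pts)
        for inst in np.unique(clustering.labels_):
            if inst == -1:                                 # DBSCAN noise
                continue
            sub_models.append((int(cls), cls_pts[clustering.labels_ == inst]))
    return sub_models
```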
  • Alternatively, three-dimensional dense reconstruction of the target scene can be performed to obtain the three-dimensional dense data of the target scene; semantic segmentation processing is performed on the three-dimensional dense data to obtain semantic information corresponding to the patches composed of the dense points; a three-dimensional dense model of the target scene is generated based on the three-dimensional dense data and the semantic information, and it includes the three-dimensional sub-dense models respectively corresponding to each target object.
  • The three-dimensional dense data includes multiple dense points located on the surfaces of multiple target objects in the target scene and the position information corresponding to each dense point; any one of the patches composed of dense points is made up of at least three dense points that have a connection relationship.
  • the surface patch provided by the embodiment of the present disclosure may include, but not limited to, at least one of a triangular surface patch or a quadrilateral surface patch, which is not specifically limited here.
  • At least one of the following methods D1-D2 may be adopted but not limited to:
  • In one approach, the image acquisition device only undertakes the task of image acquisition and relies on a network connection to transmit the collected image data, together with the pose of the image acquisition device when collecting that data, to a data processing device, so that the data processing device can build the 3D model of the target scene.
  • The network connections that can be relied on include, but are not limited to, fiber Ethernet adapters, mobile communication technologies (such as fourth-generation (4G) or fifth-generation (5G) mobile communication technology), and wireless fidelity (Wi-Fi) communication.
  • the data processing equipment may include, but not limited to, the computer equipment described above, for example.
  • the data processing device processes the image data, for example, it can perform 3D point cloud reconstruction of the target scene according to the image data and the pose of the image acquisition device when collecting the image data (that is, the pose of the image acquisition device in the target scene).
  • It can also perform 3D dense reconstruction of the target scene according to the image data and the pose of the image acquisition device when collecting the image data, to obtain the 3D dense data of the target scene; at least one of a CNN, a deep self-attention network, and the like is used to perform semantic segmentation processing on the 3D dense data to obtain the semantic information corresponding to the patches composed of dense points; and, based on the 3D dense data and the semantic information corresponding to each patch, a 3D dense model of the target scene is generated.
  • In another approach, relevant data of an inertial measurement unit (IMU) recorded while the image acquisition device collects the image data may be obtained.
  • The inertial measurement unit (IMU) of the image acquisition device may contain three single-axis accelerometers and three single-axis gyroscopes; the accelerometers detect the acceleration of the image acquisition device while collecting image data, and the gyroscopes detect its angular velocity. In this way, by collecting the relevant IMU data of the image acquisition device, the pose of the image acquisition device when collecting the image data can be accurately determined.
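  • Turning the IMU readings into a device pose amounts to integrating angular velocity into orientation and gravity-compensated acceleration into velocity and position. The sketch below is a deliberately naive dead-reckoning integrator for illustration only; real systems fuse IMU data with visual tracking to limit drift, and the data layout here is assumed.
```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def integrate_imu(gyro, accel, dt, gravity=np.array([0.0, 0.0, -9.81])):
    """gyro, accel: (T, 3) angular velocity (rad/s) and specific force (m/s^2) in the device frame.
    Returns one (rotation, position) pose per sample, by naive dead reckoning."""
    rot, vel, pos = R.identity(), np.zeros(3), np.zeros(3)
    poses = []
    for w, a in zip(gyro, accel):
        rot = rot * R.from_rotvec(w * dt)       # integrate angular velocity into orientation
        acc_world = rot.apply(a) + gravity      # rotate to the world frame and remove gravity
        vel = vel + acc_world * dt              # integrate acceleration into velocity
        pos = pos + vel * dt                    # and velocity into position
        poses.append((rot, pos.copy()))
    return poses
```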
  • As the image acquisition device gradually moves and collects image data, a 3D point cloud model or a 3D dense model covering the target scene can be generated incrementally; alternatively, after all the image data has been collected, the complete image data can be used to generate the 3D point cloud model or 3D dense model corresponding to the target scene.
  • the image data may include a panoramic video
  • a 3D model of the target scene may also be generated based on the panoramic video obtained by capturing images of the target scene by the panoramic camera.
  • For example, the panoramic camera consists of two fisheye cameras set at the front and rear of a scanner; the fisheye cameras are placed on the scanner at preset poses to obtain a complete panoramic video corresponding to the target scene.
  • FIG. 3 shows a flowchart of a specific method in which a data processing device generates a three-dimensional dense model from a panoramic video of the target scene captured by the panoramic camera, wherein:
  • the data processing device acquires two panoramic videos synchronized in time and collected in real time by two fisheye cameras at the front and rear of the scanner.
  • the two panoramic videos respectively include multiple frames of video frame images. Since two fisheye cameras capture two panoramic videos synchronized in time in real time, the time stamps of the multi-frame video frame images included in the two panoramic videos respectively correspond to each other.
  • the accuracy of the time stamp and the acquisition frequency when acquiring video frame images in the panoramic video can also be determined according to the specific instrument parameters of the two fisheye cameras.
  • the time stamp when collecting video frame images is set to be accurate to nanoseconds; when collecting video frame images in panoramic videos, the collection frequency is not lower than 30 hertz (Hz).
  • the data processing device determines the relevant data of the inertial measurement unit (IMU) when the two fisheye cameras acquire the panoramic video respectively.
  • In some embodiments, a corresponding scanner coordinate system can also be determined for the fisheye cameras (the scanner coordinate system may, for example, consist of an X axis, a Y axis, and a Z axis), so that the relevant IMU data in the scanner coordinate system, such as the acceleration and angular velocity along the X, Y, and Z axes, can be determined.
  • the time stamp when the relevant data of the inertial measurement unit IMU is acquired can also be determined according to the specific instrument parameters of the two fisheye cameras. Exemplarily, it may be determined that the observation frequency for acquiring relevant data of the inertial measurement unit IMU is not lower than 400 Hz.
  • the data processing device determines the poses of the two fisheye cameras in the world coordinate system based on the relevant data of the inertial measurement unit IMU.
  • The poses of the two fisheye cameras in the world coordinate system can be determined according to the coordinate-system conversion relationship. The pose of a fisheye camera can, for example, be expressed as a 6-degree-of-freedom (6DOF) pose, and it is determined in the world coordinate system according to the conversion relationship between the scanner coordinate system and the world coordinate system.
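  • Expressing a pose given in the scanner coordinate system in the world coordinate system is a single rigid-transform composition. The 4x4 homogeneous form below is a standard illustration of that conversion; the matrices are placeholders, not values from the publication.
```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

# Pose of a fisheye camera expressed in the scanner coordinate system (placeholder values).
T_scanner_cam = make_pose(np.eye(3), [0.05, 0.0, 0.10])
# Conversion relationship between the scanner coordinate system and the world coordinate system.
T_world_scanner = make_pose(np.eye(3), [1.0, 2.0, 0.0])
# 6DOF pose of the camera in the world coordinate system: compose the two transforms.
T_world_cam = T_world_scanner @ T_scanner_cam
print(T_world_cam[:3, 3])                       # camera position in world coordinates -> [1.05 2. 0.1]
```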
  • Through image processing steps such as extracting key points, tracking key points, and establishing relationships between key points, the 6DOF pose of the image acquisition device can be accurately calculated in real time, and the coordinates of the dense point cloud points on the target object can also be obtained.
  • In some embodiments, key frame images can also be selected from the multiple video frame images in the panoramic video, so that the three-dimensional dense reconstruction has a sufficient amount of data to process while the amount of calculation is reduced and efficiency is improved.
  • the method of determining the key frame image from the panoramic video may be, but not limited to, at least one of the following methods E1-E4:
  • Extracting a preset number of video frame images per preset time interval, which may include, for example but not limited to, two frames per second (a minimal sampling sketch appears after this list).
  • Using image processing algorithms, image analysis algorithms, natural language processing (NLP), and other technologies to identify the content of each video frame image in the panoramic video and determine the semantic information corresponding to each frame, and then, based on that semantic information, extracting the video frame images containing the target object as key frame images.
  • The panoramic video of the target scene can be shown to the user; while the panoramic video is displayed, in response to the user's selection operation on some of the video frames, the selected video frames are used as the key frame images in the panoramic video.
  • prompt information of a selected key frame image may be displayed to the user.
  • a video frame image in a panoramic video may be selected in response to a user's specific operations such as long press and double click, and the selected video frame image may be used as a key frame image.
  • In some embodiments, prompt information can also be displayed, for example the text "Please long press to select this video frame image"; upon receiving the user's long-press operation on any video frame image in the panoramic video, that video frame image is used as a key frame image.
  • In some embodiments, a key frame image map can be stored in the background, so that after the image acquisition device returns to a previously captured position, the two video frame images at that position can be compared to perform loop closure detection on the image acquisition device, thereby correcting the positioning error accumulated during long-term, long-distance operation.
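  • The fixed-rate key frame selection mentioned first reduces to sampling the video stream at a preset rate, e.g. two key frames per second regardless of the capture frame rate. A minimal sketch, assuming nanosecond frame timestamps as described above:
```python
def sample_keyframes(timestamps_ns, keyframes_per_second=2.0):
    """timestamps_ns: nanosecond timestamps of the video frames, in capture order.
    Returns the indices of frames spaced at least 1/keyframes_per_second apart."""
    interval_ns = int(1e9 / keyframes_per_second)
    selected, next_t = [], None
    for i, t in enumerate(timestamps_ns):
        if next_t is None or t >= next_t:
            selected.append(i)
            next_t = t + interval_ns
    return selected

# 30 Hz video sampled down to two key frames per second -> every 15th frame.
frames = [int(i * 1e9 / 30) for i in range(90)]
print(sample_keyframes(frames))                 # [0, 15, 30, 45, 60, 75]
```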
  • The data processing device takes the key frame images in the panoramic videos respectively acquired by the fisheye cameras, together with the poses of the fisheye cameras, as input data of the real-time dense reconstruction algorithm.
  • For the panoramic video acquired by any fisheye camera, after a new key frame image in that panoramic video is determined using the above S301 to S303, all currently obtained key frame images, the new key frame image, and the pose of the corresponding fisheye camera are used as the input data of the real-time dense reconstruction algorithm. Since the previously obtained key frame images and the pose of the corresponding fisheye camera have already been input to the real-time dense reconstruction algorithm, they do not need to be input again when a new key frame image is input.
  • the data processing device uses a real-time dense reconstruction algorithm to process the input data to obtain three-dimensional dense data corresponding to the target scene.
  • the obtained three-dimensional dense data may include, but not limited to: multiple dense points on the surfaces of multiple target objects in the target scene, and position information corresponding to each dense point.
  • For example, but not limited to, the dense point cloud can be continuously expanded and updated as the panoramic video is captured.
  • the update frequency can be determined according to the input frequency of the key frame image and the pose of the fisheye camera when inputting the real-time dense reconstruction algorithm, for example.
  • Dense stereo matching can be used to estimate the dense depth map corresponding to each key frame image, and the pose of the corresponding fisheye camera is used to fuse the dense depth map into the three-dimensional dense model, so that the 3D dense model of the target scene is obtained once the acquisition of the target scene is completed.
  • The dense depth map is also called a range image. Unlike a grayscale image, in which each pixel stores a brightness value, each pixel of a depth map stores the distance between the corresponding point and the image acquisition device, that is, the depth value. Since the depth value is related only to distance and is independent of factors such as environment, lighting, and viewing direction, the dense depth map can truly and accurately reflect the geometric depth information of the scene, so a 3D dense model reflecting the real target scene can be generated based on the dense depth maps.
  • image enhancement such as denoising or inpainting can be performed on dense depth maps to provide high-quality dense depth images for 3D reconstruction.
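  • The per-key-frame fusion step can be pictured as: back-project the key frame's dense depth map into camera-frame points, transform them by the fisheye camera's 6DOF pose, and merge them into the global dense model. The sketch below uses simple point accumulation with voxel de-duplication as a stand-in for a full surface-fusion (e.g. TSDF) pipeline, which is an assumption.
```python
import numpy as np

class DenseModel:
    """Toy global dense model: an accumulating set of world-frame points."""
    def __init__(self, voxel=0.02):
        self.points = np.empty((0, 3))
        self.voxel = voxel

    def fuse_keyframe(self, depth, fx, fy, cx, cy, T_world_cam, stride=4):
        """Back-project a dense depth map and merge it into the model using the key frame pose."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(0, w, stride), np.arange(0, h, stride))
        z = depth[v, u]
        ok = z > 0
        x = (u[ok] - cx) * z[ok] / fx
        y = (v[ok] - cy) * z[ok] / fy
        pts = np.stack([x, y, z[ok], np.ones_like(z[ok])], axis=1)
        world = (T_world_cam @ pts.T).T[:, :3]
        self.points = np.vstack([self.points, world])
        voxels = np.round(self.points / self.voxel).astype(np.int64)   # crude de-duplication:
        _, keep = np.unique(voxels, axis=0, return_index=True)         # keep one point per voxel
        self.points = self.points[keep]
```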
  • Based on the pose of the image acquisition device corresponding to a key frame image and the pose corresponding to the adjacent new key frame image, it can be determined whether the pose of the image acquisition device was adjusted while capturing the target scene. If the pose was not adjusted, real-time 3D dense reconstruction of the target scene continues to obtain the 3D dense model; if the pose was adjusted, real-time 3D dense reconstruction is performed based on the dense depth map of the target object, so as to obtain an accurate 3D dense model.
  • Alternatively, the image acquisition device itself has the computing power to process the image data; after the image data is collected, it uses its own computing power to process the image data and obtain the 3D model corresponding to the target scene.
  • the data labeling method provided by the embodiment of the present disclosure further includes:
  • the target image marked with label information can be determined among the at least one frame of image included in the image data; the target pixel point corresponding to the two-dimensional label position of the label information in the target image is determined; based on the target image and the pose of the image acquisition device when it collected the target image, three-dimensional position restoration is performed to obtain the three-dimensional position of the target pixel point in the model coordinate system; and based on the three-dimensional position of the target pixel point in the model coordinate system, the target three-dimensional position of the label information in the model coordinate system corresponding to the three-dimensional model is determined.
  • corresponding data labeling can be performed by triggering a certain pixel point of an image in the image data in the graphical user interface. Therefore, after the target image marked with label information is determined, the target pixel point corresponding to the two-dimensional label position of the label information in the target image may also be determined in response to the user's labeling operation.
  • the target 3D position of the tag information in the model coordinate system corresponding to the 3D model is the target 3D position of the tag information in the target scene; it can be determined based on the three-dimensional position of the target pixel point in the target scene.
  • the depth value of the target pixel in the panoramic image can be determined accordingly.
  • the three-dimensional position information of the target pixel point in the camera coordinate system can then be known directly (here it includes the abscissa, the ordinate and the depth value), and with the determined conversion relationship between coordinate systems, the three-dimensional position of the target pixel in the target image can be restored to determine its three-dimensional position in the target scene.
  • since the target pixel is a pixel in a two-dimensional image, it can for example be determined whether the target pixel has a corresponding point cloud point in the three-dimensional model of the target scene, in order to determine the target pixel's 3D position in the target scene; for example, if the target pixel has a corresponding point cloud point in the 3D model of the target scene, the 3D position information of that point cloud point in the camera coordinate system can be used as the three-dimensional coordinate information of the target pixel in the camera coordinate system. In a manner similar to the above example, the three-dimensional position of the target pixel can then be restored to determine its three-dimensional position in the target scene.
  • a specific manner of determining the three-dimensional position of the target pixel point in the target scene may be determined according to actual conditions, and is not specifically limited here; one possible formulation is sketched below.
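  • for illustration, assuming pinhole intrinsics K and a camera-to-world pose T_wc of the image acquisition device for the target image, the three-dimensional position of a target pixel can be restored as sketched below (a simplified model: fisheye distortion and other details are omitted).

```python
# Back-project a single target pixel (u, v) with its depth value into the
# model coordinate system using the pose of the image acquisition device.
import numpy as np

def pixel_to_model(u, v, depth_value, K, T_wc):
    """u, v: target pixel of the 2-D label position; T_wc: 4x4 camera-to-world pose."""
    # 3-D position in the camera coordinate system (abscissa, ordinate, depth).
    p_cam = depth_value * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Apply the coordinate-system conversion relationship to reach model coordinates.
    p_model = T_wc @ np.append(p_cam, 1.0)
    return p_model[:3]   # three-dimensional position of the target pixel in the scene
```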
  • the target three-dimensional position of the label information in the target scene can be determined through the three-dimensional position of the target pixel point in the target scene.
  • the target three-dimensional position of the label information in the target scene may be determined based on the three-dimensional position of the target pixel in the target scene according to, but not limited to, at least one of the following F1-F2:
  • F1: the three-dimensional position of the target pixel point in the target scene can be directly determined as the target three-dimensional position of the label information in the target scene; that is, the three-dimensional position of the target pixel point in the model coordinate system is determined as the target three-dimensional position of the label information in the model coordinate system corresponding to the three-dimensional model.
  • F2: the target three-dimensional position of the tag information in the target scene may be calculated based on the three-dimensional position of the target pixel point in the target scene.
  • for example, the average of the three-dimensional positions of multiple target pixels in the target scene is calculated, and this average three-dimensional position is determined as the target three-dimensional position of the label information in the target scene, i.e. as the target three-dimensional position of the label information in the model coordinate system corresponding to the three-dimensional model; in a possible implementation manner, a weighted summation of the three-dimensional positions of the multiple target pixels in the target scene may also be performed, the average three-dimensional position calculated based on the result of the weighted summation, and that average three-dimensional position determined as the target three-dimensional position of the label information in the target scene.
  • the specific manner of calculating the target three-dimensional position of the tag information in the target scene based on the three-dimensional position of the target pixel in the target scene can be set according to implementation requirements, and is not limited here (see the toy example below).
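  • a toy illustration of F2, where the weights are an assumed per-pixel confidence rather than something mandated by the disclosure:

```python
# Plain or weighted average of several target pixels' 3-D positions as the
# label's target 3-D position in the model coordinate system.
import numpy as np

def label_position(points_3d: np.ndarray, weights=None) -> np.ndarray:
    """points_3d: N x 3 positions of the target pixels in the model frame."""
    if weights is None:
        return points_3d.mean(axis=0)                      # plain average
    w = np.asarray(weights, dtype=float)
    return (points_3d * w[:, None]).sum(axis=0) / w.sum()  # weighted average
```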
  • the data labeling method provided by the embodiment of the present disclosure further includes:
  • the 3D model includes 3D sub-models corresponding to each target object in the target scene
  • the preset distance threshold can be set according to actual needs and is not specifically limited here; for example, if the preset distance is 0, the point cloud point located exactly at the target three-dimensional position is taken as the target point cloud point, and the target object to which that point cloud point belongs is the target object to which the label information is to be added (see the matching sketch below).
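  • one possible realisation of this matching step, sketched under the assumption that each point cloud point carries the id of the target object it belongs to:

```python
# Find the point cloud point closest to the label's target 3-D position; if it
# lies within the preset distance threshold, return the id of the target object
# to which that point belongs (the object to receive the label information).
import numpy as np
from scipy.spatial import cKDTree

def find_target_object(target_pos, cloud_xyz, cloud_object_ids, max_dist):
    """cloud_xyz: N x 3 points; cloud_object_ids: per-point object id; max_dist: threshold."""
    dist, idx = cKDTree(cloud_xyz).query(target_pos)
    if dist <= max_dist:
        return cloud_object_ids[idx]   # target object to which the label is added
    return None                        # no point close enough to the labelled position
```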
  • an association relationship between the three-dimensional sub-model or vector model corresponding to the target object to which label information is to be added and the label information may be established.
  • the association relationship may include, for example, the correspondence between the label information and the 3D sub-model or vector model of the target object to which the label information is to be added; in some embodiments, it may also include the relative positional relationship between that 3D sub-model or vector model and the label information.
  • the three-dimensional labeling position of the label information on the target object to which the label information is to be added can be determined; and the association relationship between the three-dimensional labeling position and the label information can be established.
  • the target three-dimensional position may be used as the three-dimensional labeling position of the label information on the target object to which the label information is to be added; the label information is then added at that three-dimensional labeling position.
  • the 3D model and tag information of the target scene may also be displayed accordingly.
  • the following methods may be adopted: acquiring display material; generating a label instance based on the display material and label information; displaying a 3D model of the target scene and the label instance in response to a label display event being triggered.
  • the display material may include, for example, a new interface displayed on the GUI, or pop-up window information.
  • the label information, for example, may include at least one of the following: label attribute information and label content information; the label attribute information may include, for example, at least one of label size information, label color information, label shape information, etc.; here, the label size information and the label color information can be changed according to the user's selection.
  • the label shape information can include, for example, a text box or a table; the label content information can include, but is not limited to: attribute information such as the name of the target object, its service life, its specific functions, the person in charge of the equipment, the equipment manufacturer, the equipment size specifications and relevant text notes, as well as the defect detection result information of the target object and the fault maintenance information of the target object.
  • the defect detection result information of the target object may include, but is not limited to, at least one of the position where the target object has a defect and the defect condition of the target object; for example, a crack at the left-side cabinet door of a cabinet.
  • as for the style of rendering and displaying the 3D model of the target scene and the label information: when displaying the 3D model of the target scene, a wireframe with an outline of a certain color can be shown; when displaying label information, a label in the form of a text box can be shown, displaying the name of the target object, the position where the target object has defects, and the defect condition of the target object. The details can be determined according to the actual situation (a rough data-structure sketch for such a label instance is given below).
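  • a rough, purely illustrative data-structure sketch of a label instance combining display material with label attribute information and label content information; all field names here are assumptions, not part of the disclosure:

```python
# Illustrative label instance: attribute information, content information and
# the display material (e.g. a pop-up template) used when the label is triggered.
from dataclasses import dataclass, field

@dataclass
class LabelInstance:
    # label attribute information
    size: tuple = (200, 80)        # label size information (pixels)
    color: str = "#00A0FF"         # label color information
    shape: str = "text_box"        # label shape information: text box or table
    # label content information (name, defects, maintenance records, ...)
    content: dict = field(default_factory=dict)
    # display material, e.g. the pop-up window shown when the label is triggered
    popup_template: str = "default_popup.html"

# Example corresponding to the cabinet label described above.
cabinet_label = LabelInstance(content={
    "name": "cabinet",
    "defect": "crack at the left-side cabinet door",
})
```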
  • At least one of G1-G2 can be used to display the 3D model of the target scene and the label instance:
  • G1: when the label display event includes that a target object to which label information has been added is triggered, the 3D model of the target scene and the label instance corresponding to the triggered target object may be displayed.
  • that is, the 3D model of the target scene and the 3D sub-model corresponding to the target object can be displayed and, at the same time, the corresponding label instance is displayed. In this way, the user can view the label instance by triggering the corresponding location.
  • G2: when the label display event includes that the 3D labeling position associated with the label information is displayed in the GUI, the 3D model of the target scene and the label instance associated with that 3D labeling position can be displayed.
  • for example, in response to the user inputting any three-dimensional position, that position is taken as the target three-dimensional position corresponding to the tag information; if there is tag information at that position or a nearby position, the corresponding three-dimensional model of the target scene and the tag instance are shown.
  • the data marked during the data labeling stage can be checked and verified through the displayed label instances, so as to improve the accuracy of data labeling (a minimal sketch of both triggers follows).
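  • the two triggers G1 and G2 might be handled as in the following sketch, where renderer.show and the two dictionaries are hypothetical placeholders, not APIs defined by this disclosure:

```python
# G1: show label instances when a labelled target object is triggered.
# G2: show label instances whose 3-D labelling position is near a queried position.
import numpy as np

def on_object_triggered(object_id, labels_by_object, renderer):
    """G1: a target object to which label information was added is triggered."""
    for label in labels_by_object.get(object_id, []):
        renderer.show(label)                       # display the label instance

def on_position_shown(query_pos, labels_by_position, renderer, radius=0.1):
    """G2: a 3-D labelling position associated with label info enters the view."""
    for pos, label in labels_by_position.items():  # keys: 3-D labelling positions
        if np.linalg.norm(np.asarray(pos) - np.asarray(query_pos)) <= radius:
            renderer.show(label)
```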
  • the target scene in Figure 4 is a machine room containing two control cabinets and one cabinet; as shown in Figure 4,
  • the label instance 41 containing "control cabinet" is displayed at the position corresponding to each control cabinet, and
  • the label instance 42 containing "cabinet" is displayed at the position corresponding to the cabinet; here, after triggering the label instance corresponding to a control cabinet or the cabinet, detailed attribute information such as the service life, specific functions, person in charge of the equipment, equipment manufacturer, equipment size specifications and relevant text notes of the corresponding target object can be obtained, as well as the defect detection result information and the fault maintenance information of that target object.
  • the target 3D position, in the model coordinate system corresponding to the 3D model, of the label information in the 2D image is determined;
  • the tag information is then added to the 3D model, so that the generated 3D model of the target scene carries the tag information, which makes it convenient to intuitively display the information corresponding to each target object in the target scene through the 3D model of the target scene carrying the tag information.
  • the corresponding target three-dimensional position can be determined in the three-dimensional model according to the position of a pixel in the two-dimensional image, and the added label information can be displayed at that target three-dimensional position; that is, after marking in the 2D image, the label can be displayed synchronously and in real time in the 3D model, which helps improve the efficiency of digital asset management.
  • the embodiments of the present disclosure may be applied to scenarios such as defect detection and fault detection.
  • the tag information added to the 3D model may include defect inspection result information of the target object or fault maintenance information of the target object.
  • the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the embodiment of the present disclosure also provides a data labeling device corresponding to the data labeling method; since the problem-solving principle of the device is similar to that of the above-mentioned data labeling method, for the implementation of the device refer to the implementation of the method.
  • the data labeling device 500 includes: an acquisition part 501, a first processing part 502, a determination part 503, and a second processing part 504; wherein:
  • the acquiring part 501 is configured to acquire image data to be processed, where the image data carries tag information and includes video or images obtained by image acquisition of a target scene; the first processing part 502 is configured to perform, based on the image data, three-dimensional reconstruction of the target scene to obtain a three-dimensional model of the target scene; the determining part 503 is configured to determine, based on the two-dimensional marked position of the tag information in the image data, the target three-dimensional position of the tag information in the model coordinate system corresponding to the three-dimensional model; the second processing part 504 is configured to add, based on the target three-dimensional position, label information corresponding to the target three-dimensional position to the three-dimensional model (a schematic grouping of these four parts is sketched below).
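  • purely as a schematic (not the literal implementation), the four parts could be grouped as follows, with the concrete algorithms injected as callables; every name here is an assumption for illustration only:

```python
# Schematic grouping of the acquisition part, first processing part, determination
# part and second processing part of the data labelling device.
class DataLabelingDevice:
    def __init__(self, reconstruct_fn, restore_3d_fn):
        self.reconstruct_fn = reconstruct_fn   # e.g. a dense reconstruction routine
        self.restore_3d_fn = restore_3d_fn     # e.g. the pixel_to_model sketch above

    def acquire(self, source):                 # acquisition part 501
        """Return image data of the target scene carrying tag information."""
        return source.read()

    def reconstruct(self, image_data, poses):  # first processing part 502
        """Build the 3-D model of the target scene from the image data."""
        return self.reconstruct_fn(image_data, poses)

    def locate_label(self, label, poses, K):   # determination part 503
        """Map the label's 2-D marked position to a 3-D position in model coordinates."""
        u, v = label["pixel"]
        return self.restore_3d_fn(u, v, label["depth"], K, poses[label["frame"]])

    def add_label(self, model_labels, label, target_pos):  # second processing part 504
        """Attach the label information to the model at the target 3-D position."""
        model_labels[tuple(target_pos)] = label
        return model_labels
```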
  • when acquiring the tag information carried by the image data, the acquiring part 501 is specifically configured to: perform semantic segmentation processing on the images in the image data, and generate the tag information based on the result of the semantic segmentation processing.
  • when acquiring the tag information carried by the image data, the acquiring part 501 is specifically configured to: receive the tag information sent by a terminal device, wherein the tag information is generated by the terminal device in response to an annotation operation on images in the image data.
  • when determining, based on the two-dimensional labeling position of the label information in the image data, the target three-dimensional position of the label information in the model coordinate system corresponding to the three-dimensional model, the determining part 503
  • is specifically configured to: determine, in at least one frame of image included in the image data, the target image marked with the label information; determine the target pixel points corresponding to the two-dimensional marked position of the label information in the target image; perform three-dimensional position restoration based on the target image and the pose of the image acquisition device when it collected the target image, to obtain the three-dimensional position of the target pixel points in the model coordinate system; and determine, based on the three-dimensional position of the target pixel points in the model coordinate system, the target three-dimensional position of the label information in the model coordinate system corresponding to the three-dimensional model.
  • the 3D model includes 3D sub-models respectively corresponding to each target object in the target scene; when adding, based on the target 3D position, the label information corresponding to the target 3D position to the 3D model, the second processing part 504
  • is specifically configured to: determine the target object to which the tag information is to be added, based on the target 3D position and the poses, in the model coordinate system, of the 3D sub-models respectively corresponding to the target objects in the 3D model; and establish an association relationship between the target object to which the tag information is to be added and the tag information.
  • when establishing the association relationship between the target object to which the tag information is to be added and the tag information, the second processing part 504 is specifically configured to: determine, based on the target three-dimensional position, the three-dimensional labeling position of the label information on the target object to which the label information is to be added; and establish an association relationship between the three-dimensional labeling position and the label information.
  • the device further includes: a display part configured to acquire display material, generate a label instance based on the display material and the label information, and, in response to a label display event being triggered, display the 3D model of the target scene and the label instance.
  • the tag display event includes: the target object to which the tag information has been added is triggered; when displaying the 3D model of the target scene and the tag instance, the display part is specifically configured to: display the 3D model of the target scene and the tag instance corresponding to the triggered target object.
  • the label display event includes: a three-dimensional labeling position associated with the label information is displayed in a graphical user interface; when displaying the 3D model of the target scene and the tag instance, the display part
  • is specifically configured to: display the 3D model of the target scene and the tag instance associated with the 3D labeling position.
  • the label information includes at least one of the following: label attribute information, label content information; wherein, the label attribute information includes at least one of the following: label size information, label color information, Label shape information; the label content information includes at least one of the following: attribute information corresponding to the target object, defect inspection result information corresponding to the target object, and fault maintenance information corresponding to the target object.
  • the 3D model includes a 3D point cloud model; when performing the 3D reconstruction of the target scene based on the image data to obtain the 3D model of the target scene, the first processing part 502
  • is specifically configured to: perform, based on the image data and the pose of the image acquisition device when collecting the image data, three-dimensional point cloud reconstruction of the target scene to obtain point cloud data of the target scene;
  • the point cloud data includes: point cloud points corresponding to a plurality of target objects belonging to the target scene, and position information corresponding to each of the point cloud points; perform semantic segmentation processing on the point cloud data to obtain semantic information respectively corresponding to the plurality of point cloud points; and generate, based on the point cloud data and the semantic information, a three-dimensional point cloud model of the target scene, the three-dimensional point cloud model of the target scene including the 3D sub point cloud model corresponding to each target object (a simplified grouping sketch follows).
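  • a simplified sketch of the last step, grouping point cloud points that share a semantic/instance id into per-object 3D sub point cloud models; the segmentation network producing the ids is assumed to exist and is not shown:

```python
# Turn per-point semantic information into per-object 3-D sub point cloud models.
import numpy as np

def split_into_submodels(points_xyz: np.ndarray, semantic_ids: np.ndarray) -> dict:
    """points_xyz: N x 3 point cloud; semantic_ids: per-point target object id."""
    submodels = {}
    for obj_id in np.unique(semantic_ids):
        submodels[int(obj_id)] = points_xyz[semantic_ids == obj_id]
    return submodels   # {target object id: 3-D sub point cloud model}
```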
  • the 3D model includes a 3D dense model; when performing the 3D reconstruction of the target scene based on the image data to obtain the 3D model of the target scene, the first processing part 502
  • is specifically configured to: perform, based on the image data and the pose of the image acquisition device when collecting the image data, three-dimensional dense reconstruction of the target scene to obtain three-dimensional dense data of the target scene, where the three-dimensional dense data includes multiple dense points located on the surfaces of multiple target objects in the target scene and position information corresponding to each of the dense points; perform semantic segmentation processing on the three-dimensional dense data to obtain semantic information respectively corresponding to a plurality of patches; and generate, based on the 3D dense data and the semantic information, a 3D dense model of the target scene, the 3D dense model of the target scene including the 3D sub dense model corresponding to each target object.
  • the target object includes at least one of the following: a building located in the target scene, and a device deployed in the target scene.
  • FIG. 6 is a schematic structural diagram of a computer device 600 provided by an embodiment of the present disclosure, including a processor 601, a memory 602, and a bus 603.
  • the memory 602 is used to store execution instructions and includes an internal memory 6021 and an external memory 6022; the internal memory 6021 is used to temporarily store calculation data from the processor 601 and data exchanged with the external memory 6022, such as a hard disk.
  • the processor 601 exchanges data with the external memory 6022 through the internal memory 6021.
  • the processor 601 communicates with the memory 602 through the bus 603, so that the processor 601 executes the following instructions:
  • acquiring image data to be processed, where the image data carries label information and includes video or images obtained by image acquisition of the target scene; performing, based on the image data, three-dimensional reconstruction of the target scene to obtain a three-dimensional model of the target scene; determining, based on the two-dimensional marked position of the label information in the image data, a target three-dimensional position of the label information in a model coordinate system corresponding to the three-dimensional model; and adding, based on the target three-dimensional position, label information corresponding to the target three-dimensional position to the three-dimensional model.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the data labeling method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the embodiment of the present disclosure also provides a computer program product; the computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the data labeling method described in the above method embodiments; for details, refer to the above method embodiments.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for making a computer device (which may be a personal computer, a server, a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure relates to a data labeling method and apparatus, and a computer device, a storage medium and a program. The method is applied to a server. The method comprises: acquiring image data to be processed, the image data carrying label information and comprising a video or images obtained by performing image acquisition on a target scene; performing, on the basis of the image data, three-dimensional reconstruction on the target scene so as to obtain a three-dimensional model of the target scene; determining, on the basis of a two-dimensional marking position of the label information in the image data, a target three-dimensional position of the label information in a model coordinate system corresponding to the three-dimensional model; and adding, to the three-dimensional model and on the basis of the target three-dimensional position, label information corresponding to the target three-dimensional position.
PCT/CN2022/117915 2021-11-23 2022-09-08 Procédé et appareil de marquage de données, et dispositif informatique, support de stockage et programme WO2023093217A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111396883.9A CN114140528A (zh) 2021-11-23 2021-11-23 数据标注方法、装置、计算机设备及存储介质
CN202111396883.9 2021-11-23

Publications (1)

Publication Number Publication Date
WO2023093217A1 true WO2023093217A1 (fr) 2023-06-01

Family

ID=80390981

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117915 WO2023093217A1 (fr) 2021-11-23 2022-09-08 Procédé et appareil de marquage de données, et dispositif informatique, support de stockage et programme

Country Status (2)

Country Link
CN (1) CN114140528A (fr)
WO (1) WO2023093217A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011503A (zh) * 2023-08-07 2023-11-07 青岛星美装饰服务有限公司 一种加工数据确定方法、装置、设备和可读存储介质
CN117197361A (zh) * 2023-11-06 2023-12-08 四川省地质调查研究院测绘地理信息中心 实景三维数据库构建方法、电子设备和计算机可读介质
CN117557871A (zh) * 2024-01-11 2024-02-13 子亥科技(成都)有限公司 三维模型标注方法、装置、设备及存储介质

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140528A (zh) * 2021-11-23 2022-03-04 北京市商汤科技开发有限公司 数据标注方法、装置、计算机设备及存储介质
CN114758075B (zh) * 2022-04-22 2023-03-24 如你所视(北京)科技有限公司 用于生成三维标签的方法、装置和存储介质
CN114842175B (zh) * 2022-04-22 2023-03-24 如你所视(北京)科技有限公司 三维标签的交互呈现方法、装置、设备和介质
CN114777671A (zh) * 2022-04-25 2022-07-22 武汉中观自动化科技有限公司 工件模型处理方法、服务器、前端设备及三维扫描系统
CN117218131A (zh) * 2023-11-09 2023-12-12 天宇正清科技有限公司 一种验房问题标注方法、系统、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080143709A1 (en) * 2006-12-14 2008-06-19 Earthmine, Inc. System and method for accessing three dimensional information from a panoramic image
CN106683068A (zh) * 2015-11-04 2017-05-17 北京文博远大数字技术有限公司 一种三维数字化图像采集方法及设备
CN112768016A (zh) * 2021-01-26 2021-05-07 马元 基于肺部临床影像的临床教学方法和系统
CN114140528A (zh) * 2021-11-23 2022-03-04 北京市商汤科技开发有限公司 数据标注方法、装置、计算机设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080143709A1 (en) * 2006-12-14 2008-06-19 Earthmine, Inc. System and method for accessing three dimensional information from a panoramic image
CN106683068A (zh) * 2015-11-04 2017-05-17 北京文博远大数字技术有限公司 一种三维数字化图像采集方法及设备
CN112768016A (zh) * 2021-01-26 2021-05-07 马元 基于肺部临床影像的临床教学方法和系统
CN114140528A (zh) * 2021-11-23 2022-03-04 北京市商汤科技开发有限公司 数据标注方法、装置、计算机设备及存储介质

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011503A (zh) * 2023-08-07 2023-11-07 青岛星美装饰服务有限公司 一种加工数据确定方法、装置、设备和可读存储介质
CN117011503B (zh) * 2023-08-07 2024-05-28 青岛星美装饰服务有限公司 一种加工数据确定方法、装置、设备和可读存储介质
CN117197361A (zh) * 2023-11-06 2023-12-08 四川省地质调查研究院测绘地理信息中心 实景三维数据库构建方法、电子设备和计算机可读介质
CN117197361B (zh) * 2023-11-06 2024-01-26 四川省地质调查研究院测绘地理信息中心 实景三维数据库构建方法、电子设备和计算机可读介质
CN117557871A (zh) * 2024-01-11 2024-02-13 子亥科技(成都)有限公司 三维模型标注方法、装置、设备及存储介质
CN117557871B (zh) * 2024-01-11 2024-03-19 子亥科技(成都)有限公司 三维模型标注方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN114140528A (zh) 2022-03-04

Similar Documents

Publication Publication Date Title
WO2023093217A1 (fr) Procédé et appareil de marquage de données, et dispositif informatique, support de stockage et programme
US11238644B2 (en) Image processing method and apparatus, storage medium, and computer device
WO2019242262A1 (fr) Procédé et dispositif de guidage à distance basé sur la réalité augmentée, terminal et support de stockage
EP2915140B1 (fr) Initialisation rapide pour slam visuel monoculaire
Zollmann et al. Augmented reality for construction site monitoring and documentation
CN103901884B (zh) 信息处理方法和信息处理设备
CN105678748A (zh) 三维监控系统中基于三维重构的交互式标定方法和装置
US20180357819A1 (en) Method for generating a set of annotated images
JP2011239361A (ja) 繰り返し撮影用arナビゲーション及び差異抽出のシステム、方法及びプログラム
KR101181967B1 (ko) 고유식별 정보를 이용한 3차원 실시간 거리뷰시스템
CN110428501B (zh) 全景影像生成方法、装置、电子设备及可读存储介质
CN108594999A (zh) 用于全景图像展示系统的控制方法和装置
US11989827B2 (en) Method, apparatus and system for generating a three-dimensional model of a scene
CN109934873B (zh) 标注图像获取方法、装置及设备
CN112802208B (zh) 一种航站楼内三维可视化方法及装置
CN115035162A (zh) 基于视觉slam的监控视频人员定位跟踪方法及系统
CN112270702A (zh) 体积测量方法及装置、计算机可读介质和电子设备
JP2021060868A (ja) 情報処理装置、情報処理方法、およびプログラム
Yu et al. Intelligent visual-IoT-enabled real-time 3D visualization for autonomous crowd management
CN117197388A (zh) 一种基于生成对抗神经网络和倾斜摄影的实景三维虚拟现实场景构建方法及系统
CN113838193A (zh) 数据处理方法、装置、计算机设备及存储介质
CN113610702B (zh) 一种建图方法、装置、电子设备及存储介质
CN114283243A (zh) 数据处理方法、装置、计算机设备及存储介质
CN114089836B (zh) 标注方法、终端、服务器和存储介质
CN114332207A (zh) 距离确定的方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897305

Country of ref document: EP

Kind code of ref document: A1