CN112348972A - Fine semantic annotation method based on large-scale scene three-dimensional model - Google Patents

Fine semantic annotation method based on large-scale scene three-dimensional model

Info

Publication number
CN112348972A
Authority
CN
China
Legal status
Pending
Application number
CN202011011807.7A
Other languages
Chinese (zh)
Inventor
何娇
王江安
Current Assignee
Shaanxi Tudou Data Technology Co ltd
Original Assignee
Shaanxi Tudou Data Technology Co ltd
Priority date
2020-09-22
Filing date
2020-09-22
Publication date
2021-02-09
Application filed by Shaanxi Tudou Data Technology Co ltd
Priority to CN202011011807.7A
Publication of CN112348972A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/004 Annotating, labelling

Abstract

The invention discloses a fine semantic annotation method based on a large-scale scene three-dimensional model, in which the following steps are executed iteratively under an active learning framework: S1, a CNN semantic segmentation network is trained with a continuously expanded set of labeled images; S2, the pixel labels in all images are back-projected onto the three-dimensional mesh model using the calibrated camera parameters; S3, the fused semantic three-dimensional model is used as a supervisor to select images for manual labeling; S4, the training-fusion-selection process continues until the labels of the model become stable, i.e. until the percentage of patches whose labels differ between the previous and current iterations falls below a threshold η. The invention can be used to finely label large-scale scene three-dimensional models reconstructed from images; the proposed method requires limited manual work while guaranteeing the quality of the model's semantic labeling.

Description

Fine semantic annotation method based on large-scale scene three-dimensional model
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle oblique photography, and particularly relates to a fine semantic annotation method based on a large-scale scene three-dimensional model.
Background
In recent years, semantic annotation of three-dimensional models has been a challenging research direction. At present, there are two main approaches to the automatic semantic annotation of large-scale three-dimensional models. The first reconstructs the three-dimensional model and its semantics jointly: the images are segmented with a pre-trained decision tree, and a semantic model is then reconstructed from the label images and the depth maps. The second assigns semantic labels to an existing three-dimensional model: pixel-level semantic segmentation is first performed on the two-dimensional images, and the labels are then back-projected onto the three-dimensional model using the calibrated camera parameters and fused.
Since the types and shapes of three-dimensional objects differ from scene to scene, it is difficult to devise a general method suitable for most scenes. Three-dimensional semantic models help humans and automated systems know "what objects" are "where" in a particular scene, and they have many applications in autonomous driving, augmented reality, robotics and other fields. A fine three-dimensional model of a large-scale scene has thousands of patches, and the most straightforward approach is to label them manually. However, there is no effective tool for manually labeling each patch, and existing deep learning techniques cannot process three-dimensional models of large-scale scenes. A method for labeling large-scale three-dimensional scene models is therefore needed.
No effective solution to these problems has yet been proposed in the related art; the present invention therefore provides a fine semantic annotation method based on a large-scale scene three-dimensional model.
Disclosure of Invention
(I) Technical problem to be solved
To address the shortcomings of the prior art, the invention provides a fine semantic annotation method based on a large-scale scene three-dimensional model, which solves the problems described in the Background.
(II) Technical solution
To achieve the above purpose, the invention provides the following technical solution: a fine semantic annotation method based on a large-scale scene three-dimensional model, characterized in that the following steps are executed iteratively under an active learning framework:
S1: training a CNN semantic segmentation network with the continuously expanded set of labeled images, and then using the trained CNN to obtain pixel-level semantic labels for the unlabeled images;
S2: back-projecting the pixel labels in all images onto the three-dimensional mesh model using the calibrated camera parameters, fusing the labels on the mesh by MRF (Markov random field) optimization, and assigning each patch its own label by combining the two-dimensional semantic labels with three-dimensional geometric features;
S3: using the fused semantic three-dimensional model as a supervisor, applying a batch image selection method to select several valuable images for labeling, and, after they are manually labeled, merging these images into the training set in preparation for the next iteration;
S4: continuing the training-fusion-selection process until the labels of the model become stable, i.e. until the percentage of patches whose labels differ between the previous and current iterations is below the threshold η.
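The S1 to S4 loop can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes that CNN training, prediction plus back-projection and MRF fusion, batch image selection, and manual annotation are supplied as callables, and the names `train`, `fuse`, `select`, `annotate`, `eta` and `max_iters` are hypothetical. Only the stopping test follows the text directly: the fraction of patches whose label changed between two iterations must fall below η.

```python
from typing import Callable, Dict, Set

# A labeling maps a patch (mesh face) id to its class label (0-3 in the embodiment).
Labeling = Dict[int, int]


def label_change_ratio(prev: Labeling, curr: Labeling) -> float:
    """S4 criterion: fraction of patches whose label differs between two iterations."""
    if not curr:
        return 0.0
    changed = sum(1 for f, label in curr.items() if prev.get(f) != label)
    return changed / len(curr)


def active_learning_loop(
    labeled: Set[str],
    train: Callable[[Set[str]], object],               # S1: train the CNN on labeled images
    fuse: Callable[[object], Labeling],                # S1+S2: predict, back-project, MRF-fuse
    select: Callable[[Labeling, Set[str]], Set[str]],  # S3: batch image selection
    annotate: Callable[[Set[str]], None],              # manual labeling of the selected batch
    eta: float = 0.01,
    max_iters: int = 20,
) -> Labeling:
    prev: Labeling = {}
    curr: Labeling = {}
    for _ in range(max_iters):
        model = train(labeled)
        curr = fuse(model)
        if prev and label_change_ratio(prev, curr) < eta:
            break  # S4: the model's labels have stabilized
        batch = select(curr, labeled)
        annotate(batch)
        labeled = labeled | batch  # merge newly labeled images into the training set
        prev = curr
    return curr
```

The `max_iters` cap is an added safeguard so the loop terminates even if the labels never settle below η.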
Preferably, the method takes as input a three-dimensional mesh model reconstructed by SfM and MVS together with the calibrated images, and outputs a three-dimensional semantic mesh model in which each patch carries a semantic label, with different colors representing different categories.
Preferably, the SfM is formed by interleaving multiple channels horizontally and vertically; each channel provides 8 Gbps of switching capability (the Supervisor Engine 720 provides 20 Gbps per channel), and the main advantage of matrix switching is that multiple non-conflicting switches can be performed simultaneously and point-to-multipoint (multicast) switching is supported.
Preferably, the MVS is a system board that uses two 14 MHz Motorola 68000 CPUs and supports a resolution of 320 × 224 (a palette of 65,536 colors, 4,096 of them displayable on screen); the sound processor is a Z80A, with an 8-channel FM synthesis sound source and a 7-channel digital stereo sound source (PSG & PCM); the system RAM is 7 MB (56 Mbits) and the maximum cartridge capacity is 42 MB (330 Mbits).
Preferably, semantic segmentation is a task in computer vision in which different parts of the visual input are classified into different categories according to their semantics; through semantic understanding, each category has a definite real-world meaning.
Preferably, in the MRF optimization of S2, variable weight parameters are introduced into the conventional MRF image segmentation algorithm to couple the label field model and the feature field model and balance the two, yielding a segmentation that preserves image edges, important image details and region consistency. An edge penalty function is then introduced adaptively at edges to adjust the contribution of the potential-function energy to the total energy function, which reduces edge blurring during segmentation and improves edge localization accuracy.
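The patent does not give the form of the variable weight or the edge penalty function. One common way to realize the idea, shown here as an assumption rather than the patented formula, is a weight `lam` that trades off the data (feature field) term against the smoothness (label field) term, combined with a contrast-sensitive Potts penalty that shrinks across strong intensity edges so that label boundaries placed on edges cost little. The function names and the parameters `lam`, `beta` and `sigma` are hypothetical; the sketch only evaluates the energy of a labeling on a small grayscale image and does not minimize it.

```python
import math
from typing import List


def pairwise_weight(ip: float, iq: float, beta: float, sigma: float) -> float:
    """Contrast-sensitive penalty: close to beta in flat regions, near 0 across strong edges."""
    return beta * math.exp(-((ip - iq) ** 2) / (2.0 * sigma ** 2))


def grid_energy(
    image: List[List[float]],         # grayscale intensities, image[y][x]
    labels: List[List[int]],          # candidate labeling, labels[y][x]
    unary: List[List[List[float]]],   # feature-field cost, unary[y][x][label]
    lam: float = 0.5,                 # weight balancing the two terms (a caller may vary it)
    beta: float = 1.0,
    sigma: float = 10.0,
) -> float:
    """Energy = (1 - lam) * data term + lam * edge-adaptive Potts smoothness term."""
    h, w = len(image), len(image[0])
    data = sum(unary[y][x][labels[y][x]] for y in range(h) for x in range(w))
    smooth = 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbours: each pair counted once
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    smooth += pairwise_weight(image[y][x], image[ny][nx], beta, sigma)
    return (1.0 - lam) * data + lam * smooth
```

With equal data costs, a boundary placed on the strong intensity step of a row such as `[0, 0, 100]` yields a lower energy than one placed inside the flat region, which is the edge-preserving behavior the paragraph describes.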
(III) Beneficial effects
Compared with the prior art, the fine semantic annotation method based on a large-scale scene three-dimensional model provided by the invention has the following beneficial effects:
by determining the number of semantic segmentation classes, preparing labeled training data and performing semantic segmentation on the images, the method can finely label large-scale scene three-dimensional models reconstructed from images; it requires limited manual effort while ensuring the quality of the model's semantic labeling.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of an image according to the present invention;
FIG. 3 is a three-dimensional schematic view according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to FIGS. 1 to 3, the present invention provides the following technical solution: a fine semantic annotation method based on a large-scale scene three-dimensional model, characterized in that the following steps are executed iteratively under an active learning framework:
S1: training a CNN semantic segmentation network with the continuously expanded set of labeled images, and then using the trained CNN to obtain pixel-level semantic labels for the unlabeled images;
S2: back-projecting the pixel labels in all images onto the three-dimensional mesh model using the calibrated camera parameters, fusing the labels on the mesh by MRF (Markov random field) optimization, and assigning each patch its own label by combining the two-dimensional semantic labels with three-dimensional geometric features;
S3: using the fused semantic three-dimensional model as a supervisor, applying a batch image selection method to select several valuable images for labeling, and, after they are manually labeled, merging these images into the training set in preparation for the next iteration;
S4: continuing the training-fusion-selection process until the labels of the model become stable, i.e. until the percentage of patches whose labels differ between the previous and current iterations is below the threshold η.
The method takes as input a three-dimensional mesh model reconstructed by SfM and MVS together with the calibrated images, and outputs a three-dimensional semantic mesh model in which each patch carries a semantic label, with different colors representing different categories.
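Back-projection between the calibrated images and the mesh relies on projecting mesh geometry into each camera. A minimal sketch, assuming a standard pinhole model with intrinsics `K` and extrinsics `R`, `t` in the convention x_cam = R·X + t, and ignoring lens distortion and occlusion (a real pipeline would add a visibility test such as a z-buffer before using the projected pixels); the helper names are hypothetical:

```python
from typing import List, Optional, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]


def matvec(m: Mat3, v: Sequence[float]) -> List[float]:
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def project_point(K: Mat3, R: Mat3, t: Sequence[float],
                  X: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Project world point X to pixel (u, v); None if it lies behind the camera."""
    Xc = [a + b for a, b in zip(matvec(R, X), t)]  # world -> camera frame
    if Xc[2] <= 0:
        return None  # behind the camera, cannot be observed
    x = matvec(K, Xc)  # camera frame -> homogeneous pixel coordinates
    return (x[0] / x[2], x[1] / x[2])


def face_centroid(vertices: Sequence[Sequence[float]], face: Sequence[int]) -> List[float]:
    """Centroid of a mesh face given its vertex indices."""
    return [sum(vertices[i][k] for i in face) / len(face) for k in range(3)]
```

Projecting each face's centroid (or its vertices, to rasterize the footprint) gives the image region that the later steps call $\Omega_{f,i}$.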
The specific operation is as follows:
Step 1: determine the number of semantic segmentation classes and prepare labeled data;
Number of semantic segmentation classes: 4, with labels 0 to 3 representing other, building, road and vegetation, respectively. Labeled data: a small number of images are semantically annotated with the Labelme annotation software, which generates JSON files;
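The Labelme JSON files must be converted into per-pixel label masks before training. A minimal sketch, assuming polygon annotations and hypothetical class names `other`, `building`, `road` and `vegetation` mapped to labels 0 to 3 (the patent fixes only the label numbers); it uses the Labelme fields `shapes`, `label`, `points`, `shape_type`, `imageHeight` and `imageWidth`, and rasterizes with a plain even-odd test at pixel centers rather than an imaging library:

```python
import json
from typing import Dict, List, Sequence

# Hypothetical class-name mapping; the embodiment only fixes labels 0-3.
CLASS_IDS: Dict[str, int] = {"other": 0, "building": 1, "road": 2, "vegetation": 3}


def point_in_polygon(x: float, y: float, poly: Sequence[Sequence[float]]) -> bool:
    """Even-odd (ray casting) point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def labelme_to_mask(json_text: str) -> List[List[int]]:
    """Rasterize Labelme polygons into a label mask; later shapes overwrite earlier ones."""
    data = json.loads(json_text)
    h, w = data["imageHeight"], data["imageWidth"]
    mask = [[0] * w for _ in range(h)]  # unannotated pixels default to class 0 ("other")
    for shape in data["shapes"]:
        if (shape.get("shape_type") or "polygon") != "polygon":
            continue  # this sketch only handles polygons
        cls = CLASS_IDS.get(shape["label"], 0)
        for y in range(h):
            for x in range(w):
                if point_in_polygon(x + 0.5, y + 0.5, shape["points"]):  # pixel centres
                    mask[y][x] = cls
    return mask
```

The per-pixel loop is deliberately simple; for full-size aerial images a vectorized or library-based rasterizer would be far faster.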
Step 2: train a semantic segmentation network on the labeled data to obtain a reasonably good classification model;
Step 3: perform semantic segmentation on the images to obtain the probability distribution over the classes;
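Segmentation CNNs typically output unnormalized per-class scores (logits) for every pixel, and the per-class probability distribution of Step 3 is usually obtained with a softmax. A minimal sketch, assuming the network emits four scores per pixel in the label order 0 to 3:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert class scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pixel_probabilities(logit_map: Sequence[Sequence[Sequence[float]]]) -> List[List[List[float]]]:
    """logit_map[y][x] holds the class scores of one pixel; returns per-pixel distributions."""
    return [[softmax(scores) for scores in row] for row in logit_map]
```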
Step 4: for each patch $f$ of the mesh, calculate the probability $\Pr(l_f = l)$ that its label $l_f$ takes the value $l$:
[Equation image BSA0000220148830000051]
where $\Omega_{f,i}$ represents the projected area of patch $f$ in image $i$, and $I$ represents the entire image set;
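The exact formula for $\Pr(l_f = l)$ is not recoverable from the text. A plausible reading, assumed here and not confirmed by the source, is that the per-pixel class probabilities are averaged over every pixel of the projected area $\Omega_{f,i}$ in every image $i \in I$. The function name and the uniform fallback for faces seen by no image are hypothetical choices:

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (x, y) pixel coordinates


def face_label_distribution(
    prob_maps: Dict[str, List[List[List[float]]]],  # image id -> probs[y][x][class]
    footprints: Dict[str, Sequence[Pixel]],         # image id -> pixels of Omega_{f,i}
    num_classes: int = 4,
) -> List[float]:
    """Average the class probabilities of all pixels covered by one face in all images."""
    totals = [0.0] * num_classes
    count = 0
    for img, pixels in footprints.items():
        probs = prob_maps[img]
        for x, y in pixels:
            for c in range(num_classes):
                totals[c] += probs[y][x][c]
            count += 1
    if count == 0:
        return [1.0 / num_classes] * num_classes  # face not visible in any image: uniform
    return [t / count for t in totals]
```

Weighting every pixel equally means faces seen large in one image dominate that image's contribution; normalizing per image instead would be an equally reasonable variant.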
and 5: and each patch in the mesh grid is assigned with a corresponding label, and MRF semantic fusion is carried out in a 3D space. The patch labeling problem is treated as an energy minimization problem on the MRF. Gibbs energy of MRF posterior probability distribution is
Figure BSA0000220148830000052
F is the entire set of patches, a is the set of neighboring patches,
Figure BSA0000220148830000053
Vf,q(lf,lq) Representing the geometrical constraint of the abutment surfaces (f, q).
The energy $E$ is minimized with the $\alpha$-expansion algorithm, generating a semantic three-dimensional model in which every patch carries a semantic label;
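The mesh MRF can be illustrated with a small Python sketch. It assumes, since the equation images are not reproduced, a data term $D_f(l) = -\log \Pr(l_f = l)$ and a Potts pairwise term $V_{f,q}(l_f, l_q) = w_{fq}\,[l_f \neq l_q]$ with hypothetical per-edge weights `weights` standing in for the geometric constraint. The patent minimizes $E$ with $\alpha$-expansion; to stay short, this sketch substitutes iterated conditional modes (ICM), a simpler greedy method that only reaches a local minimum and is not the patented optimizer.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]  # pair of adjacent faces (f, q)


def to_neg_log(probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Data cost D_f(l) = -log Pr(l_f = l), clamped to avoid log(0)."""
    return [[-math.log(max(p, 1e-12)) for p in row] for row in probs]


def mrf_energy(labels: Sequence[int], cost: Sequence[Sequence[float]],
               edges: Sequence[Edge], weights: Sequence[float]) -> float:
    """E = sum_f D_f(l_f) + sum_{(f,q) in A} w_fq * [l_f != l_q]."""
    data = sum(cost[f][labels[f]] for f in range(len(labels)))
    smooth = sum(w for (f, q), w in zip(edges, weights) if labels[f] != labels[q])
    return data + smooth


def icm(cost: Sequence[Sequence[float]], edges: Sequence[Edge],
        weights: Sequence[float], max_iters: int = 10) -> List[int]:
    """Iterated conditional modes: greedy per-face updates until no label changes."""
    n, k = len(cost), len(cost[0])
    labels = [min(range(k), key=lambda c: cost[f][c]) for f in range(n)]  # data-term start
    neighbours: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (f, q), w in zip(edges, weights):
        neighbours[f].append((q, w))
        neighbours[q].append((f, w))
    for _ in range(max_iters):
        changed = False
        for f in range(n):
            def local_energy(c: int) -> float:
                return cost[f][c] + sum(w for q, w in neighbours[f] if labels[q] != c)
            best = min(range(k), key=local_energy)
            if best != labels[f]:
                labels[f] = best
                changed = True
        if not changed:
            break
    return labels
```

For a chain of three faces where the middle one weakly prefers class 1 and both neighbors strongly prefer class 0, the smoothness term flips the middle face to class 0, lowering $E$.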
Step 6: once the 3D semantic labels are obtained, they are used as a supervisor in batch image selection to measure the segmentation quality of each image and to help select valuable images for annotation. By actively selecting the images to be annotated, the large-scale scene three-dimensional model can be semantically labeled while greatly reducing the annotation cost.
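Step 6 does not spell out how segmentation quality is scored. A minimal sketch, assuming that the fused 3D labels have already been rendered into each camera view (the rendering is not shown, and pixels not covered by the mesh carry the value -1) and that the unlabeled images whose 2D predictions disagree most with that rendering are the most valuable to annotate. The function names and the value `k` are hypothetical, and this plain top-k ranking makes no attempt to avoid selecting near-duplicate views:

```python
from typing import Dict, List, Sequence, Set

LabelMap = Sequence[Sequence[int]]


def disagreement(pred: LabelMap, rendered: LabelMap, ignore: int = -1) -> float:
    """Fraction of mesh-covered pixels where the 2D prediction differs from the 3D labels."""
    total = wrong = 0
    for prow, rrow in zip(pred, rendered):
        for p, r in zip(prow, rrow):
            if r == ignore:
                continue  # pixel not covered by the mesh
            total += 1
            if p != r:
                wrong += 1
    return wrong / total if total else 0.0


def select_batch(preds: Dict[str, LabelMap], rendered: Dict[str, LabelMap],
                 labeled: Set[str], k: int) -> List[str]:
    """Return the k unlabeled images whose predictions deviate most from the 3D model."""
    candidates = [img for img in preds if img not in labeled]
    candidates.sort(key=lambda img: disagreement(preds[img], rendered[img]), reverse=True)
    return candidates[:k]
```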
The SfM is formed by interleaving multiple channels horizontally and vertically; each channel provides 8 Gbps of switching capability (the Supervisor Engine 720 provides 20 Gbps per channel), and the main advantage of matrix switching is that multiple non-conflicting switches can be performed simultaneously and point-to-multipoint (multicast) switching is supported.
The MVS is a system board that uses two 14 MHz Motorola 68000 CPUs and supports a resolution of 320 × 224 (a palette of 65,536 colors, 4,096 of them displayable on screen); the sound processor is a Z80A, with an 8-channel FM synthesis sound source and a 7-channel digital stereo sound source (PSG & PCM); the system RAM is 7 MB (56 Mbits) and the maximum cartridge capacity is 42 MB (330 Mbits).
Semantic segmentation is a task in computer vision in which different parts of the visual input are classified into different categories according to their semantics; through semantic understanding, each category has a definite real-world meaning.
In the MRF optimization of S2, variable weight parameters are introduced into the conventional MRF image segmentation algorithm to couple the label field model and the feature field model and balance the two, yielding a segmentation that preserves image edges, important image details and region consistency. An edge penalty function is then introduced adaptively at edges to adjust the contribution of the potential-function energy to the total energy function, which reduces edge blurring during segmentation and improves edge localization accuracy.
FIGS. 2 and 3 are schematic only; the details of the specific objects shown in them have no direct effect on the implementation of the technical solution and do not affect its disclosure.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "front", "rear", "both ends", "one end", "the other end", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A fine semantic annotation method based on a large-scale scene three-dimensional model, characterized in that the following steps are performed iteratively under an active learning framework:
S1: training a CNN semantic segmentation network with a continuously expanding set of labeled images, and then using the trained CNN to obtain pixel-level semantic labels for the unlabeled images;
S2: back-projecting the pixel labels in all images onto the three-dimensional mesh model using the calibrated camera parameters, fusing the labels on the mesh by MRF (Markov random field) optimization, and assigning each patch its own label by combining the two-dimensional semantic labels with three-dimensional geometric features;
S3: using the fused semantic three-dimensional model as a supervisor, applying a batch image selection method to select several valuable images for labeling, and, after they are manually labeled, merging these images into the training set in preparation for the next iteration;
S4: continuing the training-fusion-selection process until the labels of the model become stable, i.e. until the percentage of patches whose labels differ between the previous and current iterations is below the threshold η.
2. The method for fine semantic annotation based on the large-scale scene three-dimensional model according to claim 1, wherein the method takes as input a three-dimensional mesh model reconstructed by SfM and MVS together with calibrated images, and outputs a three-dimensional semantic mesh model in which each patch carries a semantic label, with different colors representing different categories.
3. The method for fine semantic annotation based on the large-scale scene three-dimensional model according to claim 1, wherein the SfM is formed by interleaving a plurality of channels horizontally and vertically; each channel provides 8 Gbps of switching capability (the Supervisor Engine 720 provides 20 Gbps per channel), and the main advantage of matrix switching is that multiple non-conflicting exchanges can be performed simultaneously and point-to-multipoint (multicast) exchange is supported.
4. The method for fine semantic annotation based on the large-scale scene three-dimensional model according to claim 1, wherein the MVS is a system board that uses two 14 MHz Motorola 68000 CPUs and supports a resolution of 320 × 224 (a palette of 65,536 colors, 4,096 of them displayable on screen); the sound processor is a Z80A, with an 8-channel FM synthesis sound source and a 7-channel digital stereo sound source (PSG & PCM); the system RAM is 7 MB (56 Mbits) and the maximum cartridge capacity is 42 MB (330 Mbits).
5. The method for fine semantic annotation based on the large-scale scene three-dimensional model according to claim 1, wherein semantic segmentation is a task in computer vision in which different parts of the visual input are classified into different categories according to their semantics, and through semantic understanding each category has a definite real-world meaning.
6. The method for fine semantic annotation based on the large-scale scene three-dimensional model according to claim 1, wherein, in the MRF optimization of S2, variable weight parameters are introduced into the conventional MRF image segmentation algorithm to couple the label field model and the feature field model and balance the two, yielding a segmentation that preserves image edges, important image details and region consistency; an edge penalty function is then introduced adaptively at edges to adjust the contribution of the potential-function energy to the total energy function, which reduces edge blurring during segmentation and improves edge localization accuracy.
CN202011011807.7A 2020-09-22 2020-09-22 Fine semantic annotation method based on large-scale scene three-dimensional model Pending CN112348972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011011807.7A CN112348972A (en) 2020-09-22 2020-09-22 Fine semantic annotation method based on large-scale scene three-dimensional model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011011807.7A CN112348972A (en) 2020-09-22 2020-09-22 Fine semantic annotation method based on large-scale scene three-dimensional model

Publications (1)

Publication Number Publication Date
CN112348972A (en) 2021-02-09

Family

ID=74358053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011011807.7A Pending CN112348972A (en) 2020-09-22 2020-09-22 Fine semantic annotation method based on large-scale scene three-dimensional model

Country Status (1)

Country Link
CN (1) CN112348972A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870405A (en) * 2021-09-24 2021-12-31 埃洛克航空科技(北京)有限公司 Texture map selection method for three-dimensional scene reconstruction and related device
CN115393361A (en) * 2022-10-28 2022-11-25 Hunan University Method, device, equipment and medium for segmenting skin disease image with low annotation cost
CN117557871A (en) * 2024-01-11 2024-02-13 子亥科技(成都)有限公司 Three-dimensional model labeling method, device, equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
US20160055237A1 (en) * 2014-08-20 2016-02-25 Mitsubishi Electric Research Laboratories, Inc. Method for Semantically Labeling an Image of a Scene using Recursive Context Propagation
CN111444914A (en) * 2020-03-23 2020-07-24 复旦大学 Image semantic segmentation method based on PU-Learning

Non-Patent Citations (2)

Title
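YANG ZHOU et al.: "Fine-Level Semantic Labeling of Large-Scale 3D Model by Active Learning", 2018 INTERNATIONAL CONFERENCE ON 3D VISION (3DV), pages 523 - 532 *
胡伟 (Hu Wei); 柏文阳 (Bai Wenyang); 瞿裕忠 (Qu Yuzhong): "Research on object coreference resolution in the Semantic Web" (语义Web中对象共指的消解研究), Journal of Software (软件学报), no. 07 *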

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113870405A (en) * 2021-09-24 2021-12-31 埃洛克航空科技(北京)有限公司 Texture map selection method for three-dimensional scene reconstruction and related device
CN113870405B (en) * 2021-09-24 2022-11-08 埃洛克航空科技(北京)有限公司 Texture map selection method for three-dimensional scene reconstruction and related device
CN115393361A (en) * 2022-10-28 2022-11-25 Hunan University Method, device, equipment and medium for segmenting skin disease image with low annotation cost
CN117557871A (en) * 2024-01-11 2024-02-13 子亥科技(成都)有限公司 Three-dimensional model labeling method, device, equipment and storage medium
CN117557871B (en) * 2024-01-11 2024-03-19 子亥科技(成都)有限公司 Three-dimensional model labeling method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112348972A (en) Fine semantic annotation method based on large-scale scene three-dimensional model
CN108389251B (en) Projection full convolution network three-dimensional model segmentation method based on fusion of multi-view features
CN110363116B (en) Irregular human face correction method, system and medium based on GLD-GAN
Hu et al. Single-image real-time rain removal based on depth-guided non-local features
CN104820990A (en) Interactive-type image-cutting system
CN111161364B (en) Real-time shape completion and attitude estimation method for single-view depth map
CN108876814A (en) A method of generating posture stream picture
CN112001407A (en) Model iterative training method and system based on automatic labeling
CN107358645A (en) Product method for reconstructing three-dimensional model and its system
CN108734773A (en) A kind of three-dimensional rebuilding method and system for mixing picture
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
CN112132232A (en) Medical image classification labeling method and system and server
CN112749611A (en) Face point cloud model generation method and device, storage medium and electronic equipment
CN112862736B (en) Real-time three-dimensional reconstruction and optimization method based on points
CN113421210A (en) Surface point cloud reconstruction method based on binocular stereo vision
CN116152442B (en) Three-dimensional point cloud model generation method and device
CN113011438A (en) Node classification and sparse graph learning-based bimodal image saliency detection method
CN112686247A (en) Identification card number detection method and device, readable storage medium and terminal
CN114387308A (en) Machine vision characteristic tracking system
Rong et al. Active learning based 3D semantic labeling from images and videos
CN109859255A (en) The non-concurrent acquisition of the multi-angle of view of big-movement moving object and method for reconstructing
CN115187768A (en) Fisheye image target detection method based on improved YOLOv5
CN115936796A (en) Virtual makeup changing method, system, equipment and storage medium
Tiator et al. Using semantic segmentation to assist the creation of interactive VR applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 504, Block E, Huanpu Science and Technology Industrial Park, 211 Tiangu 8th Road, High-tech Zone, Xi'an City, Shaanxi Province, 710000

Applicant after: Tudou Data Technology Group Co.,Ltd.

Address before: Room 504, Block E, Huanpu Science and Technology Industrial Park, 211 Gaoxin Tiangu 8th Road, Yanta District, Xi'an City, Shaanxi Province, 710075

Applicant before: SHAANXI TUDOU DATA TECHNOLOGY Co.,Ltd.