WO2023088177A1 - Neural network model training method, and method and device for establishing a vectorized three-dimensional model - Google Patents

Neural network model training method, and method and device for establishing a vectorized three-dimensional model

Info

Publication number
WO2023088177A1
WO2023088177A1 (PCT/CN2022/131344)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
target
planes
network model
initial
Prior art date
Application number
PCT/CN2022/131344
Other languages
English (en)
Chinese (zh)
Inventor
胡志华
黄经纬
张彦峰
孙明伟
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023088177A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence (AI), in particular to a neural network model training method, a vectorized three-dimensional model building method and equipment.
  • The vectorized 3D model is the basic data for many tasks, such as positioning and navigation, interior design, and virtual reality. Unlike a dense 3D point cloud, a vectorized 3D model is a highly abstract, structured 3D model, so human intervention is usually required to obtain good results.
  • the mainstream vectorized 3D model reconstruction schemes mainly include two types, namely laser scanner-based schemes and image-based schemes, both of which aim at reconstructing dense 3D point clouds.
  • Embodiments of the present application provide a neural network model training method, a vectorized three-dimensional model building method and equipment, which are used for low-cost automatic reconstruction of vectorized three-dimensional models, and improve the reconstruction effect of weak texture areas.
  • Embodiments of the present application also provide corresponding computer equipment, computer-readable storage media, chip systems, and the like.
  • The first aspect of the present application provides a neural network model training method, including: obtaining a training sample, where the training sample includes a sample image, a target vectorized three-dimensional model of a sample object, and camera parameters of a preset camera, the sample image being obtained by shooting the sample object with the preset camera; and training a first neural network model based on the training sample, where the first neural network model is used to obtain an initial vectorized 3D model from the sample image, and the first neural network model is iteratively updated according to the deviation between the initial vectorized 3D model and the target vectorized 3D model to obtain a second neural network model, the second neural network model being used to predict the vectorized 3D model of a target object. The initial vectorized 3D model is obtained by intersecting multiple initial planes; the multiple initial planes are determined according to an initial layout and the camera parameters, and the initial layout is determined from the sample image by the first neural network model.
  • the sample object in this application can be any scene, such as an indoor or outdoor building, specifically an indoor scene of a room, the sample image is multi-view image data, and there are multiple sample images.
  • The first neural network model in this application can determine the initial layout from the sample image, determine multiple initial planes according to the initial layout and the camera parameters, and intersect the multiple initial planes to obtain an initial vectorized 3D model.
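  • The plane-intersection step can be illustrated with a short sketch (an illustration only; the application does not prescribe a particular algorithm): three plane equations n_i · x = d_i form a 3×3 linear system whose solution is a vertex of the vectorized model, e.g. a room corner where the floor meets two walls.

```python
import numpy as np

def intersect_three_planes(normals, offsets):
    """Return the 3D point where three planes n_i . x = d_i meet,
    or None if the normals are (nearly) linearly dependent."""
    A = np.asarray(normals, dtype=float)   # 3x3: one plane normal per row
    d = np.asarray(offsets, dtype=float)   # the three plane offsets
    if abs(np.linalg.det(A)) < 1e-9:       # parallel or degenerate planes
        return None
    return np.linalg.solve(A, d)

# Example: the floor (z = 0) meets two walls (x = 0 and y = 2) at a corner.
corner = intersect_three_planes(
    [[0, 0, 1], [1, 0, 0], [0, 1, 0]],
    [0, 0, 2],
)
```

Repeating this for every triple of adjacent fitted planes yields the vertex set of the vectorized model.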
  • After training, the second neural network model can be obtained. The preset value and the preset number of iterations can both be specified in advance by the user. When the multiple sample images in a set of training samples and the camera parameters of the preset camera are input to the second neural network model, the deviation between the vectorized 3D model output by the second neural network model and the target vectorized 3D model in that set of training samples is smaller than the preset value required by the user.
  • During training, the neural network model relates the sample images to the corresponding vectorized 3D model with the plane as the reconstruction unit, so a vectorized 3D model can be obtained directly by inputting images into the trained neural network model. This greatly reduces the reconstruction cost, realizes fully automatic reconstruction without manual intervention, and also achieves a better reconstruction effect in weak-texture areas.
  • The first neural network model is further used to determine multiple candidate planes near each of the multiple initial planes, and to determine multiple target planes from the multiple initial planes and the multiple candidate planes; it is specifically used to intersect the multiple target planes to obtain the initial vectorized three-dimensional model.
  • The candidate planes are determined from the multiple initial planes, and the multiple target planes are then determined from the initial and candidate planes, so that more accurate, truer planes are found for the vectorized 3D model, improving the accuracy of the reconstruction.
  • the initial layout includes pixel coordinates of multiple initial planes in the sample image
  • The first neural network model is specifically used to obtain the plane equations of the multiple initial planes from the pixel coordinates, and to determine, based on those plane equations, the plane equations of the multiple candidate planes near each initial plane, thereby determining the multiple candidate planes.
  • multiple candidate planes are determined based on plane equations, which improves the feasibility of the scheme.
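  • As an illustration of generating candidate planes from a plane equation (a hypothetical parameterization, assuming a unit normal n plus an offset d and hand-picked perturbation steps), one can jitter the offset and slightly tilt the normal:

```python
import itertools
import numpy as np

def candidate_planes(normal, offset, d_steps=(-0.1, 0.0, 0.1), tilt=0.02):
    """Generate planes near (normal, offset) by jittering the offset
    and slightly tilting the unit normal in two directions."""
    normal = np.asarray(normal, dtype=float)
    normal /= np.linalg.norm(normal)
    candidates = []
    for dd, tx, ty in itertools.product(d_steps, (-tilt, 0.0, tilt), (-tilt, 0.0, tilt)):
        n = normal + np.array([tx, ty, 0.0])  # small tilt of the normal
        n /= np.linalg.norm(n)                # keep it a unit normal
        candidates.append((n, offset + dd))
    return candidates

# 3 offsets x 3 x-tilts x 3 y-tilts = 27 candidates around a wall at z = 2.5.
cands = candidate_planes([0.0, 0.0, 1.0], 2.5)
```

The unperturbed plane is among the candidates, so a later selection step can always fall back to the initial plane.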
  • the sample image is multi-view image data, and there are multiple sample images.
  • The first neural network model is also used to obtain the consistency cost of the multiple sample images and, according to that consistency cost, to determine the multiple target planes from the multiple initial planes and the multiple candidate planes.
  • the sample image is multi-view image data, so that the consistency cost of multiple sample images can be obtained, and the target plane can be determined based on the consistency cost, which improves the feasibility of the solution.
  • The first neural network model is specifically further used to extract the feature vector of each pixel in the multiple sample images through a feature extraction network, to perform differentiable mapping with the plane as the primitive according to the camera parameters, the multiple initial planes and the multiple candidate planes, thereby obtaining the correspondence between the multiple sample images, and to obtain the consistency cost, with the plane as the basic unit, from the feature vectors and the correspondence.
  • the correspondence between multiple sample images is obtained through the differentiable mapping with the plane as the primitive, so as to calculate the consistency cost and improve the feasibility of the scheme.
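  • Differentiable mapping with the plane as the primitive is commonly realized as a plane-induced homography; the sketch below is one such formulation (an assumption, since the application gives no formulas). Pixels of a reference view are mapped to a source view through a plane n · x = d expressed in the reference camera frame:

```python
import numpy as np

def plane_homography(K_ref, K_src, R, t, n, d):
    """Homography induced by the plane n . x = d (reference camera frame).
    [R | t] maps reference-camera coordinates to source-camera coordinates.
    Every term is differentiable in (n, d), so gradients can flow back to
    the plane parameters during training."""
    n = np.asarray(n, dtype=float).reshape(3, 1)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    return K_src @ (R - t @ n.T / d) @ np.linalg.inv(K_ref)

def warp_pixel(H, u, v):
    """Apply the homography to pixel (u, v) of the reference image."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
# Sanity check: identical cameras with no relative motion give the identity.
H = plane_homography(K, K, np.eye(3), np.zeros(3), [0, 0, 1], d=3.0)
```

Warping every reference pixel through each hypothesized plane establishes the per-plane correspondences over which the consistency cost is computed.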
  • The first neural network model is also specifically used to obtain the semantic segmentation results of the multiple sample images through a semantic segmentation network, to obtain adaptive weights, and to use the semantic segmentation results and the adaptive weights as target weights for weighting and accumulating the consistency cost.
  • the semantic segmentation and adaptive weight network are used to extract weights, remove the influence of occlusion factors, and improve the accuracy of reconstructing the vectorized 3D model.
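  • A minimal sketch of the weighted accumulation (hypothetical tensor shapes: one cost map per source view, binary segmentation masks that suppress occluded or non-structural pixels, and one adaptive weight per view):

```python
import numpy as np

def accumulate_cost(cost_maps, seg_masks, adaptive_weights):
    """Weighted accumulation of per-view consistency costs.
    cost_maps:        (V, H, W) matching cost per source view
    seg_masks:        (V, H, W) in {0, 1}; 0 suppresses occluded pixels
    adaptive_weights: (V,) learned per-view reliability"""
    cost_maps = np.asarray(cost_maps, dtype=float)
    w = np.asarray(seg_masks, dtype=float) \
        * np.asarray(adaptive_weights, dtype=float)[:, None, None]
    total = (w * cost_maps).sum(axis=0)
    norm = w.sum(axis=0)
    # Normalize by the accumulated weight; pixels masked in every view get 0.
    return np.where(norm > 0, total / np.maximum(norm, 1e-9), 0.0)

cost = accumulate_cost(
    cost_maps=[np.full((2, 2), 1.0), np.full((2, 2), 3.0)],
    seg_masks=[np.ones((2, 2)), np.ones((2, 2))],
    adaptive_weights=[1.0, 1.0],
)
```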
  • The first neural network model is specifically used to take ground-truth depth maps as a weak supervision signal for the differentiable mapping and weighted accumulation; the ground-truth depth maps corresponding to the multiple sample images are obtained by scanning.
  • The ground-truth depth maps, obtained by scanning, can thus serve as a weak supervision signal for the differentiable mapping and weighted accumulation, which improves the accuracy of reconstructing the vectorized 3D model.
  • The second aspect of the present application provides a method for establishing a vectorized three-dimensional model, including: acquiring a target image and camera parameters of a preset camera, where the target image is obtained by shooting a target object with the preset camera; and inputting the target image and the camera parameters into a target neural network model, which is used to predict the vectorized 3D model of the target object. The vectorized 3D model of the target object is obtained by intersecting multiple initial planes; the multiple initial planes are determined according to an initial layout and the camera parameters, and the initial layout is determined from the target image by the target neural network model.
  • The target object in this application can be any scene, such as an indoor or outdoor building, specifically an indoor scene of a room; the target image is multi-view image data, and there are multiple target images.
  • the target neural network model in this application is pre-trained.
  • A vectorized 3D model can be obtained directly by inputting images into the trained target neural network model, which greatly reduces the reconstruction cost, realizes fully automatic reconstruction without manual intervention, and achieves a better reconstruction effect in weak-texture areas.
  • The target neural network model is further used to determine multiple candidate planes near each of the multiple initial planes, and to determine multiple target planes from the multiple initial planes and the multiple candidate planes; it is specifically used to intersect the multiple target planes to obtain the vectorized three-dimensional model.
  • The candidate planes are determined from the multiple initial planes, and the multiple target planes are then determined from the initial and candidate planes, so that more accurate, truer planes are found for the vectorized 3D model, improving the accuracy of the reconstruction.
  • the initial layout includes pixel coordinates of multiple initial planes in the target image
  • The target neural network model is specifically used to obtain the plane equations of the multiple initial planes from the pixel coordinates, and to determine, based on those plane equations, the plane equations of the multiple candidate planes near each initial plane, thereby determining the multiple candidate planes.
  • multiple candidate planes are determined based on plane equations, which improves the feasibility of the scheme.
  • the target image is multi-view image data, and there are multiple target images
  • The target neural network model is also used to obtain the consistency cost of the multiple target images and, according to that consistency cost, to determine the multiple target planes from the multiple initial planes and the multiple candidate planes.
  • The target image is multi-view image data, so the consistency cost of the multiple target images can be obtained and the target planes can be determined based on that cost, which improves the feasibility of the solution.
  • The target neural network model is further used to extract the feature vector of each pixel in the multiple target images through a feature extraction network, to perform differentiable mapping with the plane as the primitive according to the camera parameters, the multiple initial planes and the multiple candidate planes, thereby obtaining the correspondence between the multiple target images, and to obtain the consistency cost, with the plane as the basic unit, from the feature vectors and the correspondence.
  • The correspondence between the multiple target images is obtained through the differentiable mapping with the plane as the primitive, so the consistency cost can be calculated, which improves the feasibility of the scheme.
  • The target neural network model is specifically used to obtain the semantic segmentation results of the multiple target images through a semantic segmentation network, to obtain the adaptive weights of the multiple target images through an adaptive weight network, and to use the semantic segmentation results and the adaptive weights as target weights for weighting and accumulating the consistency cost.
  • the semantic segmentation and adaptive weight network are used to extract weights, remove the influence of occlusion factors, and improve the accuracy of reconstructing the vectorized 3D model.
  • a third aspect of the present application provides a computer device configured to execute the method in the foregoing first aspect or any possible implementation manner of the first aspect.
  • the computer device includes modules or units for executing the method in the above first aspect or any possible implementation manner of the first aspect, such as: an acquisition unit and a training unit.
  • a fourth aspect of the present application provides a computer device configured to execute the method in the above-mentioned second aspect or any possible implementation manner of the second aspect.
  • the computer device includes a module or unit for executing the method in the second aspect or any possible implementation manner of the second aspect, such as an acquisition unit and a processing unit.
  • The fifth aspect of the present application provides a computer device, including a processor, a memory, and a computer-readable storage medium storing a computer program; the processor is coupled to the computer-readable storage medium and runs computer-executable instructions; when the computer-executable instructions are executed by the processor, the processor executes the method in the first aspect or any possible implementation of the first aspect.
  • the computer device may further include an input/output (input/output, I/O) interface, and the computer-readable storage medium storing the computer program may be a memory.
  • The sixth aspect of the present application provides a computer device, including a processor, a memory, and a computer-readable storage medium storing a computer program; the processor is coupled to the computer-readable storage medium and runs computer-executable instructions; when the computer-executable instructions are executed by the processor, the processor executes the method in the second aspect or any possible implementation of the second aspect.
  • the computer device may further include an input/output (input/output, I/O) interface, and the computer-readable storage medium storing the computer program may be a memory.
  • The seventh aspect of the present application provides a computer-readable storage medium storing one or more computer-executable instructions; when the instructions are executed by a processor, the processor executes the method in the first aspect or any possible implementation of the first aspect.
  • The eighth aspect of the present application provides a computer-readable storage medium storing one or more computer-executable instructions; when the instructions are executed by a processor, the processor executes the method in the second aspect or any possible implementation of the second aspect.
  • The ninth aspect of the present application provides a computer program product storing one or more computer-executable instructions; when the instructions are executed by a processor, the processor executes the method in the first aspect or any possible implementation of the first aspect.
  • The tenth aspect of the present application provides a computer program product storing one or more computer-executable instructions; when the instructions are executed by a processor, the processor executes the method in the second aspect or any possible implementation of the second aspect.
  • The eleventh aspect of the present application provides a chip system; the chip system includes at least one processor and an interface, the interface is used to receive data and/or signals, and the at least one processor is used to support the computer device in implementing the functions involved in the first aspect or any possible implementation of the first aspect.
  • the system-on-a-chip may further include a memory, and the memory is used for storing necessary program instructions and data of the computer device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • The twelfth aspect of the present application provides a chip system; the chip system includes at least one processor and an interface, the interface is used to receive data and/or signals, and the at least one processor is used to support the computer device in implementing the functions involved in the second aspect or any possible implementation of the second aspect.
  • the system-on-a-chip may further include a memory, and the memory is used for storing necessary program instructions and data of the computer device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • During training, the neural network model relates the sample images to the corresponding vectorized 3D model with the plane as the reconstruction unit, so a vectorized 3D model can be obtained directly by inputting images into the trained neural network model; this greatly reduces the reconstruction cost, realizes fully automatic reconstruction without manual intervention, and also achieves a better reconstruction effect in weak-texture areas.
  • Fig. 1 is a schematic diagram of an artificial intelligence main framework
  • Fig. 2 is a schematic diagram of the system architecture provided by the embodiment of the present application.
  • Fig. 3 is a schematic structural diagram of a convolutional neural network
  • Fig. 4 is another schematic structural diagram of a convolutional neural network
  • Fig. 5 is another schematic diagram of the system architecture provided by the embodiment of the present application.
  • Fig. 6 is a schematic diagram of an embodiment of the neural network model training method provided by the embodiment of the present application.
  • Fig. 7 is a schematic flow chart of an embodiment of the neural network model training method provided by the embodiment of the present application.
  • FIG. 8 is a schematic diagram of an embodiment of a vectorized 3D model reconstruction method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flow chart of an embodiment of a vectorized 3D model reconstruction method provided in an embodiment of the present application.
  • Fig. 10 is a schematic flowchart of step s4 in an embodiment of the vectorized 3D model reconstruction method provided by the embodiment of the present application;
  • Fig. 11 is a schematic diagram of computer equipment in the system architecture provided by the embodiment of the present application.
  • Fig. 12 is a reconstruction effect diagram of the single image layout prediction method provided by the embodiment of the present application.
  • Fig. 13 is a reconstruction effect diagram of the vectorized 3D model reconstruction method provided by the embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Fig. 15 is another schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Fig. 16 is another schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 17 is another schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Embodiments of the present application provide a neural network model training method, a vectorized three-dimensional model building method and equipment, which are used for low-cost automatic reconstruction of vectorized three-dimensional models, and improve the reconstruction effect of weak texture areas.
  • Embodiments of the present application also provide corresponding computer equipment, computer-readable storage media, chip systems, and the like. Each will be described in detail below.
  • AI Artificial intelligence
  • Artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce intelligent machines that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
  • Figure 1 is a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system, and is applicable to general artificial intelligence field requirements.
  • Intelligent information chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has undergone a condensed process of "data-information-knowledge-wisdom".
  • IT value chain reflects the value brought by artificial intelligence to the information technology industry from the underlying infrastructure of artificial intelligence, information (provided and processed by technology) to the systematic industrial ecological process.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • Computing power is provided by smart chips, such as the central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA) and other hardware acceleration chips.
  • The basic platform includes the distributed computing framework, the network, and other related platform guarantees and supports, which can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional equipment, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can symbolize and formalize intelligent information modeling, extraction, preprocessing, training, etc. of data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to reasoning control strategies, and the typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image processing identification, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they package the overall artificial intelligence solution, commercialize intelligent information decision-making, and realize practical applications. Application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, smart terminal, smart marketing, and smart customer service.
  • Neural network models include, for example, the deep neural network (DNN) model and the convolutional neural network (CNN) model.
  • By training with sample data, a target neural network model suitable for the business scenario can be obtained.
  • the sample data may be image data, voice data or text data, etc., and the type of the sample data is determined according to the applicable field of the neural network model.
  • When the neural network model is used in the field of image processing, the sample data can be various image data captured by a camera, and the training process of the neural network model can be performed in the system architecture 200 shown in FIG. 2.
  • an embodiment of the present application provides a system architecture 200 .
  • the data collection device 260 is used to collect sample data for training the neural network model and store it in the database 230.
  • the sample data can be understood by referring to the introduction of the sample data in the previous paragraph, and the description will not be repeated here.
  • Training device 220 generates target neural network model/rules 201 based on sample data maintained in database 230 . The following will describe in more detail how the training device 220 obtains the target neural network model/rule 201 based on the sample data.
  • the target neural network model/rule 201 can reconstruct a vectorized 3D model based on the target image, for example.
  • W is a weight vector
  • each value in the vector represents the weight value of a neuron in this layer of neural network.
  • the vector W determines the space transformation from the input space to the output space described above, that is, the weight W of each layer controls how to transform the space.
  • the purpose of training the deep neural network model is to finally obtain the weight matrix of all layers of the trained neural network (the weight matrix formed by the vector W of many layers). Therefore, the training process of the neural network model is essentially to learn the way to control the space transformation, and more specifically, to learn the weight matrix.
  • The loss function (or objective function) measures the difference between the predicted value of the neural network model and the target value.
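  • For example, the widely used mean-squared-error loss quantifies this difference (a generic illustration, not necessarily the loss used in this application):

```python
import numpy as np

def mse_loss(prediction, target):
    """Mean squared error between model output and ground truth; training
    by backpropagation descends the gradient of this scalar."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(((prediction - target) ** 2).mean())

loss = mse_loss([2.0, 4.0], [1.0, 3.0])  # each element contributes an error of 1
```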
  • the target neural network model/rules obtained by the training device 220 can be applied to different systems or devices.
  • the execution device 210 is configured with an I/O interface 212 for data interaction with external devices, and a “user” can input data to the I/O interface 212 through a client device 240 .
  • the execution device 210 can call data, codes, etc. in the data storage system 250 , and can also store data, instructions, etc. in the data storage system 250 .
  • The calculation module 211 uses the target neural network model/rule 201 to process the input data; for example, in the field of automatic driving, the target neural network model/rule 201 identifies obstacles from image data of the traffic scene.
  • the I/O interface 212 returns the processing result to the client device 240 to provide to the user.
  • the training device 220 can generate corresponding target neural network models/rules 201 based on sample data of different business scenarios for different targets, so as to provide users with better results.
  • Fig. 2 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the positional relationship between devices, devices, modules, etc. shown in Fig. 2 does not constitute any limitation.
  • In Fig. 2, the data storage system 250 is an external memory relative to the execution device 210; in other cases, the data storage system 250 may also be placed in the execution device 210.
  • the system architecture 200 can be deployed on computer devices such as servers, virtual machines, and terminal devices.
  • The terminal device can be a mobile phone, a tablet computer (pad), a computer with a wireless transceiver function, a virtual reality (VR) terminal, an augmented reality (AR) terminal, or a wireless terminal in industrial control, self-driving, remote medical care, smart grid, transportation safety, smart city, smart home, etc.
  • the convolutional neural network model can also be referred to as the convolutional neural network for short. It is a deep neural network with a convolutional structure and a deep learning architecture.
  • the deep learning architecture refers to the algorithm through machine learning. Multiple levels of learning are performed at different levels of abstraction.
  • CNN is a feed-forward artificial neural network in which individual neurons respond to overlapping regions in images fed into it.
  • a convolutional neural network (CNN) 100 may include an input layer 110 , a convolutional/pooling layer 120 , where the pooling layer is optional, and a neural network layer 130 .
• the convolutional layer/pooling layer 120 can include layers 121-126. For example, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
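The layer chaining just described (a convolutional layer's output feeding either a pooling layer or another convolutional layer) can be illustrated with a minimal NumPy sketch; the image size, kernel, and layer counts here are illustrative, not those of model 100:

```python
import numpy as np

def conv2d(x, kernel):
    """'Valid' 2D convolution of a single-channel image (H, W)."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(x):
    """2x2 max pooling: halves the spatial size of the feature map."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.standard_normal((12, 12))
k = rng.standard_normal((3, 3))

# convolution output feeding a pooling layer (conv -> pool, as in layers 121 -> 122)
a = max_pool2x2(conv2d(img, k))             # (12,12) -> (10,10) -> (5,5)
# convolution output feeding another convolution (conv -> conv -> pool)
b = max_pool2x2(conv2d(conv2d(img, k), k))  # (12,12) -> (10,10) -> (8,8) -> (4,4)
```

Either arrangement is valid; the pooling step only reduces spatial size, so its output can again feed a convolution.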
  • the convolutional layer 121 can include many convolutional operators, which are also called kernels, and their role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, which is usually pre-defined.
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained through training can extract information from the input image, thereby helping the convolutional neural network model 100 to make correct predictions.
• a pooling layer often follows a convolutional layer; among the layers 121-126 shown in 120 in Figure 3, the arrangement can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
  • the sole purpose of pooling layers is to reduce the spatial size of the image.
• after being processed by the convolutional layer/pooling layer 120, the convolutional neural network model 100 is still not able to generate the required output information, because, as mentioned earlier, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. In order to generate the final output information (the required class information or other related information), the convolutional neural network model 100 needs to use the neural network layer 130 to generate one output or a group of outputs whose number equals the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in FIG. 3) and an output layer 140; the parameters contained in the multiple hidden layers may be pre-trained according to relevant training data of a specific task type, and the task type can include, for example, image recognition, image classification, image super-resolution reconstruction, etc.
• after the multiple hidden layers in the neural network layer 130 comes the last layer of the entire convolutional neural network model 100, namely the output layer 140.
• the output layer 140 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error.
  • the convolutional neural network model 100 shown in FIG. 3 is only an example of a convolutional neural network model.
  • the convolutional neural network model can also exist in the form of other network models.
• multiple convolutional layers/pooling layers as shown in FIG. 4 are parallelized, and the features extracted by each are all input to the neural network layer 130 for processing.
  • the algorithm based on the convolutional neural network model shown in Fig. 3 and Fig. 4 above can be implemented in the NPU chip.
  • both the deep neural network model and the convolutional neural network model include weights.
  • the training process of the neural network model is the process of continuously updating the weights in the neural network model through multiple iterations.
• each iteration uses the sample data to calculate the loss function of this iteration, and first-order optimization is then performed on the loss function to obtain the first-order gradient;
• second-order optimization is further performed on the basis of the first-order gradient to obtain the update weights of this iteration, and the update weights of this iteration are then used to update the model; the next iteration is performed on the basis of the model whose weights were updated in this iteration, until the preset number of iterations is reached or the loss is less than the preset value so that convergence is achieved, at which point the training of the entire neural network model is completed.
  • the preset value may be a value pre-specified by the user, and the preset number of times may be the number of times pre-specified by the user.
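The iterative update with the two stopping criteria just described (a preset number of iterations, or loss below a preset value) can be sketched as follows; the toy quadratic loss, learning rate, and thresholds are illustrative assumptions, and only first-order optimization is shown:

```python
def train(w0, lr=0.1, max_iters=300, loss_threshold=1e-4):
    """Iteratively update weight w until the preset number of iterations
    is reached or the loss of the iteration drops below the preset value."""
    w = w0
    for it in range(max_iters):
        loss = (w - 3.0) ** 2            # loss function of this iteration (toy)
        if loss < loss_threshold:        # convergence: loss < preset value
            return w, it, loss
        grad = 2.0 * (w - 3.0)           # first-order gradient of the loss
        w -= lr * grad                   # update weight of this iteration
    return w, max_iters, (w - 3.0) ** 2  # stop: preset number of iterations reached

w, iters, loss = train(w0=0.0)
```

Either criterion alone ends training; in this toy case the loss threshold is reached well before the iteration cap.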
  • the embodiment of the present application provides a neural network model training method.
  • the neural network model obtained through the training of this application can be used to reconstruct a vectorized 3D model.
• the first computer device can obtain multiple sets of training samples, wherein each set of training samples includes a sample image obtained by shooting a sample object with a preset camera, a target vectorized 3D model of the sample object, and the camera parameters of the preset camera, and then train the first neural network model based on the multiple sets of training samples to obtain the second neural network model.
• the model training phase can be performed offline, or the first neural network model can be periodically retrained or updated to obtain a better second neural network model.
  • the second neural network model trained in the model training stage can be applied to the vectorized three-dimensional model reconstruction stage, and the second neural network model can be stored in the second computing device.
• the target image obtained by shooting the target object with the preset camera and the camera parameters of the preset camera can be obtained through the client, and then the target image and camera parameters are input into the target neural network model; the target neural network model is used to obtain the initial layout of the target image through the layout prediction network, determine multiple initial planes according to the initial layout and camera parameters, and intersect the multiple initial planes to obtain a vectorized 3D model, which is returned to the client.
  • the first computer device or the second computer device in FIG. 5 above may be a server, a terminal device or a virtual machine.
  • an embodiment of the neural network model training method provided by the embodiment of the present application includes:
• obtain training samples, wherein the training samples can be multiple groups; each group of training samples includes the sample image obtained by shooting the sample object with the preset camera, the target vectorized three-dimensional model of the sample object, and the camera parameters of the preset camera, and the sample objects of the different groups of training samples are different.
• the sample object can be any scene, such as an indoor or outdoor building; specifically, it can be the indoor scene of a room. The sample image is multi-view image data, and there are multiple sample images, that is, the preset camera shoots the sample scene from multiple perspectives, and the resulting image data forms the multiple sample images {I_i}.
• the preset camera can be a panoramic camera, and the captured {I_i} are panoramic images.
• the camera parameters of the preset camera include a rotation matrix and a translation vector, denoted {R_i, T_i}; the multiple sample images {I_i} and camera parameters {R_i, T_i} are stored in memory in the form of arrays.
• the target vectorized 3D model of the sample object can be a pre-established standard vectorized 3D model, for example one obtained by lidar scanning.
• the first neural network model is trained based on the training samples; the first neural network model is used to obtain the initial vectorized 3D model of the sample image and is iteratively updated according to the deviation between the initial vectorized 3D model and the target vectorized 3D model to obtain the second neural network model, wherein the second neural network model is used to predict the vectorized three-dimensional model of the target object.
  • the second neural network model can be obtained through repeated training, and the second neural network model can be obtained when the training iterations have been performed for a preset number of times or the training loss is less than a preset value
  • the preset value can be the value specified in advance by the user
  • the preset number of times can be the number of times specified in advance by the user
• after a plurality of sample images in a set of training samples and the camera parameters of the preset camera are input to the second neural network model, the deviation between the vectorized three-dimensional model output by the second neural network model and the target vectorized three-dimensional model in that set of training samples is smaller than the preset value required by the user.
• after a set of training samples is input to the first neural network model, the first neural network model is used to obtain the initial layout of the sample images, determine multiple initial planes according to the initial layout and camera parameters, and intersect the multiple initial planes to obtain the initial vectorized 3D model.
  • the first neural network model includes a semantic segmentation network, a layout prediction network, a feature extraction network, and an adaptive weight network.
• the first neural network model preprocesses the input multi-view image data: it obtains the semantic segmentation results {S_i} of the multiple sample images through a semantic segmentation network, such as HoHoNet, and obtains the initial layout {L_i} of the multiple sample images through a layout prediction network, such as HorizonNet. The semantic segmentation network can divide each pixel of the multiple sample images into different categories, and the multiple categories can be predefined by the user; for example, each pixel of the multiple sample images is divided into a total of thirteen categories: beam, board, bookcase, ceiling, chair, column, door, floor, sofa, table, wall, window and other (clutter). The layout prediction network can divide each pixel in the multiple sample images, under the Manhattan assumption, into ceiling, floor and walls 1, 2, ..., n, where n is the number of walls.
  • the above two methods are both deep learning algorithms, so the GPU will be used for parallel computing to ensure that they do not affect the overall solution efficiency.
• the first neural network model can extract the feature vector {F_ij} of each pixel in the multiple sample images through the feature extraction network, and obtain the adaptive weights {W_ij} of the multiple sample images through the adaptive weight network. The feature extraction network contains a total of nine two-dimensional convolutional layers, with batch normalization and activation (ReLU) performed after each convolution; the adaptive weight network contains a total of two two-dimensional convolutional layers, with batch normalization and activation also performed after each convolution.
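The conv + batch normalization + activation pattern used in both the feature extraction network and the adaptive weight network can be sketched as follows; the layer count, kernels, and feature-map size here are illustrative, and single-channel NumPy stand-ins are used rather than the actual two-dimensional convolutional layers:

```python
import numpy as np

def conv3x3(x, kernel):
    """'Valid' 3x3 convolution of a single-channel feature map."""
    out = np.zeros((x.shape[0] - 2, x.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * kernel)
    return out

def batch_norm(x, eps=1e-5):
    """Normalize a feature map to zero mean and unit variance."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def conv_bn_relu(x, kernel):
    """One convolution followed by batch normalization and activation,
    the pattern applied after each convolution in both networks."""
    return relu(batch_norm(conv3x3(x, kernel)))

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16))
for _ in range(3):  # stack such blocks (nine in the feature extraction network)
    feat = conv_bn_relu(feat, rng.standard_normal((3, 3)))
```

Each block normalizes its convolution output before the non-linearity, so the stacked features stay well-scaled regardless of depth.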
  • the camera parameters of the preset camera may also include scale parameters, that is, the height of the preset camera to the ground.
• the first neural network model determines the multiple initial planes of the sample image based on the initial layout {L_i} of the sample image and the preset scale parameter.
• the initial layout {L_i} contains the pixel coordinates of each plane in the sample scene. Under the Manhattan assumption, the ground and ceiling are horizontal while the walls are vertical, so the two-dimensional layout information {L_i} can be converted into three-dimensional plane information by using the preset scale parameter according to the panoramic image projection relationship, and the plane equations of the multiple initial planes can be obtained according to the pixel coordinates.
• in the plane equation Ax + By + Cz + D = 0, A, B and C are given by the normal vector of each plane, and D is the distance from the origin to the plane, thus obtaining multiple initial planes.
  • the reconstruction problem is transformed into determining the parameter D of each plane, that is, determining multiple alternative planes near each initial plane in multiple initial planes.
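Determining candidate planes near each initial plane by varying only the parameter D can be sketched as follows; the number of candidates and the step size are illustrative assumptions:

```python
import numpy as np

def candidate_planes(normal, d, num=5, step=0.05):
    """Given an initial plane with unit normal (A, B, C) and offset D
    (plane equation A*x + B*y + C*z + D = 0), return candidate planes
    near it obtained by varying only the parameter D."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)                      # (A, B, C) as a unit normal
    offsets = d + step * (np.arange(num) - num // 2)
    return [(n, d_i) for d_i in offsets]           # each candidate: ((A,B,C), D)

# a vertical wall under the Manhattan assumption: normal (1, 0, 0), D = -2.0
cands = candidate_planes([1.0, 0.0, 0.0], -2.0)
```

Because A, B, and C are fixed by the Manhattan assumption, the search space per plane collapses to a one-dimensional sweep over D.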
• where vx, vy and vz are the coordinates of any point in the plane; then, according to the camera parameters {R_i, T_i} and with the help of the depth information Depth, the differentiable mapping with the plane as the primitive is completed, and the correspondence of each pixel between the multiple sample images is obtained
  • vx, vy, and vz are the three-dimensional point coordinates corresponding to any pixel point of the plane in a sample image
  • x', y', z' are the three-dimensional point coordinates of the pixel point corresponding to the plane in another sample image
• each pixel is represented by its feature vector {F_ij}.
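The plane-as-primitive correspondence between two views can be illustrated geometrically: a reference-camera ray is intersected with the plane to obtain (vx, vy, vz), which is then expressed in the other camera's frame using its {R, T} to obtain (x', y', z'). The following is a minimal sketch; the camera parameters are illustrative, and the mapping here is plain geometry rather than the in-network differentiable version:

```python
import numpy as np

def map_via_plane(ray_dir, normal, d, R, T):
    """Intersect a reference-camera ray with the plane n.x + D = 0 to get
    (vx, vy, vz), then express that 3D point in the other camera's frame
    using its rotation R and translation T to get (x', y', z')."""
    ray_dir = np.asarray(ray_dir, dtype=float)
    n = np.asarray(normal, dtype=float)
    t = -d / n.dot(ray_dir)        # depth along the ray where it meets the plane
    v = t * ray_dir                # (vx, vy, vz) in the reference camera frame
    return R @ v + T               # (x', y', z') in the source camera frame

R = np.eye(3)                      # illustrative camera parameters {R, T}
T = np.array([0.5, 0.0, 0.0])
# a ray looking straight ahead, hitting the wall plane z - 2 = 0
p = map_via_plane([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], -2.0, R, T)
```

Since the intersection depth t is an algebraic function of the plane parameters, the mapping stays differentiable with respect to D, which is what the network optimizes.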
  • the first neural network model can use the plane as the basic unit to calculate the consistency cost of multiple sample images according to the correspondence between the pixels of multiple sample images:
  • m is the number of pixels in the plane
  • N is the total number of multiple sample images.
• V_j and V'_ij are respectively the values calculated by the 3D U-Net from the image features of the corresponding pixels in the reference image and the i-th source image.
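The patent's cost formula itself appears only as a figure, so the following is a plausible sketch of a plane-level consistency cost rather than the exact expression: the discrepancy between reference features V_j and mapped source features V'_ij is averaged over the source images and the m pixels of the plane, optionally weighted by per-pixel target weights; the squared-difference form is an assumption:

```python
import numpy as np

def consistency_cost(ref_feat, src_feats, weights=None):
    """Plane-level consistency cost: discrepancy between the reference
    features V_j and the mapped source features V'_ij, averaged over the
    source images and the m pixels of the plane, optionally weighted by
    per-pixel target weights (semantics + adaptive weights).

    ref_feat:  (m, c)        features of the plane's m pixels in the reference image
    src_feats: (n, m, c)     features of the corresponding pixels in the n source images
    weights:   (m,) or None  per-pixel target weights
    """
    diff = np.linalg.norm(src_feats - ref_feat[None], axis=-1) ** 2  # (n, m)
    per_pixel = diff.mean(axis=0)                      # average over source images
    if weights is None:
        weights = np.ones_like(per_pixel)
    return float((weights * per_pixel).sum() / weights.sum())  # average over pixels

m, c, n = 4, 8, 2
ref = np.zeros((m, c))
srcs = np.ones((n, m, c))     # every feature channel differs by 1
cost = consistency_cost(ref, srcs)
```

The candidate plane minimizing this accumulated cost is then selected as the target plane.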
• it is also possible to use lidar to scan the sample scene, obtain the depth map ground truth of the multiple sample images based on the projection relationship, let it participate in the differentiable mapping and cost accumulation, and use it as a weak supervision signal for the adaptive weight calculation.
• the first neural network model determines the target plane according to the cost of each candidate plane (the candidate plane with the smallest cost is the target plane), and intersects the target planes in the sample scene to obtain the final initial vectorized 3D model; after the target planes are intersected, the boundary of each target plane is determined, and the vectorized 3D model can be determined by removing the parts of each plane beyond its boundary.
• in this way, the vectorized 3D model of the overall structure of the room, consistent with the layout of the room, is determined. The obtained initial vectorized 3D model is then compared with the input target vectorized 3D model; for example, the initial vectorized 3D model and the target vectorized 3D model are each converted into a depth map, that is, each 3D point in the 3D space is converted, according to the camera parameters, into a 2D depth (the vertical distance from the 3D point to the imaging plane), and the distance values are formed into a matrix.
• subtracting the matrix of the initial vectorized 3D model from the matrix of the target vectorized 3D model gives the deviation between the initial vectorized 3D model and the target vectorized 3D model; the weights are repeatedly corrected according to the deviation, the new initial vectorized 3D model is again compared with the target vectorized 3D model, and the first neural network model is trained in this way until the number of training iterations reaches the preset number, or the deviation is less than the preset value so that convergence is achieved; at this point the training of the first neural network model is completed, and the second neural network model is obtained.
• for example, if the preset number of times specified by the user is 300, then after the first neural network model has been iteratively updated 300 times, the current first neural network model is considered to have converged, that is, the training is completed, and the current first neural network model is used as the second neural network model for subsequent use.
• alternatively, the first neural network model is iteratively updated until the root mean square error between its depth map and the depth map of the target vectorized 3D model is less than 20 cm; the current first neural network model is then considered to have converged, that is, the training is completed, and the current first neural network model is used as the second neural network model for subsequent use.
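The depth-map comparison and the 20 cm root-mean-square-error convergence criterion can be sketched as follows (depth values in metres; the depth maps themselves are illustrative):

```python
import numpy as np

def depth_rmse(pred_depth, target_depth):
    """Root mean square error between the depth map of the predicted model
    and the depth map of the target vectorized 3D model."""
    deviation = pred_depth - target_depth   # element-wise matrix subtraction
    return float(np.sqrt(np.mean(deviation ** 2)))

def has_converged(pred_depth, target_depth, threshold=0.20):
    """Convergence criterion: depth-map RMSE below 20 cm."""
    return depth_rmse(pred_depth, target_depth) < threshold

target = np.full((4, 4), 3.0)   # ground-truth depths in metres
pred = target + 0.1             # a uniform 10 cm error
converged = has_converged(pred, target)
```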
• both the reference image and the source images have the features of each pixel extracted through the feature extraction network, where the feature extraction network can include a network composed of convolutional layers, batch normalization layers and activation layers with strides 1-1-2, and three such networks can be used together to form the feature extraction network.
• the reference image also passes through the semantic segmentation network, the layout prediction network and the adaptive weight network, where the adaptive weight network can include a network composed of two convolutional layers, each followed by a batch normalization layer and an activation layer, with strides 2-2; the semantic segmentation network and the layout prediction network are as described above.
• the ground truth of the depth map can also be used as loss function 1 and participate in the differentiable mapping and cost accumulation as a weak supervision signal for the adaptive weight calculation; the target weight obtained from the adaptive weight network and the semantic segmentation network is used as loss function 2, and loss function 1 and loss function 2 are used to correct the training weights.
• during training, the neural network model trains on the sample images and the corresponding vectorized 3D models with the plane as the reconstruction unit, so that a vectorized 3D model can be obtained directly by inputting images into the trained neural network model. This greatly reduces the reconstruction cost, realizes fully automatic reconstruction without manual intervention, and also achieves a better reconstruction effect in weak-texture areas.
• after the second neural network model is obtained through the above training, it can be used to perform vectorized 3D model reconstruction.
  • the process of vectorized 3D model reconstruction will be described below with reference to the accompanying drawings.
  • an embodiment of the vectorized 3D model establishment method includes:
  • the vectorized 3D model of the target object is obtained by intersecting multiple initial planes, the multiple initial planes are determined according to the initial layout and camera parameters, and the initial layout is determined by the target neural network model according to the target image.
• the target neural network model can be the second neural network model in the above-mentioned embodiment; the target image is obtained by shooting the target object with a preset camera, and the target neural network model is used to determine the initial layout of the target image, determine multiple initial planes, and intersect the multiple initial planes to obtain the vectorized 3D model.
• the target neural network model is also used to determine a plurality of alternative planes near each initial plane in the plurality of initial planes, and to determine a plurality of target planes from the plurality of initial planes and the plurality of alternative planes; the target neural network model is specifically used to intersect the multiple target planes to obtain the vectorized three-dimensional model.
  • the initial layout includes pixel coordinates of multiple initial planes in the target image
• the target neural network model is specifically used to obtain the plane equations of the multiple initial planes according to the pixel coordinates, and to determine, based on the plane equations of the multiple initial planes, the plane equations of multiple candidate planes near each initial plane, so as to determine the multiple candidate planes.
  • the target image is multi-view image data, and there are multiple target images
• the target neural network model is also used to obtain the consistency cost of the multiple target images, and to determine the multiple target planes from the multiple initial planes and the multiple candidate planes according to the consistency cost.
• the target neural network model is also specifically used to extract the feature vector of each pixel in the multiple target images through the feature extraction network, perform the differentiable mapping with the plane as the primitive according to the camera parameters, the plane equations of the multiple initial planes and the plane equations of the multiple candidate planes to obtain the correspondence between the multiple target images, and obtain the consistency cost with the plane as the basic unit according to the feature vectors and the correspondence.
• the target neural network model is also specifically used to obtain the semantic segmentation results of the multiple target images through the semantic segmentation network, obtain the adaptive weights of the multiple target images through the adaptive weight network, and use the semantic segmentation results and the adaptive weights as the target weights to weight and accumulate the consistency cost.
• the multi-view images of the target object, that is, the image pairs, are first obtained through the preset camera; then s1 semantic segmentation, s2 initial layout prediction, s3 plane parameter and mask calculation, and s4 plane parameter optimization are performed in sequence to obtain the vectorized three-dimensional model of the target object, wherein the masks of s3 can be the ground, the ceiling and several walls in the scene of the target object. As shown in Figure 10, when the plane parameters are optimized, AI feature extraction, plane-based differentiable mapping, and semantic- and self-attention-weighted image consistency cost accumulation are performed in sequence to obtain the vectorized 3D model; multiple candidate planes are selected when performing the differentiable mapping, and semantic information and AI adaptive weights are used as target weight corrections when calculating the consistency cost.
  • the first computer device or the second computer device in FIG. 5 includes data access hardware and computing hardware
  • the computing hardware includes a preprocessing module, a parameter optimization module and a deep learning framework.
  • the data access module includes CPU and memory to ensure the storage and reading of data.
  • the preprocessing module is used for semantic segmentation and layout prediction of input image data.
• the parameter optimization module includes AI feature extraction, differentiable mapping with the plane as the unit, weighted cost accumulation and other modules
  • the deep learning framework includes the neural network model pre-determined by the user
  • the deep learning framework is used to call the GPU and memory of the computing hardware for training and calculation to ensure the efficiency of the solution. All calculations are performed on the computing hardware.
• the CPU model can be Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
  • the GPU model is RTX 3090.
• the deep learning framework is PyTorch, version v1.8.0.
  • the data used is the 2D-3D-Semantic data set, and the multi-view images of 112 scenes are obtained by sorting the data set for accuracy assessment.
• taking the root mean square error of the depth corresponding to the reconstructed 3D model as the evaluation index, it can be seen from Table 1 that the method provided by the embodiment of the present application significantly improves the reconstruction accuracy compared with the layout prediction of a single image, while the traditional geometric reconstruction method fails to rebuild because not enough reliable matches could be found.
  • Figure 12 and Figure 13 show the results of single image layout prediction and reconstruction results of this scheme.
• the smallest 3D model in the figure is the 3D point cloud obtained by lidar scanning, which is used as the ground truth here; as can be seen from Figure 12, the planes of the 3D model obtained by single-image layout prediction are far from the ground truth, while the 3D model reconstructed by this scheme is very close to it.
• this solution uses more convenient and lower-cost panoramic image data; compared with the former methods, which usually take hours, the whole reconstruction process of this solution takes only a few seconds, which greatly reduces the cost, and it is a fully automatic, highly efficient method.
  • the embodiment of the present application provides a vectorized 3D model reconstruction scheme with a plane as the primitive.
• multi-view images are used as input, planes are used as the basic unit of reconstruction, image consistency is used as a reference, and the vectorized 3D model reconstruction is realized by determining the optimal 3D planes.
  • the plane as the reconstruction unit can better deal with weak texture and non-Lambertian regions, and can directly obtain the vectorized 3D model through plane intersection, achieving fully automatic vectorized 3D model reconstruction.
  • consistency constraints between multi-view images can improve the integrity and accuracy of vectorized 3D model reconstruction. All steps are implemented by calling the GPU under the AI framework, which can improve the efficiency of the entire reconstruction scheme.
  • the AI model is implemented based on a weakly supervised deep learning scheme that integrates semantics and self-attention mechanisms.
  • semantic information and self-attention mechanisms are introduced for weight calculation.
  • an embodiment of a computer device 1400 provided in the embodiment of the present application includes:
  • the obtaining unit 1401 is used to obtain training samples, the training samples include a sample image, a target vectorized 3D model of the sample object, and camera parameters of a preset camera, and the sample image is obtained by shooting the sample object with the preset camera; the obtaining unit 1401 can execute Step 601 in the above method embodiment.
• the training unit 1402 is configured to train the first neural network model based on the training samples; the first neural network model is used to obtain the initial vectorized 3D model of the sample image and is iteratively updated according to the deviation between the initial vectorized 3D model and the target vectorized 3D model to obtain the second neural network model, and the second neural network model is used to predict the vectorized 3D model of the target object; wherein the initial vectorized 3D model is obtained by intersecting multiple initial planes, the multiple initial planes are determined according to the initial layout and camera parameters, and the initial layout is determined by the first neural network model according to the sample image.
  • the training unit 1402 can execute step 602 in the above method embodiment.
• during training, the training unit 1402 trains on the sample images and the corresponding vectorized 3D models with the plane as the reconstruction unit, so that a vectorized 3D model can be obtained directly by inputting images into the trained neural network model; this greatly reduces the reconstruction cost, realizes fully automatic reconstruction without manual intervention, and also achieves a better reconstruction effect in weak-texture areas.
• the first neural network model is also used to determine a plurality of alternative planes near each initial plane in the plurality of initial planes, and to determine a plurality of target planes from the plurality of initial planes and the plurality of alternative planes; the first neural network model is specifically used to intersect the multiple target planes to obtain the initial vectorized three-dimensional model.
  • the initial layout includes pixel coordinates of multiple initial planes in the sample image
• the first neural network model is specifically used to obtain the plane equations of the multiple initial planes according to the pixel coordinates, and to determine, based on the plane equations of the multiple initial planes, the plane equations of multiple candidate planes near each initial plane, so as to determine the multiple candidate planes.
  • the sample image is multi-view image data, and there are multiple sample images.
• the first neural network model is also used to obtain the consistency cost of the multiple sample images, and to determine the multiple target planes from the multiple initial planes and the multiple candidate planes according to the consistency cost.
• the first neural network model is also specifically used to extract the feature vector of each pixel in the multiple sample images through the feature extraction network, perform the differentiable mapping with the plane as the primitive according to the camera parameters, the multiple initial planes and the multiple candidate planes to obtain the correspondence between the multiple sample images, and obtain the consistency cost with the plane as the basic unit according to the feature vectors and the correspondence.
• the first neural network model is also specifically used to obtain the semantic segmentation results of the multiple sample images through the semantic segmentation network, obtain the adaptive weights of the multiple sample images through the adaptive weight network, and use the semantic segmentation results and the adaptive weights as the target weights to weight and accumulate the consistency cost.
• the first neural network model is further specifically used to use the depth map ground truth as a weak supervision signal in the differentiable mapping and weighted accumulation, where the depth map ground truth is obtained by scanning and corresponds to the multiple sample images.
  • the computer device 1400 provided in the embodiment of the present application can be understood by referring to the corresponding content in the foregoing embodiment of the neural network model training method, and will not be repeated here.
  • an embodiment of a computer device 1500 provided in the embodiment of the present application includes:
• the acquiring unit 1501 is configured to acquire a target image and camera parameters of a preset camera, and the target image is obtained by shooting a target object with the preset camera; the acquiring unit 1501 may execute step 801 in the above method embodiment.
  • a processing unit 1502 configured to input the target image and the camera parameters into the target neural network model to predict a vectorized 3D model of the target object; wherein, the vectorized 3D model of the target object is a plurality of initial planes Obtained by performing intersection, the multiple initial planes are determined according to the initial layout and the camera parameters, and the initial layout is determined according to the target image by the target neural network model.
  • the processing unit 1502 may execute step 802 in the foregoing method embodiment.
  • the target neural network model is also used to determine a plurality of candidate planes near each initial plane in the plurality of initial planes, and to determine a plurality of target planes from the plurality of initial planes and the plurality of candidate planes; the target neural network model is specifically used to intersect the multiple target planes to obtain the vectorized three-dimensional model.
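Intersecting planes to recover a vectorized model is, at its core, a small linear solve: three non-degenerate planes meet at one corner point. The following minimal sketch (the function name and degeneracy threshold are illustrative assumptions, not taken from the application) intersects three planes given as a*x + b*y + c*z = d:

```python
import numpy as np

def intersect_three_planes(planes):
    """Intersect three planes, each given as (a, b, c, d) with
    a*x + b*y + c*z = d.  Returns the common corner point, or None
    when the normals are (near-)degenerate and no unique point exists."""
    A = np.array([p[:3] for p in planes], dtype=float)  # stacked normals
    d = np.array([p[3] for p in planes], dtype=float)   # stacked offsets
    if abs(np.linalg.det(A)) < 1e-9:
        return None
    return np.linalg.solve(A, d)
```

Repeating such intersections over adjacent target planes yields the corners and edges of the vectorized model directly, rather than a dense mesh.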
  • the initial layout includes pixel coordinates of the multiple initial planes in the target image.
  • the target neural network model is specifically used to obtain plane equations of the multiple initial planes according to the pixel coordinates, and to determine, based on these plane equations, the plane equations of multiple candidate planes near each initial plane, thereby determining the multiple candidate planes.
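One simple way to realize "candidate planes near each initial plane" is to perturb the offset term of each normalized plane equation; this hypothesis-generation scheme and its default step sizes are illustrative assumptions, not necessarily what the application uses:

```python
import numpy as np

def candidate_planes(plane, offset_deltas=(-0.1, 0.0, 0.1)):
    """Enumerate candidate planes near an initial plane (a, b, c, d)
    by shifting the plane along its (normalized) normal direction.
    Returns a list of (a, b, c, d) arrays, including the original."""
    n = np.asarray(plane[:3], dtype=float)
    n = n / np.linalg.norm(n)          # normalize so d is a metric offset
    d = float(plane[3]) / np.linalg.norm(np.asarray(plane[:3], dtype=float))
    return [np.append(n, d + dd) for dd in offset_deltas]
```

The target planes would then be chosen among these hypotheses (for example, by the consistency cost described below for the multi-view case).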
  • the target image is multi-view image data, and there are multiple target images; the target neural network model is also used to obtain a consistency cost of the multiple target images, and to determine the multiple target planes from the multiple initial planes and the multiple candidate planes according to the consistency cost.
  • the target neural network model is also specifically used to extract a feature vector for each pixel in the multiple target images through a feature extraction network, perform plane-based differentiable mapping to obtain the correspondences between the multiple target images, and obtain the consistency cost based on the feature vectors and the correspondences, with the plane as the basic unit.
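The plane-based mapping between views is commonly expressed as a plane-induced homography, H = K_src (R + t nᵀ / d) K_ref⁻¹, for a plane nᵀX = d in the reference camera frame. The sketch below shows this standard construction; the sign convention (nᵀX = d rather than nᵀX + d = 0) and the frame definitions are assumptions stated here, not details taken from the application:

```python
import numpy as np

def plane_homography(K_ref, K_src, R, t, n, d):
    """Homography mapping reference-view pixels to source-view pixels
    for the plane n . X = d in the reference camera frame, where
    [R | t] maps reference-frame points into the source frame."""
    n = np.asarray(n, dtype=float).reshape(3, 1)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    H = K_src @ (R + (t @ n.T) / d) @ np.linalg.inv(K_ref)
    return H / H[2, 2]  # normalize so H[2,2] == 1
```

Because H is a smooth function of the plane parameters, warping feature vectors through it keeps the consistency cost differentiable with respect to each plane hypothesis.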
  • the target neural network model is also specifically used to obtain semantic segmentation results of the multiple target images through a semantic segmentation network, obtain adaptive weights of the multiple target images through an adaptive weight network, and use the semantic segmentation results and the adaptive weights as target weights to weight and accumulate the consistency cost.
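One plausible reading of "using the semantic segmentation results and the adaptive weights as target weights" is a normalized per-view product; the product-and-normalize choice in this sketch is an assumption for illustration:

```python
import numpy as np

def weighted_consistency_cost(per_view_costs, seg_weights, adaptive_weights):
    """Accumulate per-view consistency costs, weighting each view by the
    product of its semantic-segmentation confidence and its learned
    adaptive weight, normalized to sum to one."""
    w = np.asarray(seg_weights, dtype=float) * np.asarray(adaptive_weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, per_view_costs))
```

Such weighting lets unreliable views (e.g. occluded or mis-segmented regions) contribute less to the cost used for selecting target planes.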
  • the computer device 1500 provided in the embodiment of the present application can be understood by referring to the corresponding content in the foregoing embodiment of the neural network model training method, and details are not repeated here.
  • FIG. 16 is a schematic diagram of a possible logical structure of a computer device 1600 provided by an embodiment of the present application.
  • the computer device 1600 includes: a processor 1601, a communication interface 1602, a memory 1603, and a bus 1604.
  • the processor 1601 may include a CPU, or a combination of a CPU and at least one of a GPU, an NPU, and other types of processors.
  • the processor 1601 , the communication interface 1602 and the memory 1603 are connected to each other through a bus 1604 .
  • the processor 1601 is used to control and manage the actions of the computer device 1600; for example, the processor 1601 is used to execute steps 601 and 602 in FIG. 6, steps 801 and 802, and/or other processes for the techniques described herein.
  • the communication interface 1602 is used to support the computer device 1600 to communicate.
  • the memory 1603 is used for storing program codes and data of the computer device 1600 .
  • the processor 1601 may be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the bus 1604 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the computer device 1700 includes: a hardware layer 1701 and a virtual machine (VM) layer 1702, and the VM layer may include one or more VMs.
  • the hardware layer 1701 provides hardware resources for the VM to support the running of the VM.
  • the functions of the VM and the processes related to this application can be understood by referring to the corresponding part of the description in the above embodiments.
  • the hardware layer 1701 includes hardware resources such as a processor, a communication interface, and a memory.
  • the processor may include a CPU, or a CPU and at least one of a GPU and an NPU.
  • a computer-readable storage medium is also provided, in which computer-executable instructions are stored; when at least one processor of a device executes the computer-executable instructions, the device performs the neural network model training method or the vectorized 3D model reconstruction method described in the above embodiments.
  • in another embodiment, a computer program product is also provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium and execute them, so that the device performs the neural network model training method or the vectorized 3D model reconstruction method described in the above embodiments.
  • in another embodiment, a chip system is also provided. The chip system includes at least one processor and an interface; the interface is used to receive data and/or signals, and the at least one processor is used to support implementation of the neural network model training method or the vectorized 3D model reconstruction method described in the above embodiments.
  • the chip system may further include a memory, which is used to store the necessary program instructions and data of the computer device.
  • the chip system may consist of chips, or may include chips and other discrete devices.
  • the disclosed system, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of artificial intelligence (AI). Embodiments of the present invention disclose a neural network model training method, and a vectorized three-dimensional model establishment method and device. In the solution, a neural network model is trained on sample images and their corresponding vectorized three-dimensional models, using the plane as the reconstruction unit, so that an image can be input directly into the trained neural network model to obtain a vectorized three-dimensional model. Fully automatic reconstruction is thus implemented while greatly reducing reconstruction costs; no manual intervention is required, and a good reconstruction result can be obtained in weak-texture areas.
PCT/CN2022/131344 2021-11-16 2022-11-11 Procédé d'entrainement de modèle de réseau neuronal, et procédé et dispositif d'établissement de modèle tridimensionnel vectorisé WO2023088177A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111357123.7 2021-11-16
CN202111357123.7A CN116151358A (zh) 2021-11-16 Neural network model training method, vectorized three-dimensional model establishment method and device

Publications (1)

Publication Number Publication Date
WO2023088177A1 true WO2023088177A1 (fr) 2023-05-25

Family

ID=86337568

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131344 WO2023088177A1 (fr) 2021-11-16 2022-11-11 Procédé d'entrainement de modèle de réseau neuronal, et procédé et dispositif d'établissement de modèle tridimensionnel vectorisé

Country Status (2)

Country Link
CN (1) CN116151358A (fr)
WO (1) WO2023088177A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080804A (zh) * 2019-10-23 2020-04-28 贝壳技术有限公司 三维图像生成方法及装置
CN112116613A (zh) * 2020-09-25 2020-12-22 贝壳技术有限公司 模型训练方法、图像分割方法、图像矢量化方法及其系统
US20210104096A1 (en) * 2019-10-02 2021-04-08 Google Llc Surface geometry object model training and inference
US20210209340A1 (en) * 2019-09-03 2021-07-08 Zhejiang University Methods for obtaining normal vector, geometry and material of three-dimensional objects based on neural network
WO2021147113A1 (fr) * 2020-01-23 2021-07-29 华为技术有限公司 Procédé d'identification de catégorie sémantique de plan et appareil de traitement de données d'image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUHANG ZOU; ALEX COLBURN; QI SHAN; DEREK HOIEM: "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image", ARXIV.ORG, 23 March 2018 (2018-03-23), XP080862467 *
JINGWEI HUANG; YICHAO ZHOU; THOMAS FUNKHOUSER; LEONIDAS GUIBAS: "FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image", ARXIV.ORG, 29 March 2019 (2019-03-29), XP081159560 *

Also Published As

Publication number Publication date
CN116151358A (zh) 2023-05-23

Similar Documents

Publication Publication Date Title
US11232286B2 (en) Method and apparatus for generating face rotation image
WO2021175050A1 (fr) Procédé et dispositif de reconstruction tridimensionnelle
CN110458939B (zh) 基于视角生成的室内场景建模方法
US11544900B2 (en) Primitive-based 3D building modeling, sensor simulation, and estimation
CN109685152B (zh) 一种基于dc-spp-yolo的图像目标检测方法
CN112927357B (zh) 一种基于动态图网络的3d物体重建方法
CN109559320B (zh) 基于空洞卷积深度神经网络实现视觉slam语义建图功能的方法及系统
JP6200989B2 (ja) 物体姿勢認識
WO2022178952A1 (fr) Procédé et système d'estimation de position cible basés sur un mécanisme d'attention et un vote de hough
CN111832592B (zh) Rgbd显著性检测方法以及相关装置
Tippetts et al. Dense disparity real-time stereo vision algorithm for resource-limited systems
WO2021165628A1 (fr) Génération de modèles tridimensionnels d'objets à partir d'images bidimensionnelles
WO2023164933A1 (fr) Procédé de modélisation de bâtiment et appareil associé
CN110222718B (zh) 图像处理的方法及装置
CN113553943B (zh) 目标实时检测方法以及装置、存储介质、电子装置
CN114219855A (zh) 点云法向量的估计方法、装置、计算机设备和存储介质
CN116310219A (zh) 一种基于条件扩散模型的三维脚型生成方法
WO2022194035A1 (fr) Procédé et appareil permettant de construire un modèle tridimensionnel, ainsi que procédé et appareil d'apprentissage d'un réseau neuronal
CN115018999A (zh) 一种多机器人协作的稠密点云地图构建方法及装置
CN117115339A (zh) 一种基于NeRF 5D神经辐射场的建筑三维重建与损伤识别方法
CN113139967A (zh) 点云实例分割方法及相关系统、存储介质
CN116486038A (zh) 一种三维构建网络训练方法、三维模型生成方法以及装置
CN114820755A (zh) 一种深度图估计方法及系统
CN113592705A (zh) 一种房型结构分析方法及装置

Legal Events

Code  Description
121   Ep: the EPO has been informed by WIPO that EP was designated in this application
      Ref document number: 22894714
      Country of ref document: EP
      Kind code of ref document: A1