WO2021232941A1 - Three-dimensional model generation method and apparatus, computer device, and storage medium - Google Patents
Three-dimensional model generation method and apparatus, computer device, and storage medium
- Publication number
- WO2021232941A1 (PCT/CN2021/083268)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- semantic
- map
- point
- feature map
- Prior art date
- 2020-05-18
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
- G06T17/205—Re-meshing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Definitions
- The present disclosure relates to the field of computer technology, and in particular to a three-dimensional model generation method and apparatus, a computer device, and a storage medium.
- With the development of the fields of computer vision and human-computer interaction, three-dimensional human body models have played an increasingly important role; by generating a three-dimensional human body model, human body action recognition, human-computer interaction, and the like can be realized.
- In the related art, the generation of a three-dimensional human body model generally relies on an existing three-dimensional template human body model: model parameters of a neural network model are predicted, and the three-dimensional human body model of the target object is generated on the basis of the preset three-dimensional template human body model. Because the expressive ability of the preset three-dimensional template human body model is limited, the accuracy achievable when generating the three-dimensional human body model of the target object is limited. For example, if a person's figure differs from that of an ordinary person, the accuracy of the generated three-dimensional human body model will be affected by the limited expressive ability of the existing template.
- The embodiments of the present disclosure provide at least a three-dimensional model generation method and apparatus, a computer device, and a storage medium.
- In a first aspect, the embodiments of the present disclosure provide a method for generating a three-dimensional model, including:
- determining, based on a pre-trained first neural network, a global feature vector, a local feature map, and a semantic feature map of an image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in a semantic space;
- transforming, based on the feature value of the first feature point in the semantic feature map, a second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map;
- generating a position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and a pre-trained second neural network, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of a target object in the image to be processed; and
- generating, based on the position map, a three-dimensional model corresponding to the target object.
- In this method, the three-dimensional model is generated by predicting a position map that includes the three-dimensional position coordinates of each first position point of the target object; once the position map is predicted, the three-dimensional model corresponding to the target object can be generated from it. The method provided by the present disclosure is therefore not constrained by the expressive ability of a preset three-dimensional model, and the generated three-dimensional model has higher accuracy.
- In a possible implementation, the semantic coordinates of the first feature point include the coordinate value of that first feature point in at least one coordinate direction in the semantic space; the feature value of the first feature point further includes the probability that the semantics of the first feature point is the target object.
- In a possible implementation, determining the global feature vector, the local feature map, and the semantic feature map of the image to be processed based on the pre-trained first neural network includes: performing down-sampling processing on the image to be processed to obtain an intermediate feature map; determining the global feature vector and the local feature map based on the intermediate feature map; and performing feature extraction on the local feature map to obtain the semantic feature map.
- In a possible implementation, determining the global feature vector and the local feature map based on the intermediate feature map includes: performing pooling processing and full connection processing on the intermediate feature map to obtain the global feature vector corresponding to the image to be processed; and performing up-sampling processing on the intermediate feature map to obtain the local feature map corresponding to the image to be processed.
- In a possible implementation, transforming the second feature point corresponding to the first feature point in the local feature map into the semantic space based on the feature value of the first feature point in the semantic feature map, to form semantic feature points constituting the semantic local feature map, includes: determining, based on the semantic coordinates corresponding to the first feature point in the semantic feature map, the target position point of the first feature point in a pre-generated object semantic map, where the object semantic map includes multiple second position points of a three-dimensional preset object and the semantic coordinates of the multiple second position points, and the second position points include the first position points; and updating the feature value of the target position point in the object semantic map to the feature value of the target position point at the corresponding position in the local feature map, to obtain the semantic local feature map.
- In a possible implementation, generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and the pre-trained second neural network includes: generating a global feature map based on the global feature vector; fusing the semantic local feature map, the global feature map, and a pre-generated reference position map of a reference object in the semantic space to obtain a fused feature map; and inputting the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
- In this implementation, the prediction combines the semantic local feature map and the global feature vector, i.e., both the local features and the global features of the image to be processed; therefore, when the three-dimensional model corresponding to the target object is generated from the position map, the three-dimensional model has higher accuracy in local details.
- In a possible implementation, generating the global feature map based on the global feature vector includes: copying the global feature vector multiple times and splicing the copied global feature vectors, the spliced feature vectors constituting the global feature map, where the size of the global feature map is the same as the size of the local feature map.
- In a possible implementation, generating the three-dimensional model corresponding to the target object based on the position map includes: sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points; and generating, based on the three-dimensional position coordinates of each sampling point, the three-dimensional model corresponding to the target object.
- In a possible implementation, sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain the sampling points includes: according to preset reference semantic coordinates of reference sampling points, filtering out of the position map the first position points whose corresponding semantic coordinates are the same as the reference semantic coordinates, and using the filtered first position points as the sampling points.
- In a possible implementation, generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to each sampling point includes: using the three-dimensional position coordinates corresponding to each sampling point as the three-dimensional position coordinates of the vertices of three-dimensional meshes; and generating, based on the three-dimensional position coordinates of the vertices of each three-dimensional mesh, the three-dimensional model including each three-dimensional mesh.
- In a second aspect, the embodiments of the present disclosure also provide a three-dimensional model generation apparatus, including:
- a determining module, configured to determine, based on a pre-trained first neural network, the global feature vector, the local feature map, and the semantic feature map of the image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in the semantic space;
- a conversion module, configured to transform, based on the feature value of the first feature point in the semantic feature map, the second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map;
- a first generation module, configured to generate a position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and a pre-trained second neural network, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of the target object in the image to be processed; and
- a second generation module, configured to generate, based on the position map, a three-dimensional model corresponding to the target object.
- the semantic coordinates of the first feature point include coordinate values of the first feature point in at least one coordinate direction in the semantic space;
- the feature value of the first feature point further includes the probability that the semantics of the first feature point is the target object.
- In a possible implementation, the determining module, when determining the global feature vector, the local feature map, and the semantic feature map of the image to be processed based on the pre-trained first neural network, is configured to: perform down-sampling processing on the image to be processed to obtain an intermediate feature map; determine the global feature vector and the local feature map based on the intermediate feature map; and perform feature extraction on the local feature map to obtain the semantic feature map.
- In a possible implementation, the determining module, when determining the global feature vector and the local feature map based on the intermediate feature map, is configured to: perform pooling processing and full connection processing on the intermediate feature map to obtain the global feature vector corresponding to the image to be processed; and perform up-sampling processing on the intermediate feature map to obtain the local feature map corresponding to the image to be processed.
- In a possible implementation, the conversion module, when transforming the second feature point corresponding to the first feature point in the local feature map into the semantic space based on the feature value of the first feature point in the semantic feature map, to form semantic feature points constituting the semantic local feature map, is configured to: determine, based on the semantic coordinates corresponding to the first feature point in the semantic feature map, the target position point of the first feature point in the pre-generated object semantic map, where the object semantic map includes multiple second position points of the three-dimensional preset object and the semantic coordinates of the multiple second position points, and the second position points include the first position points; and update the feature value of the target position point in the object semantic map to the feature value of the target position point at the corresponding position in the local feature map, to obtain the semantic local feature map.
- In a possible implementation, the first generation module, when generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and the pre-trained second neural network, is configured to: generate a global feature map based on the global feature vector; fuse the semantic local feature map, the global feature map, and the pre-generated reference position map of the reference object in the semantic space to obtain a fused feature map; and input the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
- In a possible implementation, the first generation module, when generating the global feature map based on the global feature vector, is configured to: copy the global feature vector multiple times and splice the copied global feature vectors, the spliced feature vectors constituting the global feature map, where the size of the global feature map is the same as the size of the local feature map.
- In a possible implementation, the second generation module, when generating the three-dimensional model corresponding to the target object based on the position map, is configured to: sample the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points; and generate, based on the three-dimensional position coordinates corresponding to each sampling point, the three-dimensional model corresponding to the target object.
- In a possible implementation, the second generation module, when sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain the sampling points, is configured to: according to the preset reference semantic coordinates of the reference sampling points, filter out of the position map the first position points whose corresponding semantic coordinates are the same as the reference semantic coordinates, and use the filtered first position points as the sampling points.
- In a possible implementation, the second generation module, when generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to each sampling point, is configured to: use the three-dimensional position coordinates corresponding to each sampling point as the three-dimensional position coordinates of the vertices of three-dimensional meshes; and generate, based on the three-dimensional position coordinates of the vertices of each three-dimensional mesh, the three-dimensional model including each three-dimensional mesh.
- In a third aspect, the embodiments of the present disclosure also provide a computer device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
- In a fourth aspect, the embodiments of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
- FIG. 1 shows a flowchart of a method for generating a three-dimensional model provided by an embodiment of the present disclosure
- FIG. 2 shows a flowchart of a method for determining a location map provided by an embodiment of the present disclosure
- FIG. 3 shows a schematic diagram of a process of generating a three-dimensional human body model provided by an embodiment of the present disclosure
- FIG. 4 shows a flowchart of a training method for preliminary training of a first neural network provided by an embodiment of the present disclosure
- FIG. 5 shows a flowchart of a neural network training method provided by an embodiment of the present disclosure
- FIG. 6 shows a schematic structural diagram of a three-dimensional model generating apparatus provided by an embodiment of the present disclosure
- FIG. 7 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
- In the method provided by the present disclosure, the three-dimensional model is generated by predicting a position map that includes the three-dimensional position coordinates of each first position point of the target object; once the position map is predicted, the three-dimensional model corresponding to the target object can be generated from the predicted position map, so the method is not constrained by the expressive ability of a preset three-dimensional model and the generated three-dimensional model has higher accuracy.
- In addition, in the related art, when neural network model parameters are predicted, generally the global features of the image to be processed are extracted and the model parameters are predicted based on these extracted global features. This approach ignores the local features of the target object, so the generated three-dimensional model has poor expressive ability in local details.
- In the method provided by the present disclosure, when the position map corresponding to the image to be processed is predicted, the prediction combines the semantic local feature map and the global feature vector, i.e., both the local features and the global features of the image to be processed; therefore, when the three-dimensional model corresponding to the target object is generated from the position map, the three-dimensional model has higher accuracy in local details.
- The executor of the method for generating a three-dimensional model provided by the embodiments of the present disclosure is generally a computer device with certain computing capability. The computer device includes, for example, a terminal device, a server, or other processing device; the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or the like.
- Referring to FIG. 1, which is a flowchart of a method for generating a three-dimensional model according to an embodiment of the present disclosure, the method includes the following steps:
- Step 101: Based on the pre-trained first neural network, determine the global feature vector, the local feature map, and the semantic feature map of the image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in the semantic space.
- Step 102: Based on the feature value of the first feature point in the semantic feature map, transform a second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map.
- Step 103: Based on the semantic local feature map, the global feature vector, and the pre-trained second neural network, generate a position map corresponding to the image to be processed, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of the target object in the image to be processed.
- Step 104: Based on the position map, generate a three-dimensional model corresponding to the target object.
- The following describes steps 101 to 104 in detail. For step 101:
- The image to be processed may be an RGB image including a target object, where the target object is, for example, a person, an animal, or a static object (such as a table or a cup). In a possible implementation, a pre-stored image to be processed can be obtained from a database, or an image to be processed sent by another user terminal can be received; the present disclosure does not limit this.
- The semantic space is a space in which semantic coordinates are mapped one-to-one to real position points; each semantic coordinate corresponds to a real position point. For example, the semantic coordinates (u1, v1) can represent the left index finger, and the semantic coordinates (u2, v2) can represent the left middle finger. After any feature map is transformed into the semantic space, the semantic coordinates of the left index finger in that feature map are always (u1, v1), and the semantic coordinates of the left middle finger are always (u2, v2).
- The semantic feature map includes multiple first feature points, and the feature value of a first feature point may be the values of that first feature point on different channels. For example, the semantic feature map may be a three-channel feature map, and the values of each first feature point on the different channels can respectively represent the probability that the first feature point belongs to the target object and the coordinate values of the first feature point in different coordinate directions in the semantic space.
- The probability that a first feature point in the semantic feature map belongs to the target object is used to distinguish, within the semantic feature map, the feature points belonging to the target object from the feature points belonging to the background other than the target object: when the probability is greater than or equal to a preset probability value, the first feature point is determined to belong to the target object, and when the probability is less than the preset probability value, the first feature point is determined to belong to the background.
- After the first feature points belonging to the target object and those belonging to the background are distinguished, when the local feature map is transformed into the semantic space, the second feature points corresponding to the first feature points belonging to the target object can be transformed into the semantic space; in this way, the influence of the background on the generated three-dimensional model of the target can be avoided.
- In a possible implementation, the semantic coordinates of the first feature points whose probability values are greater than or equal to the preset probability value lie within a preset range of coordinate values, while the semantic coordinates of the first feature points whose probability values are less than the preset probability value may be identical and lie outside the preset range, for example all 0.
- In another possible implementation, the feature value of a first feature point in the semantic feature map may include only the semantic coordinates of the first feature point in the semantic space, without the probability that the first feature point belongs to the target object; the semantic coordinates of the first feature points that do not belong to the target object may be identical, for example all 0, and the first feature points belonging to the target object and those belonging to the background can then be distinguished through the semantic coordinates corresponding to the first feature points.
- In specific implementation, when the global feature vector, the local feature map, and the semantic feature map of the image to be processed are determined based on the trained first neural network, the image to be processed can first be down-sampled to obtain an intermediate feature map; the global feature vector and the local feature map are then determined based on the intermediate feature map, and feature extraction is performed on the local feature map to obtain the semantic feature map.
- When the global feature vector and the local feature map are determined based on the intermediate feature map, pooling processing and full connection processing can be performed on the intermediate feature map (i.e., it is input to a pooling layer and then a fully connected layer) to obtain the global feature vector corresponding to the image to be processed; and up-sampling processing is performed on the intermediate feature map to obtain the local feature map corresponding to the image to be processed. When feature extraction is performed on the local feature map, the local feature map can be input into a convolutional layer, which outputs the semantic feature map.
- After the pooling processing and full connection processing, the dimensionality of the intermediate feature map is reduced and its spatial resolution (i.e., size) becomes 1*1; the intermediate feature map therefore becomes an N-dimensional global feature vector. After the up-sampling processing, the spatial resolution of the intermediate feature map is the same as that of the image to be processed; the intermediate feature map therefore becomes a multi-channel local feature map, whose number of channels can be N. A minimal sketch of such a first network is given below.
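- The following is a minimal PyTorch-style sketch of a first network producing these three outputs. It is an illustration only: the layer counts, channel width `n`, and head designs are assumptions, not details specified by the patent.

```python
import torch
import torch.nn as nn

class FirstNetwork(nn.Module):
    # One shared down-sampling encoder feeding three heads: a pooled / fully
    # connected head (global feature vector), an up-sampling decoder (local
    # feature map), and a 1x1 convolution on the local map (semantic map).
    def __init__(self, n: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(          # down-sampling -> intermediate map
            nn.Conv2d(3, n, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(n, n, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)    # spatial resolution 1*1
        self.fc = nn.Linear(n, n)              # -> N-dimensional global vector
        self.decoder = nn.Sequential(          # up-sampling back to input size
            nn.ConvTranspose2d(n, n, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(n, n, 4, stride=2, padding=1), nn.ReLU(),
        )
        # 3 channels: probability of belonging to the target object plus the
        # semantic coordinates (u, v) in the semantic space.
        self.semantic_head = nn.Conv2d(n, 3, 1)

    def forward(self, image: torch.Tensor):
        mid = self.encoder(image)                        # intermediate feature map
        global_vec = self.fc(self.pool(mid).flatten(1))  # global feature vector
        local_map = self.decoder(mid)                    # local feature map
        semantic_map = self.semantic_head(local_map)     # semantic feature map
        return global_vec, local_map, semantic_map
```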
- The training process of the first neural network will be introduced below and is not described here.
- For step 102:
- In specific implementation, an object semantic map may be generated in advance; the object semantic map includes multiple second position points of a three-dimensional preset object and the semantic coordinates of the multiple second position points. The multiple second position points included in the object semantic map may be position points at all positions of the three-dimensional preset object, including the position points covering the limb surfaces of the three-dimensional preset object; the first position points of the target object involved in the semantic feature map described in step 101 can be understood as the position points of the visible parts of the target object in the image to be processed. The second position points include the first position points, i.e., some of the second position points included in the object semantic map may coincide with the first position points included in the semantic feature map.
- When the object semantic map is generated, second position points that have a spatial association relationship also have associated semantic coordinates. For example, if the real positions of two second position points on the three-dimensional preset object are adjacent, the semantic points corresponding to these two second position points in the generated object semantic map are also adjacent.
- In the related art, when an object semantic map is generated, the different parts of the three-dimensional preset object are generated separately. For example, to generate an object semantic map of a human body, the head, torso, left arm, right arm, left leg, and right leg are each generated as a separate whole; position points on the same whole are associated in the generated object semantic map, but position points on different wholes have no association relationship in it.
- In the solution adopted by the present disclosure, by contrast, the three-dimensional preset object is taken as one whole to generate the object semantic map, and the semantic coordinates of all second position points on the three-dimensional preset object have association relationships with one another; the generated object semantic map thus incorporates the spatial position relationships of the second position points. Therefore, when the three-dimensional model is generated, the relative spatial positions of the position points are more precise, which in turn improves the accuracy of the generated three-dimensional model.
- The local feature map includes multiple second feature points; the first feature points in the semantic feature map correspond one-to-one to the second feature points in the local feature map, and each first feature point has a corresponding second feature point at the corresponding position of the local feature map.
- When the second feature points in the local feature map are converted into the semantic space based on the feature values of the first feature points in the semantic feature map, the target position point of each first feature point in the pre-generated object semantic map can be determined based on the semantic coordinates corresponding to that first feature point, and the feature value of the target position point in the object semantic map is then updated to the feature value at the corresponding position of that target position point in the local feature map.
- For example, the first feature point P1 in the semantic feature map corresponds to the second feature point P2 in the local feature map. When the second feature point P2 is converted into the semantic space, the target position point M of the first feature point P1 in the pre-generated object semantic map can first be determined, and the feature value of point M in the object semantic map is then updated to the feature value of the second feature point P2.
- After all second feature points in the local feature map are converted into the semantic space, the semantic feature points corresponding to the second feature points are obtained, and the semantic feature points constitute the semantic local feature map. A sketch of this conversion is given below.
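- The conversion amounts to a scatter of local features onto the object semantic map, indexed by the predicted semantic coordinates. The sketch below is a simplified interpretation: the 64*64 grid size, the probability threshold, and nearest-pixel rounding are assumptions, not details given by the patent.

```python
import torch

def to_semantic_space(local_map, semantic_map, size=64, p_thresh=0.5):
    # local_map:    (C, H, W) local feature map
    # semantic_map: (3, H, W); channel 0 = probability of belonging to the
    #               target object, channels 1-2 = semantic coords (u, v) in [0, 1]
    c, h, w = local_map.shape
    prob, u, v = semantic_map[0], semantic_map[1], semantic_map[2]
    mask = prob >= p_thresh                  # keep only target-object points
    # Semantic coordinates of the kept first feature points -> pixel indices
    # of their target position points in the object semantic map.
    ui = (u[mask].clamp(0, 1) * (size - 1)).round().long()
    vi = (v[mask].clamp(0, 1) * (size - 1)).round().long()
    out = torch.zeros(c, size, size)
    # Update each target position point with the feature value of the
    # corresponding second feature point in the local feature map.
    out[:, vi, ui] = local_map[:, mask]
    return out                               # semantic local feature map
```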
- For step 103: when generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and the pre-trained second neural network, the method shown in FIG. 2 can be referred to, which includes the following steps:
- Step 201: Generate a global feature map based on the global feature vector.
- For example, the global feature vector can be copied multiple times and the copied global feature vectors spliced together; the spliced global feature vectors constitute the global feature map, whose size is the same as the size of the local feature map. For example, if the size of the local feature map is 64*64 and the global feature vector is a 1*1, 128-dimensional vector, the global feature vector can be copied 64*64 times and the copies spliced according to the size of the local feature map, yielding a 64*64*128 feature tensor, which is the global feature map, as sketched below.
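- A minimal sketch of this copy-and-splice (tiling) step, assuming a channels-first tensor layout:

```python
import torch

def tile_global_vector(global_vec: torch.Tensor, h: int = 64, w: int = 64):
    # Copy a (C,) global feature vector h*w times and splice the copies into
    # a (C, h, w) global feature map matching the local feature map size.
    return global_vec.view(-1, 1, 1).expand(-1, h, w).contiguous()

g = tile_global_vector(torch.randn(128))  # 128-dim vector -> 64*64*128 map
print(g.shape)                            # torch.Size([128, 64, 64])
```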
- Step 202: Fuse the semantic local feature map, the global feature map, and the pre-generated reference position map of the reference object in the semantic space to obtain a fused feature map.
- The reference position map is a three-channel position map; the values of a position point on the three channels represent the three-dimensional position coordinates of that position point. Each position point in the reference position map corresponds to two types of coordinates: semantic coordinates in the semantic space, and the three-dimensional position coordinates corresponding to the position point.
- In a possible implementation, the semantic local feature map and the global feature map can be concatenated first to obtain a first feature map, and the first feature map can then be concatenated with the reference position map to obtain the fused feature map.
- The semantic local feature map, the global feature map, and the reference position map have the same size, so during fusion the channel values corresponding to the feature points at the same position can be spliced. For example, suppose position N of the semantic local feature map corresponds to feature point A with channel dimension x, position N of the global feature map corresponds to feature point B with channel dimension y, and position N of the reference position map corresponds to feature point C with channel dimension z; then the channel dimension of the feature point at position N of the fused feature map is x+y+z, and the number of channels of the fused feature map is also x+y+z.
- During fusion, the reference position map can serve as prior information, preventing excessive error in the generated position map. A sketch of the channel-wise splice follows.
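- A minimal sketch of the channel-wise fusion, with hypothetical channel counts (x = y = 128 and z = 3 are assumptions chosen for illustration):

```python
import torch

semantic_local = torch.randn(128, 64, 64)  # x = 128 channels
global_map     = torch.randn(128, 64, 64)  # y = 128 channels
reference_pos  = torch.randn(3, 64, 64)    # z = 3 channels (X, Y, Z prior)

# Splice channel values at each position: the fused map has x+y+z channels.
first_map = torch.cat([semantic_local, global_map], dim=0)
fused = torch.cat([first_map, reference_pos], dim=0)
print(fused.shape)                         # torch.Size([259, 64, 64])
```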
- Step 203: Input the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
- For example, the second neural network may first perform down-sampling processing on the fused feature map and then up-sampling processing, outputting the position map corresponding to the image to be processed.
- The position map corresponding to the image to be processed is also a three-channel map; the values of the channels respectively represent values in different coordinate directions of the real-world coordinate system, and the values of each position point in those coordinate directions serve as the three-dimensional position coordinates of that position point.
- For step 104:
- In a possible implementation, the first position points in the position map can be sampled based on the semantic coordinates of the first position points in the position map to obtain sampling points, and a three-dimensional model corresponding to the target object is then generated based on the three-dimensional coordinate information corresponding to each sampling point.
- For example, according to the preset reference semantic coordinates of the reference sampling points, the first position points whose semantic coordinates are the same as the reference semantic coordinates can be filtered out of the position map and used as the sampling points. For instance, a person's left hand can be preset to be represented by 300 sampling points; the 300 sampling points are used as reference sampling points, each corresponding to a reference semantic coordinate, and the first position points whose semantic coordinates match the reference semantic coordinates of the reference sampling points are filtered out and used as the sampling points, as sketched below.
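- A simplified sketch of this semantic-coordinate lookup. Matching by nearest neighbour is an assumption standing in for the patent's exact-match filtering:

```python
import torch

def sample_position_map(uv, xyz, ref_uv):
    # uv:     (P, 2) semantic coordinates of the first position points
    # xyz:    (P, 3) their three-dimensional position coordinates
    # ref_uv: (K, 2) preset reference semantic coordinates, e.g. 300 points
    #         chosen to represent the left hand
    d = torch.cdist(ref_uv, uv)      # distance from each reference coordinate
    idx = d.argmin(dim=1)            # to each position point; take the nearest
    return xyz[idx]                  # (K, 3) 3D coordinates of the sampling points
```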
- After the sampling points are determined, the three-dimensional position coordinates corresponding to each sampling point can be used as the three-dimensional position coordinates of the vertices of three-dimensional meshes, and the three-dimensional model including each three-dimensional mesh can be generated by rendering. After the three-dimensional model is generated, it can be displayed through the user terminal. An illustrative mesh-writing sketch follows.
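- For illustration, sampled vertices plus a fixed triangle topology over the reference sampling points (an assumption; the patent does not specify the mesh connectivity) can be written out as a standard OBJ mesh:

```python
import numpy as np

def write_obj(vertices: np.ndarray, faces: np.ndarray) -> str:
    # vertices: (K, 3) 3D position coordinates of the sampling points (vertices)
    # faces:    (F, 3) vertex indices of each triangular three-dimensional mesh
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]  # OBJ is 1-indexed
    return "\n".join(lines)
```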
- In a specific application scenario, a three-dimensional model corresponding to a human body can be generated by the above method; human body action recognition can then be performed according to the generated three-dimensional model, and interaction between the user and the machine can be realized through the results of the recognition.
- In this scenario, the RGB image of the human body is input to the first neural network. The first neural network first down-samples the RGB image to obtain the intermediate feature map and then up-samples the intermediate feature map to obtain the local feature map; the intermediate feature map is also input to a pooling layer followed by a fully connected layer to obtain the global feature vector, and feature extraction is performed on the local feature map to obtain the semantic feature map. Based on the semantic feature map, the local feature map is converted into the semantic space to obtain the semantic local feature map; based on the global feature vector, the global feature map is generated. The global feature map, the semantic local feature map, and the reference position map of the reference human body are then concatenated, the concatenated feature map is input to the second neural network, and the position map corresponding to the RGB image is output; a three-dimensional human body model is then generated based on the predicted position map. The sketch below strings the earlier snippets together along this pipeline.
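- An end-to-end sketch chaining the hypothetical helpers defined above (`FirstNetwork`, `to_semantic_space`, `tile_global_vector`); the second network is left abstract:

```python
import torch

image = torch.randn(1, 3, 256, 256)            # RGB image of a human body
net1 = FirstNetwork()
global_vec, local_map, semantic_map = net1(image)

sem_local = to_semantic_space(local_map[0], semantic_map[0])  # (128, 64, 64)
global_map = tile_global_vector(global_vec[0])                # (128, 64, 64)
ref_pos = torch.randn(3, 64, 64)               # reference human position map
fused = torch.cat([sem_local, global_map, ref_pos], dim=0)    # (259, 64, 64)

# A second encoder-decoder network maps `fused` to the position map; the
# sampling and meshing of step 104 then produce the 3D human body model.
```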
- When the first neural network and the second neural network are trained, the first neural network can first be preliminarily trained, and then, based on the preliminarily trained first neural network, the first and second neural networks can be jointly trained.
- Referring to FIG. 4, a training method for the preliminary training of the first neural network includes the following steps:
- Step 401: Obtain a sample image and a reference semantic feature map corresponding to the sample image.
- Step 402: Input the sample image into the first neural network, and output a predicted semantic feature map.
- Step 403: Determine the first loss value of this training round based on the predicted semantic feature map and the reference semantic feature map.
- Step 404: Determine whether the first loss value is less than a first preset value; if so, proceed to step 405.
- Step 405: Determine that the first neural network used in this training round is the preliminarily trained first neural network.
- After the preliminary training of the first neural network is completed, the first and second neural networks can be jointly trained. Specifically, refer to the neural network training method shown in FIG. 5, which includes the following steps:
- Step 501: Obtain a sample image, a reference semantic feature map corresponding to the sample image, and a sample position map corresponding to the sample image.
- Step 502: Input the sample image into the first neural network, and output a global feature vector, a local feature map, and a predicted semantic feature map.
- Step 503: Based on the first feature points in the predicted semantic feature map, transform the second feature points corresponding to the first feature points in the local feature map into the semantic space to form semantic feature points, the semantic feature points constituting a semantic local feature map.
- Step 504: Based on the semantic local feature map, the global feature vector, and the second neural network, generate a predicted position map corresponding to the sample image.
- Step 505: Determine a second loss value of this training round based on the predicted semantic feature map, the reference semantic feature map, the predicted position map, and the sample position map.
- In a possible implementation, a first prediction loss can be determined based on the predicted semantic feature map and the reference semantic feature map, a second prediction loss can be determined based on the predicted position map and the sample position map, and the sum of the first and second prediction losses is used as the second loss value. In another possible implementation, a three-dimensional human body model can also be generated based on the predicted position map and projected according to the shooting angle of the sample image to obtain a projected image; a third prediction loss is then determined based on the projected image and the sample image, and a weighted sum of the first, second, and third prediction losses is used as the second loss value. A sketch of this loss follows.
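- A minimal sketch of the second loss value. The use of L1 distances and the default weights are assumptions; the patent does not name the individual loss metrics:

```python
import torch
import torch.nn.functional as F

def second_loss(pred_semantic, ref_semantic, pred_pos, sample_pos,
                projected=None, sample_img=None, w=(1.0, 1.0, 1.0)):
    loss1 = F.l1_loss(pred_semantic, ref_semantic)  # first prediction loss
    loss2 = F.l1_loss(pred_pos, sample_pos)         # second prediction loss
    total = w[0] * loss1 + w[1] * loss2
    if projected is not None:                       # optional third loss from
        total = total + w[2] * F.l1_loss(projected, sample_img)  # re-projection
    return total
```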
- Step 506: Determine whether the second loss value is less than a second preset value; if the judgment result is yes, proceed to step 507.
- Step 507: Determine that the first neural network used in this training round is the trained first neural network, and determine that the second neural network used in this training round is the trained second neural network.
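- Put together, the joint training of FIG. 5 could look like the following loop. All names here (`net2`, `fuse`, `loader`, the optimizer settings, and the stopping threshold) are hypothetical placeholders, not values from the patent:

```python
import torch

opt = torch.optim.Adam(list(net1.parameters()) + list(net2.parameters()), lr=1e-4)
second_preset_value = 0.01                                  # assumed threshold
for sample_img, ref_semantic, sample_pos in loader:         # step 501
    global_vec, local_map, pred_semantic = net1(sample_img) # step 502
    sem_local = to_semantic_space(local_map[0], pred_semantic[0])  # step 503
    pred_pos = net2(fuse(sem_local, global_vec))            # step 504
    loss = second_loss(pred_semantic, ref_semantic, pred_pos, sample_pos)  # 505
    opt.zero_grad(); loss.backward(); opt.step()
    if loss.item() < second_preset_value:                   # steps 506-507
        break
```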
- Those skilled in the art can understand that, in the above methods of the specific embodiments, the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible inner logic.
- Based on the same concept, an embodiment of the present disclosure also provides a three-dimensional model generation apparatus corresponding to the three-dimensional model generation method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above three-dimensional model generation method, the implementation of the apparatus can refer to the implementation of the method, and repeated details are not described again.
- Referring to FIG. 6, which is a schematic diagram of the architecture of a three-dimensional model generation apparatus provided by an embodiment of the present disclosure, the apparatus includes: a determining module 601, a conversion module 602, a first generation module 603, and a second generation module 604; wherein:
- the determining module 601 is configured to determine, based on a pre-trained first neural network, the global feature vector, the local feature map, and the semantic feature map of the image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in the semantic space;
- the conversion module 602 is configured to transform, based on the feature value of the first feature point in the semantic feature map, the second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map;
- the first generation module 603 is configured to generate a position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and a pre-trained second neural network, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of the target object in the image to be processed; and
- the second generation module 604 is configured to generate, based on the position map, a three-dimensional model corresponding to the target object.
- the semantic coordinates of the first feature point include coordinate values of the first feature point in at least one coordinate direction in the semantic space;
- the feature value of the first feature point further includes the probability that the semantics of the first feature point is the target object.
- In a possible implementation, the determining module 601, when determining the global feature vector, the local feature map, and the semantic feature map of the image to be processed based on the pre-trained first neural network, is configured to: perform down-sampling processing on the image to be processed to obtain an intermediate feature map; determine the global feature vector and the local feature map based on the intermediate feature map; and perform feature extraction on the local feature map to obtain the semantic feature map.
- In a possible implementation, the determining module 601, when determining the global feature vector and the local feature map based on the intermediate feature map, is configured to: perform pooling processing and full connection processing on the intermediate feature map to obtain the global feature vector corresponding to the image to be processed; and perform up-sampling processing on the intermediate feature map to obtain the local feature map corresponding to the image to be processed.
- In a possible implementation, the conversion module 602, when transforming the second feature point corresponding to the first feature point in the local feature map into the semantic space based on the feature value of the first feature point in the semantic feature map, to form semantic feature points constituting the semantic local feature map, is configured to: determine, based on the semantic coordinates corresponding to the first feature point in the semantic feature map, the target position point of the first feature point in the pre-generated object semantic map, where the object semantic map includes multiple second position points of the three-dimensional preset object and the semantic coordinates of the multiple second position points, and the second position points include the first position points; and update the feature value of the target position point in the object semantic map to the feature value of the target position point at the corresponding position in the local feature map, to obtain the semantic local feature map.
- In a possible implementation, the first generation module 603, when generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and the pre-trained second neural network, is configured to: generate a global feature map based on the global feature vector; fuse the semantic local feature map, the global feature map, and the pre-generated reference position map of the reference object in the semantic space to obtain a fused feature map; and input the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
- In a possible implementation, the first generation module 603, when generating the global feature map based on the global feature vector, is configured to: copy the global feature vector multiple times and splice the copied global feature vectors, the spliced feature vectors constituting the global feature map, where the size of the global feature map is the same as the size of the local feature map.
- In a possible implementation, the second generation module 604, when generating the three-dimensional model corresponding to the target object based on the position map, is configured to: sample the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points; and generate, based on the three-dimensional position coordinates corresponding to each sampling point, the three-dimensional model corresponding to the target object.
- In a possible implementation, the second generation module 604, when sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain the sampling points, is configured to: according to the preset reference semantic coordinates of the reference sampling points, filter out of the position map the first position points whose corresponding semantic coordinates are the same as the reference semantic coordinates, and use the filtered first position points as the sampling points.
- In a possible implementation, the second generation module 604, when generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to each sampling point, is configured to: use the three-dimensional position coordinates corresponding to each sampling point as the three-dimensional position coordinates of the vertices of three-dimensional meshes; and generate, based on the three-dimensional position coordinates of the vertices of each three-dimensional mesh, the three-dimensional model including each three-dimensional mesh.
- Referring to FIG. 7, a schematic structural diagram of a computer device 700 provided by an embodiment of the present disclosure, the device includes a processor 701, a memory 702, and a bus 703. The memory 702 is used to store execution instructions and includes an internal memory 7021 and an external memory 7022; the internal memory 7021 is used to temporarily store computation data of the processor 701 and data exchanged with the external memory 7022 such as a hard disk, and the processor 701 exchanges data with the external memory 7022 through the internal memory 7021. When the computer device 700 runs, the processor 701 communicates with the memory 702 through the bus 703, so that the processor 701 executes the following instructions:
- determining, based on a pre-trained first neural network, the global feature vector, the local feature map, and the semantic feature map of the image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in the semantic space;
- transforming, based on the feature value of the first feature point in the semantic feature map, the second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map;
- generating a position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and a pre-trained second neural network, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of the target object in the image to be processed; and
- generating, based on the position map, a three-dimensional model corresponding to the target object.
- The embodiments of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps of the three-dimensional model generation method described in the above method embodiments are executed.
- The storage medium may be a volatile or non-volatile computer-readable storage medium.
- The computer program product of the three-dimensional model generation method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the three-dimensional model generation method described in the above method embodiments, which can be referred to for details and are not repeated here.
- The above computer program product can be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
- The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
- If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Abstract
The present disclosure provides a three-dimensional model generation method and apparatus, a computer device, and a storage medium, including: determining, based on a pre-trained first neural network, a global feature vector, a local feature map, and a semantic feature map of an image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in a semantic space; transforming, based on the feature value of the first feature point in the semantic feature map, a second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map; generating a position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and a pre-trained second neural network, where the position map includes the semantic coordinates and three-dimensional position coordinates of each position point of a target object in the image to be processed; and generating, based on the position map, a three-dimensional model corresponding to the target object.
Description
The present disclosure claims priority to the Chinese patent application with application number 202010418882.9, entitled "Three-dimensional model generation method and apparatus, computer device, and storage medium", filed with the Chinese Patent Office on May 18, 2020, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of computer technology, and in particular to a three-dimensional model generation method and apparatus, a computer device, and a storage medium.
With the development of the fields of computer vision and human-computer interaction, three-dimensional human body models play an increasingly important role; by generating a three-dimensional human body model, human body action recognition, human-computer interaction, and the like can be realized.
In the related art, the generation of a three-dimensional human body model generally relies on an existing three-dimensional template human body model: model parameters of a neural network model are predicted, and the three-dimensional human body model of the target object is generated on the basis of a preset three-dimensional template human body model. However, because the expressive ability of the preset three-dimensional template human body model is limited, the accuracy achievable when generating the three-dimensional human body model of the target object is limited. For example, if a person's figure differs from that of an ordinary person, the accuracy of the generated three-dimensional human body model of this person will be affected owing to the limited expressive ability of the existing three-dimensional template human body model.
SUMMARY
The embodiments of the present disclosure provide at least a three-dimensional model generation method and apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a three-dimensional model generation method, including:
determining, based on a pre-trained first neural network, a global feature vector, a local feature map, and a semantic feature map of an image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in a semantic space;
transforming, based on the feature value of the first feature point in the semantic feature map, a second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map;
generating a position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and a pre-trained second neural network, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of a target object in the image to be processed; and
generating, based on the position map, a three-dimensional model corresponding to the target object.
In the method provided by the present disclosure, the three-dimensional model is generated by predicting a position map, which includes the three-dimensional position coordinates of each first position point of the target object; once the position map is predicted, the three-dimensional model corresponding to the target object can be generated from the predicted position map. Therefore, the method provided by the present disclosure is not constrained by the expressive ability of a preset three-dimensional model, and the generated three-dimensional model has higher accuracy.
In a possible implementation, the semantic coordinates of the first feature point include the coordinate value of that first feature point in at least one coordinate direction in the semantic space; the feature value of the first feature point further includes the probability that the semantics of the first feature point is the target object.
In a possible implementation, determining the global feature vector, the local feature map, and the semantic feature map of the image to be processed based on the pre-trained first neural network includes: performing down-sampling processing on the image to be processed to obtain an intermediate feature map; determining the global feature vector and the local feature map based on the intermediate feature map; and performing feature extraction on the local feature map to obtain the semantic feature map.
In a possible implementation, determining the global feature vector and the local feature map based on the intermediate feature map includes: performing pooling processing and full connection processing on the intermediate feature map to obtain the global feature vector corresponding to the image to be processed; and performing up-sampling processing on the intermediate feature map to obtain the local feature map corresponding to the image to be processed.
In a possible implementation, transforming, based on the feature value of the first feature point in the semantic feature map, the second feature point corresponding to that first feature point in the local feature map into the semantic space to form semantic feature points constituting the semantic local feature map includes: determining, based on the semantic coordinates corresponding to the first feature point in the semantic feature map, the target position point of the first feature point in a pre-generated object semantic map, where the object semantic map includes multiple second position points of a three-dimensional preset object and the semantic coordinates of the multiple second position points, and the second position points include the first position points; and updating the feature value of the target position point in the object semantic map to the feature value of the target position point at the corresponding position in the local feature map, to obtain the semantic local feature map.
In a possible implementation, generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and the pre-trained second neural network includes: generating a global feature map based on the global feature vector; fusing the semantic local feature map, the global feature map, and a pre-generated reference position map of a reference object in the semantic space to obtain a fused feature map; and inputting the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
In this implementation, when the position map corresponding to the image to be processed is predicted, the prediction is performed using the semantic local feature map and the global feature vector, combining both the local features and the global features of the image to be processed; therefore, when the three-dimensional model corresponding to the target object is generated from the position map, the three-dimensional model has higher accuracy in local details.
In a possible implementation, generating the global feature map based on the global feature vector includes: copying the global feature vector multiple times and splicing the copied global feature vectors, the spliced feature vectors constituting the global feature map, where the size of the global feature map is the same as the size of the local feature map.
In a possible implementation, generating the three-dimensional model corresponding to the target object based on the position map includes: sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points; and generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates of each sampling point.
In a possible implementation, sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain the sampling points includes: according to preset reference semantic coordinates of reference sampling points, filtering out of the position map the first position points whose corresponding semantic coordinates are the same as the reference semantic coordinates, and using the filtered first position points as the sampling points.
In a possible implementation, generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to each sampling point includes: using the three-dimensional position coordinates corresponding to each sampling point as the three-dimensional position coordinates of the vertices of three-dimensional meshes; and generating, based on the three-dimensional position coordinates of the vertices of each three-dimensional mesh, the three-dimensional model including each three-dimensional mesh.
In a second aspect, an embodiment of the present disclosure also provides a three-dimensional model generation apparatus, including:
a determining module, configured to determine, based on a pre-trained first neural network, a global feature vector, a local feature map, and a semantic feature map of an image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of that first feature point in a semantic space;
a conversion module, configured to transform, based on the feature value of the first feature point in the semantic feature map, a second feature point corresponding to the first feature point in the local feature map into the semantic space to form a semantic feature point, the semantic feature points constituting a semantic local feature map;
a first generation module, configured to generate a position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and a pre-trained second neural network, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of a target object in the image to be processed; and
a second generation module, configured to generate, based on the position map, a three-dimensional model corresponding to the target object.
In a possible implementation, the semantic coordinates of the first feature point include the coordinate value of that first feature point in at least one coordinate direction in the semantic space;
the feature value of the first feature point further includes the probability that the semantics of the first feature point is the target object.
In a possible implementation, the determining module, when determining the global feature vector, the local feature map, and the semantic feature map of the image to be processed based on the pre-trained first neural network, is configured to:
perform down-sampling processing on the image to be processed to obtain an intermediate feature map;
determine the global feature vector and the local feature map based on the intermediate feature map; and
perform feature extraction on the local feature map to obtain the semantic feature map.
In a possible implementation, the determining module, when determining the global feature vector and the local feature map based on the intermediate feature map, is configured to:
perform pooling processing and full connection processing on the intermediate feature map to obtain the global feature vector corresponding to the image to be processed; and perform up-sampling processing on the intermediate feature map to obtain the local feature map corresponding to the image to be processed.
In a possible implementation, the conversion module, when transforming the second feature point corresponding to the first feature point in the local feature map into the semantic space based on the feature value of the first feature point in the semantic feature map, to form semantic feature points constituting the semantic local feature map, is configured to:
determine, based on the semantic coordinates corresponding to the first feature point in the semantic feature map, the target position point of the first feature point in a pre-generated object semantic map, where the object semantic map includes multiple second position points of a three-dimensional preset object and the semantic coordinates of the multiple second position points, and the second position points include the first position points; and
update the feature value of the target position point in the object semantic map to the feature value of the target position point at the corresponding position in the local feature map, to obtain the semantic local feature map.
In a possible implementation, the first generation module, when generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector, and the pre-trained second neural network, is configured to:
generate a global feature map based on the global feature vector;
fuse the semantic local feature map, the global feature map, and a pre-generated reference position map of a reference object in the semantic space to obtain a fused feature map; and
input the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
In a possible implementation, the first generation module, when generating the global feature map based on the global feature vector, is configured to:
copy the global feature vector multiple times and splice the copied global feature vectors, the spliced feature vectors constituting the global feature map, where the size of the global feature map is the same as the size of the local feature map.
In a possible implementation, the second generation module, when generating the three-dimensional model corresponding to the target object based on the position map, is configured to:
sample the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points; and
generate the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to each sampling point.
In a possible implementation, the second generation module, when sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain the sampling points, is configured to:
according to preset reference semantic coordinates of reference sampling points, filter out of the position map the first position points whose corresponding semantic coordinates are the same as the reference semantic coordinates, and use the filtered first position points as the sampling points.
In a possible implementation, the second generation module, when generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to each sampling point, is configured to:
use the three-dimensional position coordinates corresponding to each sampling point as the three-dimensional position coordinates of the vertices of three-dimensional meshes; and
generate, based on the three-dimensional position coordinates of the vertices of each three-dimensional mesh, the three-dimensional model including each three-dimensional mesh.
In a third aspect, an embodiment of the present disclosure also provides a computer device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
In a fourth aspect, an embodiment of the present disclosure also provides a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
For descriptions of the effects of the above three-dimensional model generation apparatus, computer device, and computer-readable storage medium, reference is made to the description of the above three-dimensional model generation method, which is not repeated here.
To make the above objects, features, and advantages of the present disclosure more apparent and understandable, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly introduced below; the drawings here are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 shows a flowchart of a three-dimensional model generation method provided by an embodiment of the present disclosure;
FIG. 2 shows a flowchart of a position map determination method provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a process of generating a three-dimensional human body model provided by an embodiment of the present disclosure;
FIG. 4 shows a training method for the preliminary training of a first neural network provided by an embodiment of the present disclosure;
FIG. 5 shows a neural network training method provided by an embodiment of the present disclosure;
FIG. 6 shows a schematic architectural diagram of a three-dimensional model generation apparatus provided by an embodiment of the present disclosure;
FIG. 7 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In the related art, the generation of a three-dimensional human body model generally relies on a preset three-dimensional template human body model: model parameters of a neural network model are predicted, and the predicted model parameters are used to adjust the three-dimensional template human body model to generate the three-dimensional human body model of the target object. However, the accuracy of a three-dimensional human body model generated in this way is affected by the existing three-dimensional template human body model.
In the method provided by the present disclosure, the three-dimensional model is generated by predicting a position map, which includes the three-dimensional position coordinates of each first position point of the target object; once the position map is predicted, the three-dimensional model corresponding to the target object can be generated from the predicted position map. Therefore, the method provided by the present disclosure is not constrained by the expressive ability of a preset three-dimensional model, and the generated three-dimensional model has higher accuracy.
In addition, in the related art, when neural network model parameters are predicted, generally the global features of the image to be processed are extracted and the prediction is made based on these extracted global features. This approach ignores the local features of the target object, so the generated three-dimensional model has poor expressive ability in local details.
In the method provided by the present disclosure, when the position map corresponding to the image to be processed is predicted, the prediction combines the semantic local feature map and the global feature vector, i.e., both the local features and the global features of the image to be processed; therefore, when the three-dimensional model corresponding to the target object is generated from the position map, the three-dimensional model has higher accuracy in local details.
The defects of the above solutions are all results obtained by the inventors after practice and careful study; therefore, the discovery process of the above problems and the solutions proposed by the present disclosure below for the above problems should all be regarded as the inventors' contribution to the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.
To facilitate understanding of this embodiment, a three-dimensional model generation method disclosed in an embodiment of the present disclosure is first introduced in detail. The executor of the three-dimensional model generation method provided by the embodiments of the present disclosure is generally a computer device with certain computing capability; the computer device includes, for example, a terminal device, a server, or other processing device, and the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or the like.
Referring to FIG. 1, which is a flowchart of a three-dimensional model generation method provided by an embodiment of the present disclosure, the method includes the following steps:
Step 101: determining, based on a pre-trained first neural network, a global feature vector, a local feature map and a semantic feature map of an image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of the first feature point in a semantic space.
Step 102: converting, based on the feature values of the first feature points in the semantic feature map, the second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points, the semantic feature points constituting a semantic local feature map.
Step 103: generating, based on the semantic local feature map, the global feature vector and a pre-trained second neural network, a position map corresponding to the image to be processed, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of a target object in the image to be processed.
Step 104: generating, based on the position map, a three-dimensional model corresponding to the target object.
Steps 101 to 104 are described in detail below.
Regarding step 101:
The image to be processed may be an RGB image containing a target object, for example a person, an animal or a static object (such as a table or a cup). In a possible implementation, a pre-stored image to be processed may be obtained from a database, or an image to be processed sent by another client may be received; the present disclosure does not limit this.
The semantic space is a space in which semantic coordinates are mapped one-to-one to real position points: every semantic coordinate in the semantic space corresponds to a real position point. For example, if the semantic coordinate (u1, v1) denotes the left index finger and the semantic coordinate (u2, v2) denotes the left middle finger, then after any feature map is converted into the semantic space, the semantic coordinates of the left index finger in that feature map are always (u1, v1) and the semantic coordinates of the left middle finger are always (u2, v2).
The semantic feature map includes a plurality of first feature points, and the feature value of a first feature point may be its values on different channels. Illustratively, the semantic feature map may be a three-channel feature map in which the values of each first feature point on the different channels respectively represent the probability that the first feature point belongs to the target object and the coordinate values of the first feature point in the different coordinate directions of the semantic space.
Illustratively, the probability that a first feature point in the semantic feature map belongs to the target object is used to distinguish, within the semantic feature map, the feature points belonging to the target object from the feature points belonging to the background other than the target object. When the probability is greater than or equal to a preset probability value, the first feature point is determined to belong to the target object; when the probability is smaller than the preset probability value, the first feature point is determined to belong to the background.
After the first feature points belonging to the target object and the first feature points belonging to the background have been distinguished in the semantic feature map, when converting the local feature map into the semantic space, the corresponding second feature points of the local feature map can be converted according to the first feature points belonging to the target object, which avoids the influence of the background on generating the target three-dimensional model.
In a possible implementation, the semantic coordinates of the first feature points whose probability is greater than or equal to the preset probability value lie within a preset coordinate value range, while the semantic coordinates of the first feature points whose probability is smaller than the preset probability value may be identical and lie outside the preset coordinate value range, for example all 0.
In another possible implementation, the feature value of a first feature point in the semantic feature map may include only the semantic coordinates of the first feature point in the semantic space, without the probability of the first feature point belonging to the target object. In that case, the semantic coordinates of the first feature points not belonging to the target object may be identical, for example all 0, and the first feature points belonging to the target object can be distinguished from the first feature points belonging to the background by their corresponding semantic coordinates.
In a specific implementation, when determining the global feature vector, the local feature map and the semantic feature map of the image to be processed based on the trained first neural network, the image to be processed may first be down-sampled to obtain an intermediate feature map; the global feature vector and the local feature map are then determined based on the intermediate feature map, and feature extraction is performed on the local feature map to obtain the semantic feature map.
Specifically, when determining the global feature vector and the local feature map based on the intermediate feature map, pooling and full-connection processing may be applied to the intermediate feature map (i.e., it is fed into a pooling layer and then a fully connected layer) to obtain the global feature vector corresponding to the image to be processed; and the intermediate feature map may be up-sampled to obtain the local feature map corresponding to the image to be processed. For the feature extraction of the local feature map, the local feature map may be input into a convolutional layer, which outputs the semantic feature map.
After the pooling and full-connection processing, the dimensionality of the intermediate feature map is reduced and its spatial resolution (i.e., size) becomes 1×1, so the result of the pooling and full-connection processing is an N-dimensional global feature vector. After the up-sampling, the spatial resolution of the intermediate feature map is the same as that of the image to be processed, so the result of the up-sampling is a multi-channel local feature map, whose channel number may be N.
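To make this data flow concrete, the following is a hypothetical PyTorch sketch of such a first network; the layer widths, kernel sizes and class name are illustrative assumptions rather than the actual architecture of the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstNetworkSketch(nn.Module):
    """Illustrative stand-in for the first neural network (all sizes assumed)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Down-sampling backbone: image -> intermediate feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Pooling layer + fully connected layer -> N-dimensional global feature vector.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feat_dim, feat_dim)
        # One convolution on the local feature map -> 3-channel semantic feature map
        # (probability channel + two semantic coordinate channels).
        self.semantic_head = nn.Conv2d(feat_dim, 3, 1)

    def forward(self, image: torch.Tensor):
        mid = self.encoder(image)                        # intermediate feature map
        global_vec = self.fc(self.pool(mid).flatten(1))  # global feature vector
        # Up-sample back to the input resolution -> multi-channel local feature map.
        local = F.interpolate(mid, size=image.shape[-2:],
                              mode='bilinear', align_corners=False)
        semantic = self.semantic_head(local)             # semantic feature map
        return global_vec, local, semantic
```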
The training process of the first neural network is introduced below and is not expanded here.
Regarding step 102:
In a specific implementation, an object semantic map may be generated in advance; the object semantic map includes a plurality of second position points of a preset three-dimensional object and the semantic coordinates of the plurality of second position points. Here, the second position points included in the object semantic map may be position points at all positions of the preset three-dimensional object, including position points covering the surface of its limbs, whereas the first position points of the target object involved in the semantic feature map of step 101 can be understood as position points of the visible part of the target object in the image to be processed. The second position points include the first position points; that is, some of the second position points included in the object semantic map may coincide with the first position points included in the semantic feature map.
When the object semantic map is generated, second position points having a spatial association also have associated semantic coordinates; for example, if the real positions of two second position points of the preset three-dimensional object are adjacent, the semantic points corresponding to these two second position points are also adjacent in the generated object semantic map.
In the related art, when generating an object semantic map, the different parts of the preset three-dimensional object are generated separately; for example, to generate the object semantic map of a human body, the head, torso, left arm, right arm, left leg and right leg are each generated as a separate whole. Position points on the same whole are associated in the generated object semantic map, but position points on different wholes have no association in the generated object semantic map.
In contrast, the solution adopted by the present disclosure generates the object semantic map with the preset three-dimensional object as a single whole, so that the semantic coordinates of all second position points on the preset three-dimensional object are mutually associated. The object semantic map generated in this way incorporates the spatial position relations of the second position points; consequently, when the three-dimensional model is generated, the relative spatial positions of the position points are more precise, which in turn improves the accuracy of the generated three-dimensional model.
The local feature map includes a plurality of second feature points, and the first feature points in the semantic feature map correspond one-to-one to the second feature points in the local feature map: for every first feature point, there is a second feature point at the corresponding position of the local feature map.
When converting the second feature points of the local feature map into the semantic space based on the feature values of the first feature points in the semantic feature map, the target position point of each first feature point in the pre-generated object semantic map may be determined based on the semantic coordinates corresponding to that first feature point, and the feature value of the target position point in the object semantic map is then updated to the feature value at the corresponding position of the target position point in the local feature map.
Illustratively, suppose the first feature point P1 in the semantic feature map corresponds to the second feature point P2 in the local feature map. When converting the second feature point P2 into the semantic space, the target position point M of the first feature point P1 in the pre-generated object semantic map may first be determined, and the feature value of point M in the object semantic map is then updated to the feature value of the second feature point P2.
After all second feature points of the local feature map have been converted into the semantic space, the semantic feature points corresponding to the second feature points are obtained, and these semantic feature points constitute the semantic local feature map.
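This conversion can be viewed as a scatter operation: the semantic coordinates of each foreground first feature point select a target position point in the object semantic map, which is then overwritten with the local feature at the corresponding pixel. A minimal sketch, assuming normalized (u, v) coordinates in [0, 1] and a square object semantic map; all names are hypothetical:

```python
import numpy as np

def to_semantic_space(semantic_map, local_map, S=64, p_threshold=0.5):
    """Scatter local features into the semantic (UV) plane.

    `semantic_map`: H x W x 3 ([p, u, v] per pixel); `local_map`: H x W x C;
    `S`: side length of the object semantic map (assumed square).
    """
    C = local_map.shape[-1]
    semantic_local = np.zeros((S, S, C), dtype=local_map.dtype)
    fg = semantic_map[..., 0] >= p_threshold        # foreground first feature points
    ys, xs = np.nonzero(fg)
    for y, x in zip(ys, xs):
        u, v = semantic_map[y, x, 1], semantic_map[y, x, 2]
        # Semantic coordinates index the target position point in the object semantic map.
        su, sv = int(u * (S - 1)), int(v * (S - 1))
        semantic_local[sv, su] = local_map[y, x]    # update with the local feature value
    return semantic_local
```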
Regarding step 103:
When generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector and the pre-trained second neural network, the method shown in FIG. 2 may be followed, which includes the following steps:
Step 201: generating a global feature map based on the global feature vector.
Illustratively, the global feature vector may be replicated multiple times and the replicated global feature vectors concatenated; the concatenated global feature vectors constitute the global feature map, and the size of the global feature map is the same as the size of the local feature map.
Illustratively, if the size of the local feature map is 64×64 and the global feature vector is a 1×1, 128-dimensional vector, the global feature vector may be replicated 64×64 times and the copies concatenated according to the size of the local feature map, yielding a 64×64×128 feature tensor; this tensor is the global feature map.
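A short PyTorch sketch of this tiling step, assuming the 64×64 size and the 128-dimensional vector of the example above:

```python
import torch

# A (batch, 128) global vector replicated over a 64 x 64 spatial grid.
global_vec = torch.randn(1, 128)
global_map = global_vec[:, :, None, None].expand(-1, -1, 64, 64)  # (1, 128, 64, 64)
```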
Step 202: fusing the semantic local feature map, the global feature map and a pre-generated reference position map of a reference object in the semantic space to obtain a fused feature map.
The reference position map is a three-channel position map in which the channel values of each position point represent the three-dimensional position coordinates of that position point; every position point of the reference position map thus carries two kinds of coordinates, namely its semantic coordinates in the semantic space and its corresponding three-dimensional position coordinates.
When fusing the semantic local feature map, the global feature map and the pre-generated reference position map of the reference object in the semantic space, illustratively, any two of the three maps may first be concatenated, and the concatenated result is then concatenated with the remaining map. For example, the semantic local feature map and the global feature map may first be concatenated to obtain a first feature map, and the first feature map is then concatenated with the reference position map to obtain the fused feature map.
Here, the semantic local feature map, the global feature map and the reference position map have the same size, so the concatenation joins the channel values of feature points at the same position. For example, at a position N, if the semantic local feature map has a feature point A with channel dimensionality x, the global feature map has a feature point B with channel dimensionality y, and the reference position map has a feature point C with channel dimensionality z, then after the three maps are concatenated, the feature point at position N of the fused feature map has channel dimensionality x+y+z; and since all feature points of one feature map share the same number of channels, the fused feature map also has x+y+z channels.
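The channel arithmetic can be illustrated directly; the shapes below follow the examples above, and x = 128, y = 128, z = 3 are assumptions:

```python
import torch

semantic_local = torch.randn(1, 128, 64, 64)   # x = 128 channels
global_map     = torch.randn(1, 128, 64, 64)   # y = 128 channels
reference_pos  = torch.randn(1, 3, 64, 64)     # z = 3 channels
# Channel-wise concatenation: the fused map has x + y + z = 259 channels.
fused = torch.cat([semantic_local, global_map, reference_pos], dim=1)
```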
In the above procedure, fusing the semantic local feature map with the global feature map combines local features and global features in the process of generating the three-dimensional model, which improves the accuracy of the generated three-dimensional model in local details; further fusing with the reference position map supplies the reference position map as prior information and prevents the generated position map from deviating too far.
Step 203: inputting the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
After the fused feature map has been input into the second neural network, the second neural network may first down-sample the fused feature map and then up-sample it, outputting the position map corresponding to the image to be processed.
Illustratively, the position map corresponding to the image to be processed is also a three-channel image whose channel values respectively represent values in different coordinate directions of the real-world coordinate system; the values of each position point in these coordinate directions serve as the three-dimensional position coordinates of that position point.
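A hypothetical sketch of such a second network with one down-sampling and one up-sampling stage; only the three-channel output reflects the description above, and the layer widths (including the 259 input channels from the fusion example) are assumptions:

```python
import torch.nn as nn

second_network = nn.Sequential(
    nn.Conv2d(259, 128, 3, stride=2, padding=1), nn.ReLU(),          # down-sampling
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # up-sampling
    nn.Conv2d(64, 3, 1),   # three channels: x, y, z of the position map
)
```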
The training process of the second neural network is introduced below and is not expanded here.
Regarding step 104:
In a specific implementation, when generating the three-dimensional model of the target object based on the position map, the first position points in the position map may first be sampled based on their semantic coordinates to obtain sampling points, and the three-dimensional model corresponding to the target object is then generated based on the three-dimensional coordinate information corresponding to the sampling points.
Specifically, when sampling the first position points in the position map based on their semantic coordinates, the first position points whose semantic coordinates are identical to preset reference semantic coordinates of reference sampling points may be screened out from the position map, and the screened-out first position points are taken as the sampling points.
Illustratively, it may be preset that the left hand of a person is represented by 300 sampling points, which serve as the reference sampling points, each corresponding to its own reference semantic coordinates; the first position points whose semantic coordinates are identical to the reference semantic coordinates of the respective reference sampling points are then screened out from the position map and taken as the sampling points.
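A minimal sketch of this screening step, assuming flattened arrays and exact coordinate matching (np.isclose is used only to make the equality test numerically robust); all names are hypothetical:

```python
import numpy as np

def sample_points(pos_uv, pos_xyz, reference_uv):
    """pos_uv: (H*W, 2) semantic coords; pos_xyz: (H*W, 3); reference_uv: (K, 2)."""
    samples = []
    for ref in reference_uv:
        hits = np.all(np.isclose(pos_uv, ref), axis=1)  # coords equal to the reference
        if hits.any():
            samples.append(pos_xyz[hits][0])            # keep the matching 3D coordinate
    return np.asarray(samples)
```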
When generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to the sampling points, the three-dimensional position coordinates corresponding to the sampling points may be taken as the three-dimensional position coordinates of the vertices of three-dimensional meshes, and the three-dimensional model comprising the three-dimensional meshes is generated.
In a specific implementation, after the three-dimensional position coordinates of the mesh vertices have been determined, the three-dimensional model comprising the meshes can be generated by rendering; after the three-dimensional model has been generated, it can be displayed on a client.
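A brief sketch of this assembly step; the fixed template face list is an assumption, since the description above only states that the sampled points serve as mesh vertices and that the model is then rendered:

```python
import numpy as np

# Sampled 3D coordinates become the mesh vertices; faces here come from a
# fixed template topology (illustrative only).
vertices = np.random.rand(300, 3)          # e.g. 300 sampled points
faces = np.array([[0, 1, 2], [1, 2, 3]])   # template triangle indices (illustrative)
# A mesh library could then build and render the model, e.g.:
# mesh = trimesh.Trimesh(vertices=vertices, faces=faces)
```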
In one possible application scenario, the above method can be used to generate a three-dimensional model of a human body, from which human action recognition can then be performed; the recognition results enable interaction between a user and a machine. In another possible application scenario, an image to be processed of a user can be acquired in real time, the three-dimensional model corresponding to the user generated in real time and then displayed, thereby increasing interaction with the user and making the experience more engaging.
Below, taking the generation of a three-dimensional model of a human body as an example, the above generation process is introduced. Referring to FIG. 3, which is a schematic diagram of a process of generating a three-dimensional human body model provided by an embodiment of the present disclosure: first, an RGB image containing a human body is input into the first neural network; the first neural network down-samples the RGB image to obtain an intermediate feature map, up-samples the intermediate feature map to obtain a local feature map, feeds the intermediate feature map into a pooling layer and then a fully connected layer to obtain a global feature vector, and performs feature extraction on the local feature map to obtain a semantic feature map.
Then, based on the semantic feature map, the local feature map is converted into the semantic space to obtain a semantic local feature map; a global feature map is generated based on the global feature vector; the global feature map, the semantic local feature map and a reference position map generated from a reference human body are concatenated; the concatenated feature map is input into the second neural network, which outputs the predicted position map corresponding to the RGB image; and the three-dimensional human body model is then generated based on the predicted position map.
In the method provided by the present disclosure, since the three-dimensional model is generated by predicting a position map that contains the three-dimensional position coordinates of each first position point of the target object, the three-dimensional model corresponding to the target object can be generated from the predicted position map once it has been predicted; the method is therefore not constrained by the expressive ability of a preset three-dimensional model, and the generated three-dimensional model is more accurate.
The training processes of the first neural network and the second neural network are introduced below.
When training the first neural network and the second neural network, the first neural network may first be preliminarily trained, and the first neural network and the second neural network are then trained jointly on the basis of the preliminarily trained first neural network.
Referring to FIG. 4, which shows a preliminary training method for the first neural network provided by an embodiment of the present disclosure, the method includes the following steps:
Step 401: acquiring a sample image and a reference semantic feature map corresponding to the sample image.
Step 402: inputting the sample image into the first neural network and outputting a predicted semantic feature map.
Step 403: determining a first loss value of the current training round based on the predicted semantic feature map and the reference semantic feature map.
Step 404: judging whether the first loss value is smaller than a first preset value.
If yes, step 405 is performed;
if no, the model parameters of the current training round are adjusted and the process returns to step 402.
Step 405: determining that the first neural network used in the current training round is the preliminarily trained first neural network.
After the preliminary training process shown in FIG. 4 has been performed, the first neural network and the second neural network may be trained jointly; specifically, the neural network training method shown in FIG. 5 may be followed, which includes the following steps:
Step 501: acquiring a sample image, a reference semantic feature map corresponding to the sample image, and a sample position map corresponding to the sample image.
Step 502: inputting the sample image into the first neural network and outputting a global feature vector, a local feature map and a predicted semantic feature map.
Step 503: converting, based on the first feature points in the predicted semantic feature map, the second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points, the semantic feature points constituting a semantic local feature map.
Step 504: generating a predicted position map corresponding to the sample image based on the semantic local feature map, the global feature vector and the second neural network.
Step 505: determining a second loss value of the current training round based on the predicted semantic feature map, the reference semantic feature map, the predicted position map and the sample position map.
Specifically, when computing the second loss value, a first predicted loss may be determined based on the predicted semantic feature map and the reference semantic feature map, a second predicted loss may be determined based on the predicted position map and the sample position map, and the sum of the first predicted loss and the second predicted loss is taken as the second loss value.
In another possible implementation, a three-dimensional human body model may further be generated based on the predicted position map and projected according to the shooting angle of the sample image to obtain a projected image; a third predicted loss is then determined based on the projected image and the sample image, and the weighted sum of the first predicted loss, the second predicted loss and the third predicted loss is taken as the second loss value.
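A hedged sketch of this loss combination; the use of L1 distance and the weight values are assumptions, and only the structure (a weighted sum of up to three predicted losses) follows the description above:

```python
import torch.nn.functional as F

def second_loss(pred_semantic, ref_semantic, pred_pos, ref_pos,
                projection=None, sample_image=None, w=(1.0, 1.0, 1.0)):
    loss1 = F.l1_loss(pred_semantic, ref_semantic)   # first predicted loss
    loss2 = F.l1_loss(pred_pos, ref_pos)             # second predicted loss
    total = w[0] * loss1 + w[1] * loss2
    if projection is not None:                       # optional third predicted loss
        total = total + w[2] * F.l1_loss(projection, sample_image)
    return total
```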
Step 506: judging whether the second loss value is smaller than a second preset value.
If yes, step 507 is performed;
if no, the model parameters of the current training round are adjusted and the process returns to step 502.
Step 507: determining that the first neural network used in the current training round is the trained first neural network, and determining that the second neural network used in the current training round is the trained second neural network.
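Both training procedures in FIGs. 4 and 5 share the loop structure "compute the loss value, compare it with a preset value, adjust the model parameters and repeat". A generic sketch, in which model, loader and compute_loss are hypothetical placeholders:

```python
import torch

def train_until_threshold(model, loader, compute_loss, threshold=1e-3, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    while True:
        for batch in loader:
            loss = compute_loss(model, batch)
            if loss.item() < threshold:     # step 404 / 506: compare with preset value
                return model                # step 405 / 507: training complete
            optimizer.zero_grad()
            loss.backward()                 # adjust model parameters and repeat
            optimizer.step()
```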
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, an embodiment of the present disclosure further provides a three-dimensional model generation apparatus corresponding to the three-dimensional model generation method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above three-dimensional model generation method, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to FIG. 6, which is a schematic architecture diagram of a three-dimensional model generation apparatus provided by an embodiment of the present disclosure, the apparatus includes a determination module 601, a conversion module 602, a first generation module 603 and a second generation module 604, where:
the determination module 601 is configured to determine, based on a pre-trained first neural network, a global feature vector, a local feature map and a semantic feature map of an image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of the first feature point in a semantic space;
the conversion module 602 is configured to convert, based on the feature values of the first feature points in the semantic feature map, the second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points, the semantic feature points constituting a semantic local feature map;
the first generation module 603 is configured to generate, based on the semantic local feature map, the global feature vector and a pre-trained second neural network, a position map corresponding to the image to be processed, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of a target object in the image to be processed;
the second generation module 604 is configured to generate, based on the position map, a three-dimensional model corresponding to the target object.
In a possible implementation, the semantic coordinates of the first feature point include coordinate values of the first feature point in at least one coordinate direction of the semantic space;
and the feature value of the first feature point further includes the probability that the semantics of the first feature point is the target object.
In a possible implementation, when determining the global feature vector, the local feature map and the semantic feature map of the image to be processed based on the pre-trained first neural network, the determination module 601 is configured to:
down-sample the image to be processed to obtain an intermediate feature map;
determine the global feature vector and the local feature map based on the intermediate feature map; and
perform feature extraction on the local feature map to obtain the semantic feature map.
In a possible implementation, when determining the global feature vector and the local feature map based on the intermediate feature map, the determination module 601 is configured to:
perform pooling and full-connection processing on the intermediate feature map to obtain the global feature vector corresponding to the image to be processed; and up-sample the intermediate feature map to obtain the local feature map corresponding to the image to be processed.
In a possible implementation, when converting, based on the feature values of the first feature points in the semantic feature map, the second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points constituting a semantic local feature map, the conversion module 602 is configured to:
determine, based on the semantic coordinates corresponding to a first feature point in the semantic feature map, a target position point of the first feature point in a pre-generated object semantic map, where the object semantic map includes a plurality of second position points of a preset three-dimensional object and the semantic coordinates of the plurality of second position points, and the second position points include the first position points;
update the feature value of the target position point in the object semantic map to the feature value at the corresponding position of the target position point in the local feature map, to obtain the semantic local feature map.
In a possible implementation, when generating the position map corresponding to the image to be processed based on the semantic local feature map, the global feature vector and the pre-trained second neural network, the first generation module 603 is configured to:
generate a global feature map based on the global feature vector;
fuse the semantic local feature map, the global feature map and a pre-generated reference position map of a reference object in the semantic space to obtain a fused feature map;
input the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
In a possible implementation, when generating the global feature map based on the global feature vector, the first generation module 603 is configured to:
replicate the global feature vector multiple times and concatenate the replicated global feature vectors, the concatenated feature vectors constituting the global feature map, where the size of the global feature map is the same as the size of the local feature map.
In a possible implementation, when generating the three-dimensional model corresponding to the target object based on the position map, the second generation module 604 is configured to:
sample the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points;
generate the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to the sampling points.
In a possible implementation, when sampling the first position points in the position map based on their semantic coordinates to obtain the sampling points, the second generation module 604 is configured to:
screen out from the position map, according to preset reference semantic coordinates of reference sampling points, the first position points whose semantic coordinates are identical to the reference semantic coordinates, and take the screened-out first position points as the sampling points.
In a possible implementation, when generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates corresponding to the sampling points, the second generation module 604 is configured to:
take the three-dimensional position coordinates corresponding to the sampling points as the three-dimensional position coordinates of vertices of three-dimensional meshes;
generate, based on the three-dimensional position coordinates of the vertices of the three-dimensional meshes, the three-dimensional model comprising the three-dimensional meshes.
For the processing flows of the modules in the apparatus and the interaction flows between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.
Based on the same technical concept, an embodiment of the present disclosure further provides a computer device. Referring to FIG. 7, which is a schematic structural diagram of a computer device 700 provided by an embodiment of the present disclosure, the device includes a processor 701, a memory 702 and a bus 703. The memory 702 is configured to store execution instructions and includes an internal memory 7021 and an external memory 7022; the internal memory 7021 temporarily stores operation data of the processor 701 and data exchanged with the external memory 7022, such as a hard disk, and the processor 701 exchanges data with the external memory 7022 through the internal memory 7021. When the computer device 700 runs, the processor 701 communicates with the memory 702 through the bus 703, causing the processor 701 to execute the following instructions:
determining, based on a pre-trained first neural network, a global feature vector, a local feature map and a semantic feature map of an image to be processed, where the feature value of any first feature point in the semantic feature map includes the semantic coordinates of the first feature point in a semantic space;
converting, based on the feature values of the first feature points in the semantic feature map, the second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points, the semantic feature points constituting a semantic local feature map;
generating, based on the semantic local feature map, the global feature vector and a pre-trained second neural network, a position map corresponding to the image to be processed, where the position map includes the semantic coordinates and three-dimensional position coordinates of each first position point of a target object in the image to be processed;
generating, based on the position map, a three-dimensional model corresponding to the target object.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program; when the computer program is run by a processor, the steps of the three-dimensional model generation method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the three-dimensional model generation method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to perform the steps of the three-dimensional model generation method described in the above method embodiments, to which reference may be made; details are not repeated here.
The above computer program product may be implemented by hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, for example a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field can still, within the technical scope disclosed herein, modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some of the technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (13)
- A three-dimensional model generation method, comprising: determining, based on a pre-trained first neural network, a global feature vector, a local feature map and a semantic feature map of an image to be processed, wherein the feature value of any first feature point in the semantic feature map comprises semantic coordinates of the first feature point in a semantic space; converting, based on the feature values of the first feature points in the semantic feature map, second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points, the semantic feature points constituting a semantic local feature map; generating, based on the semantic local feature map, the global feature vector and a pre-trained second neural network, a position map corresponding to the image to be processed, wherein the position map comprises the semantic coordinates and three-dimensional position coordinates of each first position point of a target object in the image to be processed; and generating, based on the position map, a three-dimensional model corresponding to the target object.
- The method according to claim 1, wherein the semantic coordinates of the first feature point comprise coordinate values of the first feature point in at least one coordinate direction of the semantic space; and the feature value of the first feature point further comprises the probability that the semantics of the first feature point is the target object.
- The method according to claim 2, wherein the determining, based on the pre-trained first neural network, of the global feature vector, the local feature map and the semantic feature map of the image to be processed comprises: down-sampling the image to be processed to obtain an intermediate feature map; determining the global feature vector and the local feature map based on the intermediate feature map; and performing feature extraction on the local feature map to obtain the semantic feature map.
- The method according to claim 3, wherein the determining of the global feature vector and the local feature map based on the intermediate feature map comprises: performing pooling and full-connection processing on the intermediate feature map to obtain the global feature vector corresponding to the image to be processed; and up-sampling the intermediate feature map to obtain the local feature map corresponding to the image to be processed.
- The method according to any one of claims 1 to 4, wherein the converting, based on the feature values of the first feature points in the semantic feature map, of the second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points constituting a semantic local feature map comprises: determining, based on the semantic coordinates corresponding to a first feature point in the semantic feature map, a target position point of the first feature point in a pre-generated object semantic map, wherein the object semantic map comprises a plurality of second position points of a preset three-dimensional object and the semantic coordinates of the plurality of second position points, and the second position points comprise the first position points; and updating the feature value of the target position point in the object semantic map to the feature value at the corresponding position of the target position point in the local feature map, to obtain the semantic local feature map.
- The method according to any one of claims 1 to 5, wherein the generating, based on the semantic local feature map, the global feature vector and the pre-trained second neural network, of the position map corresponding to the image to be processed comprises: generating a global feature map based on the global feature vector; fusing the semantic local feature map, the global feature map and a pre-generated reference position map of a reference object in the semantic space to obtain a fused feature map; and inputting the fused feature map into the second neural network to obtain the position map corresponding to the image to be processed.
- The method according to claim 6, wherein the generating of the global feature map based on the global feature vector comprises: replicating the global feature vector multiple times and concatenating the replicated global feature vectors, the concatenated feature vectors constituting the global feature map, wherein the size of the global feature map is the same as the size of the local feature map.
- The method according to any one of claims 1 to 7, wherein the generating, based on the position map, of the three-dimensional model corresponding to the target object comprises: sampling the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points; and generating the three-dimensional model corresponding to the target object based on the three-dimensional position coordinates of the sampling points.
- The method according to claim 8, wherein the sampling of the first position points in the position map based on the semantic coordinates of the first position points in the position map to obtain sampling points comprises: screening out from the position map, according to preset reference semantic coordinates of reference sampling points, the first position points whose semantic coordinates are identical to the reference semantic coordinates, and taking the screened-out first position points as the sampling points.
- The method according to claim 8, wherein the generating, based on the three-dimensional position coordinates corresponding to the sampling points, of the three-dimensional model corresponding to the target object comprises: taking the three-dimensional position coordinates corresponding to the sampling points as the three-dimensional position coordinates of vertices of three-dimensional meshes; and generating, based on the three-dimensional position coordinates of the vertices of the three-dimensional meshes, the three-dimensional model comprising the three-dimensional meshes.
- A three-dimensional model generation apparatus, comprising: a determination module, configured to determine, based on a pre-trained first neural network, a global feature vector, a local feature map and a semantic feature map of an image to be processed, wherein the feature value of any first feature point in the semantic feature map comprises semantic coordinates of the first feature point in a semantic space; a conversion module, configured to convert, based on the feature values of the first feature points in the semantic feature map, second feature points in the local feature map that correspond to the first feature points into the semantic space to form semantic feature points, the semantic feature points constituting a semantic local feature map; a first generation module, configured to generate, based on the semantic local feature map, the global feature vector and a pre-trained second neural network, a position map corresponding to the image to be processed, wherein the position map comprises the semantic coordinates and three-dimensional position coordinates of each position point of a target object in the image to be processed; and a second generation module, configured to generate, based on the position map, a three-dimensional model corresponding to the target object.
- A computer device, comprising a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the three-dimensional model generation method according to any one of claims 1 to 10.
- A computer-readable storage medium storing a computer program, wherein the computer program, when run by a processor, performs the steps of the three-dimensional model generation method according to any one of claims 1 to 10.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010418882.9 | 2020-05-18 | ||
CN202010418882.9A CN111598111B (zh) | 2020-05-18 | 2020-05-18 | 三维模型生成方法、装置、计算机设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021232941A1 true WO2021232941A1 (zh) | 2021-11-25 |
Family
ID=72182921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/083268 WO2021232941A1 (zh) | 2020-05-18 | 2021-03-26 | 三维模型生成方法、装置、计算机设备及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111598111B (zh) |
WO (1) | WO2021232941A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114565815A (zh) * | 2022-02-25 | 2022-05-31 | 包头市迪迦科技有限公司 | 一种基于三维模型的视频智能融合方法及系统 |
CN115409819A (zh) * | 2022-09-05 | 2022-11-29 | 青岛埃米博创医疗科技有限公司 | 一种肝部图像重建方法以及重建系统 |
CN117473105A (zh) * | 2023-12-28 | 2024-01-30 | 浪潮电子信息产业股份有限公司 | 基于多模态预训练模型的三维内容生成方法及相关组件 |
CN118154713A (zh) * | 2024-03-18 | 2024-06-07 | 北京数原数字化城市研究中心 | 场景渲染方法、装置、电子设备、存储介质及程序产品 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111598111B (zh) * | 2020-05-18 | 2024-01-05 | 商汤集团有限公司 | 三维模型生成方法、装置、计算机设备及存储介质 |
CN112102477B (zh) * | 2020-09-15 | 2024-09-27 | 腾讯科技(深圳)有限公司 | 三维模型重建方法、装置、计算机设备和存储介质 |
CN113538639B (zh) * | 2021-07-02 | 2024-05-21 | 北京达佳互联信息技术有限公司 | 一种图像处理方法、装置、电子设备及存储介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140098094A1 (en) * | 2012-10-05 | 2014-04-10 | Ulrich Neumann | Three-dimensional point processing and model generation |
CN109978989A (zh) * | 2019-02-26 | 2019-07-05 | 腾讯科技(深圳)有限公司 | 三维人脸模型生成方法、装置、计算机设备及存储介质 |
CN110827342A (zh) * | 2019-10-21 | 2020-02-21 | 中国科学院自动化研究所 | 三维人体模型重建方法及存储设备、控制设备 |
CN111598111A (zh) * | 2020-05-18 | 2020-08-28 | 商汤集团有限公司 | 三维模型生成方法、装置、计算机设备及存储介质 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7289662B2 (en) * | 2002-12-07 | 2007-10-30 | Hrl Laboratories, Llc | Method and apparatus for apparatus for generating three-dimensional models from uncalibrated views |
CN104217454B (zh) * | 2014-08-21 | 2017-11-03 | 中国科学院计算技术研究所 | 一种视频驱动的人脸动画生成方法 |
CN110288695B (zh) * | 2019-06-13 | 2021-05-28 | 电子科技大学 | 基于深度学习的单帧图像三维模型表面重建方法 |
- 2020-05-18 CN: application CN202010418882.9A, published as CN111598111B (status: Active)
- 2021-03-26 WO: application PCT/CN2021/083268, published as WO2021232941A1 (status: Application Filing)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114565815A (zh) * | 2022-02-25 | 2022-05-31 | 包头市迪迦科技有限公司 | 一种基于三维模型的视频智能融合方法及系统 |
CN114565815B (zh) * | 2022-02-25 | 2023-11-03 | 包头市迪迦科技有限公司 | 一种基于三维模型的视频智能融合方法及系统 |
CN115409819A (zh) * | 2022-09-05 | 2022-11-29 | 青岛埃米博创医疗科技有限公司 | 一种肝部图像重建方法以及重建系统 |
CN115409819B (zh) * | 2022-09-05 | 2024-03-29 | 苏州埃米迈德医疗科技有限公司 | 一种肝部图像重建方法以及重建系统 |
CN117473105A (zh) * | 2023-12-28 | 2024-01-30 | 浪潮电子信息产业股份有限公司 | 基于多模态预训练模型的三维内容生成方法及相关组件 |
CN117473105B (zh) * | 2023-12-28 | 2024-04-05 | 浪潮电子信息产业股份有限公司 | 基于多模态预训练模型的三维内容生成方法及相关组件 |
CN118154713A (zh) * | 2024-03-18 | 2024-06-07 | 北京数原数字化城市研究中心 | 场景渲染方法、装置、电子设备、存储介质及程序产品 |
Also Published As
Publication number | Publication date |
---|---|
CN111598111A (zh) | 2020-08-28 |
CN111598111B (zh) | 2024-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021232941A1 (zh) | 三维模型生成方法、装置、计算机设备及存储介质 | |
JP7040278B2 (ja) | 顔認識のための画像処理装置の訓練方法及び訓練装置 | |
KR102663519B1 (ko) | 교차 도메인 이미지 변환 기법 | |
JP7475772B2 (ja) | 画像生成方法、画像生成装置、コンピュータ機器、及びコンピュータプログラム | |
KR102287407B1 (ko) | 이미지 생성을 위한 학습 장치 및 방법과 이미지 생성 장치 및 방법 | |
KR20240002898A (ko) | 3d 얼굴 재구성 모델 훈련 방법과 장치 및 3d 얼굴 형상 생성 방법과 장치 | |
CN109376698B (zh) | 人脸建模方法和装置、电子设备、存储介质、产品 | |
CN110619334B (zh) | 基于深度学习的人像分割方法、架构及相关装置 | |
WO2024114321A1 (zh) | 图像数据处理方法、装置、计算机设备、计算机可读存储介质及计算机程序产品 | |
WO2021098554A1 (zh) | 一种特征提取方法、装置、设备及存储介质 | |
CN115239888B (zh) | 用于重建三维人脸图像的方法、装置、电子设备和介质 | |
US20230237777A1 (en) | Information processing apparatus, learning apparatus, image recognition apparatus, information processing method, learning method, image recognition method, and non-transitory-computer-readable storage medium | |
CN115457197A (zh) | 基于素描草图的人脸三维重建模型训练方法、重建方法及装置 | |
CN114127785A (zh) | 点云补全方法、网络训练方法、装置、设备及存储介质 | |
CN117099136A (zh) | 用于对象检测的动态头 | |
KR20230071052A (ko) | 이미지 처리 방법 및 장치 | |
KR20200093975A (ko) | 기하학적 모멘트 매칭을 통한 구 위에서의 적대적 생성망을 이용하는 데이터 처리 장치 및 방법 | |
CN117372604A (zh) | 一种3d人脸模型生成方法、装置、设备及可读存储介质 | |
WO2022096944A1 (en) | Method and apparatus for point cloud completion, network training method and apparatus, device, and storage medium | |
US11783501B2 (en) | Method and apparatus for determining image depth information, electronic device, and media | |
EP4086853A2 (en) | Method and apparatus for generating object model, electronic device and storage medium | |
CN116843832A (zh) | 一种单视角三维物体重建方法、装置、设备及存储介质 | |
CN116977544A (zh) | 图像处理方法、装置、设备及存储介质 | |
EP3929866A2 (en) | Inpainting method and apparatus for human image, and electronic device | |
CN113223128B (zh) | 用于生成图像的方法和装置 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21808907; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 21808907; Country of ref document: EP; Kind code of ref document: A1
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.05.2023)