CN114386503A - Method and apparatus for training a model

Method and apparatus for training a model

Info

Publication number
CN114386503A
CN114386503A
Authority
CN
China
Prior art keywords
sample data
classification model
training
initial classification
model
Prior art date
Legal status
Pending
Application number
CN202210024220.2A
Other languages
Chinese (zh)
Inventor
詹忆冰
梁亚倩
Current Assignee
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd
Priority to CN202210024220.2A
Publication of CN114386503A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and apparatus for training a model, relating to the field of computer technology. The method comprises: acquiring sample data and an initial classification model, and training the initial classification model through multiple rounds of an iterative operation; the iterative operation comprises: obtaining a representation of the sample data, and determining features of the sample data based on the representation; training the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss value output by the trained initial classification model, generating feedback information according to the contrastive loss value, and updating the initial classification model based on the feedback information before the next round of the iterative operation; and in response to determining that the initial classification model reaches a preset convergence condition, determining the initial classification model from the last round of the iterative operation as the target classification model. Classifying three-dimensional model data with a model trained by this method improves the accuracy of the classification.

Description

Method and apparatus for training a model
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a method and apparatus for training a model.
Background
A three-dimensional model defines the shape of an object by combining spatial points, edges, and surfaces; its flexible geometric structure can describe the shape characteristics of an object efficiently and accurately. Existing methods for classifying objects based on their three-dimensional models mainly include: converting the three-dimensional model into a two-dimensional image and classifying it based on recognition of the two-dimensional image; acquiring point cloud data of the three-dimensional model and classifying the object indicated by the three-dimensional model with a neural network model that takes the vertex coordinates in the point cloud data as input; and taking geometric features of the three-dimensional model as the input of a neural network model so as to classify the object indicated by the three-dimensional model.
However, existing methods for classifying an object based on its three-dimensional model suffer from inaccurate classification.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a computer-readable storage medium for training a model.
According to a first aspect of the present disclosure, there is provided a method for training a model, comprising: acquiring sample data and an initial classification model, and training the initial classification model through multiple rounds of an iterative operation; the iterative operation comprising: obtaining a representation of the sample data, and determining features of the sample data based on the representation; training the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss value output by the trained initial classification model, generating feedback information according to the contrastive loss value, and updating the initial classification model based on the feedback information before the next round of the iterative operation; and in response to determining that the initial classification model reaches a preset convergence condition, determining the initial classification model from the last round of the iterative operation as the target classification model.
In some embodiments, the sample data comprises three-dimensional model data, and obtaining a representation of the sample data comprises: acquiring a preset network comprising a spatial feature descriptor and a structural feature descriptor; inputting the sample data into the preset network to obtain the spatial features output by the spatial feature descriptor and the structural features output by the structural feature descriptor; inputting the spatial features into a preset convolutional network to obtain a first output result from each convolutional layer; inputting the structural features into the preset convolutional network to obtain a second output result from each convolutional layer; for each convolutional layer, aggregating its first and second output results and inputting the aggregated result into an attention network to obtain the output result corresponding to that convolutional layer; and aggregating the output results corresponding to all convolutional layers to obtain the representation of the sample data.
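The aggregation pipeline described above (per-layer fusion of spatial and structural outputs, then attention, then cross-layer aggregation) can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the attention step, the pooling choices, and all shapes are assumptions for the sake of a runnable example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_aggregate(spatial_outs, structural_outs):
    """For each convolutional layer, aggregate (here: concatenate) the
    spatial and structural outputs, run a toy self-attention step over
    the result, and finally aggregate (here: sum) across layers."""
    per_layer = []
    for s, t in zip(spatial_outs, structural_outs):
        x = np.concatenate([s, t], axis=-1)        # fuse the two outputs
        attn = softmax(x @ x.T)                    # toy attention weights
        per_layer.append((attn @ x).mean(axis=0))  # attended, then pooled
    return np.sum(per_layer, axis=0)               # cross-layer aggregation

# three conv layers; 5 mesh elements; 4-dim spatial/structural outputs each
rng = np.random.default_rng(0)
spatial = [rng.normal(size=(5, 4)) for _ in range(3)]
structural = [rng.normal(size=(5, 4)) for _ in range(3)]
rep = attention_aggregate(spatial, structural)
print(rep.shape)  # (8,)
```

The result is a single fixed-size vector per sample, which is what the later contrastive-training steps consume.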
In some embodiments, training the initial classification model based on contrastive learning using the features of the sample data comprises: training the initial classification model based on unsupervised contrastive learning using the features of the sample data.
In some embodiments, the method for training a model further comprises: obtaining labels of the sample data; and training the initial classification model based on contrastive learning using the features of the sample data comprises: training the initial classification model based on supervised contrastive learning using the features of the sample data and the labels of the sample data.
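A supervised contrastive loss of the kind referred to here treats samples sharing a label as positives and all other samples as negatives. The patent does not specify the exact loss, so the sketch below follows the commonly used SupCon formulation (Khosla et al., 2020) as one plausible instantiation; the temperature `tau` is an assumed hyperparameter.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, tau=0.1):
    """SupCon-style loss: for each anchor, samples with the same label
    are positives; the softmax denominator runs over all other samples."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau                      # cosine similarities / temperature
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue                         # anchors without positives are skipped
        others = [j for j in range(n) if j != i]
        denom = np.sum(np.exp(sim[i, others]))
        for j in pos:
            loss += -np.log(np.exp(sim[i, j]) / denom)
            count += 1
    return loss / count

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = [0, 0, 1, 1]
print(supervised_contrastive_loss(feats, labels))
```

When labels agree with the feature clusters, as above, the loss is small; shuffling the labels raises it, which is the signal the feedback information is built from.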
In some embodiments, the iterative operation further comprises: training the initial classification model using the representation of the sample data and the labels of the sample data to obtain a cross-entropy loss value output by the trained initial classification model; and generating feedback information according to the contrastive loss value comprises: generating the feedback information according to both the contrastive loss value and the cross-entropy loss value.
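One simple way to combine the two loss values into a single feedback quantity is a weighted sum. The weight `alpha` below is a hypothetical hyperparameter; the patent states that feedback is generated from both values but not how they are combined.

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single sample."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[label])

def feedback_value(contrastive_loss, logits, label, alpha=0.5):
    """Combine the contrastive loss value with the cross-entropy loss value.
    `alpha` is an assumed balancing weight, not taken from the patent."""
    return alpha * contrastive_loss + (1 - alpha) * cross_entropy(logits, label)

print(feedback_value(0.8, np.array([2.0, -1.0]), 0))  # ≈ 0.42
```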
According to a second aspect of the present disclosure, there is provided a method for classifying a three-dimensional model, comprising: acquiring three-dimensional model data to be classified; and determining the category of the three-dimensional model data to be classified using a target classification model, wherein the target classification model is trained by the method described in the first aspect.
According to a third aspect of the present disclosure, there is provided an apparatus for training a model, comprising: an acquisition unit configured to acquire sample data and an initial classification model, the initial classification model being trained by a training unit through multiple rounds of an iterative operation; the training unit comprising: an acquisition module configured to obtain a representation of the sample data and determine features of the sample data based on the representation; a first updating module configured to train the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss value output by the trained initial classification model, generate feedback information according to the contrastive loss value, and update the initial classification model based on the feedback information before the next round of the iterative operation; and a determination module configured to determine the initial classification model from the last round of the iterative operation as the target classification model in response to determining that the initial classification model reaches a preset convergence condition.
In some embodiments, the sample data comprises three-dimensional model data, and the acquisition module comprises: a network acquisition module configured to acquire a preset network comprising a spatial feature descriptor and a structural feature descriptor; a feature extraction module configured to input the sample data into the preset network and obtain the spatial features output by the spatial feature descriptor and the structural features output by the structural feature descriptor; a first convolution module configured to input the spatial features into a preset convolutional network and obtain a first output result from each convolutional layer; a second convolution module configured to input the structural features into the preset convolutional network and obtain a second output result from each convolutional layer; a first aggregation module configured to, for each convolutional layer, aggregate its first and second output results and input the aggregated result into an attention network to obtain the output result corresponding to that convolutional layer; and a second aggregation module configured to aggregate the output results corresponding to all convolutional layers to obtain the representation of the sample data.
In some embodiments, the first updating module comprises: a first updating submodule configured to train the initial classification model based on unsupervised contrastive learning using the features of the sample data.
In some embodiments, the apparatus for training a model further comprises: a label acquisition module configured to obtain labels of the sample data; and the first updating module comprises: a second updating submodule configured to train the initial classification model based on supervised contrastive learning using the features of the sample data and the labels of the sample data.
In some embodiments, the training unit further comprises: a second updating module configured to train the initial classification model using the representation of the sample data and the labels of the sample data to obtain a cross-entropy loss value output by the trained initial classification model; and generating feedback information according to the contrastive loss value comprises: generating the feedback information according to both the contrastive loss value and the cross-entropy loss value.
According to a fourth aspect of the present disclosure, there is provided an apparatus for classifying a three-dimensional model, comprising: a three-dimensional model data acquisition unit configured to acquire three-dimensional model data to be classified; a classification unit configured to determine a class of the three-dimensional model data to be classified using a target classification model, wherein the target classification model is trained based on the apparatus as described in the third aspect.
According to a fifth aspect of the present disclosure, an embodiment of the present disclosure provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for training a model provided in the first aspect or the method for classifying a three-dimensional model provided in the second aspect.
According to a sixth aspect of the present disclosure, embodiments of the present disclosure provide a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method for training a model provided in the first aspect or the method for classifying a three-dimensional model provided in the second aspect.
The method and apparatus for training a model provided by the present disclosure acquire sample data and an initial classification model and train the initial classification model through multiple rounds of an iterative operation; the iterative operation comprises: obtaining a representation of the sample data, and determining features of the sample data based on the representation; training the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss value output by the trained initial classification model, generating feedback information according to the contrastive loss value, and updating the initial classification model based on the feedback information before the next round of the iterative operation; and in response to determining that the initial classification model reaches a preset convergence condition, determining the initial classification model from the last round of the iterative operation as the target classification model. Classifying three-dimensional model data with a target classification model trained by this contrastive method improves the accuracy of classifying three-dimensional model data.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for training a model according to the present application;
FIG. 3 is a flow diagram of another embodiment of a method for training a model according to the present application;
FIG. 4 is a flow diagram of obtaining a characterization of sample data in one application scenario of a method for training a model according to the present application;
FIG. 5(a) is a schematic flow chart of the steps for obtaining a representation of sample data in a usage scenario of the method for training a model according to the present application;
FIG. 5(b) is a schematic flow chart of the steps for obtaining the contrastive loss value in a usage scenario of the method for training a model according to the present application;
FIG. 5(c) is a schematic flow chart of the steps for obtaining the cross-entropy loss value in a usage scenario of the method for training a model according to the present application;
FIG. 6 is a flow diagram of one embodiment of a method for classifying a three-dimensional model according to the present application;
FIG. 7 is a schematic block diagram of one embodiment of an apparatus for training models according to the present application;
FIG. 8 is a schematic diagram illustrating one embodiment of an apparatus for classifying three-dimensional models according to the present application;
FIG. 9 is a block diagram of an electronic device for implementing a method for training a model according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these details are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the present method for training a model or apparatus for training a model may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be user terminal devices on which various client applications may be installed, such as image-like applications, video-like applications, shopping-like applications, chat-like applications, search-like applications, financial-like applications, etc.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting receiving server messages, including but not limited to smartphones, tablets, e-book readers, electronic players, laptop portable computers, desktop computers, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices; when they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may acquire sample data and an initial classification model and train the initial classification model through multiple rounds of an iterative operation; the iterative operation comprises: obtaining a representation of the sample data, and determining features of the sample data based on the representation; training the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss value output by the trained initial classification model, generating feedback information according to the contrastive loss value, and updating the initial classification model based on the feedback information before the next round of the iterative operation; and if it is determined that the initial classification model reaches a preset convergence condition, determining the initial classification model from the last round of the iterative operation as the target classification model.
It should be noted that the method for training the model provided by the embodiment of the present disclosure may be performed by the server 105, and accordingly, the apparatus for training the model may be disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a model according to the present disclosure is shown, comprising the steps of:
step 201, obtaining sample data and an initial classification model, and training the initial classification model through multiple rounds of iterative operations.
In this embodiment, an executing agent of the method for training the model (e.g., the server 105 shown in FIG. 1) may acquire sample data and an initial classification model so as to perform multiple rounds of iterative training on the initial classification model using the sample data. The sample data may include both the acquired data and data obtained by applying data augmentation operations to it.
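As an illustration of data augmentation for three-dimensional sample data, the sketch below applies a random rotation about the z-axis plus small Gaussian vertex jitter. The patent does not name the specific augmentation operations, so these two are illustrative choices only.

```python
import numpy as np

def augment(vertices, rng, sigma=0.01):
    """Augment an (N, 3) vertex array: random z-axis rotation + jitter.
    Both operations are assumed examples, not the patent's operations."""
    theta = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    return vertices @ rot.T + rng.normal(scale=sigma, size=vertices.shape)

rng = np.random.default_rng(42)
verts = rng.normal(size=(100, 3))
aug = augment(verts, rng)
print(aug.shape)  # (100, 3)
```

The augmented copy keeps the object's shape (rotation is rigid, jitter is small) while giving the contrastive objective a distinct view of the same sample.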
Step 202, the iterative operation includes:
step 2021, obtain the characterization of the sample data, and determine the characteristics of the sample data based on the characterization of the sample data.
In this embodiment, the representation of the sample data may be obtained from the sample data, and the features of the sample data may then be extracted from that representation. The representation may be the features of the sample data extracted by a convolutional network, and the features of the sample data may be computed from those convolutional features by a self-attention network using a self-attention mechanism.
Step 2022, train the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss value output by the trained initial classification model, and generate feedback information according to the contrastive loss value, wherein the initial classification model is updated based on the feedback information before the next round of the iterative operation.
In this embodiment, the features of the sample data may be used to train the initial classification model based on contrastive learning to obtain a contrastive loss value output by the trained initial classification model; feedback information is generated according to the contrastive loss value, and the initial classification model is updated with the feedback information before the next round of the iterative operation, thereby implementing the iterative training of the initial classification model.
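The per-iteration update described here can be read as: compute the contrastive loss, derive feedback from it, and apply the update before the next round. A minimal sketch, modelling the feedback information as loss gradients applied in a plain gradient-descent step (the optimizer and learning rate are not specified in the patent and are assumed):

```python
def update_parameters(params, grads, lr=0.01):
    """Apply feedback (here: gradients of the contrastive loss) to the
    model parameters before the next iteration. Plain gradient descent
    is an assumed choice; the patent only says the model is updated
    based on the feedback information."""
    return [p - lr * g for p, g in zip(params, grads)]

params = [1.0, -2.0]
grads = [0.5, -0.5]
print(update_parameters(params, grads))  # ≈ [0.995, -1.995]
```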
And step 203, in response to determining that the initial classification model reaches the preset convergence condition, determining the initial classification model in the last iteration as the target classification model.
In this embodiment, if it is determined that the initial classification model has reached a preset convergence condition, for example the contrastive loss value is smaller than a preset threshold or the number of iterations reaches a count threshold, the initial classification model from the last round of the iterative operation is determined as the target classification model.
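The convergence test just described, loss below a threshold or iteration count at a cap, can be sketched directly; the threshold values below are illustrative, not taken from the patent.

```python
def has_converged(loss_value, iteration, loss_threshold=1e-3, max_iterations=1000):
    """Preset convergence condition: the contrastive loss value falls
    below a threshold, or the iteration count reaches a cap (both
    thresholds here are illustrative defaults)."""
    return loss_value < loss_threshold or iteration >= max_iterations

print(has_converged(5e-4, 10))    # True: loss below threshold
print(has_converged(0.5, 1000))   # True: iteration cap reached
print(has_converged(0.5, 10))     # False: keep iterating
```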
According to the method for training a model, sample data and an initial classification model are acquired, and the initial classification model is trained through multiple rounds of an iterative operation; the iterative operation comprises: obtaining a representation of the sample data, and determining features of the sample data based on the representation; training the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss value output by the trained initial classification model, generating feedback information according to the contrastive loss value, and updating the initial classification model based on the feedback information before the next round of the iterative operation; and in response to determining that the initial classification model reaches a preset convergence condition, determining the initial classification model from the last round of the iterative operation as the target classification model. Classifying three-dimensional model data with the target classification model trained by this contrastive method improves the accuracy of classifying three-dimensional model data.
Optionally, the sample data comprises three-dimensional model data, and obtaining a representation of the sample data comprises: acquiring a preset network comprising a spatial feature descriptor and a structural feature descriptor; inputting the sample data into the preset network to obtain the spatial features output by the spatial feature descriptor and the structural features output by the structural feature descriptor; inputting the spatial features into a preset convolutional network to obtain a first output result from each convolutional layer; inputting the structural features into the preset convolutional network to obtain a second output result from each convolutional layer; for each convolutional layer, aggregating its first and second output results and inputting the aggregated result into an attention network to obtain the output result corresponding to that convolutional layer; and aggregating the output results corresponding to all convolutional layers to obtain the representation of the sample data.
In this embodiment, the sample data comprises three-dimensional model data. A preset network comprising a spatial feature descriptor and a structural feature descriptor may be acquired; after the sample data is input into the preset network, the spatial features of the sample data (i.e., the sample three-dimensional model data) output by the spatial feature descriptor and the structural features output by the structural feature descriptor are obtained. The spatial features are input into a preset convolutional network to obtain a first output result from each convolutional layer, and the structural features are input into the same convolutional network to obtain a second output result from each convolutional layer.
Then, for each convolutional layer in the preset convolutional network, the first and second output results of that layer are aggregated, and the aggregated result is input into the attention network to obtain the attention network's output for that layer. Since the convolutional network comprises a plurality of convolutional layers, a plurality of output results are obtained, each being the attended aggregation of the spatial and structural features output by the corresponding convolutional layer.
Finally, the obtained output results are aggregated, and the aggregated result is determined as the representation of the sample data.
In this embodiment, processing the sample data with a convolutional network and a self-attention network captures the global relationships among the features of the three-dimensional model data, and training the classification model on sample data processed in this way improves the accuracy with which the trained model classifies three-dimensional model data. In addition, a three-dimensional mesh model has a complex geometric structure, so convolutional neural networks cannot be applied to it directly. To overcome this, existing methods design a large number of complex operations to implement convolution and pooling, which makes the neural network more prone to overfitting and unable to withstand attacks on the shape and structure of the three-dimensional mesh model, resulting in poor robustness.
Optionally, the training based on the comparative learning is performed on the initial classification model by using the features of the sample data, and includes: and training the initial classification model based on unsupervised comparative learning by adopting the characteristics of the sample data.
In this embodiment, the feature of the sample data may be adopted to perform unsupervised contrast learning-based training on the initial classification model.
Optionally, the method for training a model further comprises: obtaining a label of sample data; the characteristics of sample data are adopted to carry out training based on contrast learning on the initial classification model, and the training comprises the following steps: and training the initial classification model based on supervised contrast learning by adopting the characteristics of the sample data and the label of the sample data.
In this embodiment, the label of the sample data may be obtained in advance, and the initial classification model may be trained based on supervised contrast learning by using the characteristics of the sample data and the label of the sample data.
With continued reference to FIG. 3, a flow 300 of another embodiment of a method for training a model according to the present disclosure is shown, comprising the steps of:
step 301, obtaining sample data and an initial classification model, and training the initial classification model through multiple rounds of iterative operations.
Step 302, the iterative operation includes:
step 3021, obtaining a characterization of the sample data, and determining features of the sample data based on the characterization of the sample data.
And step 3022, training the initial classification model based on comparison learning by using the characteristics of the sample data to obtain a comparison loss function value output by the trained initial classification model.
In this embodiment, the characteristics of the sample data may be used to perform training based on contrast learning on the initial classification model, and obtain a contrast loss function value output by the trained initial classification model.
And step 3023, training the initial classification model by using the sample data representation and the sample data label to obtain a cross entropy loss function value output by the trained initial classification model.
In this embodiment, the initial classification model may be trained by using the sample data characterization and the sample data label, and a cross entropy loss function value output by the trained initial classification model may be obtained.
And step 3024, generating feedback information according to the comparison loss function value and the cross entropy loss function value, and updating the initial classification model based on the feedback information before executing the next iteration operation.
In this embodiment, the two loss values obtained in the current round of training, i.e., the contrast loss function value and the cross entropy loss function value, may be combined by a weighting operation; feedback information is generated based on the weighted result, and the parameters of the initial classification model are updated using the feedback information before the next iteration is performed.
Step 303, in response to determining that the initial classification model reaches the preset convergence condition, determining the initial classification model in the last iteration as the target classification model.
In this embodiment, the descriptions of step 301, step 3021, and step 303 are the same as the descriptions of step 201, step 2021, and step 203, and are not repeated here.
Compared with the embodiment described in fig. 2, the method for training the model provided in this embodiment adds the step of training the initial model with the representation of the sample data and the label of the sample data to obtain the cross entropy loss function value output by the model, and generates the feedback information from the weighted result of the contrast loss function value obtained by contrast learning training and the cross entropy loss function value, thereby completing the model training and improving the accuracy of the trained target classification model in classifying data.
In some application scenarios, the method for training a model comprises the steps of:
the method comprises the following steps of firstly, obtaining sample data, namely sample data which are a plurality of sample three-dimensional model data, and enhancing the sample three-dimensional model data after data enhancing operation. Random enhancement operation can be performed on each sample three-dimensional model data twice, so that the characteristics of the sample three-dimensional model data are changed by utilizing random transformation, the machine learning process is more difficult, and a regular term is added to the neural network.
The data enhancement operation performed on the sample three-dimensional model data may include perturbation, translation, scaling, and rotation of the vertex coordinates of the three-dimensional model data, as well as model deformation and edge flipping operations on the sample three-dimensional model data. The model deformation operation assigns a new position to each vertex of the three-dimensional model through the free-form deformation (FFD) technique, thereby deforming the whole three-dimensional model. The edge flipping operation changes the connection relations between points in the three-dimensional model, so that two points originally connected by an edge become disconnected, while two originally unconnected points are connected to form a new edge. It can be understood that global feature transformation of the three-dimensional model can be achieved through the model deformation and edge flipping operations.
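The vertex-level augmentations can be sketched as below; this is a hedged NumPy illustration in which the jitter magnitude, translation range, and z-axis-only rotation are assumed parameters, and `flip_edge` only illustrates the connectivity change of an edge flip, not a full mesh operation:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment_vertices(vertices, jitter=0.01, scale_range=(0.9, 1.1)):
    """One random enhancement pass over (N, 3) vertex coordinates:
    perturbation (jitter), translation, scaling, and rotation."""
    v = vertices + rng.normal(0.0, jitter, vertices.shape)   # perturbation
    v = v + rng.uniform(-0.1, 0.1, size=3)                   # translation
    v = v * rng.uniform(*scale_range)                        # scaling
    theta = rng.uniform(0.0, 2.0 * np.pi)                    # rotation about z
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    return v @ rot.T

def flip_edge(edges, idx, new_edge):
    """Edge flipping: disconnect the edge at position idx and connect a
    new pair of points, changing the mesh topology."""
    edges = list(edges)          # copy so the original list is untouched
    edges[idx] = new_edge
    return edges

verts = rng.normal(size=(10, 3))
view1, view2 = augment_vertices(verts), augment_vertices(verts)  # two random views
edges = [(0, 1), (1, 2)]
flipped = flip_edge(edges, 0, (0, 2))
```

Calling `augment_vertices` twice on the same mesh yields the two random views used later as the anchor and its positive sample.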
Second, as shown in fig. 4, a preset network (e.g., a MeshNet network) is used to obtain the representation of the sample data, where the preset network may include a spatial feature descriptor and a structural feature descriptor. The sample data is input into the spatial feature descriptor to obtain its spatial features; the spatial features are then input into a preset convolutional network to obtain a first output result from each convolutional layer in the preset convolutional network.
The sample data is likewise input into the structural feature descriptor to obtain its structural features; the structural features are then input into the preset convolutional network to obtain a second output result from each convolutional layer in the preset convolutional network.
For each convolutional layer in the preset convolutional network, the first output result output by the convolutional layer and the second output result output by the convolutional layer are aggregated based on a fully connected layer; the aggregated result is input into the self-attention network to obtain the output result of the self-attention network corresponding to that convolutional layer.
Since the preset convolutional network has a plurality of convolutional layers, an output result corresponding to each of the convolutional layers is obtained; after the output results of all convolutional layers are aggregated based on the fully connected layer, the aggregated result is used as the representation of the sample data.
The process of obtaining a representation of the sample data may be understood as an encoding process, and the device/unit performing this encoding process may be referred to as an encoder (encoder network).
The schematic flow chart of the first step and the second step can be seen in fig. 5 (a).
Third, the representation of the sample data is input into a mapping network to obtain the features of the sample data, where the mapping network may include a multi-layer perceptron with one hidden layer:

z_x = g(h_x) = W^(2) · ReLU(W^(1) · h_x)

where z_x denotes the features of the sample data, h_x denotes the representation of the sample data, ReLU(·) denotes the rectified linear unit activation commonly used in machine learning, and W^(1), W^(2) denote learnable parameters.
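A small NumPy sketch of this mapping network; the dimensions chosen are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def project(h_x, w1, w2):
    """Mapping network z_x = g(h_x) = W2 @ ReLU(W1 @ h_x): a multi-layer
    perceptron with one hidden layer that maps the representation h_x to
    the feature z_x used for contrastive learning."""
    return w2 @ relu(w1 @ h_x)

rep_dim, hid_dim, feat_dim = 8, 16, 4
w1 = rng.normal(size=(hid_dim, rep_dim))    # learnable parameter W^(1)
w2 = rng.normal(size=(feat_dim, hid_dim))   # learnable parameter W^(2)
h_x = rng.normal(size=rep_dim)              # representation from the encoder
z_x = project(h_x, w1, w2)
```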
Fourth, the initial classification model is trained based on contrast learning using the features of the sample data, and a contrast loss function value is obtained.
In unsupervised contrast learning, the sample data carries no class label, and the positive sample of an anchor comes from the augmented views of the anchor sample produced by data enhancement. The negative samples are all samples other than the positive samples and the anchor sample. The contrast loss function of unsupervised contrast learning may adopt the following formula:
L_cl = -log( exp(sim(z_i, z_j)/τ) / [ exp(sim(z_i, z_j)/τ) + Σ_k exp(sim(z_i, z_k)/τ) ] )

where i denotes the identity of the anchor sample; j denotes the identity of a positive sample; k ranges over the identities of the negative samples; sim(·,·) denotes a similarity measure (e.g., cosine similarity); and τ denotes the temperature (annealing) coefficient.
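An illustrative NumPy implementation of such an unsupervised contrast loss; here the denominator sums over every sample other than the anchor, a common InfoNCE-style reading, and the cosine similarity and temperature value are assumptions:

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def unsupervised_contrast_loss(features, i, j, tau=0.5):
    """Contrast loss for anchor i with positive j: every other sample
    k != i contributes to the denominator, so the loss pulls z_i toward
    z_j and pushes it away from the rest."""
    denom = sum(np.exp(cosine_sim(features[i], features[k]) / tau)
                for k in range(len(features)) if k != i)
    pos = np.exp(cosine_sim(features[i], features[j]) / tau)
    return -np.log(pos / denom)

rng = np.random.default_rng(2)
feats = rng.normal(size=(6, 4))   # projected features z for 6 samples
loss = unsupervised_contrast_loss(feats, i=0, j=1)
```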
In the supervised contrast learning, since the sample data has a class label, and the class of each sample data is known, at this time, the contrast loss function of the supervised contrast learning may adopt the following formula:
L_scl = Σ_i [ (-1/|P(i)|) · Σ_{p∈P(i)} log( exp(sim(z_i, z_p)/τ) / Σ_{a∈A(i)} exp(sim(z_i, z_a)/τ) ) ]
where i denotes the identity of the anchor sample; P(i) denotes the set of positive samples for anchor i, with p indexing a positive sample; A(i) denotes the set of negative samples, with a indexing a negative sample. Because the loss function of supervised contrast learning drives the encoder to produce closer representations for samples of the same class, a more robust clustering effect in the representation space can be obtained.
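A sketch of the supervised contrast loss in NumPy, following the common supervised-contrastive formulation in which A(i) is taken as all samples other than the anchor; this reading, the cosine similarity, and the temperature are assumptions:

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def supervised_contrast_loss(features, labels, tau=0.5):
    """For each anchor i, P(i) is the set of other samples sharing its
    label; the log-ratio of the positive similarity to all similarities
    is averaged over P(i) and summed over anchors."""
    n, total = len(features), 0.0
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        others = [a for a in range(n) if a != i]
        if not pos:
            continue  # anchors without positives contribute nothing
        denom = sum(np.exp(cosine_sim(features[i], features[a]) / tau)
                    for a in others)
        total += -sum(np.log(np.exp(cosine_sim(features[i], features[p]) / tau)
                             / denom)
                      for p in pos) / len(pos)
    return total / n

rng = np.random.default_rng(3)
feats = rng.normal(size=(6, 4))
labels = np.array([0, 0, 1, 1, 2, 2])   # class labels of the 6 samples
loss = supervised_contrast_loss(feats, labels)
```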
In addition, in the classification task the loss function used in contrast learning is calculated from the feature mapping of the whole model. To bring more variation to the global features of the three-dimensional mesh model, random data enhancement may be performed twice on each sample, yielding two enhanced samples. For an anchor sample i, its positive samples include its own enhanced (data-augmented) view and the samples of the same class as the anchor; its negative samples consist of the samples that are not in the same class as the anchor. In the segmentation task, the function used in contrast learning is calculated from the feature map of a single patch. Processing the features of all patches requires very high computational overhead, so a sampling strategy is used to pick the features of some patches and reduce the number of features. Since most of the faces with wrong class predictions in the segmentation result come from boundary positions, a boundary-aware sampling method can select boundary samples as anchors, forcing the network to pay more attention to them. When selecting positive and negative samples, the following two methods can be used:
the first method is based on the comparison of boundary samples with difficult samples. The difficult sample refers to a sample with a wrong prediction, and specifically, for a negative sample, when the similarity between the negative sample and the anchor sample is close to 1, the negative sample is considered as the difficult sample; for a positive sample, when its similarity to the anchor sample is close to 0, it is considered to be a difficult sample. Constructing a dynamic storage space to store the difficult samples, calculating the similarity (comprising k positive samples and k negative samples) of the anchor samples and adjacent samples thereof in a first training iteration, and storing the difficult samples in the adjacent samples into the dynamic storage space; in subsequent training iterations, the similarity between the anchor sample and the adjacent sample and the similarity between the anchor sample and the difficult sample need to be calculated, when a certain adjacent sample n is more difficult than the difficult sample h in the dynamic space, n is stored in the dynamic space to replace h, and therefore the sample in the dynamic space is always the most difficult.
The second method compares boundary samples with boundary samples. The comparison is performed between boundary samples at the same position (for example, samples at the boundary of the head and the body in human-body three-dimensional model data), so that positive samples are pulled close to each other and negative samples are pushed apart.
The schematic flow chart of the third step and the fourth step can be seen in fig. 5 (b).
And fifthly, calculating an overall loss function for training the network, and training the initial classification model by adopting the overall loss function to obtain a target classification model.
First, a cross entropy loss function for training the network is obtained:
L_ce = -Σ_i y_i · log(h_i)
where h_i denotes the normalized output of the network (i.e., the predicted probability for class i), y_i denotes the true label of the sample data, and i denotes the class index.
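A minimal NumPy version of this cross entropy computation; the logits and the one-hot encoding are illustrative:

```python
import numpy as np

def cross_entropy(probs, onehot):
    """L_ce = -sum_i y_i * log(h_i): h_i is the normalized (softmax)
    output of the network, y_i the one-hot true label."""
    return -np.sum(onehot * np.log(probs))

logits = np.array([2.0, 0.5, -1.0])
probs = np.exp(logits) / np.exp(logits).sum()   # normalized output h
y = np.array([1.0, 0.0, 0.0])                   # true label: class 0
loss_ce = cross_entropy(probs, y)
```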
Secondly, the overall loss function is the weighted sum of the contrast loss function obtained in the fourth step and the cross entropy loss function obtained in the present step:
L_all = L_ce + α · L_cl
where L_cl denotes the contrast loss function, L_ce the cross entropy loss function, and α the weight of the contrast loss function. In the actual training process, since the initial values of L_ce and L_cl differ greatly and their descending trends also differ, setting a proper weight balances the two so that both play an effective role in training the model. The weight may be set based on the impact of both on the optimizer: first, the gradients of L_ce and L_cl with respect to the variable z_i are calculated separately, and their gradient values are set to the same order of magnitude; then, the relationship between L_ce and L_cl is adjusted through the proportional parameter n:
α = n · ‖∂L_ce/∂z_i‖ / ‖∂L_cl/∂z_i‖
and when the training times reach the preset iteration times or the loss function is smaller than the preset threshold value, finishing the training to obtain the target classification model.
A brief flow chart for calculating the cross entropy loss function in the fifth step can be seen in FIG. 5(c).
With continued reference to FIG. 6, a flow 600 of one embodiment of a method for classifying a three-dimensional model according to the present disclosure is shown, comprising the steps of:
step 601, obtaining three-dimensional model data to be classified.
In the present embodiment, an executing subject (e.g., the server 105 shown in fig. 1) of the method for classifying a three-dimensional model may acquire the three-dimensional model data to be classified. Three-dimensional model data refers to model data having three-dimensional spatial features and three-dimensional structural features.
Step 602, determining the category of the three-dimensional model data to be classified by using a target classification model, wherein the target classification model is obtained by training based on the method in the embodiment described in fig. 2 or fig. 3.
In this embodiment, the classification of the three-dimensional model data to be classified may be implemented by using a target classification model trained based on the method in the embodiment described in fig. 2 or fig. 3.
According to the method for classifying the three-dimensional model, the target classification model obtained by training based on the contrast training method is adopted to classify the three-dimensional model data, so that the accuracy of classifying the three-dimensional model data can be improved.
With further reference to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for training a model, which corresponds to the method embodiments shown in fig. 2 and 3, and which can be applied in various electronic devices.
As shown in fig. 7, the apparatus for training a model of the present embodiment includes: an acquisition unit 701, a training unit 702, and a determination unit 703. The acquisition unit is configured to acquire sample data and an initial classification model, and train the initial classification model by performing multiple rounds of iterative operations through the training unit; the training unit comprises: an acquisition module configured to acquire a characterization of the sample data and determine a characteristic of the sample data based on the characterization of the sample data; the first updating module is configured to perform comparison learning-based training on the initial classification model by adopting the characteristics of sample data to obtain a comparison loss function value output by the trained initial classification model, generate feedback information according to the comparison loss function value, and update the initial classification model based on the feedback information before executing the next iteration operation; a determination module configured to determine the initial classification model in the last iteration as the target classification model in response to determining that the initial classification model reaches a preset convergence condition.
In some embodiments, the sample data comprises three-dimensional model data, the obtaining module comprising: the network acquisition module is configured to acquire a preset network, wherein the preset network comprises a spatial feature descriptor and a structural feature descriptor; the characteristic extraction module is configured to input sample data into a preset network, obtain the spatial characteristics of the sample data output by the spatial characteristic descriptor and obtain the structural characteristics of the sample data output by the structural characteristic descriptor; the first convolution module is configured to input the spatial features of the sample data into a preset convolution network and obtain a first output result output by each convolution layer in the preset convolution network; the second convolution module is configured to input the structural characteristics of the sample data into the preset convolution network and obtain a second output result output by each convolution layer in the preset convolution network; the first aggregation module is configured to aggregate a first output result output by the convolutional layer and a second output result output by the convolutional layer for each convolutional layer in a preset convolutional network, input the aggregated result into the attention network, and obtain an output result output by the attention network and corresponding to the convolutional layer; and the second aggregation module is configured to aggregate the output results corresponding to each convolution layer to obtain the representation of the sample data.
In some embodiments, a first update module comprises: and the first updating submodule is configured to train the initial classification model based on unsupervised contrast learning by adopting the characteristics of the sample data.
In some embodiments, the means for training the model further comprises: the tag acquisition module is configured to acquire a tag of sample data; a first update module comprising: and the second updating submodule is configured to train the initial classification model based on supervised contrast learning by adopting the characteristics of the sample data and the label of the sample data.
In some embodiments, the training unit further comprises: the second updating module is configured to train the initial classification model by adopting the representation of the sample data and the label of the sample data to obtain a cross entropy loss function value output by the trained initial classification model; generating feedback information according to the contrast loss function value, comprising: and generating feedback information according to the comparison loss function value and the cross entropy loss function value.
The units in the apparatus 700 described above correspond to the steps in the method described with reference to fig. 2 and 3. Thus, the operations, features and technical effects that can be achieved by the methods for training a model described above are also applicable to the apparatus 700 and the units included therein, and are not described herein again.
With further reference to fig. 8, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for classifying a three-dimensional model, which corresponds to the method embodiment shown in fig. 6, and which is particularly applicable to various electronic devices.
As shown in fig. 8, the apparatus for classifying a three-dimensional model of the present embodiment includes: a three-dimensional model data acquisition unit 801 and a classification unit 802. The three-dimensional model data acquisition unit is configured to acquire three-dimensional model data to be classified; the classification unit is configured to determine the class of the three-dimensional model data to be classified using a target classification model, wherein the target classification model is trained based on the apparatus as described in the third aspect.
The units in the apparatus 800 described above correspond to the steps in the method described with reference to fig. 6. Thus, the operations, features and technical effects that can be achieved described above for the method for classifying a three-dimensional model are also applicable to the apparatus 800 and the units included therein, and are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 9, a block diagram of an electronic device 900 for a method of training a model according to an embodiment of the application is shown. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 9, the electronic apparatus includes: one or more processors 901, memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 9 illustrates an example of a processor 901.
Memory 902 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for training a model provided herein.
The memory 902, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for training a model in the embodiments of the present application (e.g., the obtaining unit 701, the training unit 702, and the determining unit 703 shown in fig. 7). The processor 901 executes various functional applications of the server and data processing by executing non-transitory software programs, instructions, and modules stored in the memory 902, that is, implements the method for training the model in the above method embodiments.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to use of the electronic device for training the model, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of training a model may further comprise: an input device 903, an output device 904, and a bus 905. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus 905 or in other ways, and are exemplified by the bus 905 in fig. 9.
The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output devices 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method for training a model, comprising:
obtaining sample data and an initial classification model, and training the initial classification model through multiple rounds of iterative operation;
the iterative operation comprises:
obtaining a representation of the sample data, and determining features of the sample data based on the representation of the sample data;
training the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss function value output by the trained initial classification model, and generating feedback information according to the contrastive loss function value, wherein the initial classification model is updated based on the feedback information before the next iterative operation is executed;
in response to determining that the initial classification model reaches a preset convergence condition, determining the initial classification model obtained in the last iterative operation as a target classification model.
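For illustration only, the control flow of the claimed iterative operation can be sketched as below. All names are illustrative rather than from the patent, and a toy quadratic loss stands in for the claimed contrastive loss so that the loop is runnable end to end:

```python
import numpy as np

def train(sample_data, init_params, lr=0.5, tol=1e-8, max_rounds=1000):
    """Sketch of the claimed iterative training (claim 1): obtain features,
    compute a loss, generate feedback, update the model, and stop on a
    preset convergence condition. The quadratic loss is a stand-in."""
    params = np.asarray(init_params, dtype=float)       # initial classification model
    features = sample_data.mean(axis=0)                 # "features of the sample data" (toy)
    prev_loss = np.inf
    for _ in range(max_rounds):
        loss = 0.5 * np.sum((params - features) ** 2)   # contrastive-loss stand-in
        if abs(prev_loss - loss) < tol:                 # preset convergence condition
            break                                       # last iteration's model is kept
        grad = params - features                        # feedback information
        params = params - lr * grad                     # update before the next iteration
        prev_loss = loss
    return params                                       # target classification model

data = np.array([[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]])
model = train(data, np.zeros(2))
```

With the toy loss, the loop converges to the feature mean; in the claimed method the same loop would instead minimize the contrastive loss over the model's parameters.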
2. The method of claim 1, wherein the sample data comprises three-dimensional model data, said obtaining a representation of the sample data comprising:
acquiring a preset network, wherein the preset network comprises a spatial feature descriptor and a structural feature descriptor;
inputting the sample data into the preset network, obtaining the spatial features of the sample data output by the spatial feature descriptor, and obtaining the structural features of the sample data output by the structural feature descriptor;
inputting the spatial features of the sample data into a preset convolutional network, and obtaining a first output result output by each convolutional layer in the preset convolutional network;
inputting the structural features of the sample data into the preset convolutional network, and obtaining a second output result output by each convolutional layer in the preset convolutional network;
for each convolutional layer in the preset convolutional network, aggregating the first output result output by the convolutional layer and the second output result output by the convolutional layer, inputting the aggregated result into a self-attention network, and obtaining an output result, corresponding to the convolutional layer, output by the self-attention network;
aggregating the output results corresponding to each convolutional layer to obtain the representation of the sample data.
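For illustration only, the two-branch, per-layer fusion of claim 2 can be sketched as below. Dense layers stand in for the preset convolutional network, and all weights and names are random or hypothetical stand-ins for trained components:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x, d):
    """Single-head self-attention over a (T, d) sequence; the projection
    weights are random stand-ins for a trained attention layer."""
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)      # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

def characterize(n_points, d=16, n_layers=3):
    """Claim 2 sketch: spatial and structural descriptor outputs pass through
    a shared layer stack, are aggregated per layer, run through
    self-attention, and the per-layer results are aggregated at the end."""
    spatial = rng.standard_normal((n_points, d))      # spatial feature descriptor output
    structural = rng.standard_normal((n_points, d))   # structural feature descriptor output
    per_layer = []
    for _ in range(n_layers):
        w = rng.standard_normal((d, d)) / np.sqrt(d)  # one "convolutional" layer (dense stand-in)
        spatial, structural = np.tanh(spatial @ w), np.tanh(structural @ w)
        fused = spatial + structural                  # aggregate the two branch outputs
        per_layer.append(self_attention(fused, d))    # per-layer self-attention
    return np.concatenate(per_layer, axis=-1)         # aggregate across layers

rep = characterize(n_points=5)
```

The resulting representation concatenates one attended feature map per layer, matching the per-layer-then-global aggregation described in the claim.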
3. The method of claim 1, wherein said training the initial classification model based on contrastive learning using the features of the sample data comprises:
training the initial classification model based on unsupervised contrastive learning using the features of the sample data.
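For illustration only, a common form of unsupervised contrastive loss (NT-Xent, SimCLR-style) is sketched below; the patent does not fix this exact formulation, and all names are illustrative:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Unsupervised contrastive loss: row i of z1 and z2 are two views of the
    same sample and form the only positive pair; all other rows in the
    batch act as negatives."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # a sample is not its own negative
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # each row's positive index
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(1)
loss = nt_xent_loss(rng.standard_normal((8, 4)), rng.standard_normal((8, 4)))
```

No labels are needed: the pairing of the two views alone supervises the model, which is what makes this variant unsupervised.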
4. The method of claim 1, wherein the method further comprises:
obtaining a label of the sample data;
wherein training the initial classification model based on contrastive learning using the features of the sample data comprises:
training the initial classification model based on supervised contrastive learning using the features of the sample data and the label of the sample data.
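For illustration only, a supervised contrastive loss in the spirit of SupCon is sketched below, where every pair of samples sharing a label is treated as a positive pair; the patent does not fix this exact formulation:

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss sketch: same-label samples are positives,
    different-label samples are negatives."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    log_prob = np.where(np.isfinite(log_prob), log_prob, 0.0)  # zero the diagonal
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(pos, 0.0)
    n_pos = np.maximum(pos.sum(axis=1), 1.0)          # guard singleton classes
    return float((-(pos * log_prob).sum(axis=1) / n_pos).mean())

feats = np.array([[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]])
clustered = supcon_loss(feats, [0, 0, 1, 1])          # labels agree with geometry
mismatched = supcon_loss(feats, [0, 1, 0, 1])         # labels cut across the clusters
```

When labels agree with the feature geometry the loss is small, which is the signal that pulls same-class samples together during training.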
5. The method of claim 1, wherein the iterative operations further comprise:
training the initial classification model using the representation of the sample data and the label of the sample data to obtain a cross-entropy loss function value output by the trained initial classification model;
wherein generating feedback information according to the contrastive loss function value comprises:
generating the feedback information according to the contrastive loss function value and the cross-entropy loss function value.
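For illustration only, combining the two loss values into a single feedback signal can be sketched as below; the `weight` coefficient is a hypothetical balancing factor not specified by the claims:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits via a numerically stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_softmax[np.arange(len(labels)), labels].mean())

def feedback(contrastive_loss, logits, labels, weight=1.0):
    """Claim 5 sketch: the feedback driving the next parameter update is the
    contrastive loss plus a weighted cross-entropy term."""
    return contrastive_loss + weight * cross_entropy(logits, labels)

logits = np.array([[5.0, 0.0], [0.0, 5.0]])
labels = np.array([0, 1])
total = feedback(0.3, logits, labels)
```

The contrastive term shapes the feature space while the cross-entropy term directly supervises the class predictions; their sum backpropagates as one feedback signal.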
6. A method for classifying a three-dimensional model, comprising:
acquiring three-dimensional model data to be classified;
determining the category of the three-dimensional model data to be classified using a target classification model, wherein the target classification model is trained based on the method according to any one of claims 1 to 5.
7. An apparatus for training a model, comprising:
an acquisition unit configured to acquire sample data and an initial classification model, and to train the initial classification model by performing multiple rounds of an iterative operation through a training unit;
the training unit includes:
an acquisition module configured to acquire a representation of the sample data and determine features of the sample data based on the representation of the sample data;
a first updating module configured to train the initial classification model based on contrastive learning using the features of the sample data to obtain a contrastive loss function value output by the trained initial classification model, and to generate feedback information according to the contrastive loss function value, wherein the initial classification model is updated based on the feedback information before the next iterative operation is executed;
a determination module configured to determine the initial classification model obtained in the last iterative operation as the target classification model in response to determining that the initial classification model reaches a preset convergence condition.
8. The apparatus of claim 7, wherein the sample data comprises three-dimensional model data, the obtaining module comprising:
a network acquisition module configured to acquire a preset network, wherein the preset network comprises a spatial feature descriptor and a structural feature descriptor;
a feature extraction module configured to input the sample data into the preset network, obtain the spatial features of the sample data output by the spatial feature descriptor, and obtain the structural features of the sample data output by the structural feature descriptor;
a first convolution module configured to input the spatial features of the sample data into a preset convolutional network and obtain a first output result output by each convolutional layer in the preset convolutional network;
a second convolution module configured to input the structural features of the sample data into the preset convolutional network and obtain a second output result output by each convolutional layer in the preset convolutional network;
a first aggregation module configured to aggregate, for each convolutional layer in the preset convolutional network, the first output result output by the convolutional layer and the second output result output by the convolutional layer, input the aggregated result into a self-attention network, and obtain an output result, corresponding to the convolutional layer, output by the self-attention network;
a second aggregation module configured to aggregate the output results corresponding to each convolutional layer to obtain the representation of the sample data.
9. The apparatus of claim 7, wherein the first updating module comprises:
a first updating submodule configured to train the initial classification model based on unsupervised contrastive learning using the features of the sample data.
10. The apparatus of claim 7, wherein the apparatus further comprises:
a label acquisition module configured to acquire a label of the sample data;
wherein the first updating module comprises:
a second updating submodule configured to train the initial classification model based on supervised contrastive learning using the features of the sample data and the label of the sample data.
11. The apparatus of claim 7, wherein the training unit further comprises:
a second updating module configured to train the initial classification model using the representation of the sample data and the label of the sample data to obtain a cross-entropy loss function value output by the trained initial classification model;
wherein generating feedback information according to the contrastive loss function value comprises:
generating the feedback information according to the contrastive loss function value and the cross-entropy loss function value.
12. An apparatus for classifying a three-dimensional model, comprising:
a three-dimensional model data acquisition unit configured to acquire three-dimensional model data to be classified;
a classification unit configured to determine the class of the three-dimensional model data to be classified using a target classification model, wherein the target classification model is trained based on the apparatus according to any one of claims 7 to 10.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202210024220.2A 2022-01-04 2022-01-04 Method and apparatus for training a model Pending CN114386503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210024220.2A CN114386503A (en) 2022-01-04 2022-01-04 Method and apparatus for training a model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210024220.2A CN114386503A (en) 2022-01-04 2022-01-04 Method and apparatus for training a model

Publications (1)

Publication Number Publication Date
CN114386503A true CN114386503A (en) 2022-04-22

Family

ID=81199886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210024220.2A Pending CN114386503A (en) 2022-01-04 2022-01-04 Method and apparatus for training a model

Country Status (1)

Country Link
CN (1) CN114386503A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4322066A4 (en) * 2022-06-22 2024-02-14 Jina Ai Gmbh Method and apparatus for generating training data
CN115527189A (en) * 2022-11-01 2022-12-27 杭州枕石智能科技有限公司 Parking space state detection method, terminal device and computer readable storage medium
CN117113941A (en) * 2023-10-23 2023-11-24 新声科技(深圳)有限公司 Punctuation mark recovery method and device, electronic equipment and storage medium
CN117113941B (en) * 2023-10-23 2024-02-06 新声科技(深圳)有限公司 Punctuation mark recovery method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111639710B (en) Image recognition model training method, device, equipment and storage medium
EP3198373B1 (en) Tracking hand/body pose
CN111626119A (en) Target recognition model training method, device, equipment and storage medium
CN114386503A (en) Method and apparatus for training a model
CN111783870A (en) Human body attribute identification method, device, equipment and storage medium
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
CN111311321B (en) User consumption behavior prediction model training method, device, equipment and storage medium
CN111667056B (en) Method and apparatus for searching model structures
CN111291643B (en) Video multi-label classification method, device, electronic equipment and storage medium
CN112016633A (en) Model training method and device, electronic equipment and storage medium
CN113033458B (en) Action recognition method and device
CN111582185A (en) Method and apparatus for recognizing image
CN110852379B (en) Training sample generation method and device for target object recognition
CN112529180B (en) Method and apparatus for model distillation
CN111523467B (en) Face tracking method and device
CN112308145A (en) Classification network training method, classification device and electronic equipment
CN112288483A (en) Method and device for training model and method and device for generating information
CN114511743B (en) Detection model training, target detection method, device, equipment, medium and product
CN115082740A (en) Target detection model training method, target detection method, device and electronic equipment
CN111563541B (en) Training method and device of image detection model
CN111311000B (en) User consumption behavior prediction model training method, device, equipment and storage medium
CN112488126A (en) Feature map processing method, device, equipment and storage medium
CN110738261B (en) Image classification and model training method and device, electronic equipment and storage medium
CN112270760A (en) Positioning method, positioning device, electronic equipment and storage medium
CN111738325A (en) Image recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination