WO2023202620A1 - Model training and modal information prediction method and apparatus, electronic device, storage medium and computer program product - Google Patents

Model training and modal information prediction method and apparatus, electronic device, storage medium and computer program product

Info

Publication number
WO2023202620A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
prediction model
sample data
prediction
parameter value
Prior art date
Application number
PCT/CN2023/089228
Other languages
English (en)
French (fr)
Inventor
颜子轲
查红彬
刘浩敏
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023202620A1 publication Critical patent/WO2023202620A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present disclosure relates to, but is not limited to, the technical field of deep learning, and in particular to a model training and modal information prediction method and apparatus, an electronic device, a storage medium, and a computer program product.
  • neural network models are used to predict scene properties of spatial points, thereby expanding the feature information of spatial points.
  • in the application process of neural networks, people's demands on neural networks may continue to grow, and a neural network then needs to be able to learn new modal information to keep meeting those demands. At present, when a trained neural network model learns information of a new modality, catastrophic forgetting occurs: after the network learns the new modality, it forgets the previously learned modalities, and its prediction accuracy on them drops significantly.
  • the present disclosure at least provides a method and device for model training and prediction of modal information, electronic equipment, storage media and computer program products.
  • the first aspect of the present disclosure provides a model training method.
  • the method includes: obtaining a prediction model obtained through initial training, wherein the prediction model obtained through initial training is used to predict information of the original modality; and retraining the prediction model using preset sample data, wherein the preset sample data includes first sample data annotated with information of the target modality, and the retrained prediction model is also used to predict information of the target modality; wherein the preset sample data further includes second sample data annotated with information of the original modality, and/or retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
  • in this way, the prediction model will not forget the original-modality information learned during initial training, while also becoming usable for predicting target-modality information, thus expanding the modal information that the prediction model can predict.
  • a second aspect of the present disclosure provides a method for predicting modal information.
  • the method includes: training a prediction model using the method described in the above first aspect; obtaining target data; and predicting the target data using the trained prediction model to obtain information of at least one modality of the target data.
  • the prediction model can predict both the information of the original modality and the information of the target modality.
  • a third aspect of the present disclosure provides a model training device.
  • the device includes: an acquisition part and a retraining part.
  • the acquisition part is configured to acquire a prediction model obtained through initial training, wherein the prediction model obtained through initial training is used to predict information of the original modality;
  • the retraining part is configured to retrain the prediction model using preset sample data, the preset sample data includes first sample data annotated with information of the target modality, and the retrained prediction model is also used to predict information of the target modality; wherein the preset sample data further includes second sample data annotated with information of the original modality, and/or retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
  • a fourth aspect of the present disclosure provides a modal information prediction device.
  • the device includes: a first acquisition part, a second acquisition part and a prediction part, wherein the first acquisition part is configured to acquire a prediction model trained using the method described in the first aspect, the second acquisition part is configured to acquire target data, and the prediction part is configured to predict the target data using the trained prediction model.
  • a fifth aspect of the present disclosure provides an electronic device, including a memory and a processor coupled to each other.
  • the processor is configured to execute program instructions stored in the memory to implement the model training method in the first aspect or the modal information prediction method in the second aspect.
  • a sixth aspect of the present disclosure provides a computer-readable storage medium on which program instructions are stored.
  • when the program instructions are executed by a processor, the model training method in the first aspect or the modal information prediction method described in the second aspect is implemented.
  • a seventh aspect of the present disclosure provides a computer program product.
  • the computer program product includes a computer program or instructions; when the computer program or instructions are run on a computer, the computer is caused to perform the steps of the method in the first aspect.
  • in this way, the prediction model will not forget the original-modality information learned during initial training, while also becoming usable for predicting target-modality information, thereby expanding the modal information that the prediction model can predict.
  • Figure 1 is a schematic flow chart of the first embodiment of the disclosed model training method
  • Figure 2 is a schematic flow chart of a second embodiment of the disclosed model training method
  • Figure 3 is a schematic flowchart of a third embodiment of the disclosed model training method
  • Figure 4 is a schematic flow chart of the fourth embodiment of the disclosed model training method
  • Figure 5 is a schematic flowchart of the fifth embodiment of the disclosed model training method
  • Figure 6 is a schematic flow chart of the re-training process in the model training method of the present disclosure
  • Figure 7 is a schematic flow chart of an embodiment of the method for predicting modal information of the present disclosure
  • Figure 8 is a schematic framework diagram of an embodiment of the model training device of the present disclosure
  • Figure 9 is a schematic framework diagram of an embodiment of the modal information prediction device of the present disclosure.
  • Figure 10 is a schematic framework diagram of an embodiment of the electronic device of the present disclosure.
  • FIG. 11 is a schematic diagram of an embodiment of a computer-readable storage medium of the present disclosure.
  • "A and/or B" herein can mean three situations: A exists alone, A and B both exist, or B exists alone.
  • the character "/" herein generally indicates an "or" relationship between the related objects.
  • "multiple" herein means two or more.
  • the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B and C" can mean including any one or more elements selected from the set composed of A, B and C.
  • Figure 1 is a schematic flowchart of a first embodiment of a model training method of the present disclosure. Specifically, it can include the following steps:
  • Step S11 Obtain the prediction model obtained through initial training.
  • the prediction model obtained through initial training is used to predict the information of the original modality.
  • the original modality can include one modality, or two or more modalities.
  • a source or a presentation form of information can be regarded as a modality.
  • information of different modalities includes, for example, voice information, video information, text information, and so on.
  • information about different scene properties of a spatial point is also information of different modalities.
  • the initial training of the prediction model may be to train the prediction model with sample data containing information of the original modality, so that the prediction model can be used to predict the information of the original modality.
  • Step S12 Retrain the prediction model using preset sample data.
  • the preset sample data includes first sample data annotated with information of the target modality, and the retrained prediction model is also used to predict information of the target modality.
  • the preset sample data also includes second sample data annotated with information of the original modality, and/or retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
  • a new output part can be added to the prediction model for outputting information of the target modality.
  • the above-mentioned first sample data and second sample data both include position data of spatial points.
  • the position data of the spatial point is, for example, the three-dimensional coordinates of the spatial point.
  • the spatial points are, for example, obtained through three-dimensional reconstruction or three-dimensional modeling. This disclosure does not limit the acquisition method of the spatial points.
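The idea of adding a new output part for the target modality can be sketched with a toy model; the class name, layer sizes, and modality names below are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class PredictionModel:
    """Toy implicit scene model: maps 3-D point coordinates to per-modality outputs."""
    def __init__(self, hidden=64):
        self.W1 = rng.normal(size=(3, hidden))           # shared backbone layer
        self.heads = {"color": rng.normal(size=(hidden, 3))}  # original-modality head

    def add_head(self, name, out_dim):
        # New output part for a target modality, added before retraining.
        self.heads[name] = rng.normal(size=(self.W1.shape[1], out_dim))

    def forward(self, xyz):
        feat = np.maximum(xyz @ self.W1, 0.0)  # shared features of the spatial points
        return {name: feat @ W for name, W in self.heads.items()}

model = PredictionModel()
model.add_head("material", out_dim=1)           # hypothetical target modality
out = model.forward(rng.normal(size=(4, 3)))    # 4 spatial points
print(sorted(out.keys()), out["material"].shape)  # ['color', 'material'] (4, 1)
```

The shared backbone is kept, so the existing original-modality head is untouched when the new head is added.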
  • when the preset sample data also includes second sample data annotated with information of the original modality, using the preset sample data to retrain the prediction model means inputting the first sample data and the second sample data into the prediction model, which then outputs a prediction result.
  • the prediction result includes the information of the original modality and the information of the target modality.
  • for example, a three-dimensional spatial point can be annotated with both the scene property information of the original modality and the scene property information of the target modality, so that the same spatial point serves as both first sample data and second sample data.
  • the spatial point can then be input into the prediction model, and the scene property information of the original modality and of the target modality output by the prediction model for that spatial point can be obtained.
  • in this way, the prediction model can use the first sample data to learn the information of the target modality, while the second sample data prevents the prediction model from forgetting the original-modality information learned during initial training.
  • retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
  • constraining the adjustment means that these network parameters cannot be modified, or cannot be adjusted substantially; for example, the constraint can be understood as requiring the difference between the network parameter values before and after adjustment to be less than a preset threshold.
  • in this way, the impact on the existing network parameters can be reduced, so that the network parameters of the prediction model still map accurately to the information of the original modality, and the prediction model does not forget the original-modality information learned during initial training.
  • in this way, the prediction model will not forget the original-modality information learned during initial training, while also becoming usable for predicting target-modality information, thus expanding the modal information that the prediction model can predict.
  • before retraining, the prediction model is used to predict scene property information about the original modality of a spatial point; after retraining, the prediction model is used to predict scene property information about both the original modality and the target modality of the spatial point.
  • the scene property information of the original modality of a spatial point is, for example, color and geometric information, and the scene property information of the target modality is, for example, object material information, surface reflection characteristics, and so on.
  • the prediction model is initially trained so that it can predict scene property information of the original modality; by retraining, the prediction model can further predict scene property information about both the original modality and the target modality of a spatial point.
  • that is, the prediction model can predict scene property information of the original modality before retraining, and scene property information of both the original and target modalities after retraining, thereby expanding the modal information that the prediction model can predict.
  • the prediction model can be a network model built based on a neural network implicit scene representation (Implicit Neural Scene Representation), which is used to predict scene property information of spatial points.
  • the spatial points are points on the surface of the object.
  • the object can be a real object in reality obtained through three-dimensional reconstruction, or a virtual object obtained through three-dimensional modeling.
  • the prediction model can be used to predict the original mode information and the target mode information of the points on the object surface.
  • the predicted scene property information includes at least one of color, brightness, surface geometry, semantics, and surface material.
  • the above predicted scene property information is related to the viewing angle, and the scene property information related to the viewing angle is, for example, color information, brightness information, and so on.
  • Both the first sample data and the second sample data also include perspective information of the spatial point, that is, the prediction model can use the perspective information of the first sample data and the second sample data to make predictions, thereby obtaining the predicted scene properties related to the perspective. information.
  • the viewing-angle information is, for example, a tilt angle θ and a north (azimuth) angle.
  • in some embodiments, the viewing-angle information can undergo feature transformations such as a Fourier transform so that it can be utilized by the prediction model. Therefore, by making the predicted scene property information viewing-angle dependent, the prediction model can predict and obtain scene property information related to the viewing angle.
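The Fourier-style feature transform of the viewing angle mentioned above can be sketched as follows; the sin/cos encoding and the frequency choice are illustrative assumptions, not the patent's exact transform:

```python
import numpy as np

def fourier_encode(angles, num_freqs=4):
    """Map raw viewing angles (radians) to sin/cos features at several frequencies,
    so the prediction model can exploit high-frequency view dependence."""
    angles = np.asarray(angles, dtype=float)   # shape (..., D) raw angles
    freqs = 2.0 ** np.arange(num_freqs)        # 1, 2, 4, 8
    scaled = angles[..., None] * freqs         # (..., D, F)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (..., D, 2F)
    return enc.reshape(*angles.shape[:-1], -1)  # flatten to (..., D * 2F)

view = np.array([[0.3, 1.2]])                   # e.g. tilt and azimuth of one point
print(fourier_encode(view, num_freqs=4).shape)  # (1, 16)
```

Each of the two raw angles becomes 8 features (4 sines plus 4 cosines), which can then be concatenated with the spatial-point coordinates before being fed to the prediction model.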
  • the annotation information of the preset sample data is a reference image.
  • the reference image is used to represent the actual scene property information of the corresponding spatial point; that is, the image information of the projection point of the spatial point on the reference image can represent the actual scene property information of the spatial point.
  • the step can be continued: generating a prediction image based on the prediction result, and the prediction image can represent the predicted scene property information of the spatial point.
  • the same viewing angle information as the reference image can be used for rendering, so as to generate the predicted image based on the prediction result, that is, the viewing angle information of the predicted image and the reference image is the same.
  • the difference between the predicted image and the reference image serves as the difference between the prediction result and the annotation information of the preset sample data; specifically, it can be reflected as differences in image information between the two images, such as differences in color, brightness, and so on. Therefore, by taking the reference image as the annotation information of the preset sample data, generating a predicted image from the prediction result, and using the difference between the predicted image and the reference image as the difference between the prediction result and the annotation information, the reference image can be used to train the prediction model.
  • in some embodiments, the preset sample data includes first sample data and second sample data, the first sample data is annotated with first annotation information about the target modality, and the second sample data is annotated with second annotation information about the original modality.
  • the above-mentioned step "retraining the prediction model using preset sample data" specifically includes step S21 and step S22.
  • Step S21 Use the prediction model to predict the first sample data and the second sample data respectively, and obtain the first prediction result about the target modality and the second prediction result about the original modality.
  • the first sample data and the second sample data can be input into the prediction model respectively, so that the prediction model can be used to obtain the first prediction result about the target modality and the second prediction result about the original modality.
  • for example, the prediction model correspondingly obtains the first prediction result about the target modality as a predicted brightness, and the second prediction result about the original modality as a predicted color.
  • as another example, when the first sample data and the second sample data are the same spatial point, the first annotation information of the target modality of the spatial point is surface material and geometric information, and the second annotation information of the original modality is semantic information; the prediction model then correspondingly obtains, as the first prediction result about the target modality, the predicted surface material and geometric information of the spatial point, and as the second prediction result about the original modality, the predicted semantic information.
  • the second sample data includes at least one of the following: collected original data, and derived data generated using a generative model.
  • the original data collected are, for example, spatial points obtained through three-dimensional reconstruction based on the collected two-dimensional images.
  • the generative model is, for example, a neural network model specifically used to generate second sample data. For example, by inputting a certain spatial point into the generative model, the generative model can correspondingly generate corresponding second annotation information for the spatial point. Therefore, by collecting the original data and using the derived data generated by the generative model, the second sample data can be used to retrain the prediction model, thereby enabling the prediction model to expand the predicted modal information.
  • Step S22 Adjust the network parameters of the prediction model using the first difference between the first prediction result and the first annotation information and the second difference between the second prediction result and the second annotation information.
  • the first difference can be obtained by comparing the first prediction result with the first annotation information
  • the second difference can be obtained by comparing the second prediction result with the second annotation information, thereby determining the prediction accuracy of the model on the information of the target modality and on the information of the original modality, which is then used to adjust the network parameters of the prediction model and thus retrain it.
  • in this way, by using the prediction model to predict the first sample data and the second sample data respectively, and adjusting the network parameters of the prediction model based on the first difference between the first prediction result and the first annotation information and the second difference between the second prediction result and the second annotation information, the prediction model can use the first sample data to learn the information of the target modality while the second sample data prevents it from forgetting the original-modality information learned during initial training, thereby expanding the modal information that the prediction model can predict.
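Steps S21 and S22 can be sketched as a combined loss over the two differences; the mean-squared-error form and the toy numbers are illustrative assumptions, not the patent's prescribed choices:

```python
import numpy as np

def retraining_losses(pred_target, label_target, pred_orig, label_orig):
    """First difference: target-modality prediction vs. first annotation information.
    Second difference: original-modality prediction vs. second annotation information."""
    first_diff = np.mean((pred_target - label_target) ** 2)
    second_diff = np.mean((pred_orig - label_orig) ** 2)
    # Both terms drive the parameter update, so the model learns the target
    # modality without forgetting the original one.
    return first_diff + second_diff, first_diff, second_diff

total, d1, d2 = retraining_losses(
    np.array([0.9, 0.1]), np.array([1.0, 0.0]),   # target modality (first sample data)
    np.array([0.5, 0.5]), np.array([0.5, 0.5]))   # original modality (second sample data)
print(round(total, 4), round(d2, 4))  # 0.01 0.0
```

Here the original-modality predictions already match their annotations, so only the first difference contributes to the update.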
  • retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
  • the above-mentioned step "retraining the prediction model using preset sample data” specifically includes steps S31 to S33.
  • Step S31 Use the preset sample data to determine the prediction losses of the prediction models corresponding to different parameter value sets.
  • each parameter value set includes a set of candidate parameter values corresponding to the network parameters in the prediction model, and each network parameter of the prediction model corresponding to the parameter value set is assigned the corresponding candidate parameter value in that set.
  • the set of parameter values may be determined at different stages in retraining. For example, a first parameter value set θ₁ may be determined first, and then a second parameter value set θ₂ may be determined based on the training situation corresponding to θ₁, thereby implementing iterative training.
  • Step S32 Take the parameter value set whose target loss meets the preset condition as the target parameter value set.
  • the target loss corresponding to the parameter value set is obtained by using the prediction loss and regularization loss corresponding to the parameter value set.
  • the prediction loss may be determined based on the difference between the prediction result of the prediction model and the annotation information.
  • the above-mentioned step of "using preset sample data to determine the prediction losses of prediction models corresponding to different parameter value sets" specifically includes step S311 and step S312 (not shown).
  • Step S311 For each parameter value set, use the prediction model corresponding to the parameter value set to predict the first sample data to obtain the first prediction result corresponding to the parameter value set.
  • Step S312 Use the first difference between the first prediction result corresponding to the parameter value set and the first annotation information of the first sample data to obtain the prediction loss corresponding to the parameter value set.
  • the prediction model can be used to predict the first sample data to obtain a first prediction result corresponding to the parameter value set.
  • in this way, the prediction accuracy of the prediction model can be determined, thereby measuring the accuracy of the parameter value set, and the prediction loss corresponding to the parameter value set can be determined accordingly.
  • specifically, the prediction loss corresponding to the parameter value set can be obtained based on the first difference between the first prediction result and the first annotation information of the first sample data; this prediction loss measures the accuracy of the parameter value set.
  • the regularization loss corresponding to the parameter value set is obtained by combining the weight of each network parameter and the change of each network parameter.
  • the regularization loss corresponding to the parameter value set is obtained by weighting the change representation of each network parameter using the weight of each network parameter.
  • the change representation of a network parameter is obtained based on the difference between the candidate parameter value and the reference parameter value corresponding to that network parameter in the parameter value set; it is, for example, the difference itself, or the square of the difference.
  • Reference parameter values can be set as needed.
  • the reference parameter value may be set to the parameter value after initial training.
  • the regularization loss can be determined to reflect the degree of deviation of the network parameters in the retraining stage from the parameter value after initial training.
  • the reference parameter values may be set to parameter values used during the retraining process.
  • the weight of a network parameter is related to the degree of influence of that parameter on predicting information of the original modality; the greater the weight of a network parameter, the greater its influence on predicting the original modality.
  • a substantial adjustment of such a parameter would therefore greatly reduce the accuracy of the prediction model on information of the original modality. Hence, by calculating the regularization loss during retraining, the adjustment of at least some network parameters of the prediction model can be constrained. In some embodiments, the weights can be determined using EWC (Elastic Weight Consolidation) related methods known in the art.
  • multiple parameter value sets can be generated, each used to assign values to the prediction model; the prediction loss and regularization loss corresponding to each parameter value set can then be determined, and from these the target loss.
  • iterative training can be carried out to obtain a set of parameter values whose target loss meets the preset conditions, and the set of parameter values that meet the preset conditions is used as the set of target parameter values.
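As a sketch of how the per-parameter weights b_i might be obtained in the EWC-style approach mentioned above, one common choice (an assumption here, not the patent's prescription) is a diagonal Fisher estimate, i.e. the average squared gradient of the original-modality loss:

```python
import numpy as np

def fisher_weights(grads):
    """Diagonal Fisher estimate: average squared gradient of the original-modality
    loss over sample data. Larger values mean the parameter matters more for the
    original modality and should be constrained more strongly during retraining."""
    g = np.asarray(grads, dtype=float)  # shape (num_samples, num_params)
    return np.mean(g ** 2, axis=0)

# Hypothetical per-sample gradients for a 3-parameter model.
grads = np.array([[0.1, 2.0, 0.0],
                  [0.3, 1.0, 0.0]])
b = fisher_weights(grads)
print(np.round(b, 3))  # parameter 1 dominates, parameter 2 is unconstrained
```

Parameters whose gradients are consistently large for the original modality receive large b_i and are therefore held close to their post-initial-training values.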
  • Step S33 Adjust each network parameter of the prediction model to the corresponding candidate parameter value in the target parameter value set.
  • the prediction model can be retrained.
  • in this way, the adjustment of at least some network parameters of the prediction model can be constrained during retraining, so that the prediction model will not forget the original-modality information learned during initial training, while also becoming usable for predicting target-modality information, thus expanding the modal information that the prediction model can predict.
  • Figure 4 is a schematic flow chart of a fourth embodiment of the model training method of the present disclosure.
  • the model training method may also include steps S41 to S43.
  • Step S41 For each parameter value set, obtain at least one of the first sub-regular loss and the second sub-regular loss of the parameter value set.
  • the first sub-regular loss is obtained by using the weight of each network parameter to perform a weighted summation of the change representations of each network parameter corresponding to the parameter value set.
  • the following formula (1) can be used to calculate the first sub-regular loss L₁ corresponding to the parameter value set:

  L₁ = λ₁ Σ_{i∈I} b_i · (θ_i − θ*_i)²  (1)

  • where λ₁ is a hyperparameter, θ_i is the candidate parameter value of network parameter i, θ*_i is the reference parameter value of network parameter i, (θ_i − θ*_i)² is the change representation of network parameter i, I represents all network parameters of the prediction model, i represents a specific network parameter, and b_i represents the weight of network parameter i.
  • the second sub-regular loss is obtained by using the weight of each network parameter to perform a weighted sum of the change representation processing values of each network parameter corresponding to the parameter value set.
  • the change representation processing value of network parameters is obtained by using the change representation of network parameters and the change representation of initial training parameters.
  • the initial-training change representation of a network parameter is obtained using the difference between the reference parameter value of the network parameter and its initial parameter value.
  • the initial parameter values of the network parameters can be considered as the values determined after initialization of each network parameter of the prediction model during initial training.
  • the initial parameter values of the network parameters can also be considered as the values corresponding to each network parameter before the first initial training.
  • the following formula (2) can be used to calculate the second sub-regular loss L₂ corresponding to the parameter value set:

  L₂ = λ₂ Σ_{i∈I} b_i · P_i  (2)

  • where λ₂ is a hyperparameter, P_i is the change representation processing value of network parameter i, obtained from the change representation Δ_i of the parameter and its initial-training change representation, b_i represents the weight of network parameter i, and the meanings of the other symbols are the same as in formula (1).
  • Step S42 Use at least one of the first sub-regular loss and the second sub-regular loss of the parameter value set to obtain the regular loss of the parameter value set.
  • only one of the first sub-regular loss and the second sub-regular loss may be used as the regular loss of the parameter value set. In other embodiments, both the first sub-regular loss and the second sub-regular loss may be used to obtain the regular loss of the parameter value set.
  • the following formula (3) can be used to calculate the regularization loss L corresponding to the parameter value set.
  • L = L₁ + L₂  (3)
  • where L₁ is the first sub-regular loss and L₂ is the second sub-regular loss.
  • in some embodiments, the weight of a network parameter is positively correlated with the degree of influence of that parameter on predicting the original modality: the greater the influence of network parameter i on predicting the original-modality information, the greater b_i.
  • in this way, network parameters with larger weights contribute more to the regularization loss, so that during retraining changes to those parameters are minimized in order to reduce the loss, thereby constraining the network parameters with larger weights.
  • Step S43: Use the regularization loss corresponding to the parameter value set and the prediction loss corresponding to the parameter value set to obtain the target loss corresponding to the parameter value set.
  • the target loss corresponding to the parameter value set can be set to be positively correlated with both the regularization loss and the prediction loss corresponding to that set, so that changes in either loss are directly reflected in the target loss.
  • the following formula (4) can be used to calculate the target loss corresponding to the parameter value set.
  • L(θᵢ) represents the prediction loss corresponding to the parameter value set θᵢ
  • L′(θᵢ) represents the target loss corresponding to the parameter value set θᵢ.
  • the regularization loss corresponding to the parameter value set can thus be obtained.
  • the greater the weight of a network parameter, the greater the corresponding regularization loss, which in turn makes the target loss larger; as a result, during retraining, changes to network parameters with larger weights are minimized so as to reduce the change in the target loss, thereby constraining the network parameters with larger weights.
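As a minimal illustration of step S43 and the selection of the target parameter value set, the sketch below sums the prediction loss and the regularization loss (matching the additive relationship described above) and assumes the preset condition is simply "minimum target loss"; both choices are assumptions where the excerpt leaves details open.

```python
def target_loss(prediction_loss, regularization_loss):
    # the target loss is positively correlated with both components;
    # a plain sum is assumed here, consistent with the additive description above
    return prediction_loss + regularization_loss

def select_target_parameter_set(candidates):
    """candidates: iterable of (parameter_value_set, prediction_loss, reg_loss).

    Returns the parameter value set whose target loss meets the preset
    condition, assumed here to be the minimum target loss.
    """
    best = min(candidates, key=lambda c: target_loss(c[1], c[2]))
    return best[0]
```

With candidate sets scored this way, a set with a slightly worse prediction loss can still win if its regularization loss (i.e., its disturbance of influential parameters) is much smaller.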
  • retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
  • the above-mentioned "retraining the prediction model using preset sample data" specifically includes step S51 and step S52.
  • Step S51 Use the prediction model to predict the first sample data to obtain the first prediction result about the target modality.
  • Step S52 Adjust the target network parameters in the prediction model based on the first difference between the first prediction result and the first annotation information about the target modality of the first sample data.
  • the target network parameters include the first network parameters that are not adjusted during the initial training process, and the network parameters in the prediction model other than the target network parameters are not adjusted.
  • the first network parameters that are not adjusted during the initial training process can be considered network parameters that have no impact on the prediction model's prediction of the original modality's information.
  • the first network parameters can be parameters of network connections newly added during the retraining stage, such as the network parameters of a newly added convolution kernel, or network parameters that already exist in the initial training stage but were not adjusted during initial training.
  • the above-mentioned target network parameters also include second network parameters that have been adjusted by the prediction model during the initial training process.
  • the second network parameters may be a subset of the network parameters that were adjusted during the initial training process.
  • a part of the network parameters adjusted during initial training can also be used as target network parameters, so that the prediction model can better predict the information of the target modality without forgetting the original-modality information already learned during initial training.
  • the network parameters related to predicting the original modality's information in the prediction model will not be adjusted, so the prediction model does not forget the original-modality information learned during initial training.
  • this enables the prediction model to be used to predict both the original-modality information and the target-modality information, thereby expanding the modal information the prediction model can predict.
  • the model training method of the present disclosure before performing the above step of "retraining the prediction model using preset sample data", further includes: adding at least one network parameter to the prediction model as the first network parameter.
  • at least one network parameter may be added as the first network parameter by adding a new network connection.
  • the newly added network parameters can be dedicated to predicting the information of the target modality, while the original network parameters are used to predict the information of the original modality. Therefore, by adding at least one network parameter to the prediction model as the first network parameter and adjusting it during the retraining stage, the prediction model can predict both the original-modality information and the target-modality information.
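The constrained update described above — adjust only the target network parameters (for example, the newly added first network parameters) and leave all others untouched — can be sketched as a masked gradient step. The dict-based parameter store, parameter names, and learning rate below are illustrative assumptions, not the patent's implementation.

```python
def retrain_step(params, grads, target_names, lr=0.1):
    """Apply one gradient step only to the target network parameters.

    params       -- dict: parameter name -> current value
    grads        -- dict: parameter name -> gradient from the first difference
    target_names -- set of target network parameter names (e.g., newly added
                    first network parameters, plus any second network
                    parameters chosen for adjustment)
    Parameters outside target_names are left unchanged (frozen).
    """
    return {
        name: value - lr * grads.get(name, 0.0) if name in target_names else value
        for name, value in params.items()
    }
```

In a deep-learning framework the same effect is usually achieved by disabling gradients on the frozen parameters or by passing only the target parameters to the optimizer.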
  • the input sample data includes position information 101 and perspective information 102 of spatial points.
  • the prediction model 103 includes a feature extraction layer 1031 and several modal information output parts, including a modal information output part 1032, a modal information output part 1033, a modal information output part 1034 and a modal information output part 1035.
  • the modal information output parts 1032 and 1033 are configured to output view-dependent scene property information of spatial points, while the modal information output parts 1034 and 1035 are configured to output view-independent scene property information of spatial points.
  • therefore, the modal information output parts 1034 and 1035 can be set to decode the feature information extracted by the first five feature extraction layers 1031 to output the view-independent scene property information of the spatial points, and the perspective information 102 is fed as input to the sixth feature extraction layer for the subsequent modal information output parts 1032 and 1033, which output the view-dependent scene property information of the spatial points.
  • the position information 101 of the spatial point can also be used as input again in an intermediate feature extraction layer 1031 to improve the prediction accuracy of the prediction model.
  • the position information 101 of the spatial point is both the first sample data and the second sample data, and the perspective information 102 belongs to the second sample data.
  • the modal information output part 1035 is configured to output the scene property information of the target modality, and the modal information output part 1032, the modal information output part 1033, and the modal information output part 1034 are configured to output the scene property information of the original modality. Subsequently, the network parameters of the prediction model can be adjusted based on the output results of each modal information output part to achieve retraining of the prediction model.
  • the position information 101 and the perspective information 102 of the spatial point are the first sample data
  • the modality information output part 1032 is configured to output the scene property information of the target modality.
  • other modal information output parts will not output prediction results.
  • the prediction loss and regularization loss can be determined accordingly based on the prediction result of the scene property information of the target modality output by the modality information output part 1032, and then the network parameters of the prediction model can be adjusted to realize retraining of the prediction model.
  • the position information 101 and the perspective information 102 of the spatial point are the first sample data
  • the modal information output part 1032 is configured to output scene property information of the target modality, and the other modal information output parts do not output prediction results.
  • new network connections can also be added in the feature extraction layer 1031, such as added convolution kernels or activation layers, so that the newly added connections extract feature information for the target modality.
  • network parameters involved in the newly added network connection can also be determined as target network parameters. In this way, the prediction loss can be determined accordingly based on the prediction result of the scene property information of the target modality output by the modality information output part 1032, and then the target network parameters can be adjusted to achieve retraining of the prediction model.
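The layout described above — shared feature extraction layers 1031 feeding view-independent heads before the perspective input and view-dependent heads after it, with a new head added for the target modality — can be sketched as follows. The scalar "features" and the decode functions are placeholders for the actual network, and the class and method names are assumptions for illustration.

```python
class MultiModalPredictor:
    """Toy sketch of the shared-trunk, multi-head layout (shapes assumed)."""

    def __init__(self):
        self.heads = {}  # head name -> (view_dependent flag, decode function)

    def add_head(self, name, view_dependent, decode_fn):
        # adding a head introduces new (first) network parameters for a new modality
        self.heads[name] = (view_dependent, decode_fn)

    def predict(self, position, perspective):
        feat = sum(position)                 # stand-in for feature layers 1-5
        feat_view = feat + sum(perspective)  # stand-in for layer 6 with view input
        return {
            name: fn(feat_view if view_dep else feat)
            for name, (view_dep, fn) in self.heads.items()
        }
```

For example, a view-dependent "color" head reads the post-perspective features, while a view-independent "material" head reads the earlier trunk features.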
  • FIG. 7 is a schematic flowchart of an embodiment of a method for predicting modal information according to the present disclosure.
  • the modal information prediction method includes steps S61 to S63.
  • Step S61 Train the prediction model.
  • the above-mentioned model training method can be used to train the prediction model.
  • Step S62 Obtain target data.
  • the target data is, for example, position data of spatial points, pixels in an image, etc.
  • three-dimensional point cloud data can be used as target data, and the point cloud data can be obtained through three-dimensional reconstruction, three-dimensional modeling, and other methods.
  • Step S63 Use the trained prediction model to predict the target data and obtain information about at least one modality of the target data.
  • the information of at least one modality in this embodiment may include the information of the original modality or the information of the target modality in the process of retraining the prediction model.
  • the prediction model can be used to predict scene property information of at least one modality of a spatial point.
  • the scene property information is, for example, surface material.
  • the surface material information can be information of the original modality or information of the target modality.
  • the prediction model can predict both the original modal information and the target modal information.
  • the model training device 20 includes an acquisition part 21 and a retraining part 22.
  • the acquisition part 21 is configured to acquire a prediction model obtained through initial training, wherein the prediction model obtained through initial training is configured to predict information of the original modality;
  • the retraining part 22 is configured to retrain the prediction model using preset sample data, where the preset sample data includes first sample data annotated with information of the target modality, and the retrained prediction model is also configured to predict information of the target modality; wherein the preset sample data also includes second sample data annotated with information of the original modality, and/or, retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
  • the above-mentioned preset sample data includes first sample data and second sample data.
  • the first sample data is marked with first annotation information about the target modality
  • the second sample data is annotated with the second annotation information about the original modality.
  • the above-mentioned retraining part 22 is configured to retrain the prediction model using preset sample data by: using the prediction model to predict the first sample data and the second sample data respectively, correspondingly obtaining a first prediction result about the target modality and a second prediction result about the original modality; and using the first difference between the first prediction result and the first annotation information, and the second difference between the second prediction result and the second annotation information, to adjust the network parameters of the prediction model.
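A minimal sketch of training against both differences at once is given below; the squared-error form and the equal weighting of the two terms are assumptions, since the excerpt does not specify the loss function.

```python
def joint_prediction_loss(first_pred, first_labels, second_pred, second_labels):
    """Combine the first difference (target modality) and the second
    difference (original modality) into a single training loss."""
    first_diff = sum((p - l) ** 2 for p, l in zip(first_pred, first_labels))
    second_diff = sum((p - l) ** 2 for p, l in zip(second_pred, second_labels))
    return first_diff + second_diff
```

Minimizing this joint loss pulls the network toward the new target-modality labels while the original-modality term keeps it from drifting away from what it learned in initial training.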
  • the above-mentioned second sample data includes at least one of the following: collected original data, and derived data generated using a generative model.
  • the above-mentioned retraining of the prediction model includes constraining the adjustment of at least part of the network parameters of the prediction model; the retraining part 22 is configured to retrain the prediction model using preset sample data by: using the preset sample data to determine the prediction losses of the prediction model under different parameter value sets, where each parameter value set includes a set of candidate parameter values, one for each network parameter in the prediction model, and in the prediction model corresponding to a given parameter value set each network parameter takes its corresponding candidate value from that set; and taking the parameter value set that makes the target loss meet a preset condition as the target parameter value set, where the target loss corresponding to a parameter value set is obtained from that set's prediction loss and regularization loss.
  • the regularization loss corresponding to the parameter value set is obtained by combining the weight of each network parameter with the change representation of each network parameter.
  • the weight of a network parameter is related to the degree of influence of that parameter on predicting the original modality's information.
  • the change representation of a network parameter is based on the difference between the parameter's candidate value in the parameter value set and its reference value; each network parameter of the prediction model is then adjusted to its corresponding candidate value in the target parameter value set.
  • the above-mentioned retraining part 22 is configured to use the preset sample data to determine the prediction losses of the prediction model under different parameter value sets by: for each parameter value set, using the prediction model corresponding to that set to predict the first sample data, obtaining the first prediction result corresponding to that set; and using the first difference between that first prediction result and the first annotation information of the first sample data to obtain the prediction loss corresponding to that set;
  • the training device 20 of the model also includes a target loss determination part.
  • the target loss determination part is configured to: for each parameter value set, obtain at least one of a first sub-regularization loss and a second sub-regularization loss of that set, where the first sub-regularization loss is obtained by using the weight of each network parameter to perform a weighted sum of the change representations of the network parameters corresponding to the parameter value set, and the second sub-regularization loss is obtained by using the weight of each network parameter to perform a weighted sum of the change representation processing values of the network parameters corresponding to the parameter value set.
  • the change representation processing value of a network parameter is obtained from the parameter's change representation and its initial-training change representation.
  • the initial-training change representation of a network parameter is obtained from the difference between the parameter's reference value and its initial value; at least one of the first and second sub-regularization losses of the parameter value set is used to obtain the regularization loss of that set; and the regularization loss corresponding to the parameter value set and the prediction loss corresponding to the parameter value set are used to obtain the target loss corresponding to the parameter value set.
  • the target loss corresponding to the parameter value set is equal to the sum of the regularization loss and the prediction loss corresponding to that parameter value set.
  • the reference parameter value is the value of the network parameter after the initial training of the prediction model.
  • the above-mentioned retraining of the prediction model includes constraining the adjustment of at least some network parameters of the prediction model; the retraining part 22 is configured to retrain the prediction model using preset sample data by: using the prediction model to predict the first sample data to obtain a first prediction result about the target modality; and adjusting the target network parameters in the prediction model based on the first difference between the first prediction result and the first annotation information about the target modality of the first sample data,
  • wherein the target network parameters include first network parameters that were not adjusted during initial training, and network parameters in the prediction model other than the target network parameters are not adjusted.
  • the above-mentioned target network parameters also include second network parameters that have been adjusted by the prediction model during the initial training process.
  • the model training device 20 also includes a first network parameter determination part.
  • the first network parameter determination part is configured to add at least one network parameter to the prediction model as the first network parameter.
  • both the above-mentioned first sample data and second sample data include position data of spatial points; before retraining, the prediction model is configured to predict scene property information about the original modality of the spatial points, and after retraining, the prediction model is configured to predict scene property information about both the original modality and the target modality of the spatial points.
  • the above-mentioned spatial points are points on the surface of an object; and/or, the predicted scene property information is view-dependent, and both the first sample data and the second sample data also include perspective information of the spatial points; and/or, the predicted scene property information includes at least one of color, brightness, surface geometry, semantics, and surface material; and/or, the annotation information of the preset sample data is a reference image used to characterize the actual scene property information of the corresponding spatial points.
  • the model training device 20 also includes a difference comparison part; after the retraining part 22 obtains the prediction result of the prediction model for the preset sample data, the difference comparison part is configured to generate a predicted image based on the prediction result, the predicted image representing the predicted scene property information of the spatial points, and the difference between the predicted image and the reference image serves as the difference between the prediction result and the annotation information of the preset sample data.
  • the modal information prediction device 30 includes a first acquisition part 31, a second acquisition part 32 and a prediction part 33, wherein the first acquisition part 31 is configured to implement the training method mentioned above to train the prediction model to obtain the prediction model;
  • the second acquisition part 32 is configured to acquire target data;
  • the prediction part 33 is configured to predict the target data using the trained prediction model to obtain information about at least one modality of the target data.
  • the above-mentioned target data includes position data of spatial points; the information about at least one modality of the target data includes scene property information about at least one modality of the spatial points.
  • FIG. 10 is a schematic framework diagram of an embodiment of the electronic device of the present disclosure.
  • the electronic device 40 includes a memory 41 and a processor 42 coupled to each other.
  • the processor 42 is configured to execute program instructions stored in the memory 41 to implement the steps in any of the above model training method embodiments, or the steps in the modal information prediction method embodiments.
  • the electronic device 40 may include but is not limited to: a microcomputer and a server.
  • the electronic device 40 may also include mobile devices such as laptop computers and tablet computers, which are not limited here.
  • the processor 42 is configured to control itself and the memory 41 to implement the steps in the training method embodiment of any of the above models, or the steps in the modal information prediction method embodiment.
  • the processor 42 may also be called a CPU (Central Processing Unit).
  • the processor 42 may be an integrated circuit chip with signal processing capabilities.
  • the processor 42 can also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the processor 42 may be implemented by an integrated circuit chip.
  • FIG. 11 is a schematic diagram of a framework of an embodiment of a computer-readable storage medium of the present disclosure.
  • the computer-readable storage medium 50 stores program instructions 51 that can be run by the processor.
  • the program instructions 51 are used to implement the steps in the training method embodiments of any of the above models, or the steps in the modal information prediction method embodiments.
  • the computer-readable storage medium may be a tangible device that holds and stores instructions for use by an instruction execution device, and may be a volatile storage medium or a non-volatile storage medium.
  • the computer-readable storage medium may be, for example, but not limited to: an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • a non-exhaustive list of computer-readable storage media includes: portable computer disks, hard drives, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punched cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
  • the functions or modules of the device provided by the embodiments of the present disclosure can be used to execute the method described in the above method embodiments; for its specific implementation, refer to the description of the above method embodiments.
  • a "part" may be part of a circuit, part of a processor, part of a program or software, etc.; it may also be a unit, and it may be modular or non-modular.
  • the prediction model does not forget the original-modality information already learned during initial training.
  • the prediction model can additionally be used to predict target-modality information, thereby expanding the modal information that the prediction model can predict.
  • the disclosed methods and devices can be implemented in other ways.
  • the device implementation described above is only illustrative.
  • the division of modules or units is only a logical functional division; in actual implementation there may be other divisions, for example, units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • Integrated units may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
  • the technical solutions of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods in the various embodiments of the present disclosure.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.


Abstract

The present disclosure relates to a model training and modal information prediction method and device, an electronic device, a storage medium, and a computer program product. The model training method includes: acquiring a prediction model obtained through initial training, where the prediction model obtained through initial training is used to predict information of an original modality; and retraining the prediction model using preset sample data, where the preset sample data includes first sample data annotated with information of a target modality, and the retrained prediction model is further used to predict information of the target modality; where the preset sample data further includes second sample data annotated with information of the original modality, and/or, retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.

Description

Model training and modal information prediction method and device, electronic device, storage medium, and computer program product
Cross-Reference to Related Applications
The present disclosure is based on, and claims priority from, the Chinese patent application with application number 202210419003.3, filed on April 20, 2022, entitled "Model training and modal information prediction method, related device, equipment, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to, but is not limited to, the technical field of deep learning, and in particular to a model training and modal information prediction method and device, an electronic device, a storage medium, and a computer program product.
Background
The rapid development of deep learning has led to neural network models being widely applied in many aspects of daily life, for example, using neural networks to predict scene properties of spatial points, thereby extending the feature information of those points.
As neural networks are applied, the demands placed on them tend to grow, requiring a network to learn information of new modalities to keep meeting those demands. At present, when an already trained neural network model learns information of a new modality, catastrophic forgetting occurs: after learning the new modality, the network's prediction accuracy for previously learned modalities drops sharply.
Summary
The present disclosure provides at least a model training and modal information prediction method and device, an electronic device, a storage medium, and a computer program product.
A first aspect of the present disclosure provides a model training method, including: acquiring a prediction model obtained through initial training, where the prediction model obtained through initial training is used to predict information of an original modality; and retraining the prediction model using preset sample data, where the preset sample data includes first sample data annotated with information of a target modality, and the retrained prediction model is further used to predict information of the target modality; where the preset sample data further includes second sample data annotated with information of the original modality, and/or, retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
Therefore, by providing that the preset sample data further includes second sample data annotated with information of the original modality, and/or by constraining the adjustment of at least some network parameters of the prediction model, the prediction model does not forget the original-modality information already learned during initial training and can additionally be used to predict information of the target modality, thereby expanding the modal information the prediction model can predict.
A second aspect of the present disclosure provides a modal information prediction method, including: obtaining a prediction model trained using the method of the first aspect; acquiring target data; and using the trained prediction model to predict the target data, obtaining information about at least one modality of the target data.
Therefore, by training the prediction model with the model training method described in the first aspect, the prediction model can predict both information of the original modality and information of the target modality.
A third aspect of the present disclosure provides a model training device, including an acquisition part and a retraining part. The acquisition part is configured to acquire a prediction model obtained through initial training, where the prediction model obtained through initial training is used to predict information of an original modality. The retraining part is configured to retrain the prediction model using preset sample data, where the preset sample data includes first sample data annotated with information of a target modality, and the retrained prediction model is further used to predict information of the target modality; where the preset sample data further includes second sample data annotated with information of the original modality, and/or, retraining the prediction model includes constraining the adjustment of at least some network parameters of the prediction model.
A fourth aspect of the present disclosure provides a modal information prediction device, including a first acquisition part, a second acquisition part, and a prediction part. The first acquisition part is configured to obtain a prediction model trained using the method described in the first aspect; the second acquisition part is configured to acquire target data; and the prediction part is configured to use the trained prediction model to predict the target data, obtaining information about at least one modality of the target data.
A fifth aspect of the present disclosure provides an electronic device including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the model training method of the first aspect, or the modal information prediction method described in the second aspect.
A sixth aspect of the present disclosure provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the model training method of the first aspect, or the modal information prediction method described in the second aspect.
A seventh aspect of the present disclosure provides a computer program product including a computer program or instructions which, when run on a computer, cause the computer to execute the model training method of the first aspect, or the modal information prediction method described in the second aspect.
In the above solutions, by providing that the preset sample data further includes second sample data annotated with information of the original modality, and/or by constraining the adjustment of at least some network parameters of the prediction model, the prediction model does not forget the original-modality information already learned during initial training and can be used to predict information of the target modality, thereby expanding the modal information the prediction model can predict.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
FIG. 1 is a schematic flowchart of a first embodiment of the model training method of the present disclosure;
FIG. 2 is a schematic flowchart of a second embodiment of the model training method of the present disclosure;
FIG. 3 is a schematic flowchart of a third embodiment of the model training method of the present disclosure;
FIG. 4 is a schematic flowchart of a fourth embodiment of the model training method of the present disclosure;
FIG. 5 is a schematic flowchart of a fifth embodiment of the model training method of the present disclosure;
FIG. 6 is a schematic flowchart of the retraining flow in the model training method of the present disclosure;
FIG. 7 is a schematic flowchart of an embodiment of the modal information prediction method of the present disclosure;
FIG. 8 is a schematic framework diagram of an embodiment of the model training device of the present disclosure;
FIG. 9 is a schematic framework diagram of an embodiment of the modal information prediction device of the present disclosure;
FIG. 10 is a schematic framework diagram of an embodiment of the electronic device of the present disclosure;
FIG. 11 is a schematic framework diagram of an embodiment of the computer-readable storage medium of the present disclosure.
Detailed Description
The solutions of the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the present disclosure.
Herein, the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Furthermore, "multiple" herein means two or more than two. The term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the model training method of the present disclosure. Specifically, the method may include the following steps:
Step S11: Acquire a prediction model obtained through initial training.
In this embodiment, the prediction model obtained through initial training is used to predict information of an original modality. The original modality may include one modality, or two or more modalities. In some embodiments, one source or one form of expression of information can constitute a modality; for example, information of different modalities includes speech information, video information, text information, and so on. As another example, for a spatial point, information about different scene properties of that point is also information of different modalities.
The initial training of the prediction model may consist of training the prediction model with sample data containing information of the original modality, so that the prediction model can be used to predict information of the original modality.
步骤S12:利用预设样本数据对预测模型进行重训练,预设样本数据包括标注有目标模态的信息的第一样本数据,经重训练的预测模型还用于预测得到目标模态 的信息。
在本实施例中,预设样本数据还包括标注有原始模态的信息的第二样本数据,和/或,所述对预测模型进行重训练包括对预测模型的至少部分网络参数的调整进行约束。在一些实施方式中,可以在预测模型中新增加一个输出部分,用于输出目标模态的信息。
在一些实施方式中,上述的第一样本数据和第二样本数据均包括空间点的位置数据。空间点的位置数据例如是空间点的三维坐标。空间点例如是通过三维重建、或者是三维建模等方法得到的空间点,本公开不对空间点的获取方式进行限制。
在本实施例中,在预设样本数据还包括标注有原始模态的信息的第二样本数据的情况下,利用预设样本数据对预测模型进行重训练,即是将第一样本数据和第二样本数据输入至预测模型中,由预测模型输出预测结果,预测结果包括原始模态的信息和目标模态的信息。例如,对于点云数据中的空间点的场景性质,可以为空间三维点标注有原始模态的场景性质信息和目标模态的场景性质信息,此时的空间点既是第一样本数据,也是第二样本数据。然后可以将空间点输入至预测模型,进而得到预测模型对空间点进行预测后输出的关于空间点的原始模态的场景性质信息和目标模态的场景性质信息。通过设置设样本数据还包括标注有原始模态的信息的第二样本数据,可以在预测模型利用第一样本数据学习目标模态的信息的同时,利用第二样本数据使得预测模型不会忘记初始训练中已经学习到的原始模态的信息。
在本实施例中,对预测模型进行重训练包括对预测模型的至少部分网络参数的调整进行约束。对预测模型的至少部分网络参数的调整进行约束,可以理解为限制预测模型的至少部分网络参数的调整,限制调整例如是该部分的网络参数不能修改,或者是不能对该部分网络参数进行大幅度调整,不能大幅度调整例如可以理解为调整前后的网络参数的差值需小于预设阈值等。通过对预测模型的至少部分网络参数的调整进行约束,使得在利用预设样本数据对预测模型进行重训练时,能够减少对已有的网络参数的影响,使得预测模型的网络参数依然能够与原始模态的信息有准确的映射,从而使得预测模型不会忘记初始训练中已经学习到的原始模态的信息。
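上述“限制网络参数大幅度调整”的约束,可以用如下代码示意(其中学习率、阈值等均为假设取值,并非本公开限定的具体实现):

```python
import numpy as np

def constrained_update(params, grads, lr=0.01, max_delta=0.05):
    """对网络参数做受约束的更新:单步调整量被裁剪到预设阈值内(示意)。"""
    new_params = {}
    for name, value in params.items():
        delta = -lr * grads[name]
        # 约束:调整前后参数差值不超过预设阈值 max_delta
        delta = np.clip(delta, -max_delta, max_delta)
        new_params[name] = value + delta
    return new_params

params = {"w": np.array([1.0, -0.5])}
grads = {"w": np.array([10.0, -0.1])}
updated = constrained_update(params, grads)
```

其中梯度较大的参数分量(第一个分量)的调整量被裁剪到阈值,体现了“调整前后差值小于预设阈值”这一约束方式。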
因此,通过设置预设样本数据还包括标注有原始模态的信息的第二样本数据,和/或,对预测模型的至少部分网络参数的调整进行约束,可以使得预测模型不会忘记初始训练中已经学习到的原始模态的信息,而且还使得预测模型能够用于预测得到目标模态的信息,以此实现了预测模型能够预测的模态信息的拓展。
在一个实施例中,在重训练之前,预测模型用于预测得到关于空间点的原始模态的场景性质信息,在重训练之后,预测模型用于预测得到关于空间点的原始模态和目标模态的预测场景性质信息。空间点的原始模态的场景性质信息例如是颜色和几何信息,空间点的目标模态的场景性质信息例如是物体材质信息、表面反射特性信息等等。在一些实施例中,通过对预测模型进行初始训练,使得预测模型能够预测得到原始模态的预测场景性质信息。通过对预测模型进行重训练,使得预测模型能够用于预测得到关于空间点的原始模态和目标模态的预测场景性质信息。因此,通过设置预测模型在重训练之前能够用于预测得到关于空间点的原始模态的场景性质信息,在重训练之后能够用于预测得到关于空间点的原始模态和目标模态的预测场景性质信息,以此实现了预测模型能够预测的模态信息的拓展。
在一些实施例中,预测模型可以是基于神经网络隐式场景表示(Implicit Neural Scene Representation)搭建的网络模型,用于预测空间点的场景性质信息。
在一些实施例中,空间点为物体表面上的点。物体可以是通过三维重建得到的现实中的真实物体,或者是通过三维建模得到的虚拟物体。通过将空间点设置为物体表面上的点,以此能够利用预测模型预测得到物体表面上的点的原始模态的信息和目标模态的信息。
在一些实施例中,预测场景性质信息包括颜色、亮度、表面几何、语义和表面材质中的至少一种。在一些实施例中,上述的预测场景性质信息与视角相关,与视角相关的场景性质信息例如是颜色信息、亮度信息等等。第一样本数据和第二样本数据还均包括空间点的视角信息,即预测模型能够利用第一样本数据和第二样本数据的视角信息进行预测,以此得到与视角相关的预测场景性质信息。视角信息例如是倾斜角θ、偏北角等。在一些实施例中,可以对视角信息进行特征变换如傅里叶变换,以此使得视角信息能够为预测模型所利用。因此,通过设置预测场景性质信息与视角相关,使得预测模型能够预测得到与视角相关的预测场景性质信息。
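对视角信息进行傅里叶变换类的特征变换,可以参考如下示意代码(频率个数等超参数为假设取值):

```python
import numpy as np

def fourier_encode(angles, num_freqs=4):
    """对视角信息(如倾斜角、偏北角)做正余弦特征变换(示意)。
    输入 shape 为 (N, D),输出 shape 为 (N, 2 * num_freqs * D)。"""
    angles = np.asarray(angles, dtype=np.float64)
    feats = []
    for k in range(num_freqs):
        freq = 2.0 ** k  # 按 2 的幂递增的频率
        feats.append(np.sin(freq * angles))
        feats.append(np.cos(freq * angles))
    return np.concatenate(feats, axis=-1)

view = np.array([[0.3, 1.2]])  # (倾斜角, 偏北角),弧度制
encoded = fourier_encode(view)
```

变换后的高维特征可与空间点的位置特征一同输入预测模型,使视角信息为网络所利用。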
在一些实施例中,预设样本数据的标注信息为参考图像,参考图像用于表征对应空间点的实际场景性质信息,即空间点在参考图像上的投影点的图像信息能够表征空间点的实际场景性质信息。在本实施例中,在得到预测模型对预设样本数据的预测结果之后,还可以继续执行步骤:基于预测结果生成预测图像,预测图像能够表征空间点的预测场景性质信息。具体地,可以利用与参考图像相同的视角信息进行渲染,以此实现基于预测结果生成预测图像,也即,预测图像和参考图像的视角信息相同。此时,预测图像与参考图像之间的差异作为预测结果与预设样本数据的标注信息之间的差异,具体可以体现为预测图像与参考图像之间的图像信息差异,例如是颜色差异、亮度差异等等。因此,通过将预设样本数据的标注信息确定为参考图像,并基于预测结果生成预测图像,进而将预测图像与参考图像之间的差异作为预测结果与预设样本数据的标注信息之间的差异,以此能够利用参考图像实现对预测模型的训练。
请参阅图2,图2是本公开模型训练方法第二实施例的流程示意图。在本实施例中,预设样本数据包括第一样本数据和第二样本数据,第一样本数据标注有关于目标模态的第一标注信息,第二样本数据标注有关于原始模态的第二标注信息。在此情况下,上述提及的步骤“利用预设样本数据对预测模型进行重训练”具体包括步骤S21和步骤S22。
步骤S21:利用预测模型分别对第一样本数据、第二样本数据进行预测,对应得到关于目标模态的第一预测结果和关于原始模态的第二预测结果。
在本实施例中,可以将第一样本数据、第二样本数据分别输入至预测模型中,以利用预测模型对应得到关于目标模态的第一预测结果和关于原始模态的第二预测结果。例如,第一样本数据和第二样本数据为同一张图片,该图片的目标模态的第一标注信息为亮度,该图片的原始模态的第二标注信息为颜色,则预测模型对应得到关于目标模态的第一预测结果为预测的亮度,原始模态的第二预测结果为预测的颜色。又如,第一样本数据和第二样本数据为同一空间点,该空间点的目标模态的第一标注信息为表面材质和几何信息,原始模态的第二标注信息为语义信息,则预测模型对应得到关于目标模态的第一预测结果为空间点的表面材质和几何信息,原始模态的第二预测结果为预测的语义信息。
在一些实施方式中,第二样本数据包括以下至少之一:采集得到的原始数据和利用生成模型生成的衍生数据。采集得到的原始数据例如是通过基于采集的二维图像进行三维重建后得到的空间点。生成模型例如是专门用于生成第二样本数据的神经网络模型,例如,通过将某一空间点输入至生成模型,生成模型可以对应地为该空间点生成对应的第二标注信息。因此,通过采集得到的原始数据和利用生成模型生成的衍生数据,以此可以利用第二样本数据对预测模型进行重训练,进而使得预测模型能够实现预测的模态信息的拓展。
步骤S22:利用第一预测结果与第一标注信息之间的第一差异、第二预测结果与第二标注信息之间的第二差异,调整预测模型的网络参数。
在得到第一预测结果和第二预测结果以后,即可通过比较第一预测结果与第一标注信息得到第一差异,比较第二预测结果与第二标注信息得到第二差异,以此确定预测模型对目标模态的信息和原始模态的信息的预测准确度,并以此来调整预测模型的网络参数,进而实现对预测模型的重训练。
因此,通过利用预测模型分别对第一样本数据、第二样本数据进行预测,并基于第一预测结果与第一标注信息之间的第一差异和第二预测结果与第二标注信息之间的第二差异来调整预测模型的网络参数,使得预测模型在利用第一样本数据学习目标模态的信息的同时,也能利用第二样本数据使得预测模型不会忘记初始训练中已经学习到的原始模态的信息,以此实现了预测模型能够预测的模态信息的拓展。
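利用第一差异和第二差异共同调整网络参数的过程,可以用如下极简示意理解(线性“模型”、两个输出分支以及损失权重均为假设,仅用于说明两项差异共同构成训练目标):

```python
import numpy as np

def joint_loss(w, x, y_target, y_orig, alpha=1.0, beta=1.0):
    """第一差异(目标模态)与第二差异(原始模态)加权求和(示意)。
    两个"模态"各由一行线性输出模拟。"""
    pred_target = w[0] * x   # 目标模态输出分支的第一预测结果
    pred_orig = w[1] * x     # 原始模态输出分支的第二预测结果
    d1 = np.mean((pred_target - y_target) ** 2)  # 第一差异
    d2 = np.mean((pred_orig - y_orig) ** 2)      # 第二差异
    return alpha * d1 + beta * d2

x = np.array([1.0, 2.0])
w = np.array([0.0, 1.0])
loss = joint_loss(w, x, y_target=2 * x, y_orig=x)
```

对该联合损失求梯度并更新参数,即同时利用两类标注信息进行重训练。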
请参阅图3,图3是本公开模型训练方法第三实施例的流程示意图。在本实施例中,对预测模型进行重训练包括对预测模型的至少部分网络参数的调整进行约束。在此情况下,上述提及的步骤“利用预设样本数据对预测模型进行重训练”具体包括步骤S31至步骤S33。
步骤S31:利用预设样本数据,分别确定不同参数值集合对应的预测模型的预测损失。
在本实施例中,每个参数值集合包括预测模型中各网络参数对应的一组候选参数值,参数值集合对应的预测模型的各网络参数赋值为参数值集合中对应的候选参数值。
在一些实施例中,参数值集合可以是在重训练中的不同阶段确定的。例如,可以首先确定第一参数值集合θ1,然后基于第一参数值集合θ1对应的训练情况,来确定第二参数值集合θ2,以此实现迭代训练。
步骤S32:将使目标损失满足预设条件的参数值集合,作为目标参数值集合。
在本实施例中,参数值集合对应的目标损失是利用参数值集合对应的预测损失和正则损失得到的。
在一些实施方式中,预测损失可以是基于预测模型的预测结果和标注信息的差异确定的。
在一些实施方式中,上述的步骤“利用预设样本数据,分别确定不同参数值集合对应的预测模型的预测损失”具体包括步骤S311和步骤S312(图未示)。
步骤S311:对于各参数值集合,利用参数值集合对应的预测模型对第一样本数据进行预测,得到参数值集合对应的第一预测结果。
步骤S312:利用参数值集合对应的第一预测结果与第一样本数据的第一标注信息之间的第一差异,得到参数值集合对应的预测损失。
在确定参数值集合并将参数值集合赋值于预测模型以后,可以利用预测模型对第一样本数据进行预测,得到参数值集合对应的第一预测结果。
通过参数值集合对应的第一预测结果与第一样本数据的第一标注信息之间的第一差异,可以确定预测模型的预测准确度,以此衡量参数值集合的准确度,进而能够相应确定参数值集合对应的预测损失。
因此,通过利用参数值集合对应的预测模型对第一样本数据进行预测得到第一预测结果,进而能够基于第一预测结果与第一样本数据的第一标注信息之间的第一差异得到参数值集合对应的预测损失,实现对参数值集合的准确度的衡量。
在一些实施方式中,参数值集合对应的正则损失是结合各网络参数的权重以及各网络参数的变化表征得到的。示例性地,参数值集合对应的正则损失是利用各网络参数的权重对各网络参数的变化表征进行加权得到的。具体地,网络参数的变化表征是基于网络参数在参数值集合中对应的候选参数值与参考参数值之间的差异得到的。候选参数值与参考参数值之间的差异例如是候选参数值与参考参数值之间的差值,或者是差值的平方等。
参考参数值可以根据需要进行设置。在一些实施例中,可以将参考参数值设置为经过初始训练后的参数值。通过将参考参数值设置为预测模型经初始训练后的网络参数的值,以此可以通过确定正则损失反映出重训练阶段网络参数与经过初始训练后的参数值的偏离程度。在另一些实施例中,可以将参考参数值设置为在重训练过程中使用过的参数值。在本实施方式中,网络参数的权重与网络参数对预测原始模态的信息的影响程度相关。例如,某一网络参数的权重越大,则表明该网络参数对预测原始模态的信息的影响程度越大,改变该网络参数会对预测模型预测原始模态的信息的准确度产生越大的影响,如使得预测模型预测原始模态的信息的准确度大幅下降。因此,通过在重训练时计算正则损失,以此可以对预测模型的至少部分网络参数的调整进行约束。在一些实施例中,网络参数的权重可以利用本领域中与EWC(Elastic Weight Consolidation,弹性权重巩固)相关的方法确定。
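作为示意,EWC类方法中网络参数权重bi的一种常见估计方式,是利用初始训练数据上损失梯度平方的均值(即Fisher信息矩阵的对角近似)。以下代码仅为在一个假设的线性模型与平方损失上的极简示意:

```python
import numpy as np

def ewc_weights(w, xs, ys):
    """以梯度平方的样本均值近似 Fisher 对角项,作为各参数的权重 b_i(示意)。"""
    grads_sq = np.zeros_like(w)
    for x, y in zip(xs, ys):
        pred = np.dot(w, x)
        # 平方损失 (pred - y)^2 对 w 的梯度: 2 * (pred - y) * x
        g = 2.0 * (pred - y) * x
        grads_sq += g ** 2
    return grads_sq / len(xs)

w = np.array([1.0, 0.0])
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
ys = [2.0, 0.0]
b = ewc_weights(w, xs, ys)
```

梯度平方越大的参数,对原始任务的损失越敏感,对应的权重bi越大,在重训练中受到的约束也就越强。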
在一些实施例中,在重训练的过程中,可以产生多个参数值集合,以此可以利用多个参数值集合对预测模型进行赋值,并相应地确定与每一个参数值集合对应的预测损失和正则损失,进而能够确定目标损失。在此基础上便能进行迭代训练,得到目标损失满足预设条件的参数值集合,并将满足预设条件的参数值集合作为目标参数值集合。
步骤S33:将预测模型的各网络参数调整为目标参数值集合中对应的候选参数值。
通过将预测模型的各网络参数调整为目标参数值集合中对应的候选参数值,以此能够完成对预测模型的重训练。
因此,通过利用参数值集合对应的预测损失和正则损失得到目标损失,以此能够在重训练的过程中对预测模型的至少部分网络参数的调整进行约束,使得预测模型不会忘记初始训练中已经学习到的原始模态的信息,而且还使得预测模型能够用于预测得到目标模态的信息,以此实现了预测模型能够预测的模态信息的拓展。
请参阅图4,图4是本公开模型训练方法第四实施例的流程示意图。在本实施例中,在执行上述的步骤“将使目标损失满足预设条件的参数值集合,作为目标参数值集合”之前,模型训练方法还可以包括步骤S41至步骤S43。
步骤S41:对于各参数值集合,获取参数值集合的第一子正则损失和第二子正则损失中的至少之一。
在本实施例中,第一子正则损失是利用各网络参数的权重对参数值集合对应的各网络参数的变化表征进行加权求和得到的。
在一些实施例中,可以利用以下公式(1)计算参数值集合对应的第一子正则损失L1:
L1 = λ1·∑i∈I bi·(Δθi)²    (1)
其中,λ1为超参数,Δθi=θi−θi*为网络参数的变化表征,θi*为参考参数值,θi为候选参数值,I表示预测模型的所有参数,i表示预测模型的一具体参数,bi表示网络参数i的权重。
在本实施例中,第二子正则损失是利用各网络参数的权重对参数值集合对应的各网络参数的变化表征处理值进行加权求和得到的。网络参数的变化表征处理值是利用网络参数的变化表征和初始训练参数变化表征得到的,网络参数的初始训练参数变化表征是利用各网络参数的参考参数值与网络参数的初始参数值的差异得到的。例如,网络参数的初始参数值可以认为是在初始训练时对预测模型的各网络参数进行初始化后所确定的值。又如,网络参数的初始参数值还可以认为是在第一次进行初始训练之前各网络参数对应的值。
在一些实施例中,可以利用以下公式(2)计算参数值集合对应的第二子正则损失L2:
L2 = λ2·∑i∈I bi·(Δθi/Δθi*)²    (2)
其中,λ2为超参数,Δθi为网络参数的变化表征,Δθi*为网络参数的初始训练参数变化表征,bi表示网络参数i的权重,其余参数的意义与公式(1)相同。
步骤S42:利用参数值集合的第一子正则损失和第二子正则损失的至少之一,得到参数值集合的正则损失。
在一些实施例中,可以仅将第一子正则损失和第二子正则损失的其中之一,作为参数值集合的正则损失。在另一些实施例中,也可以利用第一子正则损失和第二子正则损失二者得到参数值集合的正则损失。
在一些实施例中,可以利用以下公式(3)计算参数值集合对应的正则损失L。
L=L1+L2   (3)
其中,L1为第一子正则损失,L2为第二子正则损失。
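公式(1)至公式(3)的计算过程可以用如下代码示意(其中λ1、λ2的取值,以及将变化表征处理值取为Δθi与Δθi*之比,均为示意性假设):

```python
import numpy as np

def regularization_loss(theta, theta_ref, theta_init, b, lam1=1.0, lam2=1.0):
    """依据各网络参数的权重 b_i 计算第一、第二子正则损失并求和(示意)。"""
    delta = theta - theta_ref            # 网络参数的变化表征
    delta_init = theta_ref - theta_init  # 初始训练参数变化表征
    l1 = lam1 * np.sum(b * delta ** 2)   # 第一子正则损失
    # 变化表征处理值:此处假设为变化表征与初始训练参数变化表征之比
    l2 = lam2 * np.sum(b * (delta / delta_init) ** 2)  # 第二子正则损失
    return l1 + l2                       # 正则损失 L = L1 + L2

theta = np.array([1.2, 0.5])       # 候选参数值
theta_ref = np.array([1.0, 0.5])   # 参考参数值
theta_init = np.array([0.0, 0.0])  # 初始参数值
b = np.array([2.0, 1.0])           # 各网络参数的权重
reg = regularization_loss(theta, theta_ref, theta_init, b)
```

得到的正则损失再与预测损失相加即可作为目标损失参与迭代训练。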
在一些实施例中,上述的网络参数的参数权重与网络参数对预测原始模态的信息的影响程度存在正相关关系。对应于公式(1)至公式(3),则是网络参数i对预测原始模态的信息的影响程度越大,则bi越大。因此,通过设置网络参数的参数权重与网络参数对预测原始模态的信息的影响程度存在正相关关系,可以使得参数权重越大的网络参数,其调整对正则损失的影响就越大,以此使得重训练的过程中,会尽量降低参数权重较大的网络参数的变化以减小正则损失,并通过调整其他网络参数来减小预测损失,实现对参数权重较大的网络参数的约束。
步骤S43:利用参数值集合对应的正则损失与参数值集合对应的预测损失,得到参数值集合对应的目标损失。
在一些实施例中,可以设置参数值集合对应的目标损失与参数值集合对应的正则损失、预测损失均存在正相关关系,这样,正则损失和预测损失的变化能够通过目标损失直接反映。
在一些实施例中,可以利用以下公式(4)计算参数值集合对应的目标损失:
L′(θi) = L(θi) + Lreg(θi)    (4)
其中,L(θi)表示参数值集合θi对应的预测损失,Lreg(θi)表示参数值集合θi对应的正则损失(即公式(3)中的L),L′(θi)表示参数值集合θi对应的目标损失。
因此,通过利用各网络参数的权重对各网络参数的变化表征进行加权求和,以得到参数值集合对应的正则损失,可以使得网络参数的权重越大,其变化对正则损失的影响就越大,进而对目标损失的影响也越大,以此使得重训练的过程中会尽量降低参数权重较大的网络参数的变化,以减小目标损失来实现重训练,从而实现对参数权重较大的网络参数的约束。
请参阅图5,图5是本公开模型训练方法第五实施例的流程示意图。在本实施例中,对预测模型进行重训练包括对预测模型的至少部分网络参数的调整进行约束。在此情况下,上述的“利用预设样本数据对预测模型进行重训练”具体包括步骤S51和步骤S52。
步骤S51:利用预测模型对第一样本数据进行预测,得到关于目标模态的第一预测结果。
关于本步骤的详细描述,请参阅上述实施例的相关描述。
步骤S52:基于第一预测结果与第一样本数据关于目标模态的第一标注信息之间的第一差异,调整预测模型中的目标网络参数。
在本实施例中,目标网络参数包括在初始训练过程中未调整的第一网络参数,并且预测模型中除目标网络参数以外的网络参数不调整。在本实施例中,初始训练过程中未调整的第一网络参数可以认为是对预测模型预测原始模态的信息没有影响的网络参数,具体地,第一网络参数可以是在重训练阶段新增加的网络连接的参数,如新增的卷积核的网络参数,也可以是在初始训练阶段就已存在,但是在初始训练阶段未调整的网络参数。通过设置为仅调整目标网络参数,以此可以对预测模型的至少部分网络参数的调整进行约束,使得被约束的网络参数不会发生变化。
在一些实施方式中,上述的目标网络参数还包括预测模型在初始训练过程中已调整的第二网络参数,第二网络参数可以是初始训练过程中已调整的网络参数中的一部分。在本实施方式中,可以将初始训练过程中已调整的一部分网络参数也作为目标网络参数,使得预测模型在不会忘记初始训练中已经学习到的原始模态的信息的情况下,可以更好地预测目标模态的信息。
因此,通过仅调整目标网络参数,可以使得预测模型中与预测原始模态的信息相关的网络参数不会被调整,使得预测模型不会忘记初始训练中已经学习到的原始模态的信息,使得预测模型能够用于预测得到原始模态的信息和目标模态的信息,以此实现了预测模型能够预测的模态信息的拓展。
在一个实施例中,在执行上述的步骤“利用预设样本数据对预测模型进行重训练”之前,本公开的模型训练方法还包括:为预测模型新增至少一个网络参数作为第一网络参数。在本实施例,具体可以是通过新增加网络连接的方式来新增至少一个网络参数作为第一网络参数。具体地,新增加的网络参数可以专门用于预测目标模态的信息,原有的网络参数用于预测原始模态的信息。因此,通过为预测模型新增至少一个网络参数作为第一网络参数,并在重训练阶段调整第一网络参数,以此使得预测模型在重训练后既能预测原始模态的信息,也能预测目标模态的信息。
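仅调整目标网络参数(例如新增的第一网络参数)而冻结其余网络参数的做法,可以用如下代码示意(参数名与模型结构均为假设):

```python
import numpy as np

def masked_update(params, grads, trainable, lr=0.1):
    """仅更新 trainable 集合中的目标网络参数,其余网络参数保持不变(示意)。"""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }

params = {
    "backbone": np.array([1.0]),   # 初始训练已学到的参数,重训练时冻结
    "new_head": np.array([0.0]),   # 重训练阶段新增的第一网络参数
}
grads = {"backbone": np.array([5.0]), "new_head": np.array([1.0])}
params = masked_update(params, grads, trainable={"new_head"})
```

在深度学习框架中,等价做法通常是将被冻结参数排除在优化器之外或关闭其梯度计算。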
请参阅图6,图6是本公开模型训练方法中重训练流程的流程示意图。在图6中,输入的样本数据包括空间点的位置信息101和视角信息102。预测模型103包括特征提取层1031和若干模态信息输出部分:模态信息输出部分1032、模态信息输出部分1033、模态信息输出部分1034和模态信息输出部分1035。
在本实施例中,模态信息输出部分1032和模态信息输出部分1033被配置为输出与视角相关的空间点的场景性质信息,模态信息输出部分1034和模态信息输出部分1035被配置为输出与视角无关的空间点的场景性质信息。因此,可以设置模态信息输出部分1034和模态信息输出部分1035利用前5个特征提取层1031提取的特征信息进行解码,以输出与视角无关的空间点的场景性质信息;并将视角信息102作为第6个特征提取层的输入,供后续的模态信息输出部分1032和模态信息输出部分1033输出与视角相关的空间点的场景性质信息。另外,还可以在某一中间特征提取层1031再次将空间点的位置信息101作为输入,以提高预测模型的预测准确度。
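图6所示“位置信息经前5个特征提取层提取特征、视角信息在第6层注入、不同模态信息输出部分分别解码”的结构,可以用如下前向传播代码示意(层数、维度与随机权重均为假设,仅体现数据流向):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(pos, view, hidden=8):
    """示意性前向:前5层仅用位置特征;视角无关的输出部分从第5层特征解码;
    第6层拼入视角信息后,视角相关的输出部分再行解码。"""
    w_pos = [rng.normal(size=(pos.shape[-1] if i == 0 else hidden, hidden))
             for i in range(5)]
    h = pos
    for w in w_pos:
        h = np.maximum(h @ w, 0.0)  # ReLU 特征提取层
    out_view_free = h @ rng.normal(size=(hidden, 1))  # 视角无关模态(如表面材质)
    h_view = np.concatenate([h, view], axis=-1)       # 第6层注入视角信息
    w6 = rng.normal(size=(h_view.shape[-1], hidden))
    h_view = np.maximum(h_view @ w6, 0.0)
    out_view_dep = h_view @ rng.normal(size=(hidden, 1))  # 视角相关模态(如颜色)
    return out_view_free, out_view_dep

pos = rng.normal(size=(4, 3))   # 空间点三维坐标
view = rng.normal(size=(4, 2))  # 视角信息
material, color = mlp_forward(pos, view)
```

实际实现中各层权重由训练得到,此处的随机权重仅用于演示两类输出部分分别从不同层级的特征解码。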
在一些实施方式中,空间点的位置信息101既是第一样本数据,也是第二样本数据,视角信息102属于第二样本数据。模态信息输出部分1035被配置为输出目标模态的场景性质信息,模态信息输出部分1032、模态信息输出部分1033和模态信息输出部分1034被配置为输出原始模态的场景性质信息。后续便可依据各个模态信息输出部分的输出结果,调整预测模型的网络参数,以此实现预测模型的重训练。
在一些实施方式中,空间点的位置信息101和视角信息102为第一样本数据,模态信息输出部分1032被配置为输出目标模态的场景性质信息。此时,其他的模态信息输出部分不会输出预测结果。以此,能够相应地根据模态信息输出部分1032输出的目标模态的场景性质信息的预测结果,确定预测损失和正则损失,进而调整预测模型的网络参数,实现预测模型的重训练。
在一些实施方式中,空间点的位置信息101和视角信息102为第一样本数据,模态信息输出部分1032被配置为输出目标模态的场景性质信息,其他的模态信息输出部分不会输出预测结果。此外,还可以在特征提取层1031中新增加网络连接,如增加卷积核、激活层等,以利用新增加的网络连接来提取目标模态的信息的特征信息。并且,还可以将新增加的网络连接涉及的网络参数确定为目标网络参数。以此,便能够相应地根据模态信息输出部分1032输出的目标模态的场景性质信息的预测结果来确定预测损失,进而调整目标网络参数,实现预测模型的重训练。
请参阅图7,图7是本公开模态信息的预测方法实施例的流程示意图。在本实施例中,模态信息的预测方法包括步骤S61至步骤S63。
步骤S61:训练得到预测模型。
在本实施例中,可以利用上述的模型训练方法来训练预测模型,以实现对预测模型的训练。
步骤S62:获取目标数据。
目标数据例如是空间点的位置数据、图像中的像素点等。例如可以将三维点云数据作为目标数据,点云数据可以是通过三维重建、三维建模等方法得到的。
步骤S63:利用经训练得到的预测模型对目标数据进行预测,得到关于目标数据的至少一种模态的信息。
本实施例中的至少一种模态的信息可以包括预测模型重训练过程中的原始模态的信息或者是目标模态的信息。例如,可以利用预测模型预测空间点的至少一个模态的场景性质信息,场景性质信息例如是表面材质,表面材质信息可以是原始模态的信息,也可以是目标模态的信息。通过设置目标数据包括空间点的位置数据,目标数据的至少一种模态的信息包括关于空间点的至少一个模态的场景性质信息,以此可以利用预测模型预测空间点的场景性质信息。
因此,通过利用上述的模型训练方法来训练预测模型,以此使得预测模型既能预测原始模态的信息,也能预测目标模态的信息。
请参阅图8,图8是本公开模型的训练装置实施例的框架示意图。模型的训练装置20包括获取部分21和重训练部分22,获取部分21被配置为获取经初始训练得到的预测模型,其中,经初始训练得到的预测模型被配置为预测得到原始模态的信息;重训练部分22被配置为利用预设样本数据对预测模型进行重训练,预设样本数据包括标注有目标模态的信息的第一样本数据,经重训练的预测模型还被配置为预测得到目标模态的信息;其中,预设样本数据还包括标注有原始模态的信息的第二样本数据,和/或,所述对预测模型进行重训练包括对预测模型的至少部分网络参数的调整进行约束。
其中,上述的预设样本数据包括第一样本数据和第二样本数据,第一样本数据标注有关于目标模态的第一标注信息,第二样本数据标注有关于原始模态的第二标注信息;上述的重训练部分22被配置为利用预设样本数据对预测模型进行重训练,包括:利用预测模型分别对第一样本数据、第二样本数据进行预测,对应得到关于目标模态的第一预测结果和关于原始模态的第二预测结果;利用第一预测结果与第一标注信息之间的第一差异、第二预测结果与第二标注信息之间的第二差异,调整预测模型的网络参数。
其中,上述的第二样本数据包括以下至少之一:采集得到的原始数据和利用生成模型生成的衍生数据。
其中,上述的对预测模型进行重训练包括对预测模型的至少部分网络参数的调整进行约束;其中,上述的重训练部分22被配置为利用预设样本数据对预测模型进行重训练,包括:利用预设样本数据,分别确定不同参数值集合对应的预测模型的预测损失,其中,每个参数值集合包括预测模型中各网络参数对应的一组候选参数值,参数值集合对应的预测模型的各网络参数赋值为参数值集合中对应的候选参数值;将使目标损失满足预设条件的参数值集合,作为目标参数值集合,其中,参数值集合对应的目标损失是利用参数值集合对应的预测损失和正则损失得到,参数值集合对应的正则损失是结合各网络参数的权重以及各网络参数的变化表征得到的,网络参数的权重与网络参数对预测原始模态的信息的影响程度相关,网络参数的变化表征是基于网络参数在参数值集合中对应的候选参数值与参考参数值之间的差异得到的;将预测模型的各网络参数调整为目标参数值集合中对应的候选参数值。
其中,上述的重训练部分22被配置为利用预设样本数据,分别确定不同参数值集合对应的预测模型的预测损失,包括:对于各参数值集合,利用参数值集合对应的预测模型对第一样本数据进行预测,得到参数值集合对应的第一预测结果;利用参数值集合对应的第一预测结果与第一样本数据的第一标注信息之间的第一差异,得到参数值集合对应的预测损失;
其中,模型的训练装置20还包括目标损失确定部分,在重训练部分22被配置为将使目标损失满足预设条件的参数值集合,作为目标参数值集合之前,目标损失确定部分被配置为对于各参数值集合,获取参数值集合的第一子正则损失和第二子正则损失中的至少之一,其中,第一子正则损失是利用各网络参数的权重对参数值集合对应的各网络参数的变化表征进行加权求和得到的,第二子正则损失是利用各网络参数的权重对参数值集合对应的各网络参数的变化表征处理值进行加权求和得到的,网络参数的变化表征处理值是利用网络参数的变化表征和初始训练参数变化表征得到的,网络参数的初始训练参数变化表征是利用各网络参数的参考参数值与网络参数的初始参数值的差异得到的;利用参数值集合的第一子正则损失和第二子正则损失的至少之一,得到参数值集合的正则损失;利用参数值集合对应的正则损失与参数值集合对应的预测损失,得到参数值集合对应的目标损失。
其中,上述的网络参数的参数权重与网络参数对预测原始模态的信息的影响程度存在正相关关系;和/或,参数值集合对应的目标损失与参数值集合对应的正则损失、预测损失均存在正相关关系;和/或,参考参数值为预测模型经初始训练后的网络参数的值。
其中,上述的对预测模型进行重训练包括对预测模型的至少部分网络参数的调整进行约束;上述的重训练部分22被配置为利用预设样本数据对预测模型进行重训练,包括:利用预测模型对第一样本数据进行预测,得到关于目标模态的第一预测结果;基于第一预测结果与第一样本数据关于目标模态的第一标注信息之间的第一差异,调整预测模型中的目标网络参数,其中,目标网络参数包括在初始训练过程中未调整的第一网络参数,预测模型中除目标网络参数以外的网络参数不调整。
其中,上述的目标网络参数还包括预测模型在初始训练过程中已调整的第二网络参数。
其中,模型的训练装置20还包括第一网络参数确定部分,在重训练部分22被配置为利用预设样本数据对预测模型进行重训练之前,第一网络参数确定部分被配置为为预测模型新增至少一个网络参数作为第一网络参数。
其中,上述的第一样本数据和第二样本数据均包括空间点的位置数据;在重训练之前,预测模型被配置为预测得到关于空间点的原始模态的场景性质信息,在重训练之后,预测模型被配置为预测得到关于空间点的原始模态和目标模态的预测场景性质信息。
其中,上述的空间点为物体表面上的点;和/或,预测场景性质信息与视角相关,第一样本数据和第二样本数据还均包括空间点的视角信息;和/或,预测场景性质信息包括颜色、亮度、表面几何、语义和表面材质中的至少一种;和/或,预设样本数据的标注信息为参考图像,参考图像用于表征对应空间点的实际场景性质信息;模型的训练装置20还包括差异比较部分,在重训练部分22被配置为得到预测模型对预设样本数据的预测结果之后,差异比较部分被配置为基于预测结果生成预测图像,预测图像能够表征空间点的预测场景性质信息;其中,预测图像与参考图像之间的差异作为预测结果与预设样本数据的标注信息之间的差异。
请参阅图9,图9是本公开模态信息的预测装置实施例的框架示意图。模态信息的预测装置30包括第一获取部分31、第二获取部分32和预测部分33,其中,第一获取部分31被配置为上述模型的训练方法实施提及的训练方法训练得到预测模型;第二获取部分32被配置为获取目标数据;预测部分33被配置为利用经训练得到的预测模型对目标数据进行预测,得到关于目标数据的至少一种模态的信息。
其中,上述的目标数据包括空间点的位置数据;关于目标数据的至少一种模态的信息包括关于空间点的至少一个模态的场景性质信息。
请参阅图10,图10是本公开电子设备一实施例的框架示意图。电子设备40包括相互耦接的存储器41和处理器42,处理器42被配置为执行存储器41中存储的程序指令,以实现上述任一模型的训练方法实施例中的步骤,或者是模态信息的预测方法实施例中的步骤。在一些实施例中,电子设备40可以包括但不限于:微型计算机、服务器,此外,电子设备40还可以包括笔记本电脑、平板电脑等移动设备,在此不做限定。
具体而言,处理器42被配置为控制其自身以及存储器41以实现上述任一模型的训练方法实施例中的步骤,或者是模态信息的预测方法实施例中的步骤。处理器42还可以称为CPU(Central Processing Unit,中央处理单元)。处理器42可能是一种集成电路芯片,具有信号的处理能力。处理器42还可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。另外,处理器42可以由多个集成电路芯片共同实现。
请参阅图11,图11为本公开计算机可读存储介质一实施例的框架示意图。计算机可读存储介质50存储有能够被处理器运行的程序指令51,程序指令51用于实现上述任一模型的训练方法实施例中的步骤,或者是模态信息的预测方法实施例中的步骤。所述计算机可读存储介质可以是保持和存储由指令执行设备使用的指令的有形设备,可以是易失性存储介质或非易失性存储介质。计算机可读存储介质例如可以是但不限于:电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(Random Access Memory,RAM)、只读存储器(Read Only Memory,ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM或闪存)、静态随机存取存储器(Static Random-Access Memory,SRAM)、便携式压缩盘只读存储器(Compact Disk Read Only Memory,CD-ROM)、数字多功能盘(Digital versatile Disc,DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。这里所使用的计算机可读存储介质不被解释为瞬时信号本身,诸如无线电波或者其他自由传播的电磁波、通过波导或其他传输媒介传播的电磁波(例如,通过光纤电缆的光脉冲)、或者通过电线传输的电信号。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其具体实现可以参照上文方法实施例的描述。
在本公开实施例以及其他的实施例中,“部分”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是单元,还可以是模块也可以是非模块化的。
上述方案,通过设置预设样本数据还包括标注有原始模态的信息的第二样本数据,和/或,对预测模型的至少部分网络参数的调整进行约束,可以使得预测模型不会忘记初始训练中已经学习到的原始模态的信息,而且还使得预测模型能够用于预测得到目标模态的信息,以此实现了预测模型能够预测的模态信息的拓展。
上文对各个实施例的描述倾向于强调各个实施例之间的不同之处,其相同或相似之处可以互相参考。
在本公开所提供的几个实施例中,应该理解到,所揭露的方法和装置,可以通过其它的方式实现。例如,以上所描述的装置实施方式仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性、机械或其它的形式。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本公开各个实施方式方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (18)

  1. 一种模型训练方法,包括:
    获取经初始训练得到的预测模型,其中,经初始训练得到的所述预测模型用于预测得到原始模态的信息;
    利用预设样本数据对所述预测模型进行重训练,所述预设样本数据包括标注有目标模态的信息的第一样本数据,经所述重训练的所述预测模型还用于预测得到所述目标模态的信息;
    其中,所述预设样本数据还包括标注有所述原始模态的信息的第二样本数据,和/或,所述对所述预测模型进行重训练包括对所述预测模型的至少部分网络参数的调整进行约束。
  2. 根据权利要求1所述的方法,其中,所述预设样本数据包括所述第一样本数据和所述第二样本数据,所述第一样本数据标注有关于目标模态的第一标注信息,所述第二样本数据标注有关于所述原始模态的第二标注信息;
    所述利用预设样本数据对所述预测模型进行重训练,包括:
    利用所述预测模型分别对所述第一样本数据、第二样本数据进行预测,对应得到关于所述目标模态的第一预测结果和关于所述原始模态的第二预测结果;
    利用所述第一预测结果与第一标注信息之间的第一差异、所述第二预测结果与第二标注信息之间的第二差异,调整所述预测模型的网络参数。
  3. 根据权利要求1或2所述的方法,其中,所述第二样本数据包括以下至少之一:采集得到的原始数据和利用生成模型生成的衍生数据。
  4. 根据权利要求1至3任一项所述的方法,其中,所述对所述预测模型进行重训练包括对所述预测模型的至少部分网络参数的调整进行约束;所述利用预设样本数据对所述预测模型进行重训练,包括:
    利用所述预设样本数据,分别确定不同参数值集合对应的预测模型的预测损失,其中,所述不同参数值集合中每个所述参数值集合包括预测模型中各网络参数对应的一组候选参数值,所述参数值集合对应的预测模型的各网络参数赋值为所述参数值集合中对应的候选参数值;
    将使目标损失满足预设条件的所述参数值集合,作为目标参数值集合,其中,所述参数值集合对应的目标损失是利用所述参数值集合对应的预测损失和正则损失得到,所述参数值集合对应的正则损失是结合各网络参数的权重以及各网络参数的变化表征得到的,所述网络参数的权重与所述网络参数对预测所述原始模态的信息的影响程度相关,所述网络参数的变化表征是基于所述网络参数在所述参数值集合中对应的候选参数值与参考参数值之间的差异得到的;
    将所述预测模型的各网络参数调整为所述目标参数值集合中对应的候选参数值。
  5. 根据权利要求4所述的方法,其中,所述利用所述预设样本数据,分别确定不同参数值集合对应的预测模型的预测损失,包括:
    对于各所述参数值集合,利用所述参数值集合对应的预测模型对所述第一样本数据进行预测,得到所述参数值集合对应的第一预测结果;
    利用所述参数值集合对应的第一预测结果与所述第一样本数据的第一标注信息之间的第一差异,得到所述参数值集合对应的所述预测损失。
  6. 根据权利要求4或5所述的方法,其中,在所述将使目标损失满足预设条件的所述参数值集合,作为目标参数值集合之前,所述方法还包括:
    对于各所述参数值集合,获取所述参数值集合的第一子正则损失和第二子正则损失中的至少之一,其中,所述第一子正则损失是利用各所述网络参数的权重对所述参数值集合对应的各所述网络参数的变化表征进行加权求和得到的,所述第二子正则损失是利用各所述网络参数的权重对所述参数值集合对应的各所述网络参数的变化表征处理值进行加权求和得到的,所述网络参数的变化表征处理值是利用所述网络参数的变化表征和初始训练参数变化表征得到的,所述网络参数的初始训练参数变化表征是利用各所述网络参数的参考参数值与所述网络参数的初始参数值的差异得到的;
    利用所述参数值集合的所述第一子正则损失和所述第二子正则损失的至少之一,得到所述参数值集合的正则损失;
    利用所述参数值集合对应的正则损失与所述参数值集合对应的预测损失,得到所述参数值集合对应的目标损失。
  7. 根据权利要求4至6任一项所述的方法,其中,所述网络参数的参数权重与所述网络参数对预测所述原始模态的信息的影响程度存在正相关关系;
    和/或,所述参数值集合对应的目标损失与所述参数值集合对应的正则损失、预测损失均存在正相关关系;
    和/或,所述参考参数值为所述预测模型经初始训练后的网络参数的值。
  8. 根据权利要求1至7任一项所述的方法,其中,所述对所述预测模型进行重训练包括对所述预测模型的至少部分网络参数的调整进行约束;所述利用预设样本数据对所述预测模型进行重训练,包括:
    利用所述预测模型对所述第一样本数据进行预测,得到关于所述目标模态的第一预测结果;
    基于所述第一预测结果与所述第一样本数据关于目标模态的第一标注信息之间的第一差异,调整所述预测模型中的目标网络参数,其中,所述目标网络参数包括在所述初始训练过程中未调整的第一网络参数,所述预测模型中除所述目标网络参数以外的网络参数不调整。
  9. 根据权利要求8所述的方法,其中,
    所述目标网络参数还包括所述预测模型在所述初始训练过程中已调整的第二网络参数;
    和/或,所述利用预设样本数据对所述预测模型进行重训练之前,还包括:
    为所述预测模型新增至少一个网络参数作为所述第一网络参数。
  10. 根据权利要求1至9任一项所述的方法,其中,所述第一样本数据和所述第二样本数据均包括空间点的位置数据;在所述重训练之前,所述预测模型用于预测得到关于所述空间点的所述原始模态的场景性质信息,在所述重训练之后,所述预测模型用于预测得到关于所述空间点的所述原始模态和目标模态的预测场景性质信息。
  11. 根据权利要求10所述的方法,其中,所述空间点为物体表面上的点;
    和/或,所述预测场景性质信息与视角相关,所述第一样本数据和所述第二样本数据还均包括所述空间点的视角信息;
    和/或,所述预测场景性质信息包括颜色、亮度、表面几何、语义和表面材质中的至少一种;
    和/或,所述预设样本数据的标注信息为参考图像,所述参考图像用于表征对应所述空间点的实际场景性质信息;在得到所述预测模型对所述预设样本数据的预测结果之后,所述方法还包括:
    基于所述预测结果生成预测图像,所述预测图像能够表征所述空间点的预测场景性质信息;其中,所述预测图像与所述参考图像之间的差异作为所述预测结果与所述预设样本数据的标注信息之间的差异。
  12. 一种模态信息的预测方法,包括:
    利用权利要求1至11任一项所述的方法训练得到预测模型;
    获取目标数据;
    利用经训练得到的所述预测模型对所述目标数据进行预测,得到关于所述目标数据的至少一种模态的信息。
  13. 根据权利要求12所述的方法,其中,所述目标数据包括空间点的位置数据;所述关于所述目标数据的至少一种模态的信息包括关于所述空间点的至少一个模态的场景性质信息。
  14. 一种模型的训练装置,包括:
    获取部分,被配置为获取经初始训练得到的预测模型,其中,经初始训练得到的所述预测模型用于预测得到原始模态的信息;
    重训练部分,被配置为利用预设样本数据对所述预测模型进行重训练,所述预设样本数据包括标注有目标模态的信息的第一样本数据,经所述重训练的所述预测模型还用于预测得到所述目标模态的信息;
    其中,所述预设样本数据还包括标注有所述原始模态的信息的第二样本数据,和/或,所述对所述预测模型进行重训练包括对所述预测模型的至少部分网络参数的调整进行约束。
  15. 一种模态信息的预测装置,包括:
    第一获取部分,被配置为利用权利要求1至11任一项所述的方法训练得到预测模型;
    第二获取部分,被配置为获取目标数据;
    预测部分,被配置为利用经训练得到的所述预测模型对所述目标数据进行预测,得到关于所述目标数据的至少一种模态的信息。
  16. 一种电子设备,包括相互耦接的存储器和处理器,所述处理器被配置为执行所述存储器中存储的程序指令,以实现权利要求1至11任一项所述的模型训练方法,或者实现权利要求12或13所述的模态信息的预测方法。
  17. 一种计算机可读存储介质,其上存储有程序指令,所述程序指令被处理器执行时实现权利要求1至11任一项所述的模型训练方法,或者实现权利要求12或13所述的模态信息的预测方法。
  18. 一种计算机程序产品,所述计算机程序产品包括计算机程序或指令,在所述计算机程序或指令在计算机上运行的情况下,使得所述计算机执行如权利要求1至11任一项所述的模型训练方法,或者权利要求12或13所述的模态信息的预测方法。
PCT/CN2023/089228 2022-04-20 2023-04-19 模型训练及模态信息的预测方法及装置、电子设备、存储介质和计算机程序产品 WO2023202620A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210419003.3A CN114925748B (zh) 2022-04-20 2022-04-20 模型训练及模态信息的预测方法、相关装置、设备、介质
CN202210419003.3 2022-04-20

Publications (1)

Publication Number Publication Date
WO2023202620A1 true WO2023202620A1 (zh) 2023-10-26

Family

ID=82807507

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089228 WO2023202620A1 (zh) 2022-04-20 2023-04-19 模型训练及模态信息的预测方法及装置、电子设备、存储介质和计算机程序产品

Country Status (2)

Country Link
CN (1) CN114925748B (zh)
WO (1) WO2023202620A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925748B (zh) * 2022-04-20 2024-05-24 北京市商汤科技开发有限公司 模型训练及模态信息的预测方法、相关装置、设备、介质
CN116486491A (zh) * 2022-09-07 2023-07-25 支付宝(杭州)信息技术有限公司 活体检测模型训练方法、装置、存储介质以及终端
CN115952904A (zh) * 2022-12-29 2023-04-11 广东南方财经控股有限公司 基于分步关联权重的预测模型构建方法、预测方法及装置
CN116894192A (zh) * 2023-09-11 2023-10-17 浙江大华技术股份有限公司 大模型训练方法及相关方法、装置、设备、系统和介质

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2020077202A1 (en) * 2018-10-12 2020-04-16 The Medical College Of Wisconsin, Inc. Medical image segmentation using deep learning models trained with random dropout and/or standardized inputs
CN111783505A (zh) * 2019-05-10 2020-10-16 北京京东尚科信息技术有限公司 伪造人脸的识别方法、装置和计算机可读存储介质
CN113192639A (zh) * 2021-04-29 2021-07-30 平安科技(深圳)有限公司 信息预测模型的训练方法、装置、设备及存储介质
CN114925748A (zh) * 2022-04-20 2022-08-19 北京市商汤科技开发有限公司 模型训练及模态信息的预测方法、相关装置、设备、介质

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
WO2021098796A1 (zh) * 2019-11-20 2021-05-27 Oppo广东移动通信有限公司 图像处理方法、装置、设备及计算机可读存储介质
CN111210024B (zh) * 2020-01-14 2023-09-15 深圳供电局有限公司 模型训练方法、装置、计算机设备和存储介质
US20220027786A1 (en) * 2020-07-24 2022-01-27 Macau University Of Science And Technology Multimodal Self-Paced Learning with a Soft Weighting Scheme for Robust Classification of Multiomics Data
CN112668498B (zh) * 2020-12-30 2024-02-06 西安电子科技大学 空中辐射源个体智能增量识别方法、系统、终端及应用


Also Published As

Publication number Publication date
CN114925748A (zh) 2022-08-19
CN114925748B (zh) 2024-05-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23791275

Country of ref document: EP

Kind code of ref document: A1