WO2023124029A1 - Deep learning model training method and apparatus, and content recommendation method and apparatus - Google Patents

Deep learning model training method and apparatus, and content recommendation method and apparatus

Info

Publication number
WO2023124029A1
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
feature
target
learning model
network layer
Prior art date
Application number
PCT/CN2022/106805
Other languages
English (en)
Chinese (zh)
Inventor
陈意超
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023124029A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present disclosure relates to the technical field of artificial intelligence, in particular to deep learning, intelligent recommendation and other technical fields, and more specifically, to a deep learning model training method, content recommendation method, device, electronic equipment, media, and program product.
  • the deep learning model can be used to recommend relevant content.
  • training a deep learning model requires significant labor and time costs and involves a high technical threshold, which leads to low training efficiency of the deep learning model.
  • the present disclosure provides a training method of a deep learning model, a content recommendation method, a device, an electronic device, a storage medium, and a program product.
  • a method for training a deep learning model, including: obtaining a configuration file, wherein the configuration file includes model type data and candidate feature configuration data; selecting an initial network layer type and an initial network layer structure based on the model type data; obtaining an initial deep learning model based on the initial network layer type and the initial network layer structure; processing a first training sample based on the candidate feature configuration data to obtain first training feature data; training the initial deep learning model using the first training feature data; and obtaining a target deep learning model based on the trained initial deep learning model.
  • a method for recommending content, including: determining object feature data for a target object; for target content in at least one candidate content, determining content feature data for the target content; inputting the object feature data and the content feature data into a target deep learning model to obtain an output result, wherein the target deep learning model is generated by the method according to the present disclosure, and the output result represents the degree of interest of the target object in the target content; and, in response to the output result meeting a preset condition, recommending the target content to the target object.
  • a training device for a deep learning model including: an acquisition module, a selection module, a first acquisition module, a first processing module, a first training module, and a second acquisition module.
  • An acquisition module configured to acquire a configuration file, wherein the configuration file includes model type data and candidate feature configuration data; a selection module configured to select an initial network layer type and an initial network layer structure based on the model type data; a first obtaining module configured to obtain an initial deep learning model based on the initial network layer type and the initial network layer structure; a first processing module configured to process a first training sample based on the candidate feature configuration data to obtain first training feature data; a first training module configured to train the initial deep learning model using the first training feature data; and a second obtaining module configured to obtain a target deep learning model based on the trained initial deep learning model.
  • a content recommendation device including: a first determination module, a second determination module, an input module and a recommendation module.
  • the first determination module is used to determine the object feature data for the target object;
  • the second determination module is used to determine the content feature data for the target content for the target content in at least one candidate content;
  • the input module is used to input the object feature data and the content feature data into the target deep learning model to obtain an output result, wherein the target deep learning model is generated by the device according to the present disclosure, and the output result represents the degree of interest of the target object in the target content;
  • the recommendation module is configured to recommend the target content to the target object in response to the output result meeting a preset condition.
  • an electronic device including: at least one processor and a memory communicatively connected to the at least one processor.
  • the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the above-mentioned deep learning model training method and/or content recommendation method.
  • a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to enable the computer to execute the above-mentioned deep learning model training method and/or content recommendation method.
  • a computer program product including a computer program.
  • when the computer program is executed by a processor, the above-mentioned deep learning model training method and/or content recommendation method are implemented.
  • FIG. 1 schematically shows a system architecture for training a deep learning model and recommending content according to an embodiment of the present disclosure
  • Fig. 2 schematically shows a flow chart of a method for training a deep learning model according to an embodiment of the present disclosure
  • FIG. 3 schematically shows a flow chart of a method for training a deep learning model according to another embodiment of the present disclosure
  • Fig. 4 schematically shows a schematic diagram of a training method of a deep learning model according to an embodiment of the present disclosure
  • Fig. 5 schematically shows a schematic diagram of a content recommendation method according to an embodiment of the present disclosure
  • Fig. 6 schematically shows a block diagram of a training device for a deep learning model according to an embodiment of the present disclosure
  • Fig. 7 schematically shows a block diagram of a content recommendation device according to an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of an electronic device for performing deep learning model training and/or content recommendation to implement an embodiment of the present disclosure.
  • FIG. 1 schematically shows a system architecture of deep learning model training and content recommendation according to an embodiment of the present disclosure. It should be noted that what is shown in FIG. 1 is only an example of the system architecture to which the embodiments of the present disclosure can be applied, so as to help those skilled in the art understand the technical content of the present disclosure, but it does not mean that the embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.
  • a system architecture 100 may include clients 101 , 102 , 103 , a network 104 and a server 105 .
  • the network 104 is used as a medium for providing communication links between the clients 101 , 102 , 103 and the server 105 .
  • Network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • Users may use the clients 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages, and the like.
  • Clients 101, 102, and 103 can be installed with various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social platform software, etc. (just for example).
  • Clients 101, 102, 103 may be various electronic devices with display screens and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like.
  • the clients 101, 102, and 103 in the embodiments of the present disclosure may, for example, run applications.
  • the server 105 may be a server that provides various services, such as a background management server that provides support for websites browsed by users using the clients 101 , 102 , 103 (just an example).
  • the background management server can analyze and process received user requests and other data, and feed back processing results (such as webpages, information, or data, etc. obtained or generated according to user requests) to the client.
  • the server 105 may also be a cloud server, that is, the server 105 has a cloud computing function.
  • the deep learning model training method and/or the content recommendation method provided by the embodiment of the present disclosure may be executed by the server 105 .
  • the deep learning model training device and/or the content recommendation device provided by the embodiments of the present disclosure may be set in the server 105 .
  • the deep learning model training method and/or content recommendation method provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and can communicate with the clients 101 , 102 , 103 and/or the server 105 .
  • the deep learning model training device and/or content recommendation device may also be set on a server or server cluster that is different from the server 105 and can communicate with the clients 101, 102, 103 and/or the server 105.
  • the server 105 can receive training samples from the clients 101, 102, 103 through the network 104 and use the training samples to train the deep learning model; the server 105 can then send the trained deep learning model to the clients 101, 102, 103 through the network 104, and the clients can use the trained deep learning model to recommend content.
  • the server 105 may also directly use the deep learning model to recommend content.
  • a method for training a deep learning model and a method for recommending content according to an exemplary embodiment of the present disclosure will be described below with reference to FIGS. 2 to 5 in conjunction with the system architecture of FIG. 1 .
  • the deep learning model training method and the content recommendation method of the embodiments of the present disclosure can be executed by the server shown in FIG. 1; the server shown in FIG. 1 is, for example, the same as or similar to the electronic device described below.
  • Fig. 2 schematically shows a flowchart of a method for training a deep learning model according to an embodiment of the present disclosure.
  • the deep learning model training method 200 of the embodiment of the present disclosure may include, for example, operation S210 to operation S260.
  • In operation S210, a configuration file is obtained, and the configuration file includes model type data and candidate feature configuration data.
  • In operation S220, an initial network layer type and an initial network layer structure are selected based on the model type data.
  • In operation S230, an initial deep learning model is obtained based on the initial network layer type and the initial network layer structure.
  • In operation S240, the first training sample is processed based on the candidate feature configuration data to obtain first training feature data.
  • In operation S250, the initial deep learning model is trained using the first training feature data.
  • In operation S260, a target deep learning model is obtained based on the trained initial deep learning model.
  • the configuration file includes model type data, and the model type data, for example, characterizes the model type of the initial deep learning model, and the model type includes, for example, a deep neural network (Deep Neural Networks, DNN) type.
  • the initial network layer type includes, for example, attention layer, fully connected layer, pooling layer and other types of layers, and the initial network layer type may also represent the connection relationship between each layer.
  • the initial network layer structure for example, characterizes the number of nodes in each layer.
  • the initial network layer type may include multiple optional initial network layer types
  • the initial network layer structure may include multiple optional initial network layer structures
  • based on the model type data, the initial network layer type and the initial network layer structure required by the DNN model may be selected from the multiple initial network layer types and the multiple initial network layer structures.
  • different initial network layer types and different initial network layer structures can be selected in turn to build initial deep learning models, and each built initial deep learning model is trained.
  • the candidate feature configuration data represents a processing method for the first training sample.
  • the candidate feature configuration data represents a feature type and a feature dimension for extracting feature data from the first training sample. Processing the first training sample based on the candidate feature configuration data can obtain the first training feature data suitable for training the initial deep learning model.
  • the candidate feature configuration data includes feature types and feature dimensions for the first training sample.
  • the feature type includes, for example, age, gender, content category and other features
  • the feature dimension is, for example, the dimension of a feature vector
  • the dimension of a feature vector includes, for example, 1*128 dimension, 1*256 dimension, and so on.
  • For example, when training the first initial deep learning model, for the first training sample used to train the model, the age and gender features are selected from features such as age, gender, and content category, and the 1*128 dimension is selected from the feature dimensions 1*128 and 1*256; the first training sample is then processed to obtain 1*128-dimensional first training feature data for age and gender.
  • when training the second initial deep learning model, the gender and content category features are selected from features such as age, gender, and content category, and the 1*256 dimension is selected from the feature dimensions 1*128 and 1*256; the first training sample is then processed to obtain 1*256-dimensional first training feature data for gender and content category.
  • the first training sample can be processed based on the candidate feature configuration data to obtain the first training feature data, and the initial deep learning model can be trained through the first training feature data.
  • the candidate feature configuration data may also include multiple candidate feature configuration data.
  • different candidate feature configuration data may be sequentially selected to process the first training samples used to train the corresponding initial deep learning models.
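  • as an illustration of how candidate feature configuration data can drive sample processing, the following Python sketch encodes the selected feature types of one sample into a feature vector of the configured dimension via the hashing trick. The config keys, sample fields, and encoder are assumptions made for demonstration; the disclosure does not prescribe a concrete data structure.

```python
# Illustrative sketch only: config keys, sample fields, and the
# hashing-trick encoder are assumptions, not the patent's own format.
import numpy as np

candidate_feature_configs = [
    {"feature_types": ["age", "gender"], "feature_dim": 128},
    {"feature_types": ["gender", "content_category"], "feature_dim": 256},
]

def build_training_features(sample: dict, config: dict) -> np.ndarray:
    """Encode the selected feature types of one sample into a 1*D vector."""
    vec = np.zeros(config["feature_dim"], dtype=np.float32)
    for feature_type in config["feature_types"]:
        # Hash "type=value" onto a fixed-size vector so the output
        # dimension matches the configured feature dimension.
        index = hash(f"{feature_type}={sample[feature_type]}") % config["feature_dim"]
        vec[index] = 1.0
    return vec

sample = {"age": 30, "gender": "f", "content_category": "news"}
x = build_training_features(sample, candidate_feature_configs[0])  # shape (128,)
```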
  • a target deep learning model can be obtained based on the initial deep learning model.
  • the initial deep learning model can be directly used as the target deep learning model, or model construction and model training can be re-performed based on the initial deep learning model to obtain the target deep learning model.
  • the model type data and the candidate feature configuration data are defined through the configuration file.
  • the corresponding initial network layer type and initial network layer structure can be selected based on the configuration file to construct the corresponding initial deep learning model, and based on the candidate feature configuration data to process the first training sample to obtain the corresponding first training feature data, so as to train the initial deep learning model based on the first training feature data, and then obtain the target deep learning model based on the initial deep learning model.
  • the initial neural network is constructed based on the configuration file and the first training sample is processed, so that multiple initial deep learning models can be trained automatically and quickly, which improves the efficiency of model training and reduces the cost of model training. With the configuration file, no code needs to be modified, lowering the technical threshold of model training.
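  • a minimal sketch of this configuration-driven flow (operations S210 to S260) is shown below; the configuration schema and the helper functions select_layer_candidates, build_model, process_samples, and train are hypothetical names introduced only for illustration.

```python
# Hypothetical outer loop for operations S210-S260; the helper functions
# are placeholders, not APIs defined by the disclosure.
config = {                                        # S210: obtain configuration file
    "model_type": "DNN",
    "candidate_feature_configs": [
        {"feature_types": ["age", "gender"], "feature_dim": 128},
        {"feature_types": ["gender", "content_category"], "feature_dim": 256},
    ],
    "evaluation": {"metric": "auc", "threshold": 0.75},
}

trained_models = []
for layer_type, layer_structure in select_layer_candidates(config["model_type"]):  # S220
    for feature_config in config["candidate_feature_configs"]:
        model = build_model(layer_type, layer_structure)            # S230
        features = process_samples(train_samples, feature_config)   # S240
        train(model, features)                                      # S250
        trained_models.append((model, layer_type, layer_structure, feature_config))
# S260: the target model is obtained from the trained models (see method 300)
```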
  • Fig. 3 schematically shows a flowchart of a method for training a deep learning model according to another embodiment of the present disclosure.
  • the deep learning model training method 300 of the embodiment of the present disclosure may include, for example, operation S301 to operation S311 .
  • In operation S301, a configuration file is acquired, and the configuration file includes model type data and candidate feature configuration data.
  • In operation S302, an initial network layer type and an initial network layer structure are selected based on the model type data.
  • In operation S303, an initial deep learning model is obtained based on the initial network layer type and the initial network layer structure.
  • In operation S304, the first training sample is processed based on the candidate feature configuration data to obtain first training feature data.
  • In operation S305, the initial deep learning model is trained using the first training feature data.
  • operation S301 to operation S305 are the same as or similar to the operations of the above-mentioned embodiment, and will not be repeated here.
  • a target deep learning model is obtained based on the trained initial deep learning model, see operations S306 to S311.
  • the trained initial deep learning model includes at least one trained initial deep learning model, and the initial network layer type, initial network layer structure, or candidate feature configuration data corresponding to each trained model in the at least one trained initial deep learning model can be different.
  • the configuration file also includes evaluation conditions, which are used to evaluate the training effect of the initial deep learning model. The following operations S306 to S308 describe obtaining the target network layer type, target network layer structure, and target feature configuration data with better training effect by evaluating the initial deep learning model.
  • In operation S306, the verification sample is processed based on the candidate feature configuration data to obtain verification feature data.
  • In operation S307, the verification feature data are respectively input into at least one trained initial deep learning model, and at least one verification result is obtained.
  • In operation S308, based on at least one verification result and the evaluation condition, the target network layer type, target network layer structure, and target feature configuration data are determined from the network layer type set, network layer structure set, and feature configuration data set, respectively.
  • a set of network layer types, a set of network layer structures, and a set of feature configuration data corresponding to the multiple initial deep learning models are obtained.
  • the set of network layer types includes initial network layer types for multiple trained initial deep learning models.
  • the set of network layer structures includes initial network layer structures for a plurality of trained initial deep learning models.
  • the feature configuration data set includes initial feature configuration data for a plurality of trained initial deep learning models, and the initial feature configuration data in the feature configuration data set is, for example, at least part of multiple candidate feature configuration data.
  • for each initial deep learning model, based on the candidate feature configuration data corresponding to the initial deep learning model, the verification sample is processed to obtain verification feature data, and the verification feature data is used to validate the trained initial deep learning model.
  • a plurality of verification results corresponding to a plurality of initial deep learning models can be obtained.
  • the verification result includes, for example, the recall rate or precision rate of the initial deep learning model on the verification sample
  • the evaluation condition includes, for example, conditions for the recall rate and precision rate; for example, the evaluation condition evaluates whether the recall rate or precision rate of the verification result reaches a certain threshold.
  • the evaluation condition is related to, for example, the AUC (Area Under Curve) metric; the verification result can be evaluated based on AUC, which is an evaluation index.
  • the target network layer type, target network layer structure, and target feature configuration data are respectively determined from the network layer type set, network layer structure set, and feature configuration data set.
  • the verification results are evaluated by the evaluation conditions, so as to determine the target network layer type, target network layer structure, and target feature configuration data, which improves the determination accuracy of the target network layer type, the target network layer structure, and the target feature configuration data.
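  • a hedged sketch of the evaluation in operations S306 to S308, using scikit-learn's roc_auc_score as the AUC metric, is given below. It simplifies the disclosure by keeping the configuration of the single best-scoring model, whereas the target network layer type, structure, and feature configuration data may in general be selected independently; the variable names continue the earlier sketches.

```python
# Evaluate each trained initial model on the validation set and keep the
# best configuration. roc_auc_score is a real scikit-learn function; the
# surrounding names (trained_models, process_samples, ...) are illustrative.
from sklearn.metrics import roc_auc_score

best = None
for model, layer_type, layer_structure, feature_config in trained_models:
    val_x = process_samples(validation_samples, feature_config)   # S306
    val_scores = model.predict(val_x)                             # S307
    auc = roc_auc_score(val_labels, val_scores)
    if auc >= config["evaluation"]["threshold"] and (best is None or auc > best[0]):
        best = (auc, layer_type, layer_structure, feature_config)  # S308

target_layer_type, target_layer_structure, target_feature_config = best[1:]
```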
  • after obtaining the target network layer type, target network layer structure, and target feature configuration data, the model can be retrained based on them to obtain the target deep learning model; see the following operations S309 to S311.
  • In operation S309, a target deep learning model to be trained is obtained based on the target network layer type and the target network layer structure.
  • that is, a target deep learning model to be trained is constructed.
  • In operation S310, the second training sample is processed based on the target feature configuration data to obtain second training feature data.
  • the target feature configuration data represents how to process the second training samples used to train the target deep learning model, so as to obtain the second training feature data suitable for training the target deep learning model.
  • In operation S311, the second training feature data is used to train the target deep learning model to be trained to obtain the target deep learning model.
  • the second training sample can be processed based on the target feature configuration data to obtain the second training feature data, and the target deep learning model is trained through the second training feature data.
  • the training process of the target deep learning model is similar to the training process of the initial deep learning model and will not be repeated here.
  • the process of training multiple initial deep learning models can be regarded as an experimental process of searching network layer types, network layer structures, and feature configuration data.
  • the initial deep learning model whose verification result satisfies the evaluation condition can be directly used as the final target deep learning model.
  • the target network layer type, target network layer structure, and target feature configuration data may come from different initial deep learning models.
  • instead of saving the initial deep learning models, only the optimal target network layer type, target network layer structure, and target feature configuration data can be saved; the target deep learning model is then rebuilt and trained based on the target network layer type, target network layer structure, and target feature configuration data.
  • the target network layer type, target network layer structure, and target feature configuration data are obtained by training the initial deep learning models, and then the target deep learning model is obtained by retraining based on the target network layer type, target network layer structure, and target feature configuration data, which not only improves the accuracy of the target deep learning model, but also reduces the consumption of data storage space.
  • Fig. 4 schematically shows a schematic diagram of a method for training a deep learning model according to an embodiment of the present disclosure.
  • the configuration file 410 includes, for example, model type data 411 , multiple candidate feature configuration data 412 , and evaluation conditions 413 .
  • the multiple candidate network layer types 420 include, for example, candidate network layer types A1-A4, and the multiple candidate hyperparameters 430, for example, include candidate hyperparameters B1-B4.
  • for example, the candidate network layer type A1 and the candidate hyperparameter B1 are selected as the initial network layer type and the initial network layer structure for the initial deep learning model 431; for example, the candidate network layer type A1 includes a fully connected layer and a pooling layer, and the candidate hyperparameter B1 (target hyperparameter) specifies M nodes in the fully connected layer and N nodes in the pooling layer, where M and N are both integers greater than 0.
  • the candidate network layer type A2 and the candidate hyperparameter B2 are selected as the initial network layer type and initial network layer structure for the initial deep learning model 432, respectively.
  • the candidate network layer type A3 and the candidate hyperparameter B3 are selected as the initial network layer type and initial network layer structure for the initial deep learning model 433, respectively.
  • an initial deep learning model 431 is constructed based on candidate network layer type A1 and candidate hyperparameter B1
  • an initial deep learning model 432 is constructed based on candidate network layer type A2 and candidate hyperparameter B2
  • an initial deep learning model 433 is constructed based on candidate network layer type A3 and candidate hyperparameter B3.
  • the initial deep learning models 431 , 432 , 433 need to be trained based on the first training samples 440 .
  • the initial feature configuration data for the initial deep learning model is selected from a plurality of candidate feature configuration data 412 .
  • the candidate feature configuration data C1 is selected as the initial feature configuration data for the initial deep learning model 431
  • the candidate feature configuration data C2 is selected as the initial feature configuration data for the initial deep learning model 432
  • the candidate feature configuration data C3 is selected as the initial feature configuration data for the initial deep learning model 433.
  • the first training sample 440 needs to be processed based on the corresponding initial feature configuration data.
  • the first feature type and the first feature dimension are determined based on the initial feature configuration data (C1); for example, the initial feature configuration data (C1) defines the first feature type and the first feature dimension.
  • the first feature types include, for example, features such as age, gender, and content category.
  • the first feature dimension is, for example, the dimension of a feature vector, and the dimension of a feature vector is, for example, 1*128 dimensions.
  • a first sub-sample is extracted from the first training sample 440 based on the first feature type, for example, the first sub-sample is for content including age, gender, content category and other features.
  • the first sub-sample is processed based on the first feature dimension to obtain the first training feature data 441.
  • the first training feature data 441 is, for example, a feature vector, and the dimension of the feature vector is, for example, 1*128 dimensions.
  • the process of obtaining the first training feature data 442 and the first training feature data 443 is similar to the process of obtaining the first training feature data 441 , and will not be repeated here.
  • the network layer type set 451 includes, for example, initial network layer types A1, A2, and A3
  • the network layer structure set 452 includes, for example, initial network layer structures B1, B2, and B3
  • the feature configuration data set 453 includes, for example, initial feature configuration data C1, C2, and C3.
  • the process is similar to the above content, and will not be repeated here.
  • a target deep learning model 480 is constructed based on the target network layer type 471 (A1) and the target network layer structure 472 (B2). After the target deep learning model 480 is constructed, it needs to be trained based on the second training sample 490.
  • the second training sample 490 is processed based on the target feature configuration data 473 ( C3 ) to obtain the second training feature data 491 .
  • the second feature type and the second feature dimension are determined based on the target feature configuration data 473 (C3).
  • the target feature configuration data 473 (C3) defines, for example, the second feature type and the second feature dimension.
  • the second feature type includes, for example, age, gender, and other features
  • the second feature dimension is, for example, the dimension of the feature vector
  • the dimension of the feature vector is, for example, 1*256 dimensions.
  • a second sub-sample is extracted from the second training sample 490 based on the second feature type, for example, the second sub-sample is for content including age, gender and other features.
  • the second sub-sample is processed based on the second feature dimension to obtain the second training feature data 491.
  • the second training feature data 491 is, for example, a feature vector, and the dimension of the feature vector is, for example, 1*256 dimensions.
  • the model can be trained based on the PaddlePaddle training framework and the open-source distributed framework Ray. For example, PaddlePaddle is used to implement model building and model training, and Ray is used to seamlessly switch between local training and cluster training; Ray can automatically schedule available resources for parallel training, improving resource utilization and the degree of training parallelism.
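  • as one way to realize the model-building step with PaddlePaddle, the sketch below assembles a fully connected DNN from a network layer structure given as a list of node counts; the configuration format is an assumption, while the paddle.nn calls are the framework's actual API.

```python
# Build an initial DNN from a layer-structure config with PaddlePaddle.
# The list-of-node-counts format is assumed for illustration.
import paddle.nn as nn

def build_initial_model(layer_structure: list, input_dim: int) -> nn.Layer:
    """layer_structure holds the node count of each fully connected layer."""
    layers, prev = [], input_dim
    for width in layer_structure:
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, 1))  # single output: degree of interest
    return nn.Sequential(*layers)

model = build_initial_model([256, 64], input_dim=128)
```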
  • a configuration file includes two files, a feature configuration file and a training configuration file.
  • the feature configuration file includes, for example, candidate feature configuration data
  • the feature configuration file may also include a processing method of the feature, and the processing method includes, for example, normalization, hash operation, and the like.
  • the training configuration file includes data other than features, for example including model type data, evaluation conditions, and the like.
  • the training samples, verification samples, candidate feature configuration data, model structure, hyperparameters, and training resource configuration used in the training process can all be called through the configuration file without modifying the framework code, and the experimental training can be started with one click, reducing the technical threshold and training difficulty.
  • the target deep learning model is retrained based on the search results and the second training samples.
  • the model type data in the configuration file defines how to select the initial model type and network layer structure (search direction), and the candidate feature configuration data, for example, defines the feature type search and feature dimension search.
  • the search directions include hyperparameter search, feature type search, feature dimension search, and model structure search.
  • Feature types include features or combined features that need to be extracted from sample data during model training.
  • Features include, for example, gender and age, and combined features include, for example, a combination of gender and age.
  • the hyperparameter search includes, for example, a search space, a search algorithm, and a scheduler algorithm (scheduling algorithm).
  • the search space can be sampled by methods such as random search, grid search, and uniform-distribution sampling.
  • the search space represents which candidate hyperparameters are available for search.
  • Search algorithms include grid search (grid search) algorithm, Bayesian optimization algorithm, OPTUNA optimization and other algorithms.
  • OPTUNA is a framework for automatic hyperparameter optimization.
  • the search algorithm is used to determine the optimal hyperparameter based on the training results of the candidate hyperparameters.
  • the scheduler algorithm (scheduling algorithm) includes the first-in-first-out FIFO algorithm, ASHA algorithm, etc.
  • the ASHA algorithm is a parameter tuning algorithm.
  • the scheduler algorithm represents how to schedule computing resources to perform parallel training based on candidate hyperparameters.
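  • the following Ray Tune sketch ties these three pieces together: a search space of candidate hyperparameters, OPTUNA as the search algorithm, and ASHA as the scheduler algorithm. The trainable body and the auc metric are illustrative placeholders; the tune.Tuner, OptunaSearch, and ASHAScheduler APIs are Ray's own (Ray 2.x).

```python
# Hyperparameter search with Ray Tune: search space + search algorithm
# (OPTUNA) + scheduler algorithm (ASHA). The trainable body is a placeholder.
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch

def trainable(params):
    # Hypothetical helper: trains one candidate model and returns its AUC.
    auc = train_and_validate(params)
    return {"auc": auc}

param_space = {  # search space: which candidate hyperparameters can be tried
    "layer_structure": tune.choice([(256, 64), (512, 128), (128, 128)]),
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "feature_dim": tune.choice([128, 256]),
}

tuner = tune.Tuner(
    trainable,
    param_space=param_space,
    tune_config=tune.TuneConfig(
        metric="auc", mode="max",
        search_alg=OptunaSearch(),   # search algorithm
        scheduler=ASHAScheduler(),   # scheduler algorithm
        num_samples=20,              # number of candidate configurations
    ),
)
results = tuner.fit()
best = results.get_best_result()
```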
  • Combined features can be searched through models such as AutoCross (automatic feature crossing) and AutoFis (automatic feature interaction selection).
  • the AutoCross model is responsible for screening useful explicit crossover features, such as screening features that improve the training effect of the model.
  • the AutoFis model is responsible for filtering the useless second-order cross features (implicit cross features) in the FM (Factorization Machine) model and the DeepFM model.
  • the explicit cross feature is, for example, the combination or concatenation of multiple features
  • the implicit cross feature is, for example, the dot product of multiple features.
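  • the difference can be illustrated in a few lines of numpy (the embedding values below are made up for demonstration):

```python
import numpy as np

gender_emb = np.array([0.2, 0.7, 0.1])
age_emb = np.array([0.5, 0.1, 0.9])

explicit_cross = np.concatenate([gender_emb, age_emb])  # combination/concatenation
implicit_cross = float(np.dot(gender_emb, age_emb))     # FM-style dot product
```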
  • For the feature dimension, the AutoDim algorithm and the AutoDis algorithm can be used to search.
  • AutoDim algorithm is an algorithm for automatic dimension optimization
  • AutoDis algorithm is an automatic discretization algorithm for numerical features.
  • the AutoDim algorithm searches out different dimension sizes from different feature dimensions, that is, searches for suitable dimensions for discrete features.
  • the AutoDis algorithm supports continuous feature embedding (discretization of continuous features), and searches for the most suitable dimension size for different continuous features during the training process.
  • Model structure search can learn the weights corresponding to sub-architectures (network layers) through NAS (Neural Architecture Search), so as to obtain an optimal model structure. For example, by learning the weights corresponding to multiple candidate network layers, the candidate network layer with a larger weight is used as the final network layer.
  • the VisualDL tool is a visual analysis tool in the PaddlePaddle training framework. It uses rich charts to show the influence of different hyperparameters on the experimental results, and gives a more intuitive understanding of the impact of the search space and search algorithm on the recommendation model.
  • the training process of the model supports batch offline training search and incremental training search.
  • batch offline search training or incremental search training can be selected through configuration.
  • For batch offline search training, the experimental results on the same data set are compared to select the optimal search result.
  • For incremental search training, if the experimental effect of the incremental search is better than the original experiment, the original is replaced; otherwise, the original model structure and hyperparameters are kept and training continues.
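  • this incremental-search decision reduces to a replace-or-keep rule, sketched below with an assumed result object exposing the evaluation metric:

```python
# Keep the original experiment unless the incremental search beats it.
def update_experiment(current_result, incremental_result):
    # Both results are assumed to expose an `auc` attribute (illustrative).
    if incremental_result.auc > current_result.auc:
        return incremental_result   # adopt the new structure/hyperparameters
    return current_result           # keep the original and continue training
```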
  • the training process can be carried out in a parallel manner. For example, some computing resources train based on one part of the hyperparameters, model structures, and training samples, and other computing resources train based on another part, thereby improving training efficiency.
  • Fig. 5 schematically shows a flowchart of a content recommendation method according to an embodiment of the present disclosure.
  • the content recommendation method 500 of the embodiment of the present disclosure may include, for example, operation S510 to operation S540.
  • In operation S510, object feature data for the target object is determined.
  • In operation S520, content feature data for the target content is determined for the target content in the at least one candidate content.
  • In operation S530, the object feature data and the content feature data are input into the target deep learning model to obtain an output result.
  • the above-mentioned initial deep learning model or target deep learning model is suitable for content recommendation scenarios, including but not limited to articles, commodities, and news.
  • the target object is an object that browses content
  • the object feature data includes, for example, the target object's age, gender, historical browsing records, browsed content category, and so on. Any one of multiple candidate contents is taken as the target content, and the content feature data of the target content is determined.
  • the content feature data includes, but not limited to, content category, topic information, and keyword information.
  • the object feature data and content feature data are input into the target deep learning model to obtain an output result, and the output result represents the degree of interest of the target object in the target content.
  • the object feature data and content feature data may also be input into the initial deep learning model to obtain an output result.
  • the initial deep learning model or the target deep learning model can automatically learn the association between object feature data and content feature data. If the output result satisfies the preset condition, it means that the target object is more interested in the target content, and at this time the target content can be recommended to the target object. If the output result does not meet the preset condition, it means that the target object is less interested in the target content, and at this time the target content may not be recommended to the target object.
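  • a hedged sketch of operations S510 to S540 follows: encode the target object and one candidate content, score the pair with the trained model, and recommend when the preset condition (here an assumed score threshold) is met. The feature encodings and the model are assumed from the earlier sketches.

```python
# Score one candidate content for a target object; names are illustrative.
import numpy as np

def recommend(model, object_features: np.ndarray, content_features: np.ndarray,
              threshold: float = 0.5) -> bool:
    x = np.concatenate([object_features, content_features])[None, :]  # batch of 1
    interest = float(model.predict(x))  # output result: degree of interest
    return interest >= threshold        # preset condition met -> recommend
```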
  • the content recommendation is performed through the initial deep learning model or the target deep learning model, which improves the accuracy and efficiency of content recommendation, and the recommended content meets the needs of the target object and improves the user experience of the target object.
  • Fig. 6 schematically shows a block diagram of a training device for a deep learning model according to an embodiment of the present disclosure.
  • the training device 600 of the deep learning model of the embodiment of the present disclosure includes, for example, an acquisition module 610, a selection module 620, a first acquisition module 630, a first processing module 640, a first training module 650 and a second acquisition module 660.
  • the obtaining module 610 may be used to obtain a configuration file, wherein the configuration file includes model type data and candidate feature configuration data. According to an embodiment of the present disclosure, the acquiring module 610 may, for example, perform the operation S210 described above with reference to FIG. 2 , which will not be repeated here.
  • the selection module 620 may be configured to select an initial network layer type and an initial network layer structure based on the model type data. According to an embodiment of the present disclosure, the selection module 620 may, for example, perform the operation S220 described above with reference to FIG. 2 , which will not be repeated here.
  • the first obtaining module 630 can be used to obtain an initial deep learning model based on the initial network layer type and the initial network layer structure. According to an embodiment of the present disclosure, the first obtaining module 630 may, for example, perform operation S230 described above with reference to FIG. 2 , which will not be repeated here.
  • the first processing module 640 may be configured to process the first training sample based on the candidate feature configuration data to obtain first training feature data. According to an embodiment of the present disclosure, the first processing module 640 may, for example, execute the operation S240 described above with reference to FIG. 2 , which will not be repeated here.
  • the first training module 650 can be used to train an initial deep learning model using the first training feature data. According to an embodiment of the present disclosure, the first training module 650 may, for example, execute the operation S250 described above with reference to FIG. 2 , which will not be repeated here.
  • the second obtaining module 660 can be used to obtain a target deep learning model based on the trained initial deep learning model. According to an embodiment of the present disclosure, the second obtaining module 660 may, for example, perform the operation S260 described above with reference to FIG. 2 , which will not be repeated here.
  • the trained initial deep learning model includes at least one trained initial deep learning model; the configuration file also includes evaluation conditions; the second obtaining module includes: a first processing submodule, an input submodule, a first determination submodule, and an obtaining submodule.
  • the first processing submodule is used to process verification samples based on the candidate feature configuration data to obtain verification feature data; the input submodule is used to input the verification feature data into at least one trained initial deep learning model to obtain at least one verification result;
  • the first determining submodule is used to determine the target network layer type, target network layer structure, and target feature configuration data from the network layer type set, network layer structure set, and feature configuration data set based on at least one verification result and the evaluation condition;
  • the obtaining submodule is used to obtain the target deep learning model based on the target network layer type, target network layer structure, and target feature configuration data.
  • the network layer type set includes initial network layer types for the at least one trained initial deep learning model; the network layer structure set includes initial network layer structures for the at least one trained initial deep learning model; the feature configuration data set includes initial feature configuration data for the at least one trained initial deep learning model, and the initial feature configuration data in the feature configuration data set is at least part of the candidate feature configuration data.
  • the obtaining submodule includes: an obtaining unit, a processing unit and a training unit.
  • the obtaining unit is used to obtain the target deep learning model to be trained based on the target network layer type and the target network layer structure;
  • the processing unit is used to process the second training sample based on the target feature configuration data to obtain the second training feature data;
  • the training unit It is used for using the second training feature data to train the target deep learning model to be trained to obtain the target deep learning model.
  • the candidate feature configuration data includes at least one candidate feature configuration data
  • the first processing module 640 includes: a first selection submodule, a second determination submodule, an extraction submodule and a second processing submodule.
  • the first selection submodule is used to select the initial feature configuration data for the initial deep learning model from the at least one candidate feature configuration data
  • the second determination submodule is used to determine the first feature type and the first feature dimension based on the initial feature configuration data
  • the extraction submodule is used to extract the first sub-sample from the first training sample based on the first feature type
  • the second processing submodule is used to process the first sub-sample based on the first feature dimension to obtain the first training feature data.
  • the processing unit includes: a determination subunit, an extraction subunit, and a processing subunit.
  • the determination subunit is used to determine the second feature type and the second feature dimension based on the target feature configuration data;
  • the extraction subunit is used to extract the second subsample from the second training sample based on the second feature type;
  • the processing subunit is used to process the second sub-sample based on the second feature dimension to obtain the second training feature data.
  • the selection module 620 includes: a second selection submodule and a third selection submodule.
  • the second selection submodule is used to select the initial network layer type for the initial deep learning model from at least one candidate network layer type based on the model type data;
  • the third selection submodule is used to select a target hyperparameter from at least one candidate hyperparameter as the initial network layer structure for the initial deep learning model.
  • Fig. 7 schematically shows a block diagram of a content recommendation device according to an embodiment of the present disclosure.
  • the content recommendation device 700 of the embodiment of the present disclosure includes, for example, a first determination module 710 , a second determination module 720 , an input module 730 and a recommendation module 740 .
  • the first determination module 710 may be used to determine object feature data for the target object. According to an embodiment of the present disclosure, the first determining module 710 may, for example, perform the operation S510 described above with reference to FIG. 5 , which will not be repeated here.
  • the second determination module 720 may be configured to determine content feature data for the target content for the target content in at least one candidate content. According to an embodiment of the present disclosure, the second determining module 720 may, for example, perform the operation S520 described above with reference to FIG. 5 , which will not be repeated here.
  • the input module 730 can be used to input object feature data and content feature data into the target deep learning model to obtain an output result, wherein the target deep learning model is generated using the above-mentioned deep learning model training device, and the output result represents the degree of interest of the target object in the target content. According to an embodiment of the present disclosure, the input module 730 may, for example, perform the operation S530 described above with reference to FIG. 5, which will not be repeated here.
  • the recommendation module 740 may be configured to recommend target content to the target object in response to the output result meeting the preset condition. According to an embodiment of the present disclosure, the recommendation module 740 may, for example, perform the operation S540 described above with reference to FIG. 5 , which will not be repeated here.
  • in the technical solutions of the present disclosure, where the user's personal information is involved, the user's authorization or consent is obtained.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 8 is a block diagram of an electronic device for performing deep learning model training and/or content recommendation to implement an embodiment of the present disclosure.
  • FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure.
  • Electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the device 800 can also be stored.
  • the computing unit 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804 .
  • a number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, etc.; an output unit 807, such as various types of displays, speakers, etc.; a storage unit 808, such as a magnetic disk, an optical disk, etc.; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver, and the like.
  • the communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
  • the computing unit 801 may be various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc.
  • the computing unit 801 executes the various methods and processes described above, such as the deep learning model training method and/or the content recommendation method.
  • the deep learning model training method and/or the content recommendation method may be implemented as a computer software program, which is tangibly embodied in a machine-readable medium, such as the storage unit 808 .
  • part or all of the computer program may be loaded and/or installed on the device 800 via the ROM 802 and/or the communication unit 809.
  • when the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the above-described deep learning model training method and/or content recommendation method can be executed.
  • the computing unit 801 may be configured in any other appropriate way (for example, by means of firmware) to execute a deep learning model training method and/or a content recommendation method.
  • Various implementations of the systems and techniques described above can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof.
  • the programmable processor can be a special-purpose or general-purpose programmable processor, which can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable deep learning model training device and/or content recommendation device, so that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • to provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), a computing system that includes middleware components (e.g., an application server), a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with embodiments of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
  • A computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • The server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above.
  • Each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.

Abstract

Provided are a deep learning model training method and apparatus, a content recommendation method and apparatus, a device, a medium, and a program product, relating to the technical fields of deep learning and intelligent recommendation in artificial intelligence. The deep learning model training method comprises: acquiring a configuration file, the configuration file comprising model type data and candidate feature configuration data (S210); selecting an initial network layer type and an initial network layer structure on the basis of the model type data (S220); obtaining an initial deep learning model on the basis of the initial network layer type and the initial network layer structure (S230); processing a first training sample on the basis of the candidate feature configuration data to obtain first training feature data (S240); training the initial deep learning model by means of the first training feature data (S250); and obtaining a target deep learning model on the basis of the trained initial deep learning model (S260).
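
For orientation only, the following Python sketch shows one way the configuration-driven flow summarized above (S210-S260) could be organized. It is a minimal, self-contained toy under stated assumptions, not the claimed implementation: the JSON configuration layout and every name in it (LAYER_CATALOG, InitialModel, extract_features, train_from_config) are hypothetical and introduced purely for illustration.

    import json

    # Hypothetical catalog mapping model type data to an initial network
    # layer type and structure (S220); real selection logic would differ.
    LAYER_CATALOG = {
        "ranking": ("dense", [256, 128, 64]),
        "recall": ("embedding_dense", [128, 64]),
    }

    class InitialModel:
        """Toy stand-in for the initial deep learning model (S230)."""
        def __init__(self, layer_type, layer_structure):
            self.layer_type = layer_type
            self.layer_structure = layer_structure
            self.trained = False

        def fit(self, feature_rows):
            # S250: a real system would run gradient-based training here.
            self.trained = len(feature_rows) > 0

    def extract_features(sample, feature_config):
        # S240: keep only the fields named by the candidate feature
        # configuration data.
        return {name: sample.get(name) for name in feature_config}

    def train_from_config(config_text, training_samples):
        # S210: acquire the configuration file (model type data plus
        # candidate feature configuration data).
        config = json.loads(config_text)
        layer_type, layer_structure = LAYER_CATALOG[config["model_type"]]  # S220
        model = InitialModel(layer_type, layer_structure)  # S230
        features = [extract_features(s, config["candidate_features"])
                    for s in training_samples]  # S240
        model.fit(features)  # S250
        # S260: here the trained initial model is simply taken as the
        # target deep learning model.
        return model

    # Usage with toy data.
    config_text = json.dumps({
        "model_type": "ranking",
        "candidate_features": ["user_age", "item_category"],
    })
    target_model = train_from_config(
        config_text, [{"user_age": 30, "item_category": "news"}])

The lookup table merely stands in for whatever logic maps model type data to a network layer type and structure; the point of the sketch is the overall shape of a configuration file driving both model construction and feature extraction.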
PCT/CN2022/106805 2021-12-27 2022-07-20 Deep learning model training method and apparatus, and content recommendation method and apparatus WO2023124029A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111618428.9 2021-12-27
CN202111618428.9A CN114329201B (zh) Deep learning model training method, content recommendation method, and apparatus

Publications (1)

Publication Number Publication Date
WO2023124029A1 true WO2023124029A1 (fr) 2023-07-06

Family

ID=81014934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/106805 WO2023124029A1 (fr) Deep learning model training method and apparatus, and content recommendation method and apparatus

Country Status (2)

Country Link
CN (1) CN114329201B (fr)
WO (1) WO2023124029A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114329201B (zh) * 2021-12-27 2023-08-11 北京百度网讯科技有限公司 Deep learning model training method, content recommendation method, and apparatus
CN114968412B (zh) * 2022-06-20 2024-02-02 中国平安财产保险股份有限公司 Artificial intelligence-based configuration file generation method, apparatus, device, and medium
CN115456168B (zh) * 2022-09-05 2023-08-25 北京百度网讯科技有限公司 Reinforcement learning model training method, energy consumption determination method, and apparatus
CN115660064B (zh) * 2022-11-10 2023-09-29 北京百度网讯科技有限公司 Deep learning platform-based model training method, data processing method, and apparatus
CN115906921B (zh) * 2022-11-30 2023-11-21 北京百度网讯科技有限公司 Deep learning model training method, target object detection method, and apparatus
CN116151215B (zh) * 2022-12-28 2023-12-01 北京百度网讯科技有限公司 Text processing method, deep learning model training method, apparatus, and device
CN117112640B (zh) * 2023-10-23 2024-02-27 腾讯科技(深圳)有限公司 Content ranking method and related device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325541A (zh) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training a model
CN113052328A (zh) * 2021-04-02 2021-06-29 上海商汤科技开发有限公司 Deep learning model production system, electronic device, and storage medium
CN113469358A (zh) * 2021-07-05 2021-10-01 北京市商汤科技开发有限公司 Neural network training method and apparatus, computer device, and storage medium
WO2021233342A1 (fr) * 2020-05-19 2021-11-25 华为技术有限公司 Artificial neural network construction method and system
CN113723615A (zh) * 2020-12-31 2021-11-30 京东城市(北京)数字科技有限公司 Training method and apparatus for a deep reinforcement learning model based on hyperparameter optimization
CN113761348A (zh) * 2021-02-26 2021-12-07 北京沃东天骏信息技术有限公司 Information recommendation method and apparatus, electronic device, and storage medium
CN114329201A (zh) * 2021-12-27 2022-04-12 北京百度网讯科技有限公司 Deep learning model training method, content recommendation method, and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228794B (zh) * 2017-12-29 2020-03-31 三角兽(北京)科技有限公司 Information management apparatus, information processing apparatus, and automatic reply/postscript method
CN111552884A (zh) * 2020-05-13 2020-08-18 腾讯科技(深圳)有限公司 Method and device for content recommendation
CN112492390A (zh) * 2020-11-20 2021-03-12 海信视像科技股份有限公司 Display device and content recommendation method
CN113469067B (zh) * 2021-07-05 2024-04-16 北京市商汤科技开发有限公司 Document parsing method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN114329201B (zh) 2023-08-11
CN114329201A (zh) 2022-04-12

Similar Documents

Publication Publication Date Title
WO2023124029A1 Deep learning model training method and apparatus, and content recommendation method and apparatus
US11080340B2 (en) Systems and methods for classifying electronic information using advanced active learning techniques
US20180276553A1 (en) System for querying models
US20190362222A1 (en) Generating new machine learning models based on combinations of historical feature-extraction rules and historical machine-learning models
US20210374542A1 (en) Method and apparatus for updating parameter of multi-task model, and storage medium
WO2023109059A1 Fusion parameter determination method, information recommendation method, and model training method
US10606910B2 (en) Ranking search results using machine learning based models
CN114861889B (zh) 深度学习模型的训练方法、目标对象检测方法和装置
EP4134900A2 (fr) Procédé et appareil de recommandation de contenu, procédé et appareil d'apprentissage de modèle de classement, dispositif et support de stockage
JP2023031322A (ja) 問答処理方法、問答モデルのトレーニング方法、装置、電子機器、記憶媒体及びコンピュータプログラム
US11645540B2 (en) Deep graph de-noise by differentiable ranking
CN111191825A (zh) 用户违约预测方法、装置及电子设备
US20220414474A1 (en) Search method, electronic device and storage medium based on neural network model
WO2023040220A1 Method and apparatus for selectively pushing videos, electronic device, and storage medium
EP3992814A2 Method and device for creating user interest profiles, electronic device, and storage medium
CN110852078A (zh) 生成标题的方法和装置
CN114358024A (zh) 日志分析方法、装置、设备、介质和程序产品
CN113612777A (zh) 训练方法、流量分级方法、装置、电子设备以及存储介质
CN115700548A (zh) 用户行为预测的方法、设备和计算机程序产品
CN114066278B (zh) 物品召回的评估方法、装置、介质及程序产品
US10740403B1 (en) Systems and methods for identifying ordered sequence data
US20230004774A1 (en) Method and apparatus for generating node representation, electronic device and readable storage medium
US20230386237A1 (en) Classification method and apparatus, electronic device and storage medium
US20230147798A1 (en) Search method, computing device and storage medium
US20230145853A1 (en) Method of generating pre-training model, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913318

Country of ref document: EP

Kind code of ref document: A1