WO2023124029A1 - Deep learning model training method and apparatus, and content recommendation method and apparatus - Google Patents

Deep learning model training method and apparatus, and content recommendation method and apparatus Download PDF

Info

Publication number
WO2023124029A1
WO2023124029A1 (PCT/CN2022/106805)
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
feature
target
learning model
network layer
Prior art date
Application number
PCT/CN2022/106805
Other languages
French (fr)
Chinese (zh)
Inventor
陈意超
Original Assignee
Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co., Ltd.
Publication of WO2023124029A1 publication Critical patent/WO2023124029A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present disclosure relates to the technical field of artificial intelligence, in particular to deep learning, intelligent recommendation and other technical fields, and more specifically, to a deep learning model training method, content recommendation method, device, electronic equipment, media, and program product.
  • the deep learning model can be used to recommend relevant content.
  • however, in order to obtain a good deep learning model through training, a lot of labor and time costs need to be invested, and there is a high technical threshold, which leads to low training efficiency of the deep learning model.
  • the present disclosure provides a training method of a deep learning model, a content recommendation method, a device, an electronic device, a storage medium, and a program product.
  • a method for training a deep learning model, including: obtaining a configuration file, wherein the configuration file includes model type data and candidate feature configuration data; selecting an initial network layer type and an initial network layer structure based on the model type data; obtaining an initial deep learning model based on the initial network layer type and the initial network layer structure; processing a first training sample based on the candidate feature configuration data to obtain first training feature data; training the initial deep learning model using the first training feature data; and obtaining a target deep learning model based on the trained initial deep learning model.
  • a method for recommending content, including: determining object feature data for a target object; for target content in at least one candidate content, determining content feature data for the target content; inputting the object feature data and the content feature data into a target deep learning model to obtain an output result, wherein the target deep learning model is generated by the method according to the present disclosure, and the output result represents the degree of interest of the target object in the target content; and in response to the output result meeting a preset condition, recommending the target content to the target object.
  • a training device for a deep learning model including: an acquisition module, a selection module, a first acquisition module, a first processing module, a first training module, and a second acquisition module.
  • An acquisition module, configured to acquire a configuration file, wherein the configuration file includes model type data and candidate feature configuration data; a selection module, configured to select an initial network layer type and an initial network layer structure based on the model type data; a first obtaining module, configured to obtain an initial deep learning model based on the initial network layer type and the initial network layer structure; a first processing module, configured to process a first training sample based on the candidate feature configuration data to obtain first training feature data; a first training module, configured to train the initial deep learning model using the first training feature data; and a second obtaining module, configured to obtain a target deep learning model based on the trained initial deep learning model.
  • a content recommendation device including: a first determination module, a second determination module, an input module and a recommendation module.
  • the first determination module is used to determine the object feature data for the target object;
  • the second determination module is used to determine the content feature data for the target content for the target content in at least one candidate content;
  • the input module is used to input the object feature data and the content feature data into the target deep learning model to obtain an output result, wherein the target deep learning model is generated by the device according to the present disclosure, and the output result represents the degree of interest of the target object in the target content;
  • a recommendation module, configured to recommend the target content to the target object in response to the output result meeting a preset condition.
  • an electronic device including: at least one processor and a memory communicatively connected to the at least one processor.
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the above-mentioned deep learning model training method and/or content recommendation method.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the above-mentioned deep learning model training method and/or content recommendation method.
  • a computer program product including a computer program.
  • when the computer program is executed by a processor, the above-mentioned deep learning model training method and/or content recommendation method are implemented.
  • FIG. 1 schematically shows a system architecture for training a deep learning model and recommending content according to an embodiment of the present disclosure
  • Fig. 2 schematically shows a flow chart of a method for training a deep learning model according to an embodiment of the present disclosure
  • FIG. 3 schematically shows a flow chart of a method for training a deep learning model according to another embodiment of the present disclosure
  • Fig. 4 schematically shows a schematic diagram of a training method of a deep learning model according to an embodiment of the present disclosure
  • Fig. 5 schematically shows a schematic diagram of a content recommendation method according to an embodiment of the present disclosure
  • Fig. 6 schematically shows a block diagram of a training device for a deep learning model according to an embodiment of the present disclosure
  • Fig. 7 schematically shows a block diagram of a content recommendation device according to an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of an electronic device for performing deep learning model training and/or content recommendation to implement an embodiment of the present disclosure.
  • FIG. 1 schematically shows a system architecture of deep learning model training and content recommendation according to an embodiment of the present disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure can be applied, so as to help those skilled in the art understand the technical content of the present disclosure, but it does not mean that the embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.
  • a system architecture 100 may include clients 101 , 102 , 103 , a network 104 and a server 105 .
  • the network 104 is used as a medium for providing communication links between the clients 101 , 102 , 103 and the server 105 .
  • Network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • Users may use the clients 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages, and the like.
  • Clients 101, 102, and 103 can be installed with various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social platform software, etc. (just for example).
  • Clients 101, 102, 103 may be various electronic devices with display screens and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like.
  • the clients 101, 102, and 103 in the embodiments of the present disclosure may, for example, run applications.
  • the server 105 may be a server that provides various services, such as a background management server that provides support for websites browsed by users using the clients 101 , 102 , 103 (just an example).
  • the background management server can analyze and process received user requests and other data, and feed back processing results (such as webpages, information, or data, etc. obtained or generated according to user requests) to the client.
  • the server 105 may also be a cloud server, that is, the server 105 has a cloud computing function.
  • the deep learning model training method and/or the content recommendation method provided by the embodiment of the present disclosure may be executed by the server 105 .
  • the deep learning model training device and/or the content recommendation device provided by the embodiments of the present disclosure may be set in the server 105 .
  • the deep learning model training method and/or content recommendation method provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and can communicate with the clients 101 , 102 , 103 and/or the server 105 .
  • the deep learning model training device and/or content recommendation device may also be set on a server or server cluster that is different from the server 105 and can communicate with the clients 101, 102, 103 and/or the server 105.
  • the server 105 can receive training samples from the clients 101, 102, 103 through the network 104 and use the training samples to train the deep learning model; the server 105 can then send the trained deep learning model to the clients 101, 102, 103 through the network 104, and the clients can use the trained deep learning model to recommend content.
  • the server 105 may also directly use the deep learning model to recommend content.
  • a method for training a deep learning model and a method for recommending content according to an exemplary embodiment of the present disclosure will be described below with reference to FIGS. 2 to 5 in conjunction with the system architecture of FIG. 1 .
  • the deep learning model training method and the content recommendation method of the embodiments of the present disclosure can be executed, for example, by the server shown in FIG. 1; the server shown in FIG. 1 is the same as or similar to the electronic device described below.
  • Fig. 2 schematically shows a flowchart of a method for training a deep learning model according to an embodiment of the present disclosure.
  • the deep learning model training method 200 of the embodiment of the present disclosure may include, for example, operation S210 to operation S260.
  • In operation S210, a configuration file is obtained, and the configuration file includes model type data and candidate feature configuration data.
  • In operation S220, an initial network layer type and an initial network layer structure are selected based on the model type data.
  • In operation S230, an initial deep learning model is obtained based on the initial network layer type and the initial network layer structure.
  • In operation S240, the first training sample is processed based on the candidate feature configuration data to obtain first training feature data.
  • In operation S250, the initial deep learning model is trained using the first training feature data.
  • In operation S260, a target deep learning model is obtained based on the trained initial deep learning model.
  • the configuration file includes model type data, and the model type data, for example, characterizes the model type of the initial deep learning model, and the model type includes, for example, a deep neural network (Deep Neural Networks, DNN) type.
  • the initial network layer type includes, for example, attention layer, fully connected layer, pooling layer and other types of layers, and the initial network layer type may also represent the connection relationship between each layer.
  • the initial network layer structure for example, characterizes the number of nodes in each layer.
  • the initial network layer type may include multiple optional initial network layer types
  • the initial network layer structure may include multiple optional initial network layer structures
  • based on the model type data, the initial network layer type and initial network layer structure required by the DNN model may be selected from the multiple optional initial network layer types and the multiple optional initial network layer structures.
  • For example, different initial network layer types and different initial network layer structures can be selected in turn to build initial deep learning models, and each built initial deep learning model is trained.
  • the candidate feature configuration data represents a processing method for the first training sample.
  • the candidate feature configuration data represents a feature type and a feature dimension for extracting feature data from the first training sample. Processing the first training sample based on the candidate feature configuration data can obtain the first training feature data suitable for training the initial deep learning model.
  • the candidate feature configuration data includes feature types and feature dimensions for the first training sample.
  • the feature type includes, for example, age, gender, content category and other features
  • the feature dimension is, for example, the dimension of a feature vector
  • the dimension of a feature vector includes, for example, 1*128 dimension, 1*256 dimension, and so on.
  • For example, when training the first initial deep learning model, for the first training sample used to train the model, the age and gender features are selected from features such as age, gender, and content category, and the 1*128 dimension is selected from the feature dimensions 1*128 and 1*256; the first training sample is then processed to obtain 1*128-dimensional first training feature data for age and gender.
  • For example, when training another initial deep learning model, the gender and content category features are selected from features such as age, gender, and content category, and the 1*256 dimension is selected from the feature dimensions 1*128 and 1*256; the first training sample is then processed to obtain 1*256-dimensional first training feature data for gender and content category.
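  • As an illustration only, the following minimal Python sketch shows how candidate feature configuration data of this kind (a list of feature types plus a target feature dimension) might drive the processing of one training sample into a fixed-dimension vector. The configuration keys, the hashing-trick projection, and the sample fields are assumptions for the sketch, not the disclosure's actual data format.

```python
import numpy as np

# Hypothetical candidate feature configuration data: which feature types to
# use and the target feature dimension (e.g. 1*128 or 1*256).
feature_config_c1 = {"feature_types": ["age", "gender"], "feature_dim": 128}

def build_training_features(sample: dict, config: dict) -> np.ndarray:
    """Project the selected raw features of one training sample into a
    fixed-dimension vector using a simple hashing trick."""
    vec = np.zeros(config["feature_dim"], dtype=np.float32)
    for name in config["feature_types"]:
        value = sample.get(name)
        if value is None:
            continue
        # Hash "name=value" into one of feature_dim buckets.
        bucket = hash(f"{name}={value}") % config["feature_dim"]
        vec[bucket] += 1.0
    return vec.reshape(1, -1)  # shape (1, 128), i.e. a 1*128 feature vector

sample = {"age": 25, "gender": "female", "content_category": "sports"}
first_training_feature = build_training_features(sample, feature_config_c1)
print(first_training_feature.shape)  # (1, 128)
```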
  • the first training sample can be processed based on the candidate feature configuration data to obtain the first training feature data, and the initial deep learning model can be trained through the first training feature data.
  • the candidate feature configuration data may also include multiple candidate feature configuration data.
  • different candidate feature configuration data may be sequentially selected to process the first training samples used to train the corresponding initial deep learning models.
  • a target deep learning model can be obtained based on the initial deep learning model.
  • the initial deep learning model can be directly used as the target deep learning model, or model construction and model training can be re-performed based on the initial deep learning model to obtain the target deep learning model.
  • the model type data and the candidate feature configuration data are defined through the configuration file.
  • the corresponding initial network layer type and initial network layer structure can be selected based on the configuration file to construct the corresponding initial deep learning model, and based on the candidate feature configuration data to process the first training sample to obtain the corresponding first training feature data, so as to train the initial deep learning model based on the first training feature data, and then obtain the target deep learning model based on the initial deep learning model.
  • the initial neural network is constructed based on the configuration file and the first training sample is processed accordingly, so that multiple initial deep learning models can be trained automatically and quickly, which improves the efficiency of model training and reduces the cost of model training. With the configuration file, no code modification is required, lowering the technical threshold of model training.
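  • As a hedged illustration of this idea, the following Python sketch shows what a configuration file carrying model type data and candidate feature configuration data might look like, and how candidate network layer types, network layer structures, and feature configurations could be enumerated from it to build multiple initial deep learning models. All keys and the Cartesian enumeration are assumptions for the sketch; the disclosure does not fix a concrete schema here.

```python
from itertools import product

# A stand-in for the configuration file; keys are illustrative only.
config_file = {
    "model_type_data": "DNN",
    "candidate_network_layer_types": [
        ["fully_connected", "pooling"],       # e.g. candidate A1
        ["attention", "fully_connected"],     # e.g. candidate A2
    ],
    "candidate_network_layer_structures": [
        {"fully_connected_nodes": 128, "pooling_nodes": 64},    # e.g. candidate B1
        {"fully_connected_nodes": 256, "pooling_nodes": 128},   # e.g. candidate B2
    ],
    "candidate_feature_configs": [
        {"feature_types": ["age", "gender"], "feature_dim": 128},               # e.g. C1
        {"feature_types": ["gender", "content_category"], "feature_dim": 256},  # e.g. C2
    ],
}

def enumerate_initial_models(cfg):
    """Yield one (layer_type, layer_structure, feature_config) triple per
    initial deep learning model to be built and trained."""
    assert cfg["model_type_data"] == "DNN"
    yield from product(
        cfg["candidate_network_layer_types"],
        cfg["candidate_network_layer_structures"],
        cfg["candidate_feature_configs"],
    )

for layer_type, layer_structure, feature_config in enumerate_initial_models(config_file):
    print(layer_type, layer_structure, feature_config["feature_dim"])
```

  • In the disclosure's own example (see FIG. 4), each experiment pairs one candidate of each kind (A1/B1/C1, A2/B2/C2, and so on), so a zip-style pairing rather than a full Cartesian product is an equally valid reading of the enumeration step.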
  • Fig. 3 schematically shows a flowchart of a method for training a deep learning model according to another embodiment of the present disclosure.
  • the deep learning model training method 300 of the embodiment of the present disclosure may include, for example, operation S301 to operation S311 .
  • In operation S301, a configuration file is acquired, and the configuration file includes model type data and candidate feature configuration data.
  • In operation S302, an initial network layer type and an initial network layer structure are selected based on the model type data.
  • In operation S303, an initial deep learning model is obtained based on the initial network layer type and the initial network layer structure.
  • In operation S304, the first training sample is processed based on the candidate feature configuration data to obtain first training feature data.
  • In operation S305, the initial deep learning model is trained using the first training feature data.
  • operation S301 to operation S305 are the same as or similar to the operations of the above-mentioned embodiment, and will not be repeated here.
  • a target deep learning model is obtained based on the trained initial deep learning model; see operations S306 to S311.
  • the trained initial deep learning model includes at least one trained initial deep learning model, and the initial network layer type, initial network layer structure, or candidate feature configuration data corresponding to each trained initial deep learning model in the at least one trained initial deep learning model can be different.
  • the configuration file also includes evaluation conditions, which are used to evaluate the training effect of the initial deep learning model. The following operations S306 to S308 describe obtaining the target network layer type, target network layer structure, and target feature configuration data with better training effect by evaluating the initial deep learning model.
  • In operation S306, the verification sample is processed based on the candidate feature configuration data to obtain verification feature data.
  • In operation S307, the verification feature data is respectively input into the at least one trained initial deep learning model, and at least one verification result is obtained.
  • In operation S308, based on the at least one verification result and the evaluation condition, the target network layer type, target network layer structure, and target feature configuration data are determined from the network layer type set, the network layer structure set, and the feature configuration data set, respectively.
  • a set of network layer types, a set of network layer structures, and a set of feature configuration data corresponding to the multiple initial deep learning models are obtained.
  • the set of network layer types includes initial network layer types for multiple trained initial deep learning models.
  • the set of network layer structures includes initial network layer structures for a plurality of trained initial deep learning models.
  • the feature configuration data set includes initial feature configuration data for a plurality of trained initial deep learning models, and the initial feature configuration data in the feature configuration data set is, for example, at least part of multiple candidate feature configuration data.
  • For each initial deep learning model, for the candidate feature configuration data corresponding to that initial deep learning model, the verification sample is processed based on the candidate feature configuration data to obtain verification feature data, and the verification feature data is used to validate the trained initial deep learning model.
  • a plurality of verification results corresponding to a plurality of initial deep learning models can be obtained.
  • the verification result includes, for example, the recall rate or precision rate of the initial deep learning model on the verification sample
  • the evaluation condition includes, for example, conditions on the recall rate and precision rate; for example, the evaluation condition is used to evaluate whether the recall rate or precision rate of the verification result reaches a certain threshold.
  • the evaluation condition is related to, for example, the AUC (Area Under Curve) metric
  • the verification result can be evaluated based on the AUC
  • the AUC is an evaluation index.
  • the target network layer type, target network layer structure, and target feature configuration data are respectively determined from the network layer type set, network layer structure set, and feature configuration data set.
  • the verification results are evaluated by the evaluation conditions, so as to determine the target network layer type, target network layer structure, and target feature configuration data, which improves the determination accuracy of the target network layer type, the target network layer structure, and the target feature configuration data.
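  • The selection step can be pictured with a small, hedged sketch: each trained initial deep learning model is stored together with the components it was built from, an AUC-style metric is computed on the verification samples, and the components of the best model that satisfies the evaluation condition become the targets. The record layout and the scikit-learn call are assumptions for illustration; this simplified version also takes all three targets from a single best model, whereas the disclosure notes they may come from different models.

```python
from sklearn.metrics import roc_auc_score  # assumes scikit-learn is available

# Hypothetical verification results: one record per trained initial model.
results = [
    {"layer_type": "A1", "layer_structure": "B1", "feature_config": "C1",
     "y_true": [1, 0, 1, 0], "y_score": [0.9, 0.2, 0.7, 0.4]},
    {"layer_type": "A2", "layer_structure": "B2", "feature_config": "C2",
     "y_true": [1, 0, 1, 0], "y_score": [0.4, 0.5, 0.8, 0.1]},
]

evaluation_condition = {"min_auc": 0.75}  # illustrative threshold

best = None
for record in results:
    auc = roc_auc_score(record["y_true"], record["y_score"])
    if auc >= evaluation_condition["min_auc"] and (best is None or auc > best[0]):
        best = (auc, record)

if best is not None:
    auc, record = best
    target_network_layer_type = record["layer_type"]
    target_network_layer_structure = record["layer_structure"]
    target_feature_config = record["feature_config"]
    print(f"best AUC={auc:.3f}:", target_network_layer_type,
          target_network_layer_structure, target_feature_config)
```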
  • After obtaining the target network layer type, target network layer structure, and target feature configuration data, the model can be retrained based on the target network layer type, target network layer structure, and target feature configuration data to obtain the target deep learning model; see the following operations S309 to S311.
  • In operation S309, a target deep learning model to be trained is obtained based on the target network layer type and the target network layer structure.
  • a target deep learning model is constructed.
  • In operation S310, the second training sample is processed based on the target feature configuration data to obtain second training feature data.
  • the target feature configuration data represents how to process the second training samples used to train the target deep learning model, so as to obtain the second training feature data suitable for training the target deep learning model.
  • In operation S311, the second training feature data is used to train the target deep learning model to be trained to obtain the target deep learning model.
  • the second training sample can be processed based on the target feature configuration data to obtain the second training feature data, and the target deep learning model is trained through the second training feature data.
  • the training process of the target deep learning model is similar to the training process of the initial deep learning model and will not be repeated here.
  • the process of training multiple initial deep learning models can be regarded as an experimental process of searching network layer types, network layer structures, and feature configuration data.
  • the initial deep learning model whose verification result satisfies the evaluation condition can be directly used as the final target deep learning model.
  • the target network layer type, target network layer structure, and target feature configuration data may come from different initial deep learning models.
  • instead of saving the initial deep learning models, the optimal target network layer type, target network layer structure, and target feature configuration data can be saved; then the target deep learning model is rebuilt and trained based on the target network layer type, target network layer structure, and target feature configuration data.
  • the target network layer type, target network layer structure, and target feature configuration data are obtained by training the initial deep learning models, and then the target deep learning model is obtained by retraining based on the target network layer type, target network layer structure, and target feature configuration data, which not only improves the accuracy of the target deep learning model, but also reduces the consumption of data storage space.
  • Fig. 4 schematically shows a schematic diagram of a method for training a deep learning model according to an embodiment of the present disclosure.
  • the configuration file 410 includes, for example, model type data 411 , multiple candidate feature configuration data 412 , and evaluation conditions 413 .
  • the multiple candidate network layer types 420 include, for example, candidate network layer types A1-A4, and the multiple candidate hyperparameters 430, for example, include candidate hyperparameters B1-B4.
  • For example, the candidate network layer type A1 and the candidate hyperparameter B1 are selected as the initial network layer type and initial network layer structure for the initial deep learning model 431. For example, the candidate network layer type A1 includes a fully connected layer and a pooling layer, and the candidate hyperparameter B1 specifies M nodes in the fully connected layer and N nodes in the pooling layer, where both M and N are integers greater than 0.
  • the candidate network layer type A2 and the candidate hyperparameter B2 are selected as the initial network layer type and initial network layer structure for the initial deep learning model 432, respectively.
  • the candidate network layer type A3 and the candidate hyperparameter B3 are selected as the initial network layer type and initial network layer structure for the initial deep learning model 433, respectively.
  • For example, an initial deep learning model 431 is constructed based on candidate network layer type A1 and candidate hyperparameter B1, an initial deep learning model 432 is constructed based on candidate network layer type A2 and candidate hyperparameter B2, and an initial deep learning model 433 is constructed based on candidate network layer type A3 and candidate hyperparameter B3.
  • the initial deep learning models 431 , 432 , 433 need to be trained based on the first training samples 440 .
  • For example, the initial feature configuration data for each initial deep learning model is selected from the plurality of candidate feature configuration data 412. For example, the candidate feature configuration data C1 is selected as the initial feature configuration data for the initial deep learning model 431, the candidate feature configuration data C2 is selected as the initial feature configuration data for the initial deep learning model 432, and the candidate feature configuration data C3 is selected as the initial feature configuration data for the initial deep learning model 433.
  • the first training sample 440 needs to be processed based on the corresponding initial feature configuration data.
  • For example, the first feature type and the first feature dimension are determined based on the initial feature configuration data (C1); that is, the initial feature configuration data (C1) defines the first feature type and the first feature dimension.
  • The first feature type includes, for example, features such as age, gender, and content category.
  • The first feature dimension is, for example, the dimension of a feature vector, and the dimension of a feature vector is, for example, 1*128 dimensions.
  • a first sub-sample is extracted from the first training sample 440 based on the first feature type, for example, the first sub-sample is for content including age, gender, content category and other features.
  • the first sub-sample is processed based on the first feature dimension to obtain the first training feature data 441.
  • the first training feature data 441 is, for example, a feature vector, and the dimension of the feature vector is, for example, 1*128 dimensions.
  • the process of obtaining the first training feature data 442 and the first training feature data 443 is similar to the process of obtaining the first training feature data 441 , and will not be repeated here.
  • the network layer type set 451 includes, for example, initial network layer types A1, A2, and A3
  • the network layer structure set 452 includes, for example, initial network layer structures B1, B2, and B3
  • the feature configuration data set 453 includes, for example, initial feature configuration data C1, C2, and C3.
  • the process is similar to the above content, and will not be repeated here.
  • For example, the target deep learning model 480 is constructed based on the target network layer type 471 (A1) and the target network layer structure 462 (B2). After the target deep learning model 480 is constructed, the target deep learning model 480 needs to be trained based on the second training sample 490.
  • the second training sample 490 is processed based on the target feature configuration data 473 ( C3 ) to obtain the second training feature data 491 .
  • the second feature type and the second feature dimension are determined based on the target feature configuration data 473 (C3).
  • the target feature configuration data 473 (C3) defines, for example, the second feature type and the second feature dimension.
  • The second feature type includes, for example, features such as age and gender.
  • The second feature dimension is, for example, the dimension of the feature vector, and the dimension of the feature vector is, for example, 1*256 dimensions.
  • a second sub-sample is extracted from the second training sample 490 based on the second feature type, for example, the second sub-sample is for content including age, gender and other features.
  • the second sub-sample is processed based on the second feature dimension to obtain the second training feature data 491.
  • the second training feature data 491 is, for example, a feature vector, and the dimension of the feature vector is, for example, 1*256 dimensions.
  • the model can be trained based on the PaddlePaddle deep learning framework and the open-source distributed framework Ray. For example, PaddlePaddle is used to implement model building and model training, and Ray is used to seamlessly switch between local training and cluster training; Ray can automatically schedule available resources for parallel training, improving resource utilization and the degree of training parallelism.
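  • The following sketch is one hypothetical way such an experiment could be launched: Ray Tune samples hyperparameters and schedules trials in parallel, and each trial builds and trains a small PaddlePaddle DNN. The `tune.run`/`tune.report` style assumes an older Ray Tune API (newer releases use `Tuner.fit()` and `ray.train.report`), and the toy data and metric are placeholders; this is not the disclosure's actual framework code.

```python
import paddle
from ray import tune

def train_one_experiment(config):
    """One search trial: build a small DNN from the sampled hyperparameters,
    run a few dummy training steps, and report a metric to Ray Tune."""
    model = paddle.nn.Sequential(
        paddle.nn.Linear(config["feature_dim"], config["hidden_nodes"]),
        paddle.nn.ReLU(),
        paddle.nn.Linear(config["hidden_nodes"], 1),
    )
    opt = paddle.optimizer.Adam(learning_rate=config["lr"],
                                parameters=model.parameters())
    x = paddle.randn([32, config["feature_dim"]])        # stand-in for real samples
    y = paddle.randint(0, 2, [32, 1]).astype("float32")  # stand-in labels
    for _ in range(10):
        loss = paddle.nn.functional.binary_cross_entropy_with_logits(model(x), y)
        loss.backward()
        opt.step()
        opt.clear_grad()
    tune.report(loss=float(loss))  # metric consumed by the search/scheduler

analysis = tune.run(
    train_one_experiment,
    config={
        "feature_dim": tune.choice([128, 256]),
        "hidden_nodes": tune.choice([64, 128, 256]),
        "lr": tune.loguniform(1e-4, 1e-2),
    },
    num_samples=8,  # number of trials Ray schedules onto available resources
)
print(analysis.get_best_config(metric="loss", mode="min"))
```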
  • a configuration file includes two files, a feature configuration file and a training configuration file.
  • the feature configuration file includes, for example, candidate feature configuration data
  • the feature configuration file may also include a processing method for the features, and the processing method includes, for example, normalization, hash operation, and the like.
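  • A minimal sketch of the kind of per-feature processing such a feature configuration file might declare is given below; the entry keys (normalize with mean/std, hash into buckets) are illustrative assumptions rather than the file format used by the disclosure.

```python
# Illustrative feature configuration entries; keys are assumptions.
feature_config = [
    {"name": "age", "process": "normalize", "mean": 35.0, "std": 12.0},
    {"name": "content_category", "process": "hash", "buckets": 1000},
]

def process_feature(raw_value, spec):
    if spec["process"] == "normalize":
        return (float(raw_value) - spec["mean"]) / spec["std"]
    if spec["process"] == "hash":
        # Bucket id that could later index an embedding table.
        return hash(str(raw_value)) % spec["buckets"]
    raise ValueError(f"unknown processing method: {spec['process']}")

sample = {"age": 29, "content_category": "sports"}
processed = {spec["name"]: process_feature(sample[spec["name"]], spec)
             for spec in feature_config}
print(processed)  # e.g. {'age': -0.5, 'content_category': 417}
```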
  • the training configuration file includes data other than features, for example including model type data, evaluation conditions, and the like.
  • the training samples, verification samples, candidate feature configuration data, model structure, hyperparameters, and training resource configuration used in the training process can all be invoked through the configuration file without modifying the framework code, and the experimental training can be started with one click, reducing the technical threshold and training difficulty.
  • the target deep learning model is retrained based on the search results and the second training samples.
  • the model type data in the configuration file defines how to select the initial model type and network layer structure (search direction), and the candidate feature configuration data, for example, defines the feature type search and feature dimension search.
  • The search directions include, for example, hyperparameter search, feature type search, feature dimension search, and model structure search.
  • Feature types include features or combined features that need to be extracted from sample data during model training.
  • Features include, for example, gender and age, and combined features include, for example, a combination of gender and age.
  • the hyperparameter search includes, for example, a search space, a search algorithm, and a scheduler algorithm (scheduling algorithm).
  • the search space includes, for example, methods such as random search, grid search, and uniform distribution sampling.
  • the search space represents which candidate hyperparameters are available for search.
  • Search algorithms include the grid search algorithm, the Bayesian optimization algorithm, OPTUNA optimization, and other algorithms.
  • OPTUNA is a framework for automatic hyperparameter optimization.
  • the search algorithm is used to determine the optimal hyperparameter based on the training results of the candidate hyperparameters.
  • the scheduler algorithm (scheduling algorithm) includes the first-in-first-out (FIFO) algorithm, the ASHA algorithm, and the like.
  • the ASHA algorithm is a parameter tuning algorithm.
  • the scheduler algorithm represents how to schedule computing resources to perform parallel training based on candidate hyperparameters.
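  • Putting the three pieces together, a hedged Ray Tune sketch might declare a search space of candidate hyperparameters, an Optuna-backed search algorithm, and the ASHA scheduler as follows. The import paths and the `tune.report`-style reporting assume a particular Ray version, and the objective is a placeholder rather than a real training run.

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch  # path assumed for recent Ray releases

# Search space: which candidate hyperparameters are available for search.
search_space = {
    "hidden_nodes": tune.choice([64, 128, 256]),
    "lr": tune.loguniform(1e-4, 1e-2),
}

def trainable(config):
    # Placeholder objective; a real trial would train and validate a model here.
    tune.report(auc=0.5 + 0.001 * config["hidden_nodes"] - config["lr"])

tune.run(
    trainable,
    config=search_space,
    search_alg=OptunaSearch(),                # search algorithm: picks the next candidates
    scheduler=ASHAScheduler(grace_period=1),  # scheduler: stops weak trials early
    metric="auc",
    mode="max",
    num_samples=16,
)
```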
  • Combined features can be searched through models such as AutoCross (automatic feature crossing) and AutoFis (automatic feature interaction selection).
  • the AutoCross model is responsible for screening useful explicit crossover features, such as screening features that improve the training effect of the model.
  • the AutoFis model is responsible for filtering the useless second-order cross features (implicit cross features) in the FM (Factorization Machine) model and the DeepFM model.
  • the explicit cross feature is, for example, the combination or concatenation of multiple features
  • the implicit cross feature is, for example, the dot product of multiple features.
  • For the feature dimension, the AutoDim algorithm and the AutoDis algorithm can be used to search.
  • AutoDim algorithm is an algorithm for automatic dimension optimization
  • AutoDis algorithm is an automatic discretization algorithm for numerical features.
  • the AutoDim algorithm searches out different dimension sizes from different feature dimensions, that is, searches for suitable dimensions for discrete features.
  • the AutoDis algorithm supports continuous feature embedding (discretization of continuous features), and searches for the most suitable dimension size for different continuous features during the training process.
  • Model structure search can learn the weight corresponding to each child architecture (network layer) through a NAS (neural architecture search) model, so as to obtain an optimal model structure. For example, by learning the weights corresponding to multiple candidate network layers, the candidate network layer with a larger weight is used as the final network layer.
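  • The weighted-candidate idea can be illustrated with a small differentiable-NAS-style sketch in PaddlePaddle: a block holds several candidate layers, learns one architecture weight per candidate, and after training keeps the candidate with the largest weight. This is a generic illustration under assumed APIs, not the specific NAS model referenced by the disclosure.

```python
import paddle

class MixedLayer(paddle.nn.Layer):
    """Output is a softmax-weighted sum over candidate layers; after training,
    the candidate with the largest architecture weight is kept."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.candidates = paddle.nn.LayerList([
            paddle.nn.Linear(in_dim, out_dim),                      # candidate 1
            paddle.nn.Sequential(paddle.nn.Linear(in_dim, out_dim),
                                 paddle.nn.ReLU()),                 # candidate 2
        ])
        # One learnable architecture weight per candidate layer.
        self.arch_weights = self.create_parameter(shape=[len(self.candidates)])

    def forward(self, x):
        probs = paddle.nn.functional.softmax(self.arch_weights)
        return sum(p * layer(x) for p, layer in zip(probs, self.candidates))

    def chosen_candidate(self):
        return int(paddle.argmax(self.arch_weights))

layer = MixedLayer(16, 8)
out = layer(paddle.randn([4, 16]))  # joint training would also update arch_weights
print(out.shape, "chosen:", layer.chosen_candidate())
```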
  • The VisualDL tool is a visualization and analysis tool in the PaddlePaddle framework. It uses rich charts to show the influence of different hyperparameters on the experimental results, and helps to understand more intuitively the impact of the search space and search algorithm on the recommendation model.
  • the training process of the model supports batch offline training search and incremental training search.
  • batch offline search training or incremental search training can be selected through configuration.
  • For batch offline search training, the experimental results on the same data set are compared to select the optimal search result.
  • For incremental search training, if the experimental effect of the incremental search is better than that of the original experiment, the original is replaced; otherwise, the original model structure and hyperparameters are kept and training continues.
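  • The replace-or-keep decision for incremental search training reduces to a comparison on the chosen metric, as in the hedged snippet below (the field names and the AUC metric are assumptions).

```python
def incremental_search_step(current_best, new_result):
    """Keep the original model structure and hyperparameters unless the
    incremental search run beats it on the chosen metric (e.g. AUC)."""
    if new_result["auc"] > current_best["auc"]:
        return new_result   # replace: adopt the new structure/hyperparameters
    return current_best     # keep the original and continue training it

current_best = {"auc": 0.81, "structure": "A1/B2", "feature_config": "C3"}
candidate = {"auc": 0.79, "structure": "A2/B1", "feature_config": "C1"}
print(incremental_search_step(current_best, candidate)["structure"])  # A1/B2
```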
  • the training process can be carried out in parallel. For example, some computing resources perform training based on one part of the hyperparameters, model structures, and training samples, and other computing resources perform training based on another part of the hyperparameters, model structures, and training samples, thereby improving training efficiency.
  • Fig. 5 schematically shows a flowchart of a content recommendation method according to an embodiment of the present disclosure.
  • the content recommendation method 500 of the embodiment of the present disclosure may include, for example, operation S510 to operation S540.
  • In operation S510, object feature data for the target object is determined.
  • In operation S520, content feature data for the target content is determined for the target content in the at least one candidate content.
  • In operation S530, the object feature data and the content feature data are input into the target deep learning model to obtain an output result.
  • In operation S540, in response to the output result meeting a preset condition, the target content is recommended to the target object.
  • the above-mentioned initial deep learning model or target deep learning model is suitable for content recommendation scenarios, including but not limited to articles, commodities, and news.
  • the target object is an object that browses content
  • the object feature data includes, for example, the target object's age, gender, historical browsing records, browsed content category, and so on. Any one of multiple candidate contents is taken as the target content, and the content feature data of the target content is determined.
  • the content feature data includes, but not limited to, content category, topic information, and keyword information.
  • the object feature data and content feature data are input into the target deep learning model to obtain an output result, and the output result represents the degree of interest of the target object in the target content.
  • the object feature data and content feature data may also be input into the initial deep learning model to obtain an output result.
  • the initial deep learning model or the target deep learning model can automatically learn the association between object feature data and content feature data. If the output result satisfies the preset condition, it means that the target object is more interested in the target content, and at this time the target content can be recommended to the target object. If the output result does not meet the preset condition, it means that the target object is less interested in the target content, and at this time the target content may not be recommended to the target object.
  • the content recommendation is performed through the initial deep learning model or the target deep learning model, which improves the accuracy and efficiency of content recommendation, and the recommended content meets the needs of the target object and improves the user experience of the target object.
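  • Operations S510 to S540 at serving time can be pictured with the following hedged sketch: build the object and content feature vectors, score each candidate content with the trained model, and recommend the candidates whose output result meets the preset condition. The model stub, feature encodings, and threshold are placeholders, not the trained target deep learning model itself.

```python
import numpy as np

def target_model(object_features: np.ndarray, content_features: np.ndarray) -> float:
    """Stand-in for the trained target deep learning model: returns a score
    interpreted as the target object's degree of interest in the content."""
    x = np.concatenate([object_features, content_features])
    return float(1.0 / (1.0 + np.exp(-x.mean())))  # placeholder sigmoid score

def recommend(object_features, candidate_contents, threshold=0.5):
    """Operations S510-S540 in miniature: score every candidate content and
    recommend those whose output result meets the preset condition."""
    recommended = []
    for content_id, content_features in candidate_contents.items():
        score = target_model(object_features, content_features)
        if score >= threshold:  # preset condition
            recommended.append((content_id, score))
    return sorted(recommended, key=lambda item: item[1], reverse=True)

object_features = np.array([0.3, 0.8, 0.1], dtype=np.float32)  # e.g. encoded age/gender/history
candidates = {"news_1": np.array([0.9, 0.2], dtype=np.float32),
              "item_7": np.array([-0.5, -0.4], dtype=np.float32)}
print(recommend(object_features, candidates))
```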
  • Fig. 6 schematically shows a block diagram of a training device for a deep learning model according to an embodiment of the present disclosure.
  • the training device 600 of the deep learning model of the embodiment of the present disclosure includes, for example, an acquisition module 610, a selection module 620, a first acquisition module 630, a first processing module 640, a first training module 650 and a second acquisition module 660.
  • the obtaining module 610 may be used to obtain a configuration file, wherein the configuration file includes model type data and candidate feature configuration data. According to an embodiment of the present disclosure, the acquiring module 610 may, for example, perform the operation S210 described above with reference to FIG. 2 , which will not be repeated here.
  • the selection module 620 may be configured to select an initial network layer type and an initial network layer structure based on the model type data. According to an embodiment of the present disclosure, the selection module 620 may, for example, perform the operation S220 described above with reference to FIG. 2 , which will not be repeated here.
  • the first obtaining module 630 can be used to obtain an initial deep learning model based on the initial network layer type and the initial network layer structure. According to an embodiment of the present disclosure, the first obtaining module 630 may, for example, perform operation S230 described above with reference to FIG. 2 , which will not be repeated here.
  • the first processing module 640 may be configured to process the first training sample based on the candidate feature configuration data to obtain first training feature data. According to an embodiment of the present disclosure, the first processing module 640 may, for example, execute the operation S240 described above with reference to FIG. 2 , which will not be repeated here.
  • the first training module 650 can be used to train an initial deep learning model using the first training feature data. According to an embodiment of the present disclosure, the first training module 650 may, for example, execute the operation S250 described above with reference to FIG. 2 , which will not be repeated here.
  • the second obtaining module 660 can be used to obtain a target deep learning model based on the trained initial deep learning model. According to an embodiment of the present disclosure, the second obtaining module 660 may, for example, perform the operation S260 described above with reference to FIG. 2 , which will not be repeated here.
  • the trained initial deep learning model includes at least one trained initial deep learning model; the configuration file also includes an evaluation condition; the second obtaining module includes: a first processing submodule, an input submodule, a first determination submodule, and an obtaining submodule.
  • the first processing submodule is used to process verification samples based on candidate feature configuration data to obtain verification feature data; the input submodule is used to input verification feature data into at least one trained initial deep learning model to obtain at least one verification result ;
  • the first determining submodule is used to determine the target network layer type, target network layer structure, and target feature configuration data from the network layer type set, network layer structure set, and feature configuration data set based on at least one verification result and evaluation condition ;
  • the obtaining sub-module is used to obtain the target deep learning model based on the target network layer type, target network layer structure, and target feature configuration data.
  • the network layer type set includes an initial network layer type for the at least one trained initial deep learning model; the network layer structure set includes an initial network layer structure for the at least one trained initial deep learning model; the feature configuration data set includes initial feature configuration data for the at least one trained initial deep learning model, and the initial feature configuration data in the feature configuration data set is at least part of the candidate feature configuration data.
  • the obtaining submodule includes: an obtaining unit, a processing unit and a training unit.
  • the obtaining unit is used to obtain the target deep learning model to be trained based on the target network layer type and the target network layer structure;
  • the processing unit is used to process the second training sample based on the target feature configuration data to obtain the second training feature data;
  • the training unit It is used for using the second training feature data to train the target deep learning model to be trained to obtain the target deep learning model.
  • the candidate feature configuration data includes at least one candidate feature configuration data
  • the first processing module 640 includes: a first selection submodule, a second determination submodule, an extraction submodule and a second processing submodule.
  • the first selection submodule is used to select the initial feature configuration data for the initial deep learning model from the at least one candidate feature configuration data
  • the second determination submodule is used to determine the first feature type and the first feature dimension based on the initial feature configuration data
  • the extraction sub-module is used to extract the first sub-sample from the first training sample based on the first feature type
  • the second processing sub-module is used to process the first sub-sample based on the first feature dimension to obtain the first training feature data.
  • the processing unit includes: a determination subunit, an extraction subunit, and a processing subunit.
  • the determination subunit is used to determine the second feature type and the second feature dimension based on the target feature configuration data;
  • the extraction subunit is used to extract the second subsample from the second training sample based on the second feature type;
  • the processing subunit is used to process the second sub-sample based on the second feature dimension to obtain the second training feature data.
  • the selection module 620 includes: a second selection submodule and a third selection submodule.
  • the second selection submodule is used to select the initial network layer type for the initial deep learning model from at least one candidate network layer type based on the model type data;
  • the third selection submodule is used to select a target hyperparameter from at least one candidate hyperparameter as the initial network layer structure for the initial deep learning model.
  • Fig. 7 schematically shows a block diagram of a content recommendation device according to an embodiment of the present disclosure.
  • the content recommendation device 700 of the embodiment of the present disclosure includes, for example, a first determination module 710 , a second determination module 720 , an input module 730 and a recommendation module 740 .
  • the first determination module 710 may be used to determine object feature data for the target object. According to an embodiment of the present disclosure, the first determining module 710 may, for example, perform the operation S510 described above with reference to FIG. 5 , which will not be repeated here.
  • the second determination module 720 may be configured to determine content feature data for the target content for the target content in at least one candidate content. According to an embodiment of the present disclosure, the second determining module 720 may, for example, perform the operation S520 described above with reference to FIG. 5 , which will not be repeated here.
  • the input module 730 can be used to input the object feature data and the content feature data into the target deep learning model to obtain an output result, wherein the target deep learning model is generated using the above-mentioned deep learning model training device, and the output result represents the degree of interest of the target object in the target content. According to an embodiment of the present disclosure, the input module 730 may, for example, perform the operation S530 described above with reference to FIG. 5 , which will not be repeated here.
  • the recommendation module 740 may be configured to recommend target content to the target object in response to the output result meeting the preset condition. According to an embodiment of the present disclosure, the recommendation module 740 may, for example, perform the operation S540 described above with reference to FIG. 5 , which will not be repeated here.
  • in the technical solution of the present disclosure, the user's authorization or consent is obtained before relevant user data is acquired or used.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 8 is a block diagram of an electronic device for performing deep learning model training and/or content recommendation to implement an embodiment of the present disclosure.
  • FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure.
  • Electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the device 800 can also be stored.
  • the computing unit 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804 .
  • A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and speakers; a storage unit 808, such as a magnetic disk or an optical disk; and a communication unit 809, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
  • the computing unit 801 may be various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like.
  • the computing unit 801 executes the various methods and processes described above, such as the deep learning model training method and/or the content recommendation method.
  • the deep learning model training method and/or the content recommendation method may be implemented as a computer software program, which is tangibly embodied in a machine-readable medium, such as the storage unit 808 .
  • part or all of the computer program may be loaded and/or installed on the device 800 via the ROM 802 and/or the communication unit 809.
  • when the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the above-described deep learning model training method and/or content recommendation method can be executed.
  • the computing unit 801 may be configured in any other appropriate way (for example, by means of firmware) to execute a deep learning model training method and/or a content recommendation method.
  • Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOC), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable deep learning model training device and/or content recommendation device, so that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., as a a user computer having a graphical user interface or web browser through which a user can interact with embodiments of the systems and techniques described herein), or including such backend components, middleware components, Or any combination of front-end components in a computing system.
  • the components of the system can be interconnected by any form or medium of digital data communication, eg, a communication network. Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN) and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • steps may be reordered, added or deleted using the various forms of flow shown above.
  • Each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.

Abstract

A deep learning model training method and apparatus, a content recommendation method and apparatus, a device, a medium, and a product, relating to the technical fields of deep learning and intelligent recommendation in artificial intelligence. The deep learning model training method comprises: acquiring a configuration file, wherein the configuration file comprises model type data and candidate feature configuration data (S210); selecting an initial network layer type and an initial network layer structure on the basis of the model type data (S220); obtaining an initial deep learning model on the basis of the initial network layer type and the initial network layer structure (S230); processing a first training sample on the basis of the candidate feature configuration data to obtain first training feature data (S240); training the initial deep learning model by means of the first training feature data (S250); and obtaining a target deep learning model on the basis of the trained initial deep learning model (S260).

Description

Deep learning model training method, content recommendation method and apparatus
This application claims priority to the Chinese patent application with application number 202111618428.9 filed on December 27, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to the technical fields of deep learning and intelligent recommendation, and more specifically, to a deep learning model training method, a content recommendation method, an apparatus, an electronic device, a medium, and a program product.
Background
In related technologies, relevant content can be recommended through a deep learning model. However, in order to train a good deep learning model, a large amount of labor and time costs need to be invested, and there is a high technical threshold, resulting in low training efficiency of the deep learning model.
Summary
The present disclosure provides a training method of a deep learning model, a content recommendation method, an apparatus, an electronic device, a storage medium, and a program product.
According to an aspect of the present disclosure, a method for training a deep learning model is provided, including: obtaining a configuration file, wherein the configuration file includes model type data and candidate feature configuration data; selecting an initial network layer type and an initial network layer structure based on the model type data; obtaining an initial deep learning model based on the initial network layer type and the initial network layer structure; processing a first training sample based on the candidate feature configuration data to obtain first training feature data; training the initial deep learning model by using the first training feature data; and obtaining a target deep learning model based on the trained initial deep learning model.
According to an aspect of the present disclosure, a content recommendation method is provided, including: determining object feature data for a target object; determining, for a target content in at least one candidate content, content feature data for the target content; inputting the object feature data and the content feature data into a target deep learning model to obtain an output result, wherein the target deep learning model is generated by using the method according to the present disclosure, and the output result represents a degree of interest of the target object in the target content; and recommending the target content to the target object in response to the output result meeting a preset condition.
According to another aspect of the present disclosure, a training apparatus for a deep learning model is provided, including an acquisition module, a selection module, a first obtaining module, a first processing module, a first training module, and a second obtaining module. The acquisition module is configured to acquire a configuration file, wherein the configuration file includes model type data and candidate feature configuration data; the selection module is configured to select an initial network layer type and an initial network layer structure based on the model type data; the first obtaining module is configured to obtain an initial deep learning model based on the initial network layer type and the initial network layer structure; the first processing module is configured to process a first training sample based on the candidate feature configuration data to obtain first training feature data; the first training module is configured to train the initial deep learning model by using the first training feature data; and the second obtaining module is configured to obtain a target deep learning model based on the trained initial deep learning model.
According to an aspect of the present disclosure, a content recommendation apparatus is provided, including a first determination module, a second determination module, an input module, and a recommendation module. The first determination module is configured to determine object feature data for a target object; the second determination module is configured to determine, for a target content in at least one candidate content, content feature data for the target content; the input module is configured to input the object feature data and the content feature data into a target deep learning model to obtain an output result, wherein the target deep learning model is generated by the apparatus according to the present disclosure, and the output result represents a degree of interest of the target object in the target content; and the recommendation module is configured to recommend the target content to the target object in response to the output result meeting a preset condition.
According to another aspect of the present disclosure, an electronic device is provided, including at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above deep learning model training method and/or content recommendation method.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the above deep learning model training method and/or content recommendation method.
According to another aspect of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the above deep learning model training method and/or content recommendation method.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are used for a better understanding of the present solution and do not constitute a limitation to the present disclosure, in which:
Fig. 1 schematically shows a system architecture for deep learning model training and content recommendation according to an embodiment of the present disclosure;
Fig. 2 schematically shows a flowchart of a method for training a deep learning model according to an embodiment of the present disclosure;
Fig. 3 schematically shows a flowchart of a method for training a deep learning model according to another embodiment of the present disclosure;
Fig. 4 schematically shows a schematic diagram of a method for training a deep learning model according to an embodiment of the present disclosure;
Fig. 5 schematically shows a schematic diagram of a content recommendation method according to an embodiment of the present disclosure;
Fig. 6 schematically shows a block diagram of a training apparatus for a deep learning model according to an embodiment of the present disclosure;
Fig. 7 schematically shows a block diagram of a content recommendation apparatus according to an embodiment of the present disclosure; and
Fig. 8 is a block diagram of an electronic device for performing deep learning model training and/or content recommendation according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding, which should be regarded as merely exemplary. Therefore, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
The terms used herein are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure. The terms "including", "comprising" and the like used herein indicate the presence of the stated features, steps, operations and/or components, but do not exclude the presence or addition of one or more other features, steps, operations or components.
All terms (including technical and scientific terms) used herein have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.
Where an expression such as "at least one of A, B and C" is used, it should generally be interpreted in the sense in which those skilled in the art would normally understand the expression (for example, "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C).
Fig. 1 schematically shows a system architecture for deep learning model training and content recommendation according to an embodiment of the present disclosure. It should be noted that Fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure can be applied, so as to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments of the present disclosure cannot be applied to other devices, systems, environments or scenarios.
As shown in Fig. 1, a system architecture 100 according to this embodiment may include clients 101, 102, 103, a network 104 and a server 105. The network 104 is a medium for providing communication links between the clients 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the clients 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the clients 101, 102, 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients and social platform software (only as examples).
The clients 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers and the like. The clients 101, 102, 103 of the embodiments of the present disclosure may, for example, run applications.
The server 105 may be a server providing various services, such as a background management server (only as an example) that provides support for websites browsed by users using the clients 101, 102, 103. The background management server may analyze and process received data such as user requests, and feed back processing results (such as webpages, information or data obtained or generated according to the user requests) to the clients. In addition, the server 105 may also be a cloud server, that is, the server 105 has a cloud computing function.
It should be noted that the deep learning model training method and/or the content recommendation method provided by the embodiments of the present disclosure may be executed by the server 105. Correspondingly, the deep learning model training apparatus and/or the content recommendation apparatus provided by the embodiments of the present disclosure may be provided in the server 105. The deep learning model training method and/or content recommendation method provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and is capable of communicating with the clients 101, 102, 103 and/or the server 105. Correspondingly, the deep learning model training apparatus and/or content recommendation apparatus provided by the embodiments of the present disclosure may also be provided in a server or server cluster that is different from the server 105 and is capable of communicating with the clients 101, 102, 103 and/or the server 105.
For example, the server 105 may receive training samples from the clients 101, 102, 103 through the network 104, train a deep learning model by using the training samples, and then send the trained deep learning model to the clients 101, 102, 103 through the network 104, and the clients may use the trained deep learning model for content recommendation. Alternatively, the server 105 may directly use the deep learning model for content recommendation.
It should be understood that the numbers of clients, networks and servers in Fig. 1 are merely illustrative. There may be any number of clients, networks and servers according to implementation needs.
In the following, the deep learning model training method and the content recommendation method according to exemplary embodiments of the present disclosure are described with reference to Figs. 2 to 5 in conjunction with the system architecture of Fig. 1. The deep learning model training method and the content recommendation method of the embodiments of the present disclosure may be executed, for example, by the server shown in Fig. 1, which is, for example, the same as or similar to the electronic device described below.
Fig. 2 schematically shows a flowchart of a method for training a deep learning model according to an embodiment of the present disclosure.
As shown in Fig. 2, the deep learning model training method 200 of the embodiment of the present disclosure may include, for example, operations S210 to S260.
In operation S210, a configuration file is obtained, where the configuration file includes model type data and candidate feature configuration data.
In operation S220, an initial network layer type and an initial network layer structure are selected based on the model type data.
In operation S230, an initial deep learning model is obtained based on the initial network layer type and the initial network layer structure.
In operation S240, a first training sample is processed based on the candidate feature configuration data to obtain first training feature data.
In operation S250, the initial deep learning model is trained by using the first training feature data.
In operation S260, a target deep learning model is obtained based on the trained initial deep learning model.
For example, the configuration file includes model type data, and the model type data represents, for example, the model type of the initial deep learning model; the model type includes, for example, a deep neural network (DNN) type. After the model type (for example, DNN) of the initial deep learning model is determined based on the model type data, the initial network layer type and the initial network layer structure of the DNN model may be further determined.
For example, the initial network layer type includes layer types such as an attention layer, a fully connected layer and a pooling layer, and the initial network layer type may also represent the connection relationships between the layers. The initial network layer structure represents, for example, the number of nodes in each layer.
For example, the initial network layer type may include multiple candidate initial network layer types, and the initial network layer structure may include multiple candidate initial network layer structures. The initial network layer type and initial network layer structure required by the DNN model may be selected from the multiple initial network layer types and the multiple initial network layer structures based on the model type data; for example, different initial network layer types and different initial network layer structures may be selected in turn to construct initial deep learning models, and the initial deep learning model obtained from each construction is trained.
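For illustration, such an initial deep learning model might be assembled as in the following minimal sketch, which assumes PaddlePaddle's paddle.nn API and, for simplicity, only fully connected layers; the layer names, node counts and input dimension are illustrative assumptions rather than values taken from the disclosure.

```python
import paddle
import paddle.nn as nn

def build_initial_model(layer_types, layer_sizes, input_dim):
    """Assemble a simple DNN from a selected network layer type (a sequence of
    layer names) and network layer structure (node counts per layer)."""
    layers, in_dim = [], input_dim
    for kind, width in zip(layer_types, layer_sizes):
        if kind == "fc":  # fully connected layer; other kinds (attention, pooling)
            layers += [nn.Linear(in_dim, width), nn.ReLU()]  # would be added analogously
            in_dim = width
    layers += [nn.Linear(in_dim, 1), nn.Sigmoid()]  # interest score in [0, 1]
    return nn.Sequential(*layers)

# e.g. a candidate network layer type of two fc layers with 64 and 32 nodes
model = build_initial_model(["fc", "fc"], [64, 32], input_dim=128)
scores = model(paddle.randn([4, 128]))  # 4 samples, 128-dimensional features
print(scores.shape)                     # [4, 1]
```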
For example, the candidate feature configuration data represents the processing method for the first training sample; in other words, it represents the feature types and feature dimensions used to extract feature data from the first training sample. Processing the first training sample based on the candidate feature configuration data yields first training feature data suitable for training the initial deep learning model. In an example, the candidate feature configuration data includes feature types and feature dimensions for the first training sample. The feature types include, for example, features such as age, gender and content category, and the feature dimension is, for example, the dimension of a feature vector, such as 1*128 dimensions or 1*256 dimensions.
For example, when the first initial deep learning model is trained, for the first training sample used to train the model, the age and gender features are selected from features such as age, gender and content category, and the 1*128 dimension is selected from the feature dimensions 1*128 and 1*256; the first training sample is then processed to obtain 1*128-dimensional first training feature data for age and gender.
For example, when the second initial deep learning model is trained, for the first training sample used to train the model, the gender and content category features are selected from features such as age, gender and content category, and the 1*256 dimension is selected from the feature dimensions 1*128 and 1*256; the first training sample is then processed to obtain 1*256-dimensional first training feature data for gender and content category.
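Purely as a sketch of what such candidate feature configuration data might look like, the configurations could be expressed as a list of entries, each naming the feature types to extract and the feature dimension to use; the field names below are assumptions, not defined by the disclosure.

```python
# Hypothetical candidate feature configuration data: each entry names the
# feature types to extract from the first training sample and the feature
# (embedding) dimension of the resulting feature vector.
candidate_feature_configs = [
    {"feature_types": ["age", "gender"], "feature_dim": 128},
    {"feature_types": ["gender", "content_category"], "feature_dim": 256},
]

def select_columns(sample: dict, config: dict) -> list:
    """Keep only the columns named by the configuration's feature types."""
    return [sample[name] for name in config["feature_types"]]

raw_sample = {"age": 30, "gender": "f", "content_category": "sports"}
print(select_columns(raw_sample, candidate_feature_configs[0]))  # [30, 'f']
```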
For example, after an initial deep learning model is constructed, the first training sample may be processed based on the candidate feature configuration data to obtain the first training feature data, and the initial deep learning model may be trained by using the first training feature data. The candidate feature configuration data may also include multiple pieces of candidate feature configuration data; for different initial deep learning models, different candidate feature configuration data may be selected in turn to process the first training samples used to train the corresponding initial deep learning models. After a trained initial deep learning model is obtained, the target deep learning model may be obtained based on the initial deep learning model; for example, the initial deep learning model may be used directly as the target deep learning model, or model construction and model training may be performed again based on the initial deep learning model to obtain the deep learning model.
According to the embodiments of the present disclosure, the model type data and the candidate feature configuration data are defined through a configuration file. When an initial deep learning model is trained, the corresponding initial network layer type and initial network layer structure may be selected based on the configuration file to construct the corresponding initial deep learning model, and the first training sample is processed based on the candidate feature configuration data to obtain the corresponding first training feature data, so that the initial deep learning model is trained based on the first training feature data, and the target deep learning model is then obtained based on the initial deep learning model. It can be understood that constructing the initial neural network and processing the first training sample based on the configuration file makes it possible to automatically and quickly train multiple initial deep learning models, which improves the efficiency of model training and reduces its cost; since everything is driven by the configuration file, no code modification is needed, which lowers the technical threshold of model training.
Fig. 3 schematically shows a flowchart of a method for training a deep learning model according to another embodiment of the present disclosure.
As shown in Fig. 3, the deep learning model training method 300 of the embodiment of the present disclosure may include, for example, operations S301 to S311.
In operation S301, a configuration file is obtained, where the configuration file includes model type data and candidate feature configuration data.
In operation S302, an initial network layer type and an initial network layer structure are selected based on the model type data.
In operation S303, an initial deep learning model is obtained based on the initial network layer type and the initial network layer structure.
In operation S304, a first training sample is processed based on the candidate feature configuration data to obtain first training feature data.
In operation S305, the initial deep learning model is trained by using the first training feature data.
According to the embodiments of the present disclosure, operations S301 to S305 are the same as or similar to the operations of the above-mentioned embodiment, and will not be repeated here. After the trained initial deep learning model is obtained through operations S301 to S305, the target deep learning model is obtained based on the trained initial deep learning model; see operations S306 to S311.
For example, the initial deep learning model includes at least one trained initial deep learning model, and the initial network layer type, initial network layer structure or candidate feature configuration data corresponding to each trained deep learning model in the at least one trained initial deep learning model may be different. The configuration file further includes, for example, an evaluation condition, which is used to evaluate the training effect of the initial deep learning model. The following operations S306 to S308 describe how the target network layer type, target network layer structure and target feature configuration data with a better training effect are obtained by evaluating the initial deep learning models.
In operation S306, a verification sample is processed based on the candidate feature configuration data to obtain verification feature data.
In operation S307, the verification feature data is respectively input into the at least one trained initial deep learning model to obtain at least one verification result.
In operation S308, a target network layer type, a target network layer structure and target feature configuration data are respectively determined from a network layer type set, a network layer structure set and a feature configuration data set based on the at least one verification result and the evaluation condition.
For example, after multiple initial deep learning models are trained, a network layer type set, a network layer structure set and a feature configuration data set corresponding to the multiple initial deep learning models are obtained. The network layer type set includes, for example, the initial network layer types for the multiple trained initial deep learning models, the network layer structure set includes the initial network layer structures for the multiple trained initial deep learning models, and the feature configuration data set includes the initial feature configuration data for the multiple trained initial deep learning models; the initial feature configuration data in the feature configuration data set is, for example, at least part of the multiple pieces of candidate feature configuration data.
For example, for each initial deep learning model, the verification sample is processed based on the candidate feature configuration data corresponding to that initial deep learning model to obtain verification feature data, and the verification feature data is input into the trained initial deep learning model to obtain a verification result. In this way, multiple verification results in one-to-one correspondence with the multiple initial deep learning models can be obtained.
In an example, the verification result includes, for example, the recall rate or precision rate of the initial deep learning model on the verification sample, and the evaluation condition includes, for example, a condition on the recall rate or precision rate; for example, the evaluation condition is used to evaluate whether the recall rate or precision rate of the verification result reaches a certain threshold. In another example, the evaluation condition is related, for example, to the AUC (Area Under Curve) metric, and the verification result may be evaluated based on the AUC, which is an evaluation metric. Based on the verification results and the evaluation condition for the multiple initial deep learning models, the target network layer type, the target network layer structure and the target feature configuration data are respectively determined from the network layer type set, the network layer structure set and the feature configuration data set.
According to the embodiments of the present disclosure, the verification results are evaluated by using the evaluation condition, so that the target network layer type, the target network layer structure and the target feature configuration data with a better training effect are respectively determined from the network layer type set, the network layer structure set and the feature configuration data set, which improves the accuracy of determining the target network layer type, the target network layer structure and the target feature configuration data.
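A minimal sketch of this selection step, assuming each trained initial model reports a single validation AUC and the evaluation condition is a threshold combined with taking the best qualifying trial; the trial records below are hypothetical, and in principle the three target components could also be picked per dimension from different trials, as the disclosure notes.

```python
# Each record describes one trained initial deep learning model: the network
# layer type and structure it was built from, the candidate feature
# configuration it used, and its validation result (here an AUC value).
trials = [
    {"layer_type": "A1", "layer_structure": "B1", "feature_config": "C1", "auc": 0.71},
    {"layer_type": "A2", "layer_structure": "B2", "feature_config": "C2", "auc": 0.74},
    {"layer_type": "A3", "layer_structure": "B3", "feature_config": "C3", "auc": 0.69},
]

def pick_targets(trials, min_auc=0.70):
    """Apply the evaluation condition: keep trials above the AUC threshold and
    take the component choices of the best remaining trial."""
    qualified = [t for t in trials if t["auc"] >= min_auc]
    best = max(qualified, key=lambda t: t["auc"])
    return best["layer_type"], best["layer_structure"], best["feature_config"]

print(pick_targets(trials))  # ('A2', 'B2', 'C2')
```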
After the target network layer type, the target network layer structure and the target feature configuration data are obtained, the model may be retrained based on them to obtain the target deep learning model; see the following operations S309 to S311.
In operation S309, a target deep learning model to be trained is obtained based on the target network layer type and the target network layer structure.
For example, the target deep learning model is constructed based on the target network layer type and the target network layer structure.
In operation S310, a second training sample is processed based on the target feature configuration data to obtain second training feature data.
For example, the target feature configuration data represents how to process the second training sample used for training the target deep learning model, so as to obtain second training feature data suitable for training the target deep learning model.
In operation S311, the target deep learning model to be trained is trained by using the second training feature data to obtain the target deep learning model.
For example, after the target deep learning model is constructed, the second training sample may be processed based on the target feature configuration data to obtain the second training feature data, and the target deep learning model is trained by using the second training feature data; the training process of the target deep learning model is similar to that of the initial deep learning model and will not be repeated here.
According to the embodiments of the present disclosure, the process of training multiple initial deep learning models can be regarded as an experimental process of searching for network layer types, network layer structures and feature configuration data.
In an example, an initial deep learning model whose verification result satisfies the evaluation condition may be used directly as the final target deep learning model.
In another example, the target network layer type, the target network layer structure and the target feature configuration data may come from different initial deep learning models. In order to reduce the consumption of data storage space, the initial deep learning models may not be saved; instead, the better target network layer type, target network layer structure and target feature configuration data are saved, and the target deep learning model is then reconstructed and trained based on them.
It can be understood that first obtaining the target network layer type, the target network layer structure and the target feature configuration data by training the initial deep learning models, and then retraining based on them to obtain the target deep learning model, not only improves the accuracy of the target deep learning model but also reduces the consumption of data storage space.
Fig. 4 schematically shows a schematic diagram of a method for training a deep learning model according to an embodiment of the present disclosure.
As shown in Fig. 4, the configuration file 410 includes, for example, model type data 411, multiple pieces of candidate feature configuration data 412 and an evaluation condition 413.
For example, the multiple candidate network layer types 420 include candidate network layer types A1 to A4, and the multiple candidate hyperparameters 430 include candidate hyperparameters B1 to B4.
Based on the model type data 411, an initial network layer type for an initial deep learning model is selected from the multiple candidate network layer types 420, and a target hyperparameter is randomly selected from the multiple candidate hyperparameters 430 as the initial network layer structure for the initial deep learning model.
Taking the initial deep learning models 431, 432 and 433 as examples, the candidate network layer type A1 and the candidate hyperparameter B1 are selected as the initial network layer type and the initial network layer structure for the initial deep learning model 431, respectively. For example, the candidate network layer type A1 includes a fully connected layer and a pooling layer, and the candidate hyperparameter B1 (the target hyperparameter) specifies that the fully connected layer has M nodes and the pooling layer has N nodes, where M and N are both integers greater than 0. Similarly, the candidate network layer type A2 and the candidate hyperparameter B2 are selected as the initial network layer type and the initial network layer structure for the initial deep learning model 432, respectively, and the candidate network layer type A3 and the candidate hyperparameter B3 are selected as the initial network layer type and the initial network layer structure for the initial deep learning model 433, respectively.
Then, the initial deep learning model 431 is constructed based on the candidate network layer type A1 and the candidate hyperparameter B1, the initial deep learning model 432 is constructed based on the candidate network layer type A2 and the candidate hyperparameter B2, and the initial deep learning model 433 is constructed based on the candidate network layer type A3 and the candidate hyperparameter B3.
After the initial deep learning models 431, 432 and 433 are constructed, they need to be trained based on the first training sample 440.
For example, the initial feature configuration data for each initial deep learning model is selected from the multiple pieces of candidate feature configuration data 412. For example, the candidate feature configuration data C1 is selected as the initial feature configuration data for the initial deep learning model 431, the candidate feature configuration data C2 is selected as the initial feature configuration data for the initial deep learning model 432, and the candidate feature configuration data C3 is selected as the initial feature configuration data for the initial deep learning model 433.
For each initial deep learning model, the first training sample 440 needs to be processed based on the corresponding initial feature configuration data. Taking the initial deep learning model 431 as an example, a first feature type and a first feature dimension are determined based on the initial feature configuration data (C1); for example, the initial feature configuration data (C1) defines the first feature type and the first feature dimension, where the first feature type includes, for example, features such as age, gender and content category, and the first feature dimension is, for example, the dimension of a feature vector, such as 1*128 dimensions.
Then, a first sub-sample is extracted from the first training sample 440 based on the first feature type; the first sub-sample is, for example, content having features such as age, gender and content category. The first sub-sample is processed based on the first feature dimension to obtain the first training feature data 441; the first training feature data 441 is, for example, a feature vector whose dimension is, for example, 1*128.
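One possible way to produce such a 1*128 feature vector from the extracted first sub-sample is sketched below under the assumption of a hash-then-embed scheme using PaddlePaddle's embedding layer; the vocabulary size, hashing rule and pooling by averaging are assumptions, not details given by the disclosure.

```python
import paddle
import paddle.nn as nn

VOCAB_SIZE = 1000   # hypothetical hash-bucket count
EMBED_DIM = 128     # first feature dimension from the configuration (1*128)

embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)

def first_training_feature(sub_sample: dict) -> paddle.Tensor:
    """Hash each selected feature value into a bucket, embed it, and average
    the embeddings into a single 1*128 feature vector."""
    buckets = [hash(f"{k}={v}") % VOCAB_SIZE for k, v in sub_sample.items()]
    ids = paddle.to_tensor(buckets, dtype="int64")
    return embedding(ids).mean(axis=0, keepdim=True)  # shape [1, 128]

sub_sample = {"age": "30-39", "gender": "f", "content_category": "sports"}
print(first_training_feature(sub_sample).shape)       # [1, 128]
```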
The process of obtaining the first training feature data 442 and the first training feature data 443 is similar to the process of obtaining the first training feature data 441, and will not be repeated here.
Then, the initial deep learning model 431 is trained by using the first training feature data 441, the initial deep learning model 432 is trained by using the first training feature data 442, and the initial deep learning model 433 is trained by using the first training feature data 443.
After the training of the initial deep learning models 431 to 433 is completed, a network layer type set 451, a network layer structure set 452 and a feature configuration data set 453 for the initial deep learning models 431 to 433 are obtained. The network layer type set 451 includes, for example, the initial network layer types A1, A2 and A3, the network layer structure set 452 includes, for example, the initial network layer structures B1, B2 and B3, and the feature configuration data set 453 includes, for example, the initial feature configuration data C1, C2 and C3.
Then, the target network layer type 471 (A1), the target network layer structure 472 (B2) and the target feature configuration data 473 (C3) are respectively determined from the network layer type set 451, the network layer structure set 452 and the feature configuration data set 453 based on the evaluation condition 413 and the verification sample 460; the process is similar to the above content and will not be repeated here.
Next, the target deep learning model 480 is constructed based on the target network layer type 471 (A1) and the target network layer structure 472 (B2). After the target deep learning model 480 is constructed, it needs to be trained based on the second training sample 490.
For example, the second training sample 490 is processed based on the target feature configuration data 473 (C3) to obtain second training feature data 491. For example, a second feature type and a second feature dimension are determined based on the target feature configuration data 473 (C3); the target feature configuration data 473 (C3) defines, for example, the second feature type and the second feature dimension, where the second feature type includes, for example, features such as age and gender, and the second feature dimension is, for example, the dimension of a feature vector, such as 1*256 dimensions.
Then, a second sub-sample is extracted from the second training sample 490 based on the second feature type; the second sub-sample is, for example, content having features such as age and gender. The second sub-sample is processed based on the second feature dimension to obtain the second training feature data 491; the second training feature data 491 is, for example, a feature vector whose dimension is, for example, 1*256.
Next, the target deep learning model 480 is trained by using the second training feature data 491, and the trained target deep learning model 480 serves as the final deep learning model.
In another example of the present disclosure, the model may be trained based on the PaddlePaddle training framework and the open-source distributed framework Ray. For example, PaddlePaddle is used for model construction and model training, and Ray is used to switch seamlessly between local training and cluster training; Ray can automatically schedule available resources for parallel training, which improves resource utilization and the degree of parallelism of training.
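A minimal sketch of one such training step, written against the PaddlePaddle API, is given below; the data, model shape and hyperparameters are placeholders, and the function is shaped so that a scheduler such as Ray Tune could call it once per candidate configuration (a Tune-side sketch follows the hyperparameter-search discussion below).

```python
import paddle
import paddle.nn as nn

def train_once(config: dict) -> float:
    """Train one candidate model defined by `config` and return its final loss.
    `config` stands in for one point of the search space (hidden size, lr, epochs)."""
    model = nn.Sequential(
        nn.Linear(128, config["hidden"]), nn.ReLU(),
        nn.Linear(config["hidden"], 1), nn.Sigmoid(),
    )
    optimizer = paddle.optimizer.Adam(
        learning_rate=config["lr"], parameters=model.parameters())
    loss_fn = nn.BCELoss()

    features = paddle.randn([256, 128])                       # placeholder training features
    labels = paddle.randint(0, 2, [256, 1]).astype("float32")  # placeholder click labels

    for _ in range(config["epochs"]):
        preds = model(features)
        loss = loss_fn(preds, labels)
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
    return float(loss)

print(train_once({"hidden": 64, "lr": 1e-3, "epochs": 3}))
```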
For example, the configuration file includes two files: a feature configuration file and a training configuration file. The feature configuration file includes, for example, the candidate feature configuration data, and may also include the processing methods of the features, such as normalization and hash operations. The training configuration file includes data other than the features, for example, the model type data, the evaluation condition, and so on.
The training samples, verification samples, candidate feature configuration data, model structures, hyperparameters and training resource configuration used in the training process can all be invoked by way of the configuration files without modifying the framework code, and experimental training can be started with one click, which lowers the technical threshold and the training difficulty.
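For illustration, the two configuration files might carry contents of the following shape, shown here as Python dictionaries; the keys are assumptions, and in practice they could just as well be YAML or JSON files read by a driver.

```python
# Hypothetical contents of the feature configuration file and the training
# configuration file described above.
feature_config = {
    "candidate_features": [
        {"feature_types": ["age", "gender"], "feature_dim": 128,
         "preprocess": ["normalize"]},
        {"feature_types": ["gender", "content_category"], "feature_dim": 256,
         "preprocess": ["hash"]},
    ],
}

training_config = {
    "model_type": "DNN",
    "evaluation": {"metric": "auc", "threshold": 0.70},
    "search": {"algorithm": "grid_search", "scheduler": "ASHA"},
    "resources": {"workers": 4, "use_gpu": False},
}

def launch(feature_cfg: dict, train_cfg: dict) -> None:
    """A driver would read both files and start the search without code changes."""
    print(f"model type: {train_cfg['model_type']}, "
          f"{len(feature_cfg['candidate_features'])} feature configurations, "
          f"metric: {train_cfg['evaluation']['metric']}")

launch(feature_config, training_config)
```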
For example, in a first step, the configuration file, the first training sample and the verification sample are input to perform automatic training and search of the initial deep learning models; the search results include, for example, hyperparameters, feature types, feature dimensions (embedding dimensions), model structures and so on. In a second step, the target deep learning model is retrained based on the search results and the second training sample.
The model type data in the configuration file defines, for example, how to select the initial model type and the network layer structure (a search direction), and the candidate feature configuration data defines, for example, the feature type search and the feature dimension search. The hyperparameter search, feature type search, feature dimension search and model structure search may be collectively referred to as search directions.
The feature types include the features or combined features that need to be extracted from the sample data during model training; the features include, for example, gender and age, and a combined feature is, for example, a combination of gender and age.
For example, the hyperparameter search includes a search space, a search algorithm and a scheduler algorithm. The search space includes algorithms such as random search, grid search and uniform-distribution sampling, and represents which candidate hyperparameters are available for searching. The search algorithms include grid search, Bayesian optimization, OPTUNA optimization and other algorithms; OPTUNA is a framework for automatic hyperparameter optimization, and the search algorithm is used to determine the optimal hyperparameters based on the training results of the candidate hyperparameters. The scheduler algorithms include the first-in-first-out (FIFO) algorithm, the ASHA algorithm and the like; ASHA is a hyperparameter tuning algorithm, and the scheduler algorithm represents how computing resources are scheduled to perform parallel training based on the candidate hyperparameters.
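A hedged sketch of how such a search space, search algorithm and scheduler might be wired together with Ray Tune follows; the objective function is a stand-in for the real training routine, and the reporting call and module paths vary between Ray versions.

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def objective(config):
    """Stand-in for the real PaddlePaddle training function: it just scores the
    hyperparameter point and reports the result back to Tune."""
    score = 1.0 / (1.0 + abs(config["lr"] - 1e-3)) + config["hidden"] / 1000.0
    tune.report(auc=score)  # newer Ray versions report via session.report / train.report

search_space = {
    "lr": tune.loguniform(1e-4, 1e-1),     # sampled search dimension
    "hidden": tune.choice([32, 64, 128]),  # discrete candidate set
}

analysis = tune.run(
    objective,
    config=search_space,
    num_samples=8,
    scheduler=ASHAScheduler(metric="auc", mode="max"),  # early-stops weak trials
)
print(analysis.get_best_config(metric="auc", mode="max"))
```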
Combined features can be searched by models such as the AutoCross (automatic crossing) algorithm and AutoFis; the AutoCross model is responsible for screening useful explicit cross features, for example, screening features that improve the training effect of the model, and the AutoFis model is responsible for filtering out useless second-order cross features (implicit cross features) in the FM (factorization machine) model and the DeepFM model. An explicit cross feature is, for example, a merge or concatenation of multiple features, and an implicit cross feature is, for example, a dot product of multiple features.
For the feature dimensions, the AutoDim algorithm and the AutoDis algorithm can be used for searching; AutoDim is an automatic dimension optimization algorithm, and AutoDis is an automatic discretization algorithm for numerical features. The AutoDim algorithm searches out different dimension sizes from different feature dimensions, that is, it searches for suitable dimensions for discrete features. The AutoDis algorithm supports embedding of continuous features (discretizing continuous features) and, during training, searches out the most suitable dimension size for different continuous features.
The model structure search can learn the weights corresponding to child architectures (network layers) through a NAS model (a type of compression model), so as to obtain an optimal model structure. For example, by learning the weights corresponding to multiple candidate network layers, the candidate network layer with a larger weight is used as the final network layer.
When performing model search and training experiments, the experimental process and experimental results can be visualized, for example, by means of the VisualDL tool. VisualDL is a visual analysis tool in the PaddlePaddle training framework; it uses rich charts to show the influence of different hyperparameters on the experimental results, so that the influence of the search space and the search algorithm on the recommendation model can be understood more intuitively.
The training process of the model supports batch offline training search and incremental training search; for example, batch offline search training or incremental search training is selected by way of configuration. For batch offline search, the experimental results are compared on the same data set to select the optimal search result. For incremental search training, if the experimental effect of the incremental search is better than that of the original experiment, the original is replaced; otherwise, the original model structure and hyperparameters are retained and training continues.
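The selection logic of the two modes can be sketched in a few lines; the record fields below are hypothetical.

```python
# Illustrative sketch of the two search modes described above.
def pick_batch_offline(results: list) -> dict:
    """Batch offline search: compare all results on the same data set and keep the best."""
    return max(results, key=lambda r: r["auc"])

def pick_incremental(current: dict, candidate: dict) -> dict:
    """Incremental search: replace the current model structure and hyperparameters
    only if the new experiment does better; otherwise keep the original."""
    return candidate if candidate["auc"] > current["auc"] else current

current = {"structure": "A1/B1", "auc": 0.72}
candidate = {"structure": "A2/B2", "auc": 0.74}
print(pick_incremental(current, candidate))  # keeps the better experiment
```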
训练过程可以通过并行的方式进行,例如有些计算资源基于一部分超参数、模型结构、训练样本进行训练,有些计算资源基于另一部分超参数、模型结构、训练样本进行训练,从而提高训练效率。The training process can be carried out in a parallel manner. For example, some computing resources are trained based on a part of hyperparameters, model structure, and training samples, and some computing resources are trained based on another part of hyperparameters, model structure, and training samples, thereby improving training efficiency.
图5示意性示出了根据本公开一实施例的内容推荐方法的流程图。Fig. 5 schematically shows a flowchart of a content recommendation method according to an embodiment of the present disclosure.
如图5所示,本公开实施例的内容推荐方法500例如可以包括操作S510~操作S540。As shown in FIG. 5 , the content recommendation method 500 of the embodiment of the present disclosure may include, for example, operation S510 to operation S540.
在操作S510,确定针对目标对象的对象特征数据。In operation S510, object feature data for the target object is determined.
在操作S520,针对至少一个候选内容中的目标内容,确定针对目标内容的内容特征数据。In operation S520, content feature data for the target content is determined for the target content in the at least one candidate content.
在操作S530，将对象特征数据和内容特征数据输入目标深度学习模型中，得到输出结果。In operation S530, the object feature data and the content feature data are input into the target deep learning model to obtain an output result.
在操作S540,响应于输出结果满足预设条件,向目标对象推荐目标内容。In operation S540, in response to the output result satisfying the preset condition, the target content is recommended to the target object.
示例性地，上文提及的初始深度学习模型或目标深度学习模型适用于内容推荐场景，内容包括但不仅限于文章、商品、新闻。Exemplarily, the initial deep learning model or the target deep learning model mentioned above is suitable for content recommendation scenarios, where the content includes but is not limited to articles, commodities, and news.
例如，目标对象为浏览内容的对象，对象特征数据例如包括目标对象的年龄、性别、历史浏览记录、所浏览的内容类别等等。将多个候选内容中的任意一个作为目标内容，并确定目标内容的内容特征数据，内容特征数据例如包括但不仅限于内容类别、主题信息、关键词信息。For example, the target object is an object that browses content, and the object feature data includes, for example, the target object's age, gender, historical browsing records, browsed content categories, and so on. Any one of multiple candidate contents is taken as the target content, and the content feature data of the target content is determined; the content feature data includes but is not limited to content category, topic information, and keyword information.
将对象特征数据和内容特征数据输入目标深度学习模型中得到输出结果,输出结果表征了目标对象对目标内容的感兴趣程度。在另一示例中,当初始深度学习模型的模型精度符合要求时,也可以将对象特征数据和内容特征数据输入初始深度学习模型中得到输出结果。初始深度学习模型或目标深度学习模型可以自动学习得到对象特征数据和内容特征数据之间的关联。如果输出结果满足预设条件,表示目标对象对目标内容的感兴趣程度较大,此时可以向目标对象推荐目标内容。如果输出结果不满足预设条件,表示目标对象对目标内容的感兴趣程度较小,此时可以不向目标对象推荐目标内容。The object feature data and content feature data are input into the target deep learning model to obtain an output result, and the output result represents the degree of interest of the target object in the target content. In another example, when the model accuracy of the initial deep learning model meets requirements, the object feature data and content feature data may also be input into the initial deep learning model to obtain an output result. The initial deep learning model or the target deep learning model can automatically learn the association between object feature data and content feature data. If the output result satisfies the preset condition, it means that the target object is more interested in the target content, and at this time the target content can be recommended to the target object. If the output result does not meet the preset condition, it means that the target object is less interested in the target content, and at this time the target content may not be recommended to the target object.
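For illustration only, the decision flow of operations S510 to S540 might look like the following sketch, in which the feature-extraction helpers, the scoring callable, and the threshold standing in for the preset condition are all assumptions.

```python
# Minimal sketch of the decision flow in operations S510-S540; the feature
# extraction helpers and the scoring model are placeholders, not the
# disclosed implementation.
def extract_object_features(obj: dict) -> list:
    return [obj.get("age", 0), obj.get("gender", 0)]       # toy object features

def extract_content_features(content: dict) -> list:
    return [content.get("category_id", 0)]                 # toy content features

def recommend(model, target_object, candidate_contents, threshold=0.5):
    object_features = extract_object_features(target_object)
    recommended = []
    for content in candidate_contents:
        content_features = extract_content_features(content)
        score = model(object_features, content_features)   # degree of interest
        if score >= threshold:                              # preset condition
            recommended.append(content)
    return recommended

# Usage with a dummy scoring function standing in for the trained model.
dummy_model = lambda o, c: 0.8
print(recommend(dummy_model, {"age": 30}, [{"category_id": 5}]))
```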
根据本公开的实施例,通过初始深度学习模型或目标深度学习模型进行内容推荐,提高了内容推荐的准确性和效率,推荐的内容满足目标对象的需求,提高目标对象的使用体验。According to the embodiments of the present disclosure, the content recommendation is performed through the initial deep learning model or the target deep learning model, which improves the accuracy and efficiency of content recommendation, and the recommended content meets the needs of the target object and improves the user experience of the target object.
图6示意性示出了根据本公开一实施例的深度学习模型的训练装置的框图。Fig. 6 schematically shows a block diagram of a training device for a deep learning model according to an embodiment of the present disclosure.
如图6所示，本公开实施例的深度学习模型的训练装置600例如包括获取模块610、选择模块620、第一获得模块630、第一处理模块640、第一训练模块650以及第二获得模块660。As shown in FIG. 6, the deep learning model training device 600 of the embodiment of the present disclosure includes, for example, an acquisition module 610, a selection module 620, a first obtaining module 630, a first processing module 640, a first training module 650, and a second obtaining module 660.
获取模块610可以用于获取配置文件,其中,配置文件包括模型类型数据和候选特征配置数据。根据本公开实施例,获取模块610例如可以执行上文参考图2描述的操作S210,在此不再赘述。The obtaining module 610 may be used to obtain a configuration file, wherein the configuration file includes model type data and candidate feature configuration data. According to an embodiment of the present disclosure, the acquiring module 610 may, for example, perform the operation S210 described above with reference to FIG. 2 , which will not be repeated here.
选择模块620可以用于基于模型类型数据,选择初始网络层类型和初始网络层结构。根据本公开实施例,选择模块620例如可以执行上文参考图2描述的操作S220,在此不再赘述。The selection module 620 may be configured to select an initial network layer type and an initial network layer structure based on the model type data. According to an embodiment of the present disclosure, the selection module 620 may, for example, perform the operation S220 described above with reference to FIG. 2 , which will not be repeated here.
第一获得模块630可以用于基于初始网络层类型和初始网络层结构，获得初始深度学习模型。根据本公开实施例，第一获得模块630例如可以执行上文参考图2描述的操作S230，在此不再赘述。The first obtaining module 630 can be used to obtain an initial deep learning model based on the initial network layer type and the initial network layer structure. According to an embodiment of the present disclosure, the first obtaining module 630 may, for example, perform operation S230 described above with reference to FIG. 2, which will not be repeated here.
第一处理模块640可以用于基于候选特征配置数据处理第一训练样本,得到第一训练特征数据。根据本公开实施例,第一处理模块640例如可以执行上文参考图2描述的操作S240,在此不再赘述。The first processing module 640 may be configured to process the first training sample based on the candidate feature configuration data to obtain first training feature data. According to an embodiment of the present disclosure, the first processing module 640 may, for example, execute the operation S240 described above with reference to FIG. 2 , which will not be repeated here.
第一训练模块650可以用于利用第一训练特征数据训练初始深度学习模型。根据本公开实施例,第一训练模块650例如可以执行上文参考图2描述的操作S250,在此不再赘述。The first training module 650 can be used to train an initial deep learning model using the first training feature data. According to an embodiment of the present disclosure, the first training module 650 may, for example, execute the operation S250 described above with reference to FIG. 2 , which will not be repeated here.
第二获得模块660可以用于基于经训练的初始深度学习模型,得到目标深度学习模型。根据本公开实施例,第二获得模块660例如可以执行上文参考图2描述的操作S260,在此不再赘述。The second obtaining module 660 can be used to obtain a target deep learning model based on the trained initial deep learning model. According to an embodiment of the present disclosure, the second obtaining module 660 may, for example, perform the operation S260 described above with reference to FIG. 2 , which will not be repeated here.
根据本公开的实施例，经训练的初始深度学习模型包括至少一个经训练的初始深度学习模型；配置文件还包括评价条件；第二获得模块包括：第一处理子模块、输入子模块、第一确定子模块和获得子模块。第一处理子模块，用于基于候选特征配置数据处理验证样本，得到验证特征数据；输入子模块，用于将验证特征数据分别输入至少一个经训练的初始深度学习模型中，得到至少一个验证结果；第一确定子模块，用于基于至少一个验证结果和评价条件，从网络层类型集合、网络层结构集合、特征配置数据集合中分别确定目标网络层类型、目标网络层结构、目标特征配置数据；获得子模块，用于基于目标网络层类型、目标网络层结构、目标特征配置数据，得到目标深度学习模型。According to an embodiment of the present disclosure, the trained initial deep learning model includes at least one trained initial deep learning model; the configuration file further includes an evaluation condition; and the second obtaining module includes a first processing sub-module, an input sub-module, a first determining sub-module, and an obtaining sub-module. The first processing sub-module is used to process verification samples based on the candidate feature configuration data to obtain verification feature data; the input sub-module is used to input the verification feature data into the at least one trained initial deep learning model respectively to obtain at least one verification result; the first determining sub-module is used to determine, based on the at least one verification result and the evaluation condition, a target network layer type, a target network layer structure, and target feature configuration data from a network layer type set, a network layer structure set, and a feature configuration data set respectively; and the obtaining sub-module is used to obtain the target deep learning model based on the target network layer type, the target network layer structure, and the target feature configuration data.
根据本公开的实施例，网络层类型集合包括针对至少一个经训练的初始深度学习模型的初始网络层类型；网络层结构集合包括针对至少一个经训练的初始深度学习模型的初始网络层结构；特征配置数据集合包括针对至少一个经训练的初始深度学习模型的初始特征配置数据，特征配置数据集合中的初始特征配置数据为候选特征配置数据中的至少部分。According to an embodiment of the present disclosure, the network layer type set includes an initial network layer type for the at least one trained initial deep learning model; the network layer structure set includes an initial network layer structure for the at least one trained initial deep learning model; and the feature configuration data set includes initial feature configuration data for the at least one trained initial deep learning model, where the initial feature configuration data in the feature configuration data set is at least part of the candidate feature configuration data.
根据本公开的实施例，获得子模块包括：获得单元、处理单元和训练单元。获得单元，用于基于目标网络层类型和目标网络层结构，得到待训练目标深度学习模型；处理单元，用于基于目标特征配置数据处理第二训练样本，得到第二训练特征数据；训练单元，用于利用第二训练特征数据训练待训练目标深度学习模型，得到所述目标深度学习模型。According to an embodiment of the present disclosure, the obtaining sub-module includes an obtaining unit, a processing unit, and a training unit. The obtaining unit is used to obtain a target deep learning model to be trained based on the target network layer type and the target network layer structure; the processing unit is used to process a second training sample based on the target feature configuration data to obtain second training feature data; and the training unit is used to train the target deep learning model to be trained with the second training feature data to obtain the target deep learning model.
根据本公开的实施例，候选特征配置数据包括至少一个候选特征配置数据；第一处理模块640包括：第一选择子模块、第二确定子模块、提取子模块和第二处理子模块。第一选择子模块，用于从至少一个候选配置数据中选择针对初始深度学习模型的初始特征配置数据；第二确定子模块，用于基于初始特征配置数据，确定第一特征类型和第一特征维度；提取子模块，用于基于第一特征类型，从第一训练样本中提取第一子样本；第二处理子模块，用于基于第一特征维度处理第一子样本，得到第一训练特征数据。According to an embodiment of the present disclosure, the candidate feature configuration data includes at least one piece of candidate feature configuration data; the first processing module 640 includes a first selection sub-module, a second determining sub-module, an extraction sub-module, and a second processing sub-module. The first selection sub-module is used to select initial feature configuration data for the initial deep learning model from the at least one piece of candidate configuration data; the second determining sub-module is used to determine a first feature type and a first feature dimension based on the initial feature configuration data; the extraction sub-module is used to extract a first sub-sample from the first training sample based on the first feature type; and the second processing sub-module is used to process the first sub-sample based on the first feature dimension to obtain first training feature data.
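As an illustrative sketch of extracting a sub-sample by feature type and processing it to a configured feature dimension, the following uses a simple hashing trick; the field names and the hashing approach are assumptions for the example, not the disclosed processing.

```python
import numpy as np

# Sketch: extract a sub-sample by feature type, then map it to the
# configured feature dimension (here via a one-hot hashing trick).
def build_training_features(samples, feature_type, feature_dim):
    sub_sample = [s[feature_type] for s in samples]          # extract by feature type
    features = np.zeros((len(sub_sample), feature_dim))
    for i, value in enumerate(sub_sample):
        features[i, hash(str(value)) % feature_dim] = 1.0    # map to the configured dimension
    return features

print(build_training_features([{"category": "news"}], "category", 8).shape)  # (1, 8)
```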
根据本公开的实施例，处理单元包括：确定子单元、提取子单元和处理子单元。确定子单元，用于基于目标特征配置数据，确定第二特征类型和第二特征维度；提取子单元，用于基于第二特征类型，从第二训练样本中提取第二子样本；处理子单元，用于基于第二特征维度处理第二子样本，得到第二训练特征数据。According to an embodiment of the present disclosure, the processing unit includes a determination subunit, an extraction subunit, and a processing subunit. The determination subunit is used to determine a second feature type and a second feature dimension based on the target feature configuration data; the extraction subunit is used to extract a second sub-sample from the second training sample based on the second feature type; and the processing subunit is used to process the second sub-sample based on the second feature dimension to obtain second training feature data.
根据本公开的实施例，选择模块620包括：第二选择子模块和第三选择子模块。第二选择子模块，用于基于模型类型数据，从至少一个候选网络层类型中选择针对初始深度学习模型的初始网络层类型；第三选择子模块，用于从至少一个候选超参数中选择目标超参数，作为针对初始深度学习模型的初始网络层结构。According to an embodiment of the present disclosure, the selection module 620 includes a second selection sub-module and a third selection sub-module. The second selection sub-module is used to select, based on the model type data, an initial network layer type for the initial deep learning model from at least one candidate network layer type; and the third selection sub-module is used to select a target hyperparameter from at least one candidate hyperparameter as the initial network layer structure for the initial deep learning model.
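Purely as an illustration of how the modules of training device 600 could be composed in code, the following sketch wires placeholder implementations of modules 610 to 660 together; none of the bodies reflect the actual implementation, and the configuration fields are assumed.

```python
import json

# Illustrative sketch (not the disclosed code) of wiring modules 610-660.
class DeepLearningModelTrainer:
    def acquire(self, config_path):                         # acquisition module 610
        with open(config_path) as f:
            return json.load(f)                             # model type + candidate feature config

    def select(self, config):                               # selection module 620
        return config["model_type"], {"hidden_units": 64}   # assumed structure choice

    def build_initial_model(self, layer_type, structure):   # first obtaining module 630
        return {"type": layer_type, "structure": structure}

    def process(self, samples, config):                     # first processing module 640
        return list(samples)                                # placeholder feature processing

    def train(self, model, features):                       # first training module 650
        return model                                        # placeholder training loop

    def obtain_target_model(self, trained_model):           # second obtaining module 660
        return trained_model

    def run(self, config_path, first_training_samples):
        config = self.acquire(config_path)
        layer_type, structure = self.select(config)
        model = self.build_initial_model(layer_type, structure)
        features = self.process(first_training_samples, config)
        trained = self.train(model, features)
        return self.obtain_target_model(trained)
```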
图7示意性示出了根据本公开一实施例的内容推荐装置的框图。Fig. 7 schematically shows a block diagram of a content recommendation device according to an embodiment of the present disclosure.
如图7所示,本公开实施例的内容推荐装置700例如包括第一确定模块710、第二确定模块720、输入模块730和推荐模块740。As shown in FIG. 7 , the content recommendation device 700 of the embodiment of the present disclosure includes, for example, a first determination module 710 , a second determination module 720 , an input module 730 and a recommendation module 740 .
第一确定模块710可以用于确定针对目标对象的对象特征数据。根据本公开实施例,第一确定模块710例如可以执行上文参考图5描述的操作S510,在此不再赘述。The first determination module 710 may be used to determine object feature data for the target object. According to an embodiment of the present disclosure, the first determining module 710 may, for example, perform the operation S510 described above with reference to FIG. 5 , which will not be repeated here.
第二确定模块720可以用于针对至少一个候选内容中的目标内容,确定针对目标内容的内容特征数据。根据本公开实施例,第二确定模块720例如可以执行上文参考图5描述的操作S520,在此不再赘述。The second determination module 720 may be configured to determine content feature data for the target content for the target content in at least one candidate content. According to an embodiment of the present disclosure, the second determining module 720 may, for example, perform the operation S520 described above with reference to FIG. 5 , which will not be repeated here.
输入模块730可以用于将对象特征数据和内容特征数据输入目标深度学习模型中，得到输出结果，其中，目标深度学习模型采用上述的深度学习模型的训练装置生成，输出结果表征了目标对象对目标内容的感兴趣程度。根据本公开实施例，输入模块730例如可以执行上文参考图5描述的操作S530，在此不再赘述。The input module 730 can be used to input the object feature data and the content feature data into the target deep learning model to obtain an output result, where the target deep learning model is generated by the above deep learning model training device, and the output result represents the degree of interest of the target object in the target content. According to an embodiment of the present disclosure, the input module 730 may, for example, perform the operation S530 described above with reference to FIG. 5, which will not be repeated here.
推荐模块740可以用于响应于输出结果满足预设条件,向目标对象推荐目标内容。根据本公开实施例,推荐模块740例如可以执行上文参考图5描述的操作S540,在此不再赘述。The recommendation module 740 may be configured to recommend target content to the target object in response to the output result meeting the preset condition. According to an embodiment of the present disclosure, the recommendation module 740 may, for example, perform the operation S540 described above with reference to FIG. 5 , which will not be repeated here.
在本公开的技术方案中，所涉及的用户个人信息的收集、存储、使用、加工、传输、提供、公开和应用等处理，均符合相关法律法规的规定，采取了必要保密措施，且不违背公序良俗。In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the user personal information involved all comply with relevant laws and regulations, necessary confidentiality measures have been taken, and public order and good morals are not violated.
在本公开的技术方案中,在获取或采集用户个人信息之前,均获取了用户的授权或同意。In the technical solution of the present disclosure, before acquiring or collecting the user's personal information, the user's authorization or consent is obtained.
根据本公开的实施例,本公开还提供了一种电子设备、一种可读存储介质和一种计算机程序产品。According to the embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
图8是用来实现本公开实施例的用于执行深度学习模型的训练和/或内容推荐的电子设备的框图。FIG. 8 is a block diagram of an electronic device for performing deep learning model training and/or content recommendation to implement an embodiment of the present disclosure.
图8示出了可以用来实施本公开实施例的示例电子设备800的示意性框图。电子设备800旨在表示各种形式的数字计算机,诸如,膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置,诸如,个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例,并且不意在限制本文中描述的和/或者要求的本公开的实现。FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
如图8所示，设备800包括计算单元801，其可以根据存储在只读存储器(ROM)802中的计算机程序或者从存储单元808加载到随机访问存储器(RAM)803中的计算机程序，来执行各种适当的动作和处理。在RAM 803中，还可存储设备800操作所需的各种程序和数据。计算单元801、ROM 802以及RAM 803通过总线804彼此相连。输入/输出(I/O)接口805也连接至总线804。As shown in FIG. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. Various programs and data required for the operation of the device 800 can also be stored in the RAM 803. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
设备800中的多个部件连接至I/O接口805，包括：输入单元806，例如键盘、鼠标等；输出单元807，例如各种类型的显示器、扬声器等；存储单元808，例如磁盘、光盘等；以及通信单元809，例如网卡、调制解调器、无线通信收发机等。通信单元809允许设备800通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。Multiple components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, etc.; an output unit 807, such as various types of displays, speakers, etc.; a storage unit 808, such as a magnetic disk, an optical disk, etc.; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
计算单元801可以是各种具有处理和计算能力的通用和/或专用处理组件。计算单元801的一些示例包括但不限于中央处理单元(CPU)、图形处理单元(GPU)、各种专用的人工智能(AI)计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器(DSP)、以及任何适当的处理器、控制器、微控制器等。计算单元801执行上文所描述的各个方法和处理，例如深度学习模型的训练方法和/或内容推荐方法。例如，在一些实施例中，深度学习模型的训练方法和/或内容推荐方法可被实现为计算机软件程序，其被有形地包含于机器可读介质，例如存储单元808。在一些实施例中，计算机程序的部分或者全部可以经由ROM 802和/或通信单元809而被载入和/或安装到设备800上。当计算机程序加载到RAM 803并由计算单元801执行时，可以执行上文描述的深度学习模型的训练方法和/或内容推荐方法的一个或多个步骤。备选地，在其他实施例中，计算单元801可以通过其他任何适当的方式(例如，借助于固件)而被配置为执行深度学习模型的训练方法和/或内容推荐方法。The computing unit 801 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so on. The computing unit 801 executes the various methods and processes described above, such as the deep learning model training method and/or the content recommendation method. For example, in some embodiments, the deep learning model training method and/or the content recommendation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the deep learning model training method and/or the content recommendation method described above can be executed. Alternatively, in other embodiments, the computing unit 801 may be configured in any other appropriate way (for example, by means of firmware) to execute the deep learning model training method and/or the content recommendation method.
本文中以上描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、芯片上系统的系统(SOC)、复杂可编程逻辑设备(CPLD)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括：实施在一个或者多个计算机程序中，该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释，该可编程处理器可以是专用或者通用可编程处理器，可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令，并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various implementations of the systems and techniques described above herein can be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOC), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
用于实施本公开的方法的程序代码可以采用一个或多个编程语言的任何组合来编写。这些程序代码可以提供给通用计算机、专用计算机或其他可编程深度学习模型的训练装置和/或内容推荐装置的处理器或控制器，使得程序代码当由处理器或控制器执行时使流程图和/或框图中所规定的功能/操作被实施。程序代码可以完全在机器上执行、部分地在机器上执行，作为独立软件包部分地在机器上执行且部分地在远程机器上执行或完全在远程机器或服务器上执行。Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable deep learning model training device and/or content recommendation device, so that when executed by the processor or controller the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
为了提供与用户的交互，可以在计算机上实施此处描述的系统和技术，该计算机具有：用于向用户显示信息的显示装置(例如，CRT(阴极射线管)或者LCD(液晶显示器)监视器)；以及键盘和指向装置(例如，鼠标或者轨迹球)，用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互；例如，提供给用户的反馈可以是任何形式的传感反馈(例如，视觉反馈、听觉反馈、或者触觉反馈)；并且可以用任何形式(包括声输入、语音输入或者触觉输入)来接收来自用户的输入。To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
可以将此处描述的系统和技术实施在包括后台部件的计算系统(例如，作为数据服务器)、或者包括中间件部件的计算系统(例如，应用服务器)、或者包括前端部件的计算系统(例如，具有图形用户界面或者网络浏览器的用户计算机，用户可以通过该图形用户界面或者该网络浏览器来与此处描述的系统和技术的实施方式交互)、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算系统中。可以通过任何形式或者介质的数字数据通信(例如，通信网络)来将系统的部件相互连接。通信网络的示例包括：局域网(LAN)、广域网(WAN)和互联网。The systems and techniques described herein can be implemented in a computing system that includes a back-end component (e.g., a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
计算机系统可以包括客户端和服务器。客户端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有客户端-服务器关系的计算机程序来产生客户端和服务器的关系。服务器可以是云服务器,也可以为分布式系统的服务器,或者是结合了区块链的服务器。A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
应该理解,可以使用上面所示的各种形式的流程,重新排序、增加或删除步骤。例如,本公开中记载的各步骤可以并行地执行也可以顺序地执行也可以不同的次序执行,只要能够实现本公开公开的技术方案所期望的结果,本文在此不进行限制。It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed herein.
上述具体实施方式,并不构成对本公开保护范围的限制。本领域技术人员应该明白的是,根据设计要求和其他因素,可以进行各种修改、组合、子组合和替代。任何在本公开的精神和原则之内所作的修改、等同替换和改进等,均应包含在本公开保护范围之内。The specific implementation manners described above do not limit the protection scope of the present disclosure. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made depending on design requirements and other factors. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (19)

  1. 一种深度学习模型的训练方法,包括:A training method for a deep learning model, comprising:
    获取配置文件,其中,所述配置文件包括模型类型数据和候选特征配置数据;Obtain a configuration file, wherein the configuration file includes model type data and candidate feature configuration data;
    基于所述模型类型数据,选择初始网络层类型和初始网络层结构;Selecting an initial network layer type and an initial network layer structure based on the model type data;
    基于所述初始网络层类型和所述初始网络层结构,获得初始深度学习模型;Obtain an initial deep learning model based on the initial network layer type and the initial network layer structure;
    基于所述候选特征配置数据处理第一训练样本,得到第一训练特征数据;Processing a first training sample based on the candidate feature configuration data to obtain first training feature data;
    利用所述第一训练特征数据训练所述初始深度学习模型;以及training the initial deep learning model using the first training feature data; and
    基于经训练的初始深度学习模型,得到目标深度学习模型。Based on the trained initial deep learning model, a target deep learning model is obtained.
  2. 根据权利要求1所述的方法，其中，所述经训练的初始深度学习模型包括至少一个经训练的初始深度学习模型；所述配置文件还包括评价条件；所述基于经训练的初始深度学习模型，得到目标深度学习模型包括：The method according to claim 1, wherein the trained initial deep learning model comprises at least one trained initial deep learning model; the configuration file further comprises an evaluation condition; and the obtaining a target deep learning model based on the trained initial deep learning model comprises:
    基于所述候选特征配置数据处理验证样本,得到验证特征数据;Processing verification samples based on the candidate feature configuration data to obtain verification feature data;
    将所述验证特征数据分别输入所述至少一个经训练的初始深度学习模型中,得到至少一个验证结果;respectively inputting the verification feature data into the at least one trained initial deep learning model to obtain at least one verification result;
    基于所述至少一个验证结果和所述评价条件,从网络层类型集合、网络层结构集合、特征配置数据集合中分别确定目标网络层类型、目标网络层结构、目标特征配置数据;以及Based on the at least one verification result and the evaluation condition, determine the target network layer type, target network layer structure, and target feature configuration data from the network layer type set, network layer structure set, and feature configuration data set, respectively; and
    基于所述目标网络层类型、所述目标网络层结构、所述目标特征配置数据,得到所述目标深度学习模型。The target deep learning model is obtained based on the target network layer type, the target network layer structure, and the target feature configuration data.
  3. 根据权利要求2所述的方法,其中:The method of claim 2, wherein:
    所述网络层类型集合包括针对所述至少一个经训练的初始深度学习模型的初始网络层类型;the set of network layer types includes an initial network layer type for the at least one trained initial deep learning model;
    所述网络层结构集合包括针对所述至少一个经训练的初始深度学习模型的初始网络层结构;The set of network layer structures includes an initial network layer structure for the at least one trained initial deep learning model;
    所述特征配置数据集合包括针对所述至少一个经训练的初始深度学习模型的初始特征配置数据,所述特征配置数据集合中的初始特征配置数据为所述候选特征配置数据中的至少部分。The feature configuration data set includes initial feature configuration data for the at least one trained initial deep learning model, and the initial feature configuration data in the feature configuration data set is at least part of the candidate feature configuration data.
  4. 根据权利要求2或3所述的方法,其中,所述基于所述目标网络层类型、所述目标网络层结构、所述目标特征配置数据,得到所述目标深度学习模型包括:The method according to claim 2 or 3, wherein said obtaining said target deep learning model based on said target network layer type, said target network layer structure, and said target feature configuration data comprises:
    基于所述目标网络层类型和所述目标网络层结构,得到待训练目标深度学习模型;Based on the target network layer type and the target network layer structure, a target deep learning model to be trained is obtained;
    基于所述目标特征配置数据处理第二训练样本,得到第二训练特征数据;以及processing a second training sample based on the target feature configuration data to obtain second training feature data; and
    利用所述第二训练特征数据训练所述待训练目标深度学习模型,得到所述目标深度学习模型。Using the second training feature data to train the target deep learning model to be trained to obtain the target deep learning model.
  5. 根据权利要求1所述的方法,其中,所述候选特征配置数据包括至少一个候选特征配置数据;所述基于所述候选特征配置数据处理第一训练样本,得到第一训练特征数据包括:The method according to claim 1, wherein the candidate feature configuration data comprises at least one candidate feature configuration data; the processing of the first training sample based on the candidate feature configuration data to obtain the first training feature data comprises:
    从所述至少一个候选配置数据中选择针对所述初始深度学习模型的初始特征配置数据;selecting initial feature configuration data for the initial deep learning model from the at least one candidate configuration data;
    基于所述初始特征配置数据,确定第一特征类型和第一特征维度;determining a first feature type and a first feature dimension based on the initial feature configuration data;
    基于所述第一特征类型,从所述第一训练样本中提取第一子样本;以及extracting a first subsample from the first training sample based on the first feature type; and
    基于所述第一特征维度处理所述第一子样本,得到第一训练特征数据。Processing the first sub-sample based on the first feature dimension to obtain first training feature data.
  6. 根据权利要求4所述的方法,其中,所述基于所述目标特征配置数据处理第二训练样本,得到第二训练特征数据包括:The method according to claim 4, wherein said processing the second training sample based on the target feature configuration data to obtain the second training feature data comprises:
    基于所述目标特征配置数据,确定第二特征类型和第二特征维度;determining a second feature type and a second feature dimension based on the target feature configuration data;
    基于所述第二特征类型,从所述第二训练样本中提取第二子样本;以及extracting a second subsample from the second training sample based on the second feature type; and
    基于所述第二特征维度处理所述第二子样本,得到第二训练特征数据。Processing the second sub-sample based on the second feature dimension to obtain second training feature data.
  7. 根据权利要求1所述的方法,其中,所述基于所述模型类型数据,选择初始网络层类型和初始网络层结构包括:The method according to claim 1, wherein said selecting an initial network layer type and an initial network layer structure based on said model type data comprises:
    基于所述模型类型数据,从至少一个候选网络层类型中选择针对初始深度学习模型的初始网络层类型;以及selecting an initial network layer type for an initial deep learning model from at least one candidate network layer type based on the model type data; and
    从至少一个候选超参数中选择目标超参数,作为针对初始深度学习模型的初始网络层结构。A target hyperparameter is selected from at least one candidate hyperparameter as an initial network layer structure for the initial deep learning model.
  8. 一种内容推荐方法,包括:A content recommendation method comprising:
    确定针对目标对象的对象特征数据;determining object characteristic data for the target object;
    针对至少一个候选内容中的目标内容,确定针对所述目标内容的内容特征数据;For target content in at least one candidate content, determine content characteristic data for the target content;
    将所述对象特征数据和所述内容特征数据输入目标深度学习模型中,得到输出结果,其中,所述目标深度学习模型采用如权利要求1-7中任意一项所述的方法生成,所述输出结果表征了所述目标对象对所述目标内容的感兴趣程度;以及Input the object feature data and the content feature data into the target deep learning model to obtain an output result, wherein the target deep learning model is generated by the method according to any one of claims 1-7, and the The output result characterizes the degree of interest of the target object in the target content; and
    响应于所述输出结果满足预设条件,向所述目标对象推荐所述目标内容。In response to the output result meeting a preset condition, recommending the target content to the target object.
  9. 一种深度学习模型的训练装置,包括:A training device for a deep learning model, comprising:
    获取模块,用于获取配置文件,其中,所述配置文件包括模型类型数据和候选特征配置数据;An acquisition module, configured to acquire a configuration file, wherein the configuration file includes model type data and candidate feature configuration data;
    选择模块,用于基于所述模型类型数据,选择初始网络层类型和初始网络层结构;A selection module, configured to select an initial network layer type and an initial network layer structure based on the model type data;
    第一获得模块,用于基于所述初始网络层类型和所述初始网络层结构,获得初始深度学习模型;A first obtaining module, configured to obtain an initial deep learning model based on the initial network layer type and the initial network layer structure;
    第一处理模块,用于基于所述候选特征配置数据处理第一训练样本,得到第一训练特征数据;A first processing module, configured to process a first training sample based on the candidate feature configuration data to obtain first training feature data;
    第一训练模块,用于利用所述第一训练特征数据训练所述初始深度学习模型;以及A first training module, configured to use the first training feature data to train the initial deep learning model; and
    第二获得模块,用于基于经训练的初始深度学习模型,得到目标深度学习模型。The second obtaining module is used to obtain a target deep learning model based on the trained initial deep learning model.
  10. 根据权利要求9所述的装置,其中,所述经训练的初始深度学习模型包括至少一个经训练的初始深度学习模型;所述配置文件还包括评价条件;所述第二获得模块包括:The device according to claim 9, wherein the trained initial deep learning model includes at least one trained initial deep learning model; the configuration file also includes evaluation conditions; the second obtaining module includes:
    第一处理子模块,用于基于所述候选特征配置数据处理验证样本,得到验证特征数据;The first processing submodule is used to process verification samples based on the candidate feature configuration data to obtain verification feature data;
    输入子模块,用于将所述验证特征数据分别输入所述至少一个经训练的初始深度学习模型中,得到至少一个验证结果;The input submodule is used to respectively input the verification feature data into the at least one trained initial deep learning model to obtain at least one verification result;
    第一确定子模块，用于基于所述至少一个验证结果和所述评价条件，从网络层类型集合、网络层结构集合、特征配置数据集合中分别确定目标网络层类型、目标网络层结构、目标特征配置数据；以及a first determining sub-module, configured to determine, based on the at least one verification result and the evaluation condition, a target network layer type, a target network layer structure, and target feature configuration data from a network layer type set, a network layer structure set, and a feature configuration data set, respectively; and
    获得子模块,用于基于所述目标网络层类型、所述目标网络层结构、所述目标特征配置数据,得到所述目标深度学习模型。The obtaining submodule is used to obtain the target deep learning model based on the target network layer type, the target network layer structure, and the target feature configuration data.
  11. 根据权利要求10所述的装置,其中:The apparatus of claim 10, wherein:
    所述网络层类型集合包括针对所述至少一个经训练的初始深度学习模型的初始网络层类型;the set of network layer types includes an initial network layer type for the at least one trained initial deep learning model;
    所述网络层结构集合包括针对所述至少一个经训练的初始深度学习模型的初始网络层结构;The set of network layer structures includes an initial network layer structure for the at least one trained initial deep learning model;
    所述特征配置数据集合包括针对所述至少一个经训练的初始深度学习模型的初始特征配置数据,所述特征配置数据集合中的初始特征配置数据为所述候选特征配置数据中的至少部分。The feature configuration data set includes initial feature configuration data for the at least one trained initial deep learning model, and the initial feature configuration data in the feature configuration data set is at least part of the candidate feature configuration data.
  12. 根据权利要求10或11所述的装置,其中,所述获得子模块包括:The device according to claim 10 or 11, wherein the obtaining submodule comprises:
    获得单元,用于基于所述目标网络层类型和所述目标网络层结构,得到待训练目标深度学习模型;An obtaining unit, configured to obtain a target deep learning model to be trained based on the target network layer type and the target network layer structure;
    处理单元,用于基于所述目标特征配置数据处理第二训练样本,得到第二训练特征数据;以及a processing unit, configured to process a second training sample based on the target feature configuration data to obtain second training feature data; and
    训练单元,用于利用所述第二训练特征数据训练所述待训练目标深度学习模型,得到所述目标深度学习模型。A training unit, configured to use the second training feature data to train the target deep learning model to be trained to obtain the target deep learning model.
  13. 根据权利要求9所述的装置,其中,所述候选特征配置数据包括至少一个候选特征配置数据;所述第一处理模块包括:The apparatus according to claim 9, wherein the candidate feature configuration data comprises at least one candidate feature configuration data; the first processing module comprises:
    第一选择子模块,用于从所述至少一个候选配置数据中选择初始特征配置数据;A first selection submodule, configured to select initial feature configuration data from the at least one candidate configuration data;
    第二确定子模块,用于基于所述初始特征配置数据,确定第一特征类型和第一特征维度;A second determining submodule, configured to determine a first feature type and a first feature dimension based on the initial feature configuration data;
    提取子模块,用于基于所述第一特征类型,从所述第一训练样本中提取第一子样本;以及an extracting submodule, configured to extract a first subsample from the first training sample based on the first feature type; and
    第二处理子模块,用于基于所述第一特征维度处理所述第一子样本,得到第一训练特征数据。The second processing submodule is configured to process the first sub-sample based on the first feature dimension to obtain first training feature data.
  14. 根据权利要求12所述的装置,其中,所述处理单元包括:The apparatus according to claim 12, wherein the processing unit comprises:
    确定子单元,用于基于所述目标特征配置数据,确定第二特征类型和第二特征维度;a determining subunit, configured to determine a second feature type and a second feature dimension based on the target feature configuration data;
    提取子单元,用于基于所述第二特征类型,从所述第二训练样本中提取第二子样本;以及an extracting subunit for extracting a second subsample from the second training sample based on the second feature type; and
    处理子单元,用于基于所述第二特征维度处理所述第二子样本,得到第二训练特征数据。A processing subunit, configured to process the second sub-sample based on the second feature dimension to obtain second training feature data.
  15. 根据权利要求9所述的装置,其中,所述选择模块包括:The apparatus of claim 9, wherein the selection module comprises:
    第二选择子模块,用于基于所述模型类型数据,从至少一个候选网络层类型中选择针对初始深度学习模型的初始网络层类型;以及A second selection submodule, configured to select an initial network layer type for an initial deep learning model from at least one candidate network layer type based on the model type data; and
    第三选择子模块,用于从至少一个候选超参数中选择目标超参数,作为针对初始深度学习模型的初始网络层结构。The third selection submodule is used to select a target hyperparameter from at least one candidate hyperparameter as the initial network layer structure for the initial deep learning model.
  16. 一种内容推荐装置,包括:A content recommendation device, comprising:
    第一确定模块,用于确定针对目标对象的对象特征数据;The first determination module is used to determine the object feature data for the target object;
    第二确定模块,用于针对至少一个候选内容中的目标内容,确定针对所述目标内容的内容特征数据;A second determining module, configured to determine, for target content in at least one candidate content, content characteristic data for the target content;
    输入模块，用于将所述对象特征数据和所述内容特征数据输入目标深度学习模型中，得到输出结果，其中，所述目标深度学习模型采用如权利要求9-15中任意一项所述的装置生成，所述输出结果表征了所述目标对象对所述目标内容的感兴趣程度；以及an input module, configured to input the object feature data and the content feature data into a target deep learning model to obtain an output result, wherein the target deep learning model is generated by using the apparatus according to any one of claims 9-15, and the output result characterizes the degree of interest of the target object in the target content; and
    推荐模块,用于响应于所述输出结果满足预设条件,向所述目标对象推荐所述目标内容。A recommending module, configured to recommend the target content to the target object in response to the output result meeting a preset condition.
  17. 一种电子设备,包括:An electronic device comprising:
    至少一个处理器;以及at least one processor; and
    与所述至少一个处理器通信连接的存储器;其中,a memory communicatively coupled to the at least one processor; wherein,
    所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够执行权利要求1-8中任一项所述的方法。The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1-8.
  18. 一种存储有计算机指令的非瞬时计算机可读存储介质,其中,所述计算机指令用于使所述计算机执行权利要求1-8中任一项所述的方法。A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to any one of claims 1-8.
  19. 一种计算机程序产品,包括计算机程序,所述计算机程序在被处理器执行时实现根据权利要求1-8中任一项所述的方法。A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
PCT/CN2022/106805 2021-12-27 2022-07-20 Deep learning model training method and apparatus, and content recommendation method and apparatus WO2023124029A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111618428.9A CN114329201B (en) 2021-12-27 2021-12-27 Training method of deep learning model, content recommendation method and device
CN202111618428.9 2021-12-27

Publications (1)

Publication Number Publication Date
WO2023124029A1 true WO2023124029A1 (en) 2023-07-06

Family

ID=81014934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/106805 WO2023124029A1 (en) 2021-12-27 2022-07-20 Deep learning model training method and apparatus, and content recommendation method and apparatus

Country Status (2)

Country Link
CN (1) CN114329201B (en)
WO (1) WO2023124029A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114329201B (en) * 2021-12-27 2023-08-11 北京百度网讯科技有限公司 Training method of deep learning model, content recommendation method and device
CN114968412B (en) * 2022-06-20 2024-02-02 中国平安财产保险股份有限公司 Configuration file generation method, device, equipment and medium based on artificial intelligence
CN115456168B (en) * 2022-09-05 2023-08-25 北京百度网讯科技有限公司 Training method of reinforcement learning model, energy consumption determining method and device
CN115660064B (en) * 2022-11-10 2023-09-29 北京百度网讯科技有限公司 Model training method based on deep learning platform, data processing method and device
CN115906921B (en) * 2022-11-30 2023-11-21 北京百度网讯科技有限公司 Training method of deep learning model, target object detection method and device
CN116151215B (en) * 2022-12-28 2023-12-01 北京百度网讯科技有限公司 Text processing method, deep learning model training method, device and equipment
CN117112640B (en) * 2023-10-23 2024-02-27 腾讯科技(深圳)有限公司 Content sorting method and related equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325541A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training pattern
CN113052328A (en) * 2021-04-02 2021-06-29 上海商汤科技开发有限公司 Deep learning model production system, electronic device, and storage medium
CN113469358A (en) * 2021-07-05 2021-10-01 北京市商汤科技开发有限公司 Neural network training method and device, computer equipment and storage medium
WO2021233342A1 (en) * 2020-05-19 2021-11-25 华为技术有限公司 Neural network construction method and system
CN113723615A (en) * 2020-12-31 2021-11-30 京东城市(北京)数字科技有限公司 Training method and device of deep reinforcement learning model based on hyper-parametric optimization
CN113761348A (en) * 2021-02-26 2021-12-07 北京沃东天骏信息技术有限公司 Information recommendation method and device, electronic equipment and storage medium
CN114329201A (en) * 2021-12-27 2022-04-12 北京百度网讯科技有限公司 Deep learning model training method, content recommendation method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228794B (en) * 2017-12-29 2020-03-31 三角兽(北京)科技有限公司 Information management apparatus, information processing apparatus, and automatic replying/commenting method
CN111552884A (en) * 2020-05-13 2020-08-18 腾讯科技(深圳)有限公司 Method and apparatus for content recommendation
CN112492390A (en) * 2020-11-20 2021-03-12 海信视像科技股份有限公司 Display device and content recommendation method
CN113469067B (en) * 2021-07-05 2024-04-16 北京市商汤科技开发有限公司 Document analysis method, device, computer equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325541A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training pattern
WO2021233342A1 (en) * 2020-05-19 2021-11-25 华为技术有限公司 Neural network construction method and system
CN113723615A (en) * 2020-12-31 2021-11-30 京东城市(北京)数字科技有限公司 Training method and device of deep reinforcement learning model based on hyper-parametric optimization
CN113761348A (en) * 2021-02-26 2021-12-07 北京沃东天骏信息技术有限公司 Information recommendation method and device, electronic equipment and storage medium
CN113052328A (en) * 2021-04-02 2021-06-29 上海商汤科技开发有限公司 Deep learning model production system, electronic device, and storage medium
CN113469358A (en) * 2021-07-05 2021-10-01 北京市商汤科技开发有限公司 Neural network training method and device, computer equipment and storage medium
CN114329201A (en) * 2021-12-27 2022-04-12 北京百度网讯科技有限公司 Deep learning model training method, content recommendation method and device

Also Published As

Publication number Publication date
CN114329201A (en) 2022-04-12
CN114329201B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
WO2023124029A1 (en) Deep learning model training method and apparatus, and content recommendation method and apparatus
US11080340B2 (en) Systems and methods for classifying electronic information using advanced active learning techniques
US20180276553A1 (en) System for querying models
US20190362222A1 (en) Generating new machine learning models based on combinations of historical feature-extraction rules and historical machine-learning models
US20210374542A1 (en) Method and apparatus for updating parameter of multi-task model, and storage medium
WO2023109059A1 (en) Method for determining fusion parameter, information recommendation method, and model training method
US10606910B2 (en) Ranking search results using machine learning based models
CN114861889B (en) Deep learning model training method, target object detection method and device
US20220300543A1 (en) Method of retrieving query, electronic device and medium
EP4134900A2 (en) Method and apparatus for recommending content, method and apparatus for training ranking model, device, and storage medium
JP2023031322A (en) Question and answer processing method, training method for question and answer model, apparatus, electronic device, storage medium and computer program
CN114036322A (en) Training method for search system, electronic device, and storage medium
US11645540B2 (en) Deep graph de-noise by differentiable ranking
CN111191825A (en) User default prediction method and device and electronic equipment
WO2023040220A1 (en) Video pushing method and apparatus, and electronic device and storage medium
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
CN110852078A (en) Method and device for generating title
CN113612777A (en) Training method, traffic classification method, device, electronic device and storage medium
CN115700548A (en) Method, apparatus and computer program product for user behavior prediction
CN114066278B (en) Method, apparatus, medium, and program product for evaluating article recall
US10740403B1 (en) Systems and methods for identifying ordered sequence data
US20230004774A1 (en) Method and apparatus for generating node representation, electronic device and readable storage medium
US20230386237A1 (en) Classification method and apparatus, electronic device and storage medium
US20230147798A1 (en) Search method, computing device and storage medium
US20230044508A1 (en) Data labeling processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913318

Country of ref document: EP

Kind code of ref document: A1