WO2021254114A1 - Method, apparatus, electronic device, and storage medium for constructing a multi-task learning model

Method, apparatus, electronic device, and storage medium for constructing a multi-task learning model

Info

Publication number
WO2021254114A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
search
sub
task
layer
Application number
PCT/CN2021/095977
Other languages
English (en)
French (fr)
Inventor
陈潇凯
顾晓光
付立波
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021254114A1
Priority to US17/883,439 (published as US20220383200A1)

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/09 Supervised learning
    • G06N3/092 Reinforcement learning
    • G06N3/096 Transfer learning

Definitions

  • This application relates to artificial intelligence technology, and in particular to a method, device, electronic device, and computer-readable storage medium for constructing a multi-task learning model.
  • Artificial intelligence is a comprehensive technology of computer science: by studying the design principles and implementation methods of various intelligent machines, it endows machines with the capabilities of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, such as natural language processing and machine learning/deep learning. As the technology develops, artificial intelligence will be applied in more and more fields and deliver increasingly important value.
  • The embodiments of the present application provide a method, device, electronic device, and storage medium for constructing a multi-task learning model, which can automatically and accurately construct a multi-task learning model and improve the efficiency of constructing the multi-task learning model.
  • An embodiment of the present application provides a method for constructing a multi-task learning model, including: constructing a search space composed of multiple sub-network layers and multiple search layers by interleaving sub-network layers and search layers between an input node and multiple task nodes; sampling paths from the input node through the search space to each task node to obtain candidate paths, which serve as candidate network structures; and training the parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
  • An embodiment of the present application provides an apparatus for constructing a multi-task learning model, including:
  • a construction module, configured to construct a search space composed of multiple sub-network layers and multiple search layers by interleaving sub-network layers and search layers between an input node and multiple task nodes;
  • a sampling module, configured to sample paths from the input node through the search space to each of the task nodes to obtain candidate paths, which serve as candidate network structures; and
  • a generation module, configured to train the parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
  • An embodiment of the present application provides an electronic device for constructing a multi-task learning model, and the electronic device includes:
  • a memory, configured to store executable instructions; and
  • a processor, configured to implement the method for constructing a multi-task learning model provided by the embodiments of the present application when executing the executable instructions stored in the memory.
  • An embodiment of the present application provides a computer-readable storage medium that stores executable instructions for causing a processor to implement, when executed, the method for constructing a multi-task learning model provided by the embodiments of the present application.
  • In the embodiments of the present application, a search space with a multi-layer structure is constructed, and a multi-task learning model for multi-task prediction is searched from the search space based on sample data, thereby realizing automatic and accurate construction of the multi-task learning model and improving construction efficiency. Further, determining the multi-task learning model from a search space composed of multiple sub-network layers and multiple search layers yields a model with a multi-layer structure, which enables the multi-task learning model to perform hierarchical multi-task learning and improves its learning ability.
  • FIG. 1 is a schematic diagram of the structure of a multi-gate multi-expert model provided by the related art;
  • FIG. 2 is a schematic diagram of an application scenario of a multi-task learning model construction system provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an electronic device for constructing a multi-task learning model provided by an embodiment of the present application;
  • FIGS. 4-7 are schematic flowcharts of the method for constructing a multi-task learning model provided by an embodiment of the present application;
  • FIG. 8 is a schematic diagram of a search block provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a search space provided by an embodiment of the present application;
  • FIG. 10 is a schematic flowchart of a search process provided by an embodiment of the present application.
  • The terms "first" and "second" involved below are merely used to distinguish similar objects and do not represent a specific order of the objects. It can be understood that, where permitted, "first" and "second" may be interchanged in a specific order or sequence, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • Deep learning: a new research direction in the field of machine learning (ML). It learns the inherent laws and representation levels of sample data to obtain interpretations of data such as text, images, and sounds, with the ultimate goal of enabling machines to analyze and learn like humans: to recognize data such as text, images, and sounds, and to imitate human activities such as seeing, listening, and thinking.
  • Multi-task learning model: a model used to perform classification or prediction for multiple tasks. For example, in news recommendation, the click-through rate and completion degree of news items are estimated through the multi-task learning model, so that personalized news recommendation can be performed according to each news item's click-through rate and completion degree.
  • The multi-task learning model includes an input node, sub-network layers, search layers, and task nodes. The input node corresponds to the entry of the multi-task learning model, and the data received by the input node serves as the basis for the classification or prediction tasks of the multiple (i.e., at least two) task nodes. Each sub-network layer includes multiple sub-network modules (i.e., the experts in the multi-gate multi-expert model: independent neural network modules, each of which can be composed of a single fully connected layer and an activation function). Each search layer includes multiple search blocks, where each search block represents a sub-search space and contains several local network structures (for example, connection relationships between sub-network modules). The task nodes correspond to the exits of the multi-task learning model, and the number of task nodes is related to the number of classification or prediction tasks to be implemented in the specific application scenario.
  • Network parameters: the parameters used in computation by each module (such as the sub-network modules, search blocks, and task nodes) in the network structure.
  • Structure parameters: used to characterize the probability that a local structure in a search block of the search space is sampled. For example, if the i-th search block includes N local structures, the structure parameter $\alpha_i$ is an N-dimensional vector; the larger a value in $\alpha_i$, the greater the probability that the corresponding local structure is sampled.
  • In the related art, multi-task learning is performed through a multi-gate multi-expert method. The multi-gate multi-expert method enables each task to dynamically aggregate the outputs of shared experts, and can better handle the relationships among multiple tasks. It disassembles the bottom shared layer into multiple experts (independent neural network modules, each of which can be composed of a single fully connected layer and an activation function), dynamically aggregates the experts' outputs through gates, and outputs the aggregated results to the corresponding task nodes. The method does not limit the number of experts, but gates and tasks correspond one-to-one, so the number of gates equals the number of tasks.
  • As shown in FIG. 1, the multi-gate multi-expert model includes 2 task nodes, 2 gates, and 3 experts. Given an input $x$ (a $d$-dimensional vector), each of the 3 experts takes $x$ as input and outputs $v_i = e_i(x)$, where $e_i$ denotes a function transformation that can be regarded as a fully connected layer or a convolutional layer. Gate A is used to compute the weights (scalars) of the three experts for task A. A gate can be a fully connected layer whose input is the vector $x$ and whose output is the scores of the three experts, $h^A = g^A(x)$; the weights are obtained by transforming the scores with a normalized exponential function, i.e., $m_i^A = \frac{\exp(h_i^A)}{\sum_{j=1}^{3}\exp(h_j^A)}$. According to the weights computed by gate A, the input of task A is obtained as $y^A = \sum_{i=1}^{3} m_i^A v_i$. The processing for task B is similar to that for task A, and the function of gate B is similar to that of gate A.
  • The embodiments of the present application provide a method, device, electronic device, and computer-readable storage medium for constructing a multi-task learning model, which can automatically and accurately construct a multi-task learning model and improve the efficiency of multi-task learning model construction.
  • The following describes exemplary applications of the electronic device for constructing a multi-task learning model provided by the embodiments of the present application. The electronic device may be any of various types of terminal devices or servers: the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services; the terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or the like, but is not limited thereto. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • As an example, the server may be a server cluster deployed in the cloud to provide artificial intelligence cloud services (AI as a Service, AIaaS) to developers. The AIaaS platform splits several common types of AI services and provides them in the cloud as independent or packaged services. This service model is similar to an AI-themed marketplace: all developers can access one or more of the artificial intelligence services provided by the AIaaS platform through application programming interfaces. One such artificial intelligence cloud service is a multi-task learning model construction service, i.e., the cloud server encapsulates the program for constructing the multi-task learning model.
  • The constructed multi-task learning model is used for recommendation applications. For example, for a news recommendation application, the click-through rate and completion degree of news items are estimated through the multi-task learning model, so that personalized news recommendation is performed according to each news item's click-through rate and completion degree.
  • FIG. 2 is a schematic diagram of an application scenario of the multi-task learning model construction system 10 provided by an embodiment of the present application.
  • The terminal 200 is connected to the server 100 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of the two.
  • The terminal 200 (on which a client runs, such as a news client or a video client) can be used to obtain sample data; for example, a developer inputs a recommendation sample data set through the terminal, and after the input is completed, the terminal automatically obtains the recommendation sample data set. In some embodiments, a plug-in for constructing a multi-task learning model may be embedded in the client running on the terminal 200, so as to locally execute the method for constructing a multi-task learning model provided in the embodiments of this application and determine the multi-task learning model from the constructed search space.
  • For example, a recommendation client, such as a video client or a news client, is installed on the terminal 200. The terminal 200 calls the plug-in for constructing the multi-task learning model to build a search space composed of multiple sub-network layers and multiple search layers, searches the search space for a multi-task learning model for multi-task prediction according to sample data, and then performs recommendation based on the multi-task learning model. For example, for a video application, the multi-task learning model is used to estimate the click-through rate and completion degree of videos, so that the recommended videos are determined according to each video's click-through rate and completion degree and personalized video recommendation is performed through the video client; for a news application, the exposure rate and click-through rate of news items are estimated through the multi-task learning model, so that the recommended news is determined according to each news item's exposure rate and click-through rate and personalized news recommendation is performed through the news client.
  • The terminal 200 can also send the recommendation sample data set input by the developer on the terminal 200 to the server 100 in the cloud through the network 300, and call the multi-task learning model construction interface of the server 100 (which can be provided in the form of a cloud service, i.e., a multi-task learning model construction service in which the program for constructing the multi-task learning model is encapsulated). The server 100 executes the method for constructing a multi-task learning model to determine the multi-task learning model from the constructed search space. For example, a recommendation client (such as a shopping client) is installed on the terminal 200; the developer inputs the recommendation sample data set in the recommendation client, and the terminal 200 calls the multi-task learning model construction interface of the server 100 through the network 300 (i.e., calls the encapsulated program for multi-task learning model construction) to build a search space composed of multiple sub-network layers and multiple search layers and to search the search space for a multi-task learning model for multi-task prediction according to the sample data, with subsequent recommendation applications based on the multi-task learning model. The server then estimates the click-through rate and purchase rate of products through the multi-task learning model, determines the recommended products according to each product's click-through rate and purchase rate, returns the recommended products to the shopping client, and performs personalized product recommendation through the shopping client.
  • The electronic device used to construct the multi-task learning model can be any of various terminals, such as a mobile phone or a computer, or the server 100 shown in FIG. 2.
  • FIG. 3 is a schematic structural diagram of an electronic device 500 for constructing a multi-task learning model provided by an embodiment of the present application.
  • The electronic device 500 includes: at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together through a bus system 540, which is used to implement connection and communication between these components. In addition to a data bus, the bus system 540 also includes a power bus, a control bus, and a status signal bus; however, for clarity of description, the various buses are all marked as the bus system 540 in FIG. 3.
  • The processor 510 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
  • The memory 550 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory). The memory 550 described in the embodiments of the present application is intended to include any suitable type of memory. The memory 550 optionally includes one or more storage devices that are physically remote from the processor 510.
  • The memory 550 can store data to support various operations. Examples of these data include programs, modules, and data structures, or a subset or superset thereof, as illustrated below.
  • The operating system 551 includes system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks.
  • The network communication module 552 is configured to reach other computing devices via one or more (wired or wireless) network interfaces 520; exemplary network interfaces 520 include Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB, Universal Serial Bus), etc.
  • The apparatus for constructing a multi-task learning model provided by the embodiments of the present application may be implemented in software; for example, it may be the plug-in for constructing a multi-task learning model in the terminal described above, or the multi-task learning model construction service in the server described above. More generally, the apparatus provided by the embodiments of the present application can be provided as various software embodiments, including application programs, software, software modules, scripts, or computer code. The method for constructing a multi-task learning model provided by the embodiments of the present application can be implemented as a computer program product of any form and deployed on various electronic devices as required. FIG. 3 shows the apparatus 555 for constructing a multi-task learning model stored in the memory 550, which may be software in the form of a program, a plug-in, or the like, and includes a series of modules: a construction module 5551, a sampling module 5552, and a generation module 5553, which are used to implement the multi-task learning model construction function provided by the embodiments of the present application.
  • The method for constructing a multi-task learning model provided in the embodiments of the present application can be implemented by various types of electronic devices for constructing a multi-task learning model, such as smart terminals and servers.
  • FIG. 4 is a schematic flowchart of a method for constructing a multi-task learning model provided by an embodiment of the present application, which is described in conjunction with the steps shown in FIG. 4.
  • The input node and the task nodes involved here correspond to the entry and exits of the multi-task learning model, respectively. The data received by the input node serves as the basis for the classification or prediction tasks of the multiple (i.e., at least two) task nodes, and the number of task nodes is related to the number of classification or prediction tasks to be implemented in the specific application scenario.
  • In step 101, a search space composed of multiple sub-network layers and multiple search layers is constructed by interleaving sub-network layers and search layers between the input node and multiple task nodes.
  • In actual implementation, the developer can input the sample data set on the terminal; after the input is completed, the terminal automatically sends the sample data set to the server, and the server receives the sample data set. For recommendation applications, the sample data is recommendation sample data: for news recommendation applications, the sample data is news sample data; for product recommendation applications, product sample data; and for movie recommendation applications, movie sample data.
  • After the server receives the sample data set, it calls the program for constructing a multi-task learning model and constructs, between the input node and the multiple task nodes, a search space composed of multiple sub-network layers and multiple search layers, where the sub-network layers and the search layers are arranged alternately. Each sub-network layer includes multiple sub-network modules, and each search layer includes multiple search blocks. The input node is connected to the first sub-network layer, the first sub-network layer is connected to the first search layer, the first search layer is connected to the second sub-network layer, and so on, until the last search layer is connected to the task nodes; in this way, a search space composed of multiple sub-network layers and multiple search layers is constructed between the input node and the multiple task nodes. After the search space is determined, a multi-task learning model is obtained from the search space so as to perform multi-task prediction through the multi-task learning model.
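  • The alternating arrangement described above can be sketched as follows; the helper build_search_space and all layer and module counts are hypothetical names and values chosen only for illustration:

```python
# Sketch of interleaving sub-network layers and search layers between the
# input node and the task nodes.
def build_search_space(n_layers, n_experts, n_blocks, n_tasks):
    """Return an ordered list of layer descriptors:
    input -> subnet_1 -> search_1 -> ... -> subnet_N -> search_N -> tasks."""
    layers = [("input", 1)]
    for i in range(1, n_layers + 1):
        layers.append((f"subnet_{i}", n_experts))  # sub-network modules (experts)
        layers.append((f"search_{i}", n_blocks))   # search blocks
    layers.append(("tasks", n_tasks))
    return layers

space = build_search_space(n_layers=2, n_experts=3, n_blocks=2, n_tasks=2)
```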
  • For recommendation applications, the input of the input node is recommendation data, such as product data or news data, and the output of the task nodes is the prediction results for the recommendation data, such as click-through rate and completion degree (for example, the completion degree of video viewing or the duration of news browsing), etc.
  • When the node following a search layer is a sub-network layer, the output of the search blocks in that search layer is the input of the sub-network modules; when the node following a search layer is a task node, the output of the search blocks in that search layer is the input of the task node.
  • Referring to FIG. 5, which is an optional flowchart of the method for constructing a multi-task learning model provided by an embodiment of the present application, FIG. 5 shows that FIG. 4 further includes step 104 and step 105. In step 104, the outputs of multiple sub-network modules in the sub-network layer are sampled to obtain the outputs of the multiple sampled sub-network modules. In step 105, according to the weight of each sub-network module among the multiple sub-network modules, the outputs of the multiple sampled sub-network modules are weighted and summed, and the result of the weighted summation is used as the output of a local structure in the search block so as to construct a transmission path of the search block, where the search block is a module in the search layer adjacent to the sub-network layer.
  • For example, when a sub-network layer contains three sub-network modules whose outputs are (v1, v2, v3), sampling these outputs can produce 7 kinds of sampling results, namely (v1), (v2), (v3), (v1 and v2), (v1 and v3), (v2 and v3), and (v1, v2 and v3). When the output of a single sub-network module is sampled, such as (v1), (v2), or (v3), that output is directly used as the output of the local structure in the search block to construct the transmission path in the search block. When the outputs of multiple sub-network modules are sampled, such as (v1 and v2), (v1 and v3), (v2 and v3), or (v1, v2 and v3), the outputs are weighted and summed according to the weight of each sub-network module, and the result of the weighted summation is used as the output of the local structure in the search block.
  • In some embodiments, the search block further includes a gating node. After sampling the outputs of the multiple sub-network modules in the sub-network layer, the method further includes: sampling a signal source from the signal source set of the sub-network layer, where the signal source is the output of the input node or the output of a predecessor sub-network module of the sub-network layer; predicting on the signal source through the gating node to obtain the predicted value of each sub-network module among the multiple sub-network modules; and normalizing the predicted value of each sub-network module to obtain the weight of each sub-network module, i.e., $m_i = \frac{\exp(h_i)}{\sum_{j=1}^{s}\exp(h_j)}$, where $h_i$ denotes the predicted value of the i-th sub-network module and $s$ denotes the number of sub-network modules. In this way, different signal sources are used to determine different weights so as to construct multiple transmission paths in the search block, so that the subsequently constructed search space can contain enough possible network structures to solve a specific multi-task learning problem.
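  • A minimal sketch of one local structure under these definitions follows; the sampled subset, the sampled signal source, and the linear gating node are illustrative assumptions rather than the claimed implementation:

```python
# Sketch of one local structure in a search block: a sampled subset of
# sub-network outputs is aggregated with weights predicted by the gating node
# from a sampled signal source.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d_v, d_q = 8, 8
outputs = [rng.normal(size=d_v) for _ in range(3)]        # v1, v2, v3
signal_sources = [rng.normal(size=d_q) for _ in range(2)] # e.g. input node, predecessor module

subset = [outputs[0], outputs[1]]        # sampled combination, e.g. (v1 and v2)
q = signal_sources[0]                    # sampled gating signal source
W_gate = rng.normal(size=(len(subset), d_q))  # gating node (assumed linear)

h = W_gate @ q                           # predicted value per sub-network module
m = softmax(h)                           # normalized weights
y = sum(m_i * v_i for m_i, v_i in zip(m, subset))  # output of the local structure
```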
  • In some embodiments, the search space includes N sub-network layers and N search layers, where N is a natural number greater than 1, and constructing the search space includes the following processing. Through the i-th search block in the first search layer, the output of one sub-network module or of multiple sub-network modules is sampled from the first sub-network layer, where i is a positive integer. The signal source of the first search layer is the output of the input node; the signal source is predicted through the gating node to obtain the predicted value of each sub-network module among the multiple sub-network modules, and the predicted value of each sub-network module is normalized to obtain the weight of each sub-network module. According to the weight of each sub-network module, the outputs of the multiple sub-network modules are weighted and summed, and the result of the weighted summation is used as the output of a local structure in the i-th search block to construct the transmission path in the i-th search block, until the construction of the transmission paths of all local structures in the i-th search block is completed.
  • Similarly, through the i-th search block in the j-th search layer, the outputs of multiple sub-network modules are sampled from the j-th sub-network layer, where 1 < j ≤ N and j is a natural number. The signal source of the j-th search layer is the output of the input node or the output of a predecessor sub-network module of the j-th sub-network layer; the signal source is predicted through the gating node to obtain the predicted value of each sub-network module among the multiple sub-network modules, and the predicted values are normalized to obtain the weights. According to the weight of each sub-network module, the outputs of the multiple sub-network modules are weighted and summed, and the result of the weighted summation is used as the output of the local structure in the i-th search block of the j-th search layer, until the construction of the transmission paths of all local structures in the search blocks of the j-th search layer is completed.
  • In some embodiments, constructing the search space composed of multiple sub-network layers and multiple search layers includes: using the transmission path from the input node to the first sub-network layer, the transmission paths from the intermediate sub-network layers to their adjacent search layers, and the transmission path from the last search layer to the task nodes as the edges of a directed graph; using the sub-network modules in the multiple sub-network layers and the search blocks in the multiple search layers as the nodes of the directed graph; and combining the nodes and edges of the directed graph to construct the search space for multi-task learning.
  • In other words, the search space can be constructed in the form of a directed graph. The transmission path from the input node to the first sub-network layer is taken as an edge of the directed graph; the transmission paths from the intermediate sub-network layers (the first sub-network layer to the last sub-network layer) to their adjacent search layers are also taken as edges of the directed graph, for example, the transmission path from the second sub-network layer to the adjacent second search layer; and the sub-network modules in the multiple sub-network layers and the search blocks in the multiple search layers are taken as the nodes of the directed graph. Combining these nodes and edges constructs the search space for multi-task learning. Subsequently, the edges of the directed graph can be sampled to realize sampling of the search space and obtain candidate network structures.
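  • The directed-graph view can be sketched with ordinary Python containers; the node names below are hypothetical:

```python
# Sketch of the directed-graph view of the search space: modules and search
# blocks are nodes, transmission paths are edges.
graph = {"nodes": [], "edges": []}
graph["nodes"] += ["input", "subnet1_v1", "subnet1_v2", "search1_b1", "taskA"]
graph["edges"] += [
    ("input", "subnet1_v1"),       # input node -> first sub-network layer
    ("input", "subnet1_v2"),
    ("subnet1_v1", "search1_b1"),  # sub-network layer -> adjacent search layer
    ("subnet1_v2", "search1_b1"),
    ("search1_b1", "taskA"),       # last search layer -> task node
]
```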
  • In step 102, paths from the input node through the search space to each task node are sampled to obtain candidate paths, which serve as candidate network structures. After the server constructs the search space, it can sample the paths from the input node through the search space to each task node to determine the candidate network structures. Since the search space contains enough possible network structures, sampling these paths yields candidate network structures containing a variety of structures, which can solve specific multi-task learning problems.
  • Referring to FIG. 6, which is an optional flowchart of the method for constructing a multi-task learning model provided by an embodiment of the present application, FIG. 6 shows that step 102 in FIG. 4 can be implemented through step 1021 and step 1022. In step 1021, according to the structure parameters of the search space, each search block in the search layers of the search space is sampled to obtain the local structure corresponding to each search block. In step 1022, the path that starts from the input node, passes through the local structure of each search block, and reaches each task node is used as a candidate path.
  • Each search block in the search space contains multiple local structures. In some embodiments, sampling each search block according to the structure parameters of the search space to obtain the local structure corresponding to each search block includes: mapping the structure parameters of the search space to obtain the sampling probabilities of the local structures in each search block of the search space; constructing the multinomial distribution of each search block according to the sampling probabilities of the local structures in that search block; and sampling from the multinomial distribution of each search block to obtain the local structure corresponding to that search block.
  • In other words, to sample the local structure of each search block, the structure parameters of the search space are first mapped to obtain the sampling probability of each local structure in each search block; the multinomial distribution of each search block is then constructed according to these sampling probabilities; and finally the local structures in each search block are sampled according to the multinomial distribution of that search block to obtain the corresponding local structure. For example, when the search space includes B search blocks, sampling the multiple local structures in each search block to obtain one corresponding local structure yields B local structures; combining the B local structures with the input node, the sub-network modules, and the task nodes gives a complete candidate network structure.
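  • The softmax-then-multinomial sampling of steps 1021-1022 can be sketched as follows; the number of search blocks B and the sizes of the structure parameter vectors are illustrative assumptions:

```python
# Sketch of sampling one local structure per search block from the structure
# parameters alpha: softmax maps alpha_i to probabilities, then a single
# multinomial draw picks the local structure.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
B = 4                                            # number of search blocks (assumed)
alphas = [rng.normal(size=3) for _ in range(B)]  # alpha_i over 3 local structures each

candidate = []
for alpha in alphas:
    p = softmax(alpha)                           # sampling probabilities
    u = int(rng.choice(len(p), p=p))             # one draw from the multinomial
    candidate.append(u)                          # index of the chosen local structure
# `candidate`, combined with the input node, sub-network modules, and task
# nodes, determines a complete candidate network structure.
```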
  • In step 103, the parameters of the candidate network structure are trained according to the sample data to generate a multi-task learning model for multi-task prediction. After the server obtains a candidate network structure by sampling the search space, it trains the parameters of the candidate network structure and iterates the sampling and training operations to generate a multi-task learning model for multi-task prediction. For recommendation applications, the parameters of the candidate network structure can be trained based on recommendation sample data to generate a multi-task learning model for multiple recommendation predictions. For example, if the outputs of the task nodes are the click-through rate and completion degree of news items, the parameters of the candidate network structure are trained according to news sample data to generate a multi-task learning model that predicts the click-through rate and completion degree of news items, and news recommendation is then performed according to the predicted click-through rate and completion degree.
  • Referring to FIG. 7, which is an optional flowchart of the method for constructing a multi-task learning model provided by an embodiment of the present application, step 103 in FIG. 4 can be implemented through step 1031 to step 1033. In step 1031, the network parameters of the candidate network structure are trained to obtain optimized network parameters of the candidate network structure. In step 1032, the structure parameters of the search space are trained according to the optimized network parameters of the candidate network structure to obtain optimized structure parameters of the search space. In step 1033, according to the optimized structure parameters of the search space, a candidate network structure for performing multi-task prediction is determined from the optimized candidate network structures and used as the multi-task learning model.
  • In actual implementation, the network parameters of the candidate network structure can be trained first and then the structure parameters, or the structure parameters can be trained first and then the network parameters. The network parameters refer to the parameters used in computation by each module in the network structure (such as the sub-network modules, search blocks, and task nodes). The structure parameters are used to characterize the probability that a local structure in a search block of the search space is sampled; for example, if the i-th search block includes N local structures, the structure parameter $\alpha_i$ is an N-dimensional vector, and the larger a value in $\alpha_i$, the greater the probability that the corresponding local structure is sampled.
  • In some embodiments, training the network parameters of the candidate network structure to obtain the optimized network parameters includes: performing multi-task prediction processing on the sample data through the candidate network structure to obtain the multi-task prediction results of the sample data; constructing the loss function of the candidate network structure according to the multi-task prediction results and the multi-task labels of the sample data; and updating the network parameters of the candidate network structure until the loss function converges, with the updated parameters of the network structure at convergence used as the optimized network parameters of the candidate network structure.
  • For example, after the value of the loss function of the candidate network structure is determined according to the multi-task prediction results and the multi-task labels of the sample data, it can be judged whether the value of the loss function exceeds a preset threshold. When it does, an error signal of the candidate network structure is determined based on the loss function, the error information is back-propagated through the candidate network structure, and the model parameters of each layer are updated during propagation. Specifically, the training sample data is input to the input layer of the neural network model, passes through the hidden layers, and finally reaches the output layer, which outputs the results; this is the forward propagation process of the neural network model. If there is an error between the output results and the actual results, the error between the output results and the actual values is calculated and propagated back from the output layer toward the hidden layers until it reaches the input layer, and the values of the model parameters are adjusted according to the error during back-propagation; this process is iterated until convergence. The candidate network structure is such a neural network model.
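  • A toy sketch of this train-until-convergence loop for two tasks follows; the linear model, squared-error loss, and random data are stand-ins chosen for illustration, not the model of the embodiments:

```python
# Sketch of training network parameters: forward pass, summed per-task losses,
# gradient update, repeated until the loss converges.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(64, 8))        # sample data (assumed)
y_a = rng.normal(size=64)           # labels for task A (assumed)
y_b = rng.normal(size=64)           # labels for task B (assumed)
W_a, W_b = np.zeros(8), np.zeros(8) # per-task network parameters

lr, prev_loss = 0.1, np.inf
for step in range(200):
    pred_a, pred_b = X @ W_a, X @ W_b                 # forward pass
    loss = ((pred_a - y_a) ** 2).mean() + ((pred_b - y_b) ** 2).mean()
    if abs(prev_loss - loss) < 1e-6:                  # convergence check
        break
    prev_loss = loss
    W_a -= lr * 2 * X.T @ (pred_a - y_a) / len(X)     # gradient step, task A
    W_b -= lr * 2 * X.T @ (pred_b - y_b) / len(X)     # gradient step, task B
```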
  • In some embodiments, training the structure parameters of the search space according to the optimized network parameters of the candidate network structure to obtain the optimized structure parameters of the search space includes: performing network structure evaluation through the sample data and the optimized network parameters of the candidate network structure to obtain the evaluation result of the optimized candidate network structure; constructing the objective function of the structure parameters of the search space according to the evaluation result; and updating the structure parameters of the search space until the objective function converges, with the updated structure parameters of the search space at convergence used as the optimized structure parameters of the search space.
  • In actual implementation, after the server obtains the optimized candidate network structure, it performs prediction on the sample data through the optimized candidate network structure to obtain multi-task prediction results, and evaluates the optimized candidate network structure according to the multi-task prediction results to obtain evaluation results such as accuracy, Area Under Curve (AUC), or loss. Based on the evaluation result, the objective function of the structure parameters of the search space is constructed, namely $\max_{\alpha}\,\mathbb{E}_{u \sim p(\alpha)}\left[R_{val}\right]$, where $p(\alpha)$ represents the multinomial distribution determined by the structure parameters $\alpha$ and $R_{val}$ represents the evaluation result of the optimized candidate network structure. The structure parameters of the search space are updated until the objective function converges, and the updated structure parameters of the search space at convergence are used as the optimized structure parameters of the search space.
  • In some embodiments, determining, from the optimized candidate network structures, a candidate network structure for performing multi-task prediction to serve as the multi-task learning model includes: mapping the optimized structure parameters of the search space to obtain the sampling probabilities of the local structures in each search block of the search space; using, for each search block, the local structure corresponding to the maximum sampling probability as a local structure of the candidate network structure for multi-task prediction; and combining the local structures of the candidate network structure to obtain the multi-task learning model.
  • In other words, the server can search for the optimal network structure from the search space according to the optimized structure parameters. The optimized structure parameters of the search space are mapped, for example through a normalized exponential function (softmax function), to obtain the sampling probabilities of the local structures in each search block; for each search block, the local structure corresponding to the maximum sampling probability is used as a local structure of the candidate network structure for multi-task prediction; and finally the local structures are combined to obtain the multi-task learning model.
  • The following describes an exemplary application of the embodiments of the present application in an actual application scenario. A terminal 200 is connected to a server 100 deployed in the cloud through a network 300, and a multi-task learning model construction application is installed on the terminal 200. The developer inputs the recommendation sample data set, and the terminal 200 sends the recommendation sample data set to the server 100 through the network 300. After the server 100 receives the recommendation sample data set, it determines the optimal network structure from the constructed search space as the multi-task learning model, and subsequent recommendation applications are based on this model. For example, for news recommendation applications, the multi-task learning model estimates the click-through rate and completion degree of news items, so that personalized news recommendation is performed according to each news item's click-through rate and completion degree; for product recommendation applications, the multi-task learning model estimates the click-through rate (CTR) and conversion rate (CVR) of products, so that personalized product recommendation is performed according to each product's click-through rate and conversion rate; for movie recommendation applications, the multi-task learning model estimates the purchase rate of movies and users' ratings, so that personalized movie recommendation is performed according to each movie's purchase rate and users' ratings.
  • In summary, the embodiments of the present application start from the perspective of neural network architecture search and use a search algorithm to find the optimal network structure in a search space, so as to greatly reduce the cost of manually adjusting the network structure. To this end, a search space is designed that enumerates the connection relationships between sub-network modules (experts) and between sub-network modules and tasks. Since the search space can have multiple layers and the input sources of the gates are also included in the search space, the search space subsumes the above-mentioned multi-gate multi-expert model. The embodiments of the present application use multinomial distribution sampling and policy gradient algorithms to efficiently find the optimal network structure in the search space in a differentiable manner, to serve as the multi-task learning model, so as to achieve better results than the multi-gate multi-expert method.
  • The following describes the method for constructing a multi-task learning model provided by an embodiment of the present application in detail. The method includes two parts, namely: 1) construction of the search space; and 2) the search algorithm. The goal of constructing the search space is to make it contain enough possible network structures to solve a specific multi-task learning problem.
  • The search space is composed of several search blocks, where each search block represents a sub-search space containing a variety of different local network structures (local structures), for example, connection relationships between sub-network modules. For a given local structure, gating is used to achieve dynamic aggregation of input features. A local structure is determined by two factors, namely: 1) different inputs (combinations of input features); and 2) different gating signal sources (signal sources).
  • The sub-search space represented by a search block can be formally expressed as $\mathcal{O} = \mathcal{C} \times \mathcal{Q}$, where $\mathcal{C}$ and $\mathcal{Q}$ both represent sets and $\times$ represents the Cartesian product. $\mathcal{C}$ represents the set of all combinations of input features (where the inputs come from the outputs of the preceding sub-network modules), and $\mathcal{Q}$ represents the set of all possible gating signal sources; for example, all the inputs of the preceding sub-network layer and the original shared input can serve as signal sources. $\mathcal{O}$ represents the sub-search space; that is, there can be a total of $|\mathcal{C}| \times |\mathcal{Q}|$ different local structures.
  • Assume that the input of the k-th local structure is $V = (v_1, v_2, \ldots, v_s)$ ($s$ input features, each of dimension $d_v$) and $q$ (the gating signal source, of dimension $d_q$); the output of the k-th local structure is $y_k$, which is the weighted sum of the input features. The calculation is shown in formula (1):
  • $y_k = \sum_{i=1}^{s} m_i\, v_i, \quad m = \mathrm{softmax}\left(g_k(q)\right), \quad g_k(q) = w_k q \qquad (1)$
  • where $g_k$ represents the gating of the local structure, $m_i$ represents the normalized gating score (predicted value) of the i-th input feature, and $w_k$ represents the learnable parameters of the gate.
  • The search space $\mathcal{S}$ of the embodiment of the present application can be expressed as the Cartesian product of the spaces represented by the B search blocks, i.e., $\mathcal{S} = \mathcal{O}_1 \times \mathcal{O}_2 \times \cdots \times \mathcal{O}_B$; its physical structure can be regarded as an over-parameterized network (Over-Parameterized Network) that includes complex and diverse network structures. Each search block contains $|\mathcal{O}_i|$ local structures, where $i \in [1, 2, \ldots, B]$, and a complete network structure can be determined by selecting one local structure in each search block and combining all the selected local structures.
  • A complete network structure is denoted as $(u, w_u)$, where $u = (u_1, u_2, \ldots, u_B)$ represents the B local structures determined by B sampling actions, and $w_u$ represents the network parameters of the network structure (the parameters used in computation by each module of the network structure, for example $w_k$ in formula (1)). Each sampling action $u_i$ ($i \in [1, 2, \ldots, B]$) samples from a multinomial distribution determined by a structure parameter $\alpha_i$ ($i \in [1, 2, \ldots, B]$), where $\alpha_i$ is used to characterize the probability that the local structures in the i-th search block are sampled: if the i-th search block includes N local structures, the structure parameter $\alpha_i$ is an N-dimensional vector, and the larger a value in $\alpha_i$, the greater the probability that the corresponding local structure is sampled.
  • The calculation is shown in formulas (2) and (3):
  • $p_i = \mathrm{softmax}(\alpha_i) \qquad (2)$
  • $u_i \sim \mathrm{multinomial}(p_i, 1) \qquad (3)$
  • where $\mathrm{multinomial}(\cdot)$ represents a multinomial distribution, $\mathrm{softmax}(\cdot)$ represents the normalized exponential function, and $p_i$ represents the probabilities that the local structures in the i-th search block are sampled. Therefore, by sampling the B multinomial distributions, a complete network structure can be obtained.
  • The embodiment of the present application uses the policy gradient (REINFORCE) algorithm from reinforcement learning to optimize the structure parameters, so that network structures that perform well on the specified evaluation index obtain higher sampling probabilities. The optimization objective of the structure parameters is shown in formula (4):
  • $\max_{\alpha}\; J(\alpha) = \mathbb{E}_{u \sim p(\alpha)}\left[R_{val}(u)\right] \qquad (4)$
  • where $p(\alpha)$ represents the multinomial distributions determined by the structure parameters $\alpha$, and $R_{val}(u)$ represents the score (evaluation result) of the sampled structure on a certain index (for example, accuracy, area under the ROC curve (Area Under Curve, AUC), or loss). The gradient of the structure parameters is obtained by formula (5):
  • $\nabla_{\alpha} J(\alpha) = \mathbb{E}_{u \sim p(\alpha)}\left[\left(R_{val}(u) - b\right)\,\nabla_{\alpha} \log p(u;\alpha)\right] \qquad (5)$
  • where $b$ represents a baseline used to reduce the variance of the return; a moving average can be used as the baseline, and $b$ can also be 0.
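  • A sketch of one Monte Carlo update according to formulas (4) and (5) follows; the mocked validation scores and the baseline value are illustrative assumptions standing in for a real evaluation:

```python
# Sketch of a REINFORCE step for one search block: average
# (R_val - b) * grad log p(u; alpha) over sampled structures.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_p(u, p):
    g = -p.copy()                 # d/dalpha log softmax(alpha)[u] = 1{i=u} - p_i
    g[u] += 1.0
    return g

rng = np.random.default_rng(4)
alpha = np.array([0.3, -0.1, 0.0])   # structure parameters (illustrative)
p = softmax(alpha)
b = 0.6                              # moving-average baseline (illustrative)

mock_r_val = {0: 0.58, 1: 0.71, 2: 0.63}          # stand-in validation AUC per structure
us = [int(rng.choice(3, p=p)) for _ in range(8)]  # sampled structures u ~ p(alpha)
grads = [(mock_r_val[u] - b) * grad_log_p(u, p) for u in us]
alpha = alpha + 0.5 * np.mean(grads, axis=0)      # one ascent step on objective (4)
```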
  • Algorithm 1 describes the search process used to obtain the optimal network structure. Its inputs are the training sample data, the validation data, and the super network including B search blocks. Through the search process of Algorithm 1, the optimized structure parameters $\alpha$ and network parameters $w$ can be obtained, and the final network structure can be derived based on the optimized structure parameters $\alpha$ and network parameters $w$.
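  • A toy end-to-end sketch of this alternating search loop follows; the "structures" are reduced to feature masks over a linear model so that the example stays self-contained, which is an assumption of the sketch rather than the super network of Algorithm 1:

```python
# Sketch of Algorithm 1: alternate between (a) training the network parameters
# w of the sampled structure on training data and (b) a REINFORCE step on the
# structure parameters alpha using the validation score.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(5)
X_tr, X_val = rng.normal(size=(128, 8)), rng.normal(size=(64, 8))
w_true = rng.normal(size=8)
y_tr, y_val = X_tr @ w_true, X_val @ w_true

# Each "structure" u masks a different subset of input features, standing in
# for different local-structure choices; w holds per-structure parameters.
masks = [(np.arange(8) < k).astype(float) for k in (2, 8, 4)]
w = [np.zeros(8) for _ in masks]
alpha, b = np.zeros(3), 0.0

for epoch in range(300):
    p = softmax(alpha)
    u = int(rng.choice(3, p=p))                              # sample a structure
    Xm = X_tr * masks[u]
    w[u] -= 0.01 * 2 * Xm.T @ (Xm @ w[u] - y_tr) / len(Xm)   # train w of u
    r_val = -np.mean(((X_val * masks[u]) @ w[u] - y_val) ** 2)  # validation score
    b = 0.9 * b + 0.1 * r_val                                # moving-average baseline
    g = -p
    g[u] += 1.0
    alpha += 0.1 * (r_val - b) * g                           # REINFORCE step on alpha

final = int(np.argmax(softmax(alpha)))                       # derive final structure
```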
  • In summary, the embodiments of the present application can efficiently optimize the network structure for a specified multi-task data set and automatically balance the independence and sharing relationships of different task branches, so as to search for a better network structure as the multi-task learning model. Multi-task learning is very important in recommendation systems: in business recommendation scenarios it can be used to optimize the network structure for multi-task learning (estimation of multiple distribution indices, with targets such as estimated click-through rate and completion degree) and to make full use of the domain knowledge contained in different tasks (indices) to enhance the generalization ability of the multi-task learning model, so as to obtain the specific indices of the recommendation system quickly and accurately. In addition, the embodiments of the present application can learn the network structure most suitable for the training data of a specific service more efficiently, accelerating the iterative upgrade of products.
  • The construction module 5551 is configured to construct a search space composed of multiple sub-network layers and multiple search layers by interleaving sub-network layers and search layers between the input node and multiple task nodes; the sampling module 5552 is configured to sample the paths from the input node through the search space to each of the task nodes to obtain candidate paths, which serve as candidate network structures; and the generation module 5553 is configured to train the parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
  • In some embodiments, the construction module 5551 is further configured to sample the outputs of the multiple sub-network modules in the sub-network layer to obtain the outputs of the multiple sampled sub-network modules, and to perform, according to the weight of each sub-network module among the multiple sub-network modules, a weighted summation of the outputs of the multiple sampled sub-network modules, with the result of the weighted summation used as the output of a local structure in the search block to construct the transmission path of the search block, where the search block is a module in the search layer adjacent to the sub-network layer.
  • In some embodiments, the search block further includes a gating node, and the construction module 5551 is further configured to sample a signal source from the signal source set of the sub-network layer, where the signal source is the output of the input node or the output of a predecessor sub-network module of the sub-network layer; predict on the signal source through the gating node to obtain the predicted value of each sub-network module among the multiple sub-network modules; and normalize the predicted value of each sub-network module to obtain the weight of each sub-network module.
  • In some embodiments, the search space includes N sub-network layers and N search layers, where N is a natural number greater than 1, and the construction module 5551 is further configured to: sample, through the i-th search block in the first search layer, the outputs of multiple sub-network modules from the first sub-network layer, where i is a positive integer; when the signal source is the output of the input node, perform a weighted summation of the outputs of the multiple sub-network modules according to the weight of each sub-network module, with the result of the weighted summation used as the output of a local structure in the i-th search block to construct the transmission path in the i-th search block, until the construction of the transmission paths of all local structures in the i-th search block of the first search layer is completed; and sample, through the i-th search block in the j-th search layer, the outputs of multiple sub-network modules from the j-th sub-network layer, where 1 < j ≤ N and j is a natural number. When the node following a search layer is a sub-network layer, the output of the search blocks in that search layer is the input of the sub-network modules; when the node following a search layer is the task node, the output of the search blocks in that search layer is the input of the task node.
  • In some embodiments, the construction module 5551 is further configured to use the transmission path from the input node to the first sub-network layer, the transmission paths from the intermediate sub-network layers to their adjacent search layers, and the transmission path from the last search layer to the task nodes as the edges of the directed graph; use the sub-network modules in the multiple sub-network layers and the search blocks in the multiple search layers as the nodes of the directed graph; and combine the nodes and edges of the directed graph to obtain the search space for multi-task learning.
  • the sampling module 5552 is further configured to sample each search block of the search layers in the search space according to the structural parameters of the search space to obtain the local structure corresponding to each search block, and to use the path from the input node, via the local structure of each search block, to each of the task nodes as a candidate path.
  • the sampling module 5552 is further configured to perform mapping processing on the structural parameters of the search space to obtain the sampling probabilities corresponding to the local structures in each search block in the search space; construct the multinomial distribution of each search block according to the sampling probabilities of its local structures; and sample the multinomial distribution of each search block to obtain the local structure corresponding to each search block.
  • the generating module 5553 is further configured to train the structural parameters of the search space according to the optimized network parameters of the candidate network structure to obtain the optimized structural parameters of the search space, and to determine, according to the optimized structural parameters of the search space, a candidate network structure for multi-task prediction from the optimized candidate network structures to serve as the multi-task learning model.
  • the generating module 5553 is further configured to perform multi-task prediction processing on the sample data through the candidate network structure to obtain multi-task prediction results of the sample data; construct the loss function of the candidate network structure according to the multi-task prediction results and the multi-task labels of the sample data; and update the network parameters of the candidate network structure until the loss function converges, using the updated parameters of the candidate network structure at convergence as the optimized network parameters of the candidate network structure.
  • the generating module 5553 is further configured to perform network structure evaluation through the sample data and the optimized network parameters of the candidate network structure to obtain an evaluation result of the optimized candidate network structure; construct the objective function of the structural parameters of the search space according to the evaluation result; and update the structural parameters of the search space until the objective function converges, using the updated structural parameters at convergence as the optimized structural parameters of the search space.
  • the generating module 5553 is further configured to perform mapping processing on the optimized structural parameters of the search space to obtain the sampling probabilities corresponding to the local structures in each search block in the search space; use the local structure with the maximum sampling probability in each search block as the local structure of the candidate network structure for multi-task prediction; and combine the local structures of each candidate network structure to obtain the multi-task learning model.

Abstract

This application provides a method and apparatus for constructing a multi-task learning model, an electronic device, and a computer-readable storage medium. The method includes: constructing, between an input node and multiple task nodes, a search space composed of multiple sub-network layers and multiple search layers by arranging the sub-network layers and the search layers in an interleaved manner; sampling paths that start from the input node and reach each task node via the search space to obtain candidate paths, which serve as candidate network structures; and training parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.

Description

Method and apparatus for constructing a multi-task learning model, electronic device, and storage medium
CROSS-REFERENCE TO RELATED APPLICATION
The embodiments of this application are based on, and claim priority to, Chinese Patent Application No. 202010555648.0 filed on June 17, 2020, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD
This application relates to artificial intelligence technology, and in particular to a method and apparatus for constructing a multi-task learning model, an electronic device, and a computer-readable storage medium.
BACKGROUND
Artificial intelligence (AI) is a comprehensive technology of computer science. By studying the design principles and implementation methods of various intelligent machines, it endows machines with the capabilities of perception, reasoning, and decision-making. AI is a multidisciplinary field covering a wide range of areas, such as natural language processing and machine learning/deep learning. As the technology develops, AI will be applied in more fields and deliver increasingly important value.
In the related art, there is no effective AI-based solution for determining a multi-task learning model; selecting the most suitable network structure as the multi-task learning model mainly relies on manually validating various models. However, this approach is inefficient and wastes considerable human and material resources.
SUMMARY
The embodiments of this application provide a method and apparatus for constructing a multi-task learning model, an electronic device, and a storage medium, which can construct a multi-task learning model automatically and accurately, improving the efficiency of multi-task learning model construction.
The technical solutions of the embodiments of this application are implemented as follows.
An embodiment of this application provides a method for constructing a multi-task learning model, including:
constructing, between an input node and multiple task nodes, a search space composed of multiple sub-network layers and multiple search layers by arranging the sub-network layers and the search layers in an interleaved manner;
sampling paths that start from the input node and reach each of the task nodes via the search space to obtain candidate paths, which serve as candidate network structures; and
training parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
An embodiment of this application provides an apparatus for constructing a multi-task learning model, including:
a construction module configured to construct, between an input node and multiple task nodes, a search space composed of multiple sub-network layers and multiple search layers by arranging the sub-network layers and the search layers in an interleaved manner;
a sampling module configured to sample paths that start from the input node and reach each of the task nodes via the search space to obtain candidate paths, which serve as candidate network structures; and
a generating module configured to train parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
An embodiment of this application provides an electronic device for constructing a multi-task learning model, the electronic device including:
a memory configured to store executable instructions; and
a processor configured to implement, when executing the executable instructions stored in the memory, the method for constructing a multi-task learning model provided by the embodiments of this application.
An embodiment of this application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the method for constructing a multi-task learning model provided by the embodiments of this application.
The embodiments of this application have the following beneficial effects:
By constructing a multi-layer search space between the input node and the multiple task nodes through interleaved sub-network layers and search layers, and searching the space for a multi-task learning model for multi-task prediction according to sample data, a multi-task learning model can be constructed automatically and accurately, improving construction efficiency. Furthermore, a multi-layer multi-task learning model is determined from the search space composed of multiple sub-network layers and multiple search layers, so that the model can perform hierarchical multi-task learning, improving its learning capability.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic structural diagram of a multi-gate mixture-of-experts model provided in the related art;
FIG. 2 is a schematic diagram of an application scenario of a multi-task learning model construction system provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of an electronic device for constructing a multi-task learning model provided by an embodiment of this application;
FIGS. 4-7 are schematic flowcharts of methods for constructing a multi-task learning model provided by embodiments of this application;
FIG. 8 is a schematic diagram of a search block provided by an embodiment of this application;
FIG. 9 is a schematic diagram of a search space provided by an embodiment of this application;
FIG. 10 is a schematic flowchart of a search process provided by an embodiment of this application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting this application; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
In the following description, the terms "first" and "second" are merely used to distinguish similar objects and do not represent a particular ordering of the objects. It is understood that, where permitted, "first" and "second" may be interchanged in a specific order or sequence, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit this application.
Before the embodiments of this application are described in further detail, the nouns and terms involved in the embodiments are explained; they are applicable to the following interpretations.
1) Deep Learning (DL): a research direction in the field of Machine Learning (ML). It learns the intrinsic regularities and representation levels of sample data to obtain interpretations of data such as text, images, and sound, ultimately enabling machines to analyze and learn like humans, to recognize data such as text, images, and sound, and to imitate human activities such as seeing, listening, and thinking.
2) Multi-task learning model: a model for classification or prediction across multiple tasks. For example, for news recommendation, a multi-task learning model estimates the click-through rate and completion degree of news items, so that personalized news recommendation can be performed according to the click-through rate and completion degree of each item.
A multi-task learning model includes an input node, sub-network layers, search layers, and task nodes. The input node corresponds to the entry of the multi-task learning model; the data it receives serves as the basis for the classification or prediction tasks of the multiple (i.e., at least two) task nodes. A sub-network layer includes multiple sub-network modules (i.e., the experts in a multi-gate mixture-of-experts model; an expert is an independent neural network module that may consist of a single fully connected layer and an activation function). A search layer includes multiple search blocks, each representing a sub-search-space containing several local network structures (e.g., connections between sub-network modules). The task nodes correspond to the exits of the multi-task learning model; the number of task nodes depends on the number of classification or prediction tasks to be performed in the specific application scenario.
3) Network parameters: the parameters used in computation by the modules of a network structure (e.g., sub-network modules, search blocks, task nodes).
4) Structural parameters: parameters characterizing the probability that a local structure in a search block of the search space is sampled. For example, if the i-th search block includes N local structures, the structural parameter α_i is an N-dimensional vector; the larger a value in α_i, the more likely the corresponding local structure is to be sampled.
In the related art, multi-task learning is performed using the multi-gate mixture-of-experts method. Compared with bottom-layer sharing, this method allows each task to dynamically aggregate the outputs of shared experts, which better handles the relations among multiple tasks. The multi-gate mixture-of-experts method splits the shared bottom layer into multiple experts (independent neural network modules, each of which may consist of a single fully connected layer and an activation function), then dynamically aggregates the experts' outputs through gates and outputs the aggregated result to the corresponding task node. The method does not limit the number of experts, but gates correspond one-to-one to tasks, so the number of gates equals the number of tasks. As shown in FIG. 1, the multi-gate mixture-of-experts model includes 2 task nodes, 2 gates, and 3 experts. Let the input be x; then the input of each of the 3 experts is the d-dimensional vector x, and their outputs are

$f_i = e_i(x), \quad i \in \{1, 2, 3\}$

where $e_i$ denotes a function transformation that can be regarded as a fully connected layer or a convolutional layer. For task A, gate A is used to compute the (scalar) weights of the three experts for task A, $w^A = [w_1^A, w_2^A, w_3^A]$. The gate can be a fully connected layer whose input is the vector x and whose output is the scores of the three experts,

$s^A = [s_1^A, s_2^A, s_3^A] = W_A x$

where the weights are obtained by transforming the scores with the normalized exponential function, i.e.,

$w_i^A = \frac{\exp(s_i^A)}{\sum_{j=1}^{3} \exp(s_j^A)}$

and, according to the weights computed by gate A, the input of task A is obtained as

$y^A = \sum_{i=1}^{3} w_i^A f_i$

The processing of task B is similar to that of task A, and the role of gate B is similar to that of gate A.
Although the multi-gate mixture-of-experts method can perform multi-task learning, it has several problems: 1) all experts in the multi-gate mixture-of-experts (MMOE) model are shared by all tasks, which is not necessarily optimal; 2) the combination of experts in the MMOE model is linear (a weighted sum), which limits representational capacity; 3) when the number of expert layers increases, the choice of gate inputs is difficult to determine.
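To make the related-art baseline concrete, the following is a minimal NumPy sketch of an MMOE-style forward pass. The linear experts, tanh activation, layer shapes, and variable names are illustrative assumptions, not the reference implementation of the related art:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                     # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, h, n_experts, n_tasks = 8, 4, 3, 2   # input dim, expert output dim, 3 experts, 2 tasks

W_experts = rng.normal(size=(n_experts, h, d))      # each expert: a single linear layer
W_gates = rng.normal(size=(n_tasks, n_experts, d))  # one gate per task, scoring every expert

x = rng.normal(size=d)                  # shared input
f = np.tanh(W_experts @ x)              # expert outputs f_i = e_i(x), shape (n_experts, h)

for t in range(n_tasks):
    scores = W_gates[t] @ x             # gate scores s_i for task t
    w = softmax(scores)                 # weights via the normalized exponential function
    y_t = w @ f                         # task input: weighted sum of expert outputs
    print(f"task {t}: gate weights {np.round(w, 3)}, tower input shape {y_t.shape}")
```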
To solve the above problems, the embodiments of this application provide a method and apparatus for constructing a multi-task learning model, an electronic device, and a computer-readable storage medium, which can construct a multi-task learning model automatically and accurately, improving the efficiency of multi-task learning model construction.
Exemplary applications of the electronic device for constructing a multi-task learning model provided by the embodiments of this application are described below.
The electronic device for constructing a multi-task learning model provided by the embodiments of this application may be any of various types of terminal devices or servers. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services; the terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, or smartwatch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
Taking a server as an example, it may be a server cluster deployed in the cloud that opens AI as a Service (AIaaS) to developers. An AIaaS platform splits several common types of AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI-themed marketplace: all developers can access one or more AI services provided by the AIaaS platform through application programming interfaces. For example, one such AI cloud service is a multi-task learning model construction service, i.e., a cloud server encapsulates a program for multi-task learning model construction. A developer invokes the multi-task learning model construction service in the cloud through a terminal, so that the server deployed in the cloud invokes the encapsulated program to determine a multi-task learning model from the constructed search space. The multi-task learning model is then used in recommendation applications; for example, for a news recommendation application, the model estimates the click-through rate and completion degree of news items, so that personalized news recommendation is performed according to the click-through rate and completion degree of each item.
Referring to FIG. 2, FIG. 2 is a schematic diagram of an application scenario of the multi-task learning model construction system 10 provided by an embodiment of this application. A terminal 200 is connected to a server 100 through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two.
The terminal 200 (running a client, such as a news client or a video client) may be used to obtain sample data. For example, a developer inputs a recommendation sample data set through the terminal; after the input is completed, the terminal automatically obtains the recommendation sample data set.
In some embodiments, a multi-task learning model construction plug-in may be embedded in the client running on the terminal 200, so as to locally execute the method for constructing a multi-task learning model provided by the embodiments of this application and determine a multi-task learning model from the constructed search space. For example, a recommendation client, such as a video client or news client, is installed on the terminal 200. After the developer inputs a recommendation sample data set in the recommendation client, the terminal 200 invokes the multi-task learning model construction plug-in to construct a search space composed of multiple sub-network layers and multiple search layers and, according to the sample data, searches the space for a multi-task learning model for multi-task prediction. The model is then used in recommendation applications; for example, for a video application, the model estimates the click-through rate and completion degree of videos, videos to recommend are determined accordingly, and personalized video recommendation is performed through the video client; for a news application, the model estimates the exposure rate and click-through rate of news items, news to recommend is determined accordingly, and personalized news recommendation is performed through the news client.
In some embodiments, the terminal 200 may also send the recommendation sample data set input by the developer on the terminal 200 to the cloud server 100 through the network 300 and invoke the multi-task learning model construction interface of the server 100 (which may be provided in the form of a cloud service, i.e., a multi-task learning model construction service encapsulating the model construction program). After receiving the recommendation sample data set, the server 100 determines a multi-task learning model from the constructed search space through the method for constructing a multi-task learning model provided by the embodiments of this application. For example, a recommendation client (e.g., a shopping client) is installed on the terminal 200; the developer inputs a recommendation sample data set in the client; the terminal 200 invokes the model construction interface of the server 100 through the network 300, i.e., invokes the encapsulated construction program, to construct a search space composed of multiple sub-network layers and multiple search layers and, according to the sample data, to search it for a multi-task learning model for multi-task prediction. The model is then used for recommendation; for example, for a shopping application, the server estimates the click-through rate and purchase rate of goods through the model, determines the goods to recommend accordingly, returns them to the shopping client, and performs personalized goods recommendation through the shopping client.
The structure of the electronic device for constructing a multi-task learning model provided by the embodiments of this application is described below. The electronic device may be any of various terminals, such as a mobile phone or a computer, or the server 100 shown in FIG. 2.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of an electronic device 500 for constructing a multi-task learning model provided by an embodiment of this application. Taking the electronic device 500 being a server as an example, the electronic device 500 shown in FIG. 3 includes at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The components of the electronic device 500 are coupled together through a bus system 540. It can be understood that the bus system 540 is used to implement connection and communication among these components. In addition to a data bus, the bus system 540 includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all labeled as bus system 540 in FIG. 3.
The processor 510 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The memory 550 includes volatile memory or non-volatile memory, and may include both. The non-volatile memory may be a read-only memory (ROM), and the volatile memory may be a random access memory (RAM). The memory 550 described in the embodiments of this application is intended to include any suitable type of memory. The memory 550 optionally includes one or more storage devices physically remote from the processor 510.
In some embodiments, the memory 550 can store data to support various operations; examples of the data include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, used to implement various basic services and process hardware-based tasks;
a network communication module 552, used to reach other computing devices via one or more (wired or wireless) network interfaces 520; exemplary network interfaces 520 include Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.
In some embodiments, the apparatus for constructing a multi-task learning model provided by the embodiments of this application may be implemented in software; for example, it may be the multi-task learning model construction plug-in in the terminal described above, or the multi-task learning model construction service in the server described above.
Of course, without being limited thereto, the apparatus for constructing a multi-task learning model provided by the embodiments of this application may be provided as various software embodiments, in various forms including application programs, software, software modules, scripts, or computer code.
In summary, the method for constructing a multi-task learning model provided by the embodiments of this application may be implemented as a computer program product of any form and deployed in various electronic devices as needed.
FIG. 3 shows an apparatus 555 for constructing a multi-task learning model stored in the memory 550, which may be software in the form of a program or plug-in and includes a series of modules: a construction module 5551, a sampling module 5552, and a generating module 5553, which are used to implement the multi-task learning model construction functions provided by the embodiments of this application.
As can be understood from the above, the method for constructing a multi-task learning model provided by the embodiments of this application may be implemented by various types of electronic devices for constructing a multi-task learning model, such as intelligent terminals and servers.
The method for constructing a multi-task learning model provided by the embodiments of this application is described below in conjunction with exemplary applications and implementations of the server provided by the embodiments. Referring to FIG. 4, FIG. 4 is a schematic flowchart of the method for constructing a multi-task learning model provided by an embodiment of this application, described with the steps shown in FIG. 4.
In the following steps, the input node and the task nodes correspond to the entry and exits of the multi-task learning model, respectively. The data received by the input node serves as the basis for the classification or prediction tasks of the multiple (i.e., at least two) task nodes; the number of task nodes depends on the number of classification or prediction tasks to be performed in the specific application scenario.
In step 101, between the input node and the multiple task nodes, a search space composed of multiple sub-network layers and multiple search layers is constructed by arranging the sub-network layers and the search layers in an interleaved manner.
As an example of obtaining sample data, a developer may input a sample data set on the terminal; after the input is completed, the terminal automatically sends the sample data set to the server, which receives it. For recommendation application scenarios, the sample data is recommendation sample data; for example, for a news recommendation application, the sample data is news sample data; for a goods recommendation application, goods sample data; for a movie recommendation application, movie sample data.
After receiving the sample data set, the server invokes the program for constructing a multi-task learning model and constructs, between the input node and the multiple task nodes, a search space composed of multiple sub-network layers and multiple search layers, where the sub-network layers and the search layers are interleaved, each sub-network layer includes multiple sub-network modules, and each search layer includes multiple search blocks. For example, the input node is connected to the 1st sub-network layer, the 1st sub-network layer is connected to the 1st search layer, the 1st search layer is connected to the 2nd sub-network layer, and so on, until the last search layer is connected to the task nodes; in this way, the search space is constructed between the input node and the multiple task nodes. After the search space is determined, a multi-task learning model is obtained from it, so as to perform multi-task prediction through the model.
For recommendation application scenarios, the search space composed of multiple sub-network layers and multiple search layers is constructed between the input node and multiple task nodes used for recommendation prediction by interleaving sub-network layers and search layers. The input of the input node is recommendation data, such as goods data or news data; the output of a task node is a prediction result for the recommendation data, such as click-through rate or completion degree (e.g., the degree of completion of video watching, or the time spent browsing news).
When the subsequent node of a search layer is a sub-network module in a sub-network layer, the output of a search block in the search layer is the input of the sub-network module; when the subsequent node of a search layer is a task node, the output of a search block in the search layer is the input of the task node.
Referring to FIG. 5, FIG. 5 is an optional schematic flowchart of the method for constructing a multi-task learning model provided by an embodiment of this application. FIG. 5 shows that FIG. 4 further includes steps 104 and 105. In step 104, the outputs of multiple sub-network modules in a sub-network layer are sampled to obtain the outputs of multiple sampled sub-network modules. In step 105, the outputs of the multiple sampled sub-network modules are weighted and summed according to the weight of each sub-network module, and the result of the weighted summation is used as the output of a local structure in a search block, so as to construct a transmission path of the search block, where the search block is a module in the search layer adjacent to the sub-network layer.
As an example, before the search space is constructed, the structure of the search blocks in each search layer is constructed. The outputs of the multiple sub-network modules in a sub-network layer are sampled to obtain the outputs of multiple sampled sub-network modules. As shown in FIG. 8, a sub-network layer has 3 sub-network modules; sampling their outputs (v1, v2, v3) yields 7 possible sampling results: (v1), (v2), (v3), (v1 and v2), (v1 and v3), (v2 and v3), and (v1, v2, and v3). When the output of a single sub-network module is sampled, e.g., (v1), (v2), or (v3), that output is used as the output of a local structure in the search block to construct a transmission path in the search block, the search block being a module in the search layer adjacent to the sub-network layer. When the outputs of multiple sub-network modules are sampled, e.g., (v1 and v2), (v1 and v3), (v2 and v3), or (v1, v2, and v3), the outputs of the multiple sub-network modules are weighted and summed according to the weight of each sub-network module, and the result is used as the output of a local structure in the search block to construct a transmission path of the search block. Thus, by constructing multiple transmission paths within a search block, the search space constructed subsequently can contain enough possible network structures to solve a specific multi-task learning problem.
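The seven sampling results above are simply the non-empty subsets of the three module outputs. A small illustrative sketch of enumerating them (the variable names are assumptions):

```python
from itertools import combinations

outputs = ["v1", "v2", "v3"]  # outputs of the 3 sub-network modules in the layer

# Enumerate every non-empty subset: the 2**3 - 1 = 7 input combinations that
# local structures of the adjacent search block can draw on.
local_inputs = [
    subset
    for r in range(1, len(outputs) + 1)
    for subset in combinations(outputs, r)
]
print(len(local_inputs))  # 7
print(local_inputs)       # ('v1',), ('v2',), ('v3',), ('v1', 'v2'), ...
```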
In some embodiments, the search block further includes a gated node. After sampling the outputs of the multiple sub-network modules in the sub-network layer, the method further includes: sampling one signal source from the signal source set of the sub-network layer, the signal source being the output of the input node or the output of a predecessor sub-network module of the sub-network layer; performing prediction processing on the signal source through the gated node to obtain a predicted value for each of the multiple sub-network modules; and normalizing the predicted values to obtain the weight of each sub-network module.
Following the above example, to construct multiple transmission paths in a search block, for a given sub-network layer the server may sample one signal source from the signal source set of that layer and perform prediction on the signal source through the gated node to obtain a predicted value for each of the multiple sub-network modules in the layer, i.e.,

$e = [e_1, e_2, \ldots, e_s] = q^{\top} w_k$

where e denotes the predicted values of the multiple sub-network modules in the sub-network layer, q denotes the signal source, and $w_k$ denotes the learnable parameters of the gate. After obtaining the predicted value of each sub-network module, the server may normalize the predicted values to obtain the weight of each sub-network module, i.e.,

$\text{weight}_i = \frac{\exp(e_i)}{\sum_{j=1}^{s} \exp(e_j)}$

where s denotes the number of sub-network modules. Thus, different signal sources determine different weights, thereby constructing multiple transmission paths in the search block, so that the search space constructed subsequently contains enough possible network structures to solve a specific multi-task learning problem.
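A minimal sketch of the gating computation just described, assuming the gate is a single learnable matrix of shape (d_q, s) applied to the sampled signal source q; the shapes and names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d_q, s = 6, 3                           # signal source dim, number of sub-network modules

q = rng.normal(size=d_q)                # sampled signal source (input node or predecessor output)
W_k = rng.normal(size=(d_q, s))         # learnable gate parameters

e = q @ W_k                             # predicted value e_i for each sub-network module
weights = np.exp(e) / np.exp(e).sum()   # normalization: weight_i = exp(e_i) / sum_j exp(e_j)
assert np.isclose(weights.sum(), 1.0)
print(np.round(weights, 3))
```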
In some embodiments, the search space includes N sub-network layers and N search layers, where N is a natural number greater than 1. Before constructing the search space composed of multiple sub-network layers and multiple search layers, the method includes: sampling, through the i-th search block in the 1st search layer, the outputs of multiple sub-network modules from the 1st sub-network layer, where i is a positive integer, and, when the signal source is the output of the input node, performing weighted summation on the outputs of the multiple sub-network modules according to the weight of each sub-network module and using the result as the output of a local structure in the i-th search block, so as to construct a transmission path in the i-th search block, until construction of the transmission paths of all local structures in the i-th search block is completed; and sampling, through the i-th search block in the j-th search layer, the outputs of multiple sub-network modules from the j-th sub-network layer, where 1 < j ≤ N and j is a natural number, and, when the signal source is the output of the input node or of a predecessor sub-network module of the j-th sub-network layer, performing weighted summation on the outputs of the multiple sub-network modules according to the weight of each sub-network module and using the result as the output of a local structure in the i-th search block in the j-th search layer, so as to construct a transmission path in the i-th search block in the j-th search layer, until construction of the transmission paths of all local structures in the i-th search block in the j-th search layer is completed.
As an example, the outputs of one or more sub-network modules are sampled from the 1st sub-network layer through the i-th search block in the 1st search layer. The signal source of the 1st search layer is the output of the input node; the signal source is predicted through the gated node to obtain the predicted value of each of the multiple sub-network modules, the predicted values are normalized to obtain the weight of each sub-network module, the outputs of the multiple sub-network modules are weighted and summed according to these weights, and the result is used as the output of a local structure in the i-th search block to construct a transmission path in the i-th search block, until construction of the transmission paths of all local structures in the i-th search block is completed.
The outputs of multiple sub-network modules are sampled from the j-th sub-network layer through the i-th search block in the j-th search layer. The signal source of this search layer is the output of the input node or the output of a predecessor sub-network module of the j-th sub-network layer; the signal source is predicted through the gate to obtain the predicted value of each of the multiple sub-network modules, the predicted values are normalized to obtain the weights, the outputs of the multiple sub-network modules are weighted and summed according to the weights, and the result is used as the output of a local structure in the i-th search block in the j-th search layer to construct a transmission path therein, until construction of the transmission paths of all local structures in the i-th search block in the j-th search layer is completed, thereby completing the construction of the local structures in all search blocks.
In some embodiments, constructing the search space composed of multiple sub-network layers and multiple search layers includes: using the transmission path from the input node to the first sub-network layer, the transmission paths from intermediate sub-network layers to adjacent search layers, and the transmission path from the last search layer to the task nodes as the edges of a directed graph; using the sub-network modules in the multiple sub-network layers and the search blocks in the multiple search layers as the nodes of the directed graph; and combining the nodes and edges of the directed graph to construct the search space for multi-task learning.
As an example, the search space may be constructed in the form of a directed graph. The transmission path from the input node to the first sub-network layer is used as an edge of the directed graph; the transmission paths from intermediate sub-network layers (from the first sub-network layer to the last sub-network layer) to adjacent search layers may also be used as edges, e.g., the transmission path from the second sub-network layer to the adjacent second search layer. The sub-network modules in the multiple sub-network layers and the search blocks in the multiple search layers are used as the nodes of the directed graph, and the search space for multi-task learning is constructed from the nodes and edges of the directed graph. Subsequently, the edges of the directed graph can be sampled to sample the search space and obtain candidate network structures.
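A sketch of the search space as a directed graph: sub-network modules and search blocks are nodes, transmission paths are edges. The values of L, H, T and the node-naming scheme are illustrative assumptions:

```python
L, H, T = 2, 3, 2  # sub-network layers, modules per layer, task nodes
nodes, edges = ["input"], []

for l in range(1, L + 1):
    nodes += [f"sub{l}.{h}" for h in range(1, H + 1)]     # sub-network modules
for l in range(1, L):
    nodes += [f"search{l}.{b}" for b in range(1, H + 1)]  # intermediate search blocks
nodes += [f"search{L}.task{t}" for t in range(1, T + 1)]  # last search layer, one block per task
nodes += [f"task{t}" for t in range(1, T + 1)]            # task nodes

edges += [("input", f"sub1.{h}") for h in range(1, H + 1)]
for l in range(1, L):
    # every module of sub-network layer l can feed every adjacent search block,
    # and each search block feeds one module of the next sub-network layer
    edges += [(f"sub{l}.{h}", f"search{l}.{b}")
              for h in range(1, H + 1) for b in range(1, H + 1)]
    edges += [(f"search{l}.{b}", f"sub{l + 1}.{b}") for b in range(1, H + 1)]
edges += [(f"sub{L}.{h}", f"search{L}.task{t}")
          for h in range(1, H + 1) for t in range(1, T + 1)]
edges += [(f"search{L}.task{t}", f"task{t}") for t in range(1, T + 1)]

print(len(nodes), len(edges))
```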
In step 102, paths from the input node, via the search space, to each task node are sampled to obtain candidate paths, which serve as candidate network structures.
After the server constructs the search space, it may sample paths from the input node, via the search space, to each task node to determine candidate network structures. Since the search space contains enough possible network structures, the candidate network structures obtained by sampling such paths cover a wide variety of structures, making it possible to solve a specific multi-task learning problem.
Referring to FIG. 6, FIG. 6 is an optional schematic flowchart of the method for constructing a multi-task learning model provided by an embodiment of this application. FIG. 6 shows that step 102 of FIG. 4 can be implemented through steps 1021 and 1022 in FIG. 6. In step 1021, each search block of the search layers in the search space is sampled according to the structural parameters of the search space to obtain the local structure corresponding to each search block. In step 1022, the path from the input node, via the local structure of each search block, to each task node is used as a candidate path.
As an example, since each search block in the search space contains multiple local structures, each search block may first be sampled according to the structural parameters of the search space to obtain its local structure (transmission path), and the path from the input node, via the local structure of each search block, to each task node is used as a candidate path, thereby forming a candidate network structure.
In some embodiments, sampling each search block of the search layers in the search space according to the structural parameters of the search space to obtain the local structure corresponding to each search block includes: performing mapping processing on the structural parameters of the search space to obtain the sampling probabilities corresponding to the local structures in each search block; constructing a multinomial distribution for each search block according to the sampling probabilities of its local structures; and sampling the multinomial distribution of each search block to obtain the local structure corresponding to each search block.
Following the above example, to sample the local structure of each search block, the structural parameters of the search space may first be mapped to obtain the sampling probability of each local structure in each search block; a multinomial distribution is then constructed for each search block according to these probabilities; finally, the local structures in each search block are sampled according to the block's multinomial distribution to obtain the local structure corresponding to each search block. For example, when the search space includes B search blocks, sampling one local structure from the multiple local structures of each block yields B local structures; combining these B local structures with the input node, the sub-network modules, and the task nodes yields a complete candidate network structure.
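A sketch of this sampling step, assuming one structural parameter vector per search block and NumPy's categorical sampling; the block count and vector sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

B = 5                                                  # number of search blocks
alphas = [rng.normal(size=rng.integers(3, 8)) for _ in range(B)]  # alpha_i per block

# Map structural parameters to sampling probabilities, then draw one local
# structure per block from the resulting multinomial distribution.
candidate = []
for alpha in alphas:
    p = softmax(alpha)
    u = rng.choice(len(p), p=p)                        # index of the sampled local structure
    candidate.append(int(u))
print(candidate)                                       # one complete candidate network structure
```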
In step 103, parameters of the candidate network structure are trained according to the sample data to generate a multi-task learning model for multi-task prediction.
After the server samples a candidate network structure from the search space, it trains the parameters of the candidate network structure; iterating the sampling and training operations generates a multi-task learning model for multi-task prediction. For recommendation application scenarios, the parameters of the candidate network structure may be trained according to recommendation sample data to generate a multi-task learning model for multiple recommendation predictions. For example, if the outputs of the task nodes are the click-through rate and completion degree of news, the parameters of the candidate network structure are trained according to news sample data to generate a multi-task learning model that predicts the click-through rate and completion degree of news, which are subsequently used for news recommendation.
Referring to FIG. 7, FIG. 7 is an optional schematic flowchart of the method for constructing a multi-task learning model provided by an embodiment of this application. FIG. 7 shows that step 103 of FIG. 4 can be implemented through steps 1031-1033 in FIG. 7. In step 1031, the network parameters of the candidate network structure are trained to obtain optimized network parameters of the candidate network structure. In step 1032, the structural parameters of the search space are trained according to the optimized network parameters of the candidate network structure to obtain optimized structural parameters of the search space. In step 1033, according to the optimized structural parameters of the search space, a candidate network structure for multi-task prediction is determined from the optimized candidate network structures to serve as the multi-task learning model.
As an example, after the server samples a candidate network structure, it may first train the network parameters of the candidate network structure and then the structural parameters, or first the structural parameters and then the network parameters. For example, the network parameters of the candidate network structure may be trained to obtain optimized network parameters; then, based on the optimized candidate network structure (since the candidate network structure is composed of its network parameters, optimizing the network parameters optimizes the structure), the structural parameters of the search space are trained to obtain optimized structural parameters; finally, according to the optimized structural parameters, a candidate network structure for multi-task prediction is determined from the optimized candidate network structures to serve as the multi-task learning model. Here, the network parameters are the parameters used in computation by the modules of the network structure (e.g., sub-network modules, search blocks, task nodes), and the structural parameters characterize the probability that a local structure in a search block is sampled; for example, if the i-th search block includes N local structures, the structural parameter α_i is an N-dimensional vector, and the larger a value in α_i, the more likely the corresponding local structure is to be sampled.
In some embodiments, training the network parameters of the candidate network structure to obtain optimized network parameters includes: performing multi-task prediction on the sample data through the candidate network structure to obtain multi-task prediction results of the sample data; constructing a loss function of the candidate network structure according to the multi-task prediction results and the multi-task labels of the sample data; and updating the network parameters of the candidate network structure until the loss function converges, taking the updated parameters when the loss function converges as the optimized network parameters of the candidate network structure.
After the value of the loss function of the candidate network structure is determined according to the multi-task prediction results and the multi-task labels of the sample data, it can be determined whether the value exceeds a preset threshold; when it does, an error signal of the candidate network structure is determined based on the loss function, the error information is back-propagated through the candidate network structure, and the model parameters of each layer are updated during propagation.
Back-propagation is explained here. Training sample data is input to the input layer of a neural network model, passes through the hidden layers, and finally reaches the output layer, which outputs the result; this is the forward propagation process of the neural network model. Since there is an error between the output of the neural network model and the actual result, the error between the output and the actual value is computed and back-propagated from the output layer toward the hidden layers until it reaches the input layer; during back-propagation, the values of the model parameters are adjusted according to the error. The above process is iterated until convergence. The candidate network structure is such a neural network model.
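As a sketch of this network-parameter training, here is a minimal PyTorch-style step with a two-task loss; the loss choice (binary cross-entropy), optimizer, and toy candidate structure are assumptions rather than anything prescribed by this application:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(8, 16)                       # stand-in for the sampled candidate structure
tower_a, tower_b = nn.Linear(16, 1), nn.Linear(16, 1)
params = [*shared.parameters(), *tower_a.parameters(), *tower_b.parameters()]
opt = torch.optim.SGD(params, lr=0.1)

x = torch.randn(32, 8)                          # sample data
y_a = torch.randint(0, 2, (32, 1)).float()      # multi-task labels (e.g., click, completion)
y_b = torch.randint(0, 2, (32, 1)).float()

bce = nn.BCEWithLogitsLoss()
for step in range(100):
    h = torch.relu(shared(x))
    loss = bce(tower_a(h), y_a) + bce(tower_b(h), y_b)  # sum of per-task losses
    opt.zero_grad()
    loss.backward()                             # error back-propagated through the structure
    opt.step()
```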
In some embodiments, training the structural parameters of the search space according to the optimized network parameters of the candidate network structure to obtain optimized structural parameters of the search space includes: performing network structure evaluation through the sample data and the optimized network parameters of the candidate network structure to obtain an evaluation result of the optimized candidate network structure; constructing an objective function of the structural parameters of the search space according to the evaluation result; and updating the structural parameters of the search space until the objective function converges, taking the updated structural parameters when the objective function converges as the optimized structural parameters of the search space.
As an example, after the server obtains the optimized candidate network structure, it predicts the sample data through the optimized candidate network structure to obtain multi-task prediction results, evaluates the optimized candidate network structure according to the prediction results to obtain an evaluation result such as accuracy, the area under the ROC curve (AUC), or loss, and constructs the objective function of the structural parameters of the search space according to the evaluation result, i.e.,

$\max_{\alpha} \; \mathbb{E}_{u \sim p(\alpha)}\left[R_{val}(u)\right]$

where p(α) denotes the multinomial distribution determined by the structural parameters α and $R_{val}$ denotes the evaluation result of the optimized candidate network structure. The structural parameters of the search space are updated until the objective function converges, and the updated structural parameters at convergence are taken as the optimized structural parameters of the search space.
In some embodiments, determining, according to the optimized structural parameters of the search space, a candidate network structure for multi-task prediction from the optimized candidate network structures to serve as the multi-task learning model includes: performing mapping processing on the optimized structural parameters of the search space to obtain the sampling probability of each local structure in each search block; using the local structure with the maximum sampling probability in each search block as a local structure of the candidate network structure for multi-task prediction; and combining the local structures of each candidate network structure to obtain the multi-task learning model.
As an example, after the server obtains the optimized structural parameters of the search space, it can search the space for the optimal network structure according to them. The optimized structural parameters are mapped, e.g., by a logistic regression (softmax) function, to obtain the sampling probability of each local structure in each search block; the local structure with the maximum sampling probability in each search block is taken as a local structure of the candidate network structure for multi-task prediction; finally, the local structures of each candidate network structure are combined to obtain the multi-task learning model.
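Deriving the final structure from the optimized structural parameters then reduces to an argmax per search block; a short sketch with illustrative parameter values:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

optimized_alphas = [np.array([0.2, 1.7, -0.4]),       # illustrative optimized alpha_i
                    np.array([2.1, 0.3, 0.3, -1.0])]

# For each search block, keep the local structure with the maximum sampling
# probability; their combination is the final multi-task learning model.
final_structure = [int(np.argmax(softmax(a))) for a in optimized_alphas]
print(final_structure)  # e.g. [1, 0]
```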
An exemplary application of the embodiments of this application in a practical application scenario is described below.
The embodiments of this application can be applied in various recommendation application scenarios. As shown in FIG. 2, the terminal 200 is connected through the network 300 to the server 100 deployed in the cloud. A multi-task learning model construction application is installed on the terminal 200; the developer inputs a recommendation sample data set in the application, and the terminal 200 sends the recommendation sample data set to the server 100 through the network 300. After receiving the recommendation sample data set, the server 100 determines the optimal network structure from the constructed search space to serve as the multi-task learning model, which is then used in recommendation applications. For example, for a news recommendation application, the model estimates the click-through rate and completion degree of news, so that personalized news recommendation is performed accordingly; for a goods recommendation application, the model estimates the click-through rate (CTR) and conversion rate (CVR) of goods, so that personalized goods recommendation is performed accordingly; for a movie recommendation application, the model estimates the purchase rate and user ratings of movies, so that personalized movie recommendation is performed accordingly.
Although the multi-gate mixture-of-experts method in the related art can perform multi-task learning, it has several problems: 1) all experts in the multi-gate mixture-of-experts (MMOE) model are shared by all tasks, which is not necessarily optimal; 2) the combination of experts in the MMOE model is linear (a weighted sum), which limits representational capacity; 3) when the number of expert layers increases, the choice of gate inputs is difficult to determine.
To solve the above problems, the embodiments of this application start from the perspective of neural architecture search and use a search algorithm to find the optimal network structure in a search space, greatly reducing the cost of manually tuning network structures. First, a search space is designed that enumerates the correspondences between sub-network modules (experts) and between sub-network modules and tasks. The search space can be multi-layer, and the input sources of the gates are also included in the search space; that is, the search space includes the above multi-gate mixture-of-experts model. The embodiments of this application use multinomial distribution sampling and a policy gradient algorithm to efficiently find the optimal network structure in the search space in a differentiable manner, to serve as the multi-task learning model, thereby achieving better results than the multi-gate mixture-of-experts method.
The method for constructing a multi-task learning model provided by the embodiments of this application is described below. The method includes two parts: 1) construction of the search space; and 2) the search algorithm.
1) Construction of the search space
The goal of constructing the search space is to make it contain enough possible network structures to solve a specific multi-task learning problem. First, the parameter-shared part is divided into several sub-networks. Suppose that, for a machine learning problem with T tasks, there are L sub-network (expert) layers, each with H sub-network modules.
Overall, the search space consists of several search blocks. Each search block represents a sub-search-space containing several local network structures (e.g., connections between sub-network modules). The specific structure of a search block is introduced below:
As shown in FIG. 8, a search block represents a sub-search-space containing multiple different local network structures (local structures). For a given local structure, gating is used to achieve dynamic aggregation of the input features. A local structure is affected by two factors: 1) different inputs (combinations); and 2) different gating signal sources (signal sources).
The sub-search-space represented by one search block can be formalized as

$\mathcal{S} = \mathcal{V} \times \mathcal{Q}$

where $\mathcal{V}$ and $\mathcal{Q}$ both denote sets and $\times$ denotes the Cartesian product; $\mathcal{V}$ denotes all combinations of the input features (where the inputs come from the outputs of the sub-network modules of the previous layer); the set $\mathcal{Q}$ denotes all possible gating signal sources, e.g., the inputs of all preceding sub-network layers and the original shared input can all serve as signal sources; and $\mathcal{S}$ denotes the sub-search-space. That is, one search block can contain a total of $|\mathcal{V}| \times |\mathcal{Q}|$ different local structures. For any local structure in a search block (the k-th local structure, $k \in [1, |\mathcal{S}|]$), the inputs of the k-th local structure are $V = [v_1, v_2, \ldots, v_s]$ (s input features, each of dimension $d_v$) and $q$ (the gating signal source, of dimension $d_q$), and the output of the k-th local structure is $y_k$, the weighted sum of the input features, computed as in formula (1):

$y_k = \sum_{i=1}^{s} \frac{\exp(m_i)}{\sum_{j=1}^{s} \exp(m_j)}\, v_i, \qquad m = [m_1, m_2, \ldots, m_s] = g_k(q) = q^{\top} w_k \quad (1)$

where $g_k$ denotes the gate of this local network, $m_i$ denotes the gating score (predicted value) of the i-th input feature, and $w_k$ denotes the learnable parameters of the gate.
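A direct NumPy transcription of formula (1) for one local structure, assuming s input features of dimension d_v and a gate parameter matrix of shape (d_q, s) consistent with the definitions above; the shapes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
s, d_v, d_q = 4, 8, 6

V = rng.normal(size=(s, d_v))           # input features v_1..v_s from the previous layer
q = rng.normal(size=d_q)                # gating signal source
w_k = rng.normal(size=(d_q, s))         # learnable gate parameters of local structure k

m = q @ w_k                             # gating scores m_i = g_k(q)
weights = np.exp(m) / np.exp(m).sum()   # softmax-normalized gating scores
y_k = weights @ V                       # formula (1): weighted sum of the input features
print(y_k.shape)                        # (d_v,)
```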
As shown in FIG. 9, the search blocks of the search space are located between adjacent sub-network layers, or between the last sub-network layer and the task layer (which contains multiple task nodes). The total number of search blocks is therefore B = (L-1)*H + T, where T denotes the number of task nodes. The search space $\mathcal{A}$ of this embodiment of this application can be expressed as the Cartesian product of the spaces represented by the B search blocks,

$\mathcal{A} = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_B$

i.e., its physical composition can be regarded as an over-parameterized network, which can include complex and diverse network structures.
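The block count and the multiplicative size of the search space follow directly from these definitions; a small sketch with illustrative values of L, H, T and an assumed number of signal sources:

```python
L, H, T = 3, 4, 2                   # sub-network layers, modules per layer, tasks
B = (L - 1) * H + T                 # total number of search blocks: (L-1)*H + T
print(B)                            # 10

# Size of one sub-search-space |S_i|: every non-empty input combination times
# every gating signal source (n_sources = 2 is an illustrative assumption).
n_sources = 2
per_block = (2 ** H - 1) * n_sources
print(per_block)                    # 30

# The search space is the Cartesian product of the B sub-search-spaces.
print(per_block ** B)               # number of complete structures in the over-parameterized network
```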
2) The search algorithm
The goal of this embodiment of this application is to find the best-performing network structure from the above over-parameterized network. Each search block contains $|\mathcal{S}_i|$ local structures, where $i \in [1, 2, \ldots, B]$; by selecting one local structure in each search block, the combination of all selected local structures determines a complete network structure. A complete network structure is defined as

$(u, w_u), \qquad u = (u_1, u_2, \ldots, u_B)$

where u denotes the B local structures determined by B sampling actions, and $w_u$ denotes the network parameters of this network structure (the network parameters are the parameters used in computation by the modules of the network structure, e.g., $w_k$ in formula (1)).
For the optimization of the structural parameters, each sampling action $u_i$ ($i \in [1, 2, \ldots, B]$) is sampled from a multinomial distribution determined by the structural parameter $\alpha_i$ ($i \in [1, 2, \ldots, B]$), where $\alpha_i$ characterizes the probability that a local structure in the i-th search block is sampled; for example, if the i-th search block includes N local structures, $\alpha_i$ is an N-dimensional vector, and the larger a value in $\alpha_i$, the more likely the corresponding local structure is to be sampled. The computation is given by formulas (2) and (3):

$u_i \sim \mathrm{multinomial}(p_i) \quad (2)$

$p_i = \mathrm{softmax}(\alpha_i) \quad (3)$
where multinomial() denotes a multinomial distribution, softmax() denotes the logistic regression (normalized exponential) function, and $p_i$ denotes the probabilities of the local structures in the i-th search block being sampled. Therefore, a complete network structure can be obtained by sampling the B multinomial distributions. To handle non-differentiable evaluation metrics, this embodiment of this application uses the reinforcement learning policy gradient (REINFORCE) algorithm to optimize the structural parameters. During optimization, network structures that perform well on the specified evaluation metric gain higher sampling probabilities; the optimization objective of the structural parameters is given by formula (4):

$\max_{\alpha} J(\alpha) = \mathbb{E}_{u \sim p(\alpha)}\left[R_{val}(u)\right] \quad (4)$

where p(α) denotes the multinomial distribution determined by the structural parameters α, and $R_{val}$ denotes the score (evaluation result) of the sampled structure on a specific metric (e.g., accuracy, the area under the ROC curve (AUC), or loss). According to the REINFORCE algorithm, the gradient of the structural parameters is obtained by formula (5):

$\nabla_{\alpha} J(\alpha) = \mathbb{E}_{u \sim p(\alpha)}\left[\left(R_{val}(u) - b\right) \nabla_{\alpha} \log p(u; \alpha)\right] \quad (5)$

where b denotes a baseline used to reduce the variance of the reward; a moving average may be used as the baseline, and b may also be 0.
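A sketch of the REINFORCE update of one search block's structural parameters with a moving-average baseline, following formulas (2) to (5); the stand-in reward table, learning rate, and baseline decay are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

alpha = np.zeros(4)                 # structural parameters of one search block
baseline, lr, decay = 0.0, 0.5, 0.9
reward = lambda u: [0.60, 0.72, 0.55, 0.65][u]   # stand-in for R_val of the sampled structure

for step in range(200):
    p = softmax(alpha)              # formula (3)
    u = rng.choice(len(p), p=p)     # formula (2)
    R = reward(u)
    baseline = decay * baseline + (1 - decay) * R   # moving-average baseline b
    grad_log_p = -p                 # gradient of log p_u w.r.t. alpha: one_hot(u) - p
    grad_log_p[u] += 1.0
    alpha += lr * (R - baseline) * grad_log_p       # gradient ascent, formula (5)

print(int(np.argmax(softmax(alpha))))  # converges toward the highest-reward structure (index 1)
```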
As shown in FIG. 10, in each iteration, a candidate network structure is sampled from the over-parameterized network, and then the structural parameters and the corresponding network parameters are trained alternately; as the iterations proceed, the probability of sampling well-performing network structures increases. After the search is completed, the local structure with the maximum probability is selected in each search block, and all maximum-probability local structures are combined to obtain the final network structure. The pseudocode of the search process and of obtaining the optimal network structure is shown in Algorithm 1 below:
Algorithm 1: search process and obtaining the optimal network structure
Input: training sample data, validation data, and an over-parameterized network including B search blocks
Output: optimized structural parameters α and network parameters w
Initialize the structural parameters α and the network parameters w
while the structural parameters α and the network parameters w have not converged do
    for each search block $\mathcal{S}_i$ ($i \in [1, 2, \ldots, B]$) in the over-parameterized network do
        compute the probability of each local structure being sampled via formula (3)
        sample a local structure $u_i$ via formula (2)
    end for
    obtain a network structure $(u, w_u)$, where $u = (u_1, u_2, \ldots, u_B)$
    update the network parameters w by gradient descent on $\nabla_w \mathcal{L}_{train}(u, w)$
    update the structural parameters α by gradient ascent via formula (5)
end while
return the final network structure based on the optimized structural parameters α and network parameters w
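Putting the pieces together, a condensed sketch of the alternating loop of Algorithm 1; the training step, evaluation function, and convergence criterion are stubbed out, and all names are illustrative:

```python
import numpy as np

def search(blocks_alpha, train_step, evaluate, n_iters=1000):
    """Alternate network-parameter descent and structural-parameter ascent.

    blocks_alpha: list of per-block structural parameter vectors (mutated in place)
    train_step:   callable(u) -> None, one gradient-descent step on w for structure u
    evaluate:     callable(u) -> float, validation score R_val(u)
    """
    rng = np.random.default_rng(0)
    baselines = [0.0] * len(blocks_alpha)
    for _ in range(n_iters):
        # Sample one local structure per search block (formulas (2) and (3)).
        ps = [np.exp(a - a.max()) / np.exp(a - a.max()).sum() for a in blocks_alpha]
        u = [int(rng.choice(len(p), p=p)) for p in ps]
        train_step(u)                                # update network parameters w
        R = evaluate(u)                              # score of the sampled structure
        for i, (a, p) in enumerate(zip(blocks_alpha, ps)):
            baselines[i] = 0.9 * baselines[i] + 0.1 * R
            g = -p
            g[u[i]] += 1.0
            a += 0.1 * (R - baselines[i]) * g        # gradient ascent on alpha (formula (5))
    return [int(np.argmax(a)) for a in blocks_alpha]  # max-probability local structures
```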
Therefore, by inputting training sample data, validation data, and an over-parameterized network including B search blocks, the optimized structural parameters α and network parameters w can be obtained, and the final network structure is obtained based on the optimized α and w to serve as the multi-task learning model.
In summary, the embodiments of this application can efficiently optimize the network structure for a specified multi-task data set and automatically balance the independence and sharing relations of different task branches, thereby searching out a better network structure to serve as the multi-task learning model. Multi-task learning is very important in recommendation systems and can be used for network structure optimization in multi-task learning under business recommendation scenarios (estimation of multiple distribution metrics, e.g., click-through rate and completion degree), making full use of the domain knowledge contained in different tasks (metrics) to improve the generalization ability of the multi-task learning model, so as to obtain specific metrics of the recommendation system quickly and accurately. Compared with designing network structures by manual trial and error, the embodiments of this application can learn the network structure best suited to the training data of a specific business more efficiently, accelerating iterative product upgrades.
The method for constructing a multi-task learning model provided by the embodiments of this application has been described above. The following continues to describe how the modules of the apparatus 555 for constructing a multi-task learning model provided by the embodiments of this application cooperate to implement the solution of constructing a multi-task learning model.
The construction module 5551 is configured to construct, between an input node and multiple task nodes, a search space composed of multiple sub-network layers and multiple search layers by arranging the sub-network layers and the search layers in an interleaved manner; the sampling module 5552 is configured to sample paths from the input node, via the search space, to each of the task nodes to obtain candidate paths, which serve as candidate network structures; the generating module 5553 is configured to train parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
In some embodiments, the construction module 5551 is further configured to sample the outputs of the multiple sub-network modules in a sub-network layer to obtain the outputs of multiple sampled sub-network modules; perform weighted summation on the outputs of the multiple sampled sub-network modules according to the weight of each sub-network module, and use the result of the weighted summation as the output of a local structure in a search block, so as to construct the transmission path of the search block, where the search block is a module in the search layer adjacent to the sub-network layer.
In some embodiments, the search block further includes a gated node; the construction module 5551 is further configured to sample a signal source from the signal source set of the sub-network layer, the signal source being the output of the input node or the output of a predecessor sub-network module of the sub-network layer; perform prediction processing on the signal source through the gated node to obtain the predicted value of each of the multiple sub-network modules; and normalize the predicted values to obtain the weight of each sub-network module.
In some embodiments, the search space includes N sub-network layers and N search layers, where N is a natural number greater than 1; the construction module 5551 is further configured to sample, through the i-th search block in the 1st search layer, the outputs of multiple sub-network modules from the 1st sub-network layer, where i is a positive integer, and, when the signal source is the output of the input node, perform weighted summation on the outputs of the multiple sub-network modules according to the weight of each sub-network module and use the result as the output of a local structure in the i-th search block, so as to construct the transmission path in the i-th search block, until construction of the transmission paths of all local structures in the i-th search block in the 1st search layer is completed; and to sample, through the i-th search block in the j-th search layer, the outputs of multiple sub-network modules from the j-th sub-network layer, where 1 < j ≤ N and j is a natural number, and, when the signal source is the output of the input node or of a predecessor sub-network module of the j-th sub-network layer, perform weighted summation on the outputs of the multiple sub-network modules according to the weight of each sub-network module and use the result as the output of a local structure in the i-th search block in the j-th search layer, so as to construct the transmission path in the i-th search block in the j-th search layer, until construction of the transmission paths of all local structures in the i-th search block in the j-th search layer is completed.
In some embodiments, when the subsequent node of a search layer is a sub-network module in a sub-network layer, the output of a search block in the search layer is the input of the sub-network module; when the subsequent node of the search layer is a task node, the output of the search block in the search layer is the input of the task node.
In some embodiments, the construction module 5551 is further configured to use the transmission path from the input node to the first sub-network layer, the transmission paths from the intermediate sub-network layers to the adjacent search layers, and the transmission path from the last search layer to the task nodes as the edges of a directed graph; use the sub-network modules in the multiple sub-network layers and the search blocks in the multiple search layers as the nodes of the directed graph; and combine the nodes and edges of the directed graph to obtain the search space for multi-task learning.
In some embodiments, the sampling module 5552 is further configured to sample each search block of the search layers in the search space according to the structural parameters of the search space to obtain the local structure corresponding to each search block, and to use the path from the input node, via the local structure of each search block, to each of the task nodes as a candidate path.
In some embodiments, the sampling module 5552 is further configured to perform mapping processing on the structural parameters of the search space to obtain the sampling probabilities corresponding to the local structures in each search block in the search space; construct the multinomial distribution of each search block according to the sampling probabilities of its local structures; and sample the multinomial distribution of each search block to obtain the local structure corresponding to each search block.
In some embodiments, the generating module 5553 is further configured to train the structural parameters of the search space according to the optimized network parameters of the candidate network structure to obtain the optimized structural parameters of the search space, and to determine, according to the optimized structural parameters of the search space, a candidate network structure for multi-task prediction from the optimized candidate network structures to serve as the multi-task learning model.
In some embodiments, the generating module 5553 is further configured to perform multi-task prediction processing on the sample data through the candidate network structure to obtain multi-task prediction results of the sample data; construct the loss function of the candidate network structure according to the multi-task prediction results and the multi-task labels of the sample data; and update the network parameters of the candidate network structure until the loss function converges, using the updated parameters of the candidate network structure at convergence as the optimized network parameters of the candidate network structure.
In some embodiments, the generating module 5553 is further configured to perform network structure evaluation through the sample data and the optimized network parameters of the candidate network structure to obtain an evaluation result of the optimized candidate network structure; construct the objective function of the structural parameters of the search space according to the evaluation result; and update the structural parameters of the search space until the objective function converges, using the updated structural parameters at convergence as the optimized structural parameters of the search space.
In some embodiments, the generating module 5553 is further configured to perform mapping processing on the optimized structural parameters of the search space to obtain the sampling probabilities corresponding to the local structures in each search block in the search space; use the local structure with the maximum sampling probability in each search block as a local structure of the candidate network structure for multi-task prediction; and combine the local structures of each candidate network structure to obtain the multi-task learning model.
The above is merely embodiments of this application and is not intended to limit the protection scope of this application. Any modifications, equivalent replacements, and improvements made within the spirit and scope of this application are included in the protection scope of this application.

Claims (15)

  1. A method for constructing a multi-task learning model, the method comprising:
    constructing, between an input node and a plurality of task nodes, a search space composed of a plurality of sub-network layers and a plurality of search layers by arranging the sub-network layers and the search layers in an interleaved manner;
    sampling paths that start from the input node and reach each of the task nodes via the search space to obtain candidate paths, which serve as candidate network structures; and
    training parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
  2. The method according to claim 1, wherein before the constructing a search space composed of multiple sub-network layers and multiple search layers, the method further comprises:
    sampling outputs of a plurality of sub-network modules in a sub-network layer to obtain outputs of a plurality of sampled sub-network modules; and
    performing weighted summation on the outputs of the plurality of sampled sub-network modules according to a weight of each of the plurality of sub-network modules, and using a result of the weighted summation as an output of a local structure of a search block, so as to construct a transmission path of the search block;
    wherein the search block is a module in a search layer adjacent to the sub-network layer.
  3. The method according to claim 2, wherein
    the search block further comprises a gated node; and
    after the sampling outputs of a plurality of sub-network modules in the sub-network layer, the method further comprises:
    sampling one signal source from a signal source set of the sub-network layer, the signal source being an output of the input node or an output of a predecessor sub-network module of the sub-network layer;
    performing prediction processing on the signal source through the gated node to obtain a predicted value of each of the plurality of sub-network modules; and
    normalizing the predicted values of the sub-network modules to obtain the weight of each sub-network module.
  4. The method according to claim 3, wherein
    the search space comprises N sub-network layers and N search layers, N being a natural number greater than 1; and
    before the constructing a search space composed of multiple sub-network layers and multiple search layers, the method further comprises:
    sampling, through an i-th search block in the 1st search layer, outputs of a plurality of sub-network modules from the 1st sub-network layer, i being a positive integer, and
    when the signal source is the output of the input node, performing weighted summation on the outputs of the plurality of sub-network modules according to the weight of each sub-network module, and using the result of the weighted summation as the output of a local structure in the i-th search block, so as to construct a transmission path in the i-th search block, until construction of the transmission paths of all local structures in the i-th search block in the 1st search layer is completed; and
    sampling, through an i-th search block in a j-th search layer, outputs of a plurality of sub-network modules from a j-th sub-network layer, wherein 1 < j ≤ N and j is a natural number, and
    when the signal source is the output of the input node or of a predecessor sub-network module of the j-th sub-network layer, performing weighted summation on the outputs of the plurality of sub-network modules according to the weight of each sub-network module, and using the result of the weighted summation as the output of a local structure in the i-th search block in the j-th search layer, so as to construct a transmission path in the i-th search block in the j-th search layer, until construction of the transmission paths of all local structures in the i-th search block in the j-th search layer is completed.
  5. The method according to claim 1, wherein
    when a subsequent node of a search layer is a sub-network module in a sub-network layer, an output of a search block in the search layer is an input of the sub-network module; and
    when the subsequent node of the search layer is a task node, the output of the search block in the search layer is an input of the task node.
  6. The method according to claim 1, wherein the constructing a search space composed of a plurality of the sub-network layers and a plurality of the search layers comprises:
    using a transmission path from the input node to the first sub-network layer, transmission paths from intermediate sub-network layers to adjacent search layers, and a transmission path from the last search layer to the task nodes as edges of a directed graph;
    using the sub-network modules in the plurality of sub-network layers and the search blocks in the plurality of search layers as nodes of the directed graph; and
    combining the nodes and edges of the directed graph to obtain the search space for multi-task learning.
  7. The method according to claim 1, wherein the sampling paths that start from the input node and reach each of the task nodes via the search space to obtain candidate paths comprises:
    sampling each search block of the search layers in the search space according to structural parameters of the search space to obtain a local structure corresponding to each search block; and
    using a path from the input node, via the local structure of each search block, to each of the task nodes as a candidate path.
  8. The method according to claim 7, wherein the sampling each search block of the search layers in the search space according to structural parameters of the search space to obtain a local structure corresponding to each search block comprises:
    performing mapping processing on the structural parameters of the search space to obtain sampling probabilities corresponding to the local structures in each search block in the search space;
    constructing a multinomial distribution of each search block according to the sampling probabilities of the local structures in the search block; and
    sampling the multinomial distribution of each search block to obtain the local structure corresponding to each search block.
  9. The method according to claim 1, wherein the training parameters of the candidate network structures to generate a multi-task learning model for multi-task prediction comprises:
    training network parameters of the candidate network structure to obtain optimized network parameters of the candidate network structure;
    training structural parameters of the search space according to the optimized network parameters of the candidate network structure to obtain optimized structural parameters of the search space; and
    determining, according to the optimized structural parameters of the search space, a candidate network structure for multi-task prediction from the optimized candidate network structures to serve as the multi-task learning model.
  10. The method according to claim 9, wherein the training network parameters of the candidate network structure to obtain optimized network parameters of the candidate network structure comprises:
    performing multi-task prediction processing on the sample data through the candidate network structure to obtain multi-task prediction results of the sample data;
    constructing a loss function of the candidate network structure according to the multi-task prediction results and multi-task labels of the sample data; and
    updating the network parameters of the candidate network structure until the loss function converges, and using the updated parameters of the candidate network structure when the loss function converges as the optimized network parameters of the candidate network structure.
  11. The method according to claim 9, wherein the training structural parameters of the search space according to the optimized network parameters of the candidate network structure to obtain optimized structural parameters of the search space comprises:
    performing network structure evaluation processing through the sample data and the optimized network parameters of the candidate network structure to obtain an evaluation result of the optimized candidate network structure;
    constructing an objective function of the structural parameters of the search space according to the evaluation result; and
    updating the structural parameters of the search space until the objective function converges, and using the updated structural parameters of the search space when the objective function converges as the optimized structural parameters of the search space.
  12. The method according to claim 9, wherein the determining, according to the optimized structural parameters of the search space, a candidate network structure for multi-task prediction from the optimized candidate network structures to serve as the multi-task learning model comprises:
    performing mapping processing on the optimized structural parameters of the search space to obtain sampling probabilities corresponding to the local structures in each search block in the search space;
    using the local structure corresponding to the maximum sampling probability among the local structures in each search block as a local structure of the candidate network structure for multi-task prediction; and
    combining the local structures of each candidate network structure to obtain the multi-task learning model.
  13. An apparatus for constructing a multi-task learning model, the apparatus comprising:
    a construction module configured to construct, between an input node and a plurality of task nodes, a search space composed of a plurality of sub-network layers and a plurality of search layers by arranging the sub-network layers and the search layers in an interleaved manner;
    a sampling module configured to sample paths that start from the input node and reach each of the task nodes via the search space to obtain candidate paths, which serve as candidate network structures; and
    a generating module configured to train parameters of the candidate network structures according to sample data to generate a multi-task learning model for multi-task prediction.
  14. An electronic device, comprising:
    a memory configured to store executable instructions; and
    a processor configured to implement, when executing the executable instructions stored in the memory, the method for constructing a multi-task learning model according to any one of claims 1 to 12.
  15. A computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the method for constructing a multi-task learning model according to any one of claims 1 to 12.
PCT/CN2021/095977 2020-06-17 2021-05-26 Method and apparatus for constructing multi-task learning model, electronic device, and storage medium WO2021254114A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/883,439 US20220383200A1 (en) 2020-06-17 2022-08-08 Method and apparatus for constructing multi-task learning model, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010555648.0A CN111723910A (zh) 2020-06-17 2020-06-17 Method and apparatus for constructing multi-task learning model, electronic device, and storage medium
CN202010555648.0 2020-06-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/883,439 Continuation US20220383200A1 (en) 2020-06-17 2022-08-08 Method and apparatus for constructing multi-task learning model, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021254114A1 2021-12-23

Family

ID=72567240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095977 WO2021254114A1 (zh) 2020-06-17 2021-05-26 Method and apparatus for constructing multi-task learning model, electronic device, and storage medium

Country Status (3)

Country Link
US (1) US20220383200A1 (zh)
CN (1) CN111723910A (zh)
WO (1) WO2021254114A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723910A (zh) 2020-06-17 2020-09-29 Tencent Technology (Beijing) Co., Ltd. Method and apparatus for constructing multi-task learning model, electronic device, and storage medium
CN112232445B (zh) * 2020-12-11 2021-05-11 Beijing Century TAL Education Technology Co., Ltd. Training method and apparatus for multi-label classification task network
CN112381215B (zh) * 2020-12-17 2023-08-11 Zhejiang Lab Adaptive search space generation method and apparatus for automatic machine learning
CN112733014A (zh) * 2020-12-30 2021-04-30 Shanghai Zhongyuan Network Co., Ltd. Recommendation method, apparatus, device, and storage medium
CN112860998B (zh) * 2021-02-08 2022-05-10 Zhejiang University Click-through rate estimation method based on multi-task learning mechanism
CN115705583A (zh) * 2021-08-09 2023-02-17 Tenpay Payment Technology Co., Ltd. Multi-objective prediction method, apparatus, device, and storage medium
CN116506072B (zh) * 2023-06-19 2023-09-12 Central China Normal University Signal detection method for MIMO-NOMA system based on multi-task federated learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110168578A (zh) * 2017-01-30 2019-08-23 Google LLC Multi-task neural networks with task-specific paths
CN110443364A (zh) * 2019-06-21 2019-11-12 Shenzhen University Deep neural network multi-task hyperparameter optimization method and apparatus
CN110956260A (zh) * 2018-09-27 2020-04-03 Swisscom AG System and method for neural architecture search
CN111723910A (zh) * 2020-06-17 2020-09-29 Tencent Technology (Beijing) Co., Ltd. Method and apparatus for constructing multi-task learning model, electronic device, and storage medium

Also Published As

Publication number Publication date
CN111723910A (zh) 2020-09-29
US20220383200A1 (en) 2022-12-01

Similar Documents

Publication Publication Date Title
WO2021254114A1 (zh) Method and apparatus for constructing multi-task learning model, electronic device, and storage medium
US9990558B2 (en) Generating image features based on robust feature-learning
US11556850B2 (en) Resource-aware automatic machine learning system
WO2021159776A1 (zh) Artificial intelligence-based recommendation method and apparatus, electronic device, and storage medium
EP3711000B1 (en) Regularized neural network architecture search
WO2021047593A1 (zh) Recommendation model training method, and method and apparatus for predicting selection probability
CN113361680B Neural network architecture search method, apparatus, device, and medium
Wang et al. Adaptive and large-scale service composition based on deep reinforcement learning
US20220027792A1 (en) Deep neural network model design enhanced by real-time proxy evaluation feedback
EP4290824A1 (en) Task allocation method and apparatus based on internet-of-things device, and network training method and apparatus
CN114997412A Recommendation method, training method, and apparatus
WO2022227217A1 Training method and apparatus for text classification model, device, and readable storage medium
CN116594748A Task-oriented model customization processing method, apparatus, device, and medium
CN115905687A Cold-start-oriented recommendation system and method based on meta-learning graph neural network
Song et al. Adaptive and collaborative edge inference in task stream with latency constraint
US20240054373A1 (en) Dynamic causal discovery in imitation learning
WO2023143121A1 Data processing method and related apparatus
CN116910357A Data processing method and related apparatus
WO2023174064A1 Automatic search method, and method and apparatus for training performance prediction model for automatic search
US20200302270A1 (en) Budgeted neural network architecture search system and method
WO2022252596A1 Method for constructing AI integrated model, and inference method and apparatus for AI integrated model
JP2024504179A Method and system for lightweighting artificial intelligence inference model
CN114548382A Transfer training method, apparatus, device, storage medium, and program product
CN113191527A Prediction method and apparatus for population prediction based on prediction model
Heye Scaling deep learning without increasing batchsize

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21825625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 260423)

122 Ep: pct application non-entry in european phase

Ref document number: 21825625

Country of ref document: EP

Kind code of ref document: A1