WO2021114625A1 - Network structure construction method and apparatus for multi-task scenarios - Google Patents

Network structure construction method and apparatus for multi-task scenarios

Info

Publication number
WO2021114625A1
WO2021114625A1 (PCT/CN2020/099261)
Authority
WO
WIPO (PCT)
Prior art keywords
task
sub
network
training
network model
Prior art date
Application number
PCT/CN2020/099261
Other languages
English (en)
French (fr)
Inventor
Zhu Wei
Li Tianjing
He Yilong
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2021114625A1 publication Critical patent/WO2021114625A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, computer device, and storage medium for constructing a network structure in a multi-task scenario.
  • Machine Learning (ML) is a branch of artificial intelligence. Its purpose is to let a machine learn from prior knowledge so that it acquires the logical ability to classify and judge. Machine learning models, represented by neural networks, are developing continuously and are increasingly applied across industries.
  • The multi-task learning mechanism is widely used in modern artificial intelligence products. Multi-task means that, for one input, corresponding recognition results must be obtained for several different tasks. The original solution trains one model per sub-task; after deployment, each model must be trained separately, training is time-consuming, and prediction is slow. Moreover, engineers manually try different neural network architectures and then choose a target architecture based on validation-set performance. Because architecture learning in multi-task scenarios is complex, it is difficult to design a very good neural network structure by hand.
  • Traditional automatic model-structure search methods mainly target classification problems and cannot be applied directly to automatic structure search in multi-task scenarios. Building the model structure for multi-task scenarios by repeated manual attempts has high complexity, low efficiency, and high system resource occupancy.
  • A method, apparatus, computer device, and storage medium for constructing a network structure for a multi-task scenario are provided.
  • A method for constructing a network structure in a multi-task scenario includes:
  • obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
  • inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
  • obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting it into the corresponding network layer for training to obtain an updated multi-task network model, and returning to the step of inputting the training sub-text data step by step, until the output of the multi-task network model on the verification set meets the convergence condition; then obtaining the target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining the trained multi-task network model according to the target structure parameters and the matched network parameters.
  • An apparatus for constructing a network structure for a multi-task scenario includes:
  • an obtaining module, configured to obtain a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
  • a network parameter adjustment module, configured to input the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtain the sub-prediction result corresponding to each target semantic task, and adjust the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
  • a network structure building module, configured to obtain the search space corresponding to the multi-task network model to form a differentiable network search space, obtain a verification set, and adjust the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden-state vectors, the sub-hidden-state vector corresponding to the current search is obtained in a preset order and input into the corresponding network layer for training to obtain an updated multi-task network model, and control returns to the network parameter adjustment module until the output of the multi-task network model on the verification set meets the convergence condition; the module then obtains the target structure parameters, obtains the network parameters matching the target structure parameters, and obtains the trained multi-task network model according to the target structure parameters and the matched network parameters.
  • A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
  • inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
  • obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting it into the corresponding network layer for training to obtain an updated multi-task network model, and returning to the step of inputting the training sub-text data step by step, until the output of the multi-task network model on the verification set meets the convergence condition; then obtaining the target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining the trained multi-task network model according to the target structure parameters and the matched network parameters.
  • One or more computer-readable storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
  • inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
  • obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting it into the corresponding network layer for training to obtain an updated multi-task network model, and returning to the step of inputting the training sub-text data step by step, until the output of the multi-task network model on the verification set meets the convergence condition; then obtaining the target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining the trained multi-task network model according to the target structure parameters and the matched network parameters.
  • The above network structure construction method, apparatus, computer device, and storage medium for multi-task scenarios can automatically discover the network architecture best suited to an existing multi-task dataset, improving the accuracy of the multi-task system without manually trying many different models. Partial linking effectively reduces the resource consumption of the differentiable search and makes its convergence faster and more stable. While improving system accuracy, the method reduces the manpower and computing-resource cost of system development, improves efficiency, and lowers system resource occupancy.
  • Fig. 1 is an application environment diagram of a method for constructing a network structure in a multi-task scenario according to one or more embodiments;
  • Fig. 2 is a schematic flowchart of a method for constructing a network structure in a multi-task scenario according to one or more embodiments;
  • Fig. 3 is a structural block diagram of an apparatus for constructing a network structure in a multi-task scenario according to one or more embodiments;
  • Fig. 4 is a diagram of the internal structure of a computer device according to one or more embodiments.
  • Fig. 1 is a diagram of the application environment in which a method for constructing a network structure for a multi-task scenario runs in one embodiment. The application environment includes a terminal 110 and a server 120, which communicate through a network. The communication network may be wireless or wired, for example an IP network or a cellular mobile communication network; the numbers of terminals and servers are not limited.
  • The terminal 110 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
  • The training set can be obtained at the terminal 110 or the server 120. The training set includes multiple training sub-samples corresponding to different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data. The training sub-text data corresponding to each target semantic task is input step by step into the multi-task network model whose network structure is to be determined, yielding the sub-prediction result for each target semantic task; the network parameters of the multi-task network model are adjusted according to the difference between each sub-prediction result and the corresponding training sub-label data until the current target network parameters corresponding to the current network structure are obtained. The search space corresponding to the multi-task network model is obtained to form a differentiable network search space, a verification set is obtained, and the structure parameters of the multi-task network model corresponding to the current target network parameters are adjusted by searching the differentiable network search space according to the verification set. During the search, the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden-state vectors; the sub-hidden-state vector corresponding to the current search is obtained in a preset order and input into the corresponding network layer for training, producing an updated multi-task network model, after which the process returns to the step of inputting the training sub-text data step by step, until the output of the multi-task network model on the verification set meets the convergence condition. The target structure parameters are then obtained, the network parameters matching the target structure parameters are obtained, and the trained multi-task network model is obtained from the target structure parameters and the matched network parameters.
  • In one embodiment, as shown in Fig. 2, a method for constructing a network structure for a multi-task scenario is provided. Taking its application to the terminal 110 or the server 120 in Fig. 1 as an example, the method includes the following steps:
  • Step 210: Obtain a training set. The training set includes multiple training sub-samples corresponding to different target semantic tasks, and each training sub-sample includes training sub-text data and training sub-label data.
  • The training sub-samples corresponding to multiple different target semantic tasks make up the training set. The target semantic tasks are the different types of tasks in the multi-task scenario; for semantic analysis these include entity recognition, sentence classification, intent recognition, sentence-pair similarity, and other tasks. The number of target semantic tasks corresponds to the target recognition results of the multi-task network model whose structure is to be determined, and the multi-task network model may be a semantic analysis network. For example, if the target recognition results of the semantic analysis network include entity recognition and intent recognition for the input text, the target semantic tasks include an entity recognition task and an intent recognition task. When receiving a user question such as "How should metformin be taken", the model must recognize the entity "metformin" and also the intent of the sentence, namely that the user is asking about usage and dosage.
  • Specifically, each target semantic task has its own training sub-sample so as to fit the multi-task scenario: the first target semantic task corresponds to the first training sub-sample, the second target semantic task corresponds to the second training sub-sample, and so on. Each training sub-sample includes training sub-text data and training sub-label data, where the training sub-label data is the task recognition result already determined for the training text data; this task recognition result serves as the training sub-label data for the corresponding target semantic task.
  • Step 220: Input the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, obtain the sub-prediction result corresponding to each target semantic task, and adjust the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained.
  • Specifically, the training sub-text data in the training samples is input into the multi-task network model step by step, one target semantic task at a time: the training sub-text data of the first target semantic task is input first, yielding the first sub-prediction result; then the training sub-text data of the second target semantic task is input, yielding the second sub-prediction result; and so on, until the training sub-text data of every target semantic task has been input and its sub-prediction result obtained. Each sub-prediction result has corresponding training sub-label data, so a sub-difference can be computed for each target semantic task. A loss function is constructed from the sub-differences, back-propagation is performed in the direction that minimizes this loss, and the network parameters of the multi-task network model are adjusted; training continues until the end-of-training condition is met. By minimizing the training loss, the current target network parameters related to the structure of the multi-task network model, i.e., the current optimal weights w, are obtained. A sketch of this procedure is shown below.
  • Step 230: Obtain the search space corresponding to the multi-task network model to form a differentiable network search space, obtain a verification set, and adjust the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set. During the search, divide the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtain the sub-hidden-state vector corresponding to the current search in a preset order, and input it into the corresponding network layer for training to obtain an updated multi-task network model; return to the step of inputting the training sub-text data corresponding to each target semantic task step by step, until the output of the multi-task network model on the verification set satisfies the convergence condition. Then obtain the target structure parameters, obtain the network parameters matching the target structure parameters, and obtain the trained multi-task network model from the target structure parameters and the matched network parameters.
  • A search space is defined that contains various operators: LSTM (Long Short-Term Memory), the gated recurrent unit (GRU), one-dimensional convolution, multi-head attention, and so on. The kernel size of the convolution operator may be 1, 3, 5, etc., and the number of attention heads may be 1, 2, 4, 8, etc.
  • Specifically, the multi-task network model is viewed as a stack of multiple cells, where a cell is a directed graph composed of N ordered nodes connected by directed edges. The search space is continuously relaxed: each directed edge (i, j) represents an operator that is treated as a mixture of all candidate sub-operations, realized by a softmax-weighted superposition. The softmax formula is as follows:

  $$\bar{o}^{(i,j)}(x)=\sum_{o\in\mathcal{O}}\frac{\exp\left(\alpha_{o}^{(i,j)}\right)}{\sum_{o'\in\mathcal{O}}\exp\left(\alpha_{o'}^{(i,j)}\right)}\,o(x)$$

  • Here o(x) is a sub-operation whose weight is randomly initialized; because the weights must be trainable, they cannot be constrained in advance to lie between 0 and 1, and this formula converts them into numbers between 0 and 1 so that the weights of all sub-operations sum to 1. The sub-operation mixing weight of directed edge (i, j) is α^(i,j), whose dimension is |O|, the total number of candidate sub-operations on edge (i, j); o(·) denotes the current sub-operation. A sketch of such a mixed operator follows.
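A short sketch of the mixed operator on one directed edge, assuming PyTorch; `MixedOp` is an illustrative name, and the candidate list `ops` would hold the LSTM, GRU, convolution, and attention operators mentioned above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One directed edge (i, j): a softmax-weighted mixture of all
    candidate sub-operations, so the operator choice is differentiable."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)                     # |O| candidates
        self.alpha = nn.Parameter(torch.randn(len(ops)))  # structure params

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)  # normalize to sum to 1
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```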
  • The structure parameters and network parameters are updated to learn the optimal weight parameters. The optimization method is alternating (cross) gradient descent: the network parameters w are updated once along the gradient of L_train(w_{k-1}, α_{k-1}) with respect to w_{k-1}, and then the structure parameters α of the multi-task network model are updated once along the gradient of L_train(w_k, α_{k-1}) with respect to α_{k-1}.
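A hedged sketch of one round of this alternating update, assuming two optimizers have been built over the network weights and the structure parameters respectively (a hypothetical setup); a comment marks where canonical DARTS differs from the text.

```python
def search_step(model, train_batch, w_opt, a_opt, loss_fn):
    """One round of alternating (cross) gradient descent. `w_opt` is
    assumed to optimize only the network weights w, and `a_opt` only
    the structure parameters alpha."""
    x, y = train_batch
    # update w once along the gradient of L_train(w_{k-1}, alpha_{k-1})
    w_opt.zero_grad()
    loss_fn(model(x), y).backward()
    w_opt.step()
    # update alpha once along the gradient of L_train(w_k, alpha_{k-1});
    # note: canonical DARTS uses the validation loss L_val for this step
    a_opt.zero_grad()
    loss_fn(model(x), y).backward()
    a_opt.step()
```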
  • During the search, the hidden state of the multi-task network model is partially linked. For example, if the hidden state vector has 300 dimensions, the 300 dimensions are divided into 6 ordered sub-hidden-state vectors of 50 dimensions each. At each parameter-adjustment step, one sub-hidden-state vector (i.e., 50 dimensions) is selected and one differentiable search step is performed on it; at the next step, the next sub-hidden-state vector is selected in order, and so on. Each selected sub-hidden-state vector is input into the corresponding network layer for training, yielding an updated multi-task network model, and the process returns to the step of inputting the training sub-text data corresponding to each target semantic task step by step, until the output of the multi-task network model on the verification set meets the convergence condition. The target structure parameters are then obtained, the network parameters matching the target structure parameters are obtained, and the trained multi-task network model is obtained from the target structure parameters and the matched network parameters. A sketch of the partial-linking step is shown below.
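A minimal sketch of the partial-linking step under the 300-dimension, 6-chunk example above; `mixed_op` is assumed to map a 50-dimensional chunk to the same shape.

```python
import torch

def partially_connected_edge(mixed_op, hidden, step, k=6):
    """Partial linking: split the hidden state vector (e.g. 300 dims)
    into k ordered sub-vectors (e.g. 6 x 50 dims); per search step, only
    the sub-vector selected in the preset order passes through the mixed
    operation, cutting the memory cost of the differentiable search."""
    chunks = list(torch.chunk(hidden, k, dim=-1))
    idx = step % k                       # preset cyclic order
    chunks[idx] = mixed_op(chunks[idx])  # search only these 50 dims
    return torch.cat(chunks, dim=-1)     # recombine with untouched dims
```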
  • The network structure construction method for multi-task scenarios described above automatically discovers the network architecture best suited to an existing multi-task dataset, improving the accuracy of the multi-task system without manually trying many different models. Partial linking effectively reduces the resource consumption of the differentiable search and makes its convergence faster and more stable; while improving system accuracy, the method reduces the manpower and computing-resource cost of system development, improves efficiency, and lowers system resource occupancy.
  • In one embodiment, the differentiable network search space is searched with at least one of the following sharing schemes: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of the mapping network among the multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes with the same start node into a node set, and sharing parameters among the operators corresponding to nodes in different node sets.
  • Specifically, the three matrices of multi-head attention (W_Q, W_K, W_V) can share parameters, and the four capsule-network-based operators can share the parameters of one mapping network. For example, the operator on edge 1->2 can be shared with edge 3->4; the sharing rule is that edges that do not have the same start node may share. Nodes with the same start node form a node set, and the operators corresponding to nodes in different node sets share parameters. Parameter sharing effectively reduces the resource consumption of the differentiable search and makes convergence faster and more stable. One possible implementation of this rule is sketched after this paragraph.
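One possible reading of this sharing rule, sketched below with hypothetical names (`edges`, `make_op`); the grouping-by-start-node logic is an assumption about how the rule would be implemented, not a detail fixed by the text.

```python
from collections import defaultdict

def build_shared_ops(edges, make_op):
    """Sketch: edges are grouped by start node; edges within one group
    (same start node) get distinct operators, while corresponding edges
    in different groups reuse the same operator, e.g. 1->2 shares with
    3->4. `make_op` is an assumed factory returning a fresh operator."""
    groups = defaultdict(list)
    for i, j in sorted(edges):
        groups[i].append((i, j))
    shared_pool = []                          # operators reused across groups
    edge_to_op = {}
    for group in groups.values():
        for pos, edge in enumerate(group):
            if pos >= len(shared_pool):       # first group to reach this slot
                shared_pool.append(make_op())
            edge_to_op[edge] = shared_pool[pos]  # later groups reuse it
    return edge_to_op
```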
  • In one embodiment, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose structure is to be determined and obtaining the sub-prediction result for each target semantic task includes: segmenting the current training sub-text data corresponding to the current target semantic task into words, mapping each word to a corresponding vector to form a vector set, extracting semantic features from the vector set with an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task from the semantic features, where the current target semantic task is one of the target semantic tasks.
  • Specifically, a custom word-segmentation algorithm can be used, and different target semantic tasks may use different segmentation algorithms; a custom mapping algorithm can be used to map each word to its vector. Different target semantic tasks may use different (or the same) encoders, so that different semantic features can be extracted per task, and the sub-prediction result for the current target semantic task is obtained from those features.
  • In this embodiment, the current training sub-text data is first segmented and mapped to vectors to form a vector set, and then the encoder extracts semantic features from the vector set to produce the sub-prediction result. The flexibility in word segmentation and encoders makes it convenient to obtain sub-prediction results for each target semantic task; different segmentation algorithms and encoders can be configured flexibly for different tasks, as the sketch below illustrates.
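A sketch of one task's segment-map-encode pipeline as just described, assuming PyTorch modules; `tokenizer` and `vocab` stand in for the task-specific segmentation and mapping algorithms.

```python
import torch

def predict_for_task(text, tokenizer, vocab, embed, encoder, head):
    """One task's forward pass: segment the sub-text data into words,
    map each word to a vector to form a vector set, extract semantic
    features with the encoder, then predict."""
    words = tokenizer(text)                            # word segmentation
    ids = torch.tensor([[vocab.get(w, 0) for w in words]])
    vectors = embed(ids)                               # the vector set
    features, _ = encoder(vectors)                     # semantic features
    return head(features[:, -1])                       # sub-prediction
```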
  • In one embodiment, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose structure is to be determined and obtaining the sub-prediction result for each target semantic task includes: computing the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database, obtaining similar sub-text data matching the current training sub-text data; inputting the first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and inputting the second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task from the first and second semantic features.
  • Specifically, the candidate texts in the database can be texts with relatively standard wording. Similar sub-text data matching the training sub-text data is found by similarity search; because the wording is relatively standard, subsequent semantic-feature extraction is more effective. Combining the different semantic features produced by the two encoders yields the sub-prediction result for the current target semantic task and improves its accuracy. The first encoder can be called the premise encoder and the second encoder the hypothesis encoder.
  • Across different target semantic tasks, the first and second encoders can be shared; sharing encoders improves resource utilization and training efficiency. Because one input text is turned into a pair of input texts, the target semantic tasks can also include semantic tasks over text pairs, such as question-answer tasks, sentence-similarity computation, the probability of one sentence conditioned on another, and so on.
  • In this embodiment, similar sub-text data matching the training sub-text data is retrieved, and the sub-prediction result for the current target semantic task is obtained by combining the different semantic features from the two encoders, improving the accuracy of the sub-prediction result and diversifying the forms the target semantic tasks can take. A sketch of the encoder pair follows.
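A minimal sketch of the premise/hypothesis encoder pair, assuming PyTorch and one-dimensional convolution encoders; passing `shared=True` reuses a single module, which also illustrates the weight sharing discussed next. Class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class PairTaskModel(nn.Module):
    """Encode the current text and its retrieved similar text, then
    combine the two semantic features into one sub-prediction."""
    def __init__(self, hidden, n_classes, shared=True):
        super().__init__()
        self.premise_enc = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        # shared=True reuses the same module, so the convolution-kernel
        # (weight) parameters of the two encoders are identical
        self.hypothesis_enc = self.premise_enc if shared else nn.Conv1d(
            hidden, hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, current_vecs, similar_vecs):
        # inputs: (batch, hidden, seq_len) vector sets
        f1 = self.premise_enc(current_vecs).mean(dim=-1)     # 1st feature
        f2 = self.hypothesis_enc(similar_vecs).mean(dim=-1)  # 2nd feature
        return self.head(torch.cat([f1, f2], dim=-1))        # sub-prediction
```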
  • In one embodiment, the weights of the first encoder and the second encoder are shared.
  • Specifically, weight sharing refers to sharing convolution-kernel parameters: the convolution-kernel parameters of the first encoder are identical to those of the second encoder. Weight sharing reduces the number of parameters; together with the multi-task mechanism, sharing the weights of the premise and hypothesis encoders lowers the GPU memory occupied when the multi-task system is deployed and reduces cost.
  • In one embodiment, adjusting the network parameters of the multi-task network model according to the difference between the sub-prediction results and the corresponding training sub-label data includes: obtaining the sub-prediction result and training sub-label data for each target semantic task and computing the sub-difference for each task; obtaining the task weight for each target semantic task and weighting the sub-differences by the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
  • Specifically, the task weight of a target semantic task expresses its importance: the larger the weight, the more important the task. For example, if the main task for a text is to recognize its entities and the secondary task is to recognize the usage of those entities, the task weight of the entity-recognition task is greater than that of the entity-usage recognition task. Weighting the sub-differences by task weights gives important tasks larger weighting coefficients, so that when the network parameters are adjusted according to the statistical sub-difference, important tasks exert more influence on the parameter update.
  • In this embodiment, configuring a task weight for each target semantic task makes it possible to flexibly control how much each task influences the adjustment of the multi-task network model's parameters, as in the small sketch below.
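As a small worked sketch, the statistical sub-difference is simply the task-weighted sum of per-task losses; the weight values shown are illustrative only.

```python
def weighted_multitask_loss(sub_losses, task_weights):
    """Combine per-task sub-differences into the statistical
    sub-difference used to adjust the network parameters; larger task
    weights give more important tasks more influence.
    Example weights: {"entity": 0.7, "usage": 0.3}."""
    return sum(task_weights[t] * loss for t, loss in sub_losses.items())
```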
  • It should be understood that although the steps in the flowchart of Fig. 2 are displayed in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and the steps may be executed in other orders. Moreover, at least some of the steps in Fig. 2 may include multiple sub-steps or stages; these are not necessarily completed at the same moment but may be executed at different times, and they need not be executed sequentially: they may be executed in turn or alternately with other steps or with sub-steps or stages of other steps.
  • In one embodiment, as shown in Fig. 3, an apparatus for constructing a network structure for a multi-task scenario is provided, including an obtaining module 310, a network parameter adjustment module 320, and a network structure construction module 330, where:
  • the obtaining module 310 is configured to obtain a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
  • the network parameter adjustment module 320 is configured to input the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose structure is to be determined, obtain the sub-prediction result for each target semantic task, and adjust the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
  • the network structure construction module 330 is configured to obtain the search space corresponding to the multi-task network model to form a differentiable network search space, obtain a verification set, and adjust the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden-state vectors, the sub-hidden-state vector corresponding to the current search is obtained in a preset order and input into the corresponding network layer for training to obtain an updated multi-task network model, and control returns to the network parameter adjustment module until the output of the multi-task network model on the verification set meets the convergence condition; the module then obtains the target structure parameters, obtains the network parameters matching the target structure parameters, and obtains the trained multi-task network model according to the target structure parameters and the matched network parameters.
  • In one embodiment, the network structure construction module 330 is further configured to search the differentiable network search space with at least one of the following sharing schemes: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of the mapping network among the multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes with the same start node into a node set, and sharing parameters among the operators corresponding to nodes in different node sets.
  • In one embodiment, the network parameter adjustment module 320 is further configured to segment the current training sub-text data corresponding to the current target semantic task, map each word to a corresponding vector to form a vector set, extract semantic features from the vector set with an encoder, and obtain the sub-prediction result corresponding to the current target semantic task from the semantic features, where the current target semantic task is one of the target semantic tasks.
  • In one embodiment, the network parameter adjustment module 320 is further configured to compute the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database to obtain similar sub-text data matching the current training sub-text data; input the first vector set corresponding to the current training sub-text data into the first encoder to extract the first semantic feature, and input the second vector set corresponding to the similar sub-text data into the second encoder to extract the second semantic feature; and obtain the sub-prediction result corresponding to the current target semantic task from the first and second semantic features.
  • In one embodiment, the weights of the first encoder and the second encoder are shared.
  • In one embodiment, the network parameter adjustment module 320 is further configured to obtain the sub-prediction result and training sub-label data for each target semantic task and compute the sub-difference for each task; obtain the task weight for each target semantic task and weight the sub-differences by the task weights to obtain a statistical sub-difference; and adjust the network parameters of the multi-task network model according to the statistical sub-difference.
  • The various modules in the apparatus for constructing a network structure for a multi-task scenario can be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
  • In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computation and control capabilities. The memory includes a non-volatile or volatile storage medium and internal memory; the non-volatile or volatile storage medium stores an operating system, computer-readable instructions, and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions. The database stores the training set, and the network interface communicates with external terminals through a network connection.
  • Those skilled in the art can understand that the structure shown in Fig. 4 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. In other embodiments, the computer device may be a terminal.
  • A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps: obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data; inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose structure is to be determined, obtaining the sub-prediction result for each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, and inputting it into the corresponding network layer for training to obtain an updated multi-task network model, until the output of the multi-task network model on the verification set meets the convergence condition; and then obtaining the target structure parameters, the matching network parameters, and the trained multi-task network model.
  • When executing the computer-readable instructions, the processor also implements at least one of the following: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of the mapping network among the multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes with the same start node into a node set, and sharing parameters among the operators corresponding to nodes in different node sets.
  • When executing the computer-readable instructions, the processor further implements the following steps: segmenting the current training sub-text data corresponding to the current target semantic task, mapping each word to a corresponding vector to form a vector set, extracting semantic features from the vector set with an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task from the semantic features, where the current target semantic task is one of the target semantic tasks.
  • When executing the computer-readable instructions, the processor further implements the following steps: computing the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database to obtain similar sub-text data matching the current training sub-text data; inputting the first vector set corresponding to the current training sub-text data into the first encoder to extract the first semantic feature, and inputting the second vector set corresponding to the similar sub-text data into the second encoder to extract the second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task from the first and second semantic features.
  • the weights of the first encoder and the second encoder are shared.
  • When executing the computer-readable instructions, the processor further implements the following steps: obtaining the sub-prediction result and training sub-label data for each target semantic task and computing the sub-difference for each task; obtaining the task weight for each target semantic task and weighting the sub-differences by the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
  • One or more computer-readable storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data; inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose structure is to be determined, obtaining the sub-prediction result for each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; dividing the hidden state vector into ordered sub-hidden-state vectors during the search, obtaining the sub-hidden-state vector for the current search in a preset order, and inputting it into the corresponding network layer for training, until the output on the verification set meets the convergence condition; and then obtaining the target structure parameters, the matching network parameters, and the trained multi-task network model.
  • the computer-readable storage medium may be non-volatile or volatile.
  • When the computer-readable instructions are executed by the processor, at least one of the following is also implemented: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of the mapping network among the multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes with the same start node into a node set, and sharing parameters among the operators corresponding to nodes in different node sets.
  • When the computer-readable instructions are executed by the processor, the following steps are also implemented: segmenting the current training sub-text data corresponding to the current target semantic task, mapping each word to a corresponding vector to form a vector set, extracting semantic features from the vector set with an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task from the semantic features, where the current target semantic task is one of the target semantic tasks.
  • When the computer-readable instructions are executed by the processor, the following steps are also implemented: computing the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database to obtain similar sub-text data matching the current training sub-text data; inputting the first vector set corresponding to the current training sub-text data into the first encoder to extract the first semantic feature, and inputting the second vector set corresponding to the similar sub-text data into the second encoder to extract the second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task from the first and second semantic features.
  • the weights of the first encoder and the second encoder are shared.
  • When the computer-readable instructions are executed by the processor, the following steps are also implemented: obtaining the sub-prediction result and training sub-label data for each target semantic task and computing the sub-difference for each task; obtaining the task weight for each target semantic task and weighting the sub-differences by the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
  • This application can be applied to smart government affairs and smart security, so as to promote the construction of smart cities.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A network structure construction method, apparatus, device, and storage medium for multi-task scenarios, relating to machine learning within artificial intelligence, including: obtaining a training set (210); inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model until the current target network parameters corresponding to the current network structure are obtained (220); obtaining the search space corresponding to the multi-task network model, obtaining a verification set, adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, and dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors during the search, until the output of the multi-task network model on the verification set meets the convergence condition, whereupon the target structure parameters and the trained multi-task network model are obtained (230).

Description

Network structure construction method and apparatus for multi-task scenarios
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application filed with the China Patent Office on May 28, 2020 under application number 202010468557.3 and entitled "Network structure construction method and apparatus for multi-task scenarios", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a network structure construction method, apparatus, computer device, and storage medium for multi-task scenarios.
Background
Machine Learning (ML) is a branch of artificial intelligence. Its purpose is to let a machine learn from prior knowledge so that it acquires the logical ability to classify and judge. Machine learning models, represented by neural networks, are developing continuously and are increasingly applied across industries.
The multi-task learning mechanism is widely used in modern artificial intelligence products. The inventors realized that multi-task means that, for one input, corresponding recognition results must be obtained for different tasks. The original solution trains one model per sub-task; after deployment, each model must be trained separately, training is time-consuming, and prediction is slow. Moreover, engineers manually try different neural network architectures and then choose a target architecture based on validation-set performance. Because architecture learning in multi-task scenarios is complex, it is difficult to design a very good neural network structure by hand. Traditional automatic model-structure search methods mainly target classification problems and cannot be applied directly to automatic structure search in multi-task scenarios. Building the model structure of multi-task scenarios by repeated manual attempts has high complexity, low efficiency, and high system resource occupancy.
Summary
According to various embodiments disclosed in this application, a network structure construction method, apparatus, computer device, and storage medium for multi-task scenarios are provided.
A method for constructing a network structure in a multi-task scenario includes:
obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting it into the corresponding network layer for training to obtain an updated multi-task network model, and returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step, until the output of the multi-task network model on the verification set meets the convergence condition; then obtaining the target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining the trained multi-task network model according to the target structure parameters and the matched network parameters.
An apparatus for constructing a network structure for a multi-task scenario includes:
an obtaining module, configured to obtain a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
a network parameter adjustment module, configured to input the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtain the sub-prediction result corresponding to each target semantic task, and adjust the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
a network structure building module, configured to obtain the search space corresponding to the multi-task network model to form a differentiable network search space, obtain a verification set, and adjust the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden-state vectors, the sub-hidden-state vector corresponding to the current search is obtained in a preset order and input into the corresponding network layer for training to obtain an updated multi-task network model, and control returns to the network parameter adjustment module until the output of the multi-task network model on the verification set meets the convergence condition; the module then obtains the target structure parameters, obtains the network parameters matching the target structure parameters, and obtains the trained multi-task network model according to the target structure parameters and the matched network parameters.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting it into the corresponding network layer for training to obtain an updated multi-task network model, and returning to the step of inputting the training sub-text data step by step, until the output of the multi-task network model on the verification set meets the convergence condition; then obtaining the target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining the trained multi-task network model according to the target structure parameters and the matched network parameters.
One or more computer-readable storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
obtaining a training set, the training set including multiple training sub-samples corresponding to different target semantic tasks, the training sub-samples including training sub-text data and training sub-label data;
inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained; and
obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a verification set, and adjusting the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting it into the corresponding network layer for training to obtain an updated multi-task network model, and returning to the step of inputting the training sub-text data step by step, until the output of the multi-task network model on the verification set meets the convergence condition; then obtaining the target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining the trained multi-task network model according to the target structure parameters and the matched network parameters.
The network structure construction method, apparatus, computer device, and storage medium for multi-task scenarios described above automatically discover the network architecture best suited to an existing multi-task dataset, improving the accuracy of the multi-task system without manually trying many different models; partial linking effectively reduces the resource consumption of the differentiable search and makes its convergence faster and more stable; while improving system accuracy, they reduce the manpower and computing-resource cost of system development, improve efficiency, and lower system resource occupancy.
The details of one or more embodiments of this application are set forth in the drawings and the description below. Other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings required by the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an application environment diagram of a method for constructing a network structure in a multi-task scenario according to one or more embodiments;
Fig. 2 is a schematic flowchart of a method for constructing a network structure in a multi-task scenario according to one or more embodiments;
Fig. 3 is a structural block diagram of an apparatus for constructing a network structure in a multi-task scenario according to one or more embodiments;
Fig. 4 is a diagram of the internal structure of a computer device according to one or more embodiments.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application, not to limit it.
The network structure construction method for multi-task scenarios provided by this application can be applied in the application environment shown in Fig. 1. Fig. 1 is a diagram of the application environment in which the method runs in one embodiment. As shown in Fig. 1, the application environment includes a terminal 110 and a server 120, which communicate through a network; the communication network may be wireless or wired, for example an IP network or a cellular mobile communication network, and the numbers of terminals and servers are not limited.
The terminal 110 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers. The training set can be obtained at the terminal 110 or the server 120; the training set includes multiple training sub-samples corresponding to different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data. The training sub-text data corresponding to each target semantic task is input step by step into the multi-task network model whose network structure is to be determined, yielding the sub-prediction result for each target semantic task; the network parameters of the multi-task network model are adjusted according to the difference between each sub-prediction result and the corresponding training sub-label data until the current target network parameters corresponding to the current network structure are obtained. The search space corresponding to the multi-task network model is obtained to form a differentiable network search space, a verification set is obtained, and the structure parameters of the multi-task network model corresponding to the current target network parameters are adjusted by searching the differentiable network search space according to the verification set; during the search, the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden-state vectors, the sub-hidden-state vector corresponding to the current search is obtained in a preset order and input into the corresponding network layer for training, an updated multi-task network model is obtained, and the process returns to the step of inputting the training sub-text data step by step, until the output of the multi-task network model on the verification set meets the convergence condition; the target structure parameters are then obtained, the matching network parameters are obtained, and the trained multi-task network model is obtained from the target structure parameters and the matched network parameters.
In one embodiment, as shown in Fig. 2, a method for constructing a network structure for a multi-task scenario is provided. Taking its application to the terminal 110 or the server 120 in Fig. 1 as an example, the method includes the following steps:
Step 210: Obtain a training set. The training set includes multiple training sub-samples corresponding to different target semantic tasks, and each training sub-sample includes training sub-text data and training sub-label data.
The target semantic tasks are the different types of tasks in the multi-task scenario; for semantic analysis these include entity recognition, sentence classification, intent recognition, sentence-pair similarity, and other tasks. The number of target semantic tasks corresponds to the target recognition results of the multi-task network model whose structure is to be determined, and the multi-task network model may be a semantic analysis network. For example, if the target recognition results of the semantic analysis network include entity recognition and intent recognition for the input text, the target semantic tasks include an entity recognition task and an intent recognition task: when receiving a user question such as "How should metformin be taken", the model must recognize the entity "metformin" and also the intent of the sentence, namely that the user is asking about usage and dosage.
Specifically, each target semantic task has its own training sub-sample so as to fit the multi-task scenario: the first target semantic task corresponds to the first training sub-sample, the second target semantic task corresponds to the second training sub-sample, and so on. Each training sub-sample includes training sub-text data and training sub-label data, where the training sub-label data is the task recognition result already determined for the training text data; this result serves as the training sub-label data for the corresponding target semantic task.
Step 220: Input the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, obtain the sub-prediction result corresponding to each target semantic task, and adjust the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained.
Specifically, the training sub-text data in the training samples is input into the multi-task network model step by step, one target semantic task at a time: the training sub-text data of the first target semantic task is input first, yielding the first sub-prediction result; then the training sub-text data of the second target semantic task is input, yielding the second sub-prediction result; and so on, until every target semantic task's training sub-text data has been input and its sub-prediction result obtained. Each sub-prediction result has corresponding training sub-label data, so a sub-difference can be computed for each task; a loss function is constructed from the sub-differences, back-propagation is performed in the direction that minimizes this loss, and the network parameters of the multi-task network model are adjusted, with training continuing until the end-of-training condition is met. By minimizing the training loss, the current target network parameters related to the structure of the multi-task network model, i.e., the current optimal weights w, are obtained.
Step 230: Obtain the search space corresponding to the multi-task network model to form a differentiable network search space, obtain a verification set, and adjust the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space according to the verification set. During the search, divide the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtain the sub-hidden-state vector corresponding to the current search in a preset order, and input it into the corresponding network layer for training to obtain an updated multi-task network model; return to the step of inputting the training sub-text data corresponding to each target semantic task step by step, until the output of the multi-task network model on the verification set satisfies the convergence condition; then obtain the target structure parameters, the matching network parameters, and the trained multi-task network model.
A search space is defined that contains various operators: LSTM (Long Short-Term Memory), the gated recurrent unit (GRU), one-dimensional convolution, multi-head attention, and so on. The kernel size of the convolution operator may be 1, 3, 5, etc., and the number of attention heads may be 1, 2, 4, 8, etc.
Specifically, the multi-task network model is viewed as a stack of multiple cells, where a cell is a directed graph composed of N ordered nodes connected by directed edges. The search space is continuously relaxed, and each directed edge (i, j) represents an operator that is treated as a mixture of all candidate sub-operations, realized by softmax-weighted superposition. The softmax formula is:

$$\bar{o}^{(i,j)}(x)=\sum_{o\in\mathcal{O}}\frac{\exp\left(\alpha_{o}^{(i,j)}\right)}{\sum_{o'\in\mathcal{O}}\exp\left(\alpha_{o'}^{(i,j)}\right)}\,o(x)$$

Here o(x) is a sub-operation whose weight is randomly initialized; because the weights must be trainable, they cannot be constrained in advance to lie between 0 and 1, and this formula converts them into numbers between 0 and 1 so that the weights of all sub-operations sum to 1. The sub-operation mixing weight of directed edge (i, j) is α^(i,j), whose dimension is |O|, the total number of candidate sub-operations on edge (i, j); o(·) denotes the current sub-operation. The structure parameters and network parameters are updated to learn the optimal weight parameters; the optimization objective is a bilevel (Bi-level) optimization problem, namely

$$\min_{\alpha}\;L_{val}\left(w^{*}(\alpha),\alpha\right)$$

s.t.

$$w^{*}(\alpha)=\arg\min_{w}L_{train}(w,\alpha)$$

The optimization method is alternating (cross) gradient descent: the network parameters w are updated once along the gradient of L_train(w_{k-1}, α_{k-1}) with respect to w_{k-1}, and the structure parameters α of the multi-task network model are updated once along the gradient of L_train(w_k, α_{k-1}) with respect to α_{k-1}.
During the search, the hidden state of the multi-task network model is partially linked. For example, if the hidden state vector has 300 dimensions, the 300 dimensions are divided into 6 ordered sub-hidden-state vectors of 50 dimensions each. At each parameter adjustment, one sub-hidden-state vector (50 dimensions) is selected and one differentiable search step is performed on it; at the next step, the next sub-hidden-state vector is selected in order, and so on. Each selected sub-hidden-state vector is input into the corresponding network layer for training, an updated multi-task network model is obtained, and the process returns to the step of inputting the training sub-text data corresponding to each target semantic task step by step, until the output of the multi-task network model on the verification set meets the convergence condition; the target structure parameters are obtained, the matching network parameters are obtained, and the trained multi-task network model is obtained. After the optimization is completed, the operator corresponding to the largest weight, i.e., to the target structure parameter, is activated on each edge and the other operators are removed; the result is the trained multi-task network model.
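For illustration, a small sketch of this final discretization step, with hypothetical names (`edge_alphas`, `candidate_names`): per edge, the operator with the largest structure weight is kept and the rest are removed.

```python
import torch

def derive_final_ops(edge_alphas, candidate_names):
    """After the search converges: on each edge, activate the operator
    with the largest structure weight and drop the others, yielding the
    final trained architecture."""
    final = {}
    for edge, alpha in edge_alphas.items():  # alpha: tensor of |O| weights
        best = int(torch.argmax(alpha))
        final[edge] = candidate_names[best]  # e.g. "lstm", "conv1d_k3"
    return final
```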
The above network structure construction method for multi-task scenarios automatically discovers the network architecture best suited to the existing multi-task-scenario dataset and improves the accuracy of the multi-task system without manually trying many different models; partial connection effectively reduces resource consumption during differentiable search and makes the search converge faster and more stably, improving system accuracy while reducing the human and computing-resource costs required for system development, increasing efficiency and lowering system resource usage.
In one embodiment, the differentiable network search space is searched using at least one of the following sharing schemes: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of a mapping network across multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes having the same starting node into node sets, and sharing parameters among the operators corresponding to nodes in different node sets.
Specifically, the three matrices of multi-head attention (W_Q, W_K, W_V) can share parameters, and the four capsule-network-based operators can share the parameters of one mapping network. For example, the operator on edge 1->2 can be shared with edge 3->4; the rule for sharing is that edges that do not share the same starting node may share parameters. Accordingly, nodes having the same starting node are grouped into node sets, and parameter sharing is applied between the operators corresponding to nodes in different node sets. Parameter sharing effectively reduces resource consumption during differentiable search and makes the search converge faster and more stably.
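The sharing rule can be sketched as grouping edges by their starting node and reusing one parameter set across groups. This sketch is an assumption about how the rule might be coded; in particular, pairing edges across groups by their rank within each group is illustrative, not a scheme stated by this application:

```python
from collections import defaultdict
import torch.nn as nn

def build_shared_ops(edges, make_op):
    """edges: (start, end) node pairs; make_op() builds one operator instance."""
    groups = defaultdict(list)
    for start, end in edges:
        groups[start].append((start, end))       # same starting node -> same group
    shared, op_table = {}, {}
    for start in sorted(groups):
        for rank, edge in enumerate(sorted(groups[start])):
            if rank not in shared:               # one parameter set per rank,
                shared[rank] = make_op()         # reused across different groups
            op_table[edge] = shared[rank]
    return op_table

ops = build_shared_ops([(1, 2), (1, 3), (3, 4)], lambda: nn.Linear(300, 300))
assert ops[(1, 2)] is ops[(3, 4)]      # 1->2 shares with 3->4 (different start nodes)
assert ops[(1, 2)] is not ops[(1, 3)]  # edges from the same start node do not share
```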
In one embodiment, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined and obtaining the sub-prediction result corresponding to each target semantic task in step 220 includes: segmenting the current training sub-text data corresponding to the current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and extracting semantic features from the vector set through an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task according to the semantic features, where the current target semantic task is one of the target semantic tasks.
Specifically, the current training sub-text data corresponding to the current target semantic task is segmented into words; a custom word segmentation algorithm may be used, and different target semantic tasks may use different segmentation algorithms. Each word is mapped to its corresponding vector, for which a custom mapping algorithm may be used. When the current target semantic task is a different target semantic task, the corresponding encoder may be different or the same, so that different semantic features can be extracted for different target semantic tasks, and the sub-prediction result corresponding to the current target semantic task is obtained according to the semantic features.
In this embodiment, the current training sub-text data corresponding to the current target semantic task is first segmented and mapped to corresponding vectors to form a vector set, and an encoder then extracts semantic features from the vector set to obtain the sub-prediction result corresponding to the current target semantic task. The diversity of segmentation algorithms and encoders makes it convenient to obtain the corresponding sub-prediction result for each target semantic task, and different segmentation algorithms and encoders can be flexibly configured for different target semantic tasks.
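The per-task pipeline can be sketched as follows; the embedding table, the encoder, and the head here are assumptions introduced for illustration:

```python
import torch
import torch.nn as nn

class TaskPipeline(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int,
                 encoder: nn.Module, head: nn.Module):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # word -> vector
        self.encoder = encoder   # may differ (or be shared) across semantic tasks
        self.head = head         # produces the sub-prediction result

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        vectors = self.embedding(token_ids)   # the vector set from the segmented text
        features = self.encoder(vectors)      # semantic features
        return self.head(features)            # sub-prediction result
```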
In one embodiment, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined and obtaining the sub-prediction result corresponding to each target semantic task in step 220 includes: computing the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database, to obtain similar sub-text data matching the current training sub-text data; inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.
Specifically, the candidate texts in the database may be texts with relatively standard expression. Similar sub-text data corresponding to the training sub-text data is found through similarity search; because its expression is relatively standard, the semantic features subsequently extracted from it are more effective. Combining the different semantic features produced by the two encoders yields the sub-prediction result corresponding to the current target semantic task and improves its accuracy. The first encoder may be called the premise encoder and the second encoder the hypothesis encoder.
When the current target semantic task is a different target semantic task, the first encoder and the second encoder may be shared; encoder sharing improves resource utilization and training efficiency. Since one input text becomes two input texts, the corresponding target semantic tasks may also include text-pair semantic tasks, such as question-answering tasks, sentence similarity computation, and the probability of one sentence conditioned on the other.
In this embodiment, similar sub-text data matching the training sub-text data is obtained, and the sub-prediction result corresponding to the current target semantic task is obtained by combining the different semantic features produced by the two encoders, which improves the accuracy of the sub-prediction result and also diversifies the forms of target semantic tasks.
In one embodiment, the weights of the first encoder and the second encoder are shared.
Specifically, weight sharing here means sharing convolution kernel parameters, i.e., the convolution kernel parameters of the first encoder are identical to those of the second encoder. Weight sharing reduces the number of parameters; through the multi-task system mechanism and the weight sharing between the premise encoder and the hypothesis encoder, the GPU memory occupied when the multi-task system is deployed is reduced, lowering cost.
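Weight sharing between the premise and hypothesis encoders amounts to reusing one encoder instance for both inputs, so both apply identical parameters. A minimal sketch follows; the feature combination (concatenation with the absolute difference) and the head are illustrative assumptions, and the encoder is assumed to pool each text to a fixed-size vector:

```python
import torch
import torch.nn as nn

class SiamesePairModel(nn.Module):
    def __init__(self, embedding: nn.Embedding, encoder: nn.Module, head: nn.Module):
        super().__init__()
        self.embedding = embedding
        self.encoder = encoder    # single instance: premise and hypothesis share it
        self.head = head

    def forward(self, premise_ids: torch.Tensor, hypothesis_ids: torch.Tensor):
        f1 = self.encoder(self.embedding(premise_ids))     # first semantic feature
        f2 = self.encoder(self.embedding(hypothesis_ids))  # second semantic feature
        pair = torch.cat([f1, f2, (f1 - f2).abs()], dim=-1)
        return self.head(pair)    # sub-prediction for the text-pair task
```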
In one embodiment, adjusting the network parameters of the multi-task network model according to the differences between the sub-prediction results and the corresponding training sub-label data includes: obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain the sub-difference corresponding to each target semantic task; obtaining the task weight corresponding to each target semantic task, and weighting the sub-differences according to the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
Specifically, the task weight corresponding to a target semantic task represents the importance of that task: the larger the task weight, the more important the task. For example, if for a text the primary task is to recognize the entities in the text and the secondary task is to recognize the usage of those entities, the task weight corresponding to the entity recognition task is greater than that corresponding to the usage recognition task. Weighting the sub-differences by the task weights gives the more important tasks larger weighting coefficients, so that when the network parameters of the multi-task network model are adjusted according to the statistical sub-difference, the more important tasks have greater influence on the parameter adjustment.
In this embodiment, by configuring a corresponding task weight for each target semantic task, the influence of different target semantic tasks on the adjustment of the network parameters of the multi-task network model can be flexibly controlled.
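The weighted aggregation can be sketched directly; the task names and weight values below are illustrative assumptions:

```python
# Entity recognition is the primary task in the example above, so it receives
# the larger task weight; the numbers themselves are illustrative.
task_weights = {"entity_recognition": 0.7, "usage_recognition": 0.3}

def statistical_sub_difference(sub_losses):
    """sub_losses: {task: scalar loss tensor}; returns the weighted sum."""
    return sum(task_weights[t] * loss for t, loss in sub_losses.items())
```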
It should be understood that although the steps in the flowchart of FIG. 2 are displayed sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 3, a network structure construction apparatus for multi-task scenarios is provided, including an acquisition module 310, a network parameter adjustment module 320, and a network structure construction module 330, where:
the acquisition module 310 is configured to obtain a training set, where the training set includes training sub-samples corresponding to multiple different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data;
the network parameter adjustment module 320 is configured to input the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, obtain the sub-prediction result corresponding to each target semantic task, and adjust the network parameters of the multi-task network model according to the differences between the sub-prediction results and the corresponding training sub-label data, until current target network parameters corresponding to the current network structure are obtained; and
the network structure construction module 330 is configured to obtain the search space corresponding to the multi-task network model to form a differentiable network search space; obtain a validation set; adjust, according to the validation set, the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space; during the search, divide the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtain the sub-hidden-state vector corresponding to the current search in a preset order, and input the sub-hidden-state vector into the corresponding network layer for training, obtaining an updated multi-task network model; and return to the network parameter adjustment module, until the output of the multi-task network model on the validation set satisfies a convergence condition, yielding target structure parameters, obtain the network parameters matching the target structure parameters, and obtain the trained multi-task network model according to the target structure parameters and the matching network parameters.
In one embodiment, the network structure construction module 330 is further configured to search the differentiable network search space using at least one of the following sharing schemes: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of a mapping network across multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes having the same starting node into node sets, and sharing parameters among the operators corresponding to nodes in different node sets.
In one embodiment, the network parameter adjustment module 320 is further configured to segment the current training sub-text data corresponding to the current target semantic task into words, map each word to a corresponding vector to form a vector set, extract semantic features from the vector set through an encoder, and obtain the sub-prediction result corresponding to the current target semantic task according to the semantic features, where the current target semantic task is one of the target semantic tasks.
In one embodiment, the network parameter adjustment module 320 is further configured to compute the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database, to obtain similar sub-text data matching the current training sub-text data; input a first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and input a second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtain the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.
In one embodiment, the weights of the first encoder and the second encoder are shared.
In one embodiment, the network parameter adjustment module 320 is further configured to obtain the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain the sub-difference corresponding to each target semantic task; obtain the task weight corresponding to each target semantic task, and weight the sub-differences according to the task weights to obtain a statistical sub-difference; and adjust the network parameters of the multi-task network model according to the statistical sub-difference.
For the specific limitations of the network structure construction apparatus for multi-task scenarios, reference may be made to the limitations of the network structure construction method for multi-task scenarios above, which are not repeated here. The modules of the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and perform the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile or volatile storage medium and an internal memory. The non-volatile or volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile or volatile storage medium. The database of the computer device is used to store the training set. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions, when executed by the processor, implement a network structure construction method for multi-task scenarios.
Those skilled in the art can understand that the structure shown in FIG. 4 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. In some embodiments, the computer device may be a terminal.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps: obtaining a training set, where the training set includes training sub-samples corresponding to multiple different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data; inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the differences between the sub-prediction results and the corresponding training sub-label data, until current target network parameters corresponding to the current network structure are obtained; and obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a validation set, adjusting, according to the validation set, the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, during the search dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting the sub-hidden-state vector into the corresponding network layer for training to obtain an updated multi-task network model, returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model, until the output of the multi-task network model on the validation set satisfies a convergence condition, yielding target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining a trained multi-task network model according to the target structure parameters and the matching network parameters.
In one embodiment, when executing the computer-readable instructions, the processor further implements at least one of the following steps: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of a mapping network across multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes having the same starting node into node sets, and sharing parameters among the operators corresponding to nodes in different node sets.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: segmenting the current training sub-text data corresponding to the current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and extracting semantic features from the vector set through an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task according to the semantic features, where the current target semantic task is one of the target semantic tasks.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: computing the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database, to obtain similar sub-text data matching the current training sub-text data; inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.
In one embodiment, the weights of the first encoder and the second encoder are shared.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain the sub-difference corresponding to each target semantic task; obtaining the task weight corresponding to each target semantic task, and weighting the sub-differences according to the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
One or more computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining a training set, where the training set includes training sub-samples corresponding to multiple different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data; inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the differences between the sub-prediction results and the corresponding training sub-label data, until current target network parameters corresponding to the current network structure are obtained; and obtaining the search space corresponding to the multi-task network model to form a differentiable network search space, obtaining a validation set, adjusting, according to the validation set, the structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, during the search dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, inputting the sub-hidden-state vector into the corresponding network layer for training to obtain an updated multi-task network model, returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model, until the output of the multi-task network model on the validation set satisfies a convergence condition, yielding target structure parameters, obtaining the network parameters matching the target structure parameters, and obtaining a trained multi-task network model according to the target structure parameters and the matching network parameters.
The computer-readable storage media may be non-volatile or volatile.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement at least one of the following steps: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of a mapping network across multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes having the same starting node into node sets, and sharing parameters among the operators corresponding to nodes in different node sets.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: segmenting the current training sub-text data corresponding to the current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and extracting semantic features from the vector set through an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task according to the semantic features, where the current target semantic task is one of the target semantic tasks.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: computing the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database, to obtain similar sub-text data matching the current training sub-text data; inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.
In one embodiment, the weights of the first encoder and the second encoder are shared.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain the sub-difference corresponding to each target semantic task; obtaining the task weight corresponding to each target semantic task, and weighting the sub-differences according to the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
This application can be applied to smart government affairs and smart security, thereby promoting the construction of smart cities.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; nevertheless, as long as such combinations contain no contradiction, they should all be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this application patent shall be subject to the appended claims.

Claims (20)

  1. A network structure construction method for multi-task scenarios, comprising:
    obtaining a training set, wherein the training set comprises training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples comprise training sub-text data and training sub-label data;
    inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining a sub-prediction result corresponding to each target semantic task, and adjusting network parameters of the multi-task network model according to differences between the sub-prediction results and the corresponding training sub-label data, until current target network parameters corresponding to a current network structure are obtained; and
    obtaining a search space corresponding to the multi-task network model to form a differentiable network search space; obtaining a validation set; adjusting, according to the validation set, structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space; dividing, during the search, a hidden state vector of the multi-task network model into a plurality of ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, and inputting the sub-hidden-state vector into a corresponding network layer for training to obtain an updated multi-task network model; and returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model, until an output of the multi-task network model on the validation set satisfies a convergence condition to obtain target structure parameters, obtaining network parameters matching the target structure parameters, and obtaining a trained multi-task network model according to the target structure parameters and the matching network parameters.
  2. The method according to claim 1, wherein the differentiable network search space is searched using at least one of the following sharing schemes:
    sharing matrix parameters of multi-head attention in the differentiable network search space;
    sharing, when searching a pooling layer of the multi-task network model, parameters of a mapping network across a plurality of capsule-network-based operators; and
    obtaining connection relationships between nodes of the multi-task network model, grouping nodes having a same starting node into node sets, and sharing parameters among operators corresponding to nodes in different node sets.
  3. The method according to claim 1, wherein inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined and obtaining the sub-prediction result corresponding to each target semantic task comprises:
    segmenting current training sub-text data corresponding to a current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and
    extracting semantic features from the vector set through an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task according to the semantic features, wherein the current target semantic task is one of the target semantic tasks.
  4. The method according to claim 1, wherein inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined and obtaining the sub-prediction result corresponding to each target semantic task comprises:
    computing similarities between current training sub-text data corresponding to a current target semantic task and candidate texts in a database, to obtain similar sub-text data matching the current training sub-text data;
    inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract semantic features, obtaining a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract semantic features, obtaining a second semantic feature; and
    obtaining the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.
  5. The method according to claim 4, wherein weights of the first encoder and the second encoder are shared.
  6. The method according to claim 1, wherein adjusting the network parameters of the multi-task network model according to the differences between the sub-prediction results and the corresponding training sub-label data comprises:
    obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain a sub-difference corresponding to each target semantic task;
    obtaining a task weight corresponding to each target semantic task, and weighting the sub-differences according to the task weights to obtain a statistical sub-difference; and
    adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
  7. A network structure construction apparatus for multi-task scenarios, comprising:
    an acquisition module configured to obtain a training set, wherein the training set comprises training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples comprise training sub-text data and training sub-label data;
    a network parameter adjustment module configured to input the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtain a sub-prediction result corresponding to each target semantic task, and adjust network parameters of the multi-task network model according to differences between the sub-prediction results and the corresponding training sub-label data, until current target network parameters corresponding to a current network structure are obtained; and
    a network structure construction module configured to obtain a search space corresponding to the multi-task network model to form a differentiable network search space; obtain a validation set; adjust, according to the validation set, structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space; divide, during the search, a hidden state vector of the multi-task network model into a plurality of ordered sub-hidden-state vectors, obtain the sub-hidden-state vector corresponding to the current search in a preset order, and input the sub-hidden-state vector into a corresponding network layer for training to obtain an updated multi-task network model; and return to the network parameter adjustment module, until an output of the multi-task network model on the validation set satisfies a convergence condition to obtain target structure parameters, obtain network parameters matching the target structure parameters, and obtain a trained multi-task network model according to the target structure parameters and the matching network parameters.
  8. The apparatus according to claim 7, wherein the network structure construction module is further configured to search the differentiable network search space using at least one of the following sharing schemes:
    sharing matrix parameters of multi-head attention in the differentiable network search space;
    sharing, when searching a pooling layer of the multi-task network model, parameters of a mapping network across a plurality of capsule-network-based operators; and
    obtaining connection relationships between nodes of the multi-task network model, grouping nodes having a same starting node into node sets, and sharing parameters among operators corresponding to nodes in different node sets.
  9. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining a training set, wherein the training set comprises training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples comprise training sub-text data and training sub-label data;
    inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining a sub-prediction result corresponding to each target semantic task, and adjusting network parameters of the multi-task network model according to differences between the sub-prediction results and the corresponding training sub-label data, until current target network parameters corresponding to a current network structure are obtained; and
    obtaining a search space corresponding to the multi-task network model to form a differentiable network search space; obtaining a validation set; adjusting, according to the validation set, structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space; dividing, during the search, a hidden state vector of the multi-task network model into a plurality of ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, and inputting the sub-hidden-state vector into a corresponding network layer for training to obtain an updated multi-task network model; and returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model, until an output of the multi-task network model on the validation set satisfies a convergence condition to obtain target structure parameters, obtaining network parameters matching the target structure parameters, and obtaining a trained multi-task network model according to the target structure parameters and the matching network parameters.
  10. The computer device according to claim 9, wherein when executing the computer-readable instructions, the processor further performs at least one of the following steps:
    sharing matrix parameters of multi-head attention in the differentiable network search space;
    sharing, when searching a pooling layer of the multi-task network model, parameters of a mapping network across a plurality of capsule-network-based operators; and
    obtaining connection relationships between nodes of the multi-task network model, grouping nodes having a same starting node into node sets, and sharing parameters among operators corresponding to nodes in different node sets.
  11. The computer device according to claim 9, wherein when executing the computer-readable instructions, the processor further performs the following steps:
    segmenting current training sub-text data corresponding to a current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and
    extracting semantic features from the vector set through an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task according to the semantic features, wherein the current target semantic task is one of the target semantic tasks.
  12. The computer device according to claim 9, wherein when executing the computer-readable instructions, the processor further performs the following steps:
    computing similarities between current training sub-text data corresponding to a current target semantic task and candidate texts in a database, to obtain similar sub-text data matching the current training sub-text data;
    inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract semantic features, obtaining a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract semantic features, obtaining a second semantic feature; and
    obtaining the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.
  13. The computer device according to claim 12, wherein weights of the first encoder and the second encoder are shared.
  14. The computer device according to claim 9, wherein when executing the computer-readable instructions, the processor further performs the following steps:
    obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain a sub-difference corresponding to each target semantic task;
    obtaining a task weight corresponding to each target semantic task, and weighting the sub-differences according to the task weights to obtain a statistical sub-difference; and
    adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
  15. One or more computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a training set, wherein the training set comprises training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples comprise training sub-text data and training sub-label data;
    inputting the training sub-text data corresponding to each target semantic task step by step into a multi-task network model whose network structure is to be determined, obtaining a sub-prediction result corresponding to each target semantic task, and adjusting network parameters of the multi-task network model according to differences between the sub-prediction results and the corresponding training sub-label data, until current target network parameters corresponding to a current network structure are obtained; and
    obtaining a search space corresponding to the multi-task network model to form a differentiable network search space; obtaining a validation set; adjusting, according to the validation set, structure parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space; dividing, during the search, a hidden state vector of the multi-task network model into a plurality of ordered sub-hidden-state vectors, obtaining the sub-hidden-state vector corresponding to the current search in a preset order, and inputting the sub-hidden-state vector into a corresponding network layer for training to obtain an updated multi-task network model; and returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model, until an output of the multi-task network model on the validation set satisfies a convergence condition to obtain target structure parameters, obtaining network parameters matching the target structure parameters, and obtaining a trained multi-task network model according to the target structure parameters and the matching network parameters.
  16. The storage media according to claim 15, wherein the computer-readable instructions, when executed by the processor, further cause at least one of the following steps to be performed:
    sharing matrix parameters of multi-head attention in the differentiable network search space;
    sharing, when searching a pooling layer of the multi-task network model, parameters of a mapping network across a plurality of capsule-network-based operators; and
    obtaining connection relationships between nodes of the multi-task network model, grouping nodes having a same starting node into node sets, and sharing parameters among operators corresponding to nodes in different node sets.
  17. The storage media according to claim 15, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    segmenting current training sub-text data corresponding to a current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and
    extracting semantic features from the vector set through an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task according to the semantic features, wherein the current target semantic task is one of the target semantic tasks.
  18. The storage media according to claim 15, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    computing similarities between current training sub-text data corresponding to a current target semantic task and candidate texts in a database, to obtain similar sub-text data matching the current training sub-text data;
    inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract semantic features, obtaining a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract semantic features, obtaining a second semantic feature; and
    obtaining the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.
  19. The storage media according to claim 18, wherein weights of the first encoder and the second encoder are shared.
  20. The storage media according to claim 15, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain a sub-difference corresponding to each target semantic task;
    obtaining a task weight corresponding to each target semantic task, and weighting the sub-differences according to the task weights to obtain a statistical sub-difference; and
    adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
PCT/CN2020/099261 2020-05-28 2020-06-30 Network structure construction method and apparatus for multi-task scenarios WO2021114625A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010468557.3A CN111666763A (zh) 2020-05-28 2020-05-28 Network structure construction method and apparatus for multi-task scenarios
CN202010468557.3 2020-05-28

Publications (1)

Publication Number Publication Date
WO2021114625A1 true WO2021114625A1 (zh) 2021-06-17

Family

ID=72384884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099261 WO2021114625A1 (zh) 2020-05-28 2020-06-30 Network structure construction method and apparatus for multi-task scenarios

Country Status (2)

Country Link
CN (1) CN111666763A (zh)
WO (1) WO2021114625A1 (zh)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112134304B (zh) * 2020-09-22 2022-04-01 Southern Power Grid Digital Grid Research Institute Co., Ltd. Deep-learning-based fully automatic microgrid navigation method, system, and device
CN113407806B (zh) * 2020-10-12 2024-04-19 Tencent Technology (Shenzhen) Co., Ltd. Network structure search method, apparatus, device, and computer-readable storage medium
CN112232445B (zh) * 2020-12-11 2021-05-11 Beijing Century TAL Education Technology Co., Ltd. Training method and apparatus for a multi-label classification task network
CN113590849A (zh) * 2021-01-27 2021-11-02 Tencent Technology (Shenzhen) Co., Ltd. Multimedia resource classification model training method and multimedia resource recommendation method
CN112860534B (zh) * 2021-03-17 2022-10-25 Shanghai Biren Intelligent Technology Co., Ltd. Hardware architecture performance evaluation and performance optimization method and device
CN113377936B (zh) * 2021-05-25 2022-09-30 Hangzhou Souche Data Technology Co., Ltd. Intelligent question answering method, apparatus, and device
CN117668002B (zh) * 2024-02-01 2024-05-17 Jiangxi Heyi Cloud Data Technology Co., Ltd. Big data decision-making method, apparatus, and device applied to a public information platform


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016198749A1 (en) * 2015-06-12 2016-12-15 Turun Yliopisto Diagnostic biomarkers, clinical variables, and techniques for selecting and using them
CN109670191A (zh) * 2019-01-24 2019-04-23 Iol (Wuhan) Information Technology Co., Ltd. Calibration optimization method and apparatus for machine translation, and electronic device
CN109978142A (zh) * 2019-03-29 2019-07-05 Tencent Technology (Shenzhen) Co., Ltd. Neural network model compression method and apparatus
CN110175671A (zh) * 2019-04-28 2019-08-27 Huawei Technologies Co., Ltd. Neural network construction method, image processing method, and apparatus
CN110210609A (zh) * 2019-06-12 2019-09-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Model training method, apparatus, and terminal based on neural architecture search
CN110851566A (zh) * 2019-11-04 2020-02-28 Shenyang Yayi Network Technology Co., Ltd. Improved differentiable network structure search method

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609266A (zh) * 2021-07-09 2021-11-05 Alibaba Singapore Holdings Pte. Ltd. Resource processing method and apparatus
CN113254792A (zh) * 2021-07-15 2021-08-13 Tencent Technology (Shenzhen) Co., Ltd. Method for training a recommendation probability prediction model, recommendation probability prediction method, and apparatus
CN113254792B (zh) * 2021-07-15 2021-11-02 Tencent Technology (Shenzhen) Co., Ltd. Method for training a recommendation probability prediction model, recommendation probability prediction method, and apparatus
CN113486672A (zh) * 2021-07-27 2021-10-08 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Polyphone disambiguation method, electronic device, and computer-readable storage medium
CN113627153A (zh) * 2021-07-30 2021-11-09 Ping An Puhui Enterprise Management Co., Ltd. Data processing method, apparatus, device, and storage medium
CN113627153B (zh) * 2021-07-30 2023-10-27 Hunan Ti'ao Medical Technology Co., Ltd. Data processing method, apparatus, device, and storage medium
CN113688237A (zh) * 2021-08-10 2021-11-23 Beijing Xiaomi Mobile Software Co., Ltd. Text classification method, and training method and apparatus for a text classification network
CN113688237B (zh) * 2021-08-10 2024-03-05 Beijing Xiaomi Mobile Software Co., Ltd. Text classification method, and training method and apparatus for a text classification network
CN113849314A (zh) * 2021-09-30 2021-12-28 Alipay (Hangzhou) Information Technology Co., Ltd. Data processing model deployment method and apparatus
CN113987324A (zh) * 2021-10-21 2022-01-28 Beijing Dajia Internet Information Technology Co., Ltd. Data processing method, apparatus, device, and storage medium
CN114024587A (zh) * 2021-10-29 2022-02-08 Beijing University of Posts and Telecommunications Feedback network encoder, architecture, and training method based on shared fully connected layers
CN114760639A (zh) * 2022-03-30 2022-07-15 Shenzhen Lianzhou International Technology Co., Ltd. Resource unit allocation method, apparatus, device, and storage medium
CN115690544A (zh) * 2022-11-11 2023-02-03 Beijing Baidu Netcom Science and Technology Co., Ltd. Multi-task learning method and apparatus, electronic device, and medium
CN115690544B (zh) * 2022-11-11 2024-03-01 Beijing Baidu Netcom Science and Technology Co., Ltd. Multi-task learning method and apparatus, electronic device, and medium
CN115859121A (zh) * 2023-01-29 2023-03-28 Youmi Technology Co., Ltd. Text processing model training method and apparatus
CN116824305A (zh) * 2023-08-09 2023-09-29 China Meteorological Service Association Ecological environment monitoring data processing method and system applied to cloud computing
CN117035694B (zh) * 2023-10-08 2024-01-26 Shenzhen Chenpusen Information Technology Co., Ltd. BIM system management method, apparatus, and computer device
CN117035694A (zh) * 2023-10-08 2023-11-10 Shenzhen Chenpusen Information Technology Co., Ltd. BIM system management method, apparatus, and computer device
CN117011821A (zh) * 2023-10-08 2023-11-07 Dongfeng Yuexiang Technology Co., Ltd. Autonomous driving visual perception method and system based on multi-task learning
CN117235119A (zh) * 2023-11-09 2023-12-15 Beijing Guqi Data Technology Co., Ltd. Method for multi-table joint query on a low-code platform
CN117235119B (zh) * 2023-11-09 2024-01-30 Beijing Guqi Data Technology Co., Ltd. Method for multi-table joint query on a low-code platform
CN117252560A (zh) * 2023-11-20 2023-12-19 Shenzhen Yingzhitai Education Technology Co., Ltd. Government informatization system assistance method and components thereof
CN117252560B (zh) * 2023-11-20 2024-03-19 Shenzhen Yingzhitai Education Technology Co., Ltd. Government informatization system assistance method and components thereof
CN117574179A (zh) * 2024-01-16 2024-02-20 Beijing Qudong Intelligent Technology Co., Ltd. Multi-task learning model construction method and apparatus
CN117574179B (zh) * 2024-01-16 2024-05-28 Beijing Qudong Intelligent Technology Co., Ltd. Multi-task learning model construction method and apparatus
CN117909887A (zh) * 2024-03-19 2024-04-19 State Grid Shandong Electric Power Company, Jiaxiang County Power Supply Company Multiple lightning early-warning method and system for a power grid
CN117909887B (zh) * 2024-03-19 2024-05-31 State Grid Shandong Electric Power Company, Jiaxiang County Power Supply Company Multiple lightning early-warning method and system for a power grid

Also Published As

Publication number Publication date
CN111666763A (zh) 2020-09-15

Similar Documents

Publication Publication Date Title
WO2021114625A1 (zh) Network structure construction method and apparatus for multi-task scenarios
EP3711000B1 (en) Regularized neural network architecture search
CN111797893B (zh) Neural network training method, image classification system, and related device
US9807473B2 (en) Jointly modeling embedding and translation to bridge video and language
WO2019100724A1 (zh) Method and apparatus for training a multi-label classification model
WO2019100723A1 (zh) Method and apparatus for training a multi-label classification model
WO2021022521A1 (zh) Data processing method, and method and device for training a neural network model
WO2021159714A1 (zh) Data processing method and related device
US9754188B2 (en) Tagging personal photos with deep networks
WO2021218517A1 (zh) Method for obtaining a neural network model, image processing method, and apparatus
WO2022042123A1 (zh) Image recognition model generation method and apparatus, computer device, and storage medium
WO2019232772A1 (en) Systems and methods for content identification
WO2021184902A1 (zh) Image classification method and apparatus, training method and apparatus therefor, device, and medium
CN110598869B (zh) Sequence-model-based classification method and apparatus, and electronic device
CN110598210B (zh) Entity recognition model training and entity recognition method, apparatus, device, and medium
CN112307048B (zh) Semantic matching model training method, matching method, apparatus, device, and storage medium
WO2023051369A1 (zh) Neural network acquisition method, data processing method, and related device
JP2022117452A (ja) ネットワークモチーフ解析を使用したグラフベース予測の説明
CN114492601A (zh) Training method and apparatus for a resource classification model, electronic device, and storage medium
CN113128622B (zh) Multi-label classification method and system based on semantic-label multi-granularity attention
US20200167655A1 (en) Method and apparatus for re-configuring neural network
WO2022063076A1 (zh) Adversarial example recognition method and apparatus
CN113283575A (zh) Processor for reconstructing an artificial neural network, operating method thereof, and electrical device
CN114445692B (zh) Image recognition model construction method, apparatus, computer device, and storage medium
CN114638823B (zh) Whole-slide image classification method and apparatus based on an attention-mechanism sequence model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20898787

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20898787

Country of ref document: EP

Kind code of ref document: A1