CN110046709A - A multi-task learning model based on bidirectional LSTM - Google Patents

A multi-task learning model based on bidirectional LSTM

Info

Publication number
CN110046709A
Authority
CN
China
Prior art keywords
lstm
network
weight
input
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910326878.7A
Other languages
Chinese (zh)
Inventor
韩景光
赵小诣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu New Hope Finance Information Co Ltd
Original Assignee
Chengdu New Hope Finance Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu New Hope Finance Information Co Ltd
Priority to CN201910326878.7A
Publication of CN110046709A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods

Abstract

The present invention relates to the deep learning area of the artificial intelligence field and discloses a multi-task learning model based on bidirectional LSTM, intended to complete natural language processing tasks such as part-of-speech tagging, chunking and named entity recognition at the same time. The main scheme includes: S1, define a single long short-term memory (LSTM) neural network; S2, define the application of the bidirectional LSTM, in which one LSTM network reads the input data sequence from left to right (the L2R network) and outputs h_t^{L2R}, while another LSTM network reads the input data sequence from right to left (the R2L network) and outputs h_t^{R2L}; S3, merge the outputs of the L2R and R2L networks, i.e. H_t = h_t^{L2R} + h_t^{R2L}; S4, merge the output-layer results to obtain the output of each word-level subtask. The present invention is applicable not only to the natural language learning field but can also be used in other fields.

Description

A multi-task learning model based on bidirectional LSTM
Technical field
A multi-task learning model based on bidirectional LSTM, belonging to the deep learning scope of the artificial intelligence field.
Background technique
Deep learning is part of the broader family of machine learning methods based on learning data representations. Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis and other fields. The results produced by deep learning architectures are comparable to those of human experts and, in some cases, even better.
Like machine learning in general, deep learning can be divided into two kinds: supervised learning and unsupervised learning. In recent years, deep learning technology has developed rapidly along with the growth of computing power and has achieved excellent application results in fields such as information recognition and recommendation engines. At the same time, abundant experimental results show that deep learning models have good robustness and generalization ability.
At present, traditional classification models are all built as a single model for a single task. In a production environment, this modeling approach consumes a great deal of manpower and computing power to repeatedly build models for similar tasks. For complex problems, the conventional approach is to decompose the complex problem into simple, mutually independent subproblems, model and solve each of them separately, and then merge the results to obtain the result of the original problem. However, many problems in a production environment cannot be decomposed into independent subproblems, and even when they can, strong correlations still exist between the subproblems.
Summary of the invention
To address the above problems in production environments, the present invention provides a multi-task learning model framework based on bidirectional LSTM. It solves the problem that a traditional single-task model can only produce one model for one task per training run, and it enhances model performance across related tasks; compared with the prior art, this patent provides a new method of realizing multi-task processing. In addition, the model of the present invention can handle, but is not limited to, multiple word-level tasks and sentence-level tasks in natural language processing, and can also cope with a mixture of word-level and sentence-level tasks.
In order to achieve the above objectives, the present invention adopts the following technical scheme:
One, for multiple word-level or multiple sentence-level tasks
S1, first define a single LSTM neural network as follows:
f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \cdot c_{t-1} + i_t \cdot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t \cdot \sigma_h(c_t)
where x_t is the input at time t; W denotes the input weight corresponding to each output, i.e. W_f is the input weight corresponding to the output f_t at time t, W_i corresponds to i_t, W_o corresponds to o_t, and W_c corresponds to c_t; U denotes the output weight corresponding to each output, i.e. U_f is the output weight corresponding to f_t, U_i corresponds to i_t, U_o corresponds to o_t, and U_c corresponds to c_t; b is the bias corresponding to each output; and \sigma is the ReLU activation function.
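For illustration, a minimal numerical sketch of one step of the LSTM defined above follows, with ReLU used for every activation as this patent specifies. The function and variable names, the hidden size and the random initialization are assumptions made for the example rather than anything prescribed by the patent.

```python
# Minimal sketch of one LSTM step following the equations above; ReLU stands in
# for sigma_g, sigma_c and sigma_h as the patent states. Names, sizes and the
# random initialization are illustrative assumptions.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step; W, U, b are dicts keyed by gate name 'f', 'i', 'o', 'c'."""
    f_t = relu(W["f"] @ x_t + U["f"] @ h_prev + b["f"])          # forget gate
    i_t = relu(W["i"] @ x_t + U["i"] @ h_prev + b["i"])          # input gate
    o_t = relu(W["o"] @ x_t + U["o"] @ h_prev + b["o"])          # output gate
    c_t = f_t * c_prev + i_t * relu(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    h_t = o_t * relu(c_t)                                        # hidden state
    return h_t, c_t

# Illustrative dimensions: input size 8, hidden size 16.
rng = np.random.default_rng(0)
d_in, d_h = 8, 16
W = {k: rng.normal(scale=0.1, size=(d_h, d_in)) for k in "fioc"}
U = {k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in "fioc"}
b = {k: np.zeros(d_h) for k in "fioc"}
h_t, c_t = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```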
S2, define the application of the bidirectional LSTM (Bi-LSTM): one LSTM network reads the input data sequence from left to right (the L2R network) and outputs h_t^{L2R}; another LSTM network reads the input data sequence from right to left (the R2L network) and outputs h_t^{R2L}. The concrete operations are as follows:
S2.1, set up the neural network input layer and configure the initial weights;
S2.2, set up the neural network hidden layer: the hidden layer is set to one or two layers of long short-term memory network and is trained together with the initial weights from S2.1 to obtain the trained hidden-layer weights;
S2.3, set up the output layer and output the model result in combination with the hidden-layer weights from S2.2;
S2.4, feed the input data into the LSTM model in left-to-right order and perform steps S2.1 to S2.3 to obtain the output result h_t^{L2R};
S2.5, feed the input data into the LSTM model in right-to-left order and perform steps S2.1 to S2.3 to obtain the output result h_t^{R2L}.
S3, merge the output results of the L2R and R2L networks to form the Bi-LSTM model, i.e. H_t = h_t^{L2R} + h_t^{R2L}. The significance of using the Bi-LSTM network is that it can learn historical and future information from both the forward and the backward direction of the input sentence, which helps to improve the predictive ability for downstream tasks.
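A minimal sketch of the S2-S3 wiring follows: one LSTM pass from left to right, one from right to left, merged per position as H_t = h_t^{L2R} + h_t^{R2L}. The step function repeats the single-step sketch above; the parameter shapes and initialization are illustrative assumptions, and the elementwise sum is one merge consistent with the formula, not the only possible one.

```python
# Minimal sketch of the Bi-LSTM of S2-S3: an L2R pass, an R2L pass, and a
# per-position merge H_t = h_t^L2R + h_t^R2L. Shapes and initialization are
# illustrative assumptions, not prescribed by the patent.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = relu(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    i = relu(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    o = relu(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c = f * c_prev + i * relu(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    return o * relu(c), c

def make_params(rng, d_in, d_h):
    W = {k: rng.normal(scale=0.1, size=(d_h, d_in)) for k in "fioc"}
    U = {k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in "fioc"}
    b = {k: np.zeros(d_h) for k in "fioc"}
    return W, U, b

def run_lstm(xs, params, d_h):
    """Run the LSTM over a list of input vectors; return the list of h_t."""
    h, c = np.zeros(d_h), np.zeros(d_h)
    outs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, *params)
        outs.append(h)
    return outs

def bi_lstm(xs, l2r_params, r2l_params, d_h):
    h_l2r = run_lstm(xs, l2r_params, d_h)              # left-to-right pass (L2R)
    h_r2l = run_lstm(xs[::-1], r2l_params, d_h)[::-1]  # right-to-left pass (R2L), re-aligned
    return [a + b for a, b in zip(h_l2r, h_r2l)]       # H_t = h_t^L2R + h_t^R2L

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
xs = [rng.normal(size=d_in) for _ in range(5)]         # a 5-token input sequence
H = bi_lstm(xs, make_params(rng, d_in, d_h), make_params(rng, d_in, d_h), d_h)
```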
S4, use the output of the Bi-LSTM as the input of the lower-level subtasks. At this stage any number of fully connected neural network models can be attached, and the output of each subtask is passed on to the other subtasks. The significance of this operation is that the Bi-LSTM neural network and the output data of the other subtasks provide the fully connected models with automated feature selection and feature-weight assignment, so that automatic parameter tuning is achieved through training iterations. Unlike the manually driven feature-selection process in the traditional sense, the model can discover internal relationships in the data through iteration and find fine-grained features that are hard to detect manually, which significantly improves model performance. At the same time, because the outputs of the other subtasks serve as inputs of the current subtask and certain task correlations exist between the subtasks, this operation provides stronger feature references for each subtask and increases the synergy between tasks. In addition, the increase in the number of features also increases the data noise of the subtask models to a certain extent, which gives the trained models stronger generalization ability.
S4, merge the output-layer results to obtain the output of each word-level subtask. Taking three subtasks as an example, the following formula is obtained:
y_m = softmax(W_m H_t + b_m), m = 1, 2, 3
where m indexes the different subtasks, b is the bias, and softmax is the softmax function.
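A minimal sketch of the word-level multi-task output layer of S4 follows, under the assumption (consistent with the description above) that each head computes y_m = softmax(W_m z + b_m) over the merged representation and that each subtask's output is appended to the features seen by the later subtasks. The class counts, dimensions and initialization are illustrative assumptions.

```python
# Minimal sketch of the S4 word-level heads: y_m = softmax(W_m z + b_m), with
# each subtask's output concatenated onto the features of the next subtask.
# Class counts, dimensions and initialization are illustrative assumptions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_task_heads(H_t, heads):
    """heads: list of (W_m, b_m); later heads also see earlier heads' outputs."""
    feats, outputs = H_t, []
    for W_m, b_m in heads:
        y_m = softmax(W_m @ feats + b_m)
        outputs.append(y_m)
        feats = np.concatenate([feats, y_m])   # pass y_m to the next subtask
    return outputs

rng = np.random.default_rng(0)
d_h = 16                                       # size of the merged H_t
n_classes = [45, 23, 9]                        # e.g. POS, chunk, NER label counts (illustrative)
heads, d = [], d_h
for k in n_classes:
    heads.append((rng.normal(scale=0.1, size=(k, d)), np.zeros(k)))
    d += k                                     # the next head also sees this head's output
y_pos, y_chunk, y_ner = multi_task_heads(rng.normal(size=d_h), heads)
```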
Two, for word-level and sentence-level tasks existing simultaneously
S1, as in the case of multiple word-level or multiple sentence-level tasks, first define a single LSTM neural network as follows:
f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \cdot c_{t-1} + i_t \cdot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t \cdot \sigma_h(c_t)
where x_t is the input at time t; W denotes the input weight corresponding to each output, i.e. W_f is the input weight corresponding to the output f_t at time t, W_i corresponds to i_t, W_o corresponds to o_t, and W_c corresponds to c_t; U denotes the output weight corresponding to each output, i.e. U_f is the output weight corresponding to f_t, U_i corresponds to i_t, U_o corresponds to o_t, and U_c corresponds to c_t; b is the bias corresponding to each output; and \sigma is the ReLU activation function.
S2, define the application of the bidirectional LSTM (Bi-LSTM): one LSTM network reads the input data sequence from left to right (the L2R network) and outputs h_t^{L2R}; another LSTM network reads the input data sequence from right to left (the R2L network) and outputs h_t^{R2L}. The concrete operations are as follows:
S2.1, set up the neural network input layer and configure the initial weights;
S2.2, set up the neural network hidden layer: the hidden layer is set to one or two layers of long short-term memory network and is trained together with the initial weights from S2.1 to obtain the trained hidden-layer weights;
S2.3, set up the output layer and output the model result in combination with the hidden-layer weights from S2.2;
S2.4, feed the input data into the LSTM model in left-to-right order and perform steps S2.1 to S2.3 to obtain the output result h_t^{L2R};
S2.5, feed the input data into the LSTM model in right-to-left order and perform steps S2.1 to S2.3 to obtain the output result h_t^{R2L}.
S3, merge the output results of the L2R and R2L networks to form the Bi-LSTM model, i.e. H_t = h_t^{L2R} + h_t^{R2L}.
S4, merge the output-layer results to obtain the output of each word-level subtask; taking three subtasks as an example, the following formula is obtained:
y_m = softmax(W_m H_t + b_m), m = 1, 2, 3
S5, construct the input of each sentence-level subtask as follows:
x'_t = H_t + y_m + y_{m+1} + y_{m+2}
S6, use the input constructed in S5 as the input of the next-stage task, and repeat S1 to S4 to obtain the output of the next-stage task.
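A minimal sketch of S5-S6 follows: the per-token input of the sentence-level stage is built from the shared Bi-LSTM output H_t and the word-level subtask outputs and then fed to a second Bi-LSTM plus multi-task module (a repetition of S1 to S4). The formula writes the combination as a sum; because the subtask outputs generally have a different dimension than H_t, the sketch concatenates them instead, which is an assumption of this example rather than the patent's prescription.

```python
# Minimal sketch of S5: build the sentence-level stage input from H_t and the
# word-level subtask outputs y_m, y_{m+1}, y_{m+2}. The patent writes the
# combination as a sum; this sketch concatenates because the vectors usually
# have different dimensions (an illustrative assumption).
import numpy as np

def build_sentence_stage_input(H, word_level_outputs):
    """H: list of per-token vectors H_t; word_level_outputs: per-token lists of
    the word-level subtask probability vectors [y_m, y_{m+1}, y_{m+2}]."""
    return [np.concatenate([h_t] + list(ys_t)) for h_t, ys_t in zip(H, word_level_outputs)]

# Illustrative shapes: 5 tokens, hidden size 16, three word-level heads.
rng = np.random.default_rng(0)
H = [rng.normal(size=16) for _ in range(5)]
word_outs = [[rng.dirichlet(np.ones(k)) for k in (45, 23, 9)] for _ in range(5)]
x_prime = build_sentence_stage_input(H, word_outs)     # each x'_t has dim 16 + 45 + 23 + 9
# Per S6, x_prime would replace the raw token inputs of a second Bi-LSTM
# (repeat S1 to S4), whose merged output feeds the sentence-level subtask heads.
```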
Because the above technical scheme is adopted, the present invention has the following beneficial effects:
One, this patent uses two LSTM models at the same time in the front-end model: one LSTM model reads the forward order of the input data (left-to-right input) and the other LSTM model reads the reverse order of the input data (right-to-left input). This operation allows the neural network to fully learn the contextual information of the history and, at the same time, the contextual information of the future.
Two, a model-combination strategy similar to Boosting is used in the front-end model: the two LSTM neural networks that read the input data in different directions are combined, and the combined result serves as the input of the intermediate module.
Three, this patent is also original in the internal structure of the LSTM network: in order to simplify the LSTM network structure, the ReLU activation function is adopted for all activations. At the same time, this operation improves the calculation speed.
Four, the model described in this patent has strong extensibility, specifically:
4.1, when the model structure is bidirectional LSTM model + multi-task model, the model can simultaneously complete the multiple word-level tasks corresponding to the multi-task model.
4.2, when the model structure is bidirectional LSTM model + multi-task model + bidirectional LSTM model + multi-task model, the model can simultaneously complete word-level tasks and sentence-level tasks.
4.3, the multi-task models share weights: the output weights of the first subtask model are shared with the second subtask model in a certain proportion, and so on; one possible reading of this weight sharing is sketched in the example after this list. At the same time, the model output of the first subtask serves as an input feature of the second subtask model. This operation provides data features of more dimensions for the subsequent subtask models and, analogously to boosting in ensemble learning, improves the predictive ability of the model as a whole. In addition, because the specific tasks of the subtask models differ, their outputs can to a certain extent be regarded as data noise in the input of the lower-level subtask models, which enhances the generalization ability of the subtask models.
4.4, in Part Two of this patent (for word-level and sentence-level tasks existing simultaneously), the significance of S6 is as follows:
4.4.1, the integrated output of the upper multi-task module serves as the input of the Bi-LSTM of the next multi-task module. At this stage the Bi-LSTM, acting as the link between the two multi-task modules, re-samples features from the input data and passes the sampled results to each subtask model of the next multi-task module. The significance of this operation is that the Bi-LSTM neural network performs automated feature selection and feature-weight assignment again over the output of the upper multi-task module's Bi-LSTM model and the outputs of the other fully connected subtask models, and automatic parameter tuning is achieved through training iterations. Unlike the manually driven feature-selection process in the traditional sense, the model can discover internal relationships in the data through iteration and find fine-grained features that are hard to detect manually, significantly improving model performance.
4.4.2, because the lower multi-task module likewise uses a Bi-LSTM model as the link between the two multi-task modules, the subsequent multi-task module can also refer to the historical and future information of the upper multi-task module and can dynamically adjust the weight configuration of each model in the lower multi-task module.
4.4.3, this operation also differs from the conventional multi-layer single-task Bi-LSTM model structure. Because the output of the upper multi-task module serves as the input of each subtask of the lower multi-task module, and because certain task correlations exist between the subtasks, this operation provides stronger feature references for the lower multi-task module and increases the synergy between subtasks. In addition, the increase in the number of features also increases the data noise of the lower multi-task model to a certain extent, which gives the trained model stronger generalization ability.
4.4.4, compared with the model of this patent, the conventional two-layer single-task Bi-LSTM model lacks the intermediate subtask modules. This causes the second-layer Bi-LSTM model of the conventional two-layer single-task Bi-LSTM to over-sample the output of the upper Bi-LSTM model (because the output of the upper Bi-LSTM model is strongly correlated with the prediction result), so that the conventional two-layer single-task Bi-LSTM model generalizes poorly to the task. By constructing the subtask modules, the model of this patent explicitly introduces data that are not obviously correlated with the prediction result as input features of the second-layer Bi-LSTM model, which largely avoids the problems of the conventional two-layer single-task Bi-LSTM and improves the generalization ability of the model.
4.5, compared with the patent document "CN109375776A - EEG signal action intention recognition method based on a multi-task RNN model", which uses its model only for multi-classification tasks in the multi-task stage, the model required by this patent can be used not only for multi-classification tasks (word-level tasks) but also for non-classification tasks, such as text emotion recognition, text output prediction and spelling correction.
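As referenced in 4.3, a minimal sketch of one possible reading of the proportional output-weight sharing follows: the slice of each later head's weight matrix that acts on the shared features H_t is pulled toward a fixed proportion alpha of the previous head's corresponding slice through a soft-sharing penalty added to the training loss. This is an interpretation for illustration only; the patent states only that output weights are shared "in a certain proportion", and alpha, the shapes and the penalty form are assumptions.

```python
# Sketch of one possible reading of 4.3: a soft-sharing penalty that ties the
# H_t-facing slice of each later head's weights to alpha times the previous
# head's slice. alpha, the shapes and the penalty form are assumptions.
import numpy as np

def soft_sharing_penalty(head_weights, d_h, alpha=0.5):
    """head_weights: list of W_m whose first d_h columns act on the shared H_t.
    Returns the summed squared deviation of W_{m+1} from alpha * W_m on that
    slice, over the rows the two heads have in common."""
    penalty = 0.0
    for W_prev, W_next in zip(head_weights, head_weights[1:]):
        rows = min(W_prev.shape[0], W_next.shape[0])
        diff = W_next[:rows, :d_h] - alpha * W_prev[:rows, :d_h]
        penalty += float((diff ** 2).sum())
    return penalty

rng = np.random.default_rng(0)
d_h = 16
W1 = rng.normal(size=(45, d_h))            # first subtask head (illustrative)
W2 = rng.normal(size=(23, d_h + 45))       # second head also sees the first head's output
W3 = rng.normal(size=(9, d_h + 45 + 23))   # third head sees both earlier outputs
reg = soft_sharing_penalty([W1, W2, W3], d_h)   # would be added to the training loss
```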
Brief description of the drawings
Fig. 1 shows the construction and internal structure of the LSTM model used in the present invention;
Fig. 2 shows the word-level / sentence-level multi-task model structure of the present invention;
Fig. 3 shows the combined word-level and sentence-level multi-task model structure of the present invention.
Specific embodiment
The model is described in detail below in conjunction with the accompanying drawings:
The present invention provides a multi-task learning model based on bidirectional LSTM, characterized by the following three points:
S1.1, a unified multi-task learning model is proposed that uses a minimal number of RNN (recurrent neural network) layers (one or two layers) for word-level and sentence-level natural language processing tasks.
S1.2, the model can be used for multiple word-level tasks (such as POS, Chunk, NER) or for multiple sentence-level tasks (such as sentiment analysis); it can also learn tasks of both types together.
S1.3, keeping the number of RNN/LSTM neural network layers small and changing only the output layer accelerates training.
The model in S1.1 uses a small number of neural network layers in order to cope with different types of natural language processing tasks; using fewer neural network layers speeds up model training.
Unlike a traditional single-task model, in which one model can complete only one task, the model in S1.2 of claim 1 can learn word-level or sentence-level tasks in natural language processing jointly. Because the jointly learned tasks are correlated, some factors can be shared between the tasks (including, but not limited to, the input features and model parameters of the neural network), which improves the effect and the generalization performance of each task in the model.
Embodiment
One, for multiple word-level or multiple sentence-level tasks
S1, as shown in Fig. 1, first define a single LSTM neural network as follows:
f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \cdot c_{t-1} + i_t \cdot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t \cdot \sigma_h(c_t)
S2, define the application of the bidirectional LSTM (Bi-LSTM): one LSTM network reads the input data sequence from left to right (the L2R network) and outputs h_t^{L2R}; another LSTM network reads the input data sequence from right to left (the R2L network) and outputs h_t^{R2L};
S3, merge the outputs of the L2R and R2L networks, i.e. H_t = h_t^{L2R} + h_t^{R2L}, the part shown by "+" in Fig. 2;
S4, merge the output-layer results to obtain the output of each word-level subtask; taking three subtasks as an example, the following formula is obtained:
y_m = softmax(W_m H_t + b_m), m = 1, 2, 3
Two, for word-level and sentence-level tasks existing simultaneously
S1, as in the case of multiple word-level or multiple sentence-level tasks, first define a single LSTM neural network (as shown in Fig. 1) as follows:
f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \cdot c_{t-1} + i_t \cdot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t \cdot \sigma_h(c_t)
S2, define the application of the bidirectional LSTM (Bi-LSTM): one LSTM network reads the input data sequence from left to right (the L2R network) and outputs h_t^{L2R}; another LSTM network reads the input data sequence from right to left (the R2L network) and outputs h_t^{R2L};
S3, merge the outputs of the L2R and R2L networks, i.e. H_t = h_t^{L2R} + h_t^{R2L}, the part shown by "+" at the bottom of Fig. 3;
S4, merge the output-layer results to obtain the output of each word-level subtask; taking three subtasks as an example, the following formula is obtained:
y_m = softmax(W_m H_t + b_m), m = 1, 2, 3
S5, the part shown by "+" at the top of Fig. 3: construct the input of each sentence-level subtask as follows:
x'_t = H_t + y_m + y_{m+1} + y_{m+2}
S6, use the input constructed in S5 as the input of the next-stage task, and repeat S1 to S4 to obtain the output of the next-stage task.

Claims (3)

1. A multi-task learning model based on bidirectional LSTM, characterized by comprising the following steps:
S1, define a single LSTM neural network;
S2, define the application of the bidirectional LSTM: one LSTM network reads the input data sequence from left to right (LSTM network L2R) and outputs h_t^{L2R}; another LSTM network reads the input data sequence from right to left (LSTM network R2L) and outputs h_t^{R2L};
S3, merge the outputs of the LSTM network L2R and the LSTM network R2L, i.e. H_t = h_t^{L2R} + h_t^{R2L};
S4, merge the output-layer results to obtain the output of each word-level subtask; taking three subtasks as an example, the following formula is obtained:
y_m = softmax(W_m H_t + b_m), m = 1, 2, 3
where m indexes the different subtasks, b is the bias, and softmax is the softmax function;
S5, construct the input of each sentence-level subtask as follows:
x'_t = H_t + y_m + y_{m+1} + y_{m+2}
S6, use the input constructed in S5 as the input of the next-stage task, and repeat S1 to S4 to obtain the output of the next-stage task;
For multiple word-level or multiple sentence-level tasks, steps S1 to S4 are executed; for word-level and sentence-level tasks existing simultaneously, steps S1 to S6 are executed.
2. The multi-task learning model based on bidirectional LSTM according to claim 1, characterized in that the single LSTM neural network is defined as follows:
f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \cdot c_{t-1} + i_t \cdot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t \cdot \sigma_h(c_t)
where x_t is the input at time t; W denotes the input weight corresponding to each output, i.e. W_f is the input weight corresponding to the output f_t at time t, W_i corresponds to i_t, W_o corresponds to o_t, and W_c corresponds to c_t; U denotes the output weight corresponding to each output, i.e. U_f is the output weight corresponding to f_t, U_i corresponds to i_t, U_o corresponds to o_t, and U_c corresponds to c_t; b is the bias corresponding to each output; and \sigma is the ReLU activation function.
3. The multi-task learning model based on bidirectional LSTM according to claim 1, characterized in that step S2 comprises the following steps:
S2.1, set up the neural network input layer and configure the initial weights;
S2.2, set up the neural network hidden layer: the hidden layer is set to one or two layers of long short-term memory network and is trained together with the initial weights from S2.1 to obtain the trained hidden-layer weights;
S2.3, set up the output layer and output the model result in combination with the hidden-layer weights from S2.2;
S2.4, feed the input data into the LSTM model in left-to-right order and perform steps S2.1 to S2.3 to obtain the output result h_t^{L2R};
S2.5, feed the input data into the LSTM model in right-to-left order and perform steps S2.1 to S2.3 to obtain the output result h_t^{R2L}.
CN201910326878.7A 2019-04-22 2019-04-22 A multi-task learning model based on bidirectional LSTM Pending CN110046709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910326878.7A CN110046709A (en) 2019-04-22 2019-04-22 A multi-task learning model based on bidirectional LSTM


Publications (1)

Publication Number Publication Date
CN110046709A true CN110046709A (en) 2019-07-23

Family

ID=67278590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910326878.7A Pending CN110046709A (en) 2019-04-22 2019-04-22 A multi-task learning model based on bidirectional LSTM

Country Status (1)

Country Link
CN (1) CN110046709A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229582A * 2018-02-01 2018-06-29 浙江大学 A multi-task named entity recognition dual training method for the medical domain
CN109460466A * 2018-09-20 2019-03-12 电子科技大学 A method for analyzing implicit inter-sentence relationships based on a multi-task bidirectional long short-term memory network
CN109375776A * 2018-10-30 2019-02-22 东北师范大学 EEG signal action intention recognition method based on a multi-task RNN model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
廖祥文 et al., "基于多任务迭代学习的论辩挖掘方法" (An argumentation mining method based on multi-task iterative learning), 《计算机学报》 (Chinese Journal of Computers) *
郭江, "基于分布表示的跨语言跨任务自然语言分析" (Cross-lingual and cross-task natural language analysis based on distributed representations), 《中国博士学位论文全文数据库 信息科技辑》 (China Doctoral Dissertations Full-text Database, Information Science and Technology) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209738A (en) * 2019-12-31 2020-05-29 浙江大学 Multi-task named entity recognition method combining text classification
CN112149119A (en) * 2020-09-27 2020-12-29 苏州遐视智能科技有限公司 Dynamic active security defense method and system for artificial intelligence system and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190723)