CN112395876A - Discourse relation recognition method and device based on knowledge distillation and multi-task learning - Google Patents

Discourse relation recognition method and device based on knowledge distillation and multi-task learning

Info

Publication number: CN112395876A
Authority: CN (China)
Prior art keywords: model, cost function, classification, discourse relation, layer
Legal status: Granted; Active
Application number: CN202110078740.7A
Other languages: Chinese (zh)
Other versions: CN112395876B (en)
Inventors: 邬昌兴, 谢子若
Current Assignee: East China Jiaotong University
Original Assignee: East China Jiaotong University
Priority / filing date: 2021-01-21
Application filed by East China Jiaotong University

Classifications

    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates (G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities)
    • G06F40/30: Semantic analysis (G06F40/00 Handling natural language data)
    • G06N20/20: Ensemble learning (G06N20/00 Machine learning)
    • G06N3/044: Recurrent networks, e.g. Hopfield networks (G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/045: Combinations of networks (G06N3/04 Architecture, e.g. interconnection topology)


Abstract

The invention provides a discourse relation recognition method and device based on knowledge distillation and multi-task learning. On one hand, knowledge is shared between the connective classification auxiliary task and the implicit discourse relation recognition main task through parameter sharing; on the other hand, knowledge in the connective-enhanced teacher model is transferred from the feature layer and the classification layer to the corresponding implicit discourse relation recognition model by knowledge distillation. By fully exploiting the connectives inserted during corpus annotation, the recognition performance of the student model is improved.

Description

Discourse relation recognition method and device based on knowledge distillation and multi-task learning
Technical Field
The invention relates to the technical field of intelligent computer analysis and processing, and in particular to a discourse relation recognition method and device based on knowledge distillation and multi-task learning.
Background
A discourse generally refers to a complete linguistic unit composed of a series of structurally connected, semantically coherent units (sentences or clauses) organized according to certain semantic relations or a hierarchical structure. The semantic relations between sentences or clauses are called discourse relations, e.g., causal relations, contrast relations, etc. Discourse relation recognition is the task of automatically judging the semantic relation between two discourse arguments (sentences or clauses); it is one of the core subtasks of discourse structure analysis and also its performance bottleneck. Improving discourse relation recognition therefore not only advances discourse structure analysis but also benefits many downstream natural language processing tasks, such as machine translation, sentiment analysis, question answering, and text summarization.
Discourse connectives (e.g., because, but, etc.) are among the most important features in discourse relation recognition. When two arguments are joined by a discourse connective, explicit discourse relation recognition can achieve more than 90% classification accuracy using the connective as a feature. In contrast, when the connective is omitted between two arguments, implicit discourse relation recognition must infer the relation from the semantics of the two arguments, and the corresponding accuracy is currently only about 60%. For example, as shown in fig. 1, the connective "so" is omitted between the two arguments of the implicit discourse relation instance, and it is very difficult to infer the semantic "causal relation" between the two arguments based only on the texts "water accumulation" and "basketball not playing". In fact, even corpus annotators often rely on connective information to assist the annotation of implicit discourse relations. For example, when annotating the Penn Discourse Treebank (PDTB), currently the largest discourse corpus, annotators are required to first insert an appropriate connective between the two arguments of an implicit discourse relation instance, and then judge the relation of the instance by integrating the information of both arguments and the inserted connective. That is, discourse corpus annotators often use the (inserted) connective information to assist the annotation of implicit discourse relations.
The above analysis shows, on one hand, a huge performance gap (90% versus 60%) between connective-based explicit discourse relation recognition and semantics-based implicit discourse relation recognition; on the other hand, the corpus annotation process shows that connective information is helpful for implicit discourse relation recognition. Therefore, some researchers have attempted to exploit connective information in implicit discourse relation recognition models to improve recognition performance. At present, researchers use adversarial-learning-based methods to exploit the connectives inserted during corpus annotation to help implicit discourse relation recognition.
However, the existing adversarial-learning-based methods do not make full use of connective information: they transfer knowledge only at the feature extraction layer, and their recognition performance is not ideal.
Disclosure of Invention
In view of the above, there is a need to solve the problem that existing adversarial-learning-based methods transfer knowledge only at the feature extraction layer and achieve unsatisfactory recognition performance.
An embodiment of the invention provides a discourse relation recognition method based on knowledge distillation and multi-task learning, the method comprising the following steps:
taking implicit discourse relation instances annotated with connectives and implicit discourse relation categories as training instances;
constructing a connective-enhanced teacher model based on a bidirectional attention classification model and, with the connective as additional input, iteratively minimizing the cost function corresponding to the connective-enhanced teacher model until convergence to obtain a trained teacher model;
constructing a multi-task learning student model based on the bidirectional attention classification model, introducing connective classification as an auxiliary task to determine a cost function based on multi-task learning, computing the features and predictions of the training instances with the trained teacher model to determine a cost function based on knowledge distillation, and then determining the total cost function of the student model;
and iteratively minimizing the total cost function of the student model until convergence, so as to output the trained student model and then recognize the implicit discourse relations of test instances.
The invention provides a discourse relation recognition method based on knowledge distillation and multi-task learning. Implicit discourse relation instances annotated with connectives and categories are taken as training instances so as to fully exploit the connectives inserted during corpus annotation. First, a connective-enhanced teacher model is constructed based on a bidirectional attention classification model; with the connective as additional input, its cost function is iteratively minimized until convergence, yielding a trained teacher model. Then the constructed multi-task learning student model is trained: a total cost function is built from the multi-task learning and knowledge distillation methods and iteratively minimized until convergence, yielding the trained multi-task learning student model. On one hand, the method shares knowledge between the connective classification auxiliary task and the implicit discourse relation recognition main task through parameter sharing (a shared feature extraction layer); on the other hand, it transfers the knowledge in the connective-enhanced teacher model from the feature extraction layer and the classification layer to the corresponding implicit discourse relation recognition model (the multi-task learning student model) by knowledge distillation. The recognition performance of the student model is thus improved by fully exploiting the connectives inserted during corpus annotation. On the first-level and second-level implicit discourse relations of the widely used PDTB data set, the method achieves better recognition performance than comparable methods.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, in the training instances an implicit discourse relation instance annotated with a connective and an implicit discourse relation category is represented as $(x_1, x_2, c, y)$, where $x_1$ and $x_2$ denote the two arguments of the implicit discourse relation training instance, $c$ denotes the annotated connective, and $y$ denotes the annotated implicit discourse relation category.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, in the connective-enhanced teacher model the input is $(x_1, x_2, c)$ and the corresponding cost function is expressed as:

$$\mathcal{L}^t(\theta^t) = -\sum_{(x_1, x_2, c, y) \in D} \bar{y} \cdot \log \hat{y}^t$$

where $\theta^t$ are the parameters of the teacher model, $\bar{y}$ is the one-hot encoding of the annotated implicit discourse relation category $y$, $\bar{y} \cdot \log \hat{y}^t$ is the expectation of the prediction with respect to the annotated category, $\hat{y}^t$ is the prediction produced by the classification layer of the connective-enhanced teacher model, and $D$ is the training instance set.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, in the multi-task learning student model the total cost function of the student model is expressed as:

$$\mathcal{L}(\theta^s) = \lambda_1 \left( \mathcal{L}_{rel}(\theta^s) + \mathcal{L}_{conn}(\theta^s) \right) + \lambda_2 \left( \mathcal{L}_{feat}(\theta^s) + \mathcal{L}_{cls}(\theta^s) \right)$$

where $\mathcal{L}(\theta^s)$ is the total cost function of the student model, $\theta^s$ are the parameters of the student model, and $\lambda_1$ and $\lambda_2$ are the weight coefficients of the cost function based on multi-task learning and the cost function based on knowledge distillation, respectively.
The cost function based on multi-task learning comprises two parts: $\mathcal{L}_{rel}$, the cross-entropy cost for implicit discourse relation recognition, and $\mathcal{L}_{conn}$, the cross-entropy cost for connective classification. The cost function based on knowledge distillation comprises two parts: $\mathcal{L}_{feat}$, the cost for feature-extraction-layer knowledge distillation, and $\mathcal{L}_{cls}$, the cost for classification-layer knowledge distillation.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, in the multi-task learning student model the input is $(x_1, x_2)$ and the cross-entropy cost function for implicit discourse relation recognition is expressed as:

$$\mathcal{L}_{rel}(\theta^s) = -\sum_{(x_1, x_2, y) \in D} \bar{y} \cdot \log \hat{y}^s$$

where $\theta^s$ are the parameters of the student model, $\bar{y}$ is the one-hot encoding of the annotated implicit discourse relation category $y$, $\bar{y} \cdot \log \hat{y}^s$ is the expectation of the prediction with respect to the annotated category, $\hat{y}^s$ is the prediction for implicit discourse relation recognition produced by classification layer 1 of the student model, and $D$ is the training instance set.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, the cross-entropy cost function for connective classification in the multi-task learning student model is expressed as:

$$\mathcal{L}_{conn}(\theta^s) = -\sum_{(x_1, x_2, c) \in D} \bar{c} \cdot \log \hat{c}^s$$

where $\theta^s$ are the parameters of the student model, $\bar{c}$ is the one-hot encoding of the annotated connective $c$, $\bar{c} \cdot \log \hat{c}^s$ is the expectation of the prediction with respect to the annotated connective, $\hat{c}^s$ is the prediction for connective classification produced by classification layer 2 of the student model, and $D$ is the training instance set.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, the cost function for feature-extraction-layer knowledge distillation in the multi-task learning student model is expressed as:

$$\mathcal{L}_{feat}(\theta^s) = \sum_{(x_1, x_2, c) \in D} \mathrm{MSE}(h^t, h^s)$$

where $\mathrm{MSE}$ denotes the mean squared error, $h^t$ denotes the features produced by the feature extraction layer of the connective-enhanced teacher model, $h^s$ denotes the features produced by the feature extraction layer of the multi-task learning student model, and $D$ is the training instance set.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, the cost function for classification-layer knowledge distillation in the multi-task learning student model is expressed as:

$$\mathcal{L}_{cls}(\theta^s) = \sum_{(x_1, x_2, c) \in D} \mathrm{KL}(\hat{y}^t \,\|\, \hat{y}^s)$$

where $\mathrm{KL}$ denotes the Kullback-Leibler divergence between two probability distributions, $\hat{y}^t$ denotes the prediction produced by the classification layer of the connective-enhanced teacher model, and $\hat{y}^s$ denotes the prediction produced by classification layer 1 of the multi-task learning student model.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, the bidirectional attention classification model comprises an encoding layer, an interaction layer, an aggregation layer and a classification layer, wherein the encoding layer learns the representation of each argument word in context and is expressed as:

$$h_i^1 = \mathrm{BiLSTM}_1(e_i^1),\ i = 1, \dots, m; \qquad h_j^2 = \mathrm{BiLSTM}_2(e_j^2),\ j = 1, \dots, n$$

where $e_i^1$ and $h_i^1$ are the word vector of the $i$-th word in argument 1 and its representation in context, $e_j^2$ and $h_j^2$ are the word vector of the $j$-th word in argument 2 and its representation in context, $m$ and $n$ are the numbers of words in the two arguments, and $\mathrm{BiLSTM}_1$ and $\mathrm{BiLSTM}_2$ are both bidirectional long short-term memory networks.
The invention also provides a discourse relation recognition device based on knowledge distillation and multi-task learning, the device comprising:
a training input module, configured to take implicit discourse relation instances annotated with connectives and implicit discourse relation categories as training instances;
a first construction module, configured to construct a connective-enhanced teacher model based on a bidirectional attention classification model and, with the connective as additional input, iteratively minimize the cost function corresponding to the connective-enhanced teacher model until convergence to obtain a trained teacher model;
a second construction module, configured to construct a multi-task learning student model based on the bidirectional attention classification model, introduce connective classification as an auxiliary task to determine a cost function based on multi-task learning, compute the features and predictions of the training instances with the trained teacher model to determine a cost function based on knowledge distillation, and then determine the total cost function of the student model;
and a training output module, configured to iteratively minimize the total cost function of the student model until convergence, so as to output the trained student model and then recognize the implicit discourse relations of test instances.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a diagram of an implicit discourse relation instance annotated with a connective and an implicit discourse relation category;
FIG. 2 is a flow chart of the discourse relation recognition method based on knowledge distillation and multi-task learning according to the present invention;
FIG. 3 is a schematic diagram of the idea of the discourse relation recognition method based on knowledge distillation and multi-task learning according to the present invention;
FIG. 4 is a schematic diagram of the classification model based on a bidirectional attention mechanism;
FIG. 5 is a schematic structural diagram of the discourse relation recognition device based on knowledge distillation and multi-task learning according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Existing adversarial-learning-based methods make insufficient use of connective information, transferring knowledge only at the feature extraction layer, and their recognition performance is not ideal.
In order to solve this technical problem, the present invention provides a discourse relation recognition method based on knowledge distillation and multi-task learning. Referring to fig. 1 to 3, the method includes the following steps:
s101, taking the implicit discourse relation example marked with the connection words and the implicit discourse relation category as a training example.
Specifically, any implicit discourse relation training instance in the corpus annotated with a connective and a relation category can be represented as $(x_1, x_2, c, y)$, where $x_1$ and $x_2$ are the two arguments of the implicit discourse relation training instance, $c$ is the connective inserted during annotation, i.e., the gold connective label, and $y$ is the annotated implicit discourse relation category, i.e., the gold category label.
S102, constructing a connective-enhanced teacher model based on the bidirectional attention classification model and, with the connective as additional input, iteratively minimizing the cost function corresponding to the connective-enhanced teacher model until convergence to obtain a trained teacher model.
It should be noted that the teacher model is a connective-enhanced implicit discourse relation recognition model that takes the arguments $(x_1, x_2)$ and the connective $c$ inserted during annotation as input. The features produced by the teacher model's feature extraction layer are denoted $h^t$, and the prediction produced after the classification layer is denoted $\hat{y}^t$. When training the teacher model, the teacher cost function (a cross-entropy classification cost function) is minimized on the training corpus. The teacher cost function is represented as:

$$\mathcal{L}^t(\theta^t) = -\sum_{(x_1, x_2, c, y) \in D} \bar{y} \cdot \log \hat{y}^t$$

where $\theta^t$ are the parameters of the teacher model, $y$ is the annotated implicit discourse relation category, $\bar{y}$ is the corresponding one-hot encoding, $c$ is the annotated connective, $\bar{y} \cdot \log \hat{y}^t$ is the expectation of the prediction with respect to the annotated category, $\hat{y}^t$ is the prediction produced by the classification layer of the connective-enhanced teacher model, and $D$ is the training instance set.
It should be added that the connective-enhanced teacher model simulates the human process of annotating implicit discourse relations. With the assistance of the inserted connective $c$, its recognition performance is far higher than that of the multi-task learning student model, which takes only the arguments $(x_1, x_2)$ as input (for example, its accuracy on the first-level implicit discourse relation classification task of the PDTB corpus can exceed 85%). This fully shows that the connective-enhanced teacher model integrates the connective information inserted during corpus annotation well.
S103, constructing a multi-task learning student model based on the bidirectional attention classification model, introducing connective classification as an auxiliary task to determine the cost function based on multi-task learning, computing the features and predictions of the training instances with the trained teacher model to determine the cost function based on knowledge distillation, and then determining the total cost function of the student model.
The multi-task learning student model is a discourse relation recognition model based on multi-task learning. Connective classification is taken as the auxiliary task: given the two arguments $(x_1, x_2)$ of an implicit discourse relation instance, predict a connective suitable for connecting them. Implicit discourse relation recognition is taken as the main task. The models of the two related tasks (the implicit discourse relation recognition task and the connective classification task) share a feature extraction layer and use their own classification layers. Specifically, referring to fig. 3, classification layer 1 is used for the implicit discourse relation recognition task and classification layer 2 is used for the connective classification task. Through the shared feature extraction layer, the models of the two related tasks can exchange information and thereby promote each other. The multi-task learning student model takes only the arguments $(x_1, x_2)$ as input; the student features produced by the shared feature extraction layer are denoted $h^s$, the prediction produced by classification layer 1 (implicit discourse relation recognition) is denoted $\hat{y}^s$, and the prediction produced by classification layer 2 (connective classification) is denoted $\hat{c}^s$. When training the multi-task learning student model, in order to fit the training instances $(x_1, x_2, c, y)$ as well as possible, the cost function based on multi-task learning is minimized, i.e., the cross-entropy classification cost for implicit discourse relation recognition and the cross-entropy classification cost for connective classification are minimized simultaneously.
Specifically, the cross-entropy classification cost function for implicit discourse relation recognition is represented as:

$$\mathcal{L}_{rel}(\theta^s) = -\sum_{(x_1, x_2, y) \in D} \bar{y} \cdot \log \hat{y}^s$$

where $\theta^s$ are the parameters of the student model, $y$ is the annotated implicit discourse relation category, $\bar{y}$ is the corresponding one-hot encoding, $\bar{y} \cdot \log \hat{y}^s$ is the expectation of the prediction with respect to the annotated category, $\hat{y}^s$ is the prediction for the implicit discourse relation produced by classification layer 1 of the student model, and $D$ is the training instance set.
The cross-entropy classification cost function for connective classification is represented as:

$$\mathcal{L}_{conn}(\theta^s) = -\sum_{(x_1, x_2, c) \in D} \bar{c} \cdot \log \hat{c}^s$$

where $\theta^s$ are the parameters of the multi-task learning student model, $c$ is the annotated connective, $\bar{c}$ is the corresponding one-hot encoding, $\bar{c} \cdot \log \hat{c}^s$ is the expectation of the prediction with respect to the annotated connective, $\hat{c}^s$ is the prediction for the connective produced by classification layer 2 of the student model, and $D$ is the training instance set.
In order to learn classification knowledge that integrates connective information from the teacher model, the invention adopts knowledge distillation, the basic idea of which is to make the student model imitate the behavior of the teacher model as closely as possible.
On the one hand, the features learned by the multi-task learning student model and by the connective-enhanced teacher model, $h^s$ and $h^t$, should be as close as possible, so as to realize knowledge transfer between the two models at the feature extraction layer. Since the recognition performance of the teacher model on the PDTB data set is much higher than that of the student model, the teacher features $h^t$ contain more information useful for implicit discourse relation recognition than the student features $h^s$. Specifically, the cost function for feature-extraction-layer knowledge distillation in the student model is defined as:

$$\mathcal{L}_{feat}(\theta^s) = \sum_{(x_1, x_2, c) \in D} \mathrm{MSE}(h^t, h^s)$$

where $\mathrm{MSE}$ denotes the mean squared error, $\theta^s$ are the parameters of the student model, $h^t$ denotes the features produced by the feature extraction layer of the connective-enhanced teacher model, $h^s$ denotes the features produced by the feature extraction layer of the multi-task learning student model, and $D$ is the training instance set.
On the other hand, the final predictions of the multi-task learning student model and of the connective-enhanced teacher model, $\hat{y}^s$ and $\hat{y}^t$, should be as close as possible, so as to realize knowledge transfer between the two models at the classification layer. The gold category label represented by the one-hot encoding $\bar{y}$ can be regarded as a hard label, while the teacher's prediction $\hat{y}^t$ can be regarded as a soft label; soft labels are generally considered to carry more category information, for example similarity information between categories. Specifically, the cost function for classification-layer knowledge distillation in the multi-task learning student model is defined as:

$$\mathcal{L}_{cls}(\theta^s) = \sum_{(x_1, x_2, c) \in D} \mathrm{KL}(\hat{y}^t \,\|\, \hat{y}^s)$$

where $\mathrm{KL}$ denotes the Kullback-Leibler divergence between two probability distributions, $(x_1, x_2, c)$ is the implicit discourse relation training instance with connective information, $\hat{y}^t$ denotes the prediction produced by the classification layer of the connective-enhanced teacher model, and $\hat{y}^s$ denotes the prediction produced by classification layer 1 of the multi-task learning student model.
Finally, the total cost function of the multi-task learning student model is defined as a linear combination of the cost function based on multi-task learning and the cost function based on knowledge distillation. Specifically, the total cost function of the multi-task learning student model is expressed as:

$$\mathcal{L}(\theta^s) = \lambda_1 \left( \mathcal{L}_{rel}(\theta^s) + \mathcal{L}_{conn}(\theta^s) \right) + \lambda_2 \left( \mathcal{L}_{feat}(\theta^s) + \mathcal{L}_{cls}(\theta^s) \right)$$

where $\theta^s$ are the parameters of the student model, and $\lambda_1$ and $\lambda_2$ are the weight coefficients of the cost function based on multi-task learning and the cost function based on knowledge distillation, respectively. The cost function based on multi-task learning comprises two parts: $\mathcal{L}_{rel}$, the cross-entropy cost for implicit discourse relation recognition, and $\mathcal{L}_{conn}$, the cross-entropy cost for connective classification. The cost function based on knowledge distillation comprises two parts: $\mathcal{L}_{feat}$, the cost for feature-extraction-layer knowledge distillation, and $\mathcal{L}_{cls}$, the cost for classification-layer knowledge distillation.
S104, iteratively minimizing the total cost function of the student model until convergence, so as to output the trained student model and then recognize the implicit discourse relations of test instances.
Algorithm 1 describes the training process of the discourse relation recognition method based on knowledge distillation and multi-task learning.
Specifically, the whole training process is divided into two stages: the first stage trains the connective-enhanced teacher model by minimizing the cost function $\mathcal{L}^t(\theta^t)$ (steps 1-5), and the second stage trains the multi-task student model by minimizing the cost function $\mathcal{L}(\theta^s)$ (steps 6-12). For simplicity, Algorithm 1 omits the step of checking model convergence on a validation data set; the finally trained multi-task learning student model is the required implicit discourse relation recognition model.
Algorithm 1: training algorithm
Input: training instance set $D$, maximum number of training epochs $T$
Output: trained multi-task learning student model
1. Construct the teacher model and randomly initialize its parameters $\theta^t$
2. Repeat:
3. Take a batch of instances $B$ from the training instance set $D$
4. Minimize the connective-enhanced teacher cost function $\mathcal{L}^t(\theta^t)$ and update the parameters $\theta^t$
5. Until: the model converges or the maximum number of epochs $T$ is reached
6. Construct the multi-task learning student model and randomly initialize its parameters $\theta^s$
7. Repeat:
8. Take a batch of instances $B$ from the training instance set $D$
9. Compute the corresponding features $h^t$ with the trained connective-enhanced teacher model
10. Compute the corresponding predictions $\hat{y}^t$ with the trained connective-enhanced teacher model
11. Minimize the multi-task learning student cost function $\mathcal{L}(\theta^s)$ and update the parameters $\theta^s$
12. Until: the model converges or the maximum number of epochs $T$ is reached
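The two stages of Algorithm 1 could be wired together as below, reusing the helper functions sketched earlier; the optimizer choice and the weights lam1 and lam2 (the coefficients $\lambda_1$ and $\lambda_2$) are assumptions:

```python
import torch

def train(teacher, student, loader, epochs, lam1=1.0, lam2=1.0):
    """Stage 1 (steps 1-5): train the connective-enhanced teacher.
    Stage 2 (steps 6-12): train the multi-task student with distillation."""
    opt_t = torch.optim.Adam(teacher.parameters())
    for _ in range(epochs):
        for batch in loader:
            teacher_training_step(teacher, batch, opt_t)

    teacher.eval()  # freeze teacher behavior for the second stage
    opt_s = torch.optim.Adam(student.parameters())
    for _ in range(epochs):
        for batch in loader:
            loss_rel, loss_conn = multitask_ce_losses(student, batch)
            loss_feat = feature_distillation_loss(teacher, student, batch)
            loss_cls = classification_distillation_loss(teacher, student, batch)
            # Total student cost: weighted sum of multi-task and distillation parts
            loss = lam1 * (loss_rel + loss_conn) + lam2 * (loss_feat + loss_cls)
            opt_s.zero_grad()
            loss.backward()
            opt_s.step()
```

Steps 9 and 10 of Algorithm 1 (the teacher's features and predictions for the batch) are computed inside the loss helpers with gradients disabled, which matches keeping the trained teacher fixed in the second stage.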
Meanwhile, the bidirectional attention classification model used in the present invention is often employed to model semantic relations between two sentences, for example in textual entailment recognition, question answering, and sentence semantic matching.
Referring to fig. 4, the bidirectional attention classification model specifically comprises an encoding layer, an interaction layer, an aggregation layer and a classification layer; the feature extraction layer is composed of the encoding layer, the interaction layer and the aggregation layer. The encoding layer learns the representation of each argument word in context and is expressed as:
$$h_i^1 = \mathrm{BiLSTM}_1(e_i^1),\ i = 1, \dots, m; \qquad h_j^2 = \mathrm{BiLSTM}_2(e_j^2),\ j = 1, \dots, n$$

where $e_i^1$ and $h_i^1$ are the word vector of the $i$-th word in argument 1 and its representation in context, $e_j^2$ and $h_j^2$ are the word vector of the $j$-th word in argument 2 and its representation in context, $m$ and $n$ are the numbers of words in the two arguments, and $\mathrm{BiLSTM}_1$ and $\mathrm{BiLSTM}_2$ are both bidirectional long short-term memory networks.
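A minimal sketch of the encoding layer; the use of a separate BiLSTM per argument follows the formula above, while the embedding and hidden sizes are illustrative:

```python
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    """Contextualizes each argument with its own bidirectional LSTM."""
    def __init__(self, emb_dim=300, hidden=256):
        super().__init__()
        self.lstm1 = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, e1, e2):
        # e1: (batch, m, emb_dim) word vectors of argument 1
        # e2: (batch, n, emb_dim) word vectors of argument 2
        h1, _ = self.lstm1(e1)  # (batch, m, 2*hidden) contextual representations
        h2, _ = self.lstm2(e2)  # (batch, n, 2*hidden)
        return h1, h2
```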
The interaction layer is expressed as:

$$a_{ij} = F(h_i^1)^\top F(h_j^2)$$

$$\tilde{h}_i^1 = \sum_{j=1}^{n} \frac{\exp(a_{ij})}{\sum_{k=1}^{n} \exp(a_{ik})}\, h_j^2, \qquad \tilde{h}_j^2 = \sum_{i=1}^{m} \frac{\exp(a_{ij})}{\sum_{k=1}^{m} \exp(a_{kj})}\, h_i^1$$

$$u_i^1 = G([h_i^1; \tilde{h}_i^1]), \qquad u_j^2 = G([h_j^2; \tilde{h}_j^2])$$

where $F$ is a fully connected multi-layer feed-forward neural network, $a_{ij}$ is the relevance weight between the $i$-th word of argument 1 and the $j$-th word of argument 2, $\tilde{h}_i^1$ is the representation of the words in argument 2 related to the $i$-th word of argument 1, $\tilde{h}_j^2$ is the representation of the words in argument 1 related to the $j$-th word of argument 2, $G$ is another fully connected multi-layer feed-forward neural network, $[\cdot;\cdot]$ denotes the splicing (concatenation) operation of representation vectors, and $u_i^1$ and $u_j^2$ can be regarded as the learned local semantic relation representations.
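A sketch of the interaction layer as reconstructed above: relevance weights from a feed-forward map F, softmax-normalized alignment in both directions, and a second network G over the concatenations. The depths and activations of F and G are not fixed by the text:

```python
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    """Bidirectional attention between the two arguments."""
    def __init__(self, dim=512, hidden=512):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.G = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU())

    def forward(self, h1, h2):
        # a[i, j]: relevance weight between word i of argument 1 and word j of argument 2
        a = torch.bmm(self.F(h1), self.F(h2).transpose(1, 2))  # (batch, m, n)
        # Representations of related words in the other argument
        h1_tilde = torch.bmm(torch.softmax(a, dim=2), h2)                  # (batch, m, dim)
        h2_tilde = torch.bmm(torch.softmax(a, dim=1).transpose(1, 2), h1)  # (batch, n, dim)
        # Local semantic relation representations
        u1 = self.G(torch.cat([h1, h1_tilde], dim=-1))
        u2 = self.G(torch.cat([h2, h2_tilde], dim=-1))
        return u1, u2
```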
The aggregation layer computes the global semantic relation representation $h$ from the local semantic relation representations. Specifically:

$$h = [\mathrm{pool}(u_1^1, \dots, u_m^1); \mathrm{pool}(u_1^2, \dots, u_n^2)]$$

where $\mathrm{pool}(\cdot)$ denotes a pooling operation over word positions. $h$ is the feature extracted by the feature extraction layer; it is denoted $h^s$ in the student model and $h^t$ in the teacher model.
in addition, the classification layer is used to calculate the final classification result. The details are as follows:
Figure 648418DEST_PATH_IMAGE077
wherein the content of the first and second substances,
Figure 44764DEST_PATH_IMAGE078
by a fully-connected multi-layer feedforward neural network and
Figure 449201DEST_PATH_IMAGE079
layer composition;
Figure 298208DEST_PATH_IMAGE080
is the final classification result.
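A sketch of the aggregation and classification layers, assuming max pooling over word positions (the text specifies pooling the local representations into a global representation $h$ but not the exact operator) and four output classes, as in first-level PDTB classification:

```python
import torch
import torch.nn as nn

class AggregateAndClassify(nn.Module):
    """Pools local relation representations into a global feature h,
    then classifies h with an MLP followed by softmax."""
    def __init__(self, dim=512, hidden=512, num_classes=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, u1, u2):
        # Max-pool each argument over its word positions, then concatenate
        h = torch.cat([u1.max(dim=1).values, u2.max(dim=1).values], dim=-1)
        y_hat = torch.softmax(self.mlp(h), dim=-1)  # final classification result
        return h, y_hat
```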
The connective-enhanced teacher model can be constructed directly from the bidirectional attention classification model; only the input needs to be augmented with the connective, i.e., the model input is $(x_1, x_2, c)$. Specifically, the connective $c$ is spliced onto the beginning of argument 2 in $(x_1, x_2)$ to form the new argument 2. The learned features are denoted $h^t$ and the prediction is denoted $\hat{y}^t$.
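For illustration, the teacher's input construction then reduces to prepending the connective tokens to argument 2 (tokenization details are not given in the patent):

```python
def build_teacher_argument2(connective_tokens, arg2_tokens):
    """Splice the connective c onto the beginning of argument 2,
    forming the teacher model's new argument 2."""
    return connective_tokens + arg2_tokens
```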
For the multi-task learning student model, the bidirectional attention classification model only needs a simple extension: the implicit discourse relation recognition task and the connective classification task share the feature extraction layer but use their own classification layers. Specifically, for an input instance $(x_1, x_2)$, the features obtained through the shared feature extraction layer are $h^s$. The prediction for implicit discourse relation recognition is then computed by classification layer 1 as:

$$\hat{y}^s = \mathrm{softmax}(\mathrm{MLP}_1(h^s))$$

where classification layer 1 consists of a fully connected multi-layer feed-forward neural network ($\mathrm{MLP}_1$) and a $\mathrm{softmax}$ layer; and the prediction for connective classification is computed by classification layer 2 as:

$$\hat{c}^s = \mathrm{softmax}(\mathrm{MLP}_2(h^s))$$

where classification layer 2 consists of a fully connected multi-layer feed-forward neural network ($\mathrm{MLP}_2$) and a $\mathrm{softmax}$ layer.
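Putting the pieces together, a sketch of the student model: one shared feature extraction layer and two task-specific heads (classification layer 1 for relations, classification layer 2 for connectives). The number of connective classes is illustrative:

```python
import torch
import torch.nn as nn

class MultiTaskStudent(nn.Module):
    """Shared feature extraction layer with two task-specific heads."""
    def __init__(self, encoder, interaction, dim=512, n_rel=4, n_conn=100):
        super().__init__()
        self.encoder = encoder          # e.g., EncodingLayer above
        self.interaction = interaction  # e.g., InteractionLayer above
        self.rel_head = nn.Linear(2 * dim, n_rel)    # classification layer 1
        self.conn_head = nn.Linear(2 * dim, n_conn)  # classification layer 2

    def extract_features(self, e1, e2):
        h1, h2 = self.encoder(e1, e2)
        u1, u2 = self.interaction(h1, h2)
        # Aggregation: max-pool each argument, then concatenate
        return torch.cat([u1.max(dim=1).values, u2.max(dim=1).values], dim=-1)

    def forward(self, e1, e2):
        h_s = self.extract_features(e1, e2)
        return h_s, self.rel_head(h_s), self.conn_head(h_s)
```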
The invention provides a discourse relation recognition method based on knowledge distillation and multi-task learning. Implicit discourse relation instances annotated with connectives and categories are taken as training instances so as to fully exploit the connectives inserted during corpus annotation. First, a connective-enhanced teacher model is constructed based on the bidirectional attention classification model; with the connective as additional input, its cost function is iteratively minimized until convergence, yielding a trained teacher model. Then the constructed multi-task student model is trained: a total cost function is built from the multi-task learning and knowledge distillation methods and iteratively minimized until convergence, yielding the trained multi-task student model.
In the discourse relation recognition method based on knowledge distillation and multi-task learning, on one hand, knowledge is shared between the connective classification auxiliary task and the implicit discourse relation recognition main task through parameter sharing (a shared feature extraction layer); on the other hand, the knowledge in the connective-enhanced teacher model is transferred from the feature extraction layer and the classification layer to the corresponding implicit discourse relation recognition model (the multi-task learning student model) by knowledge distillation, so that the recognition performance of the student model is improved by fully exploiting the connectives inserted during corpus annotation. On the first-level and second-level implicit discourse relations of the widely used PDTB data set, the method achieves better recognition performance than comparable methods.
Referring to fig. 5, the discourse relation recognition device based on knowledge distillation and multi-task learning according to the second embodiment of the present invention includes a training input module 111, a first construction module 112, a second construction module 113 and a training output module 114, connected in sequence;
wherein the training input module 111 is specifically configured to:
take implicit discourse relation instances annotated with connectives and implicit discourse relation categories as training instances;
the first construction module 112 is specifically configured to:
construct a connective-enhanced teacher model based on a bidirectional attention classification model and, with the connective as additional input, iteratively minimize the cost function corresponding to the connective-enhanced teacher model until convergence to obtain a trained teacher model;
the second construction module 113 is specifically configured to:
construct a multi-task learning student model based on the bidirectional attention classification model, introduce connective classification as an auxiliary task to determine the cost function based on multi-task learning, compute the features and predictions of the training instances with the trained teacher model to determine the cost function based on knowledge distillation, and then determine the total cost function of the student model;
the training output module 114 is specifically configured to:
iteratively minimize the total cost function of the student model until convergence, so as to output the trained student model and then recognize the implicit discourse relations of test instances.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A discourse relation recognition method based on knowledge distillation and multi-task learning, characterized by comprising the following steps:
taking implicit discourse relation instances annotated with connectives and implicit discourse relation categories as training instances;
constructing a connective-enhanced teacher model based on a bidirectional attention classification model and, with the connective as additional input, iteratively minimizing the cost function corresponding to the connective-enhanced teacher model until convergence to obtain a trained teacher model;
constructing a multi-task learning student model based on the bidirectional attention classification model, introducing connective classification as an auxiliary task to determine a cost function based on multi-task learning, computing the features and predictions of the training instances with the trained teacher model to determine a cost function based on knowledge distillation, and then determining the total cost function of the student model;
and iteratively minimizing the total cost function of the student model until convergence, so as to output the trained student model and then recognize the implicit discourse relations of test instances.
2. The discourse relation recognition method based on knowledge distillation and multi-task learning according to claim 1, characterized in that, in the training instances, an implicit discourse relation instance annotated with a connective and an implicit discourse relation category is represented as $(x_1, x_2, c, y)$, where $x_1$ and $x_2$ denote the two arguments of the implicit discourse relation training instance, $c$ denotes the annotated connective, and $y$ denotes the annotated implicit discourse relation category.
3. The discourse relation recognition method based on knowledge distillation and multi-task learning according to claim 2, characterized in that, in the connective-enhanced teacher model, the input is $(x_1, x_2, c)$ and the corresponding cost function is expressed as:

$$\mathcal{L}^t(\theta^t) = -\sum_{(x_1, x_2, c, y) \in D} \bar{y} \cdot \log \hat{y}^t$$

where $\theta^t$ are the parameters of the teacher model, $\bar{y}$ is the one-hot encoding of the annotated implicit discourse relation category $y$, $\bar{y} \cdot \log \hat{y}^t$ is the expectation of the prediction with respect to the annotated category, $\hat{y}^t$ is the prediction produced by the classification layer of the connective-enhanced teacher model, and $D$ is the training instance set.
4. The discourse relation recognition method based on knowledge distillation and multi-task learning according to claim 2, characterized in that, in the multi-task learning student model, the total cost function of the student model is expressed as:

$$\mathcal{L}(\theta^s) = \lambda_1 \left( \mathcal{L}_{rel}(\theta^s) + \mathcal{L}_{conn}(\theta^s) \right) + \lambda_2 \left( \mathcal{L}_{feat}(\theta^s) + \mathcal{L}_{cls}(\theta^s) \right)$$

where $\mathcal{L}(\theta^s)$ is the total cost function of the student model, $\theta^s$ are the parameters of the student model, and $\lambda_1$ and $\lambda_2$ are the weight coefficients of the cost function based on multi-task learning and the cost function based on knowledge distillation, respectively;
the cost function based on multi-task learning comprises two parts: $\mathcal{L}_{rel}$, the cross-entropy cost for implicit discourse relation recognition, and $\mathcal{L}_{conn}$, the cross-entropy cost for connective classification; the cost function based on knowledge distillation comprises two parts: $\mathcal{L}_{feat}$, the cost for feature-extraction-layer knowledge distillation, and $\mathcal{L}_{cls}$, the cost for classification-layer knowledge distillation.
5. The discourse relation recognition method based on knowledge distillation and multi-task learning according to claim 4, characterized in that, in the multi-task learning student model, the input is $(x_1, x_2)$ and the cross-entropy cost function for implicit discourse relation recognition is expressed as:

$$\mathcal{L}_{rel}(\theta^s) = -\sum_{(x_1, x_2, y) \in D} \bar{y} \cdot \log \hat{y}^s$$

where $\bar{y}$ is the one-hot encoding of the annotated implicit discourse relation category $y$, $\bar{y} \cdot \log \hat{y}^s$ is the expectation of the prediction with respect to the annotated category, $\hat{y}^s$ is the prediction for implicit discourse relation recognition produced by classification layer 1 of the multi-task learning student model, and $D$ is the training instance set.
6. The discourse relation recognition method based on knowledge distillation and multi-task learning according to claim 4, characterized in that the cross-entropy cost function for connective classification in the multi-task learning student model is expressed as:

$$\mathcal{L}_{conn}(\theta^s) = -\sum_{(x_1, x_2, c) \in D} \bar{c} \cdot \log \hat{c}^s$$

where $\bar{c}$ is the one-hot encoding of the annotated connective $c$, $\bar{c} \cdot \log \hat{c}^s$ is the expectation of the prediction with respect to the annotated connective, $\hat{c}^s$ is the prediction for connective classification produced by classification layer 2 of the student model, and $D$ is the training instance set.
7. The discourse relation recognition method based on knowledge distillation and multi-task learning according to claim 4, characterized in that the cost function for feature-extraction-layer knowledge distillation in the multi-task learning student model is expressed as:

$$\mathcal{L}_{feat}(\theta^s) = \sum_{(x_1, x_2, c) \in D} \mathrm{MSE}(h^t, h^s)$$

where $\mathrm{MSE}$ denotes the mean squared error, $h^t$ denotes the features produced by the feature extraction layer of the connective-enhanced teacher model, $h^s$ denotes the features produced by the feature extraction layer of the multi-task learning student model, and $D$ is the training instance set.
8. The knowledge distillation and multitask learning based discourse relation identification method according to claim 4, wherein the cost function corresponding to the classification-layer knowledge distillation in the multitask learning student model is expressed as:

$$L_{cd} = \sum_{(arg_1, arg_2, c, r) \in T} \mathrm{KL}\big(\hat{\mathbf{y}}^{tea} \,\big\|\, \hat{\mathbf{y}}^{rel}\big)$$

wherein $\mathrm{KL}(\cdot\|\cdot)$ denotes the Kullback-Leibler divergence between two probability distributions, $\hat{\mathbf{y}}^{tea}$ denotes the prediction output by the classification layer of the connective-enhanced teacher model, and $\hat{\mathbf{y}}^{rel}$ denotes the prediction output by classification layer 1 of the multitask learning student model.
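A minimal sketch of this KL-based distillation term; note that PyTorch's `F.kl_div` takes the student's log-probabilities and the teacher's probabilities, which matches $\mathrm{KL}(\text{teacher}\,\|\,\text{student})$. All names are illustrative:

```python
import torch
import torch.nn.functional as F

teacher_logits = torch.randn(4, 11)   # teacher classification layer
student_logits = torch.randn(4, 11)   # student classification layer 1

# L_cd: KL divergence between the two predicted distributions. The trained
# teacher's distribution is detached so it acts purely as a soft target.
teacher_probs = F.softmax(teacher_logits, dim=-1).detach()
student_log_probs = F.log_softmax(student_logits, dim=-1)
loss_cd = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```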
9. The knowledge distillation and multitask learning based discourse relation identification method according to claim 1, wherein the bidirectional attention mechanism classification model comprises an encoding layer, an interaction layer, an aggregation layer and a classification layer, wherein the encoding layer is used for learning the representations of the words in the arguments in context, and is expressed as:

$$\mathbf{h}^1_1, \dots, \mathbf{h}^1_m = \mathrm{BiLSTM}_1(\mathbf{x}^1_1, \dots, \mathbf{x}^1_m); \qquad \mathbf{h}^2_1, \dots, \mathbf{h}^2_n = \mathrm{BiLSTM}_2(\mathbf{x}^2_1, \dots, \mathbf{x}^2_n)$$

wherein $\mathbf{x}^1_i$ and $\mathbf{h}^1_i$ are respectively the word vector of the $i$-th word in argument 1 and its representation in context, $\mathbf{x}^2_j$ and $\mathbf{h}^2_j$ are respectively the word vector of the $j$-th word in argument 2 and its representation in context, $m$ and $n$ are respectively the numbers of words in the two arguments, and $\mathrm{BiLSTM}_1$ and $\mathrm{BiLSTM}_2$ are both bidirectional long short-term memory networks.
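A minimal sketch of such an encoding layer, assuming 300-dimensional word vectors and two separate bidirectional LSTMs; all sizes are illustrative:

```python
import torch
import torch.nn as nn

# Assumed sizes: 300-dim word vectors, 128 hidden units per direction.
embed_dim, hidden = 300, 128
bilstm_1 = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
bilstm_2 = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)

arg1 = torch.randn(1, 20, embed_dim)   # m = 20 word vectors of argument 1
arg2 = torch.randn(1, 15, embed_dim)   # n = 15 word vectors of argument 2

# h^1_i and h^2_j: contextual representation of each word, i.e. the forward
# and backward hidden states concatenated (2 * hidden per word).
h1, _ = bilstm_1(arg1)                 # shape (1, 20, 256)
h2, _ = bilstm_2(arg2)                 # shape (1, 15, 256)
```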
10. An apparatus for discourse relation identification based on knowledge distillation and multitask learning, the apparatus comprising:
a training input module, configured to take implicit discourse relation examples annotated with both a connective and an implicit discourse relation class as training examples;
a first construction module, configured to construct a connective-enhanced teacher model based on a bidirectional attention mechanism classification model, with the connective as an additional input, and to iteratively minimize the cost function corresponding to the connective-enhanced teacher model until convergence, obtaining a trained teacher model;
a second construction module, configured to construct a multitask learning student model based on the bidirectional attention mechanism classification model, introduce connective classification as an auxiliary task to determine the cost function based on multitask learning, compute the features and predictions of the training examples with the trained teacher model to determine the cost function based on knowledge distillation, and then determine the total cost function of the student model;
and a training output module, configured to iteratively minimize the total cost function of the student model until convergence, output the trained student model, and then identify the implicit discourse relation of test examples (a schematic training sketch follows this claim).
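As a rough illustration of how these modules fit together, a sketch assuming the total cost is a weighted sum of the four terms from claims 4-8; the weights and every name here are hypothetical, not taken from the patent:

```python
# Assumed weighted-sum form of the total student cost; the trade-off
# weights a1-a3 are hypothetical hyperparameters.
a1, a2, a3 = 0.5, 0.5, 0.5

def student_total_cost(loss_rel, loss_conn, loss_fd, loss_cd):
    """Total cost of the multitask learning student model: main-task
    cross-entropy + auxiliary connective cross-entropy + feature
    distillation + classification-layer distillation."""
    return loss_rel + a1 * loss_conn + a2 * loss_fd + a3 * loss_cd

# Training output module, schematically: iterate minimization until
# convergence with a gradient-based optimizer over mini-batches, e.g.
#   for each epoch, for each batch:
#       loss = student_total_cost(L_rel, L_conn, L_fd, L_cd)
#       backpropagate and update the student parameters
print(student_total_cost(1.2, 0.8, 0.3, 0.4))  # toy scalar check
```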
CN202110078740.7A 2021-01-21 2021-01-21 Knowledge distillation and multitask learning-based chapter relationship identification method and device Active CN112395876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110078740.7A CN112395876B (en) 2021-01-21 2021-01-21 Knowledge distillation and multitask learning-based chapter relationship identification method and device

Publications (2)

Publication Number Publication Date
CN112395876A (en) 2021-02-23
CN112395876B (en) 2021-04-13

Family

ID=74625591

Country Status (1)

Country Link
CN (1) CN112395876B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8265925B2 (en) * 2001-11-15 2012-09-11 Texturgy As Method and apparatus for textual exploration discovery
US10606952B2 (en) * 2016-06-24 2020-03-31 Elemental Cognition Llc Architecture and processes for computer learning and understanding
EP3593262A4 (en) * 2017-03-10 2020-12-09 Eduworks Corporation Automated tool for question generation
CN107273358A (en) * 2017-06-18 2017-10-20 北京理工大学 A kind of end-to-end English structure of an article automatic analysis method based on pipe modes
CN107330032A (en) * 2017-06-26 2017-11-07 北京理工大学 A kind of implicit chapter relationship analysis method based on recurrent neural network
US10303771B1 (en) * 2018-02-14 2019-05-28 Capital One Services, Llc Utilizing machine learning models to identify insights in a document
EP3699753A1 (en) * 2019-02-25 2020-08-26 Atos Syntel, Inc. Systems and methods for virtual programming by artificial intelligence
CN110633473A (en) * 2019-09-25 2019-12-31 华东交通大学 Implicit discourse relation identification method and system based on conditional random field
CN111428525A (en) * 2020-06-15 2020-07-17 华东交通大学 Implicit discourse relation identification method and system and readable storage medium
CN111695341A (en) * 2020-06-16 2020-09-22 北京理工大学 Implicit discourse relation analysis method and system based on discourse structure diagram convolution
CN111651974A (en) * 2020-06-23 2020-09-11 北京理工大学 Implicit discourse relation analysis method and system
CN111538841A (en) * 2020-07-09 2020-08-14 华东交通大学 Comment emotion analysis method, device and system based on knowledge mutual distillation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU SHANSHAN et al.: "Research on Implicit Discourse Relation Classification Methods for Imbalanced Data", Journal of Chinese Information Processing *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177415A (en) * 2021-04-30 2021-07-27 科大讯飞股份有限公司 Semantic understanding method and device, electronic equipment and storage medium
CN113177415B (en) * 2021-04-30 2024-06-07 科大讯飞股份有限公司 Semantic understanding method and device, electronic equipment and storage medium
CN113377915A (en) * 2021-06-22 2021-09-10 厦门大学 Dialogue chapter analysis method
CN113377915B (en) * 2021-06-22 2022-07-19 厦门大学 Dialogue chapter analysis method
CN115271272A (en) * 2022-09-29 2022-11-01 华东交通大学 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
CN115271272B (en) * 2022-09-29 2022-12-27 华东交通大学 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
CN116028630A (en) * 2023-03-29 2023-04-28 华东交通大学 Implicit chapter relation recognition method and system based on contrast learning and Adapter network
CN116028630B (en) * 2023-03-29 2023-06-02 华东交通大学 Implicit chapter relation recognition method and system based on contrast learning and Adapter network
CN116432752A (en) * 2023-04-27 2023-07-14 华中科技大学 Construction method and application of implicit chapter relation recognition model
CN116432752B (en) * 2023-04-27 2024-02-02 华中科技大学 Construction method and application of implicit chapter relation recognition model

Also Published As

Publication number Publication date
CN112395876B (en) 2021-04-13

Similar Documents

Publication Publication Date Title
CN112395876B (en) Knowledge distillation and multitask learning-based chapter relationship identification method and device
Wu et al. Neural metaphor detecting with CNN-LSTM model
CN111611810B (en) Multi-tone word pronunciation disambiguation device and method
Joty et al. Combining intra-and multi-sentential rhetorical parsing for document-level discourse analysis
CN111428525B (en) Implicit discourse relation identification method and system and readable storage medium
Urieli Robust French syntax analysis: reconciling statistical methods and linguistic knowledge in the Talismane toolkit
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
Zhang et al. Aspect-based sentiment analysis for user reviews
Gao et al. Named entity recognition method of Chinese EMR based on BERT-BiLSTM-CRF
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
Zhang et al. n-BiLSTM: BiLSTM with n-gram Features for Text Classification
CN112163429A (en) Sentence relevancy obtaining method, system and medium combining cycle network and BERT
CN111783461A (en) Named entity identification method based on syntactic dependency relationship
CN111651973A (en) Text matching method based on syntax perception
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN114881042A (en) Chinese emotion analysis method based on graph convolution network fusion syntax dependence and part of speech
Chen et al. Research on automatic essay scoring of composition based on CNN and OR
CN114970536A (en) Combined lexical analysis method for word segmentation, part of speech tagging and named entity recognition
CN115048936A (en) Method for extracting aspect-level emotion triple fused with part-of-speech information
Hughes Automatic inference of causal reasoning chains from student essays
Neill et al. Meta-embedding as auxiliary task regularization
Zhao Research and design of automatic scoring algorithm for English composition based on machine learning
CN116562291A (en) Chinese nested named entity recognition method based on boundary detection
CN115906854A (en) Multi-level confrontation-based cross-language named entity recognition model training method
Hsiao et al. [Retracted] Construction of an Artificial Intelligence Writing Model for English Based on Fusion Neural Network Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant