CN112232024A - Dependency syntax analysis model training method and device based on multi-labeled data - Google Patents

Dependency syntax analysis model training method and device based on multi-labeled data

Info

Publication number
CN112232024A
CN112232024A (application CN202011089840.1A)
Authority
CN
China
Prior art keywords
dependency
arc
score
label
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011089840.1A
Other languages
Chinese (zh)
Inventor
李正华
周明月
赵煜
张民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202011089840.1A priority Critical patent/CN112232024A/en
Publication of CN112232024A publication Critical patent/CN112232024A/en
Priority to PCT/CN2021/088601 priority patent/WO2022077891A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a dependency syntax analysis model training method based on multi-labeled data, which comprises the following steps: acquiring a word sequence and a plurality of labeling results; inputting the word sequence into a dependency syntax analysis model to obtain an arc score and a label score; calculating loss values of the arc score and the label score relative to the labeling results according to a target loss function; and adjusting the model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value, so as to train the model. The method calculates the loss of the model's output relative to all labeling results according to the target loss function and completes the iterative training of the model accordingly, thereby making full use of the effective information in all labeled data and improving the dependency parsing capability of the model. The application also provides a dependency parsing model training apparatus, device and readable storage medium based on multi-labeled data, whose technical effects correspond to those of the method.

Description

Dependency syntax analysis model training method and device based on multi-labeled data
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for dependency parsing model training based on multi-labeled data.
Background
The objective of dependency syntax analysis is, given an input sentence, to capture the modification and collocation relationships between the words in the sentence, characterize its syntactic and semantic structure, and construct a dependency syntax tree.
In recent years, with the rapid development of deep learning in natural language processing, the accuracy of dependency parsing has improved significantly. However, when processing text that differs from the training data, the accuracy of dependency parsing may drop dramatically. A straightforward solution to this problem is to annotate domain-specific syntactic data. However, most dependency treebanks are constructed by a small number of linguistics experts over a long period of time; this is time-consuming, labor-intensive and costly, and cannot meet current needs.
Inspired by crowdsourcing, quickly constructing a multi-labeled dependency treebank from the annotations of a large number of non-expert annotators is a feasible approach. However, compared with expert annotation, the annotation quality is relatively low and the inconsistency is high. There are currently two solutions: one selects a single labeling result from the multiple labeling results by majority voting; the other simply discards inconsistent labeled data or reviews it manually.
With majority voting, the voted result may itself be a completely wrong answer, so possibly correct information is discarded entirely and the training effect suffers; moreover, the fewer the annotators, the less reliable the voting result. A weighted voting method can also be used, but it still cannot solve the problem that the vote is untrustworthy when the number of annotators is small.
Simply discarding inconsistent sentences improves the reliability of the data set, but if the inconsistency rate of the original data set is high, this greatly reduces the size of the data set and wastes data. Manual review can greatly improve the quality of the data set, but it is time-consuming, labor-intensive and costly.
In summary, although a data set that can be used directly to train a dependency parsing model can be obtained by majority voting or by simply discarding inconsistent data, both approaches waste data, discard part of the information in the data set, and fail to make full use of the effective information in the multi-labeled data, resulting in poor model performance.
Therefore, how to make full use of multi-labeled data to train a dependency syntax analysis model and improve its performance is a problem to be solved by those skilled in the art.
Disclosure of Invention
The purpose of the invention is to provide a dependency syntax analysis model training method, apparatus, device and readable storage medium based on multi-labeled data, so as to solve the problem that, when a dependency parsing model is trained with multi-labeled data, part of the labeled data is essentially still discarded and only one labeling result is used for training, so that the effective information in the multi-labeled data cannot be fully utilized and model performance is poor. The specific scheme is as follows:
in a first aspect, the present application provides a method for dependency parsing model training based on multi-labeled data, including:
acquiring a word sequence and a plurality of labeling results of the word sequence, wherein for each modified word in the word sequence, the labeling results comprise an arc and a dependency relationship label, and each labeling result comes from different users;
inputting the word sequence into a dependency syntactic analysis model to obtain an arc score and a label score;
calculating loss values of the arc score and the label score relative to the plurality of labeled results according to a target loss function;
and adjusting the model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value so as to realize the training of the dependency syntax analysis model.
Preferably, the calculating the loss values of the arc score and the label score relative to the plurality of labeled results according to an objective loss function includes:
setting weight values for various marking results in the multiple marking results according to the marking capabilities of different users;
and calculating the loss values of the arc scores and the label scores relative to the various labeling results according to the target loss function and the weight values of various labeling results.
Preferably, the setting a weight value for each of the plurality of labeling results includes:
and respectively setting an arc weight value and/or a label weight value aiming at each labeling result in the plurality of labeling results.
Preferably, the calculating the loss values of the arc score and the label score relative to the plurality of labeled results according to an objective loss function includes:
calculating the loss value of the arc score relative to the arc in the various labeling results according to an arc loss function to obtain a first loss value;
calculating the loss value of the label score relative to the dependency relationship label in the multiple labeling results according to a label loss function to obtain a second loss value;
determining a loss value of the arc score and the label score relative to a plurality of annotated results based on the first loss value and the second loss value.
Preferably, the calculating, according to a tag loss function, a loss value of the tag score with respect to the dependency tag in the multiple labeling results to obtain a second loss value includes:
calculating a loss value of the tag score relative to a dependency relationship tag in a target labeling result according to a tag loss function to obtain a second loss value, wherein the target labeling result is a labeling result that an arc in the multiple labeling results is equal to a target arc, the target arc is an arc determined according to a target strategy, and the target strategy comprises: arc score prediction, majority voting, weighted voting, random selection.
Preferably, the dependency parsing model includes: an input layer, a coding layer, a first MLP layer, a first scoring layer, a second MLP layer and a second scoring layer;
wherein the first MLP layer is used for determining, according to the output of the coding layer, a representation vector of the current word as a core word and a representation vector of the current word as a modifier, and the first scoring layer is used for determining an arc score according to the output of the first MLP layer;
the second MLP layer is used for determining, according to the output of the coding layer, a representation vector containing dependency label information when the current word is used as a core word and a representation vector containing dependency label information when the current word is used as a modifier, and the second scoring layer is used for determining a label score according to the output of the second MLP layer.
Preferably, the coding layer of the dependency parsing model comprises multiple BiLSTM layers.
In a second aspect, the present application provides a dependency parsing model training apparatus based on multi-labeled data, including:
a training sample acquisition module: for acquiring a word sequence and a plurality of labeling results of the word sequence, wherein for each modifier in the word sequence the labeling result comprises an arc and a dependency relationship label, and each labeling result comes from a different user;
an input-output module: for inputting the word sequence into a dependency syntax analysis model to obtain an arc score and a label score;
a loss calculation module: for calculating loss values of the arc score and the label score relative to the plurality of labeling results according to a target loss function;
an iteration module: for adjusting the model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value, so as to train the dependency syntax analysis model.
In a third aspect, the present application provides a dependency parsing model training device based on multi-labeled data, including:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the multi-labeled data-based dependency parsing model training method as described above.
In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-labeled data-based dependency parsing model training method described above.
The application provides a dependency syntax analysis model training method based on multi-labeled data, comprising: acquiring a word sequence and a plurality of labeling results of the word sequence; inputting the word sequence into a dependency syntax analysis model to obtain an arc score and a label score; calculating loss values of the arc score and the label score relative to the labeling results according to a target loss function; and adjusting the model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value, so as to train the dependency syntax analysis model. The method thus calculates the loss of the model's output relative to all labeling results according to the target loss function and completes the iterative training of the model accordingly, making full use of the effective information in all labeled data and improving the dependency parsing capability of the model.
In addition, the application also provides an apparatus, a device and a readable storage medium for training a dependency parsing model based on multi-labeled data; their technical effects correspond to those of the method and are not repeated here.
Drawings
For a clearer explanation of the embodiments or technical solutions of the prior art of the present application, the drawings needed for the description of the embodiments or prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart illustrating a first embodiment of a multi-labeled data-based dependency parsing model training method provided in the present application;
FIG. 2 is a flowchart illustrating a step S103 in an embodiment of a multi-labeled data-based dependency parsing model training method provided in the present application;
FIG. 3 is a schematic diagram of a model architecture of a second embodiment of a multi-labeled data-based dependency parsing model training method provided by the present application;
FIG. 4 is a diagram illustrating a single annotation result in a second embodiment of a multi-annotation data-based dependency parsing model training method according to the present application;
FIG. 5 is a data storage format of a single annotation result in a second embodiment of a multi-annotation-data-based dependency parsing model training method according to the present application;
FIG. 6 is a diagram illustrating a multi-labeled result of a second embodiment of a multi-labeled data-based dependency parsing model training method provided by the present application;
FIG. 7 is a data storage format of a multi-labeled result in a second embodiment of a multi-labeled data-based dependency parsing model training method provided by the present application;
FIG. 8 is a functional block diagram of an embodiment of a multi-labeled data-based dependency parsing model training apparatus provided in the present application.
Detailed Description
The core of the application is to provide a multi-labeled data-based dependency syntactic analysis model training method, device, equipment and readable storage medium, which can fully utilize effective information in all labeled data and improve the dependency syntactic analysis capability of the model.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a first embodiment of a dependency parsing model training method based on multi-labeled data provided by the present application is described below, where the first embodiment includes:
s101, obtaining a word sequence and a plurality of labeling results of the word sequence, wherein the labeling results comprise arcs and dependency relationship labels for each modified word in the word sequence;
the word sequence is a sequence obtained by segmenting a sentence. In the multiple annotation results (more than two annotation results) obtained in this embodiment, each annotation result comes from a different user. Assuming that a sentence is labeled by K users, K labeling results are generated, and each labeling result is a dependency syntax tree of the sentence.
The dependency syntax tree is used to describe the dependency relationship between words, and one dependency relationship contains three elements: modifiers, core words, and dependency types, meaning that a modifier modifies a core word with a certain dependency type.
In this embodiment, for each modifier in the word sequence, the labeling result includes the following two items of information: an arc (i.e., the core word it depends on) and a dependency relationship label.
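To make the annotation structure concrete, the following minimal Python sketch shows how one word's multi-annotation record could be represented; the class and field names are illustrative assumptions, not taken from the application. Each of the K users contributes an arc (the index of the core word) and a dependency label for the word.

from dataclasses import dataclass
from typing import List

@dataclass
class Annotation:
    """One user's annotation for a single word (the modifier)."""
    head: int   # arc: index of the core word (0 = pseudo root node)
    label: str  # dependency relation label, e.g. "nsubj"

@dataclass
class MultiAnnotatedWord:
    """All K users' annotations for one word in the sentence."""
    form: str                      # the word itself
    annotations: List[Annotation]  # one entry per annotator

# Example: a word annotated by two users who agree on the arc but not the label
word = MultiAnnotatedWord(
    form="apple",
    annotations=[Annotation(head=2, label="obj"),
                 Annotation(head=2, label="nsubj")],
)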
S102, inputting the word sequence into a dependency syntax analysis model to obtain an arc score and a label score;
in this embodiment, the dependency parsing model is used to predict, from the word sequence, the core word and the dependency label of each word. Specifically, the model outputs an arc score and a label score, and the actually predicted arc and dependency label can be determined from these two scores.
This embodiment does not limit which neural network is selected as the dependency parsing model, as long as it can predict dependency relations from the word sequence. As one feasible scheme, the Biaffine Parser model is selected as the dependency parsing model of this embodiment.
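As an illustration of how the predicted arc and label can be read off the two scores, the following sketch assumes a simple greedy per-word decision and illustrative tensor shapes; any tree-constrained decoder the application may use is not shown.

import torch

def greedy_decode(arc_scores: torch.Tensor, label_scores: torch.Tensor):
    """
    arc_scores:   (n+1, n+1)    arc_scores[i, j] = score of word j being the core word of word i
    label_scores: (n+1, n+1, T) label_scores[i, j, t] = score of label t for the arc i -> j
    Returns predicted heads and labels for words 1..n (index 0 is the pseudo root).
    """
    heads = arc_scores.argmax(dim=1)                 # best core word for every word
    n_plus_1 = arc_scores.size(0)
    # pick the best label for the chosen arc of each word
    labels = label_scores[torch.arange(n_plus_1), heads].argmax(dim=1)
    return heads[1:], labels[1:]                     # drop the pseudo-root position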
S103, calculating loss values of the arc scores and the label scores relative to the multiple labeling results according to a target loss function;
in general, when only one labeling result is used as the standard, the loss value between the actual prediction and that labeling result is calculated directly. Since the present application adopts multiple labeling results, the loss of the actual prediction relative to all labeling results needs to be calculated. Specifically, the loss between the actual prediction and each labeling result can be calculated separately and then accumulated.
And S104, adjusting model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value so as to realize the training of the dependency syntax analysis model.
The dependency syntax analysis model training method based on multi-labeled data provided by this embodiment calculates the loss value of the model's output relative to all labeling results according to the target loss function and completes the iterative training of the model accordingly, thereby making full use of the effective information in all labeled data and improving the dependency parsing capability of the model.
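A minimal training-step sketch in PyTorch is given below. The model interface, tensor shapes, and the per-annotator head/label tensors are assumptions for illustration; the loss simply accumulates, for every annotator, the cross entropy of the arc scores and of the label scores on that annotator's arcs, mirroring steps S102 to S104.

import torch
import torch.nn.functional as F

def train_step(model, optimizer, words, chars, heads_per_annotator, labels_per_annotator):
    """One training step: compute the loss of the model's scores against every
    annotator's answer and take a gradient step to reduce it.

    heads_per_annotator  : list of K long tensors of shape (n+1,), entry 0 is a dummy for the root
    labels_per_annotator : list of K long tensors of shape (n+1,), entry 0 is a dummy for the root
    """
    arc_scores, label_scores = model(words, chars)       # (n+1, n+1), (n+1, n+1, T)
    n_plus_1 = arc_scores.size(0)
    loss = arc_scores.new_zeros(())
    for heads, labels in zip(heads_per_annotator, labels_per_annotator):
        # arc loss: cross entropy of each word's head distribution vs. this annotator's arcs
        loss = loss + F.cross_entropy(arc_scores[1:], heads[1:], reduction="sum")
        # label loss: cross entropy of the label distribution on this annotator's arcs
        chosen = label_scores[torch.arange(n_plus_1), heads]   # (n+1, T)
        loss = loss + F.cross_entropy(chosen[1:], labels[1:], reduction="sum")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()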
As a preferred implementation, on the basis of the first embodiment, weight values may be assigned to the labeling results of different users to distinguish their labeling abilities. For example, a relatively high weight may be given to a labeling result provided by an expert, and a lower weight to one provided by an ordinary user.
Specifically, to distinguish the labeling abilities of different users, a weight value is set for each of the multiple labeling results, and the process of S103 is modified to: calculating the loss values of the arc score and the label score relative to the labeling results according to the target loss function and the weight values of the labeling results.
On this basis, considering that a labeling result contains two items of information, the arc and the dependency relationship label, the labeling ability of a user can be distinguished along these two dimensions separately by setting an arc weight value and a label weight value respectively. The labeling ability may even be distinguished along only one of the dimensions.
In this case, the weight-setting process is specifically: setting an arc weight value and/or a label weight value for each of the multiple labeling results. When both are set, the arc weight value may differ from the label weight value.
To sum up, taking Table 1 as an example, when setting the weights for word i, this embodiment provides the following four weight-setting schemes to suit different scenarios (a sketch follows Table 1):
TABLE 1 (the table of weight-setting schemes for word i is provided as an image in the original publication and is not reproduced here)
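Since Table 1 is provided only as an image, the four weight-setting schemes cannot be reproduced verbatim here; the following sketch shows one plausible reading (an assumption): the per-annotation weight may be applied to the arc dimension only, the label dimension only, both, or neither.

from dataclasses import dataclass

@dataclass
class AnnotationWeights:
    """Per-annotation weights; either dimension may be left at 1.0 (unweighted)."""
    arc_weight: float = 1.0
    label_weight: float = 1.0

# Four example configurations for one annotation (an assumed reading of Table 1)
schemes = {
    "unweighted":    AnnotationWeights(),
    "arc only":      AnnotationWeights(arc_weight=0.8),
    "label only":    AnnotationWeights(label_weight=0.6),
    "arc and label": AnnotationWeights(arc_weight=0.8, label_weight=0.6),
}

def weighted_word_loss(arc_loss, label_loss, w: AnnotationWeights):
    """Combine the two loss terms for one annotation using its weights."""
    return w.arc_weight * arc_loss + w.label_weight * label_loss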
Specifically, when calculating the loss value of the actual prediction result (the labeling result output by the model) with respect to all the labeling results, the calculation may be performed from two dimensions of the arc and the dependency relationship label, respectively. In this case, as shown in fig. 2, S103 includes:
s201, calculating loss values of the arc scores relative to the arcs in the multiple labeling results according to an arc loss function to obtain a first loss value;
s202, calculating the loss value of the label score relative to the dependency relationship label in the multiple labeling results according to a label loss function to obtain a second loss value;
s203, determining the loss values of the arc score and the label score relative to various labeling results according to the first loss value and the second loss value.
On this basis, as a preferred embodiment, when calculating the label loss, the difference may be computed not against the dependency relationship labels in all the labeling results but only against those of some of the labeling results. The partial labeling result is a labeling result selected from all labeling results according to a certain strategy, which may specifically be majority voting, weighted voting, arc score prediction, random selection, and the like. In this case, as shown in fig. 3, step S202 is specifically:
calculating a loss value of the tag score relative to a dependency relationship tag in a target labeling result according to a tag loss function to obtain a second loss value, wherein the target labeling result is a labeling result that an arc in the multiple labeling results is equal to a target arc, the target arc is an arc determined according to a target strategy, and the target strategy comprises: arc score prediction, majority voting, weighted voting, random selection.
Wherein arc score prediction means: selecting the arc with the maximum score as the target arc according to the arc scores output by the dependency syntax analysis model;
majority voting means: selecting the arc that occurs most often in the multiple labeling results as the target arc by a majority voting method;
weighted voting means: selecting the target arc by a weighted majority voting method, combining the weight of each labeling result with the number of times each arc occurs in the multiple labeling results;
random selection means: randomly selecting an arc from the multiple labeling results as the target arc. A sketch of these strategies is given below.
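The four target-arc strategies can be sketched as follows; the function name and data layout are illustrative assumptions. The input is the K annotated core words for word i, the model's arc scores for that word, and optional per-annotation weights.

import random
from collections import Counter

def select_target_arc(annotated_heads, arc_scores_for_word, weights=None, strategy="majority"):
    """Pick the target arc for one word from its K annotated heads.

    annotated_heads     : list of K head indices given by the K annotators
    arc_scores_for_word : list of scores, one per candidate head position
    weights             : optional list of K per-annotation weights (for weighted voting)
    """
    if strategy == "arc_score":    # arc with the highest model score
        return max(range(len(arc_scores_for_word)), key=lambda j: arc_scores_for_word[j])
    if strategy == "majority":     # most frequent head among the annotations
        return Counter(annotated_heads).most_common(1)[0][0]
    if strategy == "weighted":     # head frequency weighted by each annotation's weight
        tally = Counter()
        for head, w in zip(annotated_heads, weights):
            tally[head] += w
        return tally.most_common(1)[0][0]
    if strategy == "random":       # any one of the annotated heads
        return random.choice(annotated_heads)
    raise ValueError(f"unknown strategy: {strategy}")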
The second embodiment of the multi-labeled data-based dependency parsing model training method provided by the present application is described in detail below; building on the foregoing description, it illustrates the training process with a practical example.
In this embodiment, a Biaffine Parser model is adopted, as shown in fig. 3. The dependency parsing model includes an input layer, a coding layer, a first MLP layer, a first scoring layer, a second MLP layer and a second scoring layer;
wherein the coding layer comprises multiple BiLSTM layers;
the first MLP layer is used for determining, according to the output of the coding layer, a representation vector of the current word as a core word and a representation vector of the current word as a modifier, and the first scoring layer is used for determining an arc score according to the output of the first MLP layer;
the second MLP layer is used for determining, according to the output of the coding layer, a representation vector containing dependency label information when the current word is used as a core word and a representation vector containing dependency label information when the current word is used as a modifier, and the second scoring layer is used for determining a label score according to the output of the second MLP layer.
For a sentence $S = w_0 w_1 w_2 w_3 \ldots w_n$, $w_0$ is an auxiliary root node inserted at the beginning of the sentence. The input layer maps each word $w_i$ to a vector $x_i$, which is the concatenation of its word embedding vector and its character embedding (Char-LSTM) vector, i.e.:

$$x_i = e_i^{word} \oplus e_i^{char}$$
the coding layer is a multi-layer BiLSTM; the concatenation of the two directional outputs of the previous BiLSTM layer is the input of the next layer.
Then the MLP indicates that the layer will encode the output h of the layeriAs input, four independent MLPs are used to obtain four low-dimensional representation vectors containing corresponding information
Figure BDA0002721636750000101
And
Figure BDA0002721636750000102
as follows:
Figure BDA0002721636750000103
Figure BDA0002721636750000104
Figure BDA0002721636750000105
Figure BDA0002721636750000106
wherein
Figure BDA0002721636750000107
Is wiAs a representative vector when it is a core word,
Figure BDA0002721636750000108
is wiAs a vector of representations when a modifier is used,
Figure BDA0002721636750000109
denotes wiAs a core word, a representation vector containing the predicted dependency label information,
Figure BDA00027216367500001010
is wiThe modifier is a vector containing the representation of the predicted dependency tag information.
The biaffine scoring layer then computes the scores for all dependencies through biaffine, the dependency score being divided into two parts, an arc score and a dependency label score, where the arc score is as follows:
Figure BDA00027216367500001011
wherein scorearc(i, j) represents the score of the dependent arc with j acting as the core word and i acting as the modifier. Matrix WbIs the biaffine parameter.
The dependency label score is as follows:
Figure BDA00027216367500001012
wherein
Figure BDA00027216367500001013
And
Figure BDA00027216367500001014
is the biaffine parameter and b is the offset.
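As a concrete illustration of the MLP representation layer and the biaffine scoring layer described above, the following PyTorch sketch computes arc scores and label scores from the encoder output; the dimensions, initialization, and exact parameterization are assumptions rather than the application's implementation.

import torch
import torch.nn as nn

class BiaffineScorer(nn.Module):
    """Minimal sketch of the MLP representation layer plus biaffine scoring."""

    def __init__(self, hidden, arc_dim=500, label_dim=100, num_labels=40):
        super().__init__()
        self.mlp_arc_head = nn.Sequential(nn.Linear(hidden, arc_dim), nn.ReLU())
        self.mlp_arc_dep = nn.Sequential(nn.Linear(hidden, arc_dim), nn.ReLU())
        self.mlp_label_head = nn.Sequential(nn.Linear(hidden, label_dim), nn.ReLU())
        self.mlp_label_dep = nn.Sequential(nn.Linear(hidden, label_dim), nn.ReLU())
        # biaffine parameters, zero-initialized for simplicity of the sketch
        self.W_arc = nn.Parameter(torch.zeros(arc_dim + 1, arc_dim))
        self.U_label = nn.Parameter(torch.zeros(num_labels, label_dim + 1, label_dim + 1))

    def forward(self, h):                    # h: (n+1, hidden), encoder output
        r_ah = self.mlp_arc_head(h)          # w_i as core word
        r_ad = self.mlp_arc_dep(h)           # w_i as modifier
        r_lh = self.mlp_label_head(h)        # core-word view with label information
        r_ld = self.mlp_label_dep(h)         # modifier view with label information
        ones = h.new_ones(h.size(0), 1)
        # arc score: arc_scores[i, j] = [r_ad_i; 1] W_arc r_ah_j
        arc_scores = torch.cat([r_ad, ones], dim=-1) @ self.W_arc @ r_ah.t()
        # label score: label_scores[i, j, l] = [r_ld_i; 1] U_l [r_lh_j; 1]
        ld = torch.cat([r_ld, ones], dim=-1)
        lh = torch.cat([r_lh, ones], dim=-1)
        label_scores = torch.einsum("id,ldk,jk->ijl", ld, self.U_label, lh)
        return arc_scores, label_scores      # (n+1, n+1), (n+1, n+1, T)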
The overall loss of the model consists of two parts, the arc loss and the label loss. The arc loss is the part of the overall loss function that represents the difference between the distribution of predicted arcs and the true arcs; the label loss is the part that represents the difference between the distribution of predicted labels and the true labels.
The original Biaffine Parser uses cross entropy as the loss function, and each word computes its local loss separately. The original arc loss function is:

$$loss^{arc}(i) = -\log \frac{\exp\left(score^{arc}(i, h_i)\right)}{\sum_{0 \le j \le n} \exp\left(score^{arc}(i, j)\right)}$$

where $h_i$ is the core word of $w_i$ in the single gold-standard annotation.
in this embodiment, in order to adapt the model to multi-labeled data, the original loss function of the model is modified so that all answers in the multi-labeled data are fully utilized. Assume that a sentence is labeled by K annotators, generating multi-labeled data. For the i-th word, the K core words labeled by the K annotators are represented as the list $H = [h_1, h_2, \ldots, h_K]$; the arc loss for this word is then:

$$loss^{arc}(i) = -\sum_{k=1}^{K} \log \frac{\exp\left(score^{arc}(i, h_k)\right)}{\sum_{0 \le j \le n} \exp\left(score^{arc}(i, j)\right)}$$
assume that the label set is $L = \{l_1, l_2, \ldots, l_T\}$. For a dependency arc in which modifier $i$ modifies core word $j$ with dependency relationship type $l$, the original label loss is:

$$loss^{label}(i,j,l) = -\log \frac{\exp\left(score^{label}(i,j,l)\right)}{\sum_{1 \le t \le T} \exp\left(score^{label}(i,j,l_t)\right)}$$
suppose that the K dependency labels correspondingly given by the K annotators are represented as $Y = [y_1, y_2, \ldots, y_K]$. Combined with the label loss function, the loss is computed for each pair of answers $(h_k, y_k)$ and then summed, giving the final overall loss function:

$$loss(i) = -\sum_{k=1}^{K}\left[\log \frac{\exp\left(score^{arc}(i, h_k)\right)}{\sum_{0 \le j \le n} \exp\left(score^{arc}(i, j)\right)} + \log \frac{\exp\left(score^{label}(i, h_k, y_k)\right)}{\sum_{1 \le t \le T} \exp\left(score^{label}(i, h_k, l_t)\right)}\right]$$
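Transcribed directly into code, the overall loss above for a single word can be sketched as follows (PyTorch; tensor shapes are assumptions):

import torch
import torch.nn.functional as F

def word_loss(arc_scores_i, label_scores_i, heads_k, labels_k):
    """Overall loss for the i-th word against its K annotations.

    arc_scores_i   : (n+1,)     scores of every candidate core word for word i
    label_scores_i : (n+1, T)   label scores for every candidate arc of word i
    heads_k        : (K,) long  the K annotated core words h_1..h_K
    labels_k       : (K,) long  the K annotated dependency labels y_1..y_K
    """
    log_p_arc = F.log_softmax(arc_scores_i, dim=-1)      # log of the arc distribution
    log_p_label = F.log_softmax(label_scores_i, dim=-1)  # log label distribution per arc
    loss = arc_scores_i.new_zeros(())
    for h, y in zip(heads_k, labels_k):                  # sum over the K answer pairs
        loss = loss - log_p_arc[h] - log_p_label[h, y]
    return loss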
the overall loss is minimized during the training iterations, reducing the difference between the predictions and all annotations and thereby achieving the optimized result.
A final syntactic analysis model is obtained through iterative training; any input sentence can then be decoded and analyzed to obtain a syntax tree. After the syntactic information of the data is obtained, it can be used to extract long-distance information to meet the needs of other natural language tasks.
On the basis, weight values can be set for various marking results. For example, the consistency of one annotator with other annotators is used to measure his annotating ability, and the higher the consistency, the higher the weight.
Assume there are K annotators $\{a_1, a_2, \ldots, a_K\}$; $s(a_k)$ is the number of words labeled by annotator $a_k$, and $w(a_k)$ is the number of those words for which $a_k$'s answer is consistent with the answers given by the other annotators; then $w(a_k)/s(a_k)$ is the consistency rate of annotator $a_k$. The weight is the normalized consistency rate, i.e. the weight of annotator $a_k$ is:

$$\lambda_k = \frac{w(a_k)/s(a_k)}{\sum_{k'=1}^{K} w(a_{k'})/s(a_{k'})}$$
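The consistency-based weights can be computed as in the following sketch; how per-word agreement is counted (here, agreement with at least one other annotator on the same word, with all annotators assumed to have labeled the same words) is an assumption about the definition above.

def annotator_weights(answers):
    """answers[k][i] is annotator a_k's core-word answer for the i-th labeled word.
    Returns normalized weights: each annotator's consistency rate w(a_k)/s(a_k),
    normalized to sum to 1 over the K annotators."""
    K = len(answers)
    rates = []
    for k in range(K):
        s = len(answers[k])                           # words labeled by a_k
        w = sum(1 for i, h in enumerate(answers[k])   # words where a_k agrees with someone else
                if any(answers[j][i] == h for j in range(K) if j != k))
        rates.append(w / s if s else 0.0)
    total = sum(rates)
    return [r / total if total else 1.0 / K for r in rates]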
thus, the arc loss function for the i-th word is modified to:

$$loss^{arc}(i) = -\sum_{k=1}^{K} \lambda_k \log \frac{\exp\left(score^{arc}(i, h_k)\right)}{\sum_{0 \le j \le n} \exp\left(score^{arc}(i, j)\right)}$$
here the dependency-type (label) loss is left unweighted, and the final loss function is then:

$$loss(i) = -\sum_{k=1}^{K}\left[\lambda_k \log \frac{\exp\left(score^{arc}(i, h_k)\right)}{\sum_{0 \le j \le n} \exp\left(score^{arc}(i, j)\right)} + \log \frac{\exp\left(score^{label}(i, h_k, y_k)\right)}{\sum_{1 \le t \le T} \exp\left(score^{label}(i, h_k, l_t)\right)}\right]$$
the above describes the loss function calculation method in this embodiment, and other calculation methods may be adopted in practical applications, which should not be construed as limiting the present application.
The dependency syntax tree is illustrated in FIG. 4, where $s_0$ denotes a pseudo node that points to the word serving as the root of the sentence. A dependency arc consists of three elements $(w_i, w_j, r)$, where $w_i$ is called the core word, $w_j$ is called the modifier, and $r$ is the relationship type, indicating that $w_j$ modifies $w_i$ with the syntactic role $r$. Here only the dependency arcs are shown as examples and the relationship types are omitted.
The existing model uses a gold-standard treebank, in which each sentence has only one standard answer, as shown in fig. 4. Fig. 4 is a graphical representation of dependency syntax data; the corresponding data is stored in the CoNLL format shown in fig. 5, where the second column is the word itself and the seventh column is the corresponding standard answer for the core-word (head) sequence.
In the present application, multiple annotators annotate the same sentence according to the annotation guidelines, producing multiple labeling results. Each sentence therefore has multiple syntax-tree answers. FIG. 6 shows an example of two-annotator labeling, with one annotator's labeling above the sentence and the other's below it. Correspondingly, the CoNLL format is extended so that the data format also accommodates the multi-labeled form, as shown in FIG. 7: the first 10 columns are consistent with the CoNLL format, the 11th and 12th columns are the identifier of the first annotator and that annotator's core-word answer, and the 14th and 15th columns are the identifier of the second annotator and that annotator's core-word answer.
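The following reading sketch is a minimal illustration of loading this format; the column indices follow the description above (1-based in the text, 0-based in the code) and everything else about the layout, including tab separation, is an assumption.

def read_multi_annotated_conll(path):
    """Read sentences in the extended CoNLL format described above.

    Columns 1-10 follow CoNLL; per the description, columns 11-12 hold the first
    annotator's identifier and core-word answer, and columns 14-15 the second annotator's.
    Returns a list of sentences, each a list of token dictionaries.
    """
    sentences, tokens = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                        # blank line ends a sentence
                if tokens:
                    sentences.append(tokens)
                    tokens = []
                continue
            cols = line.split("\t")             # assumed tab-separated columns
            tokens.append({
                "form": cols[1],                        # the word itself
                "head": cols[6],                        # standard CoNLL head column
                "annotator_1": (cols[10], cols[11]),    # (identifier, core-word answer)
                "annotator_2": (cols[13], cols[14]),    # (identifier, core-word answer)
            })
    if tokens:
        sentences.append(tokens)
    return sentences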
According to the scheme provided by the application, the data input format and the loss function of the Biaffine Parser basic model are modified, and then the multi-label data can be directly used for training. After iterative training, a syntactic analysis model can be obtained, and a syntactic tree result can be given for any input sentence.
In the following, a multi-labeled data-based dependency parsing model training apparatus according to an embodiment of the present application is described, and a multi-labeled data-based dependency parsing model training apparatus described below and a multi-labeled data-based dependency parsing model training method described above may be referred to in correspondence.
As shown in fig. 8, the dependency parsing model training apparatus based on multi-labeled data according to this embodiment includes:
training sample acquisition module 801: for acquiring a word sequence and a plurality of labeling results of the word sequence, wherein for each modifier in the word sequence the labeling result comprises an arc and a dependency relationship label, and each labeling result comes from a different user;
input-output module 802: for inputting the word sequence into the dependency syntax analysis model to obtain an arc score and a label score;
loss calculation module 803: for calculating loss values of the arc score and the label score relative to the plurality of labeling results according to a target loss function;
iteration module 804: for adjusting the model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value, so as to train the dependency syntax analysis model.
The multi-labeled data-based dependency parsing model training device of the present embodiment is used for implementing the aforementioned multi-labeled data-based dependency parsing model training method, and therefore specific embodiments of the device can be seen in the foregoing embodiments of the multi-labeled data-based dependency parsing model training method, for example, the training sample obtaining module 801, the input/output module 802, the loss calculating module 803, and the iteration module 804 are respectively used for implementing the steps S101, S102, S103, and S104 of the aforementioned multi-labeled data-based dependency parsing model training method. Therefore, specific embodiments thereof may be referred to in the description of the corresponding respective partial embodiments, and will not be described herein.
In addition, since the dependency parsing model training device based on multi-labeled data of the present embodiment is used to implement the aforementioned dependency parsing model training method based on multi-labeled data, the function corresponds to that of the aforementioned method, and is not described herein again.
In addition, the present application also provides a dependency parsing model training device based on multi-labeled data, including:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the multi-labeled data-based dependency parsing model training method as described above.
Finally, the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-labeled data-based dependency parsing model training method described above.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The solutions provided in the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A multi-annotation data-based dependency parsing model training method is characterized by comprising the following steps:
acquiring a word sequence and a plurality of labeling results of the word sequence, wherein for each modified word in the word sequence, the labeling results comprise an arc and a dependency relationship label, and each labeling result comes from different users;
inputting the word sequence into a dependency syntactic analysis model to obtain an arc score and a label score;
calculating loss values of the arc score and the label score relative to the plurality of labeled results according to a target loss function;
and adjusting the model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value so as to realize the training of the dependency syntax analysis model.
2. The method of claim 1, wherein said calculating loss values for said arc score and said label score with respect to said plurality of labeled results according to an objective loss function comprises:
setting weight values for various marking results in the multiple marking results according to the marking capabilities of different users;
and calculating the loss values of the arc scores and the label scores relative to the various labeling results according to the target loss function and the weight values of various labeling results.
3. The method of claim 2, wherein setting weight values for each of the plurality of annotation results comprises:
and respectively setting an arc weight value and/or a label weight value aiming at each labeling result in the plurality of labeling results.
4. The method of claim 1, wherein said calculating loss values for said arc score and said label score with respect to said plurality of labeled results according to an objective loss function comprises:
calculating the loss value of the arc score relative to the arc in the various labeling results according to an arc loss function to obtain a first loss value;
calculating the loss value of the label score relative to the dependency relationship label in the multiple labeling results according to a label loss function to obtain a second loss value;
determining a loss value of the arc score and the label score relative to a plurality of annotated results based on the first loss value and the second loss value.
5. The method of claim 4, wherein the calculating the loss value of the tag score with respect to the dependency tag in the plurality of labeling results according to the tag loss function to obtain a second loss value comprises:
calculating a loss value of the tag score relative to a dependency relationship tag in a target labeling result according to a tag loss function to obtain a second loss value, wherein the target labeling result is a labeling result that an arc in the multiple labeling results is equal to a target arc, the target arc is an arc determined according to a target strategy, and the target strategy comprises: arc score prediction, majority voting, weighted voting, random selection.
6. The method of claim 1, wherein the dependency parsing model comprises: an input layer, a coding layer, a first MLP layer, a first scoring layer, a second MLP layer and a second scoring layer;
wherein the first MLP layer is used for determining, according to the output of the coding layer, a representation vector of the current word as a core word and a representation vector of the current word as a modifier, and the first scoring layer is used for determining an arc score according to the output of the first MLP layer;
the second MLP layer is used for determining, according to the output of the coding layer, a representation vector containing dependency label information when the current word is used as a core word and a representation vector containing dependency label information when the current word is used as a modifier, and the second scoring layer is used for determining a label score according to the output of the second MLP layer.
7. The method as recited in claim 6, wherein the coding layer of the dependency syntax analysis model comprises multiple BiLSTM layers.
8. A multi-labeled data-based dependency parsing model training apparatus, comprising:
a training sample acquisition module: for acquiring a word sequence and a plurality of labeling results of the word sequence, wherein for each modified word in the word sequence the labeling results comprise an arc and a dependency relationship label, and each labeling result comes from a different user;
an input-output module: for inputting the word sequence into a dependency syntax analysis model to obtain an arc score and a label score;
a loss calculation module: for calculating loss values of the arc score and the label score relative to the plurality of labeling results according to a target loss function;
an iteration module: for adjusting the model parameters of the dependency syntax analysis model through iterative training with the aim of minimizing the loss value, so as to train the dependency syntax analysis model.
9. A multi-labeled data-based dependency parsing model training device, comprising:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the multi-labeled data based dependency parsing model training method according to any one of claims 1-7.
10. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-labeled data based dependency parsing model training method according to any one of claims 1-7.
CN202011089840.1A 2020-10-13 2020-10-13 Dependency syntax analysis model training method and device based on multi-labeled data Pending CN112232024A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011089840.1A CN112232024A (en) 2020-10-13 2020-10-13 Dependency syntax analysis model training method and device based on multi-labeled data
PCT/CN2021/088601 WO2022077891A1 (en) 2020-10-13 2021-04-21 Multi-labeled data-based dependency and syntactic parsing model training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011089840.1A CN112232024A (en) 2020-10-13 2020-10-13 Dependency syntax analysis model training method and device based on multi-labeled data

Publications (1)

Publication Number Publication Date
CN112232024A true CN112232024A (en) 2021-01-15

Family

ID=74112424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011089840.1A Pending CN112232024A (en) 2020-10-13 2020-10-13 Dependency syntax analysis model training method and device based on multi-labeled data

Country Status (2)

Country Link
CN (1) CN112232024A (en)
WO (1) WO2022077891A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901791A (en) * 2021-09-15 2022-01-07 昆明理工大学 Enhanced dependency syntax analysis method for fusing multi-strategy data under low-resource condition
WO2022077891A1 (en) * 2020-10-13 2022-04-21 苏州大学 Multi-labeled data-based dependency and syntactic parsing model training method and apparatus
CN114611463A (en) * 2022-05-10 2022-06-10 天津大学 Dependency analysis-oriented crowdsourcing labeling method and device
CN114611487A (en) * 2022-03-10 2022-06-10 昆明理工大学 Unsupervised Thai dependency syntax analysis method based on dynamic word embedding alignment
CN116306663A (en) * 2022-12-27 2023-06-23 华润数字科技有限公司 Semantic role labeling method, device, equipment and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062611B (en) * 2022-05-23 2023-05-05 广东外语外贸大学 Training method, device, equipment and storage medium of grammar error correction model
CN115391608B (en) * 2022-08-23 2023-05-23 哈尔滨工业大学 Automatic labeling conversion method for graph-to-graph structure
CN117436446B (en) * 2023-12-21 2024-03-22 江西农业大学 Weak supervision-based agricultural social sales service user evaluation data analysis method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462066A (en) * 2014-12-24 2015-03-25 北京百度网讯科技有限公司 Method and device for labeling semantic role
CN107168945A (en) * 2017-04-13 2017-09-15 广东工业大学 A kind of bidirectional circulating neutral net fine granularity opinion mining method for merging multiple features
CN108172246A (en) * 2017-12-29 2018-06-15 北京淳中科技股份有限公司 The Collaborative Tagging method and apparatus of more tagging equipments
US10002129B1 (en) * 2017-02-15 2018-06-19 Wipro Limited System and method for extracting information from unstructured text
CN108628829A (en) * 2018-04-23 2018-10-09 苏州大学 Automatic treebank method for transformation based on tree-like Recognition with Recurrent Neural Network and system
CN108647254A (en) * 2018-04-23 2018-10-12 苏州大学 Automatic treebank method for transformation and system based on pattern insertion
CN110458181A (en) * 2018-06-07 2019-11-15 中国矿业大学 A kind of syntax dependency model, training method and analysis method based on width random forest
CN110795934A (en) * 2019-10-31 2020-02-14 北京金山数字娱乐科技有限公司 Sentence analysis model training method and device and sentence analysis method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965821B (en) * 2015-07-17 2018-01-05 苏州大学 A kind of data mask method and device
CN110444261B (en) * 2019-07-11 2023-02-03 新华三大数据技术有限公司 Sequence labeling network training method, electronic medical record processing method and related device
CN110472229B (en) * 2019-07-11 2022-09-09 新华三大数据技术有限公司 Sequence labeling model training method, electronic medical record processing method and related device
CN112232024A (en) * 2020-10-13 2021-01-15 苏州大学 Dependency syntax analysis model training method and device based on multi-labeled data

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462066A (en) * 2014-12-24 2015-03-25 北京百度网讯科技有限公司 Method and device for labeling semantic role
US10002129B1 (en) * 2017-02-15 2018-06-19 Wipro Limited System and method for extracting information from unstructured text
CN107168945A (en) * 2017-04-13 2017-09-15 广东工业大学 A kind of bidirectional circulating neutral net fine granularity opinion mining method for merging multiple features
CN108172246A (en) * 2017-12-29 2018-06-15 北京淳中科技股份有限公司 The Collaborative Tagging method and apparatus of more tagging equipments
CN108628829A (en) * 2018-04-23 2018-10-09 苏州大学 Automatic treebank method for transformation based on tree-like Recognition with Recurrent Neural Network and system
CN108647254A (en) * 2018-04-23 2018-10-12 苏州大学 Automatic treebank method for transformation and system based on pattern insertion
CN110458181A (en) * 2018-06-07 2019-11-15 中国矿业大学 A kind of syntax dependency model, training method and analysis method based on width random forest
CN110795934A (en) * 2019-10-31 2020-02-14 北京金山数字娱乐科技有限公司 Sentence analysis model training method and device and sentence analysis method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHAO Yu et al.: "Dependency Parsing with Noisy Multi-annotation Data", CCF International Conference on Natural Language Processing and Chinese Computing
FAN Ziwei et al.: "Implicit discourse relation classification based on BiLSTM combined with self-attention mechanism and syntactic information", Computer Science (计算机科学)
JIANG Wei et al.: "Syntax-enhanced UCCA semantic parsing method", Acta Scientiarum Naturalium Universitatis Pekinensis (北京大学学报(自然科学版))

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022077891A1 (en) * 2020-10-13 2022-04-21 苏州大学 Multi-labeled data-based dependency and syntactic parsing model training method and apparatus
CN113901791A (en) * 2021-09-15 2022-01-07 昆明理工大学 Enhanced dependency syntax analysis method for fusing multi-strategy data under low-resource condition
CN113901791B (en) * 2021-09-15 2022-09-23 昆明理工大学 Enhanced dependency syntax analysis method for fusing multi-strategy data under low-resource condition
CN114611487A (en) * 2022-03-10 2022-06-10 昆明理工大学 Unsupervised Thai dependency syntax analysis method based on dynamic word embedding alignment
CN114611463A (en) * 2022-05-10 2022-06-10 天津大学 Dependency analysis-oriented crowdsourcing labeling method and device
CN116306663A (en) * 2022-12-27 2023-06-23 华润数字科技有限公司 Semantic role labeling method, device, equipment and medium
CN116306663B (en) * 2022-12-27 2024-01-02 华润数字科技有限公司 Semantic role labeling method, device, equipment and medium

Also Published As

Publication number Publication date
WO2022077891A1 (en) 2022-04-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210115)