CN112084769A - Dependency syntax model optimization method, device, equipment and readable storage medium


Info

Publication number
CN112084769A
Authority
CN
China
Prior art keywords
training
dependency
sentence
model
word
Prior art date
Legal status
Pending
Application number
CN202010963511.9A
Other languages
Chinese (zh)
Inventor
周楠楠
于夕畔
汤耀华
杨海军
徐倩
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010963511.9A
Publication of CN112084769A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dependency syntax model optimization method, apparatus, device, and readable storage medium. The dependency syntax model to be optimized comprises a pre-training model at the bottom layer and a dependency relationship prediction network at the upper layer, the pre-training model being trained on a domain-independent text training set. The dependency syntax model optimization method comprises the following steps: performing bottom-layer vector extraction processing on training sentences in a text training set of a target field by using the pre-training model to obtain word vectors corresponding to each word in the training sentences; and performing upper-layer prediction processing on the word vectors by using the dependency relationship prediction network, and optimizing the processing result so as to optimize the dependency syntax model. The invention can greatly reduce the labeling workload, reduce the labeling cost, and improve model optimization efficiency.

Description

Dependency syntax model optimization method, device, equipment and readable storage medium
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a dependency syntax model optimization method, apparatus, device, and readable storage medium.
Background
Dependency syntax analysis is one of the key techniques in natural language processing: it reveals the syntactic structure of a sentence by analyzing the dependency relationships between the components of a language unit. Dependency syntax can assist other natural language processing tasks such as coreference resolution, semantic analysis, machine translation, and information extraction. Existing graph-based dependency parsing methods assume that a dependency relationship exists between any two elements with a certain probability, use deep learning to train a function that scores subtrees, and search for the optimal spanning tree by dynamic programming. Because they can consider every possible dependency tree, graph-based methods achieve high accuracy. However, their model structures are generally deep, so obtaining good prediction results requires a large amount of labeled training data, and data labeling for dependency parsing is both difficult and expensive.
Disclosure of Invention
The main aim of the invention is to provide a dependency syntax model optimization method, apparatus, device, and readable storage medium, so as to solve the technical problem that the existing graph-based dependency parsing method requires a large amount of labeled data whose annotation is difficult and costly.
In order to achieve the above object, the present invention provides a dependency syntax model optimization method, in which the dependency syntax model to be optimized comprises a pre-training model at the bottom layer and a dependency relationship prediction network at the upper layer, the pre-training model being trained on a domain-independent text training set, and the dependency syntax model optimization method comprises the following steps:
performing bottom-layer vector extraction processing on training sentences in a text training set in a target field by using the pre-training model to obtain word vectors corresponding to all words in the training sentences;
and performing upper-layer prediction processing on the word vectors by adopting the dependency relationship prediction network, and optimizing a processing result to optimize the dependency syntactic model.
Optionally, the step of performing bottom-layer vector extraction processing on the training sentences in the text training set in the target field by using the pre-training model to obtain word vectors corresponding to each word in the training sentences includes:
performing bottom-layer vector extraction processing on the training sentences in the text training set of the target field by using the pre-training model to obtain character vectors corresponding to the characters in the training sentences;
and, for each word in the training sentence, performing a weighted summation of the character vectors of the characters forming the word to obtain the word vector corresponding to the word, wherein the model parameters in the dependency syntax model include the weights corresponding to the characters in the training sentence.
Optionally, the step of performing upper-layer prediction processing on the word vector by using the dependency prediction network, and optimizing a processing result to optimize the dependency syntax model includes:
splicing the word vector of each word with a preset part-of-speech vector to obtain a sentence vector comprising the splicing result of each word;
performing upper-layer prediction processing on the sentence vector by adopting the dependency relationship prediction network to obtain a dependency relationship prediction result corresponding to the training sentence;
and calculating an error according to the dependency relationship prediction result and the true dependency relationship label corresponding to the training sentence, and updating the model parameters in the dependency syntax model according to the error so as to optimize the dependency syntax model.
Optionally, the dependency prediction network includes a relationship prediction module and a relationship type prediction module,
the step of performing upper-layer prediction processing on the sentence vector by using the dependency relationship prediction network to obtain the dependency relationship prediction result corresponding to the training sentence comprises:
inputting the sentence vector into the relation prediction module for prediction to obtain a relation prediction result of the training sentence, wherein the relation prediction result represents whether dependency relationships exist between the words in the training sentence;
and predicting a relation type prediction result by using the relation type prediction module based on the sentence vector and the relation prediction result, and taking the relation prediction result and the relation type prediction result together as the dependency relationship prediction result, wherein the relation type prediction result represents the types of the dependency relationships in the training sentence.
Optionally, the relation prediction module comprises a first multi-layer perceptron, a second multi-layer perceptron, and a biaffine transformation network,
the step of inputting the sentence vector into the relation prediction module for prediction to obtain the relation prediction result of the training sentence comprises:
inputting the sentence vector into the first multi-layer perceptron and the second multi-layer perceptron respectively to obtain a first sentence feature vector and a second sentence feature vector correspondingly, wherein the first sentence feature vector comprises the feature vector of each word taken as a dependency relationship head, and the second sentence feature vector comprises the feature vector of each word taken as a dependency relationship tail;
inputting the first sentence feature vector and the second sentence feature vector into the biaffine transformation network for transformation processing to obtain a dependency relationship score matrix of the words;
and predicting the relation prediction result of the training sentence according to the dependency relationship score matrix.
Optionally, the step of predicting the relation prediction result of the training sentence according to the dependency relationship score matrix includes:
determining alternative dependency relationship combinations of all words in the training sentences, wherein the alternative dependency relationship combinations meet preset dependency relationship tree conditions;
calculating the score of each alternative dependency relationship combination according to the dependency relationship score matrix;
and selecting the alternative dependency relationship combination with the highest score as the relationship prediction result of the training sentence.
Optionally, the step of performing bottom-layer vector extraction processing on the training sentences in the text training set in the target field by using the pre-training model to obtain word vectors corresponding to words in the training sentences includes:
preprocessing the training sentences in the text training set of the target field to obtain preprocessed training sentences;
and performing bottom-layer vector extraction processing on the preprocessed training sentences by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences.
Optionally, the step of performing a preprocessing operation on the training sentences in the text training set of the target field to obtain preprocessed training sentences includes:
performing character-level segmentation on training sentences in a text training set in a target field to obtain each character in the training sentences;
converting each character in the training sentence into a corresponding preset character code, and converting the length of the training sentence into a preset length;
and adding a preset sentence beginning label and a preset sentence end label to the converted training sentence to obtain the preprocessed training sentence.
In order to achieve the above object, the present invention further provides a dependency syntax model optimization apparatus, where a dependency syntax model to be optimized includes a pre-training model at a bottom layer and a dependency relationship prediction network at an upper layer, the pre-training model is obtained by training using a domain-independent text training set, and the dependency syntax model optimization apparatus includes:
the vector extraction module is used for performing bottom vector extraction processing on training sentences in a text training set in a target field by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences;
and the prediction module is used for performing upper-layer prediction processing on the word vectors by adopting the dependency relationship prediction network and optimizing a processing result so as to optimize the dependency syntactic model.
To achieve the above object, the present invention also provides a dependency syntax model optimization device, including: a memory, a processor, and a dependency syntax model optimization program stored in the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the dependency syntax model optimization method described above.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon a dependency syntax model optimization program, which when executed by a processor, implements the steps of the dependency syntax model optimization method as described above.
In the invention, a dependency syntax model comprising a bottom-layer pre-training model and an upper-layer dependency relationship prediction network is provided. The pre-training model performs bottom-layer vector extraction processing on training sentences in a text training set of the target field to obtain word vectors corresponding to each word in the training sentences; the dependency relationship prediction network then performs upper-layer prediction processing on the word vectors, and the processing result is optimized so as to optimize the dependency syntax model. Because the pre-training model is trained on a domain-independent corpus (i.e., a corpus of the general domain), its model parameters are not randomly initialized but already encode a large amount of the semantic information of natural language. When the pre-training model is used for dependency parsing in a specific domain, it can therefore produce accurate vector representations of the training text. That is, compared with extracting text representations with a randomly initialized bidirectional recurrent neural network or similar model, the pre-training model provides directional guidance to the upper dependency relationship prediction network from the very start of training. This improves the accuracy of model prediction, accelerates model convergence, and yields good prediction results with less domain-specific training data, greatly reducing the labeling workload, lowering the labeling cost, and improving model optimization efficiency, so that the dependency syntax model can be applied to associated downstream natural language processing tasks at low cost.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a dependency syntax model optimization method according to the present invention;
FIG. 3 is a flow diagram illustrating a dependency syntax model optimization process according to an embodiment of the present invention;
FIG. 4 is a diagram of a dependency syntax model structure according to an embodiment of the present invention;
FIG. 5 is a functional block diagram of the dependency syntax model optimization apparatus according to the preferred embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that, the dependency syntax model optimization device in the embodiment of the present invention may be a smart phone, a personal computer, a server, and the like, and is not limited herein.
As shown in fig. 1, the dependency syntax model optimization device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the device architecture illustrated in FIG. 1 does not constitute a limitation on the dependency syntax model optimization device, and may include more or fewer components than those illustrated, or some components in combination, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a dependency syntax model optimization program. The dependency syntax model to be optimized comprises a pre-training model at the bottom layer and a dependency relationship prediction network at the upper layer, the pre-training model being trained on a domain-independent text training set. The operating system is a program that manages and controls the device hardware and software resources, supporting the operation of the dependency syntax model optimization program and other software or programs.
In the device shown in fig. 1, the user interface 1003 is mainly used for data communication with a client; the network interface 1004 is mainly used for establishing a communication connection with a server; and the processor 1001 may be configured to invoke the dependency syntax model optimization program stored in the memory 1005 and perform the following operations:
performing bottom-layer vector extraction processing on training sentences in a text training set in a target field by using the pre-training model to obtain word vectors corresponding to all words in the training sentences;
and performing upper-layer prediction processing on the word vectors by adopting the dependency relationship prediction network, and optimizing a processing result to optimize the dependency syntactic model.
Further, the step of performing bottom-layer vector extraction processing on the training sentences in the text training set in the target field by using the pre-training model to obtain word vectors corresponding to each word in the training sentences includes:
performing bottom-layer vector extraction processing on the training sentences in the text training set of the target field by using the pre-training model to obtain character vectors corresponding to the characters in the training sentences;
and, for each word in the training sentence, performing a weighted summation of the character vectors of the characters forming the word to obtain the word vector corresponding to the word, wherein the model parameters in the dependency syntax model include the weights corresponding to the characters in the training sentence.
Further, the step of performing upper-layer prediction processing on the word vector by using the dependency relationship prediction network and optimizing a processing result to optimize the dependency syntax model includes:
splicing the word vector of each word with a preset part-of-speech vector to obtain a sentence vector comprising the splicing result of each word;
performing upper-layer prediction processing on the sentence vector by adopting the dependency relationship prediction network to obtain a dependency relationship prediction result corresponding to the training sentence;
and calculating an error according to the dependency relationship prediction result and the true dependency relationship label corresponding to the training sentence, and updating the model parameters in the dependency syntax model according to the error so as to optimize the dependency syntax model.
Further, the dependency prediction network includes a relationship prediction module and a relationship type prediction module,
the step of performing upper-layer prediction processing on the sentence vector by using the dependency relationship prediction network to obtain the dependency relationship prediction result corresponding to the training sentence comprises:
inputting the sentence vector into the relation prediction module for prediction to obtain a relation prediction result of the training sentence, wherein the relation prediction result represents whether dependency relationships exist between the words in the training sentence;
and predicting a relation type prediction result by using the relation type prediction module based on the sentence vector and the relation prediction result, and taking the relation prediction result and the relation type prediction result together as the dependency relationship prediction result, wherein the relation type prediction result represents the types of the dependency relationships in the training sentence.
Further, the relation prediction module comprises a first multi-layer perceptron, a second multi-layer perceptron, and a biaffine transformation network,
the step of inputting the sentence vector into the relation prediction module for prediction to obtain the relation prediction result of the training sentence comprises:
inputting the sentence vector into the first multi-layer perceptron and the second multi-layer perceptron respectively to obtain a first sentence feature vector and a second sentence feature vector correspondingly, wherein the first sentence feature vector comprises the feature vector of each word taken as a dependency relationship head, and the second sentence feature vector comprises the feature vector of each word taken as a dependency relationship tail;
inputting the first sentence feature vector and the second sentence feature vector into the biaffine transformation network for transformation processing to obtain a dependency relationship score matrix of the words;
and predicting the relation prediction result of the training sentence according to the dependency relationship score matrix.
Further, the step of predicting the relation prediction result of the training sentence according to the dependency relationship score matrix includes:
determining alternative dependency relationship combinations of all words in the training sentences, wherein the alternative dependency relationship combinations meet preset dependency relationship tree conditions;
calculating the score of each alternative dependency relationship combination according to the dependency relationship score matrix;
and selecting the alternative dependency relationship combination with the highest score as the relationship prediction result of the training sentence.
Further, the step of performing bottom-layer vector extraction processing on the training sentences in the text training set in the target field by using the pre-training model to obtain word vectors corresponding to each word in the training sentences includes:
preprocessing the training sentences in the text training set of the target field to obtain preprocessed training sentences;
and performing bottom-layer vector extraction processing on the preprocessed training sentences by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences.
Further, the step of preprocessing the training sentences in the text training set in the target field to obtain preprocessed training sentences includes:
performing character-level segmentation on training sentences in a text training set in a target field to obtain each character in the training sentences;
converting each character in the training sentence into a corresponding preset character code, and converting the length of the training sentence into a preset length;
and adding a preset sentence beginning label and a preset sentence end label to the converted training sentence to obtain the preprocessed training sentence.
Based on the above structure, various embodiments of the dependency syntax model optimization method are provided.
Referring to FIG. 2, FIG. 2 is a flowchart illustrating a first embodiment of a dependency syntax model optimization method according to the present invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than presented herein. The execution subject of each embodiment of the dependency syntax model optimization method of the present invention may be a device such as a smart phone, a personal computer, and a server, and for convenience of description, the execution subject is omitted in the following embodiments for explanation. In this embodiment, the dependency syntax model optimization method includes:
step S10, performing bottom vector extraction processing on training sentences in a text training set of a target field by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences;
in this embodiment, in order to solve the problems that the existing graph-based dependency parsing method needs a large amount of labeled training data to train a model, and the data labeling difficulty is large and the cost is high, on the basis of the graph-based dependency parsing method, a dependency parsing model is proposed, which includes a pre-training model at a bottom layer and a dependency relationship prediction network at an upper layer. That is, in the conventional graph-based dependency syntax analysis method, the underlying model of the dependency syntax model is generally implemented by using a network structure such as a bidirectional cyclic neural network, and the model parameters are initialized randomly.
The pre-training model is a pre-trained model from the field of natural language processing and is not limited in this embodiment; for example, it may be BERT, RoBERTa, ALBERT, XLNet, or the like. Such a pre-training model is obtained by training on a domain-independent text training set (i.e., a text training set of the general domain) and is used to perform semantic understanding of text sentences and output semantic representations. In this embodiment, the pre-training model is used to process a sentence to obtain the word vector of each word in the sentence. The dependency relationship prediction network may adopt a network structure commonly used in graph-based dependency parsing, for example a multi-layer perceptron (MLP) plus a biaffine transformation network (Biaffine), which is not limited in this embodiment.
A text training set for training the dependency syntax model may be collected in advance. If the dependency syntax model is to be applied to dependency parsing of text in a specific domain (the target domain), for example the professional conversational domain of banking services, sentences from that domain may be collected as training sentences. Specifically, a labeled training data set of the general domain can be obtained from a public database, training sentences can then be collected and labeled for the specific domain, the two data sets can be shuffled together to obtain the training set, and the training set is used for the optimization training of the dependency syntax model.
Model parameters of a pre-trained model in the dependency syntax model are pre-trained, model parameters in the dependency relationship prediction network can be initialized randomly or according to experience, and the optimization of the dependency syntax model is to perform multiple rounds of iterative training on the dependency syntax model by adopting a training set and continuously update the model parameters of the dependency syntax model, namely update the model parameters in the pre-trained model and the dependency relationship prediction network.
In one round of optimization training, the pre-training model first performs bottom-layer vector extraction processing on the training sentences in the text training set to obtain the word vectors corresponding to each word in the training sentences. It should be noted that the text training set includes a large number of training sentences and the processing logic is the same for each of them, so for convenience of description each embodiment of the present invention is described in terms of a single training sentence. A training sentence is composed of a plurality of words, and one word may be composed of a plurality of characters; the pre-training model yields a vector representation for each character in the training sentence, and the vector representation of a word, i.e. the word vector, is then obtained by averaging, concatenating, or similarly combining the vectors of the characters forming the word.
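As an illustration of this bottom-layer extraction step, the following sketch uses a BERT-style encoder from the Hugging Face transformers library; the model name, the example sentence, the word-boundary annotations, and the mean-pooling step are illustrative assumptions rather than the embodiment's prescribed implementation (the learned weighted summation of the second embodiment is sketched later).

```python
# Illustrative sketch only: a Chinese BERT encoder stands in for the
# pre-training model; names and word boundaries are assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

sentence = "我爱自然语言处理"                      # one training sentence
word_spans = [(0, 1), (1, 2), (2, 6), (6, 8)]     # pre-labeled words: 我 / 爱 / 自然语言 / 处理

inputs = tokenizer(sentence, return_tensors="pt")  # adds [CLS] and [SEP] automatically
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state   # (1, num_chars + 2, hidden_size)
char_vecs = hidden[0, 1:-1]                        # drop [CLS]/[SEP]: one vector per character

# pool the character vectors of each word into a word vector
# (mean pooling here; the second embodiment uses a learned weighted sum)
word_vecs = torch.stack([char_vecs[s:e].mean(dim=0) for s, e in word_spans])
print(word_vecs.shape)                             # torch.Size([4, 768])
```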
And step S20, performing upper-layer prediction processing on the word vectors by adopting the dependency relationship prediction network, and optimizing the processing result to optimize the dependency syntax model.
And performing upper-layer prediction processing on the word vectors by adopting a dependency relationship prediction network to obtain a processing result. In particular, the processing result may be a dependency prediction result of the training statement. The dependency syntax model is optimized by optimizing the processing results. Specifically, a common machine learning method may be employed to optimize the processing results.
After one round of optimization training, the model parameters in the dependency syntax model have been updated once. The next round of optimization training is then performed with the training set on the basis of the dependency syntax model with the updated parameters, and training finishes when a stop condition is detected, yielding the final optimized and updated dependency syntax model. The condition for stopping training may be that the dependency syntax model converges, that the number of training rounds reaches a preset maximum, that the training duration reaches a preset maximum, or any other stop condition set according to specific needs.
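A minimal sketch of this round-based optimization with the stop conditions just described is given below; `model`, `train_loader`, and `compute_loss` are hypothetical names standing in for the dependency syntax model, the shuffled training set, and the error described later in step S203, and all hyperparameter values are assumptions.

```python
# Hypothetical round-based training loop with the stop conditions above;
# model, train_loader, and compute_loss are assumed to exist elsewhere.
import time
import torch

MAX_ROUNDS = 50          # preset maximum number of training rounds
MAX_SECONDS = 3600       # preset maximum training duration
EPSILON = 1e-4           # convergence threshold on the round loss

optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
start, prev_loss = time.time(), float("inf")

for round_idx in range(MAX_ROUNDS):
    round_loss = 0.0
    for batch in train_loader:
        optimizer.zero_grad()
        loss = compute_loss(model, batch)  # error vs. the true dependency labels
        loss.backward()                    # back-propagate through both layers
        optimizer.step()                   # update all model parameters
        round_loss += loss.item()
    converged = abs(prev_loss - round_loss) < EPSILON
    timed_out = time.time() - start > MAX_SECONDS
    if converged or timed_out:             # detected stop condition: end training
        break
    prev_loss = round_loss
```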
Further, the step S20 includes:
step S201, splicing the word vector of each word with a preset part-of-speech vector to obtain a sentence vector comprising the splicing result of each word;
step S202, the dependency relationship prediction network is adopted to carry out upper layer prediction processing on the sentence vectors to obtain the dependency relationship prediction result corresponding to the training sentences;
step S203, calculating an error according to the dependency relationship prediction result and the dependency relationship real label corresponding to the training sentence, and updating the model parameter in the dependency syntax model according to the error to optimize the dependency syntax model.
Furthermore, the part of speech of each word can be added as a prediction basis for the dependency relationship prediction network. Specifically, the part of speech of each word in the training sentence is labeled in advance, and the parts of speech are encoded, for example by one-hot coding, to obtain the part-of-speech vector corresponding to each part of speech, so that each word in the training sentence corresponds to one part-of-speech vector.
After the word vector of each word in the training sentence is obtained, the word vector of each word is concatenated (spliced) with the preset part-of-speech vector to obtain a concatenation result for each word, and the per-word concatenation results together form the sentence vector; that is, the sentence vector comprises the concatenation of the word vector and the part-of-speech vector of each word. The dependency relationship prediction network then performs upper-layer prediction processing on the sentence vector to obtain the dependency relationship prediction result corresponding to the training sentence: from the part of speech of each word and the semantic information in each word vector, it predicts the dependency relationships between the words in the training sentence. A dependency relationship holds between two words, one being the dependency head and the other the dependency tail, the tail being grammatically dependent on the head. There are many types of dependency relationships, such as subject-predicate and verb-object relationships. The dependency relationship prediction network may be configured to predict whether a dependency relationship exists between the words of a sentence, and further to predict the type of each dependency relationship. The dependency relationship prediction result may therefore include a result characterizing whether dependency relationships exist between the words in the training sentence, and may further include a result characterizing the types of the dependency relationships that exist.
The dependency relationship prediction result can be optimized by supervised learning. Specifically, the true dependency relationship label corresponding to the training sentence is annotated in advance; it may include a label indicating whether a dependency relationship exists between words in the training sentence and, if relationship types are also to be predicted, a label indicating the type of each existing dependency relationship. That is, the true label represents the real dependency relationships in the training sentence, while the prediction result is what the model produces, and optimizing the dependency syntax model means making the error between the two as small as possible. The error may be calculated from the dependency relationship prediction result and the true label using a common loss function, and each model parameter in the dependency syntax model is then updated according to the error by a back-propagation algorithm; the specific parameter update process follows the existing back-propagation algorithm and is not described here. As the model parameters are updated, the error between the dependency relationship prediction result and the true label becomes smaller and smaller, and the dependency syntax model is thereby optimized.
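The following sketch illustrates steps S201 to S203 in PyTorch: part-of-speech vectors are concatenated to the word vectors, a score matrix is produced, and a cross-entropy error against the true head of each word is returned for back-propagation. The shapes, the `arc_scores_fn` scoring network, and the use of cross-entropy are assumptions; the embodiment only requires a common loss function.

```python
# Sketch of steps S201-S203 (assumed shapes and loss choice).
import torch
import torch.nn.functional as F

NUM_POS, POS_DIM = 30, 50
pos_embedding = torch.nn.Embedding(NUM_POS, POS_DIM)  # preset part-of-speech vectors

def upper_layer_loss(word_vecs, pos_ids, arc_scores_fn, gold_heads):
    """word_vecs: (M, hidden); pos_ids, gold_heads: (M,) long tensors."""
    pos_vecs = pos_embedding(pos_ids)                    # (M, POS_DIM)
    sent_vec = torch.cat([word_vecs, pos_vecs], dim=-1)  # splice word and POS vectors
    arc_scores = arc_scores_fn(sent_vec)                 # (M, M) dependency score matrix
    # row i holds the scores of word i as head, column j belongs to word j
    # as tail, so transpose to get, per word, scores over candidate heads
    return F.cross_entropy(arc_scores.t(), gold_heads)
```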
After the finally optimized and updated dependency syntax model is obtained, the dependency syntax model can be used for analyzing the text sentence needing dependency syntax analysis, the dependency relationship among all the words in the text sentence is predicted, and further the subsequent natural language processing tasks such as machine translation and the like can be carried out according to the dependency relationship.
In this embodiment, a dependency syntax model comprising a bottom-layer pre-training model and an upper-layer dependency relationship prediction network is provided. The pre-training model performs bottom-layer vector extraction processing on training sentences in a text training set of the target field to obtain word vectors corresponding to each word in the training sentences; the dependency relationship prediction network then performs upper-layer prediction processing on the word vectors, and the processing result is optimized so as to optimize the dependency syntax model. Because the pre-training model is trained on a domain-independent corpus (i.e., a corpus of the general domain), its model parameters are not randomly initialized but already encode a large amount of the semantic information of natural language. When the pre-training model is used for dependency parsing in a specific domain, it can therefore produce accurate vector representations of the training text. That is, compared with extracting text representations with a randomly initialized bidirectional recurrent neural network or similar model, the pre-training model provides directional guidance to the upper dependency relationship prediction network from the very start of training. This improves the accuracy of model prediction, accelerates model convergence, and yields good prediction results with less domain-specific training data, greatly reducing the labeling workload, lowering the labeling cost, and improving model optimization efficiency, so that the dependency syntax model can be applied to associated downstream natural language processing tasks at low cost.
Further, based on the first embodiment, a second embodiment of the dependency syntactic model optimizing method according to the present invention is provided, in this embodiment, the step S10 includes:
step S101, performing bottom vector extraction processing on training sentences in a text training set of a target field by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences;
further, in this embodiment, the training sentences in the text training set in the target field are input into the pre-training model to perform bottom-layer vector extraction processing, so as to obtain word vectors corresponding to each word in the training sentences. Specifically, the processing flow inside the model differs according to the pre-training model used, and detailed development is not performed here.
Step S102, for each word in the training sentence, carrying out weighted summation on word vectors of each word forming the word to obtain a word vector corresponding to the word, wherein model parameters in the dependency syntax model comprise weights corresponding to each word in the training sentence.
The words included in the training sentences are labeled in advance, and the character vectors of the characters forming a word are weighted and summed to obtain the word vector corresponding to that word, yielding the word vector of each word. Before optimization starts, the weight corresponding to each character may be randomly initialized; during optimization, the weight of each character is treated as a model parameter of the dependency syntax model, i.e. in each update the weights corresponding to the characters are optimized together with the model parameters of the pre-training model and the dependency relationship prediction network.
In this embodiment, it is recognized that the characters forming a word contribute differently to dependency relationship prediction; obtaining word vectors by directly averaging or concatenating the character vectors would ignore these contributions. The word vector is therefore obtained by a weighted summation of the character vectors, and the weight corresponding to each character is itself a model parameter optimized along with the rest of the model, so that the model can accurately learn the contribution of each character of a word to dependency relationship prediction and use it as a basis for prediction. This enriches the prediction basis, improves prediction accuracy, and better conforms to the regularities of natural language.
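A minimal sketch of this weighted character-to-word pooling follows; the per-position weight parameter and its softmax normalization are assumptions about one reasonable realization, the essential point being that the character weights are parameters updated by back-propagation together with the rest of the model.

```python
# Sketch of the learned character-weighted pooling (assumed realization).
import torch
import torch.nn as nn

class WeightedCharPooling(nn.Module):
    def __init__(self, max_chars: int):
        super().__init__()
        # one randomly initialized weight per character position; being an
        # nn.Parameter, it is updated by back-propagation like any other
        # model parameter of the dependency syntax model
        self.char_weights = nn.Parameter(torch.randn(max_chars))

    def forward(self, char_vecs: torch.Tensor, word_spans) -> torch.Tensor:
        word_vecs = []
        for start, end in word_spans:
            # softmax normalization is an assumption; the embodiment only
            # specifies a weighted summation with learnable weights
            w = torch.softmax(self.char_weights[start:end], dim=0)
            word_vecs.append((w.unsqueeze(-1) * char_vecs[start:end]).sum(dim=0))
        return torch.stack(word_vecs)  # (num_words, hidden_size)
```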
Further, in an embodiment, the step S10 includes:
step S103, preprocessing the training sentences in the text training set of the target field to obtain preprocessed training sentences;
and preprocessing the training sentences in the text training data set of the target field to obtain preprocessed training sentences. The pre-training models are different, and the pre-processing operation is different, so that the pre-processed training sentences conform to the data format of the input data of the pre-training models. For example, the input data of the pre-training model needs to be fixed length, and the lengths of the training sentences may be different, so that the training sentences need to be converted into fixed lengths.
And step S104, performing bottom-layer vector extraction processing on the preprocessed training sentences by using the pre-training model to obtain word vectors corresponding to all words in the training sentences.
Inputting the preprocessed training sentences into a pre-training model to perform bottom vector extraction processing, namely, performing semantic understanding on the preprocessed training sentences through the pre-training model to obtain word vectors corresponding to all words in the training sentences. Specifically, the word vectors corresponding to the words may be obtained by processing the preprocessed training sentences according to the processing steps of S101 and S102.
In the embodiment, each training sentence in the text training set in the target field is preprocessed, and the preprocessed training sentences are input into the pre-training model for processing, so that the model can obtain a more accurate prediction result based on the training sentences with uniform formats, the training optimization duration is further shortened, and the optimization efficiency is improved.
In one embodiment, the dependency syntax model may be optimized according to an optimization procedure as shown in FIG. 3.
Further, the step S103 includes:
step S1031, performing character-level segmentation on the training sentences in the text training set in the target field to obtain each character in the training sentences;
in an embodiment, when the pre-training model is a Bert model, the training sentences in the text training set in the target field may be subjected to character-level segmentation to obtain each character in the training sentences. For example, when the training sentence is a chinese text, the character-level segmentation is to segment the training sentence into individual chinese characters, one chinese character being one word; when the training sentence is an english text, the character-level segmentation may be to segment the training sentence into words, or may be elements smaller than the words, where one element is one word.
Step S1032, converting each character in the training sentence into a corresponding preset character code, and then converting the length of the training sentence into a preset length;
Each character in the training sentence is converted into its corresponding preset character code. The characters of the dictionary are encoded in advance, each character corresponding to one character code, i.e. an ID number; for each character in the training sentence, the corresponding character code is looked up and the character is converted into it, so that subsequent operations are performed on the character codes.
The training sentences whose characters have been converted into codes are then converted to a preset length, i.e. the lengths of all training sentences are normalized to a fixed preset length. The preset length is set in advance; if the length of a training sentence is greater than the preset length, the surplus part is truncated, and if it is less than the preset length, the missing part is padded with 0.
Step S1033, adding a preset sentence beginning label and a preset sentence end label to the converted training sentence to obtain the preprocessed training sentence.
A preset sentence beginning label and a preset sentence end label are added to the converted training sentence to obtain the preprocessed training sentence. The preset sentence beginning label and the preset sentence end label are set in advance as needed; for example, the sentence beginning label is [CLS] and the sentence end label is [SEP]. After the preprocessed training sentences are obtained, they can be input into the BERT model for processing to obtain the word vectors of the words.
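A toy sketch of steps S1031 to S1033 follows; the vocabulary, the ID values, and `MAX_LEN` are assumptions (the real character codes come from the pre-training model's dictionary).

```python
# Toy sketch of steps S1031-S1033; vocabulary, IDs, and MAX_LEN are assumed.
MAX_LEN = 64
vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "我": 2769, "爱": 4263}

def preprocess(sentence: str) -> list:
    chars = list(sentence)                               # S1031: character-level segmentation
    ids = [vocab.get(c, vocab["[UNK]"]) for c in chars]  # S1032: char -> preset character code
    ids = ids[:MAX_LEN - 2]                              # truncate the surplus part
    ids = [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]      # S1033: sentence beginning/end labels
    ids += [vocab["[PAD]"]] * (MAX_LEN - len(ids))       # pad the insufficient part with 0
    return ids
```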
Further, based on the first and/or second embodiments, a third embodiment of the dependency syntax model optimization method according to the present invention is provided, and in this embodiment, the step S202 includes:
step S2021, inputting the sentence vector into the relation prediction module for prediction to obtain a relation prediction result of the training sentence, wherein the relation prediction result represents whether dependency relations exist among words in the training sentence or not;
in this embodiment, the dependency relationship prediction network includes a relationship prediction module and a relationship type prediction module, where the relationship prediction module is configured to predict whether a dependency relationship exists between words in a training sentence, and the relationship type prediction module is configured to predict a type of the dependency relationship in the training sentence. Specifically, the relationship prediction module and the relationship type prediction module may both be implemented by MPL + Biaffine, that is, internal structures of the two modules may be set to be the same, an output of the relationship prediction module is set to be a tag indicating whether a relationship exists, an output of the relationship type prediction module is set to be a tag indicating a relationship type, and the two modules are optimized by different errors.
The sentence vector is input into the relation prediction module for prediction to obtain the relation prediction result of the training sentence. The relation prediction result represents whether dependency relationships exist between the words in the training sentence. Specifically, between any two words there are two possible directed dependencies: the first word may be attached to the second, or the second to the first. The relation prediction result may then be a score matrix that includes, for any two words in the training sentence, two probabilities respectively indicating the likelihood of each of the two dependency directions.
Step S2022, predicting by using the relation type prediction module based on the sentence vector and the relation prediction result to obtain a relation type prediction result, and taking the relation prediction result and the relation type prediction result as a dependency relation prediction result, wherein the relation type prediction result represents the type of dependency relation in the training sentence.
After the relation prediction result is obtained, the relation type prediction module predicts the relation type prediction result based on the sentence vector and the relation prediction result. Specifically, the relation type prediction result may also be a score matrix, containing 2 × N probabilities for any two words in the training sentence (N per direction), where the N probabilities represent the likelihoods of the N types of dependency relationship between the two words, N being the total number of dependency relationship types and n the type index. The relation type prediction module performs its prediction on the basis of the sentence vector and the relation prediction result, i.e. the relation prediction result influences the relation type prediction result. For example, if the probability of a dependency of A on B is small in the relation prediction result, then correspondingly the N probabilities for A on B in the relation type prediction result are all small, meaning that the dependency of A on B does not belong to any relation type, i.e. that no dependency of A on B exists.
It should be noted that during optimization training the relation prediction result and the relation type prediction result are in probability form; after training, when dependency parsing is performed with the dependency syntax model, a deterministic result can further be derived from these probability-form results, i.e. the model directly states which type of dependency relationship exists between which words.
Further, in an embodiment, the step S2021 includes:
step a, inputting the sentence vectors into the first multilayer perceptron and the second multilayer perceptron respectively to obtain a first sentence characteristic vector and a second sentence characteristic vector correspondingly, wherein the first sentence characteristic vector comprises characteristic vectors of all words as dependency relationship heads, and the second sentence characteristic vector comprises characteristic vectors of all words as dependency relationship tails;
in the present embodiment, as shown in fig. 4, the relationship prediction module includes a first multilayer perceptron (MLP1), a second multilayer perceptron (MLP2), and a dual affine transformation network. MLP1 is used to extract each word in the sentence vector as the feature vector of the dependency head, and MLP2 is used to extract each word in the sentence vector as the feature vector of the dependency tail.
Then, after the sentence vectors corresponding to the training sentences are obtained, the sentence vectors can be MLP1 and MLP2, respectively. MLP1 performs feature extraction on the sentence vector and outputs to obtain a first sentence feature vector, wherein the first sentence feature vector comprises a feature vector of each word as a dependency head. MLP2 performs feature extraction on the sentence vector and outputs a second sentence feature vector, which includes the feature vector of each word as the tail of the dependency relationship.
Step b, inputting the first sentence feature vector and the second sentence feature vector into the biaffine transformation network for transformation processing to obtain the dependency relationship score matrix of the words;
The first sentence feature vector and the second sentence feature vector are input into the biaffine transformation network for transformation processing to obtain the dependency relationship score matrix. The transformation follows the processing of an existing biaffine network and is not detailed here. The dependency relationship score matrix is an M × M matrix whose element Score(i←j) represents the probability that a dependency relationship exists with the i-th word as the dependency head and the j-th word as the dependency tail, i.e. the probability that the j-th word is attached to the i-th word, where M is the number of words in the training sentence, 0 < i ≤ M, and 0 < j ≤ M.
And c, predicting to obtain a relation prediction result of the training sentence according to the dependency relation score matrix.
After the dependency relationship score matrix is obtained, the relation prediction result of the training sentence can be predicted from it. Specifically, in the optimization training stage the dependency relationship score matrix may be used directly as the relation prediction result; alternatively, the score matrix may be converted into a deterministic result that directly indicates which words have dependency relationships between them, and that deterministic result used as the relation prediction result. In particular, a maximum spanning tree algorithm may be employed to generate the deterministic relation prediction result from the dependency relationship score matrix.
Further, the step c includes:
step c1, determining alternative dependency relationship combinations of each word in the training sentence, wherein each alternative dependency relationship combination meets the condition of a preset dependency relationship tree;
in one embodiment, alternative dependency combinations between words in the training sentence that meet the conditions of the preset dependency tree may be determined. Wherein the preset dependency tree condition is set in advance as needed. For example, the conditions may include: each word can only be attached to one word; the dependency between words cannot form a closed loop, etc. Finding alternative dependency combinations that meet these conditions, e.g., with three words i, j, and k, one alternative dependency combination is: i is attached to j, j is attached to k.
Step c2, calculating the score of each alternative dependency relationship combination according to the dependency relationship score matrix;
and respectively calculating the scores of the alternative dependency relationship combinations according to the dependency relationship score matrix. Specifically, the candidate dependency combination includes a plurality of dependencies, a score (i.e., a probability) corresponding to each dependency is searched for in the dependency score matrix, and the scores are added or multiplied to obtain the score of the candidate dependency combination. For example, there are alternative dependency combinations: i is attached to j, j is attached to k, then Score (j < -i) and Score (k < -j) are searched from the dependency relationship Score matrix, and the two scores are added or multiplied to obtain the Score of the alternative dependency relationship combination.
And c3, selecting the alternative dependency relationship combination with the highest score as the relationship prediction result of the training sentence.
The alternative dependency relationship combination with the highest score is selected from the alternative combinations as the relation prediction result of the training sentence. For example, suppose there are two alternative combinations (more in a real scenario): "i attached to j, j attached to k" and "k attached to j, j attached to i". If the first obtains the highest score, it is taken as the relation prediction result of the training sentence.
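The sketch below illustrates steps c1 to c3 literally: it enumerates candidate head assignments satisfying the tree conditions (one head per word, exactly one root, no closed loop), sums the corresponding entries of the score matrix, and keeps the best combination. The brute-force enumeration is for illustration only; as noted above, a maximum spanning tree algorithm is used in practice.

```python
# Brute-force sketch of steps c1-c3; real systems use a maximum
# spanning tree algorithm rather than enumeration.
import itertools
import torch

def has_cycle(heads) -> bool:
    for start in range(len(heads)):
        seen, node = set(), start
        while node != -1:              # -1 marks attachment to the root
            if node in seen:
                return True
            seen.add(node)
            node = heads[node]
    return False

def best_combination(scores: torch.Tensor):
    """scores: (M, M) matrix with Score(i <- j) at [i, j]."""
    M = scores.size(0)
    best, best_score = None, float("-inf")
    for heads in itertools.product([-1] + list(range(M)), repeat=M):
        if any(h == j for j, h in enumerate(heads)):   # a word cannot attach to itself
            continue
        if sum(h == -1 for h in heads) != 1:           # exactly one root word
            continue
        if has_cycle(heads):                           # no closed loops allowed
            continue
        s = sum(scores[h, j].item() for j, h in enumerate(heads) if h != -1)
        if s > best_score:
            best, best_score = heads, s
    return best, best_score

# usage: best_combination(torch.rand(3, 3)) returns the highest-scoring
# head assignment, e.g. (-1, 0, 1) meaning word 0 is the root, word 1
# attaches to word 0, and word 2 attaches to word 1
```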
Further, in one embodiment, the relationship type prediction module may likewise comprise two MLPs and one Biaffine, whose model parameters are optimized separately from those of the MLPs and the Biaffine in the relationship prediction module. The sentence vector is processed by the MLPs and the Biaffine of the relationship type prediction module, and the relation prediction result may be fed as a feature into any hidden layer of the relationship type prediction module to guide the prediction of the relation types. The output of the relationship type prediction module is a relation type score matrix, likewise M × M, whose element Score(i←j) is a vector of N elements, the n-th element of which represents the probability that the j-th word is attached to the i-th word with a dependency of type n.
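A companion sketch of such a relation type scorer is given below: the same MLP-plus-Biaffine shape as the relation prediction module, but with one output channel per relation type, yielding an M × M score matrix whose entries are N-element vectors. Feeding the relation prediction result into a hidden layer is omitted here, and all dimensions are assumptions.

```python
# Sketch of the relation type prediction module (assumed dimensions).
import torch
import torch.nn as nn

class BiaffineLabelScorer(nn.Module):
    def __init__(self, in_dim: int, num_types: int, hidden: int = 128):
        super().__init__()
        self.mlp_head = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mlp_tail = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # one (hidden x hidden) biaffine weight per relation type
        self.U = nn.Parameter(torch.randn(num_types, hidden, hidden))

    def forward(self, sent_vec: torch.Tensor) -> torch.Tensor:
        h = self.mlp_head(sent_vec)   # (M, hidden)
        t = self.mlp_tail(sent_vec)   # (M, hidden)
        # entry (i, j, n): score that the j-th word attaches to the
        # i-th word with dependency type n
        return torch.einsum("ih,nhk,jk->ijn", h, self.U, t)  # (M, M, N)
```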
In addition, an embodiment of the present invention further provides a dependency syntax model optimization apparatus, where a dependency syntax model to be optimized includes a pre-training model at a bottom layer and a dependency relationship prediction network at an upper layer, the pre-training model is obtained by training using a domain-independent text training set, and referring to fig. 5, the dependency syntax model optimization apparatus includes:
the vector extraction module 10 is configured to perform bottom-layer vector extraction processing on training sentences in a text training set in a target field by using the pre-training model to obtain word vectors corresponding to words in the training sentences;
and the prediction module 20 is configured to perform upper layer prediction processing on the word vector by using the dependency relationship prediction network, and optimize a processing result to optimize the dependency syntax model.
Further, the vector extraction module 10 includes:
the processing unit is used for performing bottom-layer vector extraction processing on the training sentences in the text training set of the target field by adopting the pre-training model to obtain character vectors corresponding to the characters in the training sentences;
and the calculation unit is used for carrying out weighted summation of the character vectors of the characters forming each word in the training sentence to obtain the word vector corresponding to the word, wherein the model parameters in the dependency syntax model comprise the weights corresponding to the characters in the training sentence.
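As an illustration of the calculation unit, a small PyTorch sketch of weighted character-to-word pooling with learnable weights follows; treating the weights as per-character-position parameters is an assumption about how the "weights corresponding to the characters" are stored.

```python
import torch
import torch.nn as nn

class CharToWordPooling(nn.Module):
    # A word vector is the weighted sum of the vectors of its characters,
    # and the weights are model parameters updated during optimization.
    def __init__(self, max_chars_per_word):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(max_chars_per_word))

    def forward(self, char_vecs):                  # (n_chars, dim), one word
        w = self.weights[: char_vecs.size(0)].softmax(dim=0)
        return (w.unsqueeze(-1) * char_vecs).sum(dim=0)   # (dim,)
```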
Further, the prediction module 20 includes:
the splicing unit is used for splicing the word vector of each word with a preset part-of-speech vector to obtain a sentence vector comprising the splicing result of each word;
the prediction unit is used for performing upper-layer prediction processing on the sentence vector by adopting the dependency relationship prediction network to obtain a dependency relationship prediction result corresponding to the training sentence;
and the optimization unit is used for calculating an error according to the dependency relationship prediction result and the dependency relationship real label corresponding to the training sentence, and updating the model parameters in the dependency syntax model according to the error so as to optimize the dependency syntax model.
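The three units above amount to one optimization pass. A hedged glue-code sketch is given below, where extract_word_vectors, relation_predictor, the use of cross-entropy over head indices, and all other names are hypothetical stand-ins rather than the patent's interfaces, and the arc scores are assumed to be raw logits.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, sentence, pos_vecs, gold_heads):
    word_vecs = model.extract_word_vectors(sentence)       # bottom-layer pre-trained model
    sent_vecs = torch.cat([word_vecs, pos_vecs], dim=-1)   # splicing unit
    arc_scores = model.relation_predictor(sent_vecs)       # (M, M), Score[i, j]
    # Column j holds word j's scores over candidate heads; transpose so
    # cross-entropy compares each word's head distribution with the true label.
    loss = F.cross_entropy(arc_scores.t(), gold_heads)     # optimization unit
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```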
Further, the dependency prediction network includes a relationship prediction module and a relationship type prediction module,
the prediction unit includes:
the first prediction subunit is configured to input the sentence vector into the relationship prediction module for prediction, so as to obtain a relationship prediction result of the training sentence, where the relationship prediction result represents whether a dependency relationship exists between words in the training sentence;
and the second prediction subunit is configured to predict, by using the relationship type prediction module, a relationship type prediction result based on the sentence vector and the relationship prediction result, and use the relationship prediction result and the relationship type prediction result as a dependency relationship prediction result, where the relationship type prediction result represents a type of a dependency relationship in the training sentence.
Further, the relation prediction module comprises a first multi-layer perceptron, a second multi-layer perceptron and a double affine transformation network,
the first prediction subunit is further to: inputting the sentence vectors into the first multilayer perceptron and the second multilayer perceptron respectively to obtain a first sentence characteristic vector and a second sentence characteristic vector correspondingly, wherein the first sentence characteristic vector comprises characteristic vectors with all words as dependency relationship heads, and the second sentence characteristic vector comprises characteristic vectors with all words as dependency relationship tails;
inputting the first sentence characteristic vector and the second sentence characteristic vector into the double affine transformation network for transformation processing to obtain a dependency relationship score matrix of each word;
and predicting to obtain a relation prediction result of the training sentence according to the dependency relation score matrix.
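A companion PyTorch sketch of this first prediction subunit: two multi-layer perceptrons produce the head-role and tail-role feature vectors, and a single biaffine transformation maps them to the M×M dependency relationship score matrix. Layer sizes, activations, and the bias column are assumptions.

```python
import torch
import torch.nn as nn

class RelationScorer(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.mlp_head = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.mlp_tail = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.W = nn.Parameter(torch.empty(hid_dim + 1, hid_dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, sent_vecs):                # (M, in_dim) sentence vector
        h = self.mlp_head(sent_vecs)             # words as dependency heads
        t = self.mlp_tail(sent_vecs)             # words as dependency tails
        h = torch.cat([h, h.new_ones(h.size(0), 1)], dim=-1)  # bias column
        return h @ self.W @ t.t()                # Score[i, j]: j attaches to i
```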
Further, the first prediction subunit is further configured to:
determining alternative dependency relationship combinations of all words in the training sentences, wherein the alternative dependency relationship combinations meet preset dependency relationship tree conditions;
calculating the score of each alternative dependency relationship combination according to the dependency relationship score matrix;
and selecting the alternative dependency relationship combination with the highest score as the relationship prediction result of the training sentence.
Further, the vector extraction module 10 includes:
the preprocessing unit is used for preprocessing the training sentences in the text training set in the target field to obtain preprocessed training sentences;
and the extraction unit is used for performing bottom-layer vector extraction processing on the preprocessed training sentences by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences.
Further, the preprocessing unit includes:
the segmentation subunit is used for performing character-level segmentation on the training sentences in the text training set in the target field to obtain each character in the training sentences;
the conversion subunit is used for converting each character in the training sentence into a corresponding preset character code and then converting the length of the training sentence into a preset length;
and the adding subunit is used for adding a preset sentence beginning label and a preset sentence end label to the converted training sentence to obtain the preprocessed training sentence.
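A sketch of these three preprocessing subunits in sequence; the BERT-style special-token ids (101/102/0) and the exact ordering of padding versus tag insertion are assumptions.

```python
def preprocess(sentence, vocab, max_len, cls_id=101, sep_id=102, pad_id=0):
    chars = list(sentence)                          # character-level segmentation
    ids = [vocab.get(c, vocab.get("[UNK]", 100))    # preset character codes
           for c in chars]
    ids = ids[: max_len - 2]                        # leave room for the two tags
    ids = [cls_id] + ids + [sep_id]                 # sentence-start / sentence-end tags
    ids += [pad_id] * (max_len - len(ids))          # convert to the preset length
    return ids
```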
The specific implementation of the dependency syntax model optimization apparatus of the present invention is substantially the same as that of the embodiments of the dependency syntax model optimization method described above, and is not repeated here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, on which a dependency syntax model optimization program is stored, and the program, when executed by a processor, implements the steps of the dependency syntax model optimization method as described above.
The embodiments of the dependency syntax model optimization apparatus and the computer-readable storage medium of the present invention can refer to the embodiments of the dependency syntax model optimization method of the present invention, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (11)

1. A dependency syntax model optimization method is characterized in that a dependency syntax model to be optimized comprises a pre-training model at a bottom layer and a dependency relationship prediction network at an upper layer, the pre-training model is obtained by training through a field-independent text training set, and the dependency syntax model optimization method comprises the following steps:
performing bottom-layer vector extraction processing on training sentences in a text training set in a target field by using the pre-training model to obtain word vectors corresponding to all words in the training sentences;
and performing upper-layer prediction processing on the word vectors by adopting the dependency relationship prediction network, and optimizing a processing result to optimize the dependency syntactic model.
2. The dependency syntax model optimization method according to claim 1, wherein the step of performing bottom-level vector extraction processing on the training sentences in the text training set of the target domain by using the pre-training model to obtain word vectors corresponding to the words in the training sentences comprises:
performing bottom-layer vector extraction processing on the training sentences in the text training set of the target field by using the pre-training model to obtain character vectors corresponding to the characters in the training sentences;
and for each word in the training sentence, carrying out weighted summation of the character vectors of the characters forming the word to obtain the word vector corresponding to the word, wherein the model parameters in the dependency syntax model comprise the weights corresponding to the characters in the training sentence.
3. The dependency syntax model optimization method according to claim 1, wherein the step of performing upper-level prediction processing on the word vector using the dependency prediction network and optimizing the processing result to optimize the dependency syntax model comprises:
splicing the word vector of each word with a preset part-of-speech vector to obtain a sentence vector comprising the splicing result of each word;
performing upper-layer prediction processing on the sentence vector by adopting the dependency relationship prediction network to obtain a dependency relationship prediction result corresponding to the training sentence;
and calculating an error according to the dependency relationship prediction result and the dependency relationship real label corresponding to the training sentence, and updating the model parameters in the dependency syntax model according to the error so as to optimize the dependency syntax model.
4. The dependency syntax model optimization method according to claim 3, wherein the dependency prediction network includes a relationship prediction module and a relationship type prediction module,
the step of performing upper-layer prediction processing on the sentence vector by using the dependency relationship prediction network to obtain the dependency relationship prediction result corresponding to the training sentence comprises:
inputting the sentence vector into the relation prediction module for prediction to obtain a relation prediction result of the training sentence, wherein the relation prediction result represents whether dependency exists between words in the training sentence or not;
and predicting to obtain a relation type prediction result by adopting the relation type prediction module based on the sentence vector and the relation prediction result, and taking the relation prediction result and the relation type prediction result as a dependency relation prediction result, wherein the relation type prediction result represents the type of dependency relation in the training sentence.
5. The dependency syntax model optimization method of claim 4, wherein the relational prediction module includes a first multi-layer perceptron, a second multi-layer perceptron, and a dual affine transformation network,
the step of inputting the sentence vector into the relation prediction module for prediction to obtain the relation prediction result of the training sentence comprises:
inputting the sentence vectors into the first multilayer perceptron and the second multilayer perceptron respectively to obtain a first sentence characteristic vector and a second sentence characteristic vector correspondingly, wherein the first sentence characteristic vector comprises characteristic vectors with all words as dependency relationship heads, and the second sentence characteristic vector comprises characteristic vectors with all words as dependency relationship tails;
inputting the first sentence characteristic vector and the second sentence characteristic vector into the double affine transformation network for transformation processing to obtain a dependency relationship score matrix of each word;
and predicting to obtain a relation prediction result of the training sentence according to the dependency relation score matrix.
6. The method for dependency syntax model optimization according to claim 5, wherein the step of predicting the relationship prediction result for the training sentence based on the dependency score matrix comprises:
determining alternative dependency relationship combinations of all words in the training sentences, wherein the alternative dependency relationship combinations meet preset dependency relationship tree conditions;
calculating the score of each alternative dependency relationship combination according to the dependency relationship score matrix;
and selecting the alternative dependency relationship combination with the highest score as the relationship prediction result of the training sentence.
7. The dependency syntax model optimization method according to any one of claims 1 to 6, wherein the step of performing bottom-layer vector extraction processing on the training sentences in the text training set of the target domain by using the pre-training model to obtain word vectors corresponding to words in the training sentences comprises:
preprocessing the training sentences in the text training set of the target field to obtain preprocessed training sentences;
and performing bottom-layer vector extraction processing on the preprocessed training sentences by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences.
8. The dependency syntax model optimization method of claim 7, wherein the step of preprocessing the training sentences in the text training set of the target domain to obtain preprocessed training sentences comprises:
performing character-level segmentation on training sentences in a text training set in a target field to obtain each character in the training sentences;
converting each character in the training sentence into a corresponding preset character code, and converting the length of the training sentence into a preset length;
and adding a preset sentence beginning label and a preset sentence end label to the converted training sentence to obtain the preprocessed training sentence.
9. A dependency syntax model optimization device is characterized in that a dependency syntax model to be optimized comprises a pre-training model at a bottom layer and a dependency relationship prediction network at an upper layer, the pre-training model is obtained by training through a field-independent text training set, and the dependency syntax model optimization device comprises:
the vector extraction module is used for performing bottom vector extraction processing on training sentences in a text training set in a target field by adopting the pre-training model to obtain word vectors corresponding to all words in the training sentences;
and the prediction module is used for performing upper-layer prediction processing on the word vectors by adopting the dependency relationship prediction network and optimizing a processing result so as to optimize the dependency syntactic model.
10. A dependency syntax model optimization device, characterized in that the dependency syntax model optimization device comprises: a memory, a processor, and a dependent syntax model optimizer stored on the memory and executable on the processor, the dependent syntax model optimizer, when executed by the processor, performing the steps of the dependent syntax model optimization method of any one of claims 1-8.
11. A computer-readable storage medium having stored thereon a dependency syntax model optimizer that, when executed by a processor, performs the steps of the dependency syntax model optimization method according to any one of claims 1-8.
CN202010963511.9A 2020-09-14 2020-09-14 Dependency syntax model optimization method, device, equipment and readable storage medium Pending CN112084769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010963511.9A CN112084769A (en) 2020-09-14 2020-09-14 Dependency syntax model optimization method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112084769A true CN112084769A (en) 2020-12-15

Family

ID=73738065

Country Status (1)

Country Link
CN (1) CN112084769A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414749A (en) * 2020-03-18 2020-07-14 哈尔滨理工大学 Social text dependency syntactic analysis system based on deep neural network
CN111414749B (en) * 2020-03-18 2022-06-21 哈尔滨理工大学 Social text dependency syntactic analysis system based on deep neural network
CN113705216A (en) * 2021-08-31 2021-11-26 新华三大数据技术有限公司 Dependency relationship detection method, device and equipment
CN113705216B (en) * 2021-08-31 2024-04-19 新华三大数据技术有限公司 Dependency relationship detection method, device and equipment
CN114611487A (en) * 2022-03-10 2022-06-10 昆明理工大学 Unsupervised Thai dependency syntax analysis method based on dynamic word embedding alignment
CN114611487B (en) * 2022-03-10 2022-12-13 昆明理工大学 Unsupervised Thai dependency syntax analysis method based on dynamic word embedding alignment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination