CN107656921B - Short text dependency analysis method based on deep learning - Google Patents
- Publication number
- CN107656921B (application CN201710934201.2A)
- Authority
- CN
- China
- Prior art keywords
- dependency
- sentence
- short text
- tree
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a short text dependency analysis method based on deep learning, which comprises the following steps: step 1) acquiring, from a search engine log, the HTML files where user query statements are located as a training data set; step 2) generating dependency analysis trees for the query statements according to the training data set; and step 3) training a part-of-speech tagger and a syntactic analyzer based on a neural network model using the dependency trees. The invention uses an existing sentence-level dependency analyzer to automatically generate a massive short-text dependency analysis data set, and applies several methods to denoise and optimize the generated data set. A short-text dependency analysis model is trained on this data set, and experiments show that its annotation quality on short texts is greatly improved over that of a sentence-level dependency analyzer.
Description
Technical Field
The invention relates to a short text dependency analysis method based on deep learning.
Background
Phrase structure and dependency structure are the two most widely studied types of grammatical structure in current syntactic analysis. Dependency grammar was originally proposed by the French linguist L. Tesnière in his work Elements of Structural Syntax (1959). Dependency grammar reveals syntactic structure by analyzing the dependency relationships among the components of a language unit: it holds that the core verb of a sentence is the central component that governs the other components while itself being governed by none, and that every governed component depends on its governor through some dependency relation.
For example, for the text "Its apple gathering stand is my favorite stand", the dependency analysis tree obtained after dependency analysis is shown in FIG. 2:
From the dependency analysis tree, the overall syntactic structure of the sentence and the modification relations between words can be clearly read off, and the semantics of the sentence can be understood to a certain extent.
Dependency analysis of short text is important for understanding its grammatical composition, the parts of speech of its words, and its semantics. Consider the following search queries and their corresponding syntactic structures, as in FIG. 3:
The result for the short text "cover iphone 6 plus" indicates that the head of this phrase is the protective cover (cover): the user wants to find a cover for the iphone, rather than the iphone itself. Based on this knowledge, the search engine can reasonably display advertisements related to iphone covers. For "distance earth", the head is distance, indicating that the user's intention is to ask for the distance between the earth and the moon. For "faucet adapter simple", the intention is to find a faucet adapter. In short, if the dependency relationships of a short text can be correctly identified, the relation between the core head and its modifiers can be extracted, so that the semantics of the short text can be better understood.
The main challenges of dependency analysis on short text are:
1. In short text, there are typically no complete grammatical elements to assist the analysis. In fact, short texts are often highly ambiguous. For example, the short text "kids toys" may represent either "toys for kids" or "kids with toys"; in these two cases the dependency edge between toys and kids points in opposite directions, as in FIG. 4.
2. There are no linguistic rules for dependency analysis of short texts. In manual annotation of dependency analyses, the lack of a standard can make the annotations unclear. Moreover, manual labeling is extremely costly: completing a dependency analysis annotation set can take years.
In dependency analysis, the semantic information of a short text is mainly carried by the dependency edges. That is, for any two words x, y ∈ q in the short text, it must be determined whether a dependency exists between x and y, and if so, which dependency.
To make this determination, the semantics of short text that can be utilized are mainly classified into two main categories: context-free information and context-related information.
● Context-free information: with context-free information, we model P(e | x, y) directly, where e denotes the dependency edge between x and y (x → y or x ← y). This modeling approach is context-free because we do not consider the relative positions of x and y in the input.
One way to obtain P(e | x, y) is through a labeled corpus such as Google's Syntactic N-grams data set. For two words x and y, we estimate P(e | x, y) by counting the number of times x modifies y and the number of times y modifies x in the corpus.
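The count-based estimate described above can be sketched as follows. The corpus contents and numbers here are toy values, not taken from any real data set; only the counting scheme follows the text.

```python
from collections import Counter

# Toy directed counts: a pair (x, y) records one observation of the
# edge x -> y (y modifies x). The numbers are purely illustrative.
corpus = [("cover", "iphone")] * 9 + [("iphone", "cover")] * 1
counts = Counter(corpus)

def p_edge(counts, x, y):
    """Estimate P(e | x, y): the probability that the edge points
    x -> y rather than y -> x, from directed pair counts."""
    fwd, bwd = counts[(x, y)], counts[(y, x)]
    total = fwd + bwd
    return fwd / total if total else None  # None: no evidence either way
```

With the toy counts above, `p_edge(counts, "cover", "iphone")` yields 0.9, matching the intuition that "iphone" usually modifies "cover" in queries like "cover iphone 6 plus".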
● Context-related information: using only context-free information has two main disadvantages: 1) directly considering the relation between two words without their context is risky; 2) context-free information often cannot characterize the specific type of the dependency between two words, and thus cannot fully represent the semantics of the whole input.
To take context information into account, i.e. to estimate P(e | x, y, q) for any two words x, y, our goal translates into constructing a dependency parser designed for short text. Constructing such a dependency analyzer requires a massive training data set. We devised a method for automatically generating this data set, avoiding the cost of manual labeling. The whole method rests on the following assumption: the intention of a short text q coincides with the intention of a click sentence of that short text. We say that a sentence s is a click sentence of a short text q if and only if: 1) the sentence s is clicked many times by users within the search results of the short text q; and 2) every word of the short text q appears in the sentence s. For example, if the sentence "… my favorite Thai food in Houston …" is a click sentence of a short text, their overall intentions are similar, and the dependency relationships between word pairs in the short text are similar to the relationships between the corresponding word pairs in the sentence. However, since a word pair in the short text may not be directly connected in the sentence, a method is still needed to map the dependency relationships in the sentence onto the short text reasonably.
In recent years, deep learning has proven highly applicable to natural language processing (NLP) problems. In the early 2000s, a neural-network-based language model was proposed, opening the way for applying deep learning to natural language processing tasks. Subsequent studies showed that deep learning based on convolutional neural networks performs excellently on many natural language processing tasks such as part-of-speech tagging, chunking, and named entity recognition. Later still, with the popularization of recurrent neural networks, deep learning achieved even better performance on NLP problems and found wider application in fields such as machine translation.
Disclosure of Invention
The invention aims to solve the technical problem of providing a short text dependency analysis method based on deep learning, which is used for solving the problems in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a short text dependency analysis method based on deep learning comprises the following steps:
step 1) acquiring an HTML file where a user query statement is located from a search engine log as a training data set;
step 2) generating a dependency analysis tree of the query statement according to the training data set;
and 3) training a part of speech annotator and a syntactic analyzer based on the neural network model by using the dependency tree.
Preferably, the step 1) specifically includes:
for each query q in the search log and the URL list with a high user click rate under the search result, acquiring an HTML document corresponding to the query q;
the sentences s containing every word of the query are extracted, so that a number of triples can be obtained: (q, s, count), where count represents the number of occurrences in the sentence;
the resulting triplet set is used as the training data set for generating the dependency analysis tree.
Preferably, a short text may have a plurality of corresponding sentences clicked by users. Generating a dependency analysis tree for the short text q from a sentence s specifically comprises:
let T_s denote the set of all subtrees of the dependency tree of s;
find the minimal subtree t ∈ T_s such that every word x ∈ q has one and only one match x' ∈ t;
for any two words x and y in q, generate a dependency tree t_{q,s} for q from t in the following manner:
if there is an edge x' → y' in t, create the same edge x → y in t_{q,s};
if there is a path from x' to y' in t, create an edge x → y in t_{q,s} and temporarily mark it as dep.
After a dependency tree has been generated from each sentence, a unique dependency tree must be selected for the short text. We define a scoring function f to evaluate the quality of the dependency tree t_q generated from a corresponding sentence s of q:
where (x → y) denotes an edge of the tree, count(x → y) is the number of times this edge appears over the whole data set, dist(x, y) is the distance between the words x and y on the dependency analysis tree of the original sentence, and α is a parameter adjusting the relative importance of the two scoring terms;
finally, the labels are refined.
Preferably, the types of some dependency edges are set to the placeholder "dep", which must be inferred as a true label; otherwise, inconsistencies may arise in the training data set;
to solve this problem, we use majority voting (majority vote);
the method comprises the following steps: for any edge carrying the placeholder, count the occurrences of each specific label in the training data set. If the frequency of a particular label exceeds a threshold (for example, occurring 10 times more often than the other labels), we change the placeholder dep to that label.
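The voting rule above can be sketched as follows. The label names and the dominance ratio are illustrative; the ratio of 10 mirrors the example threshold given in the text.

```python
from collections import Counter

def resolve_dep(label_counts, ratio=10):
    """Majority vote for a placeholder edge: replace 'dep' with a
    specific label only when that label occurs more than `ratio` times
    as often as every competing label."""
    ranked = Counter(label_counts).most_common()
    best, best_n = ranked[0]
    if all(best_n > ratio * n for _, n in ranked[1:]):
        return best
    return "dep"  # no label dominates; keep the placeholder
```

For example, `resolve_dep({"amod": 50, "nmod": 3})` resolves to "amod", while `resolve_dep({"amod": 5, "nmod": 4})` keeps "dep".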
Preferably, the step 3) of training the part of speech annotator and the syntactic analyzer based on the neural network model specifically includes:
establishing a fixed window centered on each word of the sentence, and extracting features including the word itself, capitalization, and prefixes and suffixes;
for word features, pre-trained word2vec embeddings are used; for capitalization and prefix/suffix features, the embeddings are randomly initialized;
next, the sentence is parsed using an arc-standard dependency analysis system, using the following features:
In the table, s_i (i = 1, 2, …) denotes the i-th element from the top of the stack, b_i (i = 1, 2, …) denotes the i-th element of the buffer, and lc_k(s_i) and rc_k(s_i) denote the k-th leftmost and k-th rightmost children of s_i. w denotes the word itself, t a part-of-speech tag, and l a dependency label.
The invention uses an existing sentence-level dependency analyzer to automatically generate a massive short-text dependency analysis data set, and applies several methods to denoise and optimize the generated data set. A short-text dependency analysis model is trained on this data set, and experiments show that its annotation quality on short texts is greatly improved over that of a sentence-level dependency analyzer.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The present invention will be described in detail below with reference to the accompanying drawings so that the above advantages of the present invention will be more apparent.
FIG. 1 is a schematic diagram of the overall structure of a dependency analyzer of the deep learning-based short text dependency analysis method of the present invention;
FIG. 2 is a diagram of sentence analysis related to the background art of the present invention;
FIG. 3 is a diagram of sentence analysis related to the background art of the present invention;
FIG. 4 is a diagram of sentence analysis related to the background art of the present invention;
FIG. 5 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 6 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 7 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 8 is a diagram of sentence analysis in accordance with an embodiment of the present invention;
FIG. 9 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 10 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 11 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 12 is a diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 13 is a diagram of sentence analysis in accordance with an embodiment of the present invention;
FIG. 14 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 15 is a schematic diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 16 is a diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 17 is a diagram of sentence analysis involved in an embodiment of the present invention;
FIG. 18 is a diagram of sentence analysis according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail with reference to the accompanying drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as no conflict is formed, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.
Additionally, the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions, and while a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different than here.
Specifically, the invention constructs an end-to-end system, automatically generates massive short text dependency analysis data sets from a search engine log by using an existing sentence-level dependency analyzer, and performs noise reduction and optimization on the generated data sets by using various methods. Based on the data set, a dependence analysis model of the short text is trained, and experiments show that the labeling effect of the model on the short text is greatly improved compared with that of a sentence-level dependence analyzer.
Specifically, the method for analyzing the short text dependence based on deep learning comprises the following steps:
step 1) acquiring an HTML file where a user query statement is located from a search engine log as a training data set;
step 2) generating a dependency analysis tree of the query statement according to the training data set;
and 3) training a part of speech annotator and a syntactic analyzer based on the neural network model by using the dependency tree.
Preferably, the step 1) specifically includes:
for each query q in the search log and the URL list with a high user click rate under the search result, acquiring an HTML document corresponding to the query q;
the sentences s containing every word of the query are extracted, so that a number of triples can be obtained: (q, s, count), where count represents the number of occurrences in the sentence;
the resulting triplet set is used as the training data set for generating the dependency analysis tree.
Preferably, a short text may have a plurality of corresponding sentences clicked by users. Generating a dependency analysis tree for the short text q from a sentence s specifically comprises:
let T_s denote the set of all subtrees of the dependency tree of s;
find the minimal subtree t ∈ T_s such that every word x ∈ q has one and only one match x' ∈ t;
for any two words x and y in q, generate a dependency tree t_{q,s} for q from t in the following manner:
if there is an edge x' → y' in t, create the same edge x → y in t_{q,s};
if there is a path from x' to y' in t, create an edge x → y in t_{q,s} and temporarily mark it as dep.
After a dependency tree has been generated from each sentence, a unique dependency tree must be selected for the short text. We define a scoring function f to evaluate the quality of the dependency tree t_q generated from a corresponding sentence s of q:
where (x → y) denotes an edge of the tree, count(x → y) is the number of times this edge appears over the whole data set, dist(x, y) is the distance between the words x and y on the dependency analysis tree of the original sentence, and α is a parameter adjusting the relative importance of the two scoring terms;
finally, the labels are refined.
Preferably, the types of some dependency edges are set to the placeholder "dep", which must be inferred as a true label; otherwise, inconsistencies may arise in the training data set;
to solve this problem, we use majority voting (majority vote);
the method comprises the following steps: for any edge carrying the placeholder, count the occurrences of each specific label in the training data set. If the frequency of a particular label exceeds a threshold (for example, occurring 10 times more often than the other labels), we change the placeholder dep to that label.
Preferably, the step 3) of training the part of speech annotator and the syntactic analyzer based on the neural network model specifically includes:
establishing a fixed window centered on each word of the sentence, and extracting features including the word itself, capitalization, and prefixes and suffixes;
for word features, pre-trained word2vec embeddings are used; for capitalization and prefix/suffix features, the embeddings are randomly initialized;
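The window-based feature extraction for the tagger can be sketched as follows. The feature names, window size, and prefix/suffix lengths are illustrative choices, not specified by the patent.

```python
def window_features(words, i, size=2):
    """Features for the part-of-speech tagger: the words in a fixed
    window centered on position i, plus capitalization and prefix/
    suffix features of the center word."""
    pad = ["<PAD>"] * size
    padded = pad + [w.lower() for w in words] + pad
    # w[k] is the word at offset k from the center, after padding
    feats = {f"w[{k}]": padded[i + size + k] for k in range(-size, size + 1)}
    w = words[i]
    feats["is_cap"] = w[0].isupper()   # capitalization feature
    feats["prefix"] = w[:2].lower()    # 2-character prefix
    feats["suffix"] = w[-2:].lower()   # 2-character suffix
    return feats
```

In the full model, each of these discrete features would be looked up in an embedding table (pre-trained word2vec for words, randomly initialized embeddings for the rest) before entering the network.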
next, the sentence is parsed using an arc-standard dependency analysis system, using the following features:
In the table, s_i (i = 1, 2, …) denotes the i-th element from the top of the stack, b_i (i = 1, 2, …) denotes the i-th element of the buffer, and lc_k(s_i) and rc_k(s_i) denote the k-th leftmost and k-th rightmost children of s_i. w denotes the word itself, t a part-of-speech tag, and l a dependency label.
The invention uses an existing sentence-level dependency analyzer to automatically generate a massive short-text dependency analysis data set, and applies several methods to denoise and optimize the generated data set. A short-text dependency analysis model is trained on this data set, and experiments show that its annotation quality on short texts is greatly improved over that of a sentence-level dependency analyzer.
In one embodiment:
5.1. data source
The data source is the search log of a search engine. For each query q in the search log, and for each URL with a high click rate under its search results, we acquire the corresponding HTML document and take out the sentences that contain every word of the query as clicked sentences of the query. This yields triples:
(q, s, count). Then we run the sentence-level dependency analyzer on s to obtain its dependency analysis tree, which we consider to be essentially correct.
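The click-sentence extraction step can be sketched as follows. The sentence splitter and tokenizer here are simple illustrative stand-ins (the patent does not specify them), and documents are assumed to be plain text with HTML already stripped.

```python
import re
from collections import Counter

def click_sentence_triples(query, documents):
    """Collect (q, s, count) triples: sentences from clicked documents
    that contain every word of the query."""
    q_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    counts = Counter()
    for doc in documents:
        # naive sentence split on terminal punctuation
        for sent in re.split(r"(?<=[.!?])\s+", doc):
            words = set(re.findall(r"[a-z0-9]+", sent.lower()))
            if q_words and q_words <= words:  # every query word appears
                counts[sent] += 1
    return [(query, s, c) for s, c in counts.items()]
```

Each extracted sentence would then be run through the sentence-level dependency analyzer to obtain the tree that is later projected onto the query.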
5.2. Inferring dependency parse trees
A short text q may have a plurality of corresponding sentences clicked by users. This step maps the dependency parse tree of one such sentence s onto the short text q.
We use the following heuristic to map the dependencies on the sentence s onto the short text q.
1. Let T_s denote the set of all subtrees of the dependency tree of s.
2. Find the minimal subtree t ∈ T_s such that every word x ∈ q has one and only one match x' ∈ t.
3. Inherit a dependency tree t_{q,s} for q from t in the following manner: for two words x and y in q:
a. if there is an edge x' → y' in t, create a corresponding edge x → y in t_{q,s};
b. if there is a path from x' to y' in t (a path consists of a series of co-directional edges), create an edge x → y in t_{q,s}, temporarily mark its type as dep, and update it to a more specific dependency in a subsequent optimization step.
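The projection heuristic above (direct edges copied, directed paths collapsed to a placeholder dep edge) can be sketched as follows. The data representation (head maps, a precomputed word-to-token match) is an illustrative assumption; subtree selection is taken as already done.

```python
def project_tree(query_words, match, sent_heads):
    """Project a sentence dependency tree onto a short text.

    query_words: words of the short text q
    match: dict mapping each query word to its unique matching token
           index in the sentence
    sent_heads: dict token index -> (head index, label); head is None
                for the root of the (sub)tree
    Returns edges (x, y, label): copied directly when x' -> y' is an
    edge, or labeled 'dep' when only a directed path x' ->* y' exists.
    """
    def path_exists(a, b):
        # a directed path a ->* b exists iff a is an ancestor of b
        cur = b
        while cur is not None:
            cur = sent_heads.get(cur, (None, None))[0]
            if cur == a:
                return True
        return False

    edges = []
    for x in query_words:
        for y in query_words:
            if x == y:
                continue
            xi, yi = match[x], match[y]
            head, label = sent_heads.get(yi, (None, None))
            if head == xi:               # rule a: direct edge
                edges.append((x, y, label))
            elif path_exists(xi, yi):    # rule b: path only
                edges.append((x, y, "dep"))
    return edges
```

For a sentence fragment whose tree is price ← oil ← raw (price the head), projecting the two-word query {price, raw} yields a single placeholder edge price → raw labeled dep, to be refined later.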
The following classifies the dependency relationships common in short texts and the mapping method from sentences.
Direct connection: in this case, we directly copy the edges and their types in the sentence. Consider a sentence corresponding to the short text "party supports" as shown in FIG. 5:
in this situation, the two groups of words (party, supports) and (apply) are directly connected. Therefore, the dependency relationship of the short text can be obtained by directly inheriting the relationship in the sentence, as shown in fig. 6:
Connection through function words (functional words): in short text queries, prepositions are commonly omitted. For example, consider a sentence corresponding to the short text "moon landing", as shown in FIG. 7:
we can map the following dependency tree, as in FIG. 8:
For the sentence corresponding to the short text "side effects b12", as shown in FIG. 9:
the following dependency tree can be obtained, as in FIG. 10:
In both cases, a temporary dependency edge of type "dep" appears, which we process in a later step.
Connection through a modifier: many search queries are noun phrases, and their corresponding sentences may omit many modifiers. Depending on how the noun phrase is partitioned (noun phrase bracketing), its words may be directly or indirectly connected.
For "offset work" and its corresponding sentence, omitting the modifier "drilling" poses no problem: "offset" and "work" remain directly connected, so the dependency relationship can be directly inherited, as shown in FIG. 11.
But this is not the case for the short text "loud price" and its corresponding sentence, as in fig. 12.
In this case, considering the path raw ← oil ← price, an edge can be inherited, as in FIG. 13. Connection through a head word: in some cases, the head word of a noun phrase may be omitted. Consider "country singers" and its corresponding sentence, as in FIG. 14:
Obviously their semantics are consistent, but the head word "music" is omitted in the short text. There is still a path from "singers" to "country" in the sentence, from which the dependency tree can be obtained, as shown in FIG. 15.
Connection through verbs: a common example is the omission of a linking verb. Consider the sentence corresponding to the example "plants poisonous to targets", as in FIG. 16: in this case, omitting "are" does not affect the direct connection between the words of the short text. But consider the short text "pain in between branches" and its corresponding sentence, as in FIG. 17:
in this case, the dependency relationship can be inherited, as in FIG. 18:
5.3. merging dependency parse trees
In the last step, we obtained, for a short text q, a set of dependency parse trees mapped from its multiple corresponding sentences. These dependency trees may not be consistent with one another, mainly because: 1. the sentence-level dependency parser is not perfect; 2. the short text itself may be ambiguous, and some short texts may have no semantically consistent sentences. The main purpose of this step is to merge these possibly differing dependency trees into the single dependency tree of the short text.
To select a unique dependency tree for a short text q, we define a scoring function f to evaluate the quality of the dependency tree t_q generated from a corresponding sentence s of q:
where (x → y) denotes an edge of the tree, count(x → y) is the number of occurrences of the edge over the entire data set, dist(x, y) is the distance between the words x and y on the dependency analysis tree of the original sentence, and α is a parameter adjusting the relative importance of the two scoring terms.
The first term of the scoring function characterizes the compactness of the short-text dependency analysis tree: a tree with good compactness characterizes the semantics of the short text more simply. For example, for the short text "deep learning", there are two corresponding sentences:
in the first sentence, the connection between "deep" and "learning" is very loose, resulting in a large semantic deviation of its semantics from short text. In the second sentence, the two words are directly connected, and the whole sentence has good semantic similarity with the short text.
The second term of the scoring function characterizes the global consistency of the short-text dependency analysis tree. For a pair of words (x, y), if the edge x → y occurs far more often than the edge y → x over the entire data set, the latter is likely to be erroneous. One particular case to consider in this process is word order: if two words appear in different orders in different short texts, their grammatical relations may differ. For example, "child of" and "of child" are both composed of the two words "child" and "of", but their correct dependencies are not the same.
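The exact formula of f is not reproduced in this copy of the text, but one plausible form combining the two terms it describes (a distance-based compactness term and a count-based consistency term, weighted by α) can be sketched as follows. This is an assumption consistent with the surrounding description, not the patent's actual formula.

```python
def score_tree(edges, global_counts, alpha=0.5):
    """Score a candidate short-text dependency tree t_q.

    edges: list of (x, y, dist), an edge x -> y whose endpoints were
           `dist` apart on the source sentence's dependency tree
    global_counts: dict (x, y) -> occurrences of edge x -> y over the
           whole data set
    The first sum rewards compactness (short source-tree distances);
    the second, weighted by alpha, rewards globally frequent edge
    directions.
    """
    compact = sum(1.0 / d for _, _, d in edges)
    consistent = sum(global_counts.get((x, y), 0) for x, y, _ in edges)
    return compact + alpha * consistent
```

Under this form, a tree whose words were adjacent in the source sentence ("deep" directly modifying "learning") outscores one derived from a sentence where the two words were loosely connected, matching the "deep learning" example above.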
5.4. Result optimization
In the previous step, the types of some dependency edges were set to the placeholder "dep". Before using the resulting data set to train the dependency analyzer, we must infer "dep" as a true label; otherwise specific and unspecific labels would coexist in the training data set, causing inconsistency. For example, for the short text "include price", the edge derived from one corresponding sentence may carry the placeholder dep, while the same edge derived from another sentence carries the specific label amod.
To infer "dep", we first use the majority voting method described above.
On our training data set, the above process resolves approximately 90% of the placeholder dependencies. Unresolvable edges could simply be deleted, since they provide no dependency information. But considering that the other word pairs in these short texts may still carry meaningful information, we adopt a bootstrap approach: delete the short-text dependency data containing undetermined edge types and train a short-text analyzer on the rest; use it to predict the deleted (roughly 10%) data, and if the direction of a predicted edge agrees with the data, backfill the specific type output by the analyzer for the dep edge into the dependency analysis tree; finally, add the backfilled dependency trees to the training set and retrain the dependency analyzer to obtain the final model.
5.5. Short text dependency analysis model
Short text dependency analysis uses a neural-network-based dependency analyzer structure similar to that of (Chen and Manning, 2014). The main features used are as follows:
in the table, si(i ═ 1,2, …) denotes the ith element at the top of the stack, bi(i ═ 1,2, …) denotes the ith element of the buffer, lck(si) And rck(si) Denotes siThe left-end kth child node and the right-end kth child node. w represents the word itself, t represents a part-of-speech tag, and l represents a dependency label.
It should be noted that for simplicity of description, the above method embodiments are described as a series of acts or combination of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that the acts and modules involved are not necessarily required for this application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (3)
1. A short text dependency analysis method based on deep learning is characterized by comprising the following steps:
step 1) acquiring an HTML file where a user query statement is located from a search engine log as a training data set;
step 2) generating a dependency analysis tree of the query statement according to the training data set;
step 3) training a part of speech annotator and a syntactic analyzer based on the neural network model by using a dependency tree;
the step 1) specifically comprises the following steps:
for each query q in the search log, and for the URLs with high user click-through rates among its search results, acquiring the HTML documents corresponding to the query q;
extracting each sentence s containing every word in the query, thereby obtaining a set of triples (q, s, count), wherein count represents the number of times the word occurs in the sentence;
the obtained ternary group set is used as a training data set for generating a dependency analysis tree;
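Step 1) can be sketched as follows. HTML fetching and tokenization are stubbed out (the patent obtains sentences from clicked documents in search-engine logs), and interpreting count as the total occurrences of the query words in the sentence is an assumption of this sketch.

```python
from collections import Counter

def extract_triples(query, sentences):
    """Return (q, s, count) triples for sentences containing every query word.
    'sentences' stands in for sentences extracted from the clicked HTML pages."""
    q_words = set(query.split())
    triples = []
    for s in sentences:
        s_words = s.split()
        if q_words.issubset(s_words):                     # every query word appears
            freq = Counter(s_words)
            count = sum(freq[w] for w in q_words)         # occurrences of query words
            triples.append((query, s, count))
    return triples
```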
the method includes that a short text has a plurality of corresponding sentences clicked by users, wherein a dependency analysis tree is generated in a sentence s for the short text q, and the method specifically comprises the following steps:
let TsAll subtrees of the dependency tree representing s;
finding the minimum subtree T ∈ TsSatisfying that each word x belongs to q and only one matching x' belongs to t;
for any two words x and y in q, a dependency tree t for q is generated from t in the following mannerq,s:
If there is an edge x '→ y' in t, then at tq,sCreating a same edge x → y;
if there is a path from x 'to y' in t, then at tq,sAn x → y edge is created, and temporarily marked as dep,
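The two edge-projection rules above can be sketched as follows, representing the sentence's dependency tree as a dependent-to-head map; the function and variable names are illustrative assumptions.

```python
def project_tree(q_words, heads, labels):
    """heads maps each dependent word to its head; labels maps each dependent
    to its arc label. Returns (head, dependent, label) edges of t_{q,s}."""
    edges = []
    for x in q_words:
        for y in q_words:
            if x == y:
                continue
            if heads.get(y) == x:                 # direct edge x' -> y' in t
                edges.append((x, y, labels[y]))   # copy it with its real label
            else:                                 # is there a path x' ->* y'?
                node = heads.get(y)
                while node is not None:
                    if node == x:
                        edges.append((x, y, "dep"))  # temporary placeholder
                        break
                    node = heads.get(node)
    return edges
```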
after a dependency tree has been generated from each sentence, a unique dependency tree needs to be selected for the short text; a scoring function f is defined to evaluate the quality of the dependency tree t_q generated from the corresponding sentence s of q:
wherein (x → y) represents an edge of the tree, count(x → y) is the number of times the edge appears in the whole data set, dist(x, y) is the distance between words x and y on the dependency analysis tree of the original sentence, and α is a parameter adjusting the relative importance of the two scoring criteria;
and finally refining the label.
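The scoring function f itself appears only as an image in the original; the sketch below assumes a simple weighted combination of the two criteria the text names (corpus frequency of an edge vs. its distance on the original sentence's tree). The exact functional form is an assumption.

```python
def score_tree(edges, edge_counts, dist, alpha=0.5):
    """edges: iterable of (x, y) pairs of the candidate tree t_q;
    edge_counts[(x, y)]: times the edge appears over the whole data set;
    dist[(x, y)]: distance of x and y on the original sentence's tree."""
    s = 0.0
    for x, y in edges:
        s += alpha * edge_counts.get((x, y), 0)   # frequent edges score higher
        s -= (1 - alpha) * dist[(x, y)]           # long-range edges are penalized
    return s
```

The tree with the highest score among all candidate sentences would then be kept as the unique dependency tree for the short text.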
2. The deep learning-based short text dependency analysis method according to claim 1, wherein the type of a partial dependency relationship edge is set as the placeholder "dep", and the "dep" must be inferred to be a real label, since otherwise an inconsistency arises in the training data set;
accordingly, a majority vote scheme is used;
the method comprises the following steps: for any dep edge, counting the number of occurrences of each specific label for that word pair in the training data set; if a particular label occurs more than 10 times as frequently as the other labels, the placeholder dep is changed to that label.
3. The deep learning-based short text dependency analysis method according to claim 1, wherein the step 3) of training the part of speech annotator and the syntactic analyzer based on the neural network model specifically comprises:
establishing a fixed window centered on each word in the sentence, and extracting features including the word itself, capitalization, and prefixes and suffixes;
for word features, pre-trained word2vec embeddings are used; for capitalization and prefix/suffix features, embeddings are randomly initialized;
next, the sentence is parsed using an ArcStandard-based dependency analysis system with the features shown in the table:
in the table, s_i (i = 1, 2, …) denotes the i-th element from the top of the stack, b_i (i = 1, 2, …) denotes the i-th element of the buffer, and lc_k(s_i) and rc_k(s_i) denote the k-th leftmost and k-th rightmost child nodes of s_i; w represents the word itself, t represents a part-of-speech tag, and l represents a dependency label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710934201.2A CN107656921B (en) | 2017-10-10 | 2017-10-10 | Short text dependency analysis method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710934201.2A CN107656921B (en) | 2017-10-10 | 2017-10-10 | Short text dependency analysis method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107656921A CN107656921A (en) | 2018-02-02 |
CN107656921B true CN107656921B (en) | 2021-01-08 |
Family
ID=61117779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710934201.2A Active CN107656921B (en) | 2017-10-10 | 2017-10-10 | Short text dependency analysis method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107656921B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491512A (en) * | 2018-03-23 | 2018-09-04 | 北京奇虎科技有限公司 | The method of abstracting and device of headline |
CN108647785A (en) * | 2018-05-17 | 2018-10-12 | 普强信息技术(北京)有限公司 | A kind of neural network method for automatic modeling, device and storage medium |
CN110189751A (en) * | 2019-04-24 | 2019-08-30 | 中国联合网络通信集团有限公司 | Method of speech processing and equipment |
CN110263177B (en) * | 2019-05-23 | 2021-09-07 | 广州市香港科大霍英东研究院 | Knowledge graph construction method for event prediction and event prediction method |
CN112446405A (en) * | 2019-09-04 | 2021-03-05 | 杭州九阳小家电有限公司 | User intention guiding method for home appliance customer service and intelligent home appliance |
CN111523302B (en) * | 2020-07-06 | 2020-10-02 | 成都晓多科技有限公司 | Syntax analysis method and device, storage medium and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473223A (en) * | 2013-09-25 | 2013-12-25 | 中国科学院计算技术研究所 | Rule extraction and translation method based on syntax tree |
CN105740235A (en) * | 2016-01-29 | 2016-07-06 | 昆明理工大学 | Phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101794274B1 (en) * | 2010-07-13 | 2017-11-06 | 에스케이플래닛 주식회사 | Method and apparatus for filtering translation rules and generating target word in hierarchical phrase-based statistical machine translation |
CN102298642B (en) * | 2011-09-15 | 2012-09-05 | 苏州大学 | Method and system for extracting text information |
CN102968431B (en) * | 2012-09-18 | 2018-08-10 | 华东师范大学 | A kind of control device that the Chinese entity relationship based on dependency tree is excavated |
CN105893346A (en) * | 2016-03-30 | 2016-08-24 | 齐鲁工业大学 | Graph model word sense disambiguation method based on dependency syntax tree |
CN106776686A (en) * | 2016-11-09 | 2017-05-31 | 武汉泰迪智慧科技有限公司 | Chinese domain short text understanding method and system based on many necks |
CN106598951B (en) * | 2016-12-23 | 2019-08-16 | 北京金山办公软件股份有限公司 | A kind of dependency structure treebank acquisition methods and system |
CN107168948A (en) * | 2017-04-19 | 2017-09-15 | 广州视源电子科技股份有限公司 | A kind of sentence recognition methods and system |
2017-10-10 CN CN201710934201.2A patent/CN107656921B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473223A (en) * | 2013-09-25 | 2013-12-25 | 中国科学院计算技术研究所 | Rule extraction and translation method based on syntax tree |
CN105740235A (en) * | 2016-01-29 | 2016-07-06 | 昆明理工大学 | Phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features |
Also Published As
Publication number | Publication date |
---|---|
CN107656921A (en) | 2018-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107656921B (en) | Short text dependency analysis method based on deep learning | |
CN110399457B (en) | Intelligent question answering method and system | |
CN107436864B (en) | Chinese question-answer semantic similarity calculation method based on Word2Vec | |
US10496749B2 (en) | Unified semantics-focused language processing and zero base knowledge building system | |
US9606990B2 (en) | Cognitive system with ingestion of natural language documents with embedded code | |
CN106537370B (en) | Method and system for robust tagging of named entities in the presence of source and translation errors | |
CN104462057B (en) | For the method and system for the lexicon for producing language analysis | |
KR101130444B1 (en) | System for identifying paraphrases using machine translation techniques | |
US10423649B2 (en) | Natural question generation from query data using natural language processing system | |
WO2016112679A1 (en) | Method, system and storage medium for realizing intelligent answering of questions | |
US20160041986A1 (en) | Smart Search Engine | |
US20150081277A1 (en) | System and Method for Automatically Classifying Text using Discourse Analysis | |
CN109062904B (en) | Logic predicate extraction method and device | |
CN111382571B (en) | Information extraction method, system, server and storage medium | |
US20200311345A1 (en) | System and method for language-independent contextual embedding | |
Rodrigues et al. | Advanced applications of natural language processing for performing information extraction | |
CN113779062A (en) | SQL statement generation method and device, storage medium and electronic equipment | |
CN113064985A (en) | Man-machine conversation method, electronic device and storage medium | |
Bai et al. | Enhanced natural language interface for web-based information retrieval | |
CN117473054A (en) | Knowledge graph-based general intelligent question-answering method and device | |
Kim et al. | Improving the performance of a named entity recognition system with knowledge acquisition | |
CN113807102B (en) | Method, device, equipment and computer storage medium for establishing semantic representation model | |
Zhang et al. | Constructing covid-19 knowledge graph from a large corpus of scientific articles | |
CN111949781B (en) | Intelligent interaction method and device based on natural sentence syntactic analysis | |
CN108255818A (en) | Utilize the compound machine interpretation method of cutting techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||