CN110046338B - Context selection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110046338B
CN110046338B
Authority
CN
China
Prior art keywords
source
word
target
phrase structure
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810035965.2A
Other languages
Chinese (zh)
Other versions
CN110046338A
Inventor
刘乐茂
史树明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201810035965.2A
Publication of CN110046338A
Application granted
Publication of CN110046338B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a context selection method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a source-end vector representation sequence corresponding to a source sentence; according to the target element to be predicted at the current time, assuming the target source word to which that element aligns in the source sentence; according to the target source word, separating from the source sentence the phrase structure and half-phrase structure corresponding to the current time, where at least the phrase structure is deterministic; and determining the context corresponding to the current time at least according to the target source word, the phrase structure, the half-phrase structure, and the source-end vector representation sequence. Embodiments of the invention can improve the comprehensiveness of the captured context and the accuracy of context selection, making it possible to improve the accuracy of downstream results such as syntactic analysis.

Description

Context selection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a context selection method and apparatus, an electronic device, and a storage medium.
Background
Context selection is a stage of processes such as syntactic analysis and machine translation: each time the decoder predicts a target element, a context is selected from the source-end vector representations to support the prediction of that element.
Take a syntactic analysis model of the encoder-decoder type as an example. When syntactic analysis is performed, a source sentence (a natural-language sentence to be syntactically analyzed may be called a source sentence) is input into the syntactic analysis model, and the encoder generates the source-end vector representation sequence corresponding to the source sentence (this sequence contains the vector representation of each source word in the source sentence). Each time the decoder predicts an element (an element is a component of the syntactic analysis result; the sequence formed by the elements constitutes the syntactic analysis result), an attention layer in the syntactic analysis model selects a context from the source-end vector representations to assist the prediction of that element, so that the syntactic analysis result is generated after the prediction of every element is completed.
The selection of the context is mainly performed by the attention layer, which at present mainly relies on a probability-based attention mechanism: the attention layer generates a discrete probability distribution representing the alignment probability between the currently predicted target element and each source word in the source sentence. However, the inventors found that the probability-based attention mechanism cannot capture the context comprehensively; for example, certain heuristic contexts in the syntactic analysis scenario cannot be captured. This lowers the accuracy of the context selection result and affects the accuracy of the syntactic analysis result.
Disclosure of Invention
In view of this, embodiments of the present invention provide a context selection method, an apparatus, an electronic device, and a storage medium, so as to improve the accuracy of context selection.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a context selection method, comprising:
obtaining a source end vector representation sequence corresponding to a source sentence;
according to a target element to be predicted at the current moment, assuming the target source word to which the target element aligns in the source sentence;
according to the target source word, separating a phrase structure and a half-phrase structure corresponding to the current moment from the source sentence; wherein the phrase structure is at least deterministic;
and determining the context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure and the source end vector representation sequence.
An embodiment of the present invention further provides a context selecting apparatus, including:
the source end vector sequence acquisition module is used for acquiring a source end vector representation sequence corresponding to a source sentence;
the target source word determining module is used for assuming, according to a target element to be predicted at the current moment, the target source word to which the target element aligns in the source sentence;
the separation module is used for separating a phrase structure and a half-phrase structure corresponding to the current moment from the source sentence according to the target source word; wherein the phrase structure is at least deterministic;
and the context output module is used for determining the context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure and the source end vector representation sequence.
An embodiment of the present invention further provides an electronic device, including: at least one memory and at least one processor; the memory stores a program, and the processor calls the program to realize the steps of the context selection method.
An embodiment of the present invention further provides a storage medium, where the storage medium stores a program suitable for being executed by a processor, so as to implement the steps of the context selection method described above.
Based on the above technical solution, the context selection method provided in the embodiment of the present invention includes: obtaining a source-end vector representation sequence corresponding to a source sentence; according to the target element to be predicted at the current time, assuming the target source word to which that element aligns in the source sentence; according to the target source word, separating from the source sentence the phrase structure and half-phrase structure corresponding to the current time, where at least the phrase structure is deterministic; and determining the context corresponding to the current time at least according to the target source word, the phrase structure, the half-phrase structure, and the source-end vector representation sequence. After the phrase structure and half-phrase structure corresponding to the current time are determined, the phrase structure is knowable and deterministic, and the start word of the half-phrase structure is knowable. Therefore, determining the context selected at the current time according to the target source word, the deterministic phrase structure, the deterministic start word of the half-phrase structure, and the source-end vector representation sequence can improve the determinacy of the selected context, improve the comprehensiveness of the captured context, and make it possible to improve the accuracy of the syntactic analysis result.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is an exemplary diagram of a prior art context selection based on a probabilistic attention mechanism;
FIG. 2 is a flow chart of a context selection method provided by an embodiment of the present invention;
FIG. 3 is another flow chart of a context selection method according to an embodiment of the present invention;
FIG. 4 is an exemplary diagram of selecting a context based on a deterministic attention mechanism in accordance with an embodiment of the present invention;
FIG. 5 is a diagram illustrating an exemplary structure of a parsing model according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating another example of a syntactic analysis model according to an embodiment of the present invention;
FIG. 7 is a flow chart of a parsing method provided by an embodiment of the invention;
FIG. 8 is an exemplary diagram of a sequence of syntax trees;
FIG. 9 is a flowchart of a method for training a syntactic analysis model according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating an example of a scenario for syntactic analysis provided in an embodiment of the present invention;
FIG. 11 is a block diagram of a context selecting apparatus according to an embodiment of the present invention;
FIG. 12 is a block diagram of another structure of a context selecting apparatus according to an embodiment of the present invention;
FIG. 13 is a block diagram of another embodiment of a context selecting apparatus;
FIG. 14 is a block diagram of a hardware configuration of the electronic device.
Detailed Description
To facilitate understanding of the problems in the prior art, take FIG. 1 as an example: the source sentence is "John has a dog.". After an encoder generates the corresponding source-end vector representation sequence for this sentence, the selection of a context by an attention layer based on a probabilistic attention mechanism in the prior art can be shown as the dotted lines in FIG. 1, where the dotted lines represent a discrete probability distribution. The discrete probability distribution has as many values as there are source words in the source sentence (the distribution in FIG. 1 has 5 values), and each value corresponds to one source word in the source sentence; each probability value represents the alignment probability between the target element currently being predicted by the decoder (in FIG. 1, the prediction of y_5) and the source word corresponding to that value.
the inventor of the present invention finds that in the scenario of syntactic analysis, etc., there are some heuristic contexts, such as y in fig. 1 3 Should align to John, to assist y 3 Selection of a context at the time of prediction; however, the attention layer based on the probabilistic attention mechanism often lacks capture of such heuristic context, and due to the lack of these heuristic context with very information amount, the precision of the selected context is low, which affects the precision of the syntactic analysis result; this is also a problem with selecting contexts based on probabilistic attention mechanisms, which are common in the prior art.
Based on this, the embodiment of the present invention uses an attention layer based on a deterministic attention mechanism to select the context each time the decoder predicts an element, so as to improve the comprehensiveness of the captured context and the precision of context selection.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Fig. 2 is a flowchart of a context selection method according to an embodiment of the present invention, where the method is applicable to an electronic device, and the electronic device may be implemented by using a server (for example, implementing a syntactic analysis process on a server side), or by using a terminal (for example, implementing a syntactic analysis process on a terminal side); as an example, the context selection method shown in fig. 2 may be implemented by an attention layer in a syntax analysis model, which may be disposed on a server side to implement a syntax analysis process by the server, or may be disposed on a terminal side to implement a syntax analysis process by the terminal;
referring to fig. 2, a context selection method provided by an embodiment of the present invention may include:
and S100, according to the target element to be predicted at the current moment, assuming that the target element is aligned to a target source word in the source sentence.
Optionally, the source sentence is input into the encoder, and the encoder generates a vector for each source word in the source sentence one by one, obtaining the source-end vector representation sequence corresponding to the source sentence. After the attention layer obtains this sequence, the attention layer based on the deterministic attention mechanism provided by the embodiment of the present invention can select a context at each time by the method shown in FIG. 2; the decoder generally predicts one element at each time.
Optionally, at the current time, the element to be predicted at the current time may be referred to as a target element; taking a syntactic analysis scenario as an example, the element may be a composition of a syntactic analysis result, and it can be understood that a target element to be predicted at the current time is unknown and is an element in the syntactic analysis result to be predicted at the current time; if the syntax tree sequence represents the syntax analysis result, the target element to be predicted at the current moment can be regarded as an element in the syntax tree sequence to be predicted at the current moment, and the syntax tree sequence can be formed by elements predicted at all moments; a syntax tree can be considered as a tree-structured representation of the result of the syntax analysis.
When a target element at the current moment needs to be predicted, the embodiment of the invention can assume a target source word of the target element aligned in the source sentence; the target source word can be considered as a source word corresponding to the target element in the source sentence when the target element represents a certain source word in the source sentence;
Optionally, because the target element predicted at the current time has multiple possible value types, it does not necessarily represent a source word in the source sentence. As an example, the possible value types of an element may include: a terminal symbol (generally denoted "XX"), a left bracket (generally denoted "("), and a right bracket (generally denoted ")"). Generally, only when the value type of the target element is a terminal symbol does the predicted target element represent a source word in the source sentence; hence the target element predicted at the current time does not necessarily represent a source word.
Based on this, the embodiment of the present invention may assume the condition in which the target element predicted at the current time represents a source word in the source sentence, and assume the source word to which the target element aligns; for example, assuming the value type of the target element is a terminal symbol, the target source word corresponding to the target element in the source sentence is determined.
Step S110: separate the phrase structure and half-phrase structure corresponding to the current moment from the source sentence according to the target source word; wherein the phrase structure is at least deterministic.
Optionally, after the target source word is determined, the embodiment of the present invention may separate the phrase structure and half-phrase structure corresponding to the current time from the source sentence according to the target source word. In the embodiment of the present invention, the phrase structure is at least deterministic, while for the half-phrase structure at least the start word is known;
optionally, both the phrase structure and the half-phrase structure may be specified by a start word and an end word: the phrase structure consists of the source words covered between its start word and end word in the source sentence (the words of the source sentence may be referred to as source words), and the half-phrase structure likewise consists of the source words covered between its start word and end word;
in the embodiment of the invention, the phrase structure is deterministic, namely the initial word and the final word of the phrase structure are known; as an alternative example, the last word of the phrase structure may be a word before the target source word in the source sentence, and the start word may be a word before the last word of the phrase structure, which may be determined by assuming that the value type of the target element is right brackets;
in the embodiment of the invention, the half-phrase structure at least knows the initial word; as an alternative example, the start word of the half-phrase structure may be the target source word, and the end word may be any word that is unknown after the target source word in the source sentence; of course, the embodiment of the invention can also support the situation that the last word of the half-phrase structure can be known.
Step S120: determine the context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure, and the source-end vector representation sequence.
Optionally, when predicting an element at each time, the embodiment of the present invention may execute the method shown in fig. 2 to perform corresponding context selection; thus, at each moment, a corresponding selected context is obtained, the auxiliary decoder performs element prediction at each moment, and a syntax tree sequence is formed by the elements obtained by prediction at each moment to obtain a syntax analysis result.
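As a hedged end-to-end sketch of the per-time-step flow above (all names and the simulated decoder are illustrative assumptions, not the patent's implementation), the loop reduces to tracking, before each prediction, the index of the assumed aligned source word of step S100; under the fully simplified form of the deterministic context, that index alone picks the source-end vector used as the context:

```python
def context_indices(element_sequence, n_source_words):
    """Aligned source-word index assumed before predicting each element."""
    indices, terminals = [], 0
    for element in element_sequence:
        # S100: assume the element about to be predicted is a terminal (XX);
        # its ordinal among terminals predicted so far picks the source word
        indices.append(min(terminals, n_source_words - 1))
        # the (simulated) decoder now reveals the true element for this step
        terminals += element == "XX"
    return indices

gold = ["(S", "(NP", "XX", "XX", ")"]    # y_1..y_5 from FIG. 4
print(context_indices(gold, 5))          # [0, 0, 0, 1, 2]
```

Note that before y_5 is predicted the assumed index is 2, the word "a", matching the running example of FIG. 4.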
In the embodiment of the present invention, after the phrase structure and half-phrase structure corresponding to the current time are determined, the phrase structure is knowable and deterministic, and the start word of the half-phrase structure is knowable. Therefore, determining the context selected at the current time according to the target source word, the deterministic phrase structure, the deterministic start word of the half-phrase structure, and the source-end vector representation sequence can improve the determinacy of the selected context, improve the comprehensiveness of the captured context, and make it possible to improve the accuracy of the syntactic analysis result.
As an example, let the source sentence x be x_1 to x_n, and assume the target source word to which the target element to be predicted at the current time aligns in the source sentence is x_t. The phrase structure corresponding to the current time is ρ(x_b, x_{t-1}), where x_b is the start word of the phrase structure and the end word is x_{t-1}, the word before the target source word x_t. The half-phrase structure corresponding to the current time is ρ(x_t, ?), where "?" (question mark) represents any unknown word after the target source word x_t in the source sentence; x_b and x_t are source words among x_1 to x_n.
After the phrase structure and half-phrase structure corresponding to the target source word at the current time are determined, the selected context c_t can be defined as in Equation 1:
c_t = φ(ρ(x_b, x_{t-1}), ρ(x_t, ?), x_t, E_x)    (Equation 1)
where E_x denotes the source-end vector representation sequence output by the encoder, and φ denotes a dot product between the attention-layer parameter and the concatenation of the vector representations of the phrase structure's start word and end word and the half-phrase structure's start word.
The computation of Equation 1 can be regarded as depending only on these vectors; since the end word of ρ(x_t, ?) may be unknown, the end word of ρ(x_t, ?) can be ignored in the computation.
Optionally, as an alternative implementation, φ may be defined as in Equation 2:
φ = θ_c^T [e(x_b); e(x_{t-1}); e(x_t)]    (Equation 2)
where θ_c denotes the parameter of the attention layer based on the deterministic attention mechanism provided by the embodiment of the present invention; e(x_b) denotes the vector representation of the start word x_b of the phrase structure corresponding to the current time; e(x_{t-1}) denotes the vector representation of the end word of that phrase structure; e(x_t) denotes the vector representation of the target source word; and [;] denotes the concatenation operation of vectors.
As can be seen from Equation 2, φ is defined over the three words x_b, x_{t-1}, and x_t. If the encoder encodes the source sentence with an RNN (recurrent neural network), then e(x_t) already expresses, to some extent, the adjacent word x_{t-1}; thus the present invention also allows the following simplified definition of φ, shown in Equation 3:
φ = θ_c^T [e(x_b); e(x_t)]    (Equation 3)
Further, since e(x_t) can similarly express x_b, the definition of φ can be simplified again, as shown in Equation 4:
φ = θ_c^T e(x_t)    (Equation 4)
Accordingly, the context c_t can be expressed as Equation 5:
c_t = θ_c^T e(x_t)    (Equation 5)
Correspondingly, in the syntactic analysis process, when predicting the element at each time, the context at each time can be selected according to Equation 5 after determining the target source word, phrase structure, and half-phrase structure corresponding to that time, so as to assist the decoder in predicting the element at each time and obtain the syntactic analysis result.
It should be noted that any of the definitions of φ in Equations 2, 3, and 4 may be substituted into Equation 1 for context selection; the embodiment of the present invention is not limited in this respect. Of course, the form of Equation 5 is the simplest.
The above is only an optional implementation of "determining the context corresponding to the current time at least according to the target source word, the phrase structure, the half-phrase structure, and the source-end vector representation sequence" in step S120; the embodiment of the present invention is not limited to the above formulas for selecting the context.
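The context computations above can be sketched in a few lines of pure Python. The shape of the attention-layer parameter θ_c (a matrix applied to the concatenated word vectors) is an assumption on our part; the patent only says that φ is a dot product of the concatenated vector representations with the attention-layer parameter, and the toy vectors below are invented for illustration:

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(r * x for r, x in zip(row, v)) for row in m]

d = 2                                        # toy encoder hidden size
E_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],   # "John", "has", "a"
       [2.0, 0.0], [0.0, 2.0]]               # "dog", "."
b, t = 0, 2                                  # x_b = "John", x_t = "a" (FIG. 4)

# Equation 2: phi over the concatenation [e(x_b); e(x_{t-1}); e(x_t)]
theta_full = [[1.0] * (3 * d)] * d           # toy d x 3d parameter
c_full = matvec(theta_full, E_x[b] + E_x[t - 1] + E_x[t])

# Equations 4/5: the fully simplified context depends on x_t alone
theta_single = [[1.0, 0.0], [0.0, 1.0]]      # identity, for illustration
c_t = matvec(theta_single, E_x[t])

print(c_full, c_t)                           # [4.0, 4.0] [1.0, 1.0]
```

In the Equation 5 form with θ_c the identity, the context is simply the source-end vector of the target source word, which is why the end word of the half-phrase structure can be ignored.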
Optionally, take the possible value types of an element to be: terminal symbol, left bracket, and right bracket. The embodiment of the present invention may enumerate the possible value types of the target element to be predicted at the current time, and determine the target source word and the phrase structure and half-phrase structure corresponding to the current time under each assumed value type;
optionally, fig. 3 shows another flowchart of a context selection method provided in an embodiment of the present invention, and referring to fig. 3, the method may include:
step S200, when the value type of the target element to be predicted at the current moment is assumed to be a terminal character, determining a target source word of the target element aligned in the source sentence.
Optionally, the possible value types of the target element to be predicted at the current time are divided into three types, namely a terminator, a left bracket and a right bracket; since the target element is not predicted yet and is unknown, embodiments of the present invention may assume the value type of the target element as a terminator and determine a corresponding target source word aligned in the source sentence.
As an example, when the value type of the target element is assumed to be a terminal symbol, the embodiment of the present invention may determine the ordinal of the target element among the predicted elements whose value type is terminal, and then select from the source sentence the source word with the corresponding ordinal as the target source word;
in the example shown in FIG. 4, the source sentence is "John has a dog ." and what is performed at the current time is the 5th decoding step (i.e., the element y_5 at the 5th time needs to be predicted, and y_5 is unknown); the elements predicted at the previous 4 times are known to be y_1 = "(S", y_2 = "(NP", y_3 = XX, y_4 = XX.
Then it can be assumed that the value type of y_5 is XX (the representation of a terminal symbol), and the ordinal of y_5 among the predicted elements whose value type is XX is determined; the source word with the corresponding ordinal is then selected from the source sentence as the target source word.
As can be seen from FIG. 4, among the predicted elements y_1 to y_4, the elements with value type XX are y_3 and y_4; therefore, assuming the value type of y_5 is XX, the ordinal of y_5 among the predicted elements with value type XX is 3, and the 3rd word "a" in the source sentence is determined to be the target source word to which the target element y_5 to be predicted aligns.
Similarly, taking the need to predict y_4 at the current time as an example: assuming the value type of y_4 is XX, its ordinal among the predicted elements with value type XX is 2, so the source word to which y_4 aligns in the source sentence is "has". The elements to be predicted at other times, and their assumed aligned source words in the source sentence, are handled in the same way.
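The ordinal-based alignment of step S200 can be sketched as follows (a hedged illustration; the function name and element encoding follow the patent's FIG. 4 example and are otherwise assumptions):

```python
def align_target_source_word(predicted, source_words):
    """Assume the next element is a terminal (XX); return its aligned word.

    The ordinal of the assumed terminal among all terminals predicted so
    far picks the source word with the same ordinal from the sentence.
    """
    n_terminals = sum(1 for e in predicted if e == "XX")
    ordinal = n_terminals + 1            # ordinal of the assumed terminal
    return source_words[ordinal - 1]     # 1-based ordinal -> 0-based index

source = ["John", "has", "a", "dog", "."]
# Predicting y_5 given y_1..y_4 from FIG. 4:
print(align_target_source_word(["(S", "(NP", "XX", "XX"], source))  # a
# Predicting y_4 given y_1..y_3:
print(align_target_source_word(["(S", "(NP", "XX"], source))        # has
```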
Step S210, when the value type of the target element to be predicted at the current time is assumed to be right brackets, determining a starting word of a phrase structure corresponding to the current time from the source sentence, taking a previous word of the target source word as an end word of the phrase structure, and determining the phrase structure corresponding to the current time according to the starting word and the end word of the phrase structure.
After the target source word is determined, the embodiment of the present invention may determine the phrase structure and the half-phrase structure corresponding to the current time.
Because the phrase structure is deterministic, as an optional implementation the embodiment of the present invention may determine the start word of the phrase structure as follows: assuming the value type of the target element is a right bracket, determine, among the predicted elements, the phrase element beginning with the left bracket closest to the target element; determine the first element with value type terminal predicted after that phrase element; determine that element's ordinal among the predicted elements whose value type is terminal; and select from the source sentence the source word with the corresponding ordinal as the start word of the phrase structure.
Referring to the example shown in FIG. 4, what is performed at the current time is the 5th decoding step (i.e., the element y_5 at the 5th time needs to be predicted, and y_5 is unknown); the elements predicted at the previous 4 times are known to be y_1 = "(S", y_2 = "(NP", y_3 = XX, y_4 = XX.
Assuming the value type of y_5 is ")" (a right bracket) means that a phrase structure it corresponds to can be predicted. Among the predicted elements y_1 to y_4, the phrase element beginning with the left bracket closest to the target element y_5 is determined; as can be seen from FIG. 4, this is y_2 ("(NP"). Thus the first element with value type XX predicted after y_2 can be determined (called the terminal element predicted after y_2), which, as FIG. 4 shows, is y_3. The ordinal of y_3 among the predicted elements with value type XX is 1; accordingly, the start word of the phrase structure corresponding to the current time is the source word "John" with ordinal 1 in the source sentence.
Optionally, when the element y_4 at the 4th moment needs to be predicted, the starting word of the corresponding phrase structure is determined in the same manner.
After determining the starting word of the phrase structure corresponding to the current moment, the previous word of the target source word in the source sentence is taken as the tail word of the phrase structure, so that the source words covered from the starting word to the tail word in the source sentence form the phrase structure corresponding to the current moment;
referring to the example shown in FIG. 4, when y_5 needs to be predicted at the current time, the target source word is determined to be "a"; after the starting word of the corresponding phrase structure at the current time is determined to be "John", the previous word "has" of the target source word "a" is used as the tail word of the phrase structure, so as to form the phrase structure (John, has).
Step S220, the target source word is used as a starting word of a corresponding half-phrase structure at the current moment, and an end word of the half-phrase structure is set as any unknown source word behind the target source word to form the half-phrase structure.
After the target source word is determined, the value type of the target element to be predicted at the current moment can be assumed to be a left bracket ("("), which means that a half-phrase structure is to be generated. The determined target source word can be used as the starting word of the half-phrase structure, and the end word of the half-phrase structure is set as any unknown source word behind the target source word in the source sentence, so as to form the half-phrase structure;
referring to the example shown in FIG. 4, when y_5 needs to be predicted at the current time, the target source word is determined to be "a", and the end word of the half-phrase structure is set to be unknown (denoted by "?"), thereby forming the half-phrase structure (a, ?).
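Under the stated assumptions, the separation of the two structures in the FIG. 4 example can be sketched as follows; the function names and list encodings are illustrative, not the patent's implementation.

```python
def target_source_word_index(predicted):
    """0-based source index of the target source word: the next terminal's ordinal."""
    return sum(1 for e in predicted if e == "XX")

def phrase_structure(predicted, source):
    """Assuming the next element is a right bracket, find the phrase's start/tail words."""
    # phrase element starting with the left bracket closest to the target element
    open_idx = max(i for i, e in enumerate(predicted) if e.startswith("("))
    # the element of terminal type (XX) predicted next after that left bracket
    first_xx = next(i for i in range(open_idx + 1, len(predicted)) if predicted[i] == "XX")
    # its ordinal among the terminals gives the start word's position in the source sentence
    start = sum(1 for e in predicted[:first_xx] if e == "XX")
    end = target_source_word_index(predicted) - 1  # word before the target source word
    return (source[start], source[end])

def half_phrase_structure(predicted, source):
    """Assuming the next element is a left bracket, the tail word is still unknown."""
    return (source[target_source_word_index(predicted)], "?")

source = ["John", "has", "a", "dog"]
predicted = ["(S", "(NP", "XX", "XX"]            # y_1..y_4 before predicting y_5
print(phrase_structure(predicted, source))       # ('John', 'has')
print(half_phrase_structure(predicted, source))  # ('a', '?')
```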
And step S230, determining a context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure and the source end vector representation sequence.
Optionally, step S230 may be implemented with reference to the corresponding formula described above.
Taking the syntactic analysis scenario as an example, when predicting the elements of the syntactic analysis result at each time, the context at each time can be selected by the method shown in FIG. 3 to assist the decoder in predicting the element at each time, so as to obtain the syntactic analysis result.
Taking a syntactic analysis scenario as an example, the context selection method described above may be implemented by an attention layer of a syntactic analysis model, and the attention layer may be implemented based on a deterministic attention mechanism; specifically, in the process of performing syntactic analysis by the syntactic analysis model, the context selection method provided by the embodiment of the present invention can be used to perform context selection at each time. In the embodiment of the present invention, the syntactic analysis model may be implemented based on a neural network, and an alternative structure of the syntactic analysis model based on the neural network may be as shown in fig. 5, which includes: an encoder and a decoder; wherein an attention layer based on a deterministic attention mechanism is provided in the decoder.
Optionally, in the process of performing syntactic analysis, the source sentence may be input into the syntactic analysis module, and a source-end vector representation sequence corresponding to the source sentence is generated by the encoder; the source sentence may include at least one source word, and one source word may correspond to one representation vector in the source vector representation sequence;
after the source end vector representation sequence is obtained, the attention layer can select the context at the current moment by using the context selection method provided by the embodiment of the invention, so that the decoder can predict the target element at the current moment according to the context at the current moment;
and continuously selecting the context at each moment by circularly using the attention layer in the manner, and predicting the elements at each moment by using a decoder, so that the sequence formed by the predicted elements at each moment is used as a syntax tree sequence to obtain a syntax analysis result, and the syntax analysis of the source sentence is realized.
Alternatively, the framework of the neural network-based syntactic analyzer may be implemented using a plurality of serialized neural networks, such as a plurality of RNNs (recurrent neural networks); as an example, as shown in fig. 6, the encoder may be implemented by a serialized neural network (e.g., an RNN), e.g., the encoder may be implemented based on a bi-directional RNN; the decoder may be implemented by another serialized neural network, such as may be implemented based on RNN from left to right; the attention layer may be implemented by a network layer in a serialized neural network for selection of contexts for respective time instants based on the output of the encoder.
Fig. 7, with reference to fig. 5 and fig. 6, shows a flowchart of a parsing method provided in an embodiment of the present invention, where the parsing method is applicable to an electronic device, and the electronic device may be implemented by a server or a terminal; the syntactic analysis process can be realized by a syntactic analysis model arranged in the electronic equipment;
referring to fig. 7, a syntax analysis method provided in an embodiment of the present invention may include:
and step S300, reading the source sentence by the encoder, and outputting a corresponding source end vector representation sequence.
Optionally, the source words included in the source sentence may form an input sequence; after the sequence is input into the encoder, the encoder may convert the discrete source words in the source sentence into continuous spatial representations using the compact representation property of RNNs, and feed the converted representations into a bidirectional RNN (Recurrent Neural Network), so as to obtain the corresponding source-end vector representation sequence.
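As a minimal, hypothetical sketch of this encoding step (toy dimensions, random weights, and a scalar-weight vanilla RNN cell instead of a trained one):

```python
import math, random

random.seed(0)
DIM = 4
vocab = {"John": 0, "has": 1, "a": 2, "dog": 3}
embed = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in vocab]

def rnn_step(h, x, W=0.5, U=0.5):
    # vanilla RNN cell with scalar weights, applied element-wise for brevity
    return [math.tanh(W * hi + U * xi) for hi, xi in zip(h, x)]

def encode(words):
    xs = [embed[vocab[w]] for w in words]
    fwd, h = [], [0.0] * DIM
    for x in xs:                        # left-to-right pass
        h = rnn_step(h, x)
        fwd.append(h)
    bwd, h = [], [0.0] * DIM
    for x in reversed(xs):              # right-to-left pass
        h = rnn_step(h, x)
        bwd.append(h)
    bwd.reverse()
    # each source word's representation concatenates both directions
    return [f + b for f, b in zip(fwd, bwd)]

E_x = encode(["John", "has", "a", "dog"])
print(len(E_x), len(E_x[0]))  # 4 8
```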
Step S310, at the current time, the attention layer selects the context at the current time.
Alternatively, the processing of step S310 may be implemented based on the context selection method provided in the above-described embodiment of the present invention;
specifically, according to a target element to be predicted at the current moment, a target source word of the target element aligned in the source sentence is assumed; according to the target source words, separating phrase structures and half-phrase structures corresponding to the current moment from the source sentences, wherein the phrase structures are at least deterministic; and determining the context corresponding to the current moment at least according to the target source word, the phrase structure, the half phrase structure and the source end vector representation sequence.
Step S320, the decoder outputs the predicted target element at the current time according to the context at the current time.
Optionally, the decoder state at the current time can be set as s_t and the target element to be predicted at the current moment as y_t; then at the current moment, the decoder can determine the decoder state s_t at the current time according to the context c_t selected at the current moment, the decoder state s_{t-1} at the previous time, and the element y_{t-1} predicted at the previous moment (this process can be considered a standard RNN operation);
furthermore, the decoder can determine the predicted target element y_t at the current moment according to the decoder state s_t at the current time, the context c_t at the current time, and the element y_{t-1} predicted at the previous time.
Thus, the attention layer and the decoder are continuously and circularly processed at each moment (namely, the steps S310 and S320 are repeatedly executed at each moment), so that the elements generated at each moment are obtained, the sequence consisting of the elements generated at each moment is formed, the syntax tree sequence is formed, and the syntax analysis result is obtained.
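The recurrence of steps S310 and S320 can be sketched as below; select_context, f and g are contrived toy stand-ins chosen only so that the loop reproduces the FIG. 4 prefix, not the model's real functions.

```python
def select_context(t):
    """Stand-in for the attention layer's context selection (step S310)."""
    return float(t)

def f(s_prev, y_prev, c):
    """Stand-in RNN state update: s_t from (s_{t-1}, y_{t-1}, c_t)."""
    return s_prev + c + len(y_prev)

def g(s, c, y_prev):
    """Stand-in prediction: y_t from (s_t, c_t, y_{t-1}); thresholds are contrived."""
    return "XX" if s > 6 else ("(S" if s <= 1 else "(NP")

def decode(max_steps=4):
    s, y, outputs = 0.0, "", []
    for t in range(1, max_steps + 1):
        c = select_context(t)  # context at the current time
        s = f(s, y, c)         # decoder state at the current time
        y = g(s, c, y)         # element predicted at the current time (step S320)
        outputs.append(y)
    return outputs

print(decode())  # ['(S', '(NP', 'XX', 'XX']
```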
Alternatively, the syntax analysis result may be a syntax tree sequence, as shown in fig. 8, fig. 8 shows a top-down (top-down) serialization process of a syntax tree, the upper half is the syntax tree, the lower half is the serialization result, and the middle dotted line represents the leaf node expressed by XX.
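A top-down serialization like the one in FIG. 8 can be sketched as follows, assuming a nested-tuple tree encoding and a ")label" closing convention (the exact closing-token format in FIG. 8 is not reproduced here):

```python
def serialize(tree):
    """(label, children...) -> top-down token sequence; plain strings are leaf words."""
    if isinstance(tree, str):
        return ["XX"]               # a leaf node is expressed by XX
    label, children = tree[0], tree[1:]
    tokens = ["(" + label]
    for child in children:
        tokens += serialize(child)
    tokens.append(")" + label)      # assumed ')label' closing convention
    return tokens

tree = ("S", ("NP", "John"), ("VP", "has", ("NP", "a", "dog")))
print(" ".join(serialize(tree)))  # (S (NP XX )NP (VP XX (NP XX XX )NP )VP )S
```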
Based on the deterministic attention mechanism provided by the embodiment of the present invention, the context selection involved in the training process of the syntactic analysis model can be adjusted accordingly; optionally, FIG. 9 shows an optional training method flow of the syntactic analysis model, where the training method flow is applicable to an electronic device, and the electronic device may be implemented by a server or a terminal;
referring to fig. 9, a training process of a syntactic analysis model provided in an embodiment of the present invention may include:
and S400, acquiring a source sentence sample.
The source sentence samples can be regarded as sentence samples used for training the syntactic analysis model, and can be obtained from a given standard treebank;
when a syntactic analysis model is trained, in the embodiment of the present invention, each source sentence sample may be input into the syntactic analysis model one by one, with a goal of maximizing a likelihood function score, and parameters of the syntactic analysis model (including parameters of an attention layer based on a deterministic attention mechanism provided in the embodiment of the present invention) are updated iteratively, so that after iteration is completed, training of the syntactic analysis model is completed, and a specific manner may be as shown in the following steps.
Step S410, inputting the source sentence sample into a syntactic analysis model, wherein the syntactic analysis model comprises: an encoder and a decoder; the decoder is provided with an attention layer based on a deterministic attention mechanism.
Step S420, determining, by the encoder, a source-end vector representation sequence corresponding to the source sentence sample.
Step S430, at the current moment, the attention layer assumes a target source word of the target element aligned in the source sentence according to the target element to be predicted at the current moment; according to the target source words, separating phrase structures and half-phrase structures corresponding to the current moment from the source sentences, wherein the phrase structures are at least deterministic; and determining the context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure and the source end vector representation sequence.
In the process of training a syntactic analysis model, the mode that the attention layer selects the context can be realized by the context selection method provided by the embodiment of the invention;
the target source word at the current moment can be set as x_t, the corresponding phrase structure as ρ(x_b, x_{t-1}), the half-phrase structure as ρ(x_t, ?), and E_x represents the source-end vector representation sequence; then the context c_t at the current moment can be selected based on the following formula:
c_t = φ(ρ(x_b, x_{t-1}), ρ(x_t, ?), x_t, E_x);
further, φ can be defined as
[equation image: definition of φ]
wherein θ_c represents the parameters of the attention layer based on the deterministic attention mechanism provided by the embodiment of the present invention, and
[equation image]
represents the vector representation of the target source word;
thus the context c_t at the current moment can be realized based on the formula
[equation image: formula for c_t].
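The exact definition of φ is given by the equations above; as one plausible but assumed instantiation (not necessarily the patent's), the deterministic context can concatenate the source-end vectors of the phrase-structure boundary words and the target source word and apply a learned linear map θ_c:

```python
def phi(e_xb, e_xtail, e_xt, theta_c):
    """Concatenate the three vectors and apply a linear map (assumed form of φ)."""
    concat = e_xb + e_xtail + e_xt
    # c_t[i] = sum_j theta_c[i][j] * concat[j]
    return [sum(w * v for w, v in zip(row, concat)) for row in theta_c]

# toy source-end vectors for the FIG. 4 example: x_b = "John", tail = "has", x_t = "a"
E_x = {"John": [1.0, 0.0], "has": [0.0, 1.0], "a": [1.0, 1.0]}
theta_c = [[0.1] * 6, [0.2] * 6]  # toy 2x6 attention-layer parameter matrix
c_t = phi(E_x["John"], E_x["has"], E_x["a"], theta_c)
print(c_t)
```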
Step S440, the decoder predicts a target element corresponding to the current time according to the context corresponding to the current time, so as to form a syntax tree sequence corresponding to the source sentence sample from the predicted elements corresponding to each time, and obtain a syntax analysis result of the source sentence sample.
Alternatively, the processing of step S440 may refer to the processing described above in step S320.
And S450, determining corresponding likelihood function scores according to the source sentence samples and the syntactic tree sequences corresponding to the source sentence samples.
Step S460, iteratively updating parameters of the syntactic analysis model at least by taking the maximum likelihood function score as a training target until an iteration termination condition is reached, and finishing the training of the syntactic analysis model; wherein, the parameters of the syntactic analysis model at least include: parameters of the attention layer based on a deterministic attention mechanism.
Optionally, the training of the syntactic analysis model may be performed by maximizing a likelihood function score as a target function;
θ* = argmax_θ Σ_i log P(y_i | x_i; θ)
wherein x_i is the i-th source sentence (i.e., the i-th input sequence), and y_i is the syntax tree sequence corresponding to the i-th source sentence; θ represents the parameters of the syntactic analysis model and needs to be updated iteratively, and θ includes the parameters θ_c of the attention layer based on the deterministic attention mechanism.
Optionally, the source sentence sample input into the syntactic analysis model for training is set as x = ⟨x_1, x_2, …, x_|x|⟩ with length |x|, and the syntax tree sequence corresponding to the source sentence sample is set as y = ⟨y_1, y_2, …, y_|y|⟩ with length |y|; then P(y|x; θ) can be defined as follows:
P(y|x; θ) = ∏_{t=1}^{|y|} P(y_t | y_{&lt;t}, x; θ)
wherein x represents the currently input source sentence sample, y represents the syntax tree sequence corresponding to the currently input source sentence sample, |y| represents the length of the syntax tree sequence, and y_t represents the element of the syntax tree sequence generated at the current time; h'_t = f'(h'_{t-1}, y_{t-1}, c_t) represents the hidden unit in the decoder decoding process, which can be defined by a recurrent neural network.
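The chain-rule factorization that the likelihood score maximizes can be illustrated numerically; the per-step probabilities below are assumed toy values.

```python
import math

# toy per-step probabilities P(y_t | y_<t, x; θ) for a length-4 syntax tree sequence
step_probs = [0.9, 0.8, 0.7, 0.6]
log_likelihood = sum(math.log(p) for p in step_probs)  # Σ_t log P(y_t | y_<t, x; θ)
likelihood = math.exp(log_likelihood)                  # equals the product of the steps
print(round(likelihood, 6))  # 0.3024
```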
It should be noted that the objective function may be implemented by at least a likelihood function score, but in an actual situation, the objective function may also be added with other numerical values, which is not limited to the likelihood function score, and may be specifically determined according to the training requirement of the syntactic analysis model, but no matter how the training requirement of the syntactic analysis model changes, when the model training process and the syntactic analysis process perform context selection, both the model training process and the syntactic analysis process may be implemented based on the context selection method provided in the embodiment of the present invention.
Based on the trained syntactic analysis model, the syntactic analysis process can be as shown in fig. 7, which is not described again here; alternatively, as shown in fig. 10, a scenario example of syntactic analysis performed based on the trained syntactic analysis model may specifically be that a syntactic analysis model is set in a server, and the server receives a syntactic analysis request from a terminal to perform syntactic analysis; alternatively, as shown in fig. 10, the application scenario process of the syntactic analysis may include:
s1, a user inputs a source sentence to be subjected to syntactic analysis at a terminal, and the terminal sends a syntactic analysis request containing the source sentence to a server.
S2, after receiving a syntactic analysis request sent by a terminal, the server calls a syntactic analysis model; the syntactic analysis model includes an encoder and a decoder that contains an attention layer based on a deterministic attention mechanism.
And S3, the server inputs the source sentence into a syntactic analysis model, and determines a syntactic tree sequence corresponding to the source sentence through the syntactic analysis model to obtain a syntactic analysis result.
In the process of carrying out syntactic analysis on a source sentence, the syntactic analysis model can carry out context selection by an attention layer based on a deterministic attention mechanism according to the context selection method provided by the embodiment of the invention;
specifically, the method comprises the following steps: the attention layer can assume a target source word of the target element aligned in the source sentence according to the target element to be predicted at the current moment; according to the target source words, separating phrase structures and half-phrase structures corresponding to the current moment from the source sentences, wherein the phrase structures are at least deterministic; and determining the context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure and the source end vector representation sequence.
And S4, the server outputs a syntax tree sequence corresponding to the source sentence through the syntax analysis model and feeds the syntax tree sequence back to the terminal.
The core of the context selection method provided by the embodiment of the present invention lies in the way the attention mechanism is defined: the embodiment of the present invention selects context information in a deterministic manner during decoding, thereby improving the comprehensiveness of the captured context and the accuracy of context selection;
optionally, the context selection method provided in the embodiment of the present invention may be applied to a syntactic analysis scenario, and the syntactic analysis model may be implemented based on a serialized neural network model; when the syntactic analysis model is trained, training efficiency may be improved by relying on parallelization, for example, with 1 GPU, training of the syntactic analysis model may be completed in 1 day. Meanwhile, in terms of accuracy, applying the syntactic analysis model that uses the context selection method provided by the embodiment of the present invention to the two public data sets PTB (Penn Treebank) and CTB (Chinese Penn Treebank) can remarkably improve the accuracy of syntactic analysis results.
In the following, the context selection apparatus provided in the embodiment of the present invention is described, and the context selection apparatus described below may be considered as a program module that is required to be set by an electronic device to implement the context selection method provided in the embodiment of the present invention. The contents of the context selection apparatus described below and the contents of the context selection method described above may be referred to in correspondence with each other.
Fig. 11 is a block diagram of a context selecting apparatus according to an embodiment of the present invention, where the context selecting apparatus is applicable to an electronic device, and the electronic device may be implemented by a server or a terminal;
referring to fig. 11, a context selecting apparatus provided in an embodiment of the present invention may include:
a source-end vector sequence acquisition module 100, configured to acquire a source-end vector representation sequence corresponding to a source sentence;
a target source word determining module 200, configured to assume, according to a target element to be predicted at a current time, a target source word in which the target element is aligned in a source sentence;
a separation module 300, configured to separate a phrase structure and a half-phrase structure corresponding to a current time from a source sentence according to the target source word; wherein the phrase structure is at least deterministic;
a context output module 400, configured to determine a context corresponding to the current time according to at least the target source word, the phrase structure, the half-phrase structure, and the source-end vector representation sequence.
Optionally, the target source word determining module 200 is configured to assume, according to a target element to be predicted at a current time, a target source word of the target element aligned in the source sentence, and specifically includes:
determining a target source word of the target element aligned in the source sentence on the assumption that the value type of the target element is a terminal character; wherein the possible value types of the target element include: terminator, left bracket and right bracket.
Optionally, the target source word determining module 200 is configured to determine, assuming that the value type of the target element is a terminal, a target source word aligned by the target element in the source sentence, and specifically includes:
when the value type of the target element is assumed to be a terminal, determining the ordinal number corresponding to the target element in the predicted element with the value type as the terminal;
and determining the source word of the corresponding ordinal number from the source sentence as the target source word by using the determined ordinal number.
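A minimal sketch of this ordinal-based mapping, using the FIG. 4 example (illustrative data):

```python
predicted = ["(S", "(NP", "XX", "XX"]  # elements predicted so far
source = ["John", "has", "a", "dog"]
# assuming the target element's value type is a terminal, it would be the next XX:
ordinal = sum(1 for e in predicted if e == "XX") + 1
target_source_word = source[ordinal - 1]
print(ordinal, target_source_word)  # 3 a
```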
Optionally, the separating module 300 is configured to separate a phrase structure corresponding to the current time from the source sentence according to the target source word, and specifically includes:
and assuming that the value type of the target element is a right bracket, determining a starting word of a phrase structure corresponding to the current moment from the source sentence, taking a previous word of the target source word as an end word of the phrase structure, and determining the phrase structure corresponding to the current moment according to the starting word and the end word of the phrase structure.
Optionally, the separating module 300 is configured to, assuming that the value type of the target element is a right bracket, determine a starting word of a corresponding phrase structure at the current time from the source sentence, and specifically include:
determining a phrase element starting from a left bracket closest to the target element from among predicted elements, assuming that the value type of the target element is a right bracket;
determining, from the predicted elements, the element whose value type is a terminal character and which is the next predicted after the phrase element starting from the left bracket; determining the corresponding ordinal number of that element among the predicted elements whose value type is the terminal character; and determining the source word corresponding to the ordinal number from the source sentence as the starting word of the phrase structure.
Optionally, the separating module 300 is configured to separate a corresponding half-phrase structure at the current time from the source sentence according to the target source word, and specifically includes:
and assuming that the value type of the target element is a left bracket, taking the target source word as a starting word of the half-phrase structure, and setting an end word of the half-phrase structure as any unknown source word behind the target source word to form the half-phrase structure.
Optionally, the context output module 400 is configured to determine a context corresponding to a current time according to at least the target source word, the phrase structure, the half-phrase structure, and the source end vector representation sequence, and specifically includes:
according to the formula c_t = φ(ρ(x_b, x_{t-1}), ρ(x_t, ?), x_t, E_x), determining the context corresponding to the current moment;
wherein c_t represents the context corresponding to the current time, x_t is the target source word, x_b is the starting word of the phrase structure, E_x is the source-end vector representation sequence corresponding to the source sentence, ρ(x_b, x_{t-1}) is the phrase structure corresponding to the current time, and ρ(x_t, ?) is the half-phrase structure corresponding to the current time.
Optionally, the definition of φ includes:
[equation image: first definition of φ]
wherein θ_c represents the parameters of the attention layer based on a deterministic attention mechanism,
[equation image] represents the vector representation of the starting word x_b of the phrase structure corresponding to the current time,
[equation image] represents the vector representation of the tail word of the phrase structure corresponding to the current time,
[equation image] represents the vector representation of the target source word;
or,
[equation image: second definition of φ]
or,
[equation image: third definition of φ];
optionally, in a syntax analysis scenario, the context selection method provided by the embodiment of the present invention may be executed by an attention layer based on a deterministic attention mechanism in a syntax analysis model; wherein the syntactic analysis model includes: an encoder and a decoder, the decoder being provided with the attention layer;
optionally, fig. 12 is a block diagram of another structure of the context selecting apparatus according to the embodiment of the present invention, which is shown in fig. 11 and 12, and further includes:
an encoding module 500, configured to input a source sentence into the syntactic analysis model, and output a source-end vector representation sequence corresponding to the source sentence;
the decoding module 600 is configured to, after determining a context corresponding to a current time, output a target element predicted at the current time according to the context corresponding to the current time, so that a sequence corresponding to a syntax tree is formed by elements predicted at each time, and a syntax analysis result is obtained.
Optionally, the decoding module 600 is configured to output a target element predicted at the current time according to a context corresponding to the current time, and specifically includes:
determining the decoder state at the current moment according to the context corresponding to the current moment, the decoder state at the previous moment and the element predicted at the previous moment;
and determining the predicted target element at the current moment according to the decoder state at the current moment, the context corresponding to the current moment and the element predicted at the previous moment.
Optionally, fig. 13 shows another structural block diagram of the context selecting apparatus according to an embodiment of the present invention, which is shown in fig. 12 and 13, and further includes:
a training module 700, configured to obtain source sentence samples; inputting the source sentence sample into a syntactic analysis model; determining, by the encoder, a source-side vector representation sequence corresponding to the source sentence samples; after determining the context corresponding to the current moment, predicting, by the decoder, the target element corresponding to the current moment according to the context corresponding to the current moment, so as to form the syntax tree sequence corresponding to the source sentence sample by the corresponding predicted element at each moment; determining a corresponding likelihood function score according to the source sentence sample and a syntax tree sequence corresponding to the source sentence sample; iteratively updating parameters of the syntactic analysis model by at least taking the maximum likelihood function score as a training target until an iteration termination condition is reached so as to train the syntactic analysis model; wherein, the parameters of the syntactic analysis model at least include: parameters of the attention layer.
The context selection device provided by the embodiment of the invention can be applied to electronic equipment, such as a server and the like; alternatively, the hardware structure block diagram of the electronic device may be as shown in fig. 14, and includes: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the present invention, the number of the processor 1, the communication interface 2, the memory 3, and the communication bus 4 is at least one, and the processor 1, the communication interface 2, and the memory 3 complete mutual communication through the communication bus 4;
the processor 1 may be a central processing unit CPU, or an Application Specific Integrated Circuit ASIC (Application Specific Integrated Circuit), or one or more Integrated circuits configured to implement embodiments of the present invention;
the memory 3 may comprise a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory;
the memory stores a program, and the processor calls the program to realize the steps of the context selection method provided by the embodiment of the invention.
Alternatively, the functions of the programs may be described with reference to the corresponding parts above.
The embodiment of the invention also provides a storage medium, which stores a program suitable for being executed by a processor, so as to implement the steps of the context selection method provided by the embodiment of the invention.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (15)

1. A method for context selection, comprising:
obtaining a source end vector representation sequence corresponding to a source sentence;
hypothesizing, according to a target element to be predicted at the current moment, a target source word with which the target element is aligned in the source sentence;
separating, from the source sentence according to the target source word, a phrase structure and a half-phrase structure corresponding to the current moment; wherein at least the phrase structure is deterministic;
and determining the context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure and the source end vector representation sequence.
2. The method of claim 1, wherein the hypothesizing, according to the target element to be predicted at the current moment, the target source word with which the target element is aligned in the source sentence comprises:
determining, on the assumption that the value type of the target element is a terminal character, the target source word with which the target element is aligned in the source sentence; wherein the possible value types of the target element include: terminal character, left bracket, and right bracket.
3. The context selection method of claim 2, wherein the determining, on the assumption that the value type of the target element is a terminal character, the target source word with which the target element is aligned in the source sentence comprises:
when the value type of the target element is assumed to be a terminal character, determining the ordinal number corresponding to the target element among the predicted elements whose value type is a terminal character;
and determining, using the determined ordinal number, the source word of the corresponding ordinal number from the source sentence as the target source word.
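The counting step of claims 2–3 can be sketched as follows. This is an illustrative sketch only: the function names, and the convention that a labeled left bracket is written `"(NP"` while a right bracket is `")"`, are assumptions for illustration and do not appear in the patent.

```python
def is_terminal(element):
    # Value types from claim 2: terminal character, left bracket, right bracket.
    # Assumed notation: left brackets carry a label (e.g. "(NP"); right
    # brackets are a bare ")"; everything else is a terminal character.
    return not element.startswith("(") and element != ")"

def target_source_word(source_words, predicted_elements):
    # Claim 3: count the already-predicted elements whose value type is a
    # terminal character; the element now being predicted (assumed to be a
    # terminal) takes the next ordinal, which indexes its aligned source word.
    ordinal = sum(1 for e in predicted_elements if is_terminal(e)) + 1
    return source_words[ordinal - 1]  # ordinals are 1-based, lists 0-based
```

For example, after emitting `["(S", "the", "cat"]` over the source sentence "the cat sat", two terminals have been predicted, so a hypothesized next terminal aligns with the third source word.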
4. The method of any one of claims 2-3, wherein the separating the phrase structure corresponding to the current time from the source sentence according to the target source word comprises:
and, on the assumption that the value type of the target element is a right bracket, determining a start word of the phrase structure corresponding to the current moment from the source sentence, taking the word preceding the target source word as the end word of the phrase structure, and determining the phrase structure corresponding to the current moment according to its start word and end word.
5. The method of claim 4, wherein the determining, on the assumption that the value type of the target element is a right bracket, the start word of the phrase structure corresponding to the current moment from the source sentence comprises:
on the assumption that the value type of the target element is a right bracket, determining, from among the predicted elements, a phrase element starting from the left bracket closest to the target element;
and determining the element whose value type is a terminal character that is predicted next after the phrase element starting from the left bracket, determining the ordinal number corresponding to that element among the predicted elements whose value type is a terminal character, and determining the source word corresponding to that ordinal number from the source sentence as the start word of the phrase structure.
6. The context selection method of any one of claims 2-3, wherein the separating the corresponding half-phrase structure at the current time from the source sentence according to the target source word comprises:
and, on the assumption that the value type of the target element is a left bracket, taking the target source word as the start word of the half-phrase structure and setting the end word of the half-phrase structure to an as-yet-unknown source word following the target source word, so as to form the half-phrase structure.
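The separation of a phrase structure (claims 4–5) and a half-phrase structure (claim 6) can be sketched as below. All names are assumed; in particular, "the left bracket closest to the target element" is interpreted here as the nearest left bracket not yet closed by a right bracket, which is one plausible reading, not the patent's wording.

```python
def nearest_open_bracket(predicted_elements):
    # Scan backwards with a balance counter to find the nearest left
    # bracket that has not been closed by a matching right bracket.
    balance = 0
    for i in range(len(predicted_elements) - 1, -1, -1):
        e = predicted_elements[i]
        if e == ")":
            balance += 1
        elif e.startswith("("):
            if balance == 0:
                return i
            balance -= 1
    return None

def phrase_structure(source_words, predicted_elements, target_index):
    # Claim 5: the start word is the source word of the first terminal
    # predicted after that left bracket. Claim 4: the end word is the word
    # just before the target source word x_t (target_index, 0-based here).
    open_i = nearest_open_bracket(predicted_elements)
    ordinal, start = 0, None
    for i, e in enumerate(predicted_elements):
        if not e.startswith("(") and e != ")":
            ordinal += 1
            if start is None and i > open_i:
                start = ordinal - 1
    return source_words[start:target_index]

def half_phrase_structure(source_words, target_index):
    # Claim 6: the half-phrase starts at the target source word; its end
    # word is an as-yet-unknown later source word (the "?" of claim 7).
    return (source_words[target_index], None)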
7. The method of claim 1, wherein determining the context corresponding to the current time according to at least the target source word, the phrase structure, the half-phrase structure, and the source-end vector representation sequence comprises:
determining the context corresponding to the current moment according to the formula c_t = φ(ρ(x_b, x_{t−1}), ρ(x_t, ?), x_t, E_x);
wherein c_t denotes the context corresponding to the current moment, x_t is the target source word, x_b is the start word of the phrase structure, E_x denotes the source end vector representation sequence corresponding to the source sentence, ρ(x_b, x_{t−1}) is the phrase structure corresponding to the current moment, and ρ(x_t, ?) is the half-phrase structure corresponding to the current moment.
8. The context selection method of claim 7, wherein the definition of φ comprises:
[equation images in the original (FDA0001547926270000021–0000026), not recoverable from the text; they define φ in terms of θ_c, the parameters of an attention layer based on a deterministic attention mechanism, together with the vector representation of the start word x_b of the phrase structure corresponding to the current moment, the vector representation of the end word of that phrase structure, and the vector representation of the target source word; two alternative formulations are likewise given as equation images.]
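Because the exact φ of claim 8 survives only as equation images, the following is a stand-in illustration of claim 7's interface, not the patent's formula: it pools the closed phrase span ρ(x_b, x_{t−1}), the open half-phrase ρ(x_t, ?), and the target word vector into one context vector. The mean-pooling combination is an assumption.

```python
import numpy as np

def phi(E, b, t):
    # Sketch of c_t = phi(rho(x_b, x_{t-1}), rho(x_t, ?), x_t, E_x):
    # E is the source end vector representation sequence (one row per word),
    # b the 0-based start-word index, t the 0-based target-word index.
    phrase = E[b:t].mean(axis=0)  # rho(x_b, x_{t-1}): the closed phrase span
    half = E[t:].mean(axis=0)     # rho(x_t, ?): end word unknown, pool to sentence end
    return np.concatenate([phrase, half, E[t]])  # x_t's own vector appended
```

An actual deterministic attention layer would learn parameters (the θ_c of claim 8) to weight the words inside each span instead of averaging them uniformly.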
9. The context selection method of claim 1, wherein the context selection method is performed by an attention layer based on a deterministic attention mechanism in a syntactic analysis model, the syntactic analysis model comprising: an encoder and a decoder, the decoder being provided with the attention layer;
the method further comprises the following steps:
inputting a source sentence into the syntactic analysis model, and outputting a source end vector representation sequence corresponding to the source sentence by the encoder;
and, after the context corresponding to the current moment is determined, outputting, by the decoder, the target element predicted at the current moment according to that context, the elements predicted at all moments forming a sequence corresponding to a syntax tree, so as to obtain a syntactic analysis result.
10. The method of claim 9, wherein outputting, by the decoder, the predicted target element at the current time according to the corresponding context at the current time comprises:
the decoder determines the decoder state at the current moment according to the context corresponding to the current moment, the decoder state at the previous moment, and the element predicted at the previous moment;
and the decoder determines the target element predicted at the current moment according to the decoder state at the current moment, the context corresponding to the current moment, and the element predicted at the previous moment.
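The two steps of claim 10 can be sketched as one decoder step. The tanh recurrence and softmax readout below are assumed stand-ins for the patent's unspecified recurrent cell and output layer; all weight names are hypothetical.

```python
import numpy as np

def decoder_step(s_prev, y_prev, c_t, params):
    Ws, Wy, Wc, Wo = params
    # Claim 10, step 1: the decoder state at the current moment is a
    # function of the previous state s_prev, the embedding y_prev of the
    # element predicted at the previous moment, and the current context c_t.
    s_t = np.tanh(Ws @ s_prev + Wy @ y_prev + Wc @ c_t)
    # Claim 10, step 2: the predicted target element is read out from the
    # new state, the current context, and the previous element.
    logits = Wo @ np.concatenate([s_t, c_t, y_prev])
    probs = np.exp(logits - logits.max())
    return s_t, probs / probs.sum()
```

The returned distribution's argmax (or a beam search over it) would give the target element at the current moment.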
11. The context selection method of claim 9 or 10, wherein the method further comprises:
obtaining a source sentence sample;
inputting the source sentence sample into a syntactic analysis model;
determining, by the encoder, a source-end vector representation sequence corresponding to the source sentence samples;
after the context corresponding to the current moment is determined, predicting, by the decoder, the target element corresponding to the current moment according to that context, so that the elements predicted at the respective moments form the syntax tree sequence corresponding to the source sentence sample;
determining a corresponding likelihood function score according to the source sentence sample and a syntax tree sequence corresponding to the source sentence sample;
and iteratively updating the parameters of the syntactic analysis model, taking maximization of the likelihood function score as at least part of the training objective, until an iteration termination condition is reached, so as to train the syntactic analysis model; wherein the parameters of the syntactic analysis model include at least: the parameters of the attention layer.
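The training score of claim 11 can be sketched as the log-likelihood of the gold syntax-tree sequence under the model's per-step distributions; maximizing it (e.g. by gradient ascent over all parameters, including the attention layer's) is the training objective. The function name is assumed.

```python
import numpy as np

def log_likelihood(step_distributions, gold_sequence):
    # Sum of log-probabilities the model assigned to each gold element;
    # claim 11 trains by iteratively updating parameters to maximize this.
    return float(sum(np.log(p[g]) for p, g in zip(step_distributions, gold_sequence)))
```

In practice one minimizes the negative of this quantity (cross-entropy) with a gradient-based optimizer until an iteration termination condition, such as convergence or a step budget, is reached.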
12. A context selection apparatus, comprising:
the source end vector sequence acquisition module is used for acquiring a source end vector representation sequence corresponding to a source sentence;
the target source word determining module is configured to hypothesize, according to a target element to be predicted at the current moment, a target source word with which the target element is aligned in the source sentence;
the separation module is configured to separate, from the source sentence according to the target source word, a phrase structure and a half-phrase structure corresponding to the current moment; wherein at least the phrase structure is deterministic;
and the context output module is used for determining the context corresponding to the current moment at least according to the target source word, the phrase structure, the half-phrase structure and the source end vector representation sequence.
13. The context selection apparatus according to claim 12, wherein the target source word determining module being configured to hypothesize, according to the target element to be predicted at the current moment, the target source word with which the target element is aligned in the source sentence specifically includes:
determining, on the assumption that the value type of the target element is a terminal character, the target source word with which the target element is aligned in the source sentence; wherein the possible value types of the target element include: terminal character, left bracket, and right bracket;
the separation module being configured to separate the phrase structure corresponding to the current moment from the source sentence according to the target source word specifically includes:
on the assumption that the value type of the target element is a right bracket, determining a start word of the phrase structure corresponding to the current moment from the source sentence, taking the word preceding the target source word as the end word of the phrase structure, and determining the phrase structure corresponding to the current moment according to its start word and end word;
and the separation module being configured to separate the half-phrase structure corresponding to the current moment from the source sentence according to the target source word specifically includes:
on the assumption that the value type of the target element is a left bracket, taking the target source word as the start word of the half-phrase structure and setting the end word of the half-phrase structure to an as-yet-unknown source word following the target source word, so as to form the half-phrase structure.
14. An electronic device, comprising: at least one memory and at least one processor; wherein the memory stores a program, and the processor invokes the program to perform the steps of the context selection method of any one of claims 1-11.
15. A storage medium storing a program adapted to be executed by a processor to perform the steps of the context selection method of any one of claims 1 to 11.
CN201810035965.2A 2018-01-15 2018-01-15 Context selection method and device, electronic equipment and storage medium Active CN110046338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810035965.2A CN110046338B (en) 2018-01-15 2018-01-15 Context selection method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110046338A CN110046338A (en) 2019-07-23
CN110046338B true CN110046338B (en) 2022-11-11

Family

ID=67273373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810035965.2A Active CN110046338B (en) 2018-01-15 2018-01-15 Context selection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110046338B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859954A (en) * 2020-07-01 2020-10-30 腾讯科技(深圳)有限公司 Target object identification method, device, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446941A (en) * 2008-12-10 2009-06-03 苏州大学 Natural language level and syntax analytic method based on historical information
CN105868181A (en) * 2016-04-21 2016-08-17 南京大学 Novel neural network based automatic natural language parallel structure recognition method
CN107357789A (en) * 2017-07-14 2017-11-17 哈尔滨工业大学 Merge the neural machine translation method of multi-lingual coding information
CN107423290A (en) * 2017-04-19 2017-12-01 厦门大学 A kind of neural network machine translation model based on hierarchical structure




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant