CN109062902B

CN109062902B - Text semantic expression method and device

Info

Publication number: CN109062902B
Application number: CN201810942947.2A
Authority: CN
Inventors: 华磊; 刘权; 陈志刚
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2018-08-17
Filing date: 2018-08-17
Publication date: 2022-12-06
Anticipated expiration: 2038-08-17
Also published as: CN109062902A

Abstract

The application discloses a text semantic expression method and device. The method comprises: after a target text to be expressed is obtained, performing word segmentation on the target text to obtain each target word; performing dependency syntax analysis on the target text to determine the dependency relationships among the target words; and semantically expressing the target text according to the dependency relationships among the target words. The target text is therefore no longer semantically expressed in the common one-hot manner, but according to the dependency relationships among the target words in the target text; that is, the semantic relationships among the words in the text are taken into account when the target text is semantically expressed, which improves the accuracy of the semantic expression result.

Description

Text semantic expression method and device

Technical Field

The application relates to the technical field of natural language processing, in particular to a text semantic expression method and device.

Background

Text may refer to a sentence or a chapter, and semantic expression of text refers to encoding natural-language text into a specific vector so that the vector contains the semantic information of the text. A good semantic expression result helps improve the performance of tasks such as text similarity retrieval, sentiment classification, and domain classification.

In the existing one-hot method, a vocabulary including a large number of words may be created in advance. For a text A, each word in the vocabulary that appears in text A is represented by 1 and each word that does not appear in text A is represented by 0, forming a text vector of 0s and 1s that expresses the semantic information of text A; the dimension of the text vector is the same as the number of words in the vocabulary.

However, the existing one-hot method does not consider the semantic relationships among the words in the text, so the semantic expression result is inaccurate.

Disclosure of Invention

The embodiment of the application mainly aims to provide a text semantic expression method and device, which can improve the accuracy of semantic expression results.

The embodiment of the application provides a text semantic expression method, which comprises the following steps:

acquiring a target text to be expressed;

performing word segmentation processing on the target text to obtain each target word;

performing dependency syntax analysis on the target text, and determining the dependency relationship among target words;

and according to the dependency relationship among all target words, performing semantic expression on the target text.

Optionally, the determining the dependency relationship between the target words includes:

determining a dominant word having a dependency relationship with the target word, resulting in a word pair consisting of the target word and the dominant word, wherein the dominant word is a root node identifier or another target word different from the target word, the root node identifier is an identifier of a root node of a dependency syntax tree, and the dependency syntax tree describes a dependency relationship between the respective target words;

and determining the dependency relationship between two words in the word pairs for the word pairs corresponding to the target words respectively.

Optionally, the performing semantic expression on the target text according to the dependency relationship between the target words includes:

for each word pair, determining a word vector corresponding to each word in the word pair and a relationship vector corresponding to the dependency relationship between two words in the word pair;

encoding the two word vectors and the relation vector corresponding to each word pair to obtain a text encoding vector of the target text, wherein the text encoding vector expresses syntax information and word order information of the target text;

and adopting the text coding vector to express semantic information of the target text.

Optionally, the performing semantic expression on the target text according to the dependency relationship between the target words includes:

semantically expressing the target text according to the dependency relationship between the target words and each dependency path, wherein each dependency path is a subpath in a dependency syntax tree, the dependency syntax tree describes the dependency relationship between the target words, and the end point of each subpath is a leaf node of the dependency syntax tree.

Optionally, the performing semantic expression on the target text according to the dependency relationship between each target word and each dependency path includes:

determining an application scene of a semantic expression result of the target text;

respectively determining the importance of each dependency path in the application scene;

and performing semantic expression on the target text according to the dependency relationship among the target words and the importance of each dependency path.

Optionally, the determining the importance of each dependency path in the application scenario respectively includes:

encoding the two word vectors and the relation vector corresponding to each word pair to obtain a text encoding vector of the target text, wherein the text encoding vector expresses syntax information and word order information of the target text;

encoding each dependency path to obtain a path encoding vector corresponding to each dependency path, wherein the path encoding vector expresses path information formed by the target words in the dependency path;

and determining the path weight of the dependency path by using the text encoding vector and the path encoding vector, wherein the path weight characterizes the importance of the dependency path in the application scene.

Optionally, the performing semantic expression on the target text according to the dependency relationship between the target words and the importance of each dependency path includes:

determining path coding vectors corresponding to all the dependent paths according to the path coding vector corresponding to each dependent path and the path weight;

and adopting the text coding vector and path coding vectors corresponding to all the dependency paths to express semantic information of the target text.

The embodiment of the present application further provides a text semantic expression apparatus, including:

the target text acquisition unit is used for acquiring a target text to be expressed;

the target word obtaining unit is used for carrying out word segmentation processing on the target text to obtain each target word;

the dependency relationship determining unit is used for carrying out dependency syntax analysis on the target text and determining the dependency relationship among target words;

and the text semantic expression unit is used for performing semantic expression on the target text according to the dependency relationship among the target words.

Optionally, the dependency relationship determining unit includes:

a word pair obtaining subunit, configured to determine a dominant word having a dependency relationship with the target word, and obtain a word pair composed of the target word and the dominant word, where the dominant word is a root node identifier or another target word different from the target word, the root node identifier is an identifier of a root node of a dependency syntax tree, and the dependency syntax tree describes a dependency relationship between the target words;

and the dependency relationship determining subunit is used for determining the dependency relationship between two words in the word pairs for the word pairs corresponding to the target words respectively.

Optionally, the text semantic expression unit includes:

the first relation vector determining subunit is used for determining a word vector corresponding to each word in each word pair and a relation vector corresponding to the dependency relationship between two words in the word pair for each word pair;

the first encoding vector obtaining subunit is configured to encode the two word vectors and the relation vector corresponding to each word pair to obtain a text encoding vector of the target text, where the text encoding vector expresses syntax information and word order information of the target text;

and the first semantic information expression subunit is used for expressing the semantic information of the target text by adopting the text coding vector.

Optionally, the text semantic expression unit is specifically configured to perform semantic expression on the target text according to a dependency relationship between target words and each dependency path, where each dependency path is each subpath in a dependency syntax tree, the dependency syntax tree describes the dependency relationship between target words, and an end point of the subpath is a leaf node of the dependency syntax tree.

Optionally, the text semantic expression unit includes:

the application scene determining subunit is used for determining an application scene of a semantic expression result of the target text;

the importance determining subunit is used for respectively determining the importance of each dependency path in the application scene;

and the text semantic expression subunit is used for performing semantic expression on the target text according to the dependency relationship among the target words and the importance of each dependency path.

Optionally, the importance determining subunit includes:

the second relation vector determining subunit is used for determining a word vector corresponding to each word in each word pair and a relation vector corresponding to the dependency relationship between two words in the word pair for each word pair;

a second encoding vector obtaining subunit, configured to encode the two word vectors and the relation vector corresponding to each word pair to obtain a text encoding vector of the target text, where the text encoding vector expresses syntax information and word order information of the target text;

the path encoding vector obtaining subunit is used for encoding each dependency path to obtain a path encoding vector corresponding to each dependency path, and the path encoding vector expresses path information formed by the target words in the dependency path;

and the path weight determining subunit is used for determining the path weight of the dependency path by using the text encoding vector and the path encoding vector, wherein the path weight represents the importance of the dependency path in the application scene.

Optionally, the text semantic expression subunit includes:

a path code vector determination subunit, configured to determine path code vectors corresponding to all the dependent paths according to the path code vector corresponding to each dependent path and the path weight;

and the second semantic information expression subunit is used for expressing the semantic information of the target text by adopting the text coding vector and the path coding vectors corresponding to all the dependency paths.

The embodiment of the present application further provides a text semantic expression apparatus, including: a processor, a memory, a system bus;

the processor and the memory are connected through the system bus;

the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any implementation of the text semantic expression method.

An embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is enabled to execute any implementation manner of the text semantic expression method.

The embodiment of the present application further provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the text semantic expression method.

According to the text semantic expression method and device provided by the embodiment of the application, after the target text to be expressed is obtained, word segmentation is performed on the target text to obtain each target word, dependency syntax analysis is then performed on the target text to determine the dependency relationships among the target words, and the target text can then be semantically expressed according to the dependency relationships among the target words. The target text is therefore no longer semantically expressed in the common one-hot manner, but according to the dependency relationships among the target words in the target text; that is, the semantic relationships among the words in the text are taken into account when the target text is semantically expressed, which improves the accuracy of the semantic expression result.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a text semantic expression method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of determining the dependency relationships among target words according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the result of dependency syntax analysis on a target text according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a dependency syntax tree and dependency paths according to an embodiment of the present application;

Fig. 5 is a first schematic flowchart of semantic expression of a target text according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of generating a text encoding vector of a target text according to an embodiment of the present application;

Fig. 7 is a second schematic flowchart of semantic expression of a target text according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of generating a path encoding vector of a target text according to an embodiment of the present application;

Fig. 9 is a schematic composition diagram of a text semantic expression apparatus according to an embodiment of the present application.

Detailed Description

In some text semantic expression methods, a one-hot method is generally adopted to semantically express a text. However, the dimension of the vocabulary in the one-hot expression mode is generally too high (there are more than 100,000 common Chinese words), which results in excessive computational complexity. The expression mode also ignores the semantic relationships between the words of the text. For example, although the words "apple" and "pear" both denote fruit, the two words are completely unrelated in the one-hot expression mode, since each is merely represented by 0 or 1. That is, the semantic association between words is not considered, which results in an inaccurate text semantic expression result.
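The limitation described above can be illustrated with a minimal sketch. The five-word vocabulary and the helper names `one_hot`, `text_vector`, and `dot` are hypothetical and chosen only for illustration; they do not come from the application.

```python
# Toy vocabulary (hypothetical); a realistic vocabulary may exceed 100,000 entries.
vocab = ["apple", "pear", "eat", "he", "coat"]

def one_hot(word, vocab):
    """Return a 0/1 vector with a single 1 at the word's vocabulary index."""
    return [1 if entry == word else 0 for entry in vocab]

def text_vector(words, vocab):
    """Return a 0/1 vector marking which vocabulary words occur in the text."""
    present = set(words)
    return [1 if entry in present else 0 for entry in vocab]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

apple = one_hot("apple", vocab)
pear = one_hot("pear", vocab)

# The two fruit words share no non-zero position, so their inner product is 0.
similarity = dot(apple, pear)

# The text vector has the same dimension as the vocabulary.
doc = text_vector(["he", "eat", "apple"], vocab)
```

Because every pair of distinct words is orthogonal in this representation, no relatedness between "apple" and "pear" can be read from the vectors.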

In order to solve the above-mentioned defects, an embodiment of the present application provides a text semantic expression method, where after a text to be expressed is obtained, a word segmentation process is performed on the text to obtain each word in the text, then dependency syntax analysis is performed on the text to determine a dependency relationship between words in the text, and then the text is semantically expressed according to the dependency relationship between words. Therefore, in the embodiment of the application, the text is not semantically expressed in a traditional one-hot manner any more, but is semantically expressed according to the dependency relationship among the words in the text, that is, the influence of the semantic relationship among the words in the text on the semantic expression result of the text is considered, so that the accuracy of the semantic expression result of the text is improved.

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

First embodiment

Referring to Fig. 1, which shows a schematic flowchart of the text semantic expression method provided in this embodiment, the method includes the following steps:

S101: Acquire a target text to be expressed.

In this embodiment, any text to be semantically expressed by using this embodiment is defined as a target text. This embodiment does not limit the language of the target text; for example, the target text may be a Chinese text, an English text, or the like. Nor does it limit the length of the target text; for example, the target text may be a sentence text or a chapter text. Nor does it limit the source of the target text; for example, the target text may be a speech recognition result, or log data collected from the service systems of a platform. Nor does it limit the type of the target text; for example, the target text may be a sentence from an everyday conversation, or part of a speech manuscript, a magazine article, a literary work, and the like.

It can be understood that the sentence text refers to a sentence and is a set of words, the chapter text refers to a set of a series of sentences, and after the sentence text or the chapter text is obtained as the target text to be expressed, the sentence text or the chapter text can be expressed semantically according to the following steps.

S102: Perform word segmentation on the target text to obtain each target word.

In this embodiment, after the target text to be expressed is obtained in step S101, in order to implement more accurate semantic expression on the target text, word segmentation processing may be performed on the target text to obtain each word included in the target text, where each word obtained by word segmentation is defined as a target word.

When the target text is a sentence text, any existing or future word segmentation method may be used to segment the target text to obtain each word in the target text as a target word.

Alternatively, if the target text is a chapter text, the target text first needs to be split into sentences to obtain each sentence text of the target text, and each sentence text is then segmented by using a word segmentation method to obtain each word in the target text as a target word.
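The two cases above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, sentence boundaries are detected with a simple punctuation rule, and the word segmenter is a stand-in that extracts runs of letters and digits, which is adequate for space-delimited English but not for Chinese, where a dedicated segmentation tool would be substituted.

```python
import re

# Sentence-ending punctuation, Western and Chinese (assumed rule, for illustration).
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]?")

def split_sentences(chapter_text):
    """Split a chapter text into sentence texts at sentence-ending punctuation."""
    pieces = _SENTENCE.findall(chapter_text)
    return [piece.strip() for piece in pieces if piece.strip()]

def segment_sentence(sentence_text):
    """Stand-in word segmenter: extract runs of word characters.

    Any existing segmentation method could be substituted here.
    """
    return re.findall(r"\w+", sentence_text)

def target_words_of(target_text):
    """Split into sentences first, then segment each sentence into words."""
    words = []
    for sentence in split_sentences(target_text):
        words.extend(segment_sentence(sentence))
    return words

sentences = split_sentences("He calls Tom. Tom takes the coat!")
target_words = target_words_of("He calls Tom. Tom takes the coat!")
```

A sentence text passes through the same function unchanged, since it simply yields a single sentence after splitting.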

S103: Perform dependency syntax analysis on the target text to determine the dependency relationships among the target words.

In this embodiment, after the target words corresponding to the target text are obtained in step S102, a dependency syntax analysis method may be used to perform dependency syntax analysis on the target text to determine the dependency relationships among the target words. A dependency relationship between target words is a semantic association between them. For example, among the six target words "he, call, Tom, go, take, coat" of the target text "he calls Tom to go take the coat", the semantic association between "he" and "call" is a subject-predicate relationship.

It should be noted that, for a specific implementation process of performing dependency syntax analysis on the target text to determine the dependency relationship between the target words, reference may be made to the following description of the second embodiment.

S104: Semantically express the target text according to the dependency relationships among the target words.

In this embodiment, after the dependency relationships among the target words in the target text are determined in step S103, the target text may be semantically expressed according to those dependency relationships, such as a subject-verb relationship, a verb-object relationship, and the like.

Specifically, in the process of semantically expressing the target text, a dependency syntax tree corresponding to the target text may be first constructed according to a result of dependency syntax analysis performed on the target text, then a text coding vector corresponding to the target text may be determined according to semantic information of each target word in the dependency syntax tree and a dependency relationship between each target word and other target words, and then, the text coding vector corresponding to the target text may be used to semantically express the target text. It should be noted that, for a specific implementation process of semantically expressing a target text according to the dependency relationship between target words, reference may be made to the related description of the second embodiment below.

Furthermore, once the dependency syntax tree corresponding to the target text is constructed, a plurality of dependency paths corresponding to the target text can be obtained from the tree structure of the dependency syntax tree, and each dependency path plays an important role in the semantic expression of the target text. After the text encoding vector corresponding to the target text is determined, a path encoding vector corresponding to each dependency path in the target text can be further determined on the basis of the text encoding vector; the path encoding vector reflects the parent-child relationships of the target words on the corresponding dependency path. The target text can then be semantically expressed more accurately according to the dependency relationships among the target words and each dependency path in the target text, that is, according to the text encoding vector corresponding to the target text and the path encoding vector corresponding to each dependency path in the target text.

Therefore, in order to improve the accuracy of semantic expression of the target text, after determining the dependency relationship between each target word, the semantic expression of the target text may be further implemented by combining each dependency path in the target text, and for a specific implementation process of semantic expression of the target text by combining the dependency relationship between each target word and each dependency path in the target text, reference may be made to the related description of the third embodiment.

In summary, according to the text semantic expression method provided in this embodiment, after the target text to be expressed is obtained, word segmentation processing is performed on the target text to obtain each target word, dependency syntax analysis is performed on the target text to determine a dependency relationship between each target word, and then, semantic expression can be performed on the target text according to the dependency relationship between each target word. Therefore, after the target text to be expressed is obtained, the semantic expression is performed on the target text in a common one-hot manner no longer, but according to the dependency relationship among the target words in the target text, that is, the semantic relationship among the words in the text is considered when the semantic expression is performed on the target text, so that the accuracy of the semantic expression result is improved.

Second embodiment

The present embodiment first describes a specific implementation of the step S103 "determining the dependency relationship between the target words" in the first embodiment.

Referring to Fig. 2, which shows a schematic flowchart of determining the dependency relationships among target words provided by this embodiment, the flow includes the following steps:

S201: Determine a dominant word having a dependency relationship with the target word to obtain a word pair consisting of the target word and the dominant word, wherein the dominant word is a root node identifier or another target word different from the target word, the root node identifier is an identifier of the root node of a dependency syntax tree, and the dependency syntax tree describes the dependency relationships among the target words.

In this embodiment, after the target text is segmented by a word segmentation method to obtain the target words corresponding to the target text, a dependency syntax analysis method may be used to perform dependency syntax analysis on the target text. For example, the Language Technology Platform (LTP) may be used to perform dependency syntax analysis on the target text to obtain an analysis result; the dominant word having a dependency relationship with each target word in the target text can then be determined from the analysis result, and each target word and its corresponding dominant word can be combined into a word pair.

Wherein, for each target word in the target text, the dominant word having a dependency relationship with the target word is either an identification of a root node in the dependency syntax tree or another target word in the dependency syntax tree different from the target word.

For example, continuing the example above, word segmentation of the target text "he calls Tom to go take the coat" yields the six target words "he, call, Tom, go, take, coat", and dependency syntax analysis of the target text with LTP yields the analysis result shown in Fig. 3. The bottom box in Fig. 3 shows the result of the dependency syntax analysis: each target word has an incoming arrow pointing to it, and the word at the other end of the arrow is the dominant word having a dependency relationship with that target word. Each target word and its corresponding dominant word form a word pair, and semantically each target word is governed by its dominant word.

As shown in Fig. 3, the dominant word of the target word "he" is "call", the dependency relationship between the two is "subject-verb (SBV)", and the two form a word pair; the dominant word of the target word "call" is the root node identifier "ROOT" of the dependency syntax tree, the dependency relationship between the two is the "head relationship (HED)", and the two form a word pair that represents the core of the whole target text sentence; the dominant word of the target word "Tom" is "call", the dependency relationship between the two is "double (DBL)", that is, a pivot relationship, and the two form a word pair; the dominant word of the target word "go" is "take", the dependency relationship between the two is "adverbial (ADV)", and the two form a word pair; the dominant word of the target word "take" is "call", the dependency relationship between the two is "verb-object (VOB)", and the two form a word pair; the dominant word of the target word "coat" is "take", the dependency relationship between the two is also "VOB", and the two form a word pair.

Meanwhile, based on the result of the dependency syntax analysis of the target text "he calls Tom to go take the coat" shown in Fig. 3, the dominant word in each word pair may be used as the parent node of the corresponding target word, and the dependency syntax analysis result may be expanded into tree form to construct the dependency syntax tree corresponding to the target text, as shown in the left diagram of Fig. 4. The dependency syntax tree describes the dependency relationships among the target words in the target text. Starting from the target word in each leaf node of the dependency syntax tree and looking up the parent nodes in turn yields one dependency path for that target word. Taking the leaf node "go" as an example, looking up its parent nodes in turn gives "take" -> "call" -> "ROOT", so the dependency path corresponding to the target word "go" is "ROOT-call-take-go", shown as the third path in the right diagram of Fig. 4. The dependency paths for the target words in the other three leaf nodes can be generated in the same way, giving the four dependency paths shown in the right diagram of Fig. 4; that is, the dependency paths in the right diagram can be generated from the dependency syntax tree in the left diagram.
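The construction of dependency paths from word pairs can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the word pairs are those listed for Fig. 3, the function name `dependency_paths` is hypothetical, and words are keyed by their string form, which assumes that no word occurs twice in the sentence (token positions would be used otherwise).

```python
ROOT = "ROOT"

# (target word, dominant word, dependency relationship), as listed for Fig. 3.
word_pairs = [
    ("he", "call", "SBV"),
    ("call", ROOT, "HED"),
    ("Tom", "call", "DBL"),
    ("go", "take", "ADV"),
    ("take", "call", "VOB"),
    ("coat", "take", "VOB"),
]

def dependency_paths(pairs):
    """Return one root-to-leaf path per leaf node of the dependency syntax tree."""
    # The dominant word of each target word is its parent node.
    parent = {word: head for word, head, _ in pairs}
    # A leaf node is a target word that is nobody's dominant word.
    heads = set(parent.values())
    leaves = [word for word, _, _ in pairs if word not in heads]
    paths = []
    for leaf in leaves:
        path = [leaf]
        # Look up parent nodes in turn until the root node identifier is reached.
        while path[-1] != ROOT:
            path.append(parent[path[-1]])
        paths.append(list(reversed(path)))
    return paths

paths = dependency_paths(word_pairs)
for path in paths:
    print("-".join(path))
```

With these word pairs the leaf nodes are "he", "Tom", "go", and "coat", so four paths are produced, each ending at a leaf node.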

S202: For the word pair corresponding to each target word, determine the dependency relationship between the two words in the word pair.

In this embodiment, after the word pair consisting of each target word and its corresponding dominant word in the target text is obtained in step S201, the dependency relationship between the target word and the dominant word in each word pair may be determined. For example, continuing the example above, for the target word "he" and the dominant word "call" in the target text "he calls Tom to go take the coat", the dependency relationship between "he" and "call" in the word pair may be determined to be "SBV"; similarly, the dependency relationship between "Tom" and "call" in the word pair may be determined to be "DBL".

After the dependency relationships among the target words are determined in steps S201-S202, this embodiment next describes, through steps S501-S503, a specific implementation of step S104 "semantically express the target text according to the dependency relationships among the target words" in the first embodiment.

In this embodiment, a text encoding vector corresponding to a target text is determined according to semantic information of each target word in the dependency syntax tree and a dependency relationship between each target word and other target words, and then semantic expression of the target text can be realized by using the text encoding vector.

Referring to fig. 5, it shows a schematic flowchart of semantic expression of target text according to this embodiment, where the flowchart includes the following steps:

S501: For each word pair, determine a word vector corresponding to each word in the word pair and a relationship vector corresponding to the dependency relationship between the two words in the word pair.

In this embodiment, after each word pair included in the target text is obtained in step S201, a text coding vector corresponding to the target text may be further generated according to each word pair, so as to implement semantic expression on the target text, and specifically, a pre-constructed semantic expression model may be used to generate the text coding vector.

The pre-constructed semantic expression model may be a coding model based on a Neural Network, such as a coding model based on a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN).

Specifically, in the process of semantically expressing the target text, for each word pair, a word vector corresponding to each word in the word pair and a relationship vector corresponding to the dependency relationship between the two words in the word pair may first be determined. For example, the word vector of the target word in the word pair may be denoted x, the word vector of the dominant word may be denoted y, and the dependency relationship between the target word and the dominant word may be represented by a relationship vector r. The word vectors x and y may be obtained by applying a word vectorization method, or a related model for generating word vectors, to the target word and the dominant word; for example, open-source tools such as Word2vec or GloVe (Global Vectors for Word Representation) may be used. The relationship vector r between the target word and the dominant word may be obtained directly by random initialization.

S502: Encode the two word vectors and the relationship vector corresponding to each word pair to obtain a text encoding vector of the target text, wherein the text encoding vector expresses syntax information and word sequence information of the target text.

In this embodiment, after determining the word vector x of the target word in each word pair, the word vector y of the dominant word, and the relationship vector r representing the dependency relationship therebetween through step S501, the three may be spliced into a vector group to represent the corresponding word pair, for example, the three may be spliced into a ternary vector group p = [ x, r, y ] to represent a word pair of the target text.
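The splicing of x, r and y into one vector group can be sketched as below. This is only an illustration under stated assumptions: "splicing" is read as vector concatenation, the dimensions `WORD_DIM` and `REL_DIM` are arbitrary, the word vectors are random stand-ins for vectors that would normally come from Word2vec or GloVe, and the helper name `triple` is hypothetical. Only the two word pairs whose relation labels are stated in the description are used.

```python
import numpy as np

rng = np.random.default_rng(0)
WORD_DIM, REL_DIM = 4, 3  # hypothetical sizes, chosen only for illustration

# Stand-in word vectors; in practice these would come from Word2vec or GloVe.
word_vec = {w: rng.normal(size=WORD_DIM) for w in ["he", "Tom", "call"]}
# Relation vectors are randomly initialized, one per dependency label.
rel_vec = {r: rng.normal(size=REL_DIM) for r in ["SBV", "DBL"]}

def triple(target, relation, head):
    """Splice x (target word), r (relation) and y (dominant word) into p."""
    return np.concatenate([word_vec[target], rel_vec[relation], word_vec[head]])

# The two word pairs whose relations are stated in the description.
pairs = [("he", "SBV", "call"), ("Tom", "DBL", "call")]
I = np.stack([triple(*p) for p in pairs])  # one row per word pair
print(I.shape)  # (2, 11)
```

Stacking the rows in word order keeps the word sequence information available to whichever encoder consumes `I`.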

For example, as shown in FIG. 6, the dependency relationship between the target word "he" and the dominant word "call" is SBV. After word vectorization is performed on the target word "he" and the dominant word "call" using the Word2vec method, the word vector corresponding to "he" is x₁ and the word vector corresponding to "call" is y₁. A relationship vector r₁ representing the dependency relationship between the two is then obtained by random initialization, and the three vectors can be spliced into a ternary vector group p₁ = [x₁, r₁, y₁] to represent the word pair consisting of the target word "he" and the dominant word "call". Similarly, the ternary vector groups p₂, p₃, p₄, p₅ and p₆ may be used to represent the word pairs "call" and "ROOT", "Tom" and "call", "go" and "take", "take" and "call", and "coat" and "take", respectively, as shown in FIG. 6.

It is understood that the target text I can be expressed as I = [p₁, p₂, …, p_i, …, p_N], where N represents the number of target words in the target text I and p_i represents the ternary vector group corresponding to the i-th word pair in the target text. For example, as shown in fig. 6, the target text "He called Tom to go take a coat" can be expressed as I = [p₁, p₂, p₃, p₄, p₅, p₆], where the target text contains the six target words "he", "call", "Tom", "go", "take" and "coat", and p₁, p₂, p₃, p₄, p₅ and p₆ respectively represent the ternary vector groups corresponding to the six word pairs to which the six target words belong.

Furthermore, the ternary vector group corresponding to each word pair in the target text can be encoded to obtain a text encoding vector corresponding to the target text, and the text encoding vector expresses syntax information and word sequence information of the target text. The syntax information of the target text refers to the syntactic relations among the target words composing the target text, and the word sequence information refers to the ordering of the target words in the target text.

Specifically, the ternary vector group of each word pair in the target text may be input into a pre-constructed semantic expression model for encoding, for example, into a pre-constructed encoding model based on a deep neural network (e.g., CNN or RNN), so as to obtain the encoding vector h output by the final hidden layer of the deep neural network; the encoding vector h may then be used as the text encoding vector corresponding to the target text to semantically express the target text. For example, as shown in FIG. 6, for the target text "He called Tom to go take a coat", the ternary vector groups p₁, p₂, p₃, p₄, p₅ and p₆ corresponding to the six word pairs contained in the target text may be input into the pre-constructed deep neural network encoding model for encoding, so as to obtain the corresponding text encoding vector h.
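As a rough illustration of the encoding step, the sketch below uses a single convolution layer with max-over-time pooling, one of the two network families the description mentions. Everything specific here is an assumption made for the example: the function name `cnn_encode`, the filter width, the hidden size, the tanh activation and the untrained random weights. A real model would be trained on the downstream task.

```python
import numpy as np

def cnn_encode(I, W, b):
    """Encode a sequence of triple vectors into one text vector h.

    I: (N, D) matrix, one triple vector per word pair, in word order.
    W: (K, D, H) filters of width K; b: (H,) bias.
    """
    N, D = I.shape
    K, _, H = W.shape
    # Pad at the end so that every position yields a full window.
    padded = np.vstack([I, np.zeros((K - 1, D))])
    windows = np.stack([padded[i:i + K] for i in range(N)])  # (N, K, D)
    # All windows are processed in one batched operation.
    feats = np.tanh(np.einsum("nkd,kdh->nh", windows, W) + b)  # (N, H)
    return feats.max(axis=0)  # max-over-time pooling gives h of size H

rng = np.random.default_rng(0)
N, D, K, H = 6, 11, 2, 8         # six word pairs; other sizes are arbitrary
I = rng.normal(size=(N, D))      # stand-in for [p1, ..., p6]
W = rng.normal(size=(K, D, H)) * 0.1
b = np.zeros(H)
h = cnn_encode(I, W, b)
print(h.shape)  # (8,)
```

Because every window is computed independently, all six vector groups are consumed at once rather than by walking the syntax tree node by node.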

S503: Use the text encoding vector to express the semantic information of the target text.

In this embodiment, after the text encoding vector h of the target text is obtained in step S502, the text encoding vector h may be used to express the semantic information of the target text. As shown in fig. 6, the text encoding vector h may be used to express the semantic information of the target text "He called Tom to go take a coat". In the process of obtaining the text encoding vector h, the semantic information of each target word in the target text, the syntax information of the target text (the syntactic relations among the target words) and the word sequence information (the ordering of the target words in the target text) have already been encoded into the text encoding vector, so the obtained text encoding vector can be used to semantically express the target text.

It can be understood that, if the target text is a chapter text, the sequence information of each sentence in the chapter text can be encoded into the text encoding vector h corresponding to the target text. In addition, in the encoding process, the ternary vector group corresponding to each word pair in the target text is simultaneously input into the semantic expression model for encoding, so that the parallel operation is realized, the problem that the parallel operation cannot be realized because the operation is directly carried out on the syntax tree is solved, the encoding time can be effectively saved, and the encoding efficiency is improved.

In summary, in this embodiment, a text coding vector corresponding to a target text is determined according to semantic information of each target word in a dependency syntax tree corresponding to the target text and dependency relationships between each target word and other target words, and then the text coding vector is used to perform semantic expression on the target text, so that the semantic expression on the target text is realized on the basis of fully considering the semantic relationships between each target word in the target text, and the accuracy of a semantic expression result of the target text is further improved.

Third embodiment

This embodiment will describe another specific implementation of the step S104 "semantically expressing a target text according to the dependency relationship between target words" in the first embodiment.

In this embodiment, not only the dependency relationship between the target words of the target text may be adopted to semantically express the target text, but also each dependency path of the target text may be further utilized to semantically express the target text by combining the two, where each dependency path is each sub-path in the dependency syntax tree corresponding to the target text, and as shown in the right diagram of fig. 4, the end point of each sub-path is each leaf node of the dependency syntax tree.

Referring to fig. 7, a schematic flow chart of semantically expressing a target text provided in this embodiment is shown, where the flow chart includes the following steps:

S701: Determine the application scenario of the semantic expression result of the target text.

In this embodiment, in order to implement semantic expression on a target text, first, an application scenario of a semantic expression result of the target text needs to be determined, where the application scenario of the semantic expression result of the target text may be various application scenarios in natural language processing fields such as emotion classification of sentences, sentence similarity retrieval, category classification, and the like.

S702: Determine the importance of each dependency path in the application scenario.

In this embodiment, after the application scenario of the semantic expression result of the target text is determined in step S701, each dependency path of the dependency syntax tree corresponding to the target text may be analyzed according to the application scenario, so as to determine the importance of each dependency path in the application scenario. Regarding the importance of each dependency path, the weight occupied by each dependency path in the application scenario may be used for representing the importance of each dependency path in the semantic expression result of the target text, for example, the higher the importance, the larger the corresponding weight value, or vice versa, and further, the importance of each dependency path in the semantic expression result of the target text may be represented by using the normalization result of the weight value of each dependency path.

In an implementation manner of this embodiment, step S702 may specifically include steps A-D:

Step A: For each word pair, determine a word vector corresponding to each word in the word pair and a relationship vector corresponding to the dependency relationship between the two words in the word pair.

Step B: Encode the two word vectors and the relationship vector corresponding to each word pair to obtain a text encoding vector of the target text, wherein the text encoding vector expresses syntax information and word sequence information of the target text.

It should be noted that steps A-B are the same as steps S501-S502 of the implementation manner of semantically expressing a target text in the second embodiment; for related details, reference is made to the description of steps S501-S502 above, which is not repeated here.

Step C: Encode each dependency path to obtain a path encoding vector corresponding to each dependency path, wherein the path encoding vector expresses path information formed by the target words in the corresponding dependency path.

In this embodiment, encoding may be performed according to each dependency path in the dependency syntax tree corresponding to the target text to obtain a path encoding vector of each dependency path, where the path encoding vector of each dependency path expresses path information formed by each target word in each dependency path, that is, expresses parent-child relationship information of each target word on each dependency path on the corresponding dependency path.

Specifically, the word vector corresponding to each target word in each dependency path may first be obtained through step A, and each dependency path may then be characterized by the set of word vectors corresponding to all target words on that path. As shown in the right diagram of fig. 4, four dependency paths may be generated for the target text "He called Tom to go take a coat", namely the 1st "ROOT-call-he", the 2nd "ROOT-call-Tom", the 3rd "ROOT-call-take-go", and the 4th "ROOT-call-take-coat". For example, the 1st dependency path "ROOT-call-he" may be characterized by the word vectors corresponding to "call" and "he"; similarly, the word vector sets representing the other three paths may be obtained.

Further, the word vector set corresponding to each dependency path may be input into a pre-constructed encoding model for encoding, for example, into a pre-constructed encoding model based on a deep neural network (e.g., CNN or RNN), so as to obtain the encoding vector output by the final hidden layer of the deep neural network; this encoding vector may then be used as the path encoding vector corresponding to the dependency path for semantically expressing the target text. As shown in fig. 8, for the target text "He called Tom to go take a coat", the word vector sets corresponding to the four dependency paths contained in the target text may be input into the pre-constructed deep neural network encoding model for encoding, so as to obtain the path encoding vectors S₁, S₂, S₃ and S₄ corresponding to the four dependency paths.
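A path encoder of the RNN kind mentioned above might look like the following sketch. It is illustrative only: the function name `rnn_encode`, the plain tanh recurrence, the dimensions and the random untrained weights are all assumptions, and the word vectors are random stand-ins. Following the description, each path is represented by the vectors of its target words, without the virtual ROOT node.

```python
import numpy as np

def rnn_encode(vectors, W_x, W_h, b):
    """Run a simple tanh RNN over a sequence and return the last hidden state."""
    state = np.zeros(W_h.shape[0])
    for x in vectors:
        state = np.tanh(W_x @ x + W_h @ state + b)
    return state

rng = np.random.default_rng(0)
D, H = 4, 5  # hypothetical word-vector and hidden sizes
vocab = ["he", "call", "Tom", "go", "take", "coat"]
word_vec = {w: rng.normal(size=D) for w in vocab}  # stand-in word vectors

# The four dependency paths of the example, without the virtual ROOT node.
paths = [["call", "he"], ["call", "Tom"],
         ["call", "take", "go"], ["call", "take", "coat"]]

W_x = rng.normal(size=(H, D)) * 0.1
W_h = rng.normal(size=(H, H)) * 0.1
b = np.zeros(H)
S = np.stack([rnn_encode([word_vec[w] for w in p], W_x, W_h, b) for p in paths])
print(S.shape)  # (4, 5)
```

Feeding the words from the root side toward the leaf lets the final hidden state reflect the parent-child order along the path.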

Step D: Determine the path weight of each dependency path by using the text encoding vector of the target text and the path encoding vector of each dependency path, wherein the path weight represents the importance of the corresponding dependency path in the application scenario.

It should be noted that although each dependency path in the target text has an important role in semantic expression, when the semantic expression result of the target text is in a different application scenario, only a part of the dependency paths may have a main role in semantic expression, for example, when the semantic expression result of the target text is in an application scenario of "sentence emotion classification", the dependency path "i happy" in the target text is more important in semantic expression than the dependency path "you".

Based on this, in the embodiment, the text encoding vector of the target text and the path encoding vector corresponding to each dependency path can be utilized to calculate the path weight of each dependency path, wherein the path weight characterizes the importance of the dependency path in the application scene.

The specific formula for calculating the path weight of each dependent path is as follows:

wherein i denotes the i-th dependency path in the dependency syntax tree corresponding to the target text; v_i denotes the path weight of the i-th dependency path; h denotes the text encoding vector of the target text; and S_i denotes the path encoding vector of the i-th dependency path.

Further, after v_i is normalized, the normalized path weight is calculated as follows:

wherein i denotes the i-th dependency path in the dependency syntax tree corresponding to the target text; v_i denotes the path weight of the i-th dependency path; v_k denotes the path weight of the k-th dependency path; M denotes the number of dependency paths in the dependency syntax tree; and a_i denotes the normalized path weight obtained by normalizing v_i. It should be noted that a_i represents the weight of the current dependency path relative to all dependency paths: the larger the value of a_i, the more important the current dependency path is in the current application scenario relative to the other dependency paths.
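The exact formulas for v_i and a_i are not reproduced in this text, so the following sketch substitutes a common attention-style choice purely for illustration. It assumes that v_i is the dot product of h and S_i (which in turn assumes both vectors have the same dimension) and that normalization is a softmax over the M paths; neither assumption is confirmed by the source. The function name `path_weights` is hypothetical.

```python
import numpy as np

def path_weights(h, S):
    """Return raw scores v and normalized weights a for M dependency paths.

    h: (H,) text encoding vector; S: (M, H) path encoding vectors.
    ASSUMPTION: v_i = h . S_i (dot product) with softmax normalization;
    the source's own formulas are not shown in the text.
    """
    v = S @ h
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    a = e / e.sum()
    return v, a

rng = np.random.default_rng(0)
h = rng.normal(size=5)        # stand-in text encoding vector
S = rng.normal(size=(4, 5))   # stand-in path encoding vectors S_1..S_4
v, a = path_weights(h, S)
print(a.sum())
```

Whatever scoring function is used, the normalized weights are positive and sum to one, so a larger a_i marks a path as more important relative to the others.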

S703: Semantically express the target text according to the dependency relationship among the target words and the importance of each dependency path.

In this embodiment, after the importance of each dependency path in the application scene is determined through step S702, the target text may be semantically expressed further according to the dependency relationship between the target words in the target text and the importance of each dependency path in the application scene.

In an implementation manner of this embodiment, S703 may specifically include steps E-F:

Step E: Determine the path encoding vector corresponding to all the dependency paths according to the path encoding vector and the path weight corresponding to each dependency path.

In this implementation, after the path encoding vector and the path weight of each dependency path in the target text are determined in step S702, the path encoding vectors of the dependency paths may be weighted and summed to determine the path encoding vector S corresponding to all dependency paths of the target text. The path encoding vector S of all dependency paths is calculated as follows:

$$S = \sum_{i=1}^{M} a_i S_i$$

wherein M denotes the number of dependency paths in the dependency syntax tree corresponding to the target text; i denotes the i-th dependency path in the dependency syntax tree; S_i denotes the path encoding vector of the i-th dependency path; and a_i denotes the normalized path weight of the i-th dependency path.

For example, as shown in FIG. 8, for the target text "He called Tom to go take a coat", after the path encoding vectors S₁, S₂, S₃ and S₄ corresponding to the four dependency paths are determined, the normalized path weights of the four dependency paths can be calculated by using step D, and the path encoding vectors of the four dependency paths are then weighted and summed to obtain the path encoding vector S corresponding to all the dependency paths.
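The weighted summation over four paths can be illustrated with a few lines of NumPy. The weights and two-dimensional path vectors below are made-up values chosen so the arithmetic is easy to follow, and the function name `combine_paths` is hypothetical.

```python
import numpy as np

def combine_paths(a, S):
    """Weighted sum over the M dependency paths: sum_i a_i * S_i."""
    return a @ S  # (M,) @ (M, H) -> (H,)

# Hypothetical normalized weights and path vectors for four paths.
a = np.array([0.1, 0.2, 0.3, 0.4])
S_paths = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0],
                    [2.0, 0.0]])
S_all = combine_paths(a, S_paths)
print(S_all)
```

With these values the first component is 0.1 + 0.3 + 0.8 and the second is 0.2 + 0.3, so the path with the largest weight contributes most to the combined vector.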

Step F: Use the text encoding vector and the path encoding vector corresponding to all the dependency paths to express the semantic information of the target text.

In the implementation manner, after the path coding vectors S corresponding to all the dependency paths of the target text are determined in step E, the semantic expression of the target text can be realized by combining the text coding vector h of the target text.

Specifically, the text encoding vector h of the target text and the path encoding vector S corresponding to all the dependency paths of the target text may be spliced together to achieve semantic expression of the target text. Furthermore, the semantic expression result can be used in application scenarios of natural language processing such as sentence emotion classification, category classification, sentence similarity retrieval and the like. Taking the sentence emotion classification scene as an example, the semantic expression result may be used as input data of a Support Vector Machine (SVM) classifier or a Multi-Layer Perceptron (MLP) to train a sentence emotion classification model.
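The splicing of h and S, followed by a downstream classifier, could be sketched as follows. This is an assumption-laden illustration: the vectors are random stand-ins for encoder outputs, the perceptron has one hidden layer with untrained random weights, and the two output classes, the layer sizes and the function names `text_representation` and `mlp_forward` are hypothetical.

```python
import numpy as np

def text_representation(h, S):
    """Splice the text encoding vector h and the combined path vector S."""
    return np.concatenate([h, S])

def mlp_forward(z, W1, b1, W2, b2):
    """One-hidden-layer perceptron returning class probabilities."""
    hidden = np.maximum(0.0, W1 @ z + b1)  # ReLU hidden layer
    logits = W2 @ hidden + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
h, S = rng.normal(size=8), rng.normal(size=5)  # stand-in encoder outputs
z = text_representation(h, S)

# Untrained, randomly initialized weights; two hypothetical emotion classes.
W1, b1 = rng.normal(size=(16, z.size)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(2, 16)) * 0.1, np.zeros(2)
probs = mlp_forward(z, W1, b1, W2, b2)
print(z.shape, probs)
```

In a trained system the classifier weights, and typically the encoders as well, would be fitted on labeled sentences from the chosen application scenario.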

In summary, the embodiment adopts a manner of combining the dependency relationship between each target word in the target text and each dependency path, that is, not only the text coding vector corresponding to the target text is determined according to the semantic information of each target word in the dependency syntax tree corresponding to the target text and the dependency relationship between each target word and other target words, but also each dependency path is coded to obtain the path coding vector corresponding to each dependency path, and the target text is semantically expressed by combining the two, so that the accuracy of the semantic expression result is further improved.

Fourth embodiment

In this embodiment, a text semantic expression apparatus will be described, and for related contents, please refer to the above method embodiment.

Referring to fig. 9, a schematic composition diagram of a text semantic expression apparatus provided in this embodiment is shown, where the apparatus 900 includes:

a target text acquiring unit 901 configured to acquire a target text to be expressed;

a target word obtaining unit 902, configured to perform word segmentation processing on the target text to obtain each target word;

a dependency relationship determining unit 903, configured to perform dependency syntax analysis on the target text, and determine a dependency relationship between target words;

and a text semantic expression unit 904, configured to perform semantic expression on the target text according to the dependency relationship between the target words.

In one implementation manner of this embodiment, the dependency relationship determination unit 903 includes:

In an implementation manner of this embodiment, the text semantic expression unit 904 includes:

In an implementation manner of this embodiment, the text semantic expression unit 904 is specifically configured to perform semantic expression on the target text according to a dependency relationship between target words and each dependency path, where each dependency path is each sub-path in a dependency syntax tree, the dependency syntax tree describes a dependency relationship between target words, and an end point of the sub-path is a leaf node of the dependency syntax tree.

an application scene determining subunit, configured to determine an application scene of a semantic expression result of the target text;

In an implementation manner of this embodiment, the importance determining subunit includes:

the path coding vector obtaining subunit is configured to code each dependency path to obtain a path coding vector corresponding to each dependency path, where the path coding vector expresses path information formed by each target word in the dependency path;

and the path weight determining subunit is used for determining the path weight of the dependent path by using the text encoding vector and the path encoding vector, wherein the path weight represents the importance of the dependent path under the application scene.

In an implementation manner of this embodiment, the text semantic expression subunit includes:

Further, an embodiment of the present application further provides a text semantic expression apparatus, including: a processor, a memory, a system bus;

the processor and the memory are connected through the system bus;

the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any implementation method of the text semantic expression method.

Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a terminal device, the terminal device is caused to execute any implementation method of the text semantic expression method.

Further, an embodiment of the present application further provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation method of the text semantic expression method.

As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A text semantic expression method is characterized by comprising the following steps:

acquiring a target text to be expressed;

performing semantic expression on the target text according to the dependency relationship among the target words and the importance of each dependency path; and the importance is determined according to the text coding vector of the target text and the path coding vector of each dependent path.

2. The method of claim 1, wherein determining dependencies between respective target terms comprises:

and determining the dependency relationship between two words in the word pairs for the word pairs respectively corresponding to the target words.

3. The method according to claim 2, wherein the semantically expressing the target text according to the dependency relationship between the target words comprises:

4. The method of claim 1, wherein semantically expressing the target text according to dependencies between the target words comprises:

and semantically expressing the target text according to the dependency relationship among the target words and each dependency path, wherein each dependency path is each sub-path in a dependency syntax tree, the dependency syntax tree describes the dependency relationship among the target words, and the end point of each sub-path is a leaf node of the dependency syntax tree.

5. The method according to claim 4, wherein the semantically expressing the target text according to the dependency relationship between the respective target words and each dependency path comprises:

6. The method of claim 5, wherein the separately determining the importance of each dependency path in the application scenario comprises:

and determining a path weight of the dependent path by using the text encoding vector and the path encoding vector, wherein the path weight characterizes the importance of the dependent path in the application scene.

7. The method according to claim 6, wherein the semantically expressing the target text according to the dependency relationship between the respective target words and the importance of each dependency path comprises:

determining path code vectors corresponding to all the dependent paths according to the path code vector corresponding to each dependent path and the path weight;

8. A text semantic expression apparatus, comprising:

the text semantic expression unit is used for performing semantic expression on the target text according to the dependency relationship among the target words and the importance of each dependency path; and the importance is determined according to the text coding vector of the target text and the path coding vector of each dependency path.

9. The apparatus according to claim 8, wherein the dependency determination unit comprises:

10. The apparatus of claim 9, wherein the text semantic expression unit comprises:

the first relation vector determining subunit is used for determining, for each word pair, a word vector corresponding to each word in the word pair and a relation vector corresponding to the dependency relationship between two words in the word pair;

11. The apparatus according to claim 8, wherein the text semantic expression unit is specifically configured to semantically express the target text according to a dependency relationship between target words and each dependency path, where each dependency path is each sub-path in a dependency syntax tree, the dependency syntax tree describes a dependency relationship between target words, and an end point of the sub-path is a leaf node of the dependency syntax tree.

12. The apparatus according to claim 11, wherein the text semantic expression unit comprises:

13. The apparatus of claim 12, wherein the importance determination subunit comprises:

a second relation vector determining subunit, configured to determine, for each word pair, a word vector corresponding to each word in the word pair and a relation vector corresponding to a dependency relationship between two words in the word pair;

14. The apparatus of claim 13, wherein the text semantic expression subunit comprises:

15. A text semantic expression apparatus, comprising: a processor, a memory, a system bus;

the processor and the memory are connected through the system bus;

the memory is to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-7.

16. A computer-readable storage medium having stored therein instructions that, when executed on a terminal device, cause the terminal device to perform the method of any one of claims 1-7.

17. A computer program product, characterized in that the computer program product, when run on a terminal device, causes the terminal device to perform the method of any of claims 1-7.