CN113254573A - Text abstract generation method and device, electronic equipment and readable storage medium - Google Patents

Text abstract generation method and device, electronic equipment and readable storage medium Download PDF

Info

Publication number
CN113254573A
Authority
CN
China
Prior art keywords
word
vector
text
target text
abstract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010089026.3A
Other languages
Chinese (zh)
Inventor
徐海洋
韩堃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010089026.3A priority Critical patent/CN113254573A/en
Publication of CN113254573A publication Critical patent/CN113254573A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a text abstract generation method, a text abstract generation apparatus, an electronic device, and a readable storage medium. A document structure diagram characterizing the relationships between the words and sentences in a target text and semantic vectors of the words in the target text are obtained; the semantic vector of each word and the document structure diagram are input into a graph network in an abstract generation model, which outputs the structure vector of each word according to the relationships among the words; a target word vector of each word is determined from the semantic vector and the structure vector of the word; and the abstract of the target text is determined from the target word vector of each word. That is, the embodiment of the invention obtains the structure vector of each word by using the graph network and combines the semantic vectors and structure vectors of the words to obtain the abstract of the target text, so that the accuracy and conciseness of the text abstract are improved.

Description

Text abstract generation method and device, electronic equipment and readable storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a text abstract generating method and device, electronic equipment and a readable storage medium.
Background
With the development of the internet, the amount of data people receive has grown explosively, so how to quickly and automatically extract a brief summary containing the key information from massive data is a problem that urgently needs to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a text abstract generation method and apparatus, an electronic device, and a readable storage medium, so as to improve the accuracy and conciseness of the text abstract.
In a first aspect, an embodiment of the present invention provides a text summary generating method, where the method includes:
acquiring a target text;
determining a document structure diagram of the target text, wherein the document structure diagram is used for representing the relationship between words and sentences in the target text;
determining semantic vectors of words in the target text;
inputting the semantic vector of each word and the document structure diagram into a graph network in an abstract generation model so as to output the structure vector of each word according to the relation between the words;
determining a target word vector of each word according to the semantic vector and the structure vector of each word;
and determining the abstract of the target text according to the target word vector of each word.
Optionally, determining the target word vector of each word according to the semantic vector and the structure vector of each word includes:
determining an initial word vector of each word according to the semantic vector and the structure vector of each word;
determining a document vector of the target text according to the initial word vector of each word;
determining a gate vector of each word according to the document vector and the initial word vector of each word, wherein the gate vector is used for representing the weight of the corresponding word;
and filtering the information of the initial word vector of each word according to the gate vector of each word to obtain the target word vector of each word.
Optionally, the abstract generation model further includes a long short-term memory network, and determining the semantic vectors of the words in the target text includes:
inputting the target text into the long short-term memory network in the abstract generation model, and outputting the semantic vectors of the words in the target text.
Optionally, the graph network is a graph convolution neural network or a graph attention network.
Optionally, the graph network is a graph convolutional neural network, and a convolution operation of the graph convolutional neural network satisfies the following formula:

$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in M(i)} W_r^{(l)} h_j^{(l)}\Big)$$

where $\sigma$ is the activation function of the graph convolutional neural network, $h_i^{(l)}$ is the output of the $i$-th node at the $l$-th convolutional layer, $M(i)$ is the set of neighboring nodes of the $i$-th node, and $W_r^{(l)}$ is a trainable parameter; the input of the first convolutional layer, $h_i^{(0)}$, is the semantic vector of the $i$-th node.
Optionally, determining the document structure diagram of the target text includes:
performing word segmentation processing on the sentences in the target text;
analyzing the dependency relationship of the sentences to determine the dependency edges among the words in the sentences;
determining adjacent edges of adjacent sentences in the target text;
performing reference resolution processing on the target text, and determining a reference resolution edge, wherein nodes corresponding to the reference resolution edge represent the same object;
and connecting the same words in the adjacent sentences in the target text, and determining the same word edges.
Optionally, determining the document structure diagram of the target text further includes:
and responding to the fact that the sentences in the target text are the dialogue sentences of a plurality of roles, connecting adjacent sentences of the same role, and determining role edges.
Optionally, the abstract generation model is trained by the following steps:
acquiring a first data set, wherein the first data set comprises a plurality of multi-role dialog texts in a first language and abstracts of each dialog text;
determining a document structure diagram of each dialog text in the first data set;
inputting each dialog text and a document structure diagram of each dialog text in the first data set into the abstract generating model for processing;
and adjusting parameters of the abstract generating model according to the output of the abstract generating model and the abstract of each dialog text so as to train the abstract generating model.
Optionally, the abstract generation model is trained by the following steps:
acquiring a second data set, wherein the second data set comprises a plurality of second language texts and abstracts of the second language texts;
determining a document structure diagram of each second language text in the second data set;
inputting each second language text in the second data set and a document structure diagram of each second language text into the abstract generating model for processing;
and adjusting parameters of the abstract generating model according to the output of the abstract generating model and the abstract of each second language text so as to train the abstract generating model.
In a second aspect, an embodiment of the present invention provides a text summary generating apparatus, where the apparatus includes:
a target text acquisition unit configured to acquire a target text;
the document structure diagram determining unit is configured to determine a document structure diagram of the target text, and the document structure diagram is used for representing the relationship between words and sentences in the target text;
a semantic vector determination unit configured to determine semantic vectors of words in the target text;
a structure vector determining unit, configured to input the semantic vector of each word and the document structure diagram into a graph network in a summary generation model, so as to output the structure vector of each word according to the relationship between each word;
a target word vector obtaining unit configured to determine a target word vector of each word according to the semantic vector and the structure vector of each word;
and the abstract generating unit is configured to determine an abstract of the target text according to the target word vector of each word.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer program instructions, and the processor executes the one or more computer program instructions to implement the method as described above.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as described above.
In the embodiment of the invention, a document structure diagram representing the relationships between the words and sentences in the target text and the semantic vectors of the words in the target text are obtained; the semantic vector of each word and the document structure diagram are input into a graph network in an abstract generation model, which outputs the structure vector of each word according to the relationships between the words; the target word vector of each word is then determined from its semantic vector and structure vector, and the abstract of the target text is determined from the target word vector of each word. That is, the embodiment of the invention obtains the structure vector of each word with the graph network and combines the semantic vector and structure vector of each word to obtain the abstract of the target text, thereby improving the accuracy and conciseness of the text abstract.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a text summary generation method of an embodiment of the present invention;
FIG. 2 is a flowchart of a document structure diagram generation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a dependency tree for statements of an embodiment of the present invention;
FIG. 4 is a schematic diagram of a document structure diagram according to an embodiment of the invention;
FIG. 5 is a schematic diagram of another document structure diagram of an embodiment of the present invention;
FIG. 6 is a flowchart of a method for obtaining a target word vector according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a summary generation model of an embodiment of the present invention;
FIG. 8 is a flowchart of a method for training a summary generation model according to an embodiment of the present invention;
FIG. 9 is a flow chart of another method for training a digest generation model according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating the evaluation results of the abstract generation model according to an embodiment of the invention;
fig. 11 is a schematic diagram of a text summary generation apparatus according to an embodiment of the present invention;
fig. 12 is a schematic diagram of an electronic device of an embodiment of the invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The abstract generation methods in the related art mainly perform linear sequence modeling on the text and ignore latent relational information in the text, such as the dependency relationships between the words within each sentence, the sequential relationships between sentences, and the reference relationships of words across sentences, so the generated abstracts are not sufficiently accurate and concise.
Fig. 1 is a flowchart of a text summary generation method according to an embodiment of the present invention. As shown in fig. 1, the text summary generating method according to the embodiment of the present invention includes the following steps:
step S110, acquiring a target text. Optionally, the target text may be a dialog-type text or a news-type text, which is not limited in this embodiment. For example, if the target text is a customer service conversation record of the network platform, the abstract of the customer service conversation record text is generated to extract the core information of the corresponding text, so that the customer service can quickly find the problem and provide a corresponding solution. If the target text is a news text, the user can judge whether the news is interesting information through the core information in the abstract, therefore, the user can quickly filter the unnecessary information through the abstract, the speed of browsing the information by the user is increased, the user can quickly acquire effective information, and the experience of the user can be further improved.
Step S120, determining a document structure diagram of the target text. The document structure diagram is used for representing the relationships between the words and sentences in the target text. Optionally, in this embodiment, the document structure diagram takes each word as a node, and takes the relationships between words and the relationships between sentences as edges.
FIG. 2 is a flowchart of a document structure diagram generation method according to an embodiment of the present invention. In an alternative implementation, as shown in fig. 2, step S120 may include:
step S121, performing word segmentation processing on the sentence in the target text. In an alternative implementation manner, word segmentation processing may be performed on a sentence in a target text by using word segmentation tools such as Stanford CoreNLP, ICTCLAS, HanLP, "jiba," etc., and it should be understood that the word segmentation tools are not limited in this embodiment. Optionally, if the sentence in the target text is an english text, the target text may be participled by using Stanford CoreNLP, and if the sentence in the target text is a chinese text, the target text may be participled by using a "jiba" participle tool.
Stanford CoreNLP is a human language technology tool that can give basic forms of words, e.g., tag the part of speech of a word, tag sentence structures with phrases and syntactic dependencies, indicate the same objects to which noun phrases refer, represent emotion, etc. The word segmentation tool supports three word segmentation modes: precision mode, full mode, and search engine mode. In the exact mode, the sentence is attempted to be cut open most accurately, suitable for text analysis. In the full mode, all words which can be used as words in the sentence are scanned out, the speed is high, but ambiguity cannot be solved. Under the search engine mode, long words are segmented again on the basis of the accurate mode, the recall rate is improved, and the method is suitable for word segmentation of the search engine. In this embodiment, the target text may be subjected to word segmentation processing in an accurate mode of the word segmentation tool.
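By way of illustration only (the patent itself contains no code), the precise mode of the jieba segmenter mentioned above can be invoked as follows; the sample sentence is hypothetical:

```python
import jieba  # pip install jieba

text = "乘客在X地点上车。"  # "The passenger gets on the vehicle at location X."
# cut_all=False selects the precise mode, which is suited to text analysis.
words = list(jieba.cut(text, cut_all=False))
print(words)  # e.g. ['乘客', '在', 'X', '地点', '上车', '。']
```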
Step S122, carrying out dependency relationship analysis on the sentences in the target text, and determining the dependency edges among the words in each sentence. Each word node and the dependency edges in a sentence form a dependency tree of that sentence. In an optional implementation, dependency relationship analysis is performed on each sentence in the target text with Stanford CoreNLP, the directed dependency edges between the words in each sentence are determined, and the dependency tree of each sentence is formed.
Dependency syntax explains the syntactic structure of a sentence by analyzing the dependency relationships among the components of a language unit. It holds that the core verb of a sentence is the central component that dominates the other components, while the core verb itself is not dominated by any other component, and every dominated component depends on its dominator in some relationship. Dependency grammar has 5 axioms: (1) there is one central component in a sentence, called the root, which does not depend on any other component; (2) every other component depends directly on some component; (3) no component can depend on two or more components; (4) if component A directly depends on component B and component C is located between A and B in the sentence, then C depends on A, on B, or on some component between A and B; (5) the components on the left and right sides of the central component do not relate to each other across it.
FIG. 3 is a diagram of a dependency tree for a sentence according to an embodiment of the present invention. As shown in fig. 3, taking the sentence "the passenger gets on the vehicle at location X" as an example, the word segmentation result is "passenger / at / location X / getting on", where "passenger" is the 1st word in the sentence and its part of speech is noun n, the part of speech of "at" is preposition p, "location X" is a noun n, and "getting on" is a verb v. The dependency between "passenger" and "getting on" represents the subject-verb relationship SBV, the dependency between "at" and "location X" represents the preposition-object relationship POB, and the dependency between "at location X" and "getting on" represents the adverbial modification relationship ADV. Each word is taken as a node, and the words having dependencies are connected according to the corresponding dependency relationship to form the corresponding dependency tree 3, in which each dependency tree has a corresponding root node root. As shown in fig. 3, the central component of the sentence is "getting on"; that is, a dependency edge from the node "getting on" in the dependency tree 3 points to the root node root, and the dependency relationship between the node "getting on" and the root node root is the core relationship HED.
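As a hedged sketch of how such a dependency tree might be produced in practice, the stanza toolkit is used here as a stand-in (the embodiment names Stanford CoreNLP; the exact tool and pipeline configuration are assumptions):

```python
import stanza  # pip install stanza; run stanza.download('zh') once beforehand

nlp = stanza.Pipeline(lang='zh', processors='tokenize,pos,lemma,depparse')
doc = nlp('乘客在X地点上车。')
for sent in doc.sentences:
    for word in sent.words:
        # word.head is 1-indexed; head == 0 marks the root of the dependency tree.
        head = sent.words[word.head - 1].text if word.head > 0 else 'root'
        print(word.text, '->', head, f'({word.deprel})')
```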
Step S123, determine the adjacent edges of adjacent sentences in the target text. Optionally, root nodes of dependency trees of adjacent sentences in the target text are connected to characterize adjacency relations between the sentences, and corresponding adjacency edges are formed.
Step S124, performing reference resolution processing on the target text, and determining the reference resolution edges. The nodes connected by a reference resolution edge represent the same object.
FIG. 4 is a diagram illustrating a document structure according to an embodiment of the present invention. As shown in fig. 4, assume that the target text includes the adjacent sentence "passenger gets on the vehicle at location X. "and sentence" at what time did he get on the car? "passenger" in the first sentence and "he" in the second sentence refer to the same person, and the words "passenger" and "he" are connected to form corresponding reference resolution edges.
Step S125, connecting the same words in adjacent sentences of the target text, and determining the same-word edges. As shown in fig. 4, the first sentence and the second sentence both include the word "getting on", and the words "getting on" in the two adjacent sentences are connected to form the corresponding same-word edge.
Therefore, in this embodiment, the document structure diagram of the target text is determined by determining the dependency edges between the words in each sentence of the target text, determining the adjacent edges of adjacent sentences, connecting the reference resolution edges of words referring to the same object, and connecting the same-word edges of identical words in adjacent sentences. The document structure diagram of this embodiment can thus represent the association relationships between the sentences and words of the target text more comprehensively, which further improves the accuracy and conciseness of the text abstract in subsequent processing.
In an optional implementation manner, step S120 further includes step S126:
step S126, responding to the fact that the sentences in the target text are the dialogue sentences of a plurality of roles, connecting adjacent sentences of the same role, and determining role edges.
It should be understood that, in the embodiment of the present invention, there is no sequential execution order between step S122 and step S126, that is, each type of edge of the document structure diagram may be determined sequentially or simultaneously, and this embodiment does not limit this.
FIG. 5 is a schematic diagram of another document structure diagram according to an embodiment of the present invention. In this embodiment, a target text is described as an example of a dialog text, where the target text is a dialog text of a character a and a character b, and the dialog contents are:
and a role a: passenger/at/location X/boarding/.
And b, role b: he/at/what time/getting on/are?
And a role a: he/at/noon 12/boarding/.
And b, role b: good/.
Where "/" is used to characterize the word segmentation result.
As shown in fig. 5, in the document structure diagram 5, different line types are used to represent edges of different relations (the relation type of every edge is not shown in fig. 5). The edges between different words within each sentence are dependency edges. The edges connecting the root nodes of adjacent sentences are adjacent edges, for example, the edge of the first dialog turn connecting role a and role b. The edges connecting the root nodes of adjacent sentences of the same role are role edges, for example, the edge connecting the adjacent sentences corresponding to role a (i.e., the edge connecting the first sentence and the third sentence of the target text). The edges connecting the same word in adjacent sentences are same-word edges, for example, the edge between "getting on" in the first sentence and "getting on" in the second sentence. The edges connecting words that refer to the same object are reference resolution edges, for example, the edge between "passenger" in the first sentence and "him" in the second sentence.
Therefore, in this embodiment, the document structure diagram of the target text is determined by determining the dependency edges between the words in each sentence of the target text, connecting the reference resolution edges of words referring to the same object, connecting the same-word edges of identical words in adjacent sentences, and determining the adjacent edges of adjacent sentences and the role edges between adjacent sentences of the same role, as sketched in the code below.
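A minimal sketch of assembling such a heterogeneous document graph follows, using networkx as an assumed graph library; the node keys, edge-type labels, and the sample dialog edges are illustrative rather than prescribed by the embodiment:

```python
import networkx as nx

g = nx.MultiDiGraph()  # multiple typed edges may connect the same node pair

def add_sentence(g, sent_id, words, dep_edges):
    """Add one segmented sentence; dep_edges are (head_idx, dep_idx, relation)."""
    for i, w in enumerate(words):
        g.add_node((sent_id, i), word=w)
    for head, dep, rel in dep_edges:
        g.add_edge((sent_id, head), (sent_id, dep), etype='dependency', rel=rel)

# Hypothetical segmented dialog turns from Fig. 5 (role a, then role b).
add_sentence(g, 0, ['乘客', '在', 'X地点', '上车'],
             [(3, 0, 'SBV'), (1, 2, 'POB'), (3, 1, 'ADV')])
add_sentence(g, 1, ['他', '在', '什么时间', '上车'],
             [(3, 0, 'SBV'), (1, 2, 'POB'), (3, 1, 'ADV')])

g.add_edge((0, 3), (1, 3), etype='adjacent')     # roots of adjacent sentences
g.add_edge((0, 3), (1, 3), etype='same_word')    # "上车" appears in both sentences
g.add_edge((0, 0), (1, 0), etype='coreference')  # "乘客" and "他" refer to one person
# Role edges would likewise connect root nodes of adjacent sentences by the
# same role (e.g. sentences 1 and 3 of role a in the four-turn dialog above).
```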
Step S130, determining the semantic vectors of the words in the target text. In an optional implementation, the text abstract of this embodiment is obtained by a pre-trained abstract generation model. The abstract generation model comprises a Long Short-Term Memory (LSTM) network. Step S130 may specifically be: inputting the target text into the long short-term memory network in the abstract generation model, and outputting the semantic vectors of the words in the target text.
Optionally, the present embodiment determines semantic vectors of words in the target text by using a BiLSTM network formed by combining forward LSTM and backward LSTM.
The LSTM is a recurrent neural network suitable for processing and predicting events with relatively long intervals and delays in a time sequence. The LSTM adds a unit, called a cell, that judges whether information is useful; the cell contains an input gate, a forget gate, and an output gate. After information enters the LSTM, the network judges whether it is useful according to its rules: information that passes the check is retained, and non-conforming information is forgotten through the forget gate. In this way, the context is analyzed by the bidirectional long short-term memory network (BiLSTM) and the useful context information of each word is retained, which improves the semantic expressiveness of the obtained semantic vectors of the words.
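For illustration, a BiLSTM encoder of the kind described can be sketched in PyTorch as follows; the vocabulary size and dimensions are assumptions, not values fixed by the embodiment:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 10000, 128, 256

embed = nn.Embedding(vocab_size, emb_dim)            # embedding matrix W_e
bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                 bidirectional=True)                 # forward + backward LSTM

word_ids = torch.randint(0, vocab_size, (1, 12))     # n = 12 nodes, as in Fig. 7
x = embed(word_ids)                                  # distributed representations x_i
e, _ = bilstm(x)                                     # e[:, i] concatenates the forward
print(e.shape)                                       # and backward states: (1, 12, 512)
```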
Step S140, the semantic vectors of the words in the target text and the document structure diagram are input into a graph network in the abstract generation model, so as to output the structure vector of each word according to the relationships among the words. In an alternative implementation, the Graph Network is a Graph Convolutional Network (GCN) or a Graph Attention Network (GAT).
The core idea of the graph convolutional neural network is to aggregate node information by using the information of the edges in the graph, thereby generating new node representations. Graph convolutional neural networks include two types: one based on the spatial (vertex) domain and one based on the frequency (spectral) domain. A spatial-domain graph convolutional network directly defines the convolution operation on the connection relationships of each node, based on a spatial convolution method. A spectral-domain graph convolutional network implements the convolution operation on the topology graph by means of graph spectral theory, i.e., the eigenvalues and eigenvectors of the graph Laplacian matrix.
The graph attention network performs a weighted summation of neighboring node features with an attention mechanism, where the weights of the neighboring node features depend entirely on the node features and are independent of the graph structure. In the graph attention network, a different weight can be assigned to each node in the graph according to the features of its adjacent nodes (nodes sharing an edge with it). After the attention mechanism is introduced, a node's features are related only to its adjacent nodes, so the information of the whole graph is not required; the computation cost is small and the data processing efficiency is high.
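A minimal single-head graph attention layer, sketched under the description above (feature-dependent weights, softmax restricted to adjacent nodes); the dimensions and the LeakyReLU scoring function follow the common GAT formulation and are assumptions here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector

    def forward(self, h, adj):
        # h: (n, in_dim); adj: (n, n) 0/1 adjacency with self-loops included
        z = self.W(h)                                    # (n, out_dim)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))      # raw scores, (n, n)
        e = e.masked_fill(adj == 0, float('-inf'))       # attend only to neighbors
        alpha = torch.softmax(e, dim=-1)                 # weights depend on features
        return torch.relu(alpha @ z)                     # weighted neighbor sum
```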
Step S150, determining a target word vector of each word according to the semantic vector and the structure vector of each word in the target text.
Fig. 6 is a flowchart of a method for obtaining a target word vector according to an embodiment of the present invention. In an alternative implementation manner, as shown in fig. 6, step S150 may specifically include the following steps:
step S151, determining an initial word vector of each word according to the semantic vector and the structure vector of each word in the target text. In an alternative implementation, the semantic vector and the structure vector of each word are combined and concatenated to obtain an initial word vector of each word.
Step S152, determining the document vector of the target text according to the initial word vector of each word in the target text.
Step S153, determining a gate vector of each word according to the document vector of the target text and the initial word vector of each word. Wherein the gate vector is used to characterize the weight of the corresponding word.
Step S154, the initial word vector of each word is filtered according to the gate vector of each word in the target text, and the target word vector of each word is obtained.
In this embodiment, the structure vector of each word is obtained with the graph network, and the semantic vector and the structure vector of each word are combined to obtain the target word vector of each word in the target text, which can improve the accuracy and conciseness of the text abstract.
Step S160, determining the abstract of the target text according to the target word vector of each word in the target text. In an optional implementation, a decoding operation is performed on the target word vectors of the target text, and the abstract of the target text is generated according to the importance of each word.
In the embodiment of the invention, a document structure diagram representing the relationships between the words and sentences in the target text and the semantic vectors of the words in the target text are obtained; the semantic vector of each word and the document structure diagram are input into a graph network in an abstract generation model, which outputs the structure vector of each word according to the relationships between the words; the target word vector of each word is then determined from its semantic vector and structure vector, and the abstract of the target text is determined from the target word vector of each word. That is, the structure vector of each word is obtained with the graph network, and the semantic vector and structure vector of each word are combined to obtain the abstract of the target text, thereby improving the accuracy and conciseness of the text abstract.
FIG. 7 is a schematic diagram of a digest generation model according to an embodiment of the present invention. The present embodiment is described by taking a graph network as a graph convolution neural network as an example, and it should be understood that the present embodiment does not limit the graph network used, and other types of graph networks, such as a graph attention network, may be applied to the abstract generation model.
As shown in FIG. 7, the abstract generation model 7 includes a BiLSTM network 72, a graph convolutional neural network 73, a gate attention network 74, and a decoder 75.
In the present embodiment, the target text is processed with reference to the processing method of steps S121 to S126 in fig. 2 to obtain the document structure diagram 71 of the target text. The document structure diagram 71 includes two sentences 711 and 712; sentence 711 includes nodes W1 to W6, and sentence 712 includes nodes W7 to W12. In sentence 711 the core node is node W3, and in sentence 712 the core node is node W9. In the document structure diagram 71, connections of different line types and directions represent different connection relationships.
In this embodiment, first, the target text after word segmentation processing is input into the BiLSTM network 72 to perform context processing on the target text, so as to obtain semantic vectors of words in the target text:
$$x_i = W_e w_i,\quad i \in \{1,\dots,n\} \tag{1}$$

$$\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(x_i) \tag{2}$$

$$\overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(x_i) \tag{3}$$

$$e_i = [\overrightarrow{h_i}; \overleftarrow{h_i}] \tag{4}$$

where $w_i$ is the $i$-th node (word) in the target text, $i \in \{1,\dots,n\}$, $n$ is the number of nodes in the target text ($n = 12$ in fig. 7; $n$ is an integer greater than or equal to 1), $W_e$ is the embedding matrix, $x_i$ is the distributed representation of the $i$-th node after processing by the embedding matrix $W_e$, $\overrightarrow{h_i}$ is the forward semantic vector of the $i$-th node after the forward LSTM operation, $\overleftarrow{h_i}$ is the backward semantic vector of the $i$-th node after the backward LSTM operation, and $e_i$ is the semantic vector of the $i$-th node.
The semantic vector of each word in the target text (i.e., the output of the BiLSTM network 72) and the document structure diagram 71 are then input into the graph convolutional neural network 73 to obtain the structure vector of each word. The convolution operation of the graph convolutional neural network satisfies the following formula:

$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in M(i)} W_r^{(l)} h_j^{(l)}\Big) \tag{5}$$

where $\sigma$ is the activation function of the graph convolutional neural network, $h_i^{(l)}$ is the output of the $i$-th node at the $l$-th convolutional layer, $i \in \{1,\dots,n\}$, $n$ is the number of nodes in the target text, $M(i)$ is the set of nodes adjacent to the $i$-th node (the set of nodes that share an edge with the $i$-th node), and $W_r^{(l)}$ is a trainable parameter. The input of the first convolutional layer $h_i^{(0)}$ is the semantic vector $e_i$ of the $i$-th node. In this embodiment, when the $l$-th convolutional layer is the last convolutional layer in the graph convolutional neural network, $h_i^{(l+1)}$ is taken as the structure vector $s_i$ of the $i$-th node. Optionally, in this embodiment, the parameter $W_r^{(l)}$ is different for the different types of edges incident to the $i$-th node (for example, adjacent edges, same-word edges, edges in the dependency tree, and the like).
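The convolution of Eq. (5), with one trainable weight matrix per edge type as this embodiment suggests, can be sketched as the following simplified layer; the ReLU activation and the absence of normalization and bias terms are assumptions where the patent is silent:

```python
import torch
import torch.nn as nn

class EdgeTypedGCNLayer(nn.Module):
    def __init__(self, dim, num_edge_types):
        super().__init__()
        # One W_r per edge type: dependency, adjacent, same-word, coreference, role.
        self.W = nn.ModuleList([nn.Linear(dim, dim, bias=False)
                                for _ in range(num_edge_types)])

    def forward(self, h, adj_by_type):
        # h: (n, dim); adj_by_type: one (n, n) 0/1 adjacency matrix per edge type
        out = torch.zeros_like(h)
        for W_r, A in zip(self.W, adj_by_type):
            out = out + A @ W_r(h)   # sum over neighbors j in M(i), per edge type
        return torch.relu(out)       # sigma in Eq. (5); ReLU assumed here
```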
In this embodiment, the semantic vector and the structure vector of each word in the target text are concatenated to obtain the initial word vector $h_i$ of the node:

$$h_i = [e_i; s_i] \tag{6}$$
Then, the initial word vectors $h_1,\dots,h_n$ of the nodes in the target text are input into the gate attention network 74 to obtain the target word vector of each word in the target text. First, the document vector of the target text is determined from the initial word vectors of the nodes:

$$u_i = \tanh(W_w h_i + b_w) \tag{7}$$

$$a_i = \frac{\exp(u_i^\top u_w)}{\sum_j \exp(u_j^\top u_w)} \tag{8}$$

$$d_v = \sum_i a_i h_i \tag{9}$$

where $h_i$ is the initial word vector of the $i$-th node, $i \in \{1,\dots,n\}$, $n$ is the number of nodes in the target text, $\tanh$ is the hyperbolic tangent function, $W_w$, $b_w$, and $u_w$ are trainable parameters, $\exp$ is the exponential function with base $e$, $u_i$ is the hyperbolic tangent value computed from the initial word vector of the node, $u_i^\top$ is the transpose of $u_i$, $a_i$ is the weight parameter corresponding to the $i$-th node, and $d_v$ is the document vector of the target text.
The gate vector of each word is determined from the document vector of the target text and the initial word vector of each word:

$$g_i = \sigma(W_g h_i + U_g d_v + b_g) \tag{10}$$

where $\sigma$ is the activation function of the gate attention network, $W_g$, $U_g$, and $b_g$ are trainable parameters, $h_i$ is the initial word vector of the $i$-th node, $i \in \{1,\dots,n\}$, $n$ is the number of nodes in the target text, $d_v$ is the document vector of the target text, and $g_i$ is the gate vector of the $i$-th node.
The information of the initial word vector of each word is then filtered according to the gate vector of each word in the target text to obtain the target word vector of each word:

$$\tilde{h}_i = g_i \odot h_i \tag{11}$$

where $h_i$ is the initial word vector of the $i$-th node, $i \in \{1,\dots,n\}$, $n$ is the number of nodes in the target text, $g_i$ is the gate vector of the $i$-th node, $\odot$ is the element-wise product between vectors, and $\tilde{h}_i$ is the target word vector of the $i$-th node.
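Equations (7) to (11) together can be sketched as the following gate attention module; the dimensions and parameter initialization are assumptions:

```python
import torch
import torch.nn as nn

class GateAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.Ww = nn.Linear(dim, dim)               # W_w, b_w in Eq. (7)
        self.uw = nn.Parameter(torch.randn(dim))    # u_w in Eq. (8)
        self.Wg = nn.Linear(dim, dim)               # W_g, b_g in Eq. (10)
        self.Ug = nn.Linear(dim, dim, bias=False)   # U_g in Eq. (10)

    def forward(self, h):                           # h: (n, dim) initial word vectors
        u = torch.tanh(self.Ww(h))                  # Eq. (7)
        a = torch.softmax(u @ self.uw, dim=0)       # Eq. (8): attention weights a_i
        d = (a.unsqueeze(-1) * h).sum(dim=0)        # Eq. (9): document vector d_v
        g = torch.sigmoid(self.Wg(h) + self.Ug(d))  # Eq. (10): gate vectors g_i
        return g * h                                # Eq. (11): element-wise filtering
```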
Then, the target word vectors of the nodes in the target text are input to the decoder 75 for decoding, so as to extract the abstract of the target text composed of the important words.
In the embodiment of the invention, a document structure diagram representing the relationships between the words and sentences in the target text and the semantic vectors of the words in the target text are obtained; the semantic vector of each word and the document structure diagram are input into a graph network in an abstract generation model, which outputs the structure vector of each word according to the relationships between the words; the target word vector of each word is then determined from its semantic vector and structure vector, and the abstract of the target text is determined from the target word vector of each word. That is, the structure vector of each word is obtained with the graph network, and the semantic vector and structure vector of each word are combined to obtain the abstract of the target text, so that the accuracy and conciseness of the text abstract are improved.
Fig. 8 is a flowchart of a method for training a summary generation model according to an embodiment of the present invention. In an alternative implementation, as shown in fig. 8, the summary generation model is trained by:
step S210, a first data set is acquired. Wherein the first data set includes a plurality of multi-role dialog texts in a first language and a summary of each dialog text. The first language may be chinese or dialog text of other languages, which is not limited in this embodiment.
Step S220 is to determine a document structure diagram of each dialog text in the first data set. In an alternative implementation, the processing method of steps S121 to S126 in fig. 2 is used to perform processing to obtain a document structure diagram of each dialog text in the first data set.
Step S230, inputting each dialog text in the first data set and the document structure diagram of each dialog text into the abstract generating model for processing, and the processing procedure refers to the embodiment in fig. 7, which is not described herein again.
Step S240, adjusting parameters of the abstract generating model according to the output of the abstract generating model and the abstract of each dialog text, so as to train the abstract generating model.
This embodiment trains the abstract generation model with a first data set comprising a plurality of dialog texts, so that the abstract generation model can extract core information from texts in dialog form. When applied to customer service data, customer service staff can obtain the core information from a large number of chat records, thereby accurately recording problems and providing solutions. An illustrative training loop is sketched below.
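The following sketch shows what such a training loop might look like; the model interface, data loader, and the token-level cross-entropy objective are assumptions, since the embodiment does not specify the loss:

```python
import torch
import torch.nn as nn

def train_summarizer(model, data_loader, epochs=10, lr=1e-3):
    """Assumes model(dialog, graph) returns per-token logits over the vocabulary."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for dialog, graph, gold_summary_ids in data_loader:
            logits = model(dialog, graph)             # (T, vocab_size)
            loss = loss_fn(logits, gold_summary_ids)  # compare with reference abstract
            optimizer.zero_grad()
            loss.backward()                           # adjust the model parameters
            optimizer.step()
```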
Fig. 9 is a flowchart of another method for training the abstract generation model according to an embodiment of the present invention. In an alternative implementation, as shown in fig. 9, the abstract generation model is trained by the following steps:
step S310, a second data set is obtained, where the second data set includes a plurality of second language texts and abstracts of the second language texts. The second data set may be a news data set, for example, when the second language is english, an open domain news data set of CNN/daisy is used as the second data set. Optionally, the second language may be the same as or different from the first language, and this embodiment does not limit this.
Step S320, determining a document structure diagram of each second language text in the second data set. In an alternative implementation, the processing method in steps S121 to S126 in fig. 2 is used to perform processing to obtain a document structure diagram of each second language text in the second data set.
Step S330, inputting the second language texts in the second data set and the document structure diagram of the second language texts into the abstract generating model for processing, and the processing process refers to the embodiment in fig. 7, which is not repeated herein.
Step S340, adjusting parameters of the abstract generating model according to the output of the abstract generating model and the abstract of each second language text, so as to train the abstract generating model.
This embodiment trains the abstract generation model with a second data set comprising a plurality of second-language texts, so that the abstract generation model can extract core information from texts in news form. A user can thus obtain an accurate abstract to quickly judge whether it contains the required information, which improves the efficiency of reading documents.
The embodiments of fig. 8 and fig. 9 use the respective data sets to train the abstract generation model by way of example. It should be understood that, for different application scenarios, data sets of the corresponding scenario may be collected to train the abstract generation model, so that the trained abstract generation model can generate abstract information conforming to the application scenario.
Fig. 10 is a schematic diagram of the evaluation results of the abstract generation model according to the embodiment of the invention. In this embodiment, the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) method is taken as an example for explanation; it should be understood that other evaluation methods, such as BLEU, can also be used to evaluate the abstract generation model of this embodiment.
ROUGE is a set of metrics for evaluating automatically generated abstracts and machine translation. It compares an automatically generated abstract or translation with a set of reference abstracts to produce corresponding scores that measure the similarity between them.
As shown in FIG. 10, in this embodiment, the abstract generation model is evaluated with the Rouge-1 (R-1), Rouge-2 (R-2), and Rouge-L (R-L) results; that is, recall-oriented scores of the automatically generated abstract are adopted as the evaluation result.
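By way of example, Rouge-1/2/L scores of a generated abstract against a reference can be computed with the open-source rouge-score package (an assumed tool; the embodiment does not name one, and the sample strings are hypothetical):

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'],
                                  use_stemmer=True)
generated = "the passenger boarded at noon"
reference = "the passenger got on the car at 12 noon"
scores = scorer.score(reference, generated)  # score(target, prediction)
for name, s in scores.items():
    print(name, f"P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
```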
As shown in FIG. 10, the first portion of data 101 is the baseline of each evaluation index, where "Refresh" comes from the paper "Shashi Narayan, Ronald Cardenas, Nikos Papasarantopoulos, Shay B. Cohen, Mirella Lapata, Jiangsheng Yu, and Yi Chang, 'Document modeling with external attention for sentence extraction,' in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, vol. 1, pp. 2030." "RNN-RL" is from the paper "Yen-Chun Chen and Mohit Bansal, 'Fast abstractive summarization with reinforce-selected sentence rewriting,' in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 675-686."
The second portion of data 102 is the evaluation data of other existing automatic summarization methods, where "Bottom-up" comes from the paper "Sebastian Gehrmann, Yuntian Deng, and Alexander Rush, 'Bottom-up abstractive summarization,' in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 4098-4109." "Info-select" is from the paper "Wei Li, Xinyan Xiao, Yajuan Lyu, and Yuanzhuo Wang, 'Improving neural abstractive document summarization with explicit information selection modeling,' in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 1787-1796."
The third portion of data is the evaluation data of the abstract generation model of this embodiment. As can be seen from FIG. 10, the Rouge-1 (R-1), Rouge-2 (R-2), and Rouge-L (R-L) scores of the abstract generation model of this embodiment are all above the respective baselines and are significantly better than those of the other automatic abstract generation methods.
The embodiment of the invention acquires the structural vector of each word by adopting the graph network, and combines the semantic vector and the structural vector of each word to acquire the abstract of the target text, thereby improving the accuracy of text abstract.
Fig. 11 is a schematic diagram of a text summary generation apparatus according to an embodiment of the present invention. As shown in fig. 11, the text digest generation apparatus 11 according to the embodiment of the present invention includes a target text acquisition unit 111, a document structure diagram determination unit 112, a semantic vector determination unit 113, a structure vector determination unit 114, a target word vector acquisition unit 115, and a digest generation unit 116.
The target text acquisition unit 111 is configured to acquire a target text. The document structure diagram determining unit 112 is configured to determine a document structure diagram of the target text, where the document structure diagram is used for representing the relationships between words and sentences in the target text. The semantic vector determination unit 113 is configured to determine semantic vectors of the words in the target text. Optionally, the semantic vector determination unit 113 is further configured to input the target text into the long short-term memory network in the abstract generation model and output the semantic vectors of the words in the target text. The structure vector determination unit 114 is configured to input the semantic vector of each word and the document structure diagram into a graph network in the abstract generation model, so as to output the structure vector of each word according to the relationships between the words. Optionally, the graph network is a graph convolutional neural network or a graph attention network. The target word vector obtaining unit 115 is configured to determine a target word vector of each word according to the semantic vector and the structure vector of the word. The abstract generation unit 116 is configured to determine the abstract of the target text according to the target word vector of each word.
In the embodiment of the invention, a document structure diagram representing the relationships between the words and sentences in the target text and the semantic vectors of the words in the target text are obtained; the semantic vector of each word and the document structure diagram are input into a graph network in an abstract generation model, which outputs the structure vector of each word according to the relationships between the words; the target word vector of each word is then determined from its semantic vector and structure vector, and the abstract of the target text is determined from the target word vector of each word. That is, the structure vector of each word is obtained with the graph network, and the semantic vector and structure vector of each word are combined to obtain the abstract of the target text, thereby improving the accuracy and conciseness of the text abstract.
Fig. 12 is a schematic diagram of an electronic device of an embodiment of the invention. As shown in fig. 12, the electronic device shown in fig. 12 is a general-purpose data processing apparatus including a general-purpose computer hardware structure including at least a processor 121 and a memory 122. The processor 121 and the memory 122 are connected by a bus 123. The memory 122 is adapted to store instructions or programs executable by the processor 121. Processor 121 may be a stand-alone microprocessor or a collection of one or more microprocessors. Thus, processor 121 implements the processing of data and the control of other devices by executing instructions stored by memory 122 to thereby perform the method flows of embodiments of the present invention as described above. The bus 123 connects the above components together, and also connects the above components to a display controller 124 and a display device and an input/output (I/O) device 125. Input/output (I/O) device 125 may be a mouse, keyboard, modem, network interface, touch input device, motion sensing input device, printer, and other devices known in the art. Typically, the input/output devices 125 are coupled to the system through input/output (I/O) controllers 126.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device) or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may employ a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
Another embodiment of the invention is directed to a non-transitory storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A text summary generation method, the method comprising:
acquiring a target text;
determining a document structure diagram of the target text, wherein the document structure diagram is used for representing the relationship between words and sentences in the target text;
determining semantic vectors of words in the target text;
inputting the semantic vector of each word and the document structure diagram into a graph network in an abstract generation model so as to output the structure vector of each word according to the relation between the words;
determining a target word vector of each word according to the semantic vector and the structure vector of each word;
and determining the abstract of the target text according to the target word vector of each word.
2. The method of claim 1, wherein determining a target word vector for each of the words based on the semantic vector and the structure vector for each of the words comprises:
determining an initial word vector of each word according to the semantic vector and the structure vector of each word;
determining a document vector of the target text according to the initial word vector of each word;
determining a gate vector of each word according to the document vector and the initial word vector of each word, wherein the gate vector is used for representing the weight of the corresponding word;
and filtering the information of the initial word vector of each word according to the gate vector of each word to obtain the target word vector of each word.
3. The method of claim 1, wherein the abstract generation model further comprises a long short-term memory network, and wherein determining semantic vectors of words in the target text comprises:
inputting the target text into the long short-term memory network in the abstract generation model, and outputting semantic vectors of the words in the target text.
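A minimal sketch of the claim-3 step alone: encode the target text with a long short-term memory network to obtain one semantic vector per word. The vocabulary size, dimensions, and unidirectional LSTM are assumptions.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(10000, 128)
lstm = nn.LSTM(128, 128, batch_first=True)
token_ids = torch.randint(0, 10000, (1, 20))   # a 20-word target text
semantic, _ = lstm(embed(token_ids))           # (1, 20, 128): one semantic vector per word
```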
4. The method of claim 1, wherein the graph network is a graph convolution neural network or a graph attention network.
5. The method of claim 4, wherein the graph network is a graph convolutional neural network, and a convolution operation of the graph convolutional neural network satisfies the following formula:

$h_i^{(l)} = \sigma\left( \sum_{j \in M(i)} W^{(l)} h_j^{(l-1)} \right)$

wherein $\sigma$ is the activation function of the graph convolutional neural network, $h_i^{(l)}$ is the output of the l-th convolutional layer for the i-th node, $M(i)$ is the set of neighboring nodes of the i-th node, $W^{(l)}$ is a trainable parameter of the l-th layer, and the input of the first convolutional layer, $h_i^{(0)}$, is the semantic vector of the i-th node.
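A direct rendering of the claim-5 convolution, written out node by node for clarity; this is an illustrative sketch (real implementations would batch the sum as a sparse matrix product, and here the weight is applied as `h @ W`, the transpose convention of the formula above).

```python
import torch

def gcn_layer(h_prev, neighbors, W, sigma=torch.relu):
    # h_prev: (n, d) layer l-1 outputs h_j^(l-1); W: (d, d) trainable parameter
    # neighbors[i]: the set M(i) of nodes adjacent to node i
    out = []
    for i in range(h_prev.size(0)):
        agg = torch.zeros(W.size(1))
        for j in neighbors[i]:
            agg = agg + h_prev[j] @ W          # sum over j in M(i) of W^(l) h_j^(l-1)
        out.append(sigma(agg))                 # h_i^(l)
    return torch.stack(out)

h0 = torch.randn(4, 8)                         # first-layer input: semantic vectors
h1 = gcn_layer(h0, {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, torch.randn(8, 8))
```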
6. The method of claim 1, wherein determining the document structure diagram of the target text comprises:
performing word segmentation processing on the sentences in the target text;
analyzing the dependency relationship of the sentences to determine the dependency edges among the words in the sentences;
determining adjacent edges of adjacent sentences in the target text;
performing coreference resolution processing on the target text, and determining coreference resolution edges, wherein the nodes connected by a coreference resolution edge represent the same object;
and connecting the same words in the adjacent sentences in the target text, and determining the same word edges.
7. The method of claim 6, wherein determining the document structure diagram of the target text further comprises:
and in response to the sentences in the target text being dialog sentences of a plurality of roles, connecting adjacent sentences of the same role and determining role edges.
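An illustrative construction of the document structure graph of claims 6 and 7. Dependency parsing and coreference resolution are assumed to run upstream of this function; edges are returned as pairs of global word indices, and all parameter names are hypothetical.

```python
def build_document_graph(sentences, dep_edges, coref_pairs=(), speakers=None):
    # sentences: list of word lists (one list per sentence, already segmented)
    # dep_edges[s]: (head, dependent) index pairs within sentence s
    # coref_pairs: global-index pairs whose words refer to the same object
    # speakers: optional role label per sentence, for multi-role dialog texts
    starts, offset = [], 0
    for words in sentences:
        starts.append(offset)
        offset += len(words)
    edges = set()
    for s, pairs in enumerate(dep_edges):                     # dependency edges
        edges |= {(starts[s] + h, starts[s] + d) for h, d in pairs}
    for s in range(len(sentences) - 1):
        edges.add((starts[s], starts[s + 1]))                 # adjacent-sentence edges
        for i, w in enumerate(sentences[s]):                  # same-word edges
            for j, v in enumerate(sentences[s + 1]):
                if w == v:
                    edges.add((starts[s] + i, starts[s + 1] + j))
    edges |= set(coref_pairs)                                 # coreference resolution edges
    if speakers:                                              # role edges (claim 7)
        last = {}
        for s, spk in enumerate(speakers):
            if spk in last:
                edges.add((starts[last[spk]], starts[s]))
            last[spk] = s
    return edges
```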
8. The method according to any of claims 5-7, wherein the summary generation model is trained by:
acquiring a first data set, wherein the first data set comprises a plurality of multi-role dialog texts in a first language and abstracts of each dialog text;
determining a document structure diagram of each dialog text in the first data set;
inputting each dialog text and a document structure diagram of each dialog text in the first data set into the abstract generating model for processing;
and adjusting parameters of the abstract generating model according to the output of the abstract generating model and the abstract of each dialog text so as to train the abstract generating model.
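A generic training loop matching the claim-8 procedure. The per-token cross-entropy loss over summary logits is an assumption about the model's output, which the claim leaves open.

```python
import torch

def train(model, dataset, epochs=3, lr=1e-3):
    # dataset yields (token_ids, adjacency, summary_ids) triples built from
    # the multi-role dialog texts and their reference abstracts
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, adjacency, summary_ids in dataset:
            logits = model(token_ids, adjacency)   # assumed shape: (summary_len, vocab)
            loss = loss_fn(logits, summary_ids)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The same loop applies unchanged to the second-language training of claim 9; only the data set differs.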
9. The method according to any of claims 5-7, wherein the summary generation model is trained by:
acquiring a second data set, wherein the second data set comprises a plurality of second language texts and abstracts of the second language texts;
determining a document structure diagram of each second language text in the second data set;
inputting each second language text in the second data set and a document structure diagram of each second language text into the abstract generating model for processing;
and adjusting parameters of the abstract generating model according to the output of the abstract generating model and the abstract of each second language text so as to train the abstract generating model.
10. An apparatus for generating a text summary, the apparatus comprising:
a target text acquisition unit configured to acquire a target text;
the document structure diagram determining unit is configured to determine a document structure diagram of the target text, and the document structure diagram is used for representing the relationship between words and sentences in the target text;
a semantic vector determination unit configured to determine semantic vectors of words in the target text;
a structure vector determining unit, configured to input the semantic vector of each word and the document structure diagram into a graph network in a summary generation model, so as to output the structure vector of each word according to the relationship between each word;
a target word vector obtaining unit configured to determine a target word vector of each word according to the semantic vector and the structure vector of each word;
and the abstract generating unit is configured to determine an abstract of the target text according to the target word vector of each word.
11. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-9.
12. A computer-readable storage medium on which computer program instructions are stored, which computer program instructions, when executed by a processor, are to implement a method according to any one of claims 1-9.
CN202010089026.3A 2020-02-12 2020-02-12 Text abstract generation method and device, electronic equipment and readable storage medium Pending CN113254573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010089026.3A CN113254573A (en) 2020-02-12 2020-02-12 Text abstract generation method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113254573A true CN113254573A (en) 2021-08-13

Family

ID=77219875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010089026.3A Pending CN113254573A (en) 2020-02-12 2020-02-12 Text abstract generation method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113254573A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200042583A1 (en) * 2017-11-14 2020-02-06 Tencent Technology (Shenzhen) Company Limited Summary obtaining method, apparatus, and device, and computer-readable storage medium
WO2019214145A1 (en) * 2018-05-10 2019-11-14 平安科技(深圳)有限公司 Text sentiment analyzing method, apparatus and storage medium
CN110532554A (en) * 2019-08-26 2019-12-03 南京信息职业技术学院 A kind of Chinese abstraction generating method, system and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUE YIFENG; HUANG WEI; REN XIANGHUI: "A BERT-Based Method for Constructing an Automatic Text Summarization Model", Computer and Modernization (计算机与现代化), no. 01 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115828926A (en) * 2022-11-30 2023-03-21 华中科技大学 Construction quality hidden danger data mining model training method and mining system
CN115828926B (en) * 2022-11-30 2023-08-04 华中科技大学 Construction quality hidden danger data mining model training method and mining system

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN108647205B (en) Fine-grained emotion analysis model construction method and device and readable storage medium
Puzikov et al. E2e nlg challenge: Neural models vs. templates
US11861307B2 (en) Request paraphrasing system, request paraphrasing model and request determining model training method, and dialogue system
CN108595629B (en) Data processing method and application for answer selection system
US10025778B2 (en) Training markov random field-based translation models using gradient ascent
US10789431B2 (en) Method and system of translating a source sentence in a first language into a target sentence in a second language
CN108733682B (en) Method and device for generating multi-document abstract
CN110704621A (en) Text processing method and device, storage medium and electronic equipment
CN109783631B (en) Community question-answer data verification method and device, computer equipment and storage medium
KR101877161B1 (en) Method for context-aware recommendation by considering contextual information of document and apparatus for the same
KR101923780B1 (en) Consistent topic text generation method and text generation apparatus performing the same
CN110245349B (en) Syntax dependence analysis method and apparatus, and electronic device
US11082369B1 (en) Domain-specific chatbot utterance collection
JP4534666B2 (en) Text sentence search device and text sentence search program
CN111444713B (en) Method and device for extracting entity relationship in news event
CN113254573A (en) Text abstract generation method and device, electronic equipment and readable storage medium
KR101646159B1 (en) The method and apparatus for analyzing sentence based on semantic role labeling
CN113705207A (en) Grammar error recognition method and device
CN113761875B (en) Event extraction method and device, electronic equipment and storage medium
Adewoyin et al. RSTGen: imbuing fine-grained interpretable control into long-form text generators
JP5542732B2 (en) Data extraction apparatus, data extraction method, and program thereof
CN111339287B (en) Abstract generation method and device
CN110929501B (en) Text analysis method and device
JP6907703B2 (en) Analytical equipment, analysis method, and analysis program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination