CN116628186A - Text abstract generation method and system - Google Patents

Text abstract generation method and system

Info

Publication number
CN116628186A
CN116628186A
Authority
CN
China
Prior art keywords
abstract
sentences
target text
text information
nodes
Prior art date
Legal status
Granted
Application number
CN202310869688.6A
Other languages
Chinese (zh)
Other versions
CN116628186B (en)
Inventor
李志杰
郭晋
姜波清
于瑞清
刀国羚
Current Assignee
Lemai Information Technology Hangzhou Co ltd
Original Assignee
Lemai Information Technology Hangzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Lemai Information Technology Hangzhou Co ltd filed Critical Lemai Information Technology Hangzhou Co ltd
Priority to CN202310869688.6A priority Critical patent/CN116628186B/en
Publication of CN116628186A publication Critical patent/CN116628186A/en
Application granted granted Critical
Publication of CN116628186B publication Critical patent/CN116628186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/345: Summarisation for human users
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/242: Dictionaries
    • G06F40/258: Heading extraction; Automatic titling; Numbering
    • G06F40/30: Semantic analysis
    • G06N3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/045: Combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The present disclosure provides a text abstract generation method and system. The method obtains the byte length of target text information and, if the byte length exceeds a preset byte threshold, screens the target text information to determine a screening abstract corresponding to it. The screening abstract is split into a plurality of sentences, the sentences are represented as vectors, and the similarity among the sentences is determined; taking the sentences as nodes and the similarities among the sentences as connecting edges, an abstract graph corresponding to the screening abstract is constructed. A preset abstract generation model then extracts local semantic features and global semantic features of the nodes in the abstract graph, together with the position information of each word in the nodes, and corresponding weight coefficients are assigned to the local semantic features, global semantic features, and position information according to an attention mechanism to generate a text abstract corresponding to the target text information.

Description

Text abstract generation method and system
Technical Field
The disclosure relates to text abstract extraction technology, and in particular relates to a text abstract generation method and system.
Background
With the accelerating pace of modern life, the amount of text data is growing rapidly, and this flood of text makes it difficult for both humans and computers to quickly acquire and process the main information in text. How to rapidly and effectively extract the core content from massive text information by mathematical and technical means, and condense it into a high-quality abstract, has therefore become a problem demanding urgent solution.
CN201610232659.9 discloses a text abstract generating system and method based on an encoding-decoding deep neural network, comprising an Internet text acquisition module for acquiring text information on the Internet; a data preprocessing module for preprocessing the text information; an abstract model training module for extracting a fixed amount of text information from the preprocessed text and training on it according to the encoding-decoding deep neural network model to obtain an abstract training model; and an abstract generation module for passing the preprocessed text information through the encoding-decoding deep neural network model and outputting abstract information of a preset length.
Specifically, extraction technology based on the encoder-decoder architecture is limited by the receptive field of the architecture: apart from some heavyweight models, most models can only encode and decode a small number of sentences of the input text, so the semantic information that the neural model can capture is limited. Moreover, since extractive techniques based on the encoder-decoder architecture typically rely on sequential input, they tend to ignore the structural information of text that is critical for locating key content.
CN202210433352.0 discloses a method, a device, computer equipment and a storage medium for generating an extractive text abstract, comprising the steps of numbering each sentence in a training corpus; performing word segmentation on each numbered sentence to obtain word segmentation results; calculating the word-level information entropy and phrase-level information entropy of each sentence from the word segmentation results; performing feature extraction on the sentences and obtaining sentence-level information entropy from the extracted sentence feature vectors; calculating the contribution degree of each sentence from the word-level, phrase-level, and sentence-level information entropies; selecting target training sentences based on the contribution degree; training a pre-constructed neural network on the target training sentences to obtain a text abstract generation model; and inputting the target text into the text abstract generation model and outputting the text abstract.
In general, because extraction technology reuses text units from the original text, semantic consistency within the generated units is guaranteed to a certain extent; however, the abstract generated by this technology is often not concise enough, the overly coarse granularity leaves redundant text in the abstract result, and consistency among the text units cannot be well guaranteed.
Disclosure of Invention
The embodiments of the disclosure provide a text abstract generation method and system, aiming to solve some of the technical defects in the prior art described above.
In a first aspect of embodiments of the present disclosure,
the method for generating the text abstract comprises the following steps:
acquiring the byte length of target text information, and screening the target text information to determine a screening abstract corresponding to the target text information if the byte length exceeds a preset byte threshold, wherein the screening abstract is obtained by extracting text content which is strongly associated with the target abstract in the target text information;
splitting the screening abstract into a plurality of sentences, representing the sentences as vectors, determining the similarity among the sentences, taking the sentences as nodes and the similarity among the sentences as connecting edges, and constructing an abstract graph corresponding to the screening abstract;
extracting local semantic features and global semantic features of nodes in the abstract graph and position information of each word in the nodes through a preset abstract generation model, and respectively assigning corresponding weight coefficients to the local semantic features, the global semantic features and the position information according to an attention mechanism to generate a text abstract corresponding to the target text information, wherein the abstract generation model is formed by combining a plurality of neural networks and is used for extracting a text abstract from text information.
In an alternative embodiment of the present invention,
the step of screening the target text information to determine a screening abstract corresponding to the target text information comprises the following steps:
if the byte length of the target text information exceeds a preset byte threshold value, determining the number of sentences of the target text information, and carrying out segmentation processing on the target text information based on the number of sentences to construct a segmented text set;
extracting key sentences of the segmented text set through a key sentence extraction algorithm according to the segmented text set, and constructing a candidate abstract set by combining a greedy selection algorithm;
and obtaining the theme characteristics and the semantic characteristics of the candidate abstract set, determining the matching degree of the candidate abstract set and the target text information according to the theme characteristics and the semantic characteristics, and sequencing the candidate abstract set according to the matching degree to generate a screening abstract.
In an alternative embodiment of the present invention,
the matching degree of the candidate abstract set and the target text information is determined as shown in the following formula:
M(A, X) = (w1 · sim(T_A, T_X) + w2 · sim(S_A, S_X)) / w_avg, where w_avg = (w1 + w2) / 2;
wherein M(A, X) represents the matching degree of the candidate abstract set A and the target text information X; w1 and w2 respectively represent the first weight value corresponding to the theme feature set and the second weight value corresponding to the semantic feature set; sim() represents a similarity function; T_X and S_X respectively represent the theme feature set and the semantic feature set of the target text information; T_A and S_A respectively represent the theme feature set and the semantic feature set of the candidate abstract set; and w_avg represents the average of the first weight value and the second weight value.
In an alternative embodiment of the present invention,
the extracting the position information of each word in the node in the abstract graph by the preset abstract generating model comprises the following steps:
part-of-speech tagging is carried out on words contained in the nodes of the abstract graph, and first position information of each word in a single sentence is determined according to a position coding module preset in the abstract generation model;
analyzing the position dependency relationship of the words in each node by a dependency syntax analysis method according to the graph position information of each node in the abstract graph;
And distributing corresponding position coefficients for the first position information and the position dependency relationship respectively based on the pre-extracted keyword set of the screening abstract and the spatial position relationship between each word in the nodes in the abstract graph, and determining the position information of each word in the nodes in the abstract graph.
In an alternative embodiment of the present invention,
the extracting global semantic features of nodes in the abstract graph through a preset abstract generation model comprises the following steps:
mapping words contained in nodes of the abstract graph into dictionary sequence numbers of a preset dictionary in the abstract generation model, and mapping the words into word vectors according to the dictionary sequence numbers;
after convolution and pooling operations are performed on the word vectors, determining first semantic features; performing forward coding and backward coding on the word vectors, performing length compression, and determining second semantic features;
and after the first semantic features and the second semantic features are spliced, determining global semantic features of nodes in the abstract graph.
In a second aspect of the embodiments of the present disclosure,
there is provided a text summary generation system comprising:
the first unit is used for acquiring the byte length of the target text information, and screening the target text information to determine a screening abstract corresponding to the target text information if the byte length exceeds a preset byte threshold value, wherein the screening abstract is obtained by extracting text content which is strongly associated with the target abstract from the target text information;
The second unit is used for splitting the screening abstract into a plurality of sentences, representing the sentences as vectors, determining the similarity among the sentences, taking the sentences as nodes and the similarity among the sentences as connecting edges, and constructing an abstract graph corresponding to the screening abstract;
and the third unit is used for extracting local semantic features and global semantic features of the nodes in the abstract graph and position information of each word in the nodes through a preset abstract generation model, respectively distributing corresponding weight coefficients for the local semantic features, the global semantic features and the position information according to an attention mechanism, and generating a text abstract corresponding to the target text information, wherein the abstract generation model is formed by combining a plurality of neural networks and is used for extracting the text abstract in the text information.
In an alternative embodiment of the present invention,
the first unit is further configured to:
if the byte length of the target text information exceeds a preset byte threshold value, determining the number of sentences of the target text information, and carrying out segmentation processing on the target text information based on the number of sentences to construct a segmented text set;
Extracting key sentences of the segmented text set through a key sentence extraction algorithm according to the segmented text set, and constructing a candidate abstract set by combining a greedy selection algorithm;
and obtaining the theme characteristics and the semantic characteristics of the candidate abstract set, determining the matching degree of the candidate abstract set and the target text information according to the theme characteristics and the semantic characteristics, and sequencing the candidate abstract set according to the matching degree to generate a screening abstract.
In an alternative embodiment of the present invention,
the matching degree of the candidate abstract set and the target text information is determined as shown in the following formula:
M(A, X) = (w1 · sim(T_A, T_X) + w2 · sim(S_A, S_X)) / w_avg, where w_avg = (w1 + w2) / 2;
wherein M(A, X) represents the matching degree of the candidate abstract set A and the target text information X; w1 and w2 respectively represent the first weight value corresponding to the theme feature set and the second weight value corresponding to the semantic feature set; sim() represents a similarity function; T_X and S_X respectively represent the theme feature set and the semantic feature set of the target text information; T_A and S_A respectively represent the theme feature set and the semantic feature set of the candidate abstract set; and w_avg represents the average of the first weight value and the second weight value.
In an alternative embodiment of the present invention,
the third unit is further configured to:
Part-of-speech tagging is carried out on words contained in the nodes of the abstract graph, and first position information of each word in a single sentence is determined according to a position coding module preset in the abstract generation model;
analyzing the position dependency relationship of the words in each node by a dependency syntax analysis method according to the graph position information of each node in the abstract graph;
and distributing corresponding position coefficients for the first position information and the position dependency relationship respectively based on the pre-extracted keyword set of the screening abstract and the spatial position relationship between each word in the nodes in the abstract graph, and determining the position information of each word in the nodes in the abstract graph.
In an alternative embodiment of the present invention,
the third unit is further configured to:
mapping words contained in nodes of the abstract graph into dictionary sequence numbers of a preset dictionary in the abstract generation model, and mapping the words into word vectors according to the dictionary sequence numbers;
after convolution and pooling operations are performed on the word vectors, determining first semantic features; performing forward coding and backward coding on the word vectors, performing length compression, and determining second semantic features;
And after the first semantic features and the second semantic features are spliced, determining global semantic features of nodes in the abstract graph.
In a third aspect of the embodiments of the present disclosure,
there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of embodiments of the present disclosure,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
For the beneficial effects of the text abstract generation of the disclosure, reference may be made to the effects of the corresponding technical features in the specific embodiments, which are not repeated here.
Drawings
FIG. 1 is a schematic flow chart of a text summary generation method according to an embodiment of the disclosure;
fig. 2 is a schematic diagram of a text summarization system according to an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
The technical scheme of the present disclosure is described in detail below with specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of a text summary generation method according to an embodiment of the disclosure, as shown in fig. 1, where the method includes:
s101, acquiring byte length of target text information, and screening the target text information to determine a screening abstract corresponding to the target text information if the byte length exceeds a preset byte threshold, wherein the screening abstract is obtained by extracting text content which is strongly associated with the target abstract in the target text information;
in the existing abstract generation model, the input end is a text sequence combination of an original document and a reference abstract, but with the increase of the length of the original document, the training difficulty of the abstract generation model can be increased due to an overlong input sequence. In addition, the existing abstract generation model limits the length of the input sequence to 512 characters, and the excess part is truncated, which results in information loss of the text sequence.
Illustratively, a preset byte threshold is set for determining whether the byte length of the target text information exceeds the limit; it may be chosen according to specific requirements and system constraints, and is assumed here to be 500 bytes. The target text information to be screened is obtained, which may be text input by a user, text records in a database, or text data from other sources. The byte length of the target text information is then calculated: the target text information is encoded, typically in UTF-8, and the byte length is computed. Under UTF-8 encoding, different characters may occupy different numbers of bytes; for example, English characters typically occupy 1 byte, while Chinese characters typically occupy 3 bytes.
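As an illustrative sketch of the check described above (the function name is hypothetical; 500 bytes is the example threshold from the text):

```python
PRESET_BYTE_THRESHOLD = 500  # example value from the text

def exceeds_threshold(text: str, threshold: int = PRESET_BYTE_THRESHOLD) -> bool:
    """Return True if the UTF-8 byte length of `text` exceeds `threshold`.

    Under UTF-8, English characters typically occupy 1 byte and Chinese
    characters typically occupy 3 bytes, so the byte length can differ
    from the character count len(text).
    """
    return len(text.encode("utf-8")) > threshold
```

For example, 200 Chinese characters encode to roughly 600 bytes and would exceed a 500-byte threshold, while 200 English letters (200 bytes) would not.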
Whether the byte length exceeds the preset threshold is then judged: the byte length of the target text information is compared with the preset byte threshold, and if it exceeds the threshold, the abstract screening operation is performed. Text content strongly associated with the target abstract is then extracted; determining which text content is strongly associated with the target abstract may involve keyword extraction, text summarization algorithms, or other related techniques, depending on actual requirements.
In an alternative embodiment of the present invention,
if the byte length of the target text information exceeds a preset byte threshold value, determining the number of sentences of the target text information, and carrying out segmentation processing on the target text information based on the number of sentences to construct a segmented text set;
extracting key sentences of the segmented text set through a key sentence extraction algorithm according to the segmented text set, and constructing a candidate abstract set by combining a greedy selection algorithm;
and obtaining the theme characteristics and the semantic characteristics of the candidate abstract set, determining the matching degree of the candidate abstract set and the target text information according to the theme characteristics and the semantic characteristics, and sequencing the candidate abstract set according to the matching degree to generate a screening abstract.
Illustratively, if the byte length of the target text information does not exceed the preset byte threshold, the target text information is processed as a paragraph without segmentation processing; if the byte length of the target text information exceeds a preset byte threshold, the following steps are performed.
The target text information is divided into a plurality of sentences using a sentence-splitting algorithm, and the number of resulting sentences is counted and recorded as N. The average number of sentences per segment is computed as M = ceil(N / k), where k is a segmentation coefficient used to control the number of sentences per segment. The sentences are then grouped, M sentences to a segment, to construct the segmented text set;
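The grouping step above can be sketched as simple list slicing; the function name is illustrative, and the final segment may hold fewer than M sentences:

```python
import math

def segment_sentences(sentences: list[str], k: int) -> list[list[str]]:
    """Group N sentences into segments of M = ceil(N / k) sentences each,
    where k is the segmentation coefficient controlling segment size."""
    if not sentences:
        return []
    m = math.ceil(len(sentences) / k)  # M = ceil(N / k)
    return [sentences[i:i + m] for i in range(0, len(sentences), m)]
```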
For each segmented text, a key sentence extraction algorithm is used to extract key sentences from the segmented text to form a candidate abstract set. The key sentence extraction algorithm can be implemented using techniques based on TF-IDF, TextRank, BERT, and the like. Based on a greedy selection algorithm, the candidate abstract set is selected and ordered, and the sentences with the highest correlation with the overall abstract are added to the candidate abstract set.
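A lightweight illustration of key-sentence extraction with greedy selection, using a summed TF-IDF sentence score as a stand-in for the TF-IDF/TextRank/BERT options mentioned above; the function name and whitespace tokenization are assumptions (real Chinese text would first require word segmentation):

```python
import math
from collections import Counter

def key_sentences(segments: list[list[str]], top_n: int = 2) -> list[list[str]]:
    """Greedily keep the top_n highest-scoring sentences per segment,
    scoring each sentence by the summed TF-IDF weight of its words.
    Each segment is treated as one document for the IDF statistics."""
    docs = [" ".join(seg).split() for seg in segments]
    n_docs = len(docs)
    # document frequency: in how many segments each word appears
    df = Counter(w for doc in docs for w in set(doc))
    result = []
    for seg, doc in zip(segments, docs):
        tf = Counter(doc)  # term frequency within this segment
        def score(sent: str) -> float:
            return sum(tf[w] * math.log(n_docs / df[w]) for w in sent.split())
        ranked = sorted(seg, key=score, reverse=True)  # greedy: best first
        result.append(ranked[:top_n])
    return result
```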
For each sentence in the candidate abstract set, a topic modeling algorithm (such as LDA) can be used to obtain its topic features, and word embedding models (such as Word2Vec or BERT) can be used to calculate its semantic features. The topic features reveal the topical tendencies of the text, while the semantic features capture the semantic information in sentences. By combining the two kinds of features, the matching degree between the candidate abstract set and the target text information can be measured more accurately, improving the accuracy of the matching degree. If two sentences are very similar in topic or semantic features, the information they convey is likely to be duplicated; by considering both topic features and semantic features in the matching degree, repeated content in the candidate abstract set can be reduced, improving the compactness and accuracy of the generated abstract.
The matching degree score of each sentence in the candidate abstract set against the target text information is calculated from the theme features and the semantic features; the candidate abstract set is sorted by matching degree score, and the top-ranked sentences of the sorted candidate abstract set are selected as the screening abstract.
Illustratively, the determining the matching degree of the candidate abstract set and the target text information is as follows:
M(A, X) = (w1 · sim(T_A, T_X) + w2 · sim(S_A, S_X)) / w_avg, where w_avg = (w1 + w2) / 2;
wherein M(A, X) represents the matching degree of the candidate abstract set A and the target text information X; w1 and w2 respectively represent the first weight value corresponding to the theme feature set and the second weight value corresponding to the semantic feature set; sim() represents a similarity function; T_X and S_X respectively represent the theme feature set and the semantic feature set of the target text information; T_A and S_A respectively represent the theme feature set and the semantic feature set of the candidate abstract set; and w_avg represents the average of the first weight value and the second weight value.
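Based on the definitions in the passage above (two weight values, a similarity function, and topic and semantic feature sets), the matching-degree computation can be sketched as follows; the exact combining form is reconstructed from the surrounding description and should be treated as an assumption:

```python
from typing import Callable, Any

def matching_degree(w1: float, w2: float,
                    sim: Callable[[Any, Any], float],
                    topic_a: Any, topic_x: Any,
                    sem_a: Any, sem_x: Any) -> float:
    """Weighted sum of topic-feature similarity and semantic-feature
    similarity between candidate abstract A and target text X,
    normalised by the mean of the two weights."""
    w_avg = (w1 + w2) / 2
    return (w1 * sim(topic_a, topic_x) + w2 * sim(sem_a, sem_x)) / w_avg
```

Any similarity function can be plugged in for `sim`, e.g. cosine similarity over feature vectors.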
In the extraction process of segmented abstracts, existing models calculate only the semantic matching degree between the segmented text and the segment's candidate abstract, without considering the relevance between the segment candidate abstract and the original document. As a result, the extraction result of each segment has low relevance to the original document while the extraction results of the segments are highly correlated with one another, which ultimately increases redundancy among the extracted texts and degrades the input source of the abstract generation model.
The method for generating the abstract effectively generates the screening abstract related to the target text information through the steps of segmentation processing, key sentence extraction, greedy selection, acquisition of theme features and semantic features, matching degree sequencing and the like, can improve the information concentration effect, effectively reduces the information redundancy, extracts the content with the most representativeness and importance, and lays a foundation for subsequent abstract generation.
An extraction process is added before the abstract generation model, and the text is divided into a plurality of fragments for extraction, with the result used as the input of the abstract model. This not only reduces the input length of the abstract model and improves the parallelism of the model, but the extracted input sequence also contains less text information irrelevant to the semantics of the original document, refining the input source of the model.
S102, splitting the screening abstract into a plurality of sentences, representing the sentences as vectors, determining the similarity among the sentences, taking the sentences as nodes and the similarity among the sentences as connecting edges, and constructing an abstract graph corresponding to the screening abstract;
in an alternative embodiment of the present application,
for an input text to be summarized, the application converts the text into a fully connected graph G = {V, E} with sentences as nodes, where V = {S1, S2, ⋯, Sn} represents the set of nodes in the graph and E represents the set of connecting edges between nodes. In the fully connected graph G = {V, E}, every node is connected to every other node; by converting text summarization into operations on this fully connected graph, hidden semantic features can be mined through the propagation of node messages, yielding a higher-quality abstract.
The screening abstract is split into a plurality of sentences using the sentence as the division standard, the split sentences are converted into vector representations using word2vec, an adjacency matrix of the vectorized sentences is determined, the similarity between each pair of sentences is computed via cosine similarity based on the adjacency matrix, and the two nodes with the highest similarity are connected, thereby constructing the abstract graph corresponding to the screening abstract.
By way of example only, and in an illustrative,
screening abstract: "Dogs are faithful friends of humans, and there are many aspects to pay attention to when raising a dog. In addition, companionship and care are also important parts of caring for a dog." Split sentences:
sentence 1: "Dogs are faithful friends of humans";
sentence 2: "there are many aspects to pay attention to when raising a dog";
sentence 3: "In addition, companionship and care are also important parts of caring for a dog";
sentence vectorization: each sentence is converted into a vector representation for similarity calculation and construction of the abstract graph. The vector representation of a sentence can be obtained using word embedding models (e.g., Word2Vec, GloVe) or pre-trained language models (e.g., BERT, GPT). Here, each sentence is converted into a vector representation using the pre-trained language model BERT.
Sentence 1 vector representation: [0.12, 0.56, 0.78, …];
sentence 2 vector representation: [0.43, 0.78, 0.21, …];
sentence 3 vector representation: [0.76, 0.32, 0.91, …];
the similarity between the sentences is calculated by using the sentences represented by the vectors, and a summary graph is constructed, and common similarity calculation methods comprise cosine similarity, euclidean distance and the like. For example, the cosine similarity between sentences is calculated, and a summary graph is constructed:
similarity of sentence 1 to sentence 2: 0.85; similarity of sentence 1 to sentence 3: 0.42; similarity of sentence 2 to sentence 3: 0.61;
the abstract graph constructed is as follows:
and (3) node: sentence 1, sentence 2, sentence 3;
edges: similarity between sentence 1 and sentence 2, similarity between sentence 1 and sentence 3, and similarity between sentence 2 and sentence 3.
By splitting the screening abstract into a plurality of sentences, the text information can be represented at a finer granularity, so the abstract graph can express more semantic relations. Converting each sentence into a vector representation turns the text information into a machine-processable numerical form, facilitating subsequent similarity calculation and graph construction. Taking sentences as nodes and similarity as connecting edges, the abstract graph is constructed; it presents the relations among sentences in graph form and can intuitively show the structure and relevance of the text information. Further analysis of the abstract graph can uncover deeper patterns and information in the text, providing more insight and improving the ability to grasp and mine textual information.
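The graph construction of S102 can be sketched as follows. The sentence vectors are assumed to be precomputed (e.g., by Word2Vec or BERT as discussed above); the adjacency-dict representation and the edge-weight rounding are illustrative choices:

```python
import math

def build_summary_graph(sentence_vectors):
    """Sentences (given as precomputed vectors) become nodes; the cosine
    similarity between every pair becomes the weight of the connecting
    edge, producing a fully connected graph as an adjacency dict."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    n = len(sentence_vectors)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            w = round(cos(sentence_vectors[i], sentence_vectors[j]), 4)
            graph[i][j] = graph[j][i] = w  # undirected weighted edge
    return graph
```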
S103, extracting local semantic features and global semantic features of nodes in the abstract graph and position information of each word in the nodes through a preset abstract generation model, and respectively distributing corresponding weight coefficients for the local semantic features, the global semantic features and the position information according to an attention mechanism to generate a text abstract corresponding to the target text information, wherein the abstract generation model is formed by combining a plurality of neural networks and is used for extracting the text abstract in the text information.
Illustratively, the abstract generation model of the embodiment of the application can be formed by combining a plurality of neural networks, which may include a convolutional neural network, a recurrent neural network, or a deep learning model adopting a self-attention mechanism; the text abstract in the text information is extracted by combining the plurality of neural networks.
It can be appreciated that the local semantic features can help identify keywords, phrases or concepts in sentences, thereby analyzing semantic associations between them and revealing hidden information and relationships in the text; when the abstract is generated, the key information in the sentences can be better selected and combined by utilizing the local semantic features, so that the accuracy and the integrity of the generated abstract are improved.
The global semantic features can capture semantic information of the whole text, perform global analysis and understanding on the text, and help understand the theme, structure and logic of the text; by extracting global semantic features, the meaning of sentences or phrases can be better understood and context-dependent reasoning and generation can be performed taking into account the context information in the text.
By analyzing the position information of words in the nodes, semantic roles of the words in sentences, such as subjects, predicates, objects and the like, can be deduced, so that the structure and grammar of the sentences are deeply understood; the position information of the words in the nodes can help to carry out syntactic analysis, identify phrase structures and dependency relations in sentences, and reveal grammatical relations among different components in the sentences; the positional information of the words in the nodes can help understand the contextual meaning of the words in the sentence, better capture the contextual context and infer the meaning of the words by analyzing the positional relationship of the words in the sentence.
Further, by assigning weight coefficients to the local semantic features and the global semantic features, the expression of semantic relevance can be enhanced; the features with higher weights have more influence, can better capture the semantic relation and the context in the text, and improve the effect of text understanding and task generating. Assigning weight coefficients to the location information of words in the nodes can model context and consistency when generating text; by paying attention to the position information of the words, the generated text can be ensured to have consistency in syntax and semanteme, so that the consistency and fluency of text generation are improved; the local semantic features, the global semantic features and the word position information can be weighted by distributing weight coefficients, so that the importance of different features is adjusted; this ensures that important information is more focused when generating summaries or performing other tasks, reducing the impact on irrelevant or secondary information, improving the accuracy and relevance of the results.
In an alternative embodiment of the present invention,
the extracting the position information of each word in the node in the abstract graph by the preset abstract generating model comprises the following steps:
part-of-speech tagging is carried out on words contained in the nodes of the abstract graph, and first position information of each word in a single sentence is determined according to a position coding module preset in the abstract generation model;
analyzing the position dependency relationship of the words in each node by a dependency syntax analysis method according to the graph position information of each node in the abstract graph;
and distributing corresponding position coefficients for the first position information and the position dependency relationship respectively based on the pre-extracted keyword set of the screening abstract and the spatial position relationship between each word in the nodes in the abstract graph, and determining the position information of each word in the nodes in the abstract graph.
Illustratively, let us assume that we have a summary graph that contains a plurality of nodes, each node representing a sentence. For example, we consider the following summary map of three nodes:
Node 1: "The cat is sitting on the mat."; Node 2: "The dog is running in the park."; Node 3: "The bird is flying in the sky." A part-of-speech tagger may be used to tag the words in each node; for example, using the Stanford part-of-speech tagger, Node 1 is tagged as: "The/DT cat/NN is/VBZ sitting/VBG on/IN the/DT mat/NN". Then, first position information is determined for the words in each node; a relative-distance code may be used. For example, for the words in Node 1, the following codes may be used:
The: 000, cat: 001, is: 002, sitting: 003, on: 004, mat: 005,
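The relative-distance coding shown above can be sketched in a few lines. Assigning each distinct word (case-insensitive, first occurrence) a zero-padded index reproduces the example codes; whether the patent's position-coding module actually deduplicates repeated words this way is an assumption:

```python
def relative_position_codes(sentence):
    """Zero-padded three-digit relative position code for each distinct
    word in a sentence, matching the worked example above."""
    codes, idx = {}, 0
    for word in sentence.split():
        key = word.lower()
        if key not in codes:  # first occurrence only, case-insensitive
            codes[key] = f"{idx:03d}"
            idx += 1
    return codes
```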
A dependency syntax analyzer is used to perform dependency syntax analysis on each node to determine dependencies between words. For example, using the Stanford dependency syntax analyzer, node 1 is analyzed to obtain the following dependencies:
det(cat, The), nsubj(sitting, cat), cop(sitting, is), root(ROOT, sitting), case(mat, on), det(mat, the), nmod(on, mat),
based on the pre-extracted keyword set of the screening abstract and the spatial position relation between words and keywords in the nodes, corresponding position coefficients can be distributed for the position dependency relation and the first position information.
First, according to the pre-extracted keyword set (for example, "cat", "dog", "bird") and the words in the nodes of the abstract graph, the spatial position relationship between each word and the keywords can be determined (for example, "The" lies to the left of the keyword "cat", while "is" lies to the right of it).
Next, a position coefficient is assigned to the position dependency relationship and the first position information based on the spatial position relationship. For example, if the positional relationship of "cat" and "dog" is to the right, a higher weight may be assigned to the positional dependency relationship, indicating that "cat" has a greater effect on the position of "dog".
The final position information can be determined by comprehensively considering the first position information, the position dependency relationship and the position coefficients thereof, and the first position information and the position dependency relationship can be subjected to weighted fusion or other processing modes according to specific requirements so as to obtain the most accurate position information.
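The weighted fusion of the first position information and the position dependency relationship might look like the sketch below. The coefficients c1, c2, the keyword-adjacency bonus, and the reciprocal decay are illustrative assumptions; the patent does not publish the fusion formula:

```python
def word_position_score(first_pos, dep_depth, keyword_adjacent,
                        c1=0.5, c2=0.3, bonus=0.2):
    """Fuse two position signals into one score: first_pos is the word's
    relative index in its sentence (first position information), dep_depth
    its depth in the dependency tree (position dependency relationship),
    and a bonus is added when the word is spatially adjacent to a
    pre-extracted keyword. c1, c2 and bonus act as position coefficients."""
    score = c1 / (1 + first_pos) + c2 / (1 + dep_depth)
    if keyword_adjacent:
        score += bonus
    return round(score, 4)
```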
The part of speech of each word can be determined by marking the part of speech of the word in the abstract graph node. Meanwhile, the first position information can be determined for each word in a single sentence by utilizing a preset position coding module, so that grammar roles and position information of the words can be combined, and a basis is provided for subsequent position relation analysis; by performing dependency syntactic analysis on each node in the abstract graph, the dependency relationship among the words in the nodes can be analyzed, so that the position dependency relationship of the words is determined, the syntactic structure and semantic association among the words can be captured, and the expression of the position information is further enriched; the position coefficient reflects the importance of the words in the nodes and the contribution degree of the position information, and the position information of the words in the nodes can be more accurately captured by giving weights to the position information.
In an alternative embodiment of the present invention,
the extracting global semantic features of nodes in the abstract graph through a preset abstract generation model comprises the following steps:
mapping words contained in nodes of the abstract graph into dictionary sequence numbers of a preset dictionary in the abstract generation model, and mapping the words into word vectors according to the dictionary sequence numbers;
After the word vectors are subjected to convolution and pooling operations, determining first semantic features; performing forward coding and backward coding on the word vectors, performing length compression, and determining second semantic features;
and after the first semantic features and the second semantic features are spliced, determining global semantic features of nodes in the abstract graph.
Illustratively, the abstract generation model of the present application may be based on a combination of multiple neural networks, which may include a convolutional neural network and a bidirectional long short-term memory (BI-LSTM) network.
Before performing a text summarization task, text representation and feature extraction must be performed on the text data set; the quality of feature extraction has a direct effect on the accuracy of the text summary, so adopting a good feature extraction model is very important for the task. When a BI-LSTM model alone is used to obtain text features, the text information can be encoded in both the forward and the reverse direction, but phrase-level feature information cannot be fully extracted, so locally important information is lost, causing the generated abstract to repeat content and lack diversity.
By way of example, the application performs convolution and pooling operations on the word vectors through a convolutional neural network to obtain first semantic features, where the first semantic features indicate the features output by the convolutional neural network; the convolutional neural network has strong feature-capturing capability and can analyze text effectively. The application performs forward encoding and backward encoding on the word vectors through the bidirectional long short-term memory network and compresses the length, where the second semantic features indicate the features output by the BI-LSTM; encoding the word vectors in both directions captures the contextual relations of words, and through model training this encoding can learn the dependency relations and semantic associations between words, thereby better representing global semantic information. It should be noted that the convolution and pooling operations and the forward/backward encoding operations on word vectors may follow the prior art, and the embodiments of the present application are not limited thereto.
Further, to reduce the subsequent amount of computation, the forward- and backward-encoded word vectors can be length-compressed, and the text features produced by the convolutional neural network are spliced with the output of the bidirectional long short-term memory network. The reason for feature splicing is that the information from the upper layer is not filtered by gating but passes entirely into the lower network structure, so that as much feature information as possible is retained.
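A minimal pure-Python sketch of this feature splicing, with simple stand-ins for both branches (a sliding mean window with max-pooling for the convolution branch, and forward/backward exponential moving averages kept only as final states for the two length-compressed LSTM directions); the operators are illustrative, not the real networks:

```python
def global_features(word_vecs, kernel_size=2, decay=0.5):
    """First semantic features from a "convolution + max-pooling" stand-in,
    second semantic features from forward/backward running states, spliced
    into one global feature vector."""
    dim = len(word_vecs[0])

    def window_mean(i):
        return [sum(v[d] for v in word_vecs[i:i + kernel_size]) / kernel_size
                for d in range(dim)]

    windows = [window_mean(i) for i in range(len(word_vecs) - kernel_size + 1)]
    first = [max(w[d] for w in windows) for d in range(dim)]  # max-pooling

    def ema(seq):
        # exponential moving average; final state = length-compressed encoding
        h = [0.0] * dim
        for v in seq:
            h = [decay * h[d] + (1 - decay) * v[d] for d in range(dim)]
        return h

    second = ema(word_vecs) + ema(list(reversed(word_vecs)))  # fwd + bwd
    return first + second  # spliced global semantic features
```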
In an optional implementation, corresponding weight coefficients are respectively assigned to the local semantic features, the global semantic features and the position information according to an attention mechanism; an existing attention mechanism may be used for this assignment, and the embodiments of the present application are not limited in this respect. To generate the text abstract corresponding to the target text information, the abstract generation model can be integrated with a Seq2Seq model or a Transformer model.
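The weight assignment can be illustrated with a generic softmax attention sketch. The raw relevance scores and the simple weighted-sum fusion are assumptions standing in for whatever attention mechanism the model actually uses:

```python
import math

def attention_weights(scores):
    """Softmax over raw relevance scores — a generic stand-in for the
    attention mechanism assigning the three weight coefficients."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_features(local, global_, position, scores=(1.0, 1.0, 1.0)):
    """Weighted sum of the local semantic features, global semantic
    features and position information (three equal-length vectors)."""
    w = attention_weights(scores)
    return [w[0] * l + w[1] * g + w[2] * p
            for l, g, p in zip(local, global_, position)]
```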
In a second aspect of the embodiments of the present disclosure,
fig. 2 is a schematic structural diagram of a text summarization generating system according to an embodiment of the disclosure, including:
The first unit is used for acquiring the byte length of the target text information, and screening the target text information to determine a screening abstract corresponding to the target text information if the byte length exceeds a preset byte threshold value, wherein the screening abstract is obtained by extracting text content which is strongly associated with the target abstract from the target text information;
the second unit is used for splitting the screening abstract into a plurality of sentences, representing the sentences as vectors, determining the similarity among the sentences, and constructing an abstract graph corresponding to the screening abstract by taking the sentences as nodes and the similarity among the sentences as connecting edges;
and the third unit is used for extracting local semantic features and global semantic features of the nodes in the abstract graph and position information of each word in the nodes through a preset abstract generation model, respectively distributing corresponding weight coefficients for the local semantic features, the global semantic features and the position information according to an attention mechanism, and generating a text abstract corresponding to the target text information, wherein the abstract generation model is formed by combining a plurality of neural networks and is used for extracting the text abstract in the text information.
In an alternative embodiment of the present invention,
the first unit is further configured to:
if the byte length of the target text information exceeds a preset byte threshold value, determining the number of sentences of the target text information, and carrying out segmentation processing on the target text information based on the number of sentences to construct a segmented text set;
extracting key sentences of the segmented text set through a key sentence extraction algorithm according to the segmented text set, and constructing a candidate abstract set by combining a greedy selection algorithm;
and obtaining the theme characteristics and the semantic characteristics of the candidate abstract set, determining the matching degree of the candidate abstract set and the target text information according to the theme characteristics and the semantic characteristics, and sequencing the candidate abstract set according to the matching degree to generate a screening abstract.
In an alternative embodiment of the present invention,
the matching degree of the candidate abstract set and the target text information is determined as shown in the following formula:
wherein M(A, X) represents the matching degree between the candidate abstract set A and the target text information X; α and β respectively represent a first weight value corresponding to the theme feature set and a second weight value corresponding to the semantic feature set; sim() represents a similarity function; T_X and S_X respectively represent the theme feature set and the semantic feature set of the target text information; T_A and S_A respectively represent the theme feature set and the semantic feature set of the candidate abstract set; and w̄ represents the average of the first weight value and the second weight value.
In an alternative embodiment of the present invention,
the third unit is further configured to:
part-of-speech tagging is carried out on words contained in the nodes of the abstract graph, and first position information of each word in a single sentence is determined according to a position coding module preset in the abstract generation model;
analyzing the position dependency relationship of the words in each node by a dependency syntax analysis method according to the graph position information of each node in the abstract graph;
and distributing corresponding position coefficients for the first position information and the position dependency relationship respectively based on the pre-extracted keyword set of the screening abstract and the spatial position relationship between each word in the nodes in the abstract graph, and determining the position information of each word in the nodes in the abstract graph.
In an alternative embodiment of the present invention,
the third unit is further configured to:
mapping words contained in nodes of the abstract graph into dictionary sequence numbers of a preset dictionary in the abstract generation model, and mapping the words into word vectors according to the dictionary sequence numbers;
After the word vectors are subjected to convolution and pooling operations, determining first semantic features; performing forward coding and backward coding on the word vectors, performing length compression, and determining second semantic features;
and after the first semantic features and the second semantic features are spliced, determining global semantic features of nodes in the abstract graph.
In a third aspect of the embodiments of the present disclosure,
there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of embodiments of the present disclosure,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present disclosure, not for limiting them; although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (10)

1. A text summary generation method, comprising:
acquiring the byte length of target text information, and screening the target text information to determine a screening abstract corresponding to the target text information if the byte length exceeds a preset byte threshold, wherein the screening abstract is obtained by extracting text content which is strongly associated with the target abstract in the target text information;
splitting the screening abstract into a plurality of sentences, representing the sentences as vectors, determining the similarity among the sentences, and constructing an abstract graph corresponding to the screening abstract by taking the sentences as nodes and the similarity among the sentences as connecting edges;
extracting local semantic features and global semantic features of nodes in the abstract graph and position information of each word in the nodes through a preset abstract generation model, and respectively distributing corresponding weight coefficients for the local semantic features, the global semantic features and the position information according to an attention mechanism to generate a text abstract corresponding to the target text information, wherein the abstract generation model is formed by combining a plurality of neural networks and is used for extracting the text abstract in the text information.
2. The method of claim 1, wherein the screening the target text information to determine a screening summary corresponding to the target text information comprises:
if the byte length of the target text information exceeds a preset byte threshold value, determining the number of sentences of the target text information, and carrying out segmentation processing on the target text information based on the number of sentences to construct a segmented text set;
extracting key sentences of the segmented text set through a key sentence extraction algorithm according to the segmented text set, and constructing a candidate abstract set by combining a greedy selection algorithm;
and obtaining the theme characteristics and the semantic characteristics of the candidate abstract set, determining the matching degree of the candidate abstract set and the target text information according to the theme characteristics and the semantic characteristics, and sequencing the candidate abstract set according to the matching degree to generate a screening abstract.
3. The method of claim 2, wherein the step of determining the position of the substrate comprises,
the matching degree of the candidate abstract set and the target text information is determined as shown in the following formula:
wherein M(A, X) represents the matching degree between the candidate abstract set A and the target text information X; α and β respectively represent a first weight value corresponding to the theme feature set and a second weight value corresponding to the semantic feature set; sim() represents a similarity function; T_X and S_X respectively represent the theme feature set and the semantic feature set of the target text information; T_A and S_A respectively represent the theme feature set and the semantic feature set of the candidate abstract set; and w̄ represents the average of the first weight value and the second weight value.
4. The method of claim 1, wherein the extracting, by a preset abstract generation model, the location information of each term in the node in the abstract graph in the node comprises:
part-of-speech tagging is carried out on words contained in the nodes of the abstract graph, and first position information of each word in a single sentence is determined according to a position coding module preset in the abstract generation model;
analyzing the position dependency relationship of the words in each node by a dependency syntax analysis method according to the graph position information of each node in the abstract graph;
and distributing corresponding position coefficients for the first position information and the position dependency relationship respectively based on the pre-extracted keyword set of the screening abstract and the spatial position relationship between each word in the nodes in the abstract graph, and determining the position information of each word in the nodes in the abstract graph.
5. The method according to claim 1, wherein the extracting global semantic features of nodes in the abstract map by a preset abstract generation model comprises:
mapping words contained in nodes of the abstract graph into dictionary sequence numbers of a preset dictionary in the abstract generation model, and mapping the words into word vectors according to the dictionary sequence numbers;
after the word vectors are subjected to convolution and pooling operations, determining first semantic features; performing forward coding and backward coding on the word vectors, performing length compression, and determining second semantic features;
and after the first semantic features and the second semantic features are spliced, determining global semantic features of nodes in the abstract graph.
6. A text excerpt generation system, comprising:
the first unit is used for acquiring the byte length of the target text information, and screening the target text information to determine a screening abstract corresponding to the target text information if the byte length exceeds a preset byte threshold value, wherein the screening abstract is obtained by extracting text content which is strongly associated with the target abstract from the target text information;
The second unit is used for splitting the screening abstract into a plurality of sentences, representing the sentences as vectors, determining the similarity among the sentences, and constructing an abstract graph corresponding to the screening abstract by taking the sentences as nodes and the similarity among the sentences as connecting edges;
and the third unit is used for extracting local semantic features and global semantic features of the nodes in the abstract graph and position information of each word in the nodes through a preset abstract generation model, respectively distributing corresponding weight coefficients for the local semantic features, the global semantic features and the position information according to an attention mechanism, and generating a text abstract corresponding to the target text information, wherein the abstract generation model is formed by combining a plurality of neural networks and is used for extracting the text abstract in the text information.
7. The system of claim 6, wherein the first unit is further configured to:
if the byte length of the target text information exceeds a preset byte threshold value, determining the number of sentences of the target text information, and carrying out segmentation processing on the target text information based on the number of sentences to construct a segmented text set;
Extracting key sentences of the segmented text set through a key sentence extraction algorithm according to the segmented text set, and constructing a candidate abstract set by combining a greedy selection algorithm;
and obtaining the theme characteristics and the semantic characteristics of the candidate abstract set, determining the matching degree of the candidate abstract set and the target text information according to the theme characteristics and the semantic characteristics, and sequencing the candidate abstract set according to the matching degree to generate a screening abstract.
8. The system of claim 7, wherein the system further comprises a controller configured to control the controller,
the matching degree of the candidate abstract set and the target text information is determined as shown in the following formula:
wherein M(A, X) represents the matching degree between the candidate abstract set A and the target text information X; α and β respectively represent a first weight value corresponding to the theme feature set and a second weight value corresponding to the semantic feature set; sim() represents a similarity function; T_X and S_X respectively represent the theme feature set and the semantic feature set of the target text information; T_A and S_A respectively represent the theme feature set and the semantic feature set of the candidate abstract set; and w̄ represents the average of the first weight value and the second weight value.
9. The system of claim 6, wherein the third unit is further configured to:
perform part-of-speech tagging on the words contained in the nodes of the abstract graph, and determine first position information of each word within its sentence according to a position coding module preset in the abstract generation model;
analyze the positional dependency relationships of the words in each node by dependency syntax analysis, according to the graph position information of each node in the abstract graph;
and assign corresponding position coefficients to the first position information and the positional dependency relationships respectively, based on the pre-extracted keyword set of the screening abstract and the spatial position relationships among the words in the nodes of the abstract graph, so as to determine the position information of each word in the nodes of the abstract graph.
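The coefficient-weighted fusion of in-sentence position and dependency information in claim 9 could look like this sketch (the coefficient values, the dependency-depth stand-in for the dependency relationship, and the keyword boost are all invented for illustration):

```python
def fuse_position(first_pos, dep_depth, word, keywords,
                  alpha=0.7, beta=0.3, keyword_boost=1.5):
    # Combine the first position information (index of the word in its
    # sentence) with its dependency-tree depth via position coefficients;
    # words from the pre-extracted keyword set get boosted coefficients.
    # All coefficient values here are hypothetical.
    if word in keywords:
        alpha *= keyword_boost
        beta *= keyword_boost
    return alpha * first_pos + beta * dep_depth
```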
10. The system of claim 6, wherein the third unit is further configured to:
map the words contained in the nodes of the abstract graph to dictionary sequence numbers of a dictionary preset in the abstract generation model, and map the words to word vectors according to the dictionary sequence numbers;
determine first semantic features after performing convolution and pooling operations on the word vectors; perform forward coding and backward coding on the word vectors followed by length compression to determine second semantic features;
and splice the first semantic features and the second semantic features to determine global semantic features of the nodes in the abstract graph.
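A toy numpy sketch of the claim 10 feature pipeline: convolution and pooling yield the first semantic features, forward/backward encoding with length compression yields the second, and the two are spliced. The cumulative-sum "encoder" and single max-pooled feature are deliberate simplifications, not the claimed model:

```python
import numpy as np

def first_semantic_features(word_vecs, kernel):
    # Slide a (k x d) kernel over the (T x d) word-vector sequence and
    # max-pool the resulting scores into one feature (convolution + pooling).
    T = word_vecs.shape[0]
    k = kernel.shape[0]
    conv = np.array([np.sum(word_vecs[t:t + k] * kernel)
                     for t in range(T - k + 1)])
    return np.array([conv.max()])

def second_semantic_features(word_vecs):
    # Forward and backward "encodings" (here, running sums), length-compressed
    # by keeping only the final state of each direction.
    fwd = np.cumsum(word_vecs, axis=0)[-1]
    bwd = np.cumsum(word_vecs[::-1], axis=0)[-1]
    return np.concatenate([fwd, bwd])

def global_semantic_features(word_vecs, kernel):
    # Splice the first and second semantic features into the node's
    # global semantic feature vector.
    return np.concatenate([first_semantic_features(word_vecs, kernel),
                           second_semantic_features(word_vecs)])
```

A real implementation would replace the running sums with a recurrent or transformer encoder; the splicing step is unchanged either way.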
CN202310869688.6A 2023-07-17 2023-07-17 Text abstract generation method and system Active CN116628186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310869688.6A CN116628186B (en) 2023-07-17 2023-07-17 Text abstract generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310869688.6A CN116628186B (en) 2023-07-17 2023-07-17 Text abstract generation method and system

Publications (2)

Publication Number Publication Date
CN116628186A true CN116628186A (en) 2023-08-22
CN116628186B CN116628186B (en) 2023-10-24

Family

ID=87613809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310869688.6A Active CN116628186B (en) 2023-07-17 2023-07-17 Text abstract generation method and system

Country Status (1)

Country Link
CN (1) CN116628186B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832414A (en) * 2017-11-07 2018-03-23 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
US20210117617A1 (en) * 2019-10-17 2021-04-22 Amadeus S.A.S. Methods and systems for summarization of multiple documents using a machine learning approach
CN113590811A (en) * 2021-08-19 2021-11-02 平安国际智慧城市科技股份有限公司 Text abstract generation method and device, electronic equipment and storage medium
WO2022262266A1 (en) * 2021-06-18 2022-12-22 平安科技(深圳)有限公司 Text abstract generation method and apparatus, and computer device and storage medium
CN115965027A (en) * 2022-12-30 2023-04-14 南京邮电大学 Text abstract automatic extraction method based on semantic matching
CN116361446A (en) * 2021-12-24 2023-06-30 中国移动通信有限公司研究院 Text abstract generation method and device and electronic equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Na; LU Ying; TANG Xiaojun; LI Mingxia: "Multi-document automatic summarization algorithm based on important LDA topics", Journal of Frontiers of Computer Science and Technology (计算机科学与探索), no. 02, pages 242 - 248 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116882416A (en) * 2023-09-08 2023-10-13 江西省精彩纵横采购咨询有限公司 Information identification method and system for bidding documents
CN116882416B (en) * 2023-09-08 2023-11-21 江西省精彩纵横采购咨询有限公司 Information identification method and system for bidding documents
CN117556787A (en) * 2024-01-11 2024-02-13 西湖大学 Method and system for generating target text sequence for natural language text sequence
CN117556787B (en) * 2024-01-11 2024-04-26 西湖大学 Method and system for generating target text sequence for natural language text sequence

Also Published As

Publication number Publication date
CN116628186B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN116628186B (en) Text abstract generation method and system
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
WO2022241950A1 (en) Text summarization generation method and apparatus, and device and storage medium
Yan et al. Adatag: Multi-attribute value extraction from product profiles with adaptive decoding
CN110704621A (en) Text processing method and device, storage medium and electronic equipment
CN110597961A (en) Text category labeling method and device, electronic equipment and storage medium
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN114757184B (en) Method and system for realizing knowledge question and answer in aviation field
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN116204674A (en) Image description method based on visual concept word association structural modeling
CN110659392B (en) Retrieval method and device, and storage medium
CN116050352A (en) Text encoding method and device, computer equipment and storage medium
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN111626042A (en) Reference resolution method and device
CN114742069A (en) Code similarity detection method and device
CN114330483A (en) Data processing method, model training method, device, equipment and storage medium
Zheng et al. Weakly-supervised image captioning based on rich contextual information
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
CN113705315A (en) Video processing method, device, equipment and storage medium
CN114722774B (en) Data compression method, device, electronic equipment and storage medium
CN115115432B (en) Product information recommendation method and device based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant