CN113590811A - Text abstract generation method and device, electronic equipment and storage medium


Info

Publication number
CN113590811A
CN113590811A
Authority
CN
China
Prior art keywords
sentences
sentence
target text
text
graph structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110954296.0A
Other languages
Chinese (zh)
Inventor
方俊波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202110954296.0A
Publication of CN113590811A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/34 Browsing; Visualisation therefor
    • G06F 16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Abstract

The application is applicable to the technical fields of artificial intelligence and data mining, and provides a text abstract generation method, a text abstract generation device, an electronic device, and a storage medium. The method comprises the following steps: calculating similarity scores between sentences in the target text, and constructing a graph structure model of the target text by taking the sentences as nodes and the similarity scores between the sentences as the weights of edges; iteratively updating the graph structure model by adopting a ranking algorithm, and judging whether the iteratively updated graph structure model meets a preset convergence condition; if so, selecting the sentences corresponding to a preset number of nodes, taken from high to low according to the importance score value of each node in the iteratively updated graph structure model, as candidate sentences of the abstract; and performing component compression processing on the candidate sentences according to a preset dependency analysis rule to generate the abstract of the target text. The method has low cost, high speed, and low consumption of computing resources, and the generated abstract has little redundant information, correct grammar, and complete semantics.

Description

Text abstract generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence and data mining technologies, and in particular, to a text summary generation method and apparatus, an electronic device, and a storage medium.
Background
With the explosive growth of internet information, effectively obtaining the required information from massive internet information is a key technical problem in current information retrieval. A text abstract is the key information extracted from a long article with complete semantics. Text abstracts are very important in the information retrieval process: extracting the key information into an abstract shortens the text length, reduces the interference of redundant information, effectively improves information retrieval efficiency, and greatly improves user experience. Existing algorithms for obtaining a text abstract fall into extractive and generative types. Most existing extractive algorithms are supervised binary classification models over sentences; such supervised models are complex, have poor real-time performance, and require a large amount of labeled text, so the cost is high. Moreover, a text abstract obtained by these algorithms is extracted from the original text and generally consists of several key sentences of the original text, so considerable redundant information is retained. Existing generative algorithms generally adopt a supervised Seq2Seq model architecture, which likewise requires a large amount of labeled text at high cost. The abstract obtained by such an algorithm is generally a summarization and rewriting of the key content of the original text; the word-by-word generation mode makes the algorithm slow and heavy on computing resources, and the obtained text abstract may fail to conform to grammar rules, be difficult to understand, and so on.
Disclosure of Invention
In view of this, embodiments of the present application provide a text abstract generation method, an apparatus, an electronic device, and a storage medium, which can solve the following problems of conventional extractive and generative algorithms: both adopt supervised models and therefore require a large amount of labeled text at high cost; the abstract obtained by a conventional extractive algorithm contains much redundant information; and a conventional generative algorithm must generate the text abstract word by word, which is slow and consumes considerable computing resources.
A first aspect of an embodiment of the present application provides a text summary generation method, including:
calculating similarity scores between sentences in the target text, taking the sentences as nodes and the similarity scores between the sentences as the weight of edges, and constructing a graph structure model of the target text;
iteratively updating the graph structure model by adopting a preset ranking algorithm, and judging whether the iteratively updated graph structure model meets a preset convergence condition;
if the iteratively updated graph structure model meets a preset convergence condition, selecting the sentences corresponding to a preset number of nodes, taken from high to low according to the importance score value of each node in the iteratively updated graph structure model, as candidate sentences of the abstract;
and performing component compression processing on the candidate sentences according to a preset dependency analysis rule to generate the abstract of the target text.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the step of calculating similarity scores between sentences in the target text, and constructing a graph structure model of the target text with the sentences as nodes and the similarity scores between the sentences as the weights of edges includes:
splitting the target text according to sentence granularity, and respectively performing vector representation processing on each sentence obtained by splitting to obtain vectorized representation of each sentence;
respectively taking all sentences in the target text as target sentences, determining all adjacent sentences corresponding to the target sentences according to the positions of the target sentences in the target text, and calculating similarity scores between the target sentences and the adjacent sentences corresponding to the target sentences;
and aiming at all sentences in the target text, constructing a graph structure model of the target text by taking the sentences as nodes and taking similarity scores between the sentences as the weight of edges.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the splitting the target text according to a sentence granularity, and performing vector characterization processing on each sentence obtained by the splitting to obtain a vectorized representation of each sentence includes:
performing word segmentation on the sentences obtained by splitting the target text according to sentence granularity to obtain the representation words in the sentences and the occurrence frequencies corresponding to the representation words;
obtaining word vectors corresponding to the representation words in the sentence by traversing a word vector database constructed by pre-training;
and performing sentence representation processing on the sentence according to the occurrence frequency and the word vector corresponding to each representation word in the sentence to obtain vectorized representation of the sentence.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, after the step of performing sentence characterization processing on the sentence according to the occurrence frequency and the word vector corresponding to each characterization word in the sentence to obtain a vectorized representation of the sentence, the method further includes:
respectively extracting column vectors in the vectorization representation of each sentence;
splicing the column vectors in the vectorization representation of each sentence to generate a vector matrix;
calculating the vector matrix by adopting a preset singular value decomposition algorithm, and solving a singular value vector of the vector matrix;
and updating the vectorized representation of each sentence in the target text according to the singular value vector.
With reference to the first, second, or third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the step of constructing, for all sentences in the target text, a graph structure model of the target text by using the sentences as nodes and using similarity scores between the sentences as weights of edges further includes:
identifying whether the same sentence exists in all sentences based on all sentences in the target text;
if the same sentence exists, a common node is configured for the same sentence, and the adjacent sentences corresponding to the same sentence are all connected with the common node.
With reference to the first aspect or the first, second, or third possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the dependency analysis rule is characterized by a preset sentence structure template, and the step of performing component compression processing on the candidate sentence according to the preset dependency analysis rule to generate the abstract of the target text includes:
compressing the candidate sentences according to the preset sentence structure template, retaining the words in the candidate sentences that correspond to the preset sentence structure template, generating simplified sentences corresponding to the candidate sentences, and generating the abstract of the target text according to the simplified sentences.
With reference to the first aspect or the first, second, or third possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, in the step of iteratively updating the graph structure model by adopting a preset ranking algorithm and judging whether the iteratively updated graph structure model meets a preset convergence condition, the preset convergence condition is configured as follows:
the difference between the importance score value obtained by each node in the iteratively updated graph structure model and the importance score value obtained by that node in the previous iterative update is smaller than a preset threshold value.
A second aspect of an embodiment of the present application provides a text summary generating apparatus, including:
the model building module is used for calculating similarity scores between sentences in the target text, and building a graph structure model of the target text by taking the sentences as nodes and the similarity scores between the sentences as the weight of edges;
the model iteration module is used for iteratively updating the graph structure model by adopting a preset ranking algorithm and judging whether the iteratively updated graph structure model meets a preset convergence condition;
the candidate sentence selecting module is used for selecting, if the iteratively updated graph structure model meets a preset convergence condition, the sentences corresponding to a preset number of nodes, taken from high to low according to the importance score value of each node in the iteratively updated graph structure model, as candidate sentences of the abstract;
and the text abstract generating module is used for performing component compression processing on the candidate sentences according to a preset dependency analysis rule to generate the abstract of the target text.
A third aspect of embodiments of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the text summary generation method provided in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the text summary generation method provided in the first aspect.
The text abstract generation method, the text abstract generation device, the electronic equipment and the storage medium have the following beneficial effects:
according to the method, similarity scores between sentences in the target text are calculated, and a graph structure model of the target text is constructed by taking the sentences as nodes and the similarity scores between the sentences as the weights of edges. The graph structure model is iteratively updated by adopting a ranking algorithm, and whether the iteratively updated graph structure model meets a preset convergence condition is judged; if so, the sentences corresponding to a preset number of nodes are selected, from high to low according to the importance score value of each node in the iteratively updated graph structure model, as candidate sentences of the abstract. Component compression processing is then performed on the candidate sentences according to a preset dependency analysis rule to generate the abstract of the target text. Because the graph structure model is iteratively updated with a preset ranking algorithm until the preset convergence condition is met, so as to obtain a preset number of candidate sentences with high importance scores, the method is an unsupervised algorithm: no samples need to be labeled, the data cost is low, and the speed is high. Because the candidate sentences undergo component compression based on dependency analysis, the generated abstract greatly compresses the text length while keeping the key information, and grammatical correctness and semantic integrity are ensured. The method thus solves the problems that conventional extractive and generative algorithms require a large amount of labeled text at high cost due to their supervised models, that the abstract obtained by a conventional extractive algorithm contains much redundant information, and that a conventional generative algorithm is slow and computationally expensive because it generates the text abstract word by word.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of a method for generating a text summary according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for constructing a graph structure model in a text abstract generating method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a method for performing vector representation processing on a sentence in a text abstract generating method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another method for performing vector representation processing on a sentence in the text abstract generating method according to the embodiment of the present application;
fig. 5 is a flowchart of another method for constructing a graph structure model in the text abstract generating method according to the embodiment of the present application;
fig. 6 is a block diagram of a basic structure of a text summary generation apparatus according to an embodiment of the present application;
fig. 7 is a block diagram of a basic structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, fig. 1 is a flowchart of a method for generating a text summary according to an embodiment of the present disclosure. The details are as follows:
step S11: and calculating similarity scores between sentences in the target text, taking the sentences as nodes and the similarity scores between the sentences as the weights of edges, and constructing a graph structure model of the target text.
In this embodiment, the target text is first split according to sentence granularity to obtain a sentence set. Then, each sentence in the sentence set is vectorized to obtain a vectorized representation of each sentence. After the vectorized representation of each sentence is obtained, the similarity score between every two sentences in the sentence set is calculated by means of cosine similarity. The sentences in the sentence set are then taken as nodes, one sentence corresponding to one node, and the similarity score between two sentences is taken as the weight of the edge connecting them, so as to construct the graph structure model. In this embodiment, the sentence-to-sentence edges are obtained by connecting sentences according to the positional relationship of each sentence in the target text. Illustratively, if, by position in the target text, sentence 2 follows sentence 1 and sentence 3 follows sentence 2, then connecting sentences 1 and 2 forms an edge, and connecting sentences 2 and 3 forms an edge. A minimal sketch of this construction is given below.
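For illustration only, the following is a minimal sketch of this construction step, assuming the sentence vectors have already been computed; the use of networkx and all variable names are choices made for this example, not part of the original disclosure:

```python
import networkx as nx

def build_sentence_graph(vectors, sim):
    # One node per sentence; `sim` is any similarity function over two
    # sentence vectors (e.g. the cosine similarity given further below).
    g = nx.Graph()
    g.add_nodes_from(range(len(vectors)))
    # Connect sentences that are adjacent in the target text; the similarity
    # score between the two connected sentences becomes the edge weight.
    for i in range(len(vectors) - 1):
        g.add_edge(i, i + 1, weight=sim(vectors[i], vectors[i + 1]))
    return g
```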
Step S12: iteratively updating the graph structure model by adopting a preset ranking algorithm, and judging whether the iteratively updated graph structure model meets a preset convergence condition.
In this embodiment, the preset ranking algorithm is the PageRank algorithm, and the graph structure model is iteratively updated by adopting the PageRank algorithm, where the objective function for the iterative update is specifically:
$$PR(u) = \frac{1-d}{N} + d \sum_{v \in Nb(u)} \frac{W(u,v)}{D(v)}\, PR(v)$$
wherein u and v represent two nodes in the graph structure model, one node corresponding to one sentence in the target text; W(u, v) represents the weight of the edge between nodes u and v, namely the similarity score between the corresponding sentences; D(v) represents the degree of node v, namely the number of nodes adjacent to node v; Nb(u) represents the set of neighboring nodes of u; PR(v) represents the importance score value obtained by node v in the previous iterative update; d represents a damping coefficient, set to 0.8; and N represents the number of nodes in the graph structure model.
In this embodiment, cyclic iterative updates are performed based on the objective function, so that the change in the importance score value of each node in the graph structure model decreases continuously; when the objective function has been iteratively updated to a convergence state, the importance score value of each node tends to a stable value. A convergence condition of the objective function may be configured in advance; specifically, it may be configured such that the difference between the importance score value obtained by each node in the iteratively updated graph structure model and the importance score value obtained by that node in the previous iterative update is smaller than a preset threshold. In this embodiment, the preset threshold is set to 0.0001. Iterative update calculation is performed on each node of the graph structure model with the objective function; when the difference between the importance score value obtained by each node after an iterative update and that obtained in the previous update is smaller than the preset threshold, the iteratively updated graph structure model is judged to meet the convergence condition, and otherwise it is judged not to meet the convergence condition. A minimal sketch of this update loop is given below.
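The sketch assumes the networkx graph built earlier; d = 0.8 and the 0.0001 threshold follow this embodiment, while the iteration cap is an added safeguard of this example:

```python
def pagerank(g, d=0.8, tol=0.0001, max_iter=100):
    # Iteratively apply PR(u) = (1-d)/N + d * sum_v W(u,v)/D(v) * PR(v)
    # until every node's score changes by less than `tol`.
    n = g.number_of_nodes()
    pr = {u: 1.0 / n for u in g.nodes}  # initial importance scores
    for _ in range(max_iter):
        new = {
            u: (1 - d) / n
            + d * sum(g[u][v]["weight"] / g.degree(v) * pr[v]
                      for v in g.neighbors(u))
            for u in g.nodes
        }
        converged = all(abs(new[u] - pr[u]) < tol for u in g.nodes)
        pr = new
        if converged:
            break
    return pr
```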
Step S13: if the iteratively updated graph structure model meets a preset convergence condition, selecting the sentences corresponding to a preset number of nodes, taken from high to low according to the importance score value of each node in the iteratively updated graph structure model, as candidate sentences of the abstract.
In this embodiment, when the iteratively updated graph structure model meets the preset convergence condition, the importance score value of each node in the graph structure model tends to a stable value; that is, the importance ranking of the sentences in the target text essentially no longer changes. At this point, based on the graph structure model obtained after the iterative updates, the nodes are sorted from high to low by the importance score value each node has obtained, and the sentences corresponding to the preset number of nodes (topK) with the highest scores are selected as the candidate sentences of the abstract, as sketched below.
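A small sketch of this selection, where node ids are the sentence positions assigned during graph construction (restoring text order afterwards is a choice made for readability in this example, not a requirement of the disclosure):

```python
def select_candidates(pr, k):
    # Take the k nodes with the highest converged importance scores (topK),
    # then restore the original text order.
    top_k = sorted(pr, key=pr.get, reverse=True)[:k]
    return sorted(top_k)
```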
Step S14: performing component compression processing on the candidate sentences according to a preset dependency analysis rule to generate the abstract of the target text.
In this embodiment, after the preset number of candidate sentences are selected, component compression processing may be performed on the candidate sentences according to the preset dependency analysis rule, so as to generate the abstract of the target text. Specifically, a text dependency analysis tool (such as PyLTP) is used to split the structure of each candidate sentence, retaining the main structure of the sentence and removing redundant information. For example, in this embodiment, the preset dependency analysis rule may be characterized as a sentence structure template, such as a template that retains only the subject, predicate, and object of a sentence. On this basis, after dependency analysis is performed, the words in a candidate sentence that carry dependencies other than subject, predicate, and object may be eliminated according to the template, and only the words involved in the dependencies included in the rule template, namely the subject, predicate, and object, are retained, thereby generating a simplified sentence with complete grammar and semantics. In this embodiment, one simplified sentence is obtained for each candidate sentence. The simplified sentences are then ordered and connected according to the sequence in which the corresponding candidate sentences appear in the target text, so as to obtain the final abstract. A hedged sketch of this compression step is given below.
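The sketch uses the pyltp bindings named above; the model file paths and the kept relation set {"HED", "SBV", "VOB"} (LTP's labels for the predicate head, subject-verb, and verb-object relations) are assumptions made for illustration, since the disclosure only prescribes a subject-predicate-object template:

```python
from pyltp import Segmentor, Postagger, Parser

segmentor, postagger, parser = Segmentor(), Postagger(), Parser()
segmentor.load("cws.model")    # assumed paths to pretrained LTP models
postagger.load("pos.model")
parser.load("parser.model")

KEEP = {"HED", "SBV", "VOB"}   # predicate head, subject, object

def compress(sentence):
    words = list(segmentor.segment(sentence))
    postags = list(postagger.postag(words))
    arcs = parser.parse(words, postags)  # arc.relation is the dependency label
    # Retain only the words whose dependency relation is in the template.
    return "".join(w for w, arc in zip(words, arcs) if arc.relation in KEEP)
```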
As can be seen from the above, in the text abstract generation method provided by this embodiment, the graph structure model is iteratively updated with a preset ranking algorithm until the preset convergence condition is satisfied, so as to obtain a preset number of candidate sentences with high importance scores. The method is an unsupervised algorithm: no samples need to be labeled, the data cost is low, and the speed is high. The candidate sentences undergo component compression based on dependency analysis, so the generated abstract greatly compresses the text length while keeping the key information, and grammatical correctness and semantic integrity are ensured. The method thus solves the problems that conventional extractive and generative algorithms require a large amount of labeled text at high cost due to their supervised models, that a conventional extractive algorithm yields an abstract with much redundant information, and that a conventional generative algorithm must generate the text abstract word by word, which is slow and consumes considerable computing resources.
In some embodiments of the present application, please refer to fig. 2, and fig. 2 is a flowchart of a method for constructing a graph structure model in a text abstract generating method according to an embodiment of the present application. The details are as follows:
step S21: and splitting the target text according to the sentence granularity, and respectively performing vector representation processing on each sentence obtained by splitting to obtain vectorized representation of each sentence.
In this embodiment, the target text is split into a plurality of sentences according to sentence granularity; specifically, the target text may be split by recognizing the punctuation marks in it. After the target text is split into sentences, vector representation processing is performed on each sentence, so that each sentence has a corresponding vectorized representation. In this embodiment, the vector representation processing includes, but is not limited to, a word2vec weighting approach or a pre-trained model, where the pre-trained model may include, but is not limited to, the Skip-Thought model, the Quick-Thought model, the LSTM model, the ESIM model, the ELMo model, the BERT model, and the like. A minimal splitting sketch is shown below.
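A minimal splitting sketch; the assumption that sentence boundaries are marked by the usual Chinese and Western end-of-sentence punctuation is made for this example:

```python
import re

def split_sentences(text):
    # Split after end-of-sentence punctuation, keeping the mark with its sentence.
    parts = re.split(r"(?<=[。！？!?；;])", text)
    return [s.strip() for s in parts if s.strip()]
```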
Step S22: taking each sentence in the target text as a target sentence in turn, determining all adjacent sentences corresponding to the target sentence according to the position of the target sentence in the target text, and calculating the similarity scores between the target sentence and its corresponding adjacent sentences.
In this embodiment, after the target text is split into a plurality of sentences, each split sentence may be used as a target sentence, and the context is analyzed according to the position of the target sentence in the target text, so as to determine all adjacent sentences corresponding to the target sentence. For example, assuming that the target text is split into 3 sentences: when the 1st sentence in the target text is taken as the target sentence, the 2nd sentence is determined as its adjacent sentence; when the 2nd sentence is taken as the target sentence, the 1st and 3rd sentences are determined as its adjacent sentences; and when the 3rd sentence is taken as the target sentence, the 2nd sentence is determined as its adjacent sentence.
In this embodiment, after all adjacent sentences corresponding to each sentence in the target text are determined, each sentence is taken as the target sentence in turn, and the similarity score between the target sentence and each of its adjacent sentences is calculated. Note that one similarity score is calculated per target-sentence/adjacent-sentence pair; that is, when a target sentence has several corresponding adjacent sentences, a similarity score is calculated for each pair. For example, the similarity score between the target sentence and an adjacent sentence can be obtained with the cosine similarity formula:
$$\mathrm{sim}(V_A, V_B) = \frac{\sum_{i} V_A(i)\, V_B(i)}{\sqrt{\sum_{i} V_A(i)^2}\, \sqrt{\sum_{i} V_B(i)^2}}$$
wherein VA and VB represent the vectorized representations of sentences A and B respectively, VA(i) represents the value of the i-th component (feature word) of VA, and VB(i) represents the value of the i-th component of VB. A direct transcription of this formula is sketched below.
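A direct transcription of the formula above (numpy is used for convenience; the function name is an illustrative choice):

```python
import numpy as np

def cosine_similarity(va, vb):
    # Numerator: sum of component-wise products; denominator: product of norms.
    num = float(np.sum(va * vb))
    den = float(np.sqrt(np.sum(va ** 2)) * np.sqrt(np.sum(vb ** 2)))
    return num / den
```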
Step S23: for all sentences in the target text, constructing the graph structure model of the target text by taking the sentences as nodes and the similarity scores between the sentences as the weights of edges.
In this embodiment, after the similarity scores between each sentence and each of its adjacent sentences are calculated for all sentences in the target text, the sentences are taken as nodes, every two sentences that are adjacent in the target text are connected to form an edge, and the calculated similarity score between the two connected sentences is taken as the weight of that edge, so as to construct the graph structure model of the target text.
In some embodiments of the present application, please refer to fig. 3, and fig. 3 is a flowchart illustrating a method for performing vector representation processing on a sentence in a text abstract generating method according to an embodiment of the present application.
Step S31: performing word segmentation processing on the sentences obtained by splitting the target text according to sentence granularity, so as to obtain the representation words in the sentences and the occurrence frequencies corresponding to the representation words.
In this embodiment, a sentence obtained by splitting the target text at sentence granularity may be further split at word granularity into a plurality of words, and these words are used as the representation words of the sentence. It can be understood that after the sentence is split at word granularity, the resulting words may be filtered so that only words with substantive meaning are kept as the representation words of the sentence. After the sentence is split at word granularity, the occurrence frequency of each representation word can be obtained by counting. For example, the representation words of a sentence may be denoted as {w1, w2, … wn}, and the occurrence frequency of each representation word in the target text may be denoted as {f(w1), f(w2), … f(wn)}. A sketch of this step is given below.
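For illustration, a sketch of this step; jieba is an assumed tokenizer choice (the disclosure does not name one), and the stopword filter stands in for the filtering of words without substantive meaning:

```python
from collections import Counter
import jieba  # assumed tokenizer; any Chinese word segmenter would do

def tokenize_with_freq(sentence, stopwords):
    # Split the sentence at word granularity and drop non-substantive words.
    words = [w for w in jieba.lcut(sentence) if w not in stopwords]
    # Occurrence frequency of each representation word, obtained by counting.
    return words, Counter(words)
```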
Step S32: obtaining the word vectors corresponding to the representation words in the sentence by traversing a word vector database constructed by pre-training.
In this embodiment, a large-scale corpus may be obtained in advance (for example, from news sources such as Sohu News and national daily newspapers), the texts in the corpus are segmented into words, the word vector of each word is obtained by training with the skip-gram method, and the word vectors are stored in a unified manner to form the word vector database. After the representation words of a sentence are obtained, the word vector corresponding to each representation word can be retrieved by traversing this pre-trained word vector database with the representation words. A hedged training sketch follows.
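A hedged sketch of building such a database with gensim (a library choice of this example, not named in the disclosure); `corpus` is a list of pre-segmented sentences, sg=1 selects the skip-gram method named above, and the remaining hyperparameters are assumptions:

```python
from gensim.models import Word2Vec

def build_word_vector_db(corpus, path="word_vectors.kv"):
    # corpus: list[list[str]] of segmented sentences; sg=1 = skip-gram.
    model = Word2Vec(corpus, vector_size=100, sg=1, window=5, min_count=2)
    model.wv.save(path)  # persist the trained "word vector database"
    return model.wv      # lookup later: wv[word]
```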
Step S33: performing sentence representation processing on the sentence according to the occurrence frequency and word vector corresponding to each representation word in the sentence, so as to obtain the vectorized representation of the sentence.
In this embodiment, after the word vector corresponding to each representation word in the sentence is obtained, the vectorized representation Vs of the sentence may be computed from the occurrence frequency and word vector corresponding to each representation word as:
$$V_s = \frac{1}{n} \sum_{i=1}^{n} \frac{a}{a + f(w_i)}\, v(w_i)$$
wherein n represents the number of representation words in the sentence; wi represents the i-th representation word in the sentence; f(wi) represents the occurrence frequency of the i-th representation word; v(wi) represents the word vector of the i-th representation word; and a is a constant whose initialization value is 0.0001. A direct transcription of this weighted average is sketched below.
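A direct transcription of this weighted average, where `wv` is the word-vector lookup built earlier; skipping words absent from the database is a robustness assumption of this example:

```python
import numpy as np

def sentence_vector(words, freq, wv, a=0.0001):
    # Vs = (1/n) * sum_i  a / (a + f(w_i)) * v(w_i)
    vecs = [a / (a + freq[w]) * wv[w] for w in words if w in wv]
    return np.mean(vecs, axis=0)  # assumes at least one word is in wv
```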
In some embodiments of the present application, please refer to fig. 4, and fig. 4 is a flowchart illustrating another method for performing vector representation processing on a sentence in the text abstract generating method according to the embodiment of the present application. The details are as follows:
step S41: respectively extracting column vectors in the vectorization representation of each sentence;
step S42: splicing the column vectors in the vectorization representation of each sentence to generate a vector matrix;
step S43: calculating the vector matrix by adopting a preset singular value decomposition algorithm, and solving a singular value vector of the vector matrix;
step S44: and updating the vectorized representation of each sentence in the target text according to the singular value vector.
In this embodiment, after the vectorized representation Vs of each sentence is obtained, the column vectors in the vectorized representations of all sentences in the target text may further be extracted and spliced to generate a vector matrix. A singular value decomposition of this matrix is then computed to obtain its first singular value vector u, and the vectorized representation of each sentence is updated based on u. The specific update formula is:
$$V_s' = V_s - u \cdot u^{T} \cdot V_s$$
wherein V_s' represents the vectorized representation of the updated sentence; V_s represents the vectorized representation of the sentence before the update; u^T represents the transpose of the singular value vector u; and the operator · represents matrix multiplication. A numpy sketch of this common-component removal is given below.
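A numpy sketch of this update, assuming all sentence vectors share one dimension; the function name is illustrative:

```python
import numpy as np

def remove_first_component(vectors):
    # Splice the sentence vectors into a matrix (one column per sentence),
    # take the first left singular vector u of that matrix, and subtract
    # the projection u * u^T * Vs from every sentence vector.
    m = np.stack(vectors, axis=1)
    u = np.linalg.svd(m, full_matrices=False)[0][:, 0]
    return [v - u * np.dot(u, v) for v in vectors]
```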
In some embodiments of the present application, please refer to fig. 5, and fig. 5 is a flowchart of another method for constructing a graph structure model in the text abstract generating method according to the embodiment of the present application. The details are as follows:
step S51: identifying whether the same sentence exists in all sentences based on all sentences in the target text;
step S52: if the same sentence exists, a common node is configured for the same sentence, and the adjacent sentences corresponding to the same sentence are all connected with the common node.
In this embodiment, when the graph structure model is constructed, after the target text is split into a plurality of sentences, it may be identified, over all sentences of the target text, whether identical sentences exist. If identical sentences exist, a common node is configured for them, and the adjacent sentences corresponding to the identical sentences are all connected to that common node. In other words, if two sentences in the target text are identical, the two sentences share one node in the graph structure model, and the adjacent sentences of both are connected to the shared node. For example, assuming the target text is split into 10 sentences and the 3rd and 8th sentences are identical, then the 2nd and 4th sentences (adjacent to the 3rd) and the 7th and 9th sentences (adjacent to the 8th) are all treated as adjacent sentences of the same node in the graph structure model. A small node-assignment sketch is given below.
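A small sketch of the node assignment, where identical sentences map to one shared node id (the mapping scheme is an illustrative choice):

```python
def assign_nodes(sentences):
    # The first occurrence of a sentence fixes its node id; later identical
    # sentences reuse it, so their neighbours attach to the shared node.
    node_of, ids = {}, []
    for s in sentences:
        node_of.setdefault(s, len(node_of))
        ids.append(node_of[s])
    return ids  # ids[i] is the graph node of the i-th sentence
```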
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In some embodiments of the present application, please refer to fig. 6, and fig. 6 is a block diagram illustrating a basic structure of a text abstract generating apparatus according to an embodiment of the present application. The apparatus in this embodiment comprises means for performing the steps of the method embodiments described above. The following description refers to the embodiments of the method. For convenience of explanation, only the portions related to the present embodiment are shown. As shown in fig. 6, the text digest generation apparatus includes: a model construction module 61, a model iteration module 62, a candidate sentence selection module 63, and a text summary generation module 64. Wherein: the model construction module 61 is configured to calculate similarity scores between sentences in the target text, and construct a graph structure model of the target text by using the sentences as nodes and the similarity scores between the sentences as weights of edges. The model iteration module 62 is configured to perform iteration update on the graph structure model by using a preset ordering algorithm, and determine whether the graph structure model after iteration update meets a preset convergence condition. The candidate sentence selecting module 63 is configured to select sentences corresponding to a preset number of nodes as candidate sentences of the abstract according to the importance score value of each node in the iteratively updated graph structure model from high to low if the iteratively updated graph structure model meets a preset convergence condition. The text abstract generating module 64 is configured to perform component compression processing on the candidate sentences according to a preset dependency analysis rule, so as to generate an abstract of the target text.
It should be understood that the text summary generation apparatus corresponds to the text summary generation method one to one, and will not be described herein again.
In some embodiments of the present application, please refer to fig. 7, which is a basic structural block diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 7 of this embodiment includes: a processor 71, a memory 72, and a computer program 73 stored in the memory 72 and executable on the processor 71, for example a program implementing the text abstract generation method. The processor 71 implements the steps in the various embodiments of the text abstract generation method described above when executing the computer program 73. Alternatively, the processor 71 implements the functions of the modules in the embodiment corresponding to the text abstract generation apparatus when executing the computer program 73. For relevant details, refer to the description of that embodiment, which is not repeated here.
Illustratively, the computer program 73 may be divided into one or more modules (units) that are stored in the memory 72 and executed by the processor 71 to accomplish the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 73 in the electronic device 7. For example, the computer program 73 may be divided into a model building module, a model iteration module, a candidate sentence selection module, and a text summary generation module, each of which functions as described above.
The electronic device may include, but is not limited to, the processor 71 and the memory 72. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the electronic device 7 and does not constitute a limitation of the electronic device 7, which may include more or fewer components than shown, combine certain components, or use different components; for example, the electronic device may also include input/output devices, network access devices, buses, and the like.
The Processor 71 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 72 may be an internal storage unit of the electronic device 7, such as a hard disk or a memory of the electronic device 7. The memory 72 may also be an external storage device of the electronic device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash Card provided on the electronic device 7. Further, the memory 72 may also include both an internal storage unit and an external storage device of the electronic device 7. The memory 72 is used for storing the computer program and other programs and data required by the electronic device. The memory 72 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments. In this embodiment, the computer-readable storage medium may be nonvolatile or volatile.
The embodiments of the present application provide a computer program product, which when running on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments when executed.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A text summary generation method is characterized by comprising the following steps:
calculating similarity scores between sentences in the target text, taking the sentences as nodes and the similarity scores between the sentences as the weight of edges, and constructing a graph structure model of the target text;
iteratively updating the graph structure model by adopting a preset ranking algorithm, and judging whether the iteratively updated graph structure model meets a preset convergence condition;
if the iteratively updated graph structure model meets a preset convergence condition, selecting the sentences corresponding to a preset number of nodes, taken from high to low according to the importance score value of each node in the iteratively updated graph structure model, as candidate sentences of the abstract;
and performing component compression processing on the candidate sentences according to a preset dependency analysis rule to generate the abstract of the target text.
2. The method for generating a text abstract according to claim 1, wherein the step of calculating similarity scores between sentences in the target text, and constructing the graph structure model of the target text with the sentences as nodes and the similarity scores between the sentences as the weights of edges comprises:
splitting the target text according to sentence granularity, and respectively performing vector representation processing on each sentence obtained by splitting to obtain vectorized representation of each sentence;
respectively taking all sentences in the target text as target sentences, determining all adjacent sentences corresponding to the target sentences according to the positions of the target sentences in the target text, and calculating similarity scores between the target sentences and the adjacent sentences corresponding to the target sentences;
and aiming at all sentences in the target text, constructing a graph structure model of the target text by taking the sentences as nodes and taking similarity scores between the sentences as the weight of edges.
3. The method for generating a text abstract according to claim 2, wherein the step of splitting the target text according to sentence granularity and performing vector representation processing on each split sentence to obtain a vectorized representation of each sentence comprises:
carrying out word segmentation on sentences obtained by splitting the target text according to the sentence granularity to obtain representation words in the sentences and the appearance frequencies corresponding to the representation words;
obtaining word vectors corresponding to the representation words in the sentence by traversing a word vector database constructed by pre-training;
and performing sentence representation processing on the sentence according to the occurrence frequency and the word vector corresponding to each representation word in the sentence to obtain vectorized representation of the sentence.
4. The method according to claim 3, wherein the step of performing sentence representation processing on the sentence according to the occurrence frequency and word vectors corresponding to the representation words in the sentence to obtain a vectorized representation of the sentence further comprises:
respectively extracting column vectors in the vectorization representation of each sentence;
splicing the column vectors in the vectorization representation of each sentence to generate a vector matrix;
calculating the vector matrix by adopting a preset singular value decomposition algorithm, and solving a singular value vector of the vector matrix;
and updating the vectorized representation of each sentence in the target text according to the singular value vector.
5. The method for generating a text summary according to any one of claims 2 to 4, wherein the step of constructing the graph structure model of the target text by using sentences as nodes and similarity scores between sentences as weights of edges for all sentences in the target text further comprises:
identifying whether the same sentence exists in all sentences based on all sentences in the target text;
if the same sentence exists, a common node is configured for the same sentence, and the adjacent sentences corresponding to the same sentence are all connected with the common node.
6. The method for generating the text abstract according to any one of claims 1-4, wherein the dependency analysis rule is characterized by a preset sentence structure template, and the step of performing component compression processing on the candidate sentences according to the preset dependency analysis rule to generate the abstract of the target text comprises:
compressing the candidate sentences according to the preset sentence structure template, retaining the words in the candidate sentences that correspond to the preset sentence structure template, generating simplified sentences corresponding to the candidate sentences, and generating the abstract of the target text according to the simplified sentences.
7. The method for generating the text summary according to any one of claims 1 to 4, wherein in the step of iteratively updating the graph structure model by adopting a preset ranking algorithm and judging whether the iteratively updated graph structure model meets a preset convergence condition, the preset convergence condition is configured as follows:
the difference between the importance score value obtained by each node in the iteratively updated graph structure model and the importance score value obtained by that node in the previous iterative update is smaller than a preset threshold value.
8. A text summary generation apparatus, comprising:
the model building module is used for calculating similarity scores between sentences in the target text, and building a graph structure model of the target text by taking the sentences as nodes and the similarity scores between the sentences as the weight of edges;
the model iteration module is used for iteratively updating the graph structure model by adopting a preset ranking algorithm and judging whether the iteratively updated graph structure model meets a preset convergence condition;
the candidate sentence selecting module is used for selecting, if the iteratively updated graph structure model meets a preset convergence condition, the sentences corresponding to a preset number of nodes, taken from high to low according to the importance score value of each node in the iteratively updated graph structure model, as candidate sentences of the abstract;
and the text abstract generating module is used for performing component compression processing on the candidate sentences according to a preset dependency analysis rule to generate the abstract of the target text.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202110954296.0A 2021-08-19 2021-08-19 Text abstract generation method and device, electronic equipment and storage medium Pending CN113590811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110954296.0A CN113590811A (en) 2021-08-19 2021-08-19 Text abstract generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110954296.0A CN113590811A (en) 2021-08-19 2021-08-19 Text abstract generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113590811A true CN113590811A (en) 2021-11-02

Family

ID=78238409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110954296.0A Pending CN113590811A (en) 2021-08-19 2021-08-19 Text abstract generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113590811A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503958A (en) * 2014-11-19 2015-04-08 百度在线网络技术(北京)有限公司 Method and device for generating document summarization
CN109739973A (en) * 2018-12-20 2019-05-10 北京奇安信科技有限公司 Text snippet generation method, device, electronic equipment and storage medium
US20210117617A1 (en) * 2019-10-17 2021-04-22 Amadeus S.A.S. Methods and systems for summarization of multiple documents using a machine learning approach
CN113254593A (en) * 2021-06-18 2021-08-13 平安科技(深圳)有限公司 Text abstract generation method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114841146A (en) * 2022-05-11 2022-08-02 平安科技(深圳)有限公司 Text abstract generation method and device, electronic equipment and storage medium
CN114841146B (en) * 2022-05-11 2023-07-04 平安科技(深圳)有限公司 Text abstract generation method and device, electronic equipment and storage medium
CN116628186A (en) * 2023-07-17 2023-08-22 乐麦信息技术(杭州)有限公司 Text abstract generation method and system
CN116628186B (en) * 2023-07-17 2023-10-24 乐麦信息技术(杭州)有限公司 Text abstract generation method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination