CN116108165A - Text abstract generation method and device, storage medium and electronic equipment - Google Patents

Text abstract generation method and device, storage medium and electronic equipment

Info

Publication number
CN116108165A
CN116108165A (application CN202310347275.1A)
Authority
CN
China
Prior art keywords
target
target sentence
round
sentence
sentences
Prior art date
Legal status
Granted
Application number
CN202310347275.1A
Other languages
Chinese (zh)
Other versions
CN116108165B (en)
Inventor
韩国权
蔡惠民
高山
董厚泽
支婷
洒科进
曹扬
Current Assignee
CETC Big Data Research Institute Co Ltd
Original Assignee
CETC Big Data Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by CETC Big Data Research Institute Co Ltd
Priority to CN202310347275.1A
Publication of CN116108165A
Application granted
Publication of CN116108165B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/34 Browsing; Visualisation therefor
    • G06F 16/345 Summarisation for human users
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a text abstract generation method and device, a storage medium, and electronic equipment. The method comprises the following steps: extracting keywords from a target text; expanding the number of occurrences of each keyword in the original word sequences that contain it, based on the keyword's importance degree, to obtain an effective word sequence for each target sentence; determining the relatedness between each target sentence and the other target sentences according to the effective word sequences; determining the influence weight of each target sentence according to the relatedness; and forming a text abstract of the target text based on the several target sentences with the highest influence weights. In the technical scheme provided by the embodiments of the invention, the keywords are extracted first and the effective word sequences with expanded keyword counts are determined; based on the effective word sequences, the relatedness between target sentences needed for extracting the text abstract can be represented more accurately, so the influence weights of the target sentences can be determined more accurately and the text abstract can be extracted more accurately.

Description

Text abstract generation method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of text summarization, and in particular to a text summary generation method and device, a storage medium, and electronic equipment.
Background
Text summary generation is one of the main directions of text generation tasks; it is an information compression technique that uses various techniques to automatically convert a text into a short summary. Currently, there are two main ways of generating text summaries: extractive and generative. The extractive way selects one or several sentences from the text to form the summary. The generative way automatically generates a summary on the basis of the text in an end-to-end process.
In carrying out the invention, the inventors found that:
the existing extractive approaches can extract summaries based on heuristics, graphs, and the like, but the resulting summaries are often of poor quality; the generative approach requires training an encoder-decoder, for example a Seq2Seq model, which needs a large amount of training data for learning, and is therefore mainly used in fields with rich datasets, such as news, while its effect in other fields is mediocre.
Disclosure of Invention
In order to solve the problem that existing schemes find it difficult to extract a text abstract quickly and accurately, embodiments of the present invention aim to provide a text abstract generation method and device, a storage medium, and electronic equipment.
In a first aspect, an embodiment of the present invention provides a text summary generating method, including:
acquiring a target text to be processed;
extracting keywords in the target text;
determining an original word sequence of each target sentence in the target text, and, for original word sequences that contain keywords, expanding the number of those keywords based on their importance degree to obtain an effective word sequence of the target sentence; the expansion count of a keyword is positively correlated with its importance degree;
determining the relatedness between each target sentence and the other target sentences according to the similarity between the effective word sequence of the target sentence and the effective word sequences of the other target sentences;
determining the influence weight of each target sentence according to the relatedness between the target sentence and the other target sentences, wherein the influence weight of a target sentence is used to represent the influence of the target sentence within the target text;
and forming a text abstract of the target text based on a plurality of target sentences with highest influence weights.
Optionally, the determining the influence weight of the target sentence according to the relatedness between the target sentence and other target sentences includes:
iteratively executing multiple rounds of an influence weight update operation until an iteration end condition is met, and taking the influence weight determined when the iteration ends as the influence weight of the corresponding target sentence;
wherein the influence weight update operation of the k-th round includes:
updating the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determining the k-th round relatedness between the i-th target sentence and the j-th target sentence; the i-th target sentence and the j-th target sentence are any two target sentences in the target text; the k-th round relatedness between the i-th target sentence and the j-th target sentence is positively correlated with the (k-1)-th round influence weight of the i-th target sentence and with the (k-1)-th round influence weight of the j-th target sentence;
generating a k-th round adjacency matrix M_k, wherein an element of the adjacency matrix M_k represents the k-th round relatedness between the i-th target sentence and the j-th target sentence;
updating the (k-1)-th round influence weight of each target sentence according to the k-th round adjacency matrix M_k, and determining the k-th round influence weight of each target sentence, wherein the k-th round influence weight of each target sentence satisfies:

TR_k = \frac{1-d}{n}\, r + d\, M_k^{\top}\, \mathrm{diag}(m_{k1}, \ldots, m_{kn})^{-1}\, TR_{k-1}

wherein n represents the total number of target sentences; TR_k(V_i) represents the k-th round influence weight of the i-th target sentence V_i, TR_{k-1}(V_i) represents its (k-1)-th round influence weight, and TR_k = (TR_k(V_1), TR_k(V_2), …, TR_k(V_n))^T; m_{ki} represents the sum of all elements of row i of the adjacency matrix M_k, i = 1, 2, …, n; d represents a preset damping coefficient, 0 < d < 1; and r represents the n-dimensional column vector with all elements equal to 1.
Optionally, the updating the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determining the k-th round relatedness between the i-th target sentence and the j-th target sentence, includes:
determining a relatedness correction term Δw_k between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence:

\Delta w_k = \frac{a}{1 + e^{T/2 - k}}\; f\!\left(TR_{k-1}(V_i)\right) f\!\left(TR_{k-1}(V_j)\right)

wherein a is a preset coefficient, 0 < a < 0.5; k represents the current round, and T is the preset total number of iteration rounds; f(·) is a preset function, f(TR_{k-1}(V_i)) is positively correlated with the (k-1)-th round influence weight TR_{k-1}(V_i) of the i-th target sentence V_i, and f(TR_{k-1}(V_i)) < 1;
adding the correction term Δw_k to the (k-1)-th round relatedness between the i-th target sentence and the j-th target sentence to generate the k-th round relatedness between the i-th target sentence and the j-th target sentence.
Optionally, the preset function satisfies:

f\!\left(TR_{k-1}(V_i)\right) = TR_{k-1}(V_i);

or,

f\!\left(TR_{k-1}(V_i)\right) = \sum_{u=1}^{L} \lambda_u\, TR_{k-L+u-1}(V_i)

wherein the weights satisfy \lambda_1 < \lambda_2 < \cdots < \lambda_L and \sum_{u=1}^{L} \lambda_u = 1; u = 1, 2, …, L, L being a preset positive integer.
In a second aspect, an embodiment of the present invention further provides a text summary generating device, including:
the acquisition module is used for acquiring a target text to be processed;
the keyword extraction module is used for extracting keywords in the target text;
the word sequence updating module is used for determining an original word sequence of each target sentence in the target text and, for original word sequences that contain keywords, expanding the number of those keywords based on their importance degree to obtain an effective word sequence of the target sentence; the expansion count of a keyword is positively correlated with its importance degree;
the relevance determining module is used for determining the relevance between the target sentence and other target sentences according to the similarity between the effective word sequence of the target sentence and the effective word sequences of other target sentences;
the influence weight determining module is used for determining the influence weight of each target sentence according to the relatedness between the target sentence and other target sentences, wherein the influence weight of a target sentence is used to represent the influence of the target sentence in the target text;
and the abstract module is used for forming a text abstract of the target text based on a plurality of target sentences with highest influence weights.
Optionally, the influence weight determining module determines the influence weight of the target sentence according to the relatedness between the target sentence and other target sentences, including:
iteratively executing multiple rounds of an influence weight update operation until an iteration end condition is met, and taking the influence weight determined when the iteration ends as the influence weight of the corresponding target sentence;
wherein the influence weight update operation of the k-th round includes:
updating the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determining the k-th round relatedness between the i-th target sentence and the j-th target sentence; the i-th target sentence and the j-th target sentence are any two target sentences in the target text; the k-th round relatedness between the i-th target sentence and the j-th target sentence is positively correlated with the (k-1)-th round influence weight of the i-th target sentence and with the (k-1)-th round influence weight of the j-th target sentence;
generating a k-th round adjacency matrix M_k, wherein an element of the adjacency matrix M_k represents the k-th round relatedness between the i-th target sentence and the j-th target sentence;
updating the (k-1)-th round influence weight of each target sentence according to the k-th round adjacency matrix M_k, and determining the k-th round influence weight of each target sentence, wherein the k-th round influence weight of each target sentence satisfies:

TR_k = \frac{1-d}{n}\, r + d\, M_k^{\top}\, \mathrm{diag}(m_{k1}, \ldots, m_{kn})^{-1}\, TR_{k-1}

wherein n represents the total number of target sentences; TR_k(V_i) represents the k-th round influence weight of the i-th target sentence V_i, TR_{k-1}(V_i) represents its (k-1)-th round influence weight, and TR_k = (TR_k(V_1), TR_k(V_2), …, TR_k(V_n))^T; m_{ki} represents the sum of all elements of row i of the adjacency matrix M_k, i = 1, 2, …, n; d represents a preset damping coefficient, 0 < d < 1; and r represents the n-dimensional column vector with all elements equal to 1.
In a third aspect, an embodiment of the present invention further provides a computer storage medium storing computer-executable instructions for performing the text summary generation method described in any one of the foregoing aspects.
In a fourth aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the text excerpt generation methods described above.
In the scheme provided by the first aspect of the embodiments of the invention, the keywords in the target text are extracted first, and the number of keywords in each target sentence is then expanded to form an effective word sequence in which the keywords carry greater influence; the relatedness between target sentences is determined based on the effective word sequences, and the influence weights of the target sentences are determined in turn, so that the target sentences suitable for the text abstract can be selected. Because the keywords are extracted first and the keyword-expanded effective word sequences are then determined, the relatedness between target sentences needed for extracting the text abstract can be represented more accurately based on the effective word sequences, so the influence weights of the target sentences can be determined more accurately and the text abstract can be extracted more accurately. In addition, when the influence weights are determined based on the relatedness, no model needs to be trained, the result is not affected by training data, and the processing efficiency is higher.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows a flowchart of a text summary generation method provided by an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a text abstract generating device according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device for executing a text abstract generating method according to an embodiment of the invention.
Detailed Description
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc. indicate orientations or positional relationships based on those shown in the drawings, and are merely for convenience in describing the present invention and simplifying the description; they do not indicate or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The method for generating the text abstract provided by the embodiment of the invention is shown in fig. 1, and comprises the following steps:
step 101: and obtaining the target text to be processed.
The text to be processed is the text from which a text abstract is to be extracted; for convenience of description, it is called the target text. For example, the target text may be an article from which a text abstract needs to be extracted.
Step 102: extract the keywords in the target text.
The target text contains words that can better represent the meaning of the text, namely keywords. For example, keywords may be extracted based on the TF-IDF (term frequency-inverse document frequency) index, or based on the TextRank algorithm, and so on, which is not limited in this embodiment.
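By way of illustration only (this sketch is not part of the patent disclosure), step 102 could be implemented in Python roughly as follows; the function name tfidf_keywords, the treatment of each sentence as one "document", and the particular IDF smoothing are assumptions of the sketch, since the embodiment leaves the extraction method open:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=10):
    """Rank words by TF-IDF over `docs`, a list of token lists (here, one
    list per sentence of the target text). Returns (word, score) pairs,
    highest score first; the score can serve as the importance degree."""
    n_docs = len(docs)
    df = Counter()                                  # document frequency per word
    for doc in docs:
        df.update(set(doc))
    tf = Counter(w for doc in docs for w in doc)    # term frequency over the text
    total = sum(tf.values())
    scores = {w: (tf[w] / total) * math.log(n_docs / (1 + df[w])) for w in tf}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
```

The returned scores double as the importance degrees used for the keyword expansion in step 103 below.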
Step 103: determine an original word sequence of each target sentence in the target text, and, for original word sequences containing keywords, expand the number of those keywords based on their importance degree to obtain an effective word sequence of the target sentence. The expansion count of a keyword is positively correlated with its importance degree.
First, the target text is split into sentences, i.e., the target sentences are extracted, and a word sequence, namely the original word sequence, is formed for each target sentence. For example, the target sentence may be segmented into words (this may reuse the word segmentation involved in step 102 above, so no repeated segmentation is required) and part-of-speech tagging may be performed; stop words are then filtered out so that only meaningful words, such as nouns, verbs, and adjectives, are retained, yielding the original word sequence of the target sentence, which contains at least one word of the target sentence in the original order. The target text contains a plurality of target sentences, and the original word sequence of each target sentence can be determined.
Some target sentences in the target text contain keywords, while the rest do not. For a target sentence containing a keyword, the embodiment of the invention expands the number of occurrences of that keyword in the original word sequence of the target sentence, i.e., expands the keyword from one occurrence to several, and the expanded original word sequence is called the effective word sequence. For example, suppose the original word sequence of a target sentence A is [a_1, a_2, a_3, a_4, a_5], where a_i represents a word in the target sentence; if a_3 is a keyword, the number of occurrences of a_3 can be expanded, and the expanded original word sequence (i.e., the effective word sequence) can be [a_1, a_2, a_3, a_3, a_3, a_4, a_5], i.e., the keyword a_3 is expanded from one occurrence to three. For a target sentence that does not contain any keyword, its original word sequence is directly used as its effective word sequence.
In addition, in the embodiment of the invention, when the number of keywords is expanded, the expansion count of a keyword is positively correlated with its importance degree, i.e., the higher the importance degree of the keyword, the more occurrences it is expanded to. The importance degree of a keyword is an index determined when extracting the keywords from the target text; the higher the importance degree of a word, the more likely the word is to be chosen as a keyword. For example, the importance degree may be the TF-IDF value.
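As an illustrative sketch of the expansion (not part of the disclosure), the effective word sequence could be built as follows; the mapping from importance degree to expansion count (the base, scale, and rounding) is an assumption, since the embodiment only requires that the count be positively correlated with the importance degree:

```python
def effective_word_sequence(original_seq, keyword_importance, base=1, scale=3):
    """Build the effective word sequence of one target sentence by repeating
    each keyword according to its importance degree. `keyword_importance`
    maps keyword -> importance normalized to [0, 1]; each keyword occurrence
    is kept `base + round(scale * importance)` times, so the expansion count
    grows with importance. Non-keywords are kept as-is."""
    effective = []
    for word in original_seq:
        if word in keyword_importance:
            copies = base + round(scale * keyword_importance[word])
            effective.extend([word] * copies)   # e.g. [a3] -> [a3, a3, a3]
        else:
            effective.append(word)
    return effective
```

For instance, with keyword_importance = {"a3": 0.66}, the sequence [a1, a2, a3, a4, a5] from the example above expands to [a1, a2, a3, a3, a3, a4, a5].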
Step 104: determine the relatedness between each target sentence and the other target sentences according to the similarity between the effective word sequence of the target sentence and the effective word sequences of the other target sentences.
In the embodiment of the invention, the effective word sequence of a target sentence is the word sequence with expanded keyword counts, which can be used to represent the importance of the target sentence. The embodiment of the invention determines the similarity between two target sentences based on their effective word sequences, and the similarity may be used directly as the relatedness between the two; alternatively, the similarity may first be normalized, and the normalized similarity used as the relatedness. The Euclidean distance between the effective word sequences of the two target sentences can be used to measure the similarity between them; alternatively, word vectors corresponding to the two effective word sequences can be determined based on a word embedding model (such as word2vec), with the cosine similarity between the two word vectors serving as the similarity.
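A minimal sketch of one of the listed options, cosine similarity over bag-of-words counts of the effective word sequences, is given below (the function name and the plain count vectors are assumptions); because keywords were duplicated in the effective word sequences, they automatically contribute more to the similarity:

```python
import math
from collections import Counter

def relatedness(seq_i, seq_j):
    """Cosine similarity between the bag-of-words vectors of two effective
    word sequences; duplicated keywords raise their counts and hence their
    contribution. Returns a value in [0, 1] for non-negative counts."""
    ci, cj = Counter(seq_i), Counter(seq_j)
    dot = sum(ci[w] * cj[w] for w in ci.keys() & cj.keys())
    norm = math.sqrt(sum(v * v for v in ci.values())) * \
           math.sqrt(sum(v * v for v in cj.values()))
    return dot / norm if norm else 0.0
```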
Step 105: determine the influence weight of each target sentence according to the relatedness between the target sentence and the other target sentences, where the influence weight of a target sentence is used to represent the influence of the target sentence in the target text.
Step 106: form a text abstract of the target text based on the several target sentences with the highest influence weights.
In the embodiment of the invention, the relatedness between any two target sentences in the target text can be determined, and the overall relatedness situation characterizes how related each target sentence is to the others. The more important a target sentence is, the larger its influence (i.e., its influence weight) and the larger its relatedness to the other target sentences, so the influence weight of a target sentence can be determined based on its relatedness to the other target sentences. For example, the influence weight of each target sentence may be determined based on the conventional TextRank algorithm. After the influence weight of each target sentence is determined, the several target sentences with the highest influence weights can be used as the text abstract of the target text. For example, all target sentences are sorted by influence weight, and the top-ranked ones (for example, the top 1%) are used as the text abstract; alternatively, the target sentences whose influence weight is larger than a preset threshold are used as the text abstract.
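The selection in step 106 could be sketched as follows (illustrative only); both selection rules mentioned above, a top fraction and a preset threshold, are supported, and returning the picked sentences in document order is an assumption of the sketch:

```python
def select_summary(sentences, weights, fraction=0.01, threshold=None):
    """Form the text abstract from the highest-weight sentences. If
    `threshold` is given, keep sentences whose influence weight exceeds it;
    otherwise keep the top `fraction` (at least one sentence). The picked
    sentences are returned in their original document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: -weights[i])
    if threshold is not None:
        picked = [i for i in ranked if weights[i] > threshold]
    else:
        picked = ranked[:max(1, int(len(sentences) * fraction))]
    return [sentences[i] for i in sorted(picked)]
```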
According to the text abstract generation method provided by the embodiment of the invention, the keywords in the target text are extracted first, and the number of keywords in each target sentence is then expanded to form an effective word sequence in which the keywords carry greater influence; the relatedness between target sentences is determined based on the effective word sequences, and the influence weights of the target sentences are determined in turn, so that the target sentences suitable for the text abstract can be selected. Because the keywords are extracted first and the keyword-expanded effective word sequences are then determined, the relatedness between target sentences needed for extracting the text abstract can be represented more accurately based on the effective word sequences, so the influence weights of the target sentences can be determined more accurately and the text abstract can be extracted more accurately. In addition, when the influence weights are determined based on the relatedness, no model needs to be trained, the result is not affected by training data, and the processing efficiency is higher.
Optionally, the embodiment of the invention improves the TextRank algorithm, and determines the influence weight based on the improved TextRank algorithm. Specifically, the step 105 "determining the influence weight of the target sentence according to the degree of correlation between the target sentence and the other target sentences" described above includes:
step B1: and iteratively executing multiple rounds of influence weight updating operation until the iteration ending condition is met, and taking the influence weight determined at the time of iteration ending as the influence weight of the corresponding target sentence.
Wherein the influence weight updating operation of each round is used for iteratively updating the influence weight of each target sentence. When the iteration turns reach a preset value (such as 200, etc.), the iteration is ended; or if the influence weight of each target sentence is converged, the iteration ending condition is also met, and the iteration is ended.
Specifically, in step B1, the influence weight updating operation of the kth round includes:
step B11: according to the kth-1 round of influence weight of the ith target sentence and the kth-1 round of influence weight of the jth target sentence, updating the correlation degree between the ith target sentence and the jth target sentence, and determining the kth round of correlation degree between the ith target sentence and the jth target sentence; the ith target sentence and the jth target sentence are any two target sentences in the target text; the k-th round of relevance between the ith target sentence and the jth target sentence is positive correlation with the k-1 th round of influence weight of the ith target sentence and the k-1 th round of influence weight of the jth target sentence.
Step B12: generate the k-th round adjacency matrix M_k, where an element of M_k represents the k-th round relatedness between the i-th target sentence and the j-th target sentence.
Step B13: update the (k-1)-th round influence weight of each target sentence according to the k-th round adjacency matrix M_k, and determine the k-th round influence weight of each target sentence, where the k-th round influence weight of each target sentence satisfies:

TR_k = \frac{1-d}{n}\, r + d\, M_k^{\top}\, \mathrm{diag}(m_{k1}, \ldots, m_{kn})^{-1}\, TR_{k-1}

wherein n represents the total number of target sentences; TR_k(V_i) represents the k-th round influence weight of the i-th target sentence V_i, TR_{k-1}(V_i) represents its (k-1)-th round influence weight, and TR_k = (TR_k(V_1), TR_k(V_2), …, TR_k(V_n))^T; m_{ki} represents the sum of all elements of row i of the adjacency matrix M_k, i = 1, 2, …, n; d represents a preset damping coefficient, 0 < d < 1, e.g., d = 0.85; and r represents the n-dimensional column vector with all elements equal to 1, i.e., r = (1, 1, …, 1)^T.
In the embodiment of the invention, for any two target sentences in the target text, the relatedness of the current round (i.e., the k-th round) is updated based on the influence weights of the previous round (i.e., the (k-1)-th round), and the influence weights of the current round are then determined, realizing the update of the influence weights. Specifically, V_i denotes the i-th target sentence and V_j the j-th target sentence, where i and j are each positive integers from 1 to n and n denotes the total number of target sentences; TR denotes the influence weight, the (k-1)-th round influence weight of the i-th target sentence is denoted TR_{k-1}(V_i), and correspondingly the (k-1)-th round influence weight of the j-th target sentence is TR_{k-1}(V_j). The larger the (k-1)-th round influence weight TR_{k-1}(V_i) of the i-th target sentence, the more important it is, and the larger a relatedness can be set for it; similarly for the (k-1)-th round influence weight TR_{k-1}(V_j) of the j-th target sentence. That is, the k-th round relatedness between the i-th target sentence and the j-th target sentence is positively correlated with the (k-1)-th round influence weight of the i-th target sentence and with that of the j-th target sentence.
After the relatedness between any two target sentences has been updated, the adjacency matrix of the target text can be updated. In the embodiment of the invention, each element in the adjacency matrix represents the relatedness between two target sentences; specifically, for the k-th round, an element of the adjacency matrix M_k represents the k-th round relatedness between the i-th target sentence and the j-th target sentence, and M_k is an n × n matrix. For example, with w^k_{ij} denoting the k-th round relatedness between the i-th and j-th target sentences, the adjacency matrix M_k can be expressed as:

M_k = \begin{pmatrix} w^k_{11} & w^k_{12} & \cdots & w^k_{1n} \\ w^k_{21} & w^k_{22} & \cdots & w^k_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w^k_{n1} & w^k_{n2} & \cdots & w^k_{nn} \end{pmatrix}

wherein the relatedness between any i-th target sentence and itself is 0, i.e., w^k_{ii} = 0. The embodiment of the invention also determines the sum m_{ki} of all elements of each row i of the adjacency matrix M_k, i.e., m_{ki} = \sum_{j=1}^{n} w^k_{ij}; for example, the sum of all elements of row 1 is m_{k1} = \sum_{j=1}^{n} w^k_{1j}. Then the k-th round influence weight of each target sentence can be updated and determined based on step B13, namely:

TR_k(V_i) = \frac{1-d}{n} + d \sum_{j=1}^{n} \frac{w^k_{ij}}{m_{kj}}\, TR_{k-1}(V_j)

wherein the k-th round influence weight of the i-th target sentence is denoted TR_k(V_i) and that of the j-th target sentence TR_k(V_j). In the embodiment of the invention, the (k-1)-th round influence weight of each target sentence is divided by the sum of all elements of the corresponding row, i.e., TR_{k-1}(V_j)/m_{kj}; therefore, even though the relatedness between the target sentences is updated, the convergence of the Markov process can always be guaranteed, and converged influence weights are finally obtained.
In addition, the initial relatedness between target sentences is the relatedness determined in step 104, and the initial influence weight of each target sentence is the average over all target sentences; for example, if the target text contains n target sentences, the initial influence weight of each target sentence is 1/n. That is, in the influence weight update operation of round 1, the (k-1)-th round influence weight of each target sentence is 1/n.
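Steps B11 to B13 can be sketched in matrix form as follows, under the update formula reconstructed above; the convergence tolerance and the default parameters are assumptions, and the `correction` helper (computing Δw_k) is sketched after the correction-term section below (passing correction=None yields the unmodified TextRank iteration):

```python
import numpy as np

def iterate_influence_weights(M0, correction=None, T=200, d=0.85, a=0.1, tol=1e-6):
    """Iterate steps B11-B13. `M0` is the initial n x n relatedness matrix
    from step 104 with a zero diagonal; influence weights start uniform at
    1/n. Each round optionally adds the correction Δw_k to the matrix, then
    applies TR_k(V_i) = (1-d)/n + d * Σ_j (w_ij / m_kj) * TR_{k-1}(V_j)."""
    n = M0.shape[0]
    M = M0.astype(float).copy()
    tr = np.full(n, 1.0 / n)                     # initial influence weights: 1/n
    for k in range(1, T + 1):
        if correction is not None:
            M = M + correction(tr, k, T, a)      # k-th round adjacency matrix M_k
            np.fill_diagonal(M, 0.0)             # a sentence is unrelated to itself
        m = M.sum(axis=1)                        # m_ki: sum of row i of M_k
        m[m == 0] = 1.0                          # guard against all-zero rows
        new_tr = (1 - d) / n + d * (M / m[:, None]).T @ tr
        if np.abs(new_tr - tr).max() < tol:      # convergence also ends the iteration
            return new_tr
        tr = new_tr
    return tr
```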
Optionally, the step B11 of "updating the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determining the k-th round relatedness between the i-th target sentence and the j-th target sentence" includes:
step B111: determining a relativity correction term delta w between the ith target sentence and the jth target sentence according to the kth-1 round influence weight of the ith target sentence and the kth-1 round influence weight of the jth target sentence k
Figure SMS_18
Wherein a is a preset coefficient, a is more than 0 and less than 0.5, k represents the current round, and T is a preset iteration total round; f () is a preset function, and
Figure SMS_19
representing and i-th target sentence V i The k-1 th round of impact weight TR k-1 (V i ) Is a function of positive correlation and +.>
Figure SMS_20
<1。
Step B112: adding a correction term Deltaw to the k-1 th round of relevance between the ith target sentence and the jth target sentence k And generating the kth round of relevance between the ith target sentence and the jth target sentence. For example, the number of the cells to be processed,
Figure SMS_21
In the embodiment of the invention, a correction term for the current round's relatedness is determined based on the previous round's influence weights of the target sentences, and the correction term is added to the previous round's relatedness, thereby updating the relatedness of the current round.
In the embodiment of the invention, f(TR_{k-1}(V_i)) f(TR_{k-1}(V_j)) represents the correction applied when updating the relatedness of two target sentences. Specifically, if the (k-1)-th round influence weights TR_{k-1}(V_i) and TR_{k-1}(V_j) of the i-th and j-th target sentences are both large, the relatedness between the two is considered higher and can be increased appropriately, yielding their k-th round relatedness; if the (k-1)-th round influence weights TR_{k-1}(V_i) and TR_{k-1}(V_j) are both small, the actual relatedness between the two may be larger, but it contributes less when the influence weights are determined; and if one of the (k-1)-th round influence weights TR_{k-1}(V_i), TR_{k-1}(V_j) is large while the other is small, the relatedness between the two is also small.
Furthermore, the embodiment of the invention sets a round-dependent adjustment coefficient for the correction term based on the sigmoid function, namely a/(1 + e^{T/2 - k}), where a is generally set to a small value, for example a = 0.1. In the embodiment of the invention, in the initial iteration stage (i.e., when k is small), the influence weights are not yet accurate, so the effect of the influence weights on the relatedness is kept small to avoid obtaining unsuitable or even wrong relatedness values; in the middle iteration stage, the effect of the influence weights on the relatedness is gradually increased, so that the convergence process can be accelerated while the relatedness is updated; and in the later iteration stage (i.e., when k is large, for example when k is close to the total number of rounds T), the effect of the influence weights on the relatedness is large and remains basically unchanged, which avoids the updates making the relatedness hard to converge at the end.
Since the influence weight itself is smaller than 1, the above preset function f(·) can satisfy:

f\!\left(TR_{k-1}(V_i)\right) = TR_{k-1}(V_i)

i.e., the correction term is determined directly based on the influence weight of the previous round.
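A sketch of the correction term under the reconstruction above, usable as the `correction` argument of iterate_influence_weights sketched earlier; the exact placement of the sigmoid argument (k - T/2) is an assumption consistent with the described early/middle/late behavior, and f is taken as the identity, i.e., the first preset option just described:

```python
import math
import numpy as np

def correction(tr_prev, k, T, a=0.1):
    """Relatedness correction Δw_k for round k: a sigmoid-in-k coefficient
    (near 0 early, growing in the middle of the run, near a late) multiplied
    by f(TR_{k-1}(V_i)) * f(TR_{k-1}(V_j)) with f the identity. Returns the
    full n x n matrix of corrections."""
    coeff = a / (1.0 + math.exp(T / 2.0 - k))    # a * sigmoid(k - T/2)
    return coeff * np.outer(tr_prev, tr_prev)    # (Δw_k)[i, j]
```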
Alternatively, the correction term may be determined based on the influence weights of the previous several rounds, i.e., the preset function f(·) satisfies:

f\!\left(TR_{k-1}(V_i)\right) = \sum_{u=1}^{L} \lambda_u\, TR_{k-L+u-1}(V_i)

In the embodiment of the invention, \lambda_u represents the weight given to the influence weight of the corresponding round; the influence weights of the previous several rounds are weighted and summed, and the weighted sum is taken as f(TR_{k-1}(V_i)), from which the correction term is then determined.
Specifically, the number of rounds L to look back is preset, i.e., the influence weights of the previous L rounds are used. At the k-th round, the influence weights at the (k-1)-th round (u = L), the (k-2)-th round (u = L-1), …, and the (k-L)-th round (u = 1) need to be determined; and the weights \lambda_u satisfy

\lambda_1 < \lambda_2 < \cdots < \lambda_L, \qquad \sum_{u=1}^{L} \lambda_u = 1

with u = 1, 2, …, L, \lambda_u increasing monotonically with u (e.g., as a power function of u). As described above, the influence weights become more accurate as the iterative process proceeds; the embodiment of the invention therefore sets a lower weight \lambda_u for a lower round (corresponding to a smaller u) and a larger weight \lambda_u for a higher round (corresponding to a larger u). Moreover, all the weights \lambda_u sum to 1.
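The multi-round variant of the preset function could be sketched as follows; the normalized powers of 2 used for λ_u are an assumption, one choice satisfying the stated constraints (monotonically increasing, summing to 1):

```python
import numpy as np

def f_weighted(tr_history, L=3):
    """Second preset option for f: a weighted sum of the previous L rounds'
    influence weight vectors (tr_history[-1] is round k-1, i.e. u = L).
    The weights lambda_u increase monotonically with u and sum to 1;
    normalized powers of 2 are used here as one admissible choice."""
    hist = tr_history[-L:]                       # rounds k-L .. k-1 (u = 1 .. L)
    lam = np.array([2.0 ** u for u in range(1, len(hist) + 1)])
    lam = lam / lam.sum()                        # enforce sum(lambda_u) == 1
    return sum(l * tr for l, tr in zip(lam, hist))
```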
The flow of the text abstract generation method is described in detail above. The method can also be implemented by a corresponding device, whose structure and function are described in detail below.
Based on the same inventive concept, the embodiment of the present invention further provides a text abstract generating device, as shown in fig. 2, including:
an acquisition module 21, configured to acquire a target text to be processed;
a keyword extraction module 22, configured to extract keywords in the target text;
the word sequence updating module 23 is configured to determine an original word sequence of a target sentence in the target text, and perform quantity expansion on corresponding keywords based on importance degrees of the keywords in the original word sequence with the keywords, so as to obtain an effective word sequence of the target sentence; the expansion quantity of the keywords and the importance degree of the keywords are in positive correlation;
a relevance determining module 24, configured to determine a relevance between the target sentence and other target sentences according to a similarity between the valid word sequence of the target sentence and valid word sequences of other target sentences;
an impact weight determining module 25, configured to determine an impact weight of the target sentence according to a correlation between the target sentence and other target sentences, where the impact weight of the target sentence is used to represent an impact of the target sentence in the target text;
the summarization module 26 is configured to form a text summary of the target text based on the target sentences with the highest impact weights.
Optionally, the influence weight determining module 25 determines the influence weight of the target sentence according to the relatedness between the target sentence and other target sentences, including:
iteratively executing multiple rounds of an influence weight update operation until an iteration end condition is met, and taking the influence weight determined when the iteration ends as the influence weight of the corresponding target sentence;
wherein the influence weight update operation of the k-th round includes:
updating the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determining the k-th round relatedness between the i-th target sentence and the j-th target sentence; the i-th target sentence and the j-th target sentence are any two target sentences in the target text; the k-th round relatedness between the i-th target sentence and the j-th target sentence is positively correlated with the (k-1)-th round influence weight of the i-th target sentence and with the (k-1)-th round influence weight of the j-th target sentence;
generating a k-th round adjacency matrix M_k, wherein an element of the adjacency matrix M_k represents the k-th round relatedness between the i-th target sentence and the j-th target sentence;
updating the (k-1)-th round influence weight of each target sentence according to the k-th round adjacency matrix M_k, and determining the k-th round influence weight of each target sentence, wherein the k-th round influence weight of each target sentence satisfies:

TR_k = \frac{1-d}{n}\, r + d\, M_k^{\top}\, \mathrm{diag}(m_{k1}, \ldots, m_{kn})^{-1}\, TR_{k-1}

wherein n represents the total number of target sentences; TR_k(V_i) represents the k-th round influence weight of the i-th target sentence V_i, TR_{k-1}(V_i) represents its (k-1)-th round influence weight, and TR_k = (TR_k(V_1), TR_k(V_2), …, TR_k(V_n))^T; m_{ki} represents the sum of all elements of row i of the adjacency matrix M_k, i = 1, 2, …, n; d represents a preset damping coefficient, 0 < d < 1; and r represents the n-dimensional column vector with all elements equal to 1.
Optionally, the influence weight determining module 25 updates the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determines the k-th round relatedness between the i-th target sentence and the j-th target sentence, including:
determining a relatedness correction term Δw_k between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence:

\Delta w_k = \frac{a}{1 + e^{T/2 - k}}\; f\!\left(TR_{k-1}(V_i)\right) f\!\left(TR_{k-1}(V_j)\right)

wherein a is a preset coefficient, 0 < a < 0.5; k represents the current round, and T is the preset total number of iteration rounds; f(·) is a preset function, f(TR_{k-1}(V_i)) is positively correlated with the (k-1)-th round influence weight TR_{k-1}(V_i) of the i-th target sentence V_i, and f(TR_{k-1}(V_i)) < 1;
adding the correction term Δw_k to the (k-1)-th round relatedness between the i-th target sentence and the j-th target sentence to generate the k-th round relatedness between the i-th target sentence and the j-th target sentence.
Optionally, the preset function satisfies:

f\!\left(TR_{k-1}(V_i)\right) = TR_{k-1}(V_i);

or,

f\!\left(TR_{k-1}(V_i)\right) = \sum_{u=1}^{L} \lambda_u\, TR_{k-L+u-1}(V_i)

wherein the weights satisfy \lambda_1 < \lambda_2 < \cdots < \lambda_L and \sum_{u=1}^{L} \lambda_u = 1; u = 1, 2, …, L, L being a preset positive integer.
The embodiment of the present invention also provides a computer storage medium storing computer-executable instructions containing a program for executing the above-described text digest generation method, the computer-executable instructions being capable of executing the method of any of the above-described method embodiments.
The computer storage medium may be any available medium or data storage device that can be accessed by a computer, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MO), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), Solid State Disk (SSD), etc.).
Fig. 3 shows a block diagram of an electronic device according to another embodiment of the invention. The electronic device 1100 may be a host server with computing capabilities, a personal computer PC, or a portable computer or terminal that is portable, etc. The specific embodiments of the present invention are not limited to specific implementations of electronic devices.
The electronic device 1100 includes at least one processor 1110, a communication interface (Communications Interface) 1120, a memory 1130, and a bus 1140. Wherein processor 1110, communication interface 1120, and memory 1130 communicate with each other through bus 1140.
The communication interface 1120 is used to communicate with network elements including, for example, virtual machine management centers, shared storage, and the like.
The processor 1110 is used to execute programs. The processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
The memory 1130 is used to store executable instructions. Memory 1130 may include high-speed RAM memory or non-volatile memory, such as at least one magnetic disk memory. Memory 1130 may also be a memory array. Memory 1130 may also be partitioned into blocks, which may be combined into virtual volumes according to certain rules. The instructions stored in memory 1130 may be executed by processor 1110 to enable processor 1110 to perform the text abstract generation method in any of the method embodiments described above.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A text summary generation method, comprising:
acquiring a target text to be processed;
extracting keywords in the target text;
determining an original word sequence of each target sentence in the target text, and, for original word sequences that contain keywords, expanding the number of those keywords based on their importance degree to obtain an effective word sequence of the target sentence; the expansion count of a keyword is positively correlated with its importance degree;
determining the relatedness between each target sentence and the other target sentences according to the similarity between the effective word sequence of the target sentence and the effective word sequences of the other target sentences;
determining the influence weight of each target sentence according to the relatedness between the target sentence and the other target sentences, wherein the influence weight of a target sentence is used to represent the influence of the target sentence within the target text;
and forming a text abstract of the target text based on a plurality of target sentences with highest influence weights.
2. The method of claim 1, wherein the determining the influence weight of the target sentence according to the relatedness between the target sentence and other target sentences comprises:
iteratively executing multiple rounds of an influence weight update operation until an iteration end condition is met, and taking the influence weight determined when the iteration ends as the influence weight of the corresponding target sentence;
wherein the influence weight update operation of the k-th round includes:
updating the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determining the k-th round relatedness between the i-th target sentence and the j-th target sentence; the i-th target sentence and the j-th target sentence are any two target sentences in the target text; the k-th round relatedness between the i-th target sentence and the j-th target sentence is positively correlated with the (k-1)-th round influence weight of the i-th target sentence and with the (k-1)-th round influence weight of the j-th target sentence;
generating a k-th round adjacency matrix M_k, wherein an element of the adjacency matrix M_k represents the k-th round relatedness between the i-th target sentence and the j-th target sentence;
updating the (k-1)-th round influence weight of each target sentence according to the k-th round adjacency matrix M_k, and determining the k-th round influence weight of each target sentence, wherein the k-th round influence weight of each target sentence satisfies:

TR_k = \frac{1-d}{n}\, r + d\, M_k^{\top}\, \mathrm{diag}(m_{k1}, \ldots, m_{kn})^{-1}\, TR_{k-1}

wherein n represents the total number of target sentences; TR_k(V_i) represents the k-th round influence weight of the i-th target sentence V_i, TR_{k-1}(V_i) represents its (k-1)-th round influence weight, and TR_k = (TR_k(V_1), TR_k(V_2), …, TR_k(V_n))^T; m_{ki} represents the sum of all elements of row i of the adjacency matrix M_k, i = 1, 2, …, n; d represents a preset damping coefficient, 0 < d < 1; and r represents the n-dimensional column vector with all elements equal to 1.
3. The method according to claim 2, wherein the updating the relatedness between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence, and determining the k-th round relatedness between the i-th target sentence and the j-th target sentence, comprises:
determining a relatedness correction term Δw_k between the i-th target sentence and the j-th target sentence according to the (k-1)-th round influence weight of the i-th target sentence and the (k-1)-th round influence weight of the j-th target sentence:

\Delta w_k = \frac{a}{1 + e^{T/2 - k}}\; f\!\left(TR_{k-1}(V_i)\right) f\!\left(TR_{k-1}(V_j)\right)

wherein a is a preset coefficient, 0 < a < 0.5; k represents the current round, and T is the preset total number of iteration rounds; f(·) is a preset function, f(TR_{k-1}(V_i)) is positively correlated with the (k-1)-th round influence weight TR_{k-1}(V_i) of the i-th target sentence V_i, and f(TR_{k-1}(V_i)) < 1;
adding the correction term Δw_k to the (k-1)-th round relatedness between the i-th target sentence and the j-th target sentence to generate the k-th round relatedness between the i-th target sentence and the j-th target sentence.
4. A method according to claim 3, wherein the preset function satisfies:

f\!\left(TR_{k-1}(V_i)\right) = TR_{k-1}(V_i);

or,

f\!\left(TR_{k-1}(V_i)\right) = \sum_{u=1}^{L} \lambda_u\, TR_{k-L+u-1}(V_i)

wherein the weights satisfy \lambda_1 < \lambda_2 < \cdots < \lambda_L and \sum_{u=1}^{L} \lambda_u = 1; u = 1, 2, …, L, L being a preset positive integer.
5. A text digest generating apparatus, comprising:
the acquisition module is used for acquiring a target text to be processed;
the keyword extraction module is used for extracting keywords in the target text;
the word sequence updating module is used for determining an original word sequence of each target sentence in the target text and, for original word sequences that contain keywords, expanding the number of those keywords based on their importance degree to obtain an effective word sequence of the target sentence; the expansion count of a keyword is positively correlated with its importance degree;
the relevance determining module is used for determining the relatedness between each target sentence and other target sentences according to the similarity between the effective word sequence of the target sentence and the effective word sequences of other target sentences;
the influence weight determining module is used for determining the influence weight of each target sentence according to the relatedness between the target sentence and other target sentences, wherein the influence weight of a target sentence is used to represent the influence of the target sentence in the target text;
and the abstract module is used for forming a text abstract of the target text based on a plurality of target sentences with highest influence weights.
6. The apparatus of claim 5, wherein the impact weight determination module determines the impact weight of the target sentence based on a degree of correlation between the target sentence and other target sentences, comprising:
performing iterative execution of multiple rounds of influence weight updating operation until an iteration ending condition is met, and taking the influence weight determined at the end of the iteration as the influence weight of a corresponding target sentence;
wherein the influence weight update operation of the kth round includes:
according to the kth-1 round of influence weight of the ith target sentence and the kth-1 round of influence weight of the jth target sentence, updating the correlation degree between the ith target sentence and the jth target sentence, and determining the kth round of correlation degree between the ith target sentence and the jth target sentence; the ith target sentence and the jth target sentence are any two target sentences in the target text; the k-th round of correlation between the ith target sentence and the jth target sentence is in positive correlation with the k-1-th round of influence weight of the ith target sentence and the k-1-th round of influence weight of the jth target sentence;
generating a k-th round of adjacency matrix M k The adjacency matrix M k The element in (a) represents the kth round of relatedness between the ith target sentence and the jth target sentence;
according to the k-th round adjacency matrix M k Updating the k-1 th round of influence weight of each target sentence, and determining the k-1 th round of influence weight of each target sentence, wherein the k-1 th round of influence weight of each target sentence satisfies the following conditions:
Figure QLYQS_8
wherein n represents the total number of target sentences; TR_k = (TR_k(V_1), ..., TR_k(V_n))^T, in which TR_k(V_i) represents the kth-round influence weight of the ith target sentence V_i and TR_{k-1}(V_i) represents its (k-1)th-round influence weight; m_ki represents the sum of all elements of row i of the adjacency matrix M_k, i = 1, 2, ..., n; d represents a preset damping coefficient, with 0 < d < 1; and r represents an n-dimensional column vector whose elements are all 1.
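For illustration only, the following Python sketch applies the round-by-round update of claim 6. Since the claim only requires the kth-round correlation to be positively correlated with both sentences' (k-1)th-round influence weights, the multiplicative rescaling below is merely one assumed choice, and the convergence test standing in for the iteration ending condition is likewise hypothetical.

def update_influence_weights(base_corr, d=0.85, tol=1e-6, max_rounds=100):
    # base_corr: symmetric n x n matrix of sentence-to-sentence correlations
    # (for example, the similarities of the effective word sequences).
    n = len(base_corr)
    tr = [1.0 / n] * n                      # round-0 influence weights
    for _ in range(max_rounds):
        # kth-round adjacency matrix M_k: one assumed way to make the
        # correlation grow with both previous-round influence weights
        # is to rescale the base correlation by their product.
        M = [[base_corr[i][j] * tr[i] * tr[j] for j in range(n)]
             for i in range(n)]
        m = [sum(row) or 1.0 for row in M]  # m_ki: sum of row i of M_k
        new_tr = [(1 - d) + d * sum(M[j][i] * tr[j] / m[j] for j in range(n))
                  for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_tr, tr)) < tol:
            return new_tr                   # iteration ending condition met
        tr = new_tr
    return tr

Note that if the rescaling is dropped, so that M_k equals the base correlation matrix in every round, the loop reduces to a standard damped TextRank iteration.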
7. A computer storage medium, storing computer-executable instructions for performing the text abstract generation method of any one of claims 1-4.
8. An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the text abstract generation method of any one of claims 1-4.
CN202310347275.1A 2023-04-04 2023-04-04 Text abstract generation method and device, storage medium and electronic equipment Active CN116108165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310347275.1A CN116108165B (en) 2023-04-04 2023-04-04 Text abstract generation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116108165A true CN116108165A (en) 2023-05-12
CN116108165B CN116108165B (en) 2023-06-13

Family

ID=86254655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310347275.1A Active CN116108165B (en) 2023-04-04 2023-04-04 Text abstract generation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116108165B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100287162A1 (en) * 2008-03-28 2010-11-11 Sanika Shirwadkar method and system for text summarization and summary based query answering
CN102169501A (en) * 2011-04-26 2011-08-31 北京百度网讯科技有限公司 Method and device for generating abstract based on type information of document corresponding with searching result
CN106599148A (en) * 2016-12-02 2017-04-26 东软集团股份有限公司 Method and device for generating abstract
CN110765771A (en) * 2019-09-17 2020-02-07 阿里巴巴集团控股有限公司 Method and device for determining advertisement statement
CN110837557A (en) * 2019-11-05 2020-02-25 北京声智科技有限公司 Abstract generation method, device, equipment and medium
CN112347241A (en) * 2020-11-10 2021-02-09 华夏幸福产业投资有限公司 Abstract extraction method, device, equipment and storage medium
CN114781355A (en) * 2022-03-14 2022-07-22 华南理工大学 News text abstract extraction method, system and medium
CN115186654A (en) * 2022-09-07 2022-10-14 太极计算机股份有限公司 Method for generating document abstract

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HYEONJIN LEE et al.: "A Brief Survey of Text Driven Image Generation and Manipulation", 2021 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pages 1 - 2 *
ZHANG Feng: "Automatic Text Summarization System Based on Natural Language Processing", China Excellent Doctoral and Master's Theses Full-text Database (Master), Information Science and Technology, no. 12, pages 138 - 1020 *

Also Published As

Publication number Publication date
CN116108165B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
US11544474B2 (en) Generation of text from structured data
CN107644010B (en) Text similarity calculation method and device
US7743062B2 (en) Apparatus for selecting documents in response to a plurality of inquiries by a plurality of clients by estimating the relevance of documents
Zhao et al. Incorporating linguistic constraints into keyphrase generation
CN106407280B (en) Query target matching method and device
JP2019528502A (en) Method and apparatus for optimizing a model applicable to pattern recognition and terminal device
WO2021189951A1 (en) Text search method and apparatus, and computer device and storage medium
WO2021072850A1 (en) Feature word extraction method and apparatus, text similarity calculation method and apparatus, and device
Li et al. A generalized hidden markov model with discriminative training for query spelling correction
CN109582970B (en) Semantic measurement method, semantic measurement device, semantic measurement equipment and readable storage medium
CN112632261A (en) Intelligent question and answer method, device, equipment and storage medium
CN112328735A (en) Hot topic determination method and device and terminal equipment
Iscen et al. Improving image recognition by retrieving from web-scale image-text data
CN116108165B (en) Text abstract generation method and device, storage medium and electronic equipment
CN113505196B (en) Text retrieval method and device based on parts of speech, electronic equipment and storage medium
CN115860009A (en) Sentence embedding method and system for introducing auxiliary samples for comparison learning
CN116303968A (en) Semantic search method, device, equipment and medium based on technical keyword extraction
WO2012134396A1 (en) A method, an apparatus and a computer-readable medium for indexing a document for document retrieval
JP2009116593A (en) Word vector generation device, word vector generation method, program, and recording medium with program recorded therein
CN112183117B (en) Translation evaluation method and device, storage medium and electronic equipment
Yu et al. Domain adaptation problem in sketch based image retrieval
Yin et al. Query-focused multi-document summarization based on query-sensitive feature space
Staš et al. Semantic indexing and document retrieval for personalized language modeling
CN105022836B (en) Compact depth CNN aspect indexing methods based on SIFT insertions
CN115409130B (en) Optimization method and system for updating classification labels

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant