CN113468854A - Multi-document automatic abstract generation method - Google Patents

Multi-document automatic abstract generation method

Info

Publication number
CN113468854A
CN113468854A (application CN202110703934.1A)
Authority
CN
China
Prior art keywords
document
vector
abstract
attention
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110703934.1A
Other languages
Chinese (zh)
Inventor
杨鹏
周华健
刘子健
李文军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Huaxun Technology Co ltd
Original Assignee
Zhejiang Huaxun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Huaxun Technology Co ltd filed Critical Zhejiang Huaxun Technology Co ltd
Priority to CN202110703934.1A priority Critical patent/CN113468854A/en
Publication of CN113468854A publication Critical patent/CN113468854A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a multi-document automatic abstract generation method, which can automatically generate a summary for multiple texts on the same topic. First, a preset text summarization data set is preprocessed to obtain the input data required for model training; then a hierarchical Transformer multi-document summarization model is constructed and trained with a combination of triplet loss and cross entropy loss; finally, the texts to be processed are preprocessed and fed into the trained model, which automatically generates a summary of the texts. Compared with the prior art, the method effectively combines the semantic information within documents with the dependency relationships between documents, providing rich hierarchical structure information for the summary generation process and thereby improving the context consistency and information coverage of the summary result.

Description

Multi-document automatic abstract generation method
Technical Field
The invention relates to a multi-document automatic abstract generation method, and belongs to the technical field of Internet and artificial intelligence.
Background
In recent years, with the rapid development of Internet technology, the network has become an important channel for people to obtain information. However, network information is redundant and enormous in volume, which greatly reduces the efficiency with which people can find important information. Multi-Document Summarization (MDS) aims to analyze, refine and integrate multiple documents on the same or similar topics into a summary that captures their central topic; it can effectively aggregate the content of multiple documents on one topic and thus help users quickly and clearly grasp the main content of the document information.
At present, mainstream multi-document summarization techniques generally use a deep neural network model to produce rich semantic vector encodings at two levels, the word level and the document level, so as to capture the lexical semantics within documents and the dependency relationships between documents, and then use the document-level information to generate the summary. However, these methods suffer from three main problems. First, in order to extract cross-document relationships, the documents must first be given feature representations; existing methods lack a global explicit constraint, so important information is easily lost in the document representations, which hinders document-relationship modeling. Second, multiple documents on the same topic overlap heavily in content, and without screening the generated summary tends to contain redundant information. Third, existing methods fuse the document-level information by simple concatenation or addition, which makes it difficult to effectively build deep associations between hierarchical features.
Disclosure of Invention
In view of the problems and deficiencies of the prior art, the invention provides a multi-document automatic abstract generation method that effectively combines the semantic information within documents with the dependency relationships between documents, providing rich hierarchical structure information for the summary generation process and thereby improving the context consistency and information coverage of the summary result.
To achieve this purpose, the multi-document automatic abstract generation method first extracts the sub-topic representation of each document and constructs a central topic representation of the document set, and on this basis generates document vectors that are more topic-relevant; then the semantic information within each document is filtered through an information gating mechanism to obtain lexical vectors with more salient information; finally, a hierarchical attention mechanism integrates information at the document and lexical levels and fuses the semantic information of the two levels into a context vector that guides the summary generation process. The method comprises three steps, as follows:
step 1: data preprocessing. The texts in a preset text summarization data set are truncated, word-embedded and position-coded, and the word embedding representation and the position coding are added to obtain the word vector representation of each word; each sample in the preset data set comprises several texts on the same topic and a corresponding artificial (reference) summary;
step 2: constructing and training the multi-document abstract generation model. First, a Transformer coding module extracts lexical semantic information from the word-vector representations of the sample texts, and a topic fusion attention module integrates the lexical semantic information to generate document vector representations. Then, information interaction among the documents is realized through multi-head self-attention, and document vector representations containing the document dependency relationships are obtained through a residual structure, layer normalization and a feedforward neural network. Next, the lexical semantic vectors are screened through information gating, the semantic vectors obtained at the lexical and document levels are fused through a hierarchical attention mechanism, and the generated context vector is used to guide summary generation. Finally, the model is trained with the triplet loss and the cross entropy loss.
step 3: generating summaries for the texts to be summarized. The texts to be summarized are first truncated, word-embedded and position-coded, and the resulting word vector representations are input into the abstract generation model trained in step 2 to generate the topic summary of the texts.
Compared with the prior art, the invention has the following advantages:
(1) The document representation method based on topic fusion attention constructs a central topic that guides the lexical semantic vectors toward document vector representations that are more comprehensive and more topic-relevant, alleviating the loss of important information;
(2) The information gating mechanism pre-filters the lexical semantic vectors, reducing the interference of redundant information and effectively improving the accuracy of the summary result;
(3) The hierarchical attention mechanism integrates and associates, level by level, the lexical semantic information within documents and the association information between documents; it can effectively build deep associations between hierarchical features, provide rich hierarchical semantic information for the summary generation process, and improve the context consistency of the summary result.
Drawings
FIG. 1 is a diagram of a multi-document abstract model framework according to an embodiment of the present invention.
Fig. 2 is an overall structure diagram of the topic fusion attention module.
Detailed Description
The invention will be further illustrated with reference to specific examples in order to provide a better understanding and appreciation of the invention.
Embodiment: referring to Fig. 1 and Fig. 2, the multi-document automatic abstract generation method provided by the present invention comprises the following specific implementation steps:
Step 1, data preprocessing. In this embodiment, the preset data set D is preprocessed first. The M texts contained in each sample are truncated so that the length of each text after truncation is Len/M; if a text is shorter than Len/M before truncation, it is left unchanged. Len is 1500 in this embodiment. Word embedding mapping and position coding are then performed on each truncated text, where the word embedding matrix is a learnable parameter matrix and the position coding adopts the position coding module of the Transformer model.
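For illustration only, a minimal Python sketch of this preprocessing step is given below; the whitespace-level token lists, the NumPy sinusoidal position encoding and the function names are assumptions for the example, not part of the patent.

```python
import numpy as np

def truncate_documents(docs, total_len=1500):
    """Truncate each of the M texts in a sample to Len/M tokens;
    texts already shorter than Len/M are left unchanged (Len = 1500 here)."""
    per_doc = total_len // len(docs)
    return [doc[:per_doc] for doc in docs]

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding, as in the standard Transformer."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

# The word vector of each token is the learnable word embedding plus the
# position encoding at that position: x_j = E[w_j] + pe[j].
```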
Step 2, the data set D processed in step 1 is used to build and train the multi-document abstract generation model. This step is divided into the following substeps:
Substep 2-1, constructing an internal feature extraction layer. l stacked Transformer coding sublayers extract the semantic information expressed by each input text word vector in each sample, yielding the lexical semantic vector representation h_j^i of the jth word in the ith input text.
A document vector representation with fixed dimensionality is then constructed on the basis of the lexical semantic vector representations by a topic fusion attention module, which comprises three parts: sub-topic coding, sub-topic fusion and attention calculation. Sub-topic coding adopts a two-layer bidirectional LSTM network to generate the sub-topic vector representation of each text; the input of the sub-topic encoder is the lexical semantic vector sequence, and its output forward and backward hidden states are concatenated to obtain the sub-topic vector representation Ti. The central topic vector Tc of the input text set is calculated by weighted summation:
wi = softmax(v[Ti; Tsum]) (1)
Tc = Σi wi·Ti (2)
where the weight factor wi represents the degree of contribution of the sub-topic vector to the central topic vector, N is the total number of documents in the document set, Tsum is the vector sum of all document sub-topic vectors in the document set, [Ti; Tsum] is the concatenation of Ti and Tsum, and v is a learnable weight matrix parameter. Based on the central topic vector Tc, an attention mechanism is used to integrate the lexical semantic vectors and construct the document vector representation Di:
a_j^i = softmax(Tc·Wd·h_j^i) (3)
Di = Σj a_j^i·h_j^i (4)
where {h_1^i, ..., h_J^i} is the lexical semantic vector sequence of the ith document, Wd is a learnable parameter matrix, and J is the number of words contained in the input document.
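For illustration, a minimal PyTorch sketch of this topic fusion attention module follows; the exact scoring functions in Equations (1) and (3) are assumptions based on the description, and all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicFusionAttention(nn.Module):
    """Sketch of sub-topic coding, sub-topic fusion and attention (Eqs. 1-4)."""
    def __init__(self, d_model):
        super().__init__()
        # two-layer bidirectional LSTM sub-topic encoder
        self.sub_topic_enc = nn.LSTM(d_model, d_model // 2, num_layers=2,
                                     bidirectional=True, batch_first=True)
        self.v = nn.Linear(2 * d_model, 1, bias=False)      # scores [Ti; Tsum]
        self.W_d = nn.Linear(d_model, d_model, bias=False)

    def forward(self, H):                  # H: (N docs, J words, d_model)
        _, (h_n, _) = self.sub_topic_enc(H)
        T = torch.cat([h_n[-2], h_n[-1]], dim=-1)           # sub-topic vectors Ti: (N, d)
        T_sum = T.sum(dim=0, keepdim=True).expand_as(T)     # Tsum, broadcast per document
        w = F.softmax(self.v(torch.cat([T, T_sum], dim=-1)).squeeze(-1), dim=0)  # Eq. (1)
        T_c = (w.unsqueeze(-1) * T).sum(dim=0)              # central topic vector, Eq. (2)
        a = F.softmax(self.W_d(H) @ T_c, dim=-1)            # word attention weights, Eq. (3)
        D = torch.einsum('nj,njd->nd', a, H)                # document vectors Di, Eq. (4)
        return T, T_c, D
```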
Substep 2-2, constructing an external information interaction layer. This embodiment adopts a multi-head self-attention mechanism to realize information interaction between documents and capture the association information between them; the input is the vector representation Di of each document. In this embodiment, the number of attention heads is 8, and the final document vector di is obtained through a residual structure, layer normalization and a feedforward neural network module.
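A minimal sketch of this external information interaction layer (standard Transformer encoder sub-layers over the N document vectors; the feedforward width is an assumption):

```python
import torch.nn as nn

class DocumentInteractionLayer(nn.Module):
    """8-head self-attention over document vectors, followed by a residual
    connection, layer normalization and a feedforward network."""
    def __init__(self, d_model, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, D):                    # D: (1, N, d_model) document vectors
        attn_out, _ = self.attn(D, D, D)
        D = self.norm1(D + attn_out)         # residual + layer norm
        return self.norm2(D + self.ffn(D))   # final document vectors d_i
```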
Substep 2-3, information gating filtering. The document sub-topic vector representation is used to filter the lexical semantic vectors generated by the encoder, so as to reduce unnecessary redundant content. For the jth word in the ith document, the corresponding gating vector g_j^i is calculated as follows:
g_j^i = σ(Wg·Ti + Ug·h_j^i + bg) (5)
where Wg, Ug and bg are learnable parameters and σ(·) is the sigmoid function. The gating vector is then applied to the lexical semantic vector by element-wise (point-wise) multiplication to realize information filtering:
h'_j^i = g_j^i ⊙ h_j^i (6)
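A minimal sketch of the information gate, assuming the additive gating form of Equation (5) reconstructed above; names are illustrative.

```python
import torch
import torch.nn as nn

class InformationGate(nn.Module):
    """The document sub-topic vector Ti gates the lexical vectors of document i."""
    def __init__(self, d_model):
        super().__init__()
        self.W_g = nn.Linear(d_model, d_model, bias=False)
        self.U_g = nn.Linear(d_model, d_model, bias=True)   # bias term plays the role of b_g

    def forward(self, T, H):                 # T: (N, d), H: (N, J, d)
        g = torch.sigmoid(self.W_g(T).unsqueeze(1) + self.U_g(H))   # Eq. (5)
        return g * H                                                # Eq. (6), element-wise filtering
```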
Substep 2-4, hierarchical attention calculation. In this embodiment, a hierarchical attention mechanism is adopted to fuse the document vectors and the lexical vectors into a context vector containing rich hierarchical semantic information. The input of the mechanism comprises three parts: all document vectors di obtained in substep 2-2, the filtered lexical semantic vectors h'_j^i from substep 2-3, and the hidden state vector ht at the current decoding moment, where ht is obtained from the decoder input yt at moment t through word embedding, position coding, masked multi-head self-attention, residual connection and layer normalization; during training, yt is the t-th word of the artificial summary contained in the sample. The mechanism first performs attention calculation at the document level to generate a document context vector c_t^d:
c_t^d = Σi βi·di (7)
where βi is the attention weight, calculated from ht and all document vectors di in the form of Equation (3). Attention calculation is then performed at the lexical level, and the word-level weights are adjusted with the document attention weights βi:
α_j^i = βi·e_j^i (8)
c_t^w = Σi Σj α_j^i·h'_j^i (9)
where e_j^i is the word-level attention weight computed from ht and h'_j^i in the form of Equation (3), α_j^i is the lexical attention weight of the jth word in the ith document, and c_t^w is the lexical context vector. Finally, the context vectors obtained at the document level and the lexical level are concatenated along the feature dimension and linearly mapped to generate the context vector ct:
ct = Wc·[c_t^d; c_t^w] (10)
where Wc is a learnable weight parameter.
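A minimal sketch of this hierarchical attention step; the dot-product scoring against the decoder state is an assumption, while the adjustment of word-level weights by the document-level weights follows the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalAttention(nn.Module):
    """Fuses document-level and word-level context into c_t (Eqs. 7-10)."""
    def __init__(self, d_model):
        super().__init__()
        self.W_c = nn.Linear(2 * d_model, d_model, bias=False)

    def forward(self, h_t, d, H_filtered):   # h_t: (d,), d: (N, d), H_filtered: (N, J, d)
        beta = F.softmax(d @ h_t, dim=0)                      # document attention weights
        c_doc = beta @ d                                      # document context vector, Eq. (7)
        word_attn = F.softmax(H_filtered @ h_t, dim=-1)       # word-level attention per document
        alpha = beta.unsqueeze(1) * word_attn                 # adjusted lexical weights, Eq. (8)
        c_word = torch.einsum('nj,njd->d', alpha, H_filtered) # lexical context vector, Eq. (9)
        return self.W_c(torch.cat([c_doc, c_word], dim=-1))   # context vector c_t, Eq. (10)
```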
Substep 2-5, constructing the abstract probability layer. The context vector ct and the hidden state vector ht are passed through a residual connection, layer normalization and a feedforward neural network to obtain the decoder output vector ot at moment t, which is then converted into the predicted probability distribution P over summary words through a fully connected layer fc and a softmax function:
P = softmax(fc(ot)) (11)
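A minimal sketch of the abstract probability layer; the feedforward width and exact layer arrangement are assumptions.

```python
import torch.nn as nn

class SummaryProbabilityLayer(nn.Module):
    """Maps the decoder state and context vector to a word distribution, Eq. (11)."""
    def __init__(self, d_model, vocab_size, d_ff=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, c_t, h_t):
        o_t = self.norm1(h_t + c_t)              # residual connection + layer norm
        o_t = self.norm2(o_t + self.ffn(o_t))    # feedforward sub-layer
        return self.fc(o_t).softmax(dim=-1)      # predicted word distribution P
```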
Substep 2-6, constructing the loss function layer. The loss function layer combines the cross entropy loss LS between the predicted summary and the artificial summary with the triplet loss LT for document topic extraction to construct the overall loss function of the model. The triplet loss is calculated as follows:
LT = max{d(TA, TP) - d(TA, TN) + Margin, 0} (12)
d(TA, TP) = 1 - cos(TA, TP) (13)
d(TA, TN) = 1 - cos(TA, TN) (14)
Ltotal = αLS + βLT (15)
where LT is the triplet loss; Margin is the boundary distance, set to 1 in this embodiment, which keeps the positive example P and the negative example N apart in topic semantics; TA is the sub-topic vector of the true (reference) summary, TP is the central topic vector of the input document set, and TN is the central topic vector of another sample's document set; the cos function computes the cosine of the angle between two topic vectors and measures their semantic similarity; α and β are hyperparameters that weight the two losses, set to 0.9 and 1.5 respectively in this embodiment; LS is the cross entropy loss of the predicted summary words; and Ltotal is the overall loss function of the model.
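A minimal sketch of the combined loss; the function names are illustrative, while the cosine-distance triplet loss and the α/β weighting follow Equations (12) to (15).

```python
import torch
import torch.nn.functional as F

def triplet_topic_loss(T_A, T_P, T_N, margin=1.0):
    """Eqs. 12-14: T_A is the sub-topic vector of the reference summary, T_P the
    central topic vector of the input document set, T_N that of another sample."""
    d_pos = 1.0 - F.cosine_similarity(T_A, T_P, dim=-1)
    d_neg = 1.0 - F.cosine_similarity(T_A, T_N, dim=-1)
    return torch.clamp(d_pos - d_neg + margin, min=0.0)

def total_loss(logits, target_ids, T_A, T_P, T_N, alpha=0.9, beta=1.5):
    """Eq. 15: weighted sum of word-level cross entropy L_S and triplet loss L_T."""
    L_S = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))
    L_T = triplet_topic_loss(T_A, T_P, T_N)
    return alpha * L_S + beta * L_T
```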
Substep 2-7, model training. In this embodiment, all trainable parameters are randomly initialized, and an Adam optimizer is used for gradient back-propagation to update the model parameters during training. The initial learning rate is set to 0.001, the β1 and β2 coefficients are set to 0.9 and 0.998 respectively, and the batch size is set to 16. Training is stopped once the loss function has not decreased for 3 consecutive epochs or the number of training epochs exceeds 50.
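A minimal training-loop sketch with the hyperparameters stated above; the model, data loader and loss computation are assumed to exist, and early stopping is evaluated per epoch.

```python
import torch

def train(model, train_loader, compute_loss, max_epochs=50, patience=3):
    """Adam with lr 0.001 and betas (0.9, 0.998); the loader yields batches of 16;
    training stops after 3 epochs without improvement or 50 epochs in total."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.998))
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for batch in train_loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)    # L_total = alpha*L_S + beta*L_T
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss < best:
            best, stale = epoch_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
```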
Step 3, generating the summary with the trained model. The texts to be summarized are preprocessed as in step 1 and fed into the trained model. The initial input of the decoder is the special mark "<START>"; at each moment, the predicted summary word is the word with the maximum probability output by the abstract probability layer, and it is used as the decoder input at the next moment. When the end mark "<END>" is output, summary generation stops, and the generated words are output as the predicted summary of the input text set.
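A minimal greedy-decoding sketch of this step; the model.decode interface and the vocabulary dictionaries are assumptions for the example.

```python
import torch

def generate_summary(model, encoder_inputs, word2id, id2word, max_len=200):
    """Greedy decoding: start from <START>, emit the most probable word at each
    step, feed it back as the next decoder input, and stop at <END>."""
    tokens = [word2id["<START>"]]
    for _ in range(max_len):
        logits = model.decode(encoder_inputs, torch.tensor([tokens]))  # assumed interface
        next_id = int(logits[0, -1].argmax())
        if id2word[next_id] == "<END>":
            break
        tokens.append(next_id)
    return [id2word[t] for t in tokens[1:]]
```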
Based on the same inventive concept, the embodiment of the present invention further provides a multi-document automatic summary generation apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the computer program is loaded into the processor, the multi-document automatic summary generation method is implemented.
It should be understood that the above embodiment is for illustration only and is not intended to limit the scope of the invention; after reading the present disclosure, a person skilled in the art may make various equivalent modifications, all of which fall within the scope defined by the appended claims of this application.

Claims (6)

1. A multi-document automatic abstract generation method is characterized by comprising the following steps:
step 1: data preprocessing;
step 2: constructing and training a multi-document abstract generation model;
step 3: generating abstracts for a plurality of texts to be abstracted.
2. The multi-document automatic abstract generation method according to claim 1, wherein step 1, data preprocessing, specifically comprises: performing truncation, word embedding and position coding on the texts in a preset text summarization data set, and adding the word embedding representation and the position coding to obtain the word vector representation of each word, wherein each sample in the preset data set comprises a plurality of texts on the same topic and a corresponding artificial summary.
3. The multi-document automatic abstract generation method according to claim 1, wherein step 2, constructing and training the multi-document abstract generation model, specifically comprises: firstly, extracting lexical semantic information from the word-vector representations of the sample texts with a Transformer coding module, and integrating the lexical semantic information with a topic fusion attention module to generate document vector representations; then realizing information interaction among the documents through multi-head self-attention, and obtaining document vector representations containing document dependency relationships through a residual structure, layer normalization and a feedforward neural network; then performing information screening on the lexical semantic vectors through information gating, fusing the semantic vectors obtained at the lexical and document levels through a hierarchical attention mechanism, and using the generated context vector to guide summary generation; and finally, training the model with the triplet loss and the cross entropy loss.
4. The multi-document automatic abstract generation method according to claim 1, wherein step 3, generating abstracts for a plurality of texts to be abstracted, specifically comprises: for the texts to be summarized, first performing text truncation, word embedding and position coding, and inputting the obtained word vector representations into the abstract generation model trained in step 2 to generate the text topic summary.
5. The multi-document automatic abstract generation method according to claim 1, wherein in step 1, data preprocessing, a preset data set D is preprocessed: the M texts contained in each sample are truncated so that the length of each text after truncation is Len/M; if a text is shorter than Len/M before truncation, it is left unchanged; Len is 1500; word embedding mapping and position coding are then performed on each truncated text, wherein the word embedding matrix is a learnable parameter matrix and the position coding adopts the position coding module of the Transformer model.
6. The multi-document automatic abstract generation method according to claim 1, wherein in step 2 the data set D processed in step 1 is used to train the multi-document abstract generation model, and this step is divided into the following sub-steps:
substep 2-1, constructing an internal feature extraction layer: extracting the semantic information represented by each input text word vector in each sample with l stacked Transformer coding sublayers to obtain the lexical semantic vector representation h_j^i of the jth word in the ith input text; constructing a document vector representation with fixed dimensionality on the basis of the lexical semantic vector representations through a topic fusion attention module, wherein the topic fusion attention module comprises three parts: sub-topic coding, sub-topic fusion and attention calculation; the sub-topic coding adopts a two-layer bidirectional LSTM network to generate the sub-topic vector representation of each text, the input of the sub-topic encoder is the lexical semantic vector sequence, and its output forward and backward hidden states are concatenated to obtain the sub-topic vector representation Ti; the central topic vector Tc of the input text set is calculated by weighted summation:
wi = softmax(v[Ti; Tsum]) (1)
Tc = Σi wi·Ti (2)
wherein the weight factor wi represents the degree of contribution of the sub-topic vector to the central topic vector, N represents the total number of documents in the document set, Tsum is the vector sum of all document sub-topic vectors in the document set, [Ti; Tsum] is the concatenation of Ti and Tsum, and v is a learnable weight matrix parameter; based on the central topic vector Tc, an attention mechanism is adopted to integrate the lexical semantic vectors and construct the document vector representation Di:
a_j^i = softmax(Tc·Wd·h_j^i) (3)
Di = Σj a_j^i·h_j^i (4)
wherein {h_1^i, ..., h_J^i} is the lexical semantic vector sequence of the ith document, Wd is a learnable parameter matrix, and J is the number of words contained in the input document;
substep 2-2, constructing an external information interaction layer: a multi-head self-attention mechanism is adopted to realize information interaction between documents and capture the association information between them, the input being the vector representation Di of each document; the number of attention heads is 8, and the final document vector di is obtained through a residual structure, layer normalization and a feedforward neural network module;
substep 2-3, information gating filtering: the document sub-topic vector representation is used to filter the lexical semantic vectors generated by the encoder so as to reduce unnecessary redundant content; for the jth word in the ith document, the corresponding gating vector g_j^i is calculated as follows:
g_j^i = σ(Wg·Ti + Ug·h_j^i + bg) (5)
wherein Wg, Ug and bg are learnable parameters and σ(·) is the sigmoid function; the gating vector is then applied to the lexical semantic vector by element-wise multiplication to realize information filtering:
h'_j^i = g_j^i ⊙ h_j^i (6)
substep 2-4, hierarchical attention calculation: a hierarchical attention mechanism is adopted to fuse the document vectors and the lexical vectors into a context vector containing rich hierarchical semantic information; the input of the mechanism comprises three parts: all document vectors di obtained in substep 2-2, the filtered lexical semantic vectors h'_j^i from substep 2-3, and the hidden state vector ht at the current decoding moment, wherein ht is obtained from the decoder input yt at moment t through word embedding, position coding, masked multi-head self-attention, residual connection and layer normalization, and during training yt is the t-th word of the artificial summary contained in the sample; the mechanism first performs attention calculation at the document level to generate a document context vector c_t^d:
c_t^d = Σi βi·di (7)
wherein βi is the attention weight, calculated from ht and all document vectors di in the form of Equation (3); attention calculation is then performed at the lexical level and adjusted with the document attention weights βi:
α_j^i = βi·e_j^i (8)
c_t^w = Σi Σj α_j^i·h'_j^i (9)
wherein e_j^i is the word-level attention weight computed from ht and h'_j^i in the form of Equation (3), α_j^i is the lexical attention weight of the jth word in the ith document, and c_t^w is the lexical context vector; finally, the context vectors obtained at the document level and the lexical level are concatenated along the feature dimension and linearly mapped to generate the context vector ct:
ct = Wc·[c_t^d; c_t^w] (10)
wherein Wc is a learnable weight parameter;
substep 2-5, constructing the abstract probability layer: the context vector ct and the hidden state vector ht are passed through a residual connection, layer normalization and a feedforward neural network to obtain the decoder output vector ot at moment t, which is converted into the predicted probability distribution P over summary words through a fully connected layer fc and a softmax function:
P = softmax(fc(ot)) (11)
substep 2-6, constructing the loss function layer: the loss function layer combines the cross entropy loss LS between the predicted summary and the artificial summary with the triplet loss LT for document topic extraction to construct the overall loss function of the model, wherein the triplet loss is calculated as follows:
LT = max{d(TA, TP) - d(TA, TN) + Margin, 0} (12)
d(TA, TP) = 1 - cos(TA, TP) (13)
d(TA, TN) = 1 - cos(TA, TN) (14)
Ltotal = αLS + βLT (15)
wherein LT is the triplet loss; Margin is the boundary distance, set to 1, which keeps the positive example P and the negative example N apart in topic semantics; TA is the sub-topic vector of the true summary, TP is the central topic vector of the input document set, and TN is the central topic vector of another sample's document set; the cos function computes the cosine of the angle between two topic vectors and measures their semantic similarity; α and β are hyperparameters representing the respective weight coefficients of the two losses; LS is the cross entropy loss of the predicted summary words; and Ltotal is the overall loss function of the model;
the sub-steps 2-7 of,model training, initializing all parameters to be trained by adopting a random initialization mode, updating model parameters by adopting Adam optimizer to perform gradient back propagation in the training process, setting the initial learning rate to be 0.001, and setting the initial learning rate to be beta1、β2The coefficients are set to 0.9 and 0.998, respectively, the batch size is set to 16, and model training is stopped after 3 consecutive rounds of the loss function no longer decline or the training round exceeds 50.
CN202110703934.1A 2021-06-24 2021-06-24 Multi-document automatic abstract generation method Pending CN113468854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110703934.1A CN113468854A (en) 2021-06-24 2021-06-24 Multi-document automatic abstract generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110703934.1A CN113468854A (en) 2021-06-24 2021-06-24 Multi-document automatic abstract generation method

Publications (1)

Publication Number Publication Date
CN113468854A 2021-10-01

Family

ID=77872791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110703934.1A Pending CN113468854A (en) 2021-06-24 2021-06-24 Multi-document automatic abstract generation method

Country Status (1)

Country Link
CN (1) CN113468854A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115544244A (en) * 2022-09-06 2022-12-30 内蒙古工业大学 Cross fusion and reconstruction-based multi-mode generative abstract acquisition method
CN115544244B (en) * 2022-09-06 2023-11-17 内蒙古工业大学 Multi-mode generation type abstract acquisition method based on cross fusion and reconstruction
CN116362351A (en) * 2023-05-29 2023-06-30 深圳须弥云图空间科技有限公司 Method and device for training pre-training language model by using noise disturbance
CN116362351B (en) * 2023-05-29 2023-09-26 深圳须弥云图空间科技有限公司 Method and device for training pre-training language model by using noise disturbance
CN117236323A (en) * 2023-10-09 2023-12-15 青岛中企英才集团商业管理有限公司 Information processing method and system based on big data
CN117236323B (en) * 2023-10-09 2024-03-29 京闽数科(北京)有限公司 Information processing method and system based on big data


Legal Events

Date Code Title Description
PB01 Publication