CN108090049A - Multi-document summary extraction method and system based on sentence vector


Info

Publication number
CN108090049A
Authority
CN
China
Prior art keywords
sentence
document
vector
sub
topics
Prior art date
Legal status
Granted
Application number
CN201810045090.4A
Other languages
Chinese (zh)
Other versions
CN108090049B (en)
Inventor
窦全胜 (Dou Quansheng)
朱翔 (Zhu Xiang)
Current Assignee
Shandong Technology and Business University
Original Assignee
Shandong Technology and Business University
Priority date
Filing date
Publication date
Application filed by Shandong Technology and Business University filed Critical Shandong Technology and Business University
Priority to CN201810045090.4A priority Critical patent/CN108090049B/en
Publication of CN108090049A publication Critical patent/CN108090049A/en
Application granted granted Critical
Publication of CN108090049B publication Critical patent/CN108090049B/en
Expired - Fee Related
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/258Heading extraction; Automatic titling; Numbering

Abstract

The invention discloses a multi-document summary extraction method and system based on sentence vectors, comprising the following steps: S1, preprocess the document set; S2, generate sentence vectors by training a doc2vec model; S3, cluster the sentences into sub-topic documents; S4, establish a sentence relation graph model within each sub-topic document; S5, calculate sentence weights; S6, extract and order sentences to form the summary. The present invention trains a doc2vec model on a large corpus so that every sentence in the target document set is represented as a vector; spectral clustering gathers the sentences into sub-topics, and one sentence is extracted from each sub-topic, thereby avoiding sentence redundancy; the extracted sentences are arranged into the summary according to their positions in the original documents, improving the coherence of the summary sentences.

Description

Multi-document summary extraction method and system based on sentence vector
Technical field
The present invention relates to the field of computer text mining, and more particularly to a multi-document summary extraction method and system based on sentence vectors.
Background art
Automatic document summarization condenses and refines a text by computer, providing the user with an overview of its content. By briefly reading the summary, a user can grasp the key content of the full text, greatly improving the efficiency with which information is acquired and understood. Single-document automatic summarization generates, by algorithm, a summary of the main content of one document. Since Luhn proposed a method for automatically generating document abstracts in 1958, research on single-document summarization has developed rapidly, and its results have by now reached a generally accepted level. Multi-document automatic summarization, in contrast, generates a comprehensive summary of the main content of several different documents. To date, multi-document summarization technology has been closely combined with related algorithms from artificial intelligence, and in recent years increasingly with evolutionary algorithms and deep learning.
Yan et al. first applied deep learning to text summarization: the input layer is a word-frequency vector, the hidden layers are composed of restricted Boltzmann machines, and important sentences are finally selected by dynamic programming to form the summary. Rush performed abstractive summarization of source documents with deep learning, encoding the source document with a convolutional network and generating the summary with a context-attention feed-forward neural network. In 2016 Google open-sourced Textsum, an automatic summarization module based on deep learning, within its deep learning framework TensorFlow. Multi-document automatic summarization can be divided into extractive summarization and abstractive summarization, according to whether the sentences forming the summary come from the original text. Extractive summarization mainly assesses the importance of the sentences of the original documents and then selects key sentences to form the summary. Abstractive summarization mainly extracts word-level information from the original documents and then organizes the words into sentences to form the summary.
At present the implementation of abstractive summarization is overly complex: machines understand natural language insufficiently, considerable manual involvement is required, and the approach is still at an early stage and developing slowly. Extractive summarization is the commonly used method. Among graph-model-based text classification approaches, maximum common subgraph methods and edge-weight analogy methods are the more common similarity measures. There are also similarity measures based on the eigenvectors corresponding to the left singular values of the text-graph matrix, which in essence assume a PCA dimensionality reduction with zero sample mean. The main problems of existing extractive summarization methods are sentence redundancy and poor sentence cohesion.
Summary of the invention
To address the deficiencies of the prior art, such as redundancy among the extracted sentences and chaotic sentence order, the present invention proposes a multi-document summary extraction method based on sentence vectors, so as to provide an accurate and more readable document summary.
The technical solution adopted by the present invention is:
A multi-document summary extraction method based on sentence vectors comprises the following steps:
S1: preprocess the document set from which the summary is to be extracted;
S2: generate sentence vectors by training a doc2vec model;
S3: cluster the sentence vectors and save the corresponding sentences as sub-topic documents;
S4: establish a sentence relation graph model within each sub-topic document;
S5: calculate the sentence weights within each sub-topic document according to the relation graph model established in step S4;
S6: extract and order sentences to form the summary.
Further, S1 comprises the following steps:
Step S101: split every document of the document set into sentences according to sentence-end marks, recording the split sentences one sentence per line;
Step S102: record the position of each sentence;
Step S103: copy the sentence-split content of every document in the document set into one document, so that the collection is merged and each sentence of the merged document occupies one line;
Step S104: segment every sentence line of the merged document into words and remove stop words.
Further, the sentence position recorded in step S102 of step S1 is denoted h_{n,i}, where h_{n,i} represents the position of the i-th sentence within the n-th document, text_n represents the n-th document, and len(text_n) represents the number of sentences contained in the n-th document.
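As a concrete illustration of steps S101-S104, the following is a minimal Python sketch, assuming Chinese input text, the jieba tokenizer, and a caller-supplied stop-word list (none of which are mandated by the patent); the function name and return layout are our own.

```python
# Hedged sketch of preprocessing steps S101-S104.
import re
import jieba

SENT_END = re.compile(r'(?<=[。！？!?])')  # split after sentence-end marks (S101)

def preprocess(documents, stopwords):
    """documents: list of raw document strings; returns the merged sentence
    list, each sentence's position (n, i, len(text_n)), and the tokenized
    sentences with stop words removed."""
    sentences, positions, tokenized = [], [], []
    for n, text in enumerate(documents):
        sents = [s.strip() for s in SENT_END.split(text) if s.strip()]   # S101
        for i, s in enumerate(sents):
            positions.append((n, i, len(sents)))                         # S102: record position
            sentences.append(s)                                          # S103: one sentence per line
            tokenized.append([w for w in jieba.cut(s) if w not in stopwords])  # S104
    return sentences, positions, tokenized
```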
Further, step S2 comprises the following steps:
Step S201: preprocess all documents of the large corpus by steps S101 to S104 of step S1, input the preprocessed large-corpus documents into the distributed memory model of sentence vectors (PV-DM) in doc2vec, and train the PV-DM model;
Step S202: input the target documents, preprocessed by steps S101 to S104 of step S1, into the trained sentence-vector PV-DM model to obtain the sentence vectors.
Wherein, training the distributed memory model PV-DM of the sentence vectors in step S201 comprises the following steps:
(2011) in the preprocessed large-corpus documents, initialize every sentence line and all words as k-dimensional vectors, and input the word vectors corresponding to the context of a word w, together with the sentence vector of the sentence containing w, into the deep neural network model;
(2012) in the hidden layer of the deep neural network model, sum the input vectors; the accumulated vector serves as the input of the output layer;
(2013) the output layer of the deep neural network model corresponds to a binary tree whose leaf nodes are the words of the large corpus: a Huffman tree is constructed with the number of occurrences of each word in the large corpus as its weight, each word corresponds to a leaf node of the tree, and each branch of the tree is regarded as a binary classification; on the path from the root node to the leaf node of word w, the label of each tree node is 1-p_j, where p_j is the code of the j-th node on the path; every tree node other than the root node and the leaf nodes corresponds to an auxiliary vector of the same length as the sentence vector, used to assist in training the model;
(2014) continually update the sentence vectors, word vectors and auxiliary vectors by gradient ascent, finally obtaining the trained distributed memory model PV-DM of the sentence vectors.
The context of the word w consists of the C words before and after w.
The objective function of the neural network training is

$$\mathcal{L}=\sum_{doc}\;\sum_{sentence\in doc}\;\sum_{w\in sentence}\log p\bigl(w\mid Context(sentence,w)\bigr)$$

where sentence is a sentence, doc is a preprocessed document, w is a word, and Context(sentence, w) consists of the context words of w together with the sentence in which w occurs.
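Given the Huffman-tree output layer of step (2013), the conditional probability above factorizes as a standard hierarchical softmax; the following is our reconstruction under assumed notation, with x the accumulated hidden-layer vector, θ_{j} the auxiliary vector of the j-th internal node on the path to w, l^w the path length, and σ the sigmoid function:

$$p\bigl(w\mid Context(sentence,w)\bigr)=\prod_{j=2}^{l^w}\sigma\bigl(x^\top\theta_{j-1}\bigr)^{1-p_j}\,\bigl(1-\sigma(x^\top\theta_{j-1})\bigr)^{p_j}$$

In practice this training need not be implemented by hand; below is a minimal sketch using gensim's Doc2Vec, whose dm=1 mode is PV-DM. The function name and parameter values are illustrative, and corpus_tokens / target_tokens are assumed to be the tokenized sentence lines produced by the preprocessing sketch above.

```python
# Hedged sketch of step S2 with gensim; dm=1 selects the PV-DM architecture,
# hs=1 selects the hierarchical-softmax (Huffman tree) output layer.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def train_sentence_vectors(corpus_tokens, target_tokens, k=100, context=5):
    tagged = [TaggedDocument(words, [i]) for i, words in enumerate(corpus_tokens)]
    model = Doc2Vec(tagged, vector_size=k, window=context, dm=1, hs=1,
                    min_count=2, epochs=20)
    # Infer a k-dimensional sentence vector for each target sentence.
    return [model.infer_vector(words) for words in target_tokens]
```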
Further, the sentence vectors in step S3 are clustered by spectral clustering;
Further, step S3 comprises the following steps:
Step S301: build the similarity matrix W between all sentence vectors, the kernel function being the Gaussian kernel

$$W_{i,j}=\exp\left(-\frac{\lVert x_i-x_j\rVert^2}{2\sigma^2}\right)$$

where W_{i,j} is the similarity between sentences x_i and x_j, and σ is the Gaussian radius;
Step S302: compute the Laplacian matrix

L = D - W

where D is the diagonal matrix whose n-th diagonal element is the sum of the elements of the n-th row of W;
Step S303: build the normalized Laplacian matrix D^{-1/2} L D^{-1/2};
Step S304: compute the k smallest eigenvalues of D^{-1/2} L D^{-1/2} and the corresponding eigenvectors V;
Step S305: arrange the eigenvectors by column into an eigenmatrix and normalize each of its rows to unit length to form the matrix F, i.e. every row vector of F has modulus 1;
Step S306: treat every row of F as a k-dimensional sample and cluster the samples into C classes with the K-means algorithm;
Step S307: save the sentences corresponding to the vectors in the C classes as C sub-topic documents.
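A minimal NumPy/scikit-learn sketch of steps S301-S306 follows; sigma, k and C are illustrative parameters, and the helper name is our own. Grouping the sentence indices by the returned labels then yields the C sub-topic documents of step S307.

```python
# Hedged sketch of steps S301-S306: Gaussian affinity, normalized Laplacian,
# k smallest eigenvectors, row normalization, then k-means into C classes.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_cluster(X, C, k=None, sigma=1.0):
    """X: (n_sentences, dim) array of sentence vectors; returns cluster labels."""
    k = k or C
    W = np.exp(-cdist(X, X, 'sqeuclidean') / (2 * sigma ** 2))      # S301
    d = W.sum(axis=1)                                               # row sums of W
    L = np.diag(d) - W                                              # S302: L = D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt                            # S303
    _, vecs = np.linalg.eigh(L_norm)                                # S304: ascending order
    F = vecs[:, :k]                                                 # k smallest eigenvectors
    F = F / np.linalg.norm(F, axis=1, keepdims=True)                # S305: unit-length rows
    return KMeans(n_clusters=C, n_init=10).fit_predict(F)           # S306
```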
Further, step S4 specifically comprises:
within each sub-topic document, establishing a sentence relation graph model with the sentences as nodes and the similarities between sentences as edges;
Further, the similarity between sentences is computed as the cosine of the angle between their sentence vectors:

$$sim(x_i,x_j)=\frac{x_i\cdot x_j}{\lVert x_i\rVert\,\lVert x_j\rVert}$$

where x_i and x_j are two sentence vectors.
Further, step S5 specifically comprises:
initializing the weight of each sentence and iteratively updating the sentence weights according to the relation graph model established in step S4:

$$S(i)=(1-d)+d\sum_{j\in\delta(i)}\frac{S(j)}{|\delta(j)|}$$

where S(i) is the weight of sentence i, δ(i) is the set of all sentences in the same sub-topic document whose similarity with sentence i exceeds the set threshold, |δ(j)| is the number of sentences in the same sub-topic document whose similarity with sentence j exceeds the threshold, S(j) is the weight of sentence j, and d is the damping coefficient, set to 0.85.
Further, in step S5 the sentence weights are initialized to 1 and the similarity threshold used for δ(i) is set to 0.05.
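The update above is a TextRank-style iteration; a minimal sketch for one sub-topic document follows, under the stated settings (threshold 0.05, d = 0.85) and with an assumed fixed iteration count, since the patent does not state a stopping rule.

```python
# Hedged sketch of steps S4-S5: cosine-similarity graph plus the iterative
# update S(i) = (1 - d) + d * sum_{j in delta(i)} S(j) / |delta(j)|.
import numpy as np

def sentence_weights(X, threshold=0.05, d=0.85, n_iter=100):
    """X: (n, dim) array of sentence vectors of one sub-topic document."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    sim = (X @ X.T) / (norms * norms.T)          # pairwise cosine similarities (S4)
    np.fill_diagonal(sim, 0.0)
    adj = (sim > threshold).astype(float)        # delta(i): neighbours above threshold
    deg = adj.sum(axis=1)                        # |delta(j)| for every sentence j
    inv_deg = np.where(deg > 0, 1.0 / deg, 0.0)  # avoid division by zero for isolated nodes
    S = np.ones(len(X))                          # weights initialized to 1
    for _ in range(n_iter):                      # S5: iterate the update
        S = (1 - d) + d * adj @ (S * inv_deg)
    return S
```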
Further, step S6 specifically comprises: extracting the sentence of maximum weight from each sub-topic document and combining the extracted sentences into the summary according to the document positions recorded in step S102 of step S1.
A multi-document summary automatic extraction system based on sentence vectors comprises a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of any of the above methods are completed.
A computer-readable storage medium has a computer program running thereon; when the computer program is run by a processor, the steps of any of the above methods are completed.
Description of the drawings
The accompanying drawings, which form a part of this application, are intended to provide a further understanding of the application; the schematic embodiments of the application and their explanations serve to explain the application and do not constitute an improper limitation of it.
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the flow chart of the preprocessing step of the present invention.
Specific embodiment
It should be noted that the following detailed description is illustrative and is intended to provide a further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the application belongs.
Fig. 1 is the flow chart of the present invention, which comprises the following steps:
S1: preprocess the document set;
S2: generate sentence vectors by training a doc2vec model;
S3: cluster the sentence vectors and save the corresponding sentences as sub-topic documents;
S4: establish a sentence relation graph model within each sub-topic document;
S5: calculate the sentence weights within each sub-topic document;
S6: extract and order sentences to form the summary.
Specifically, the implementation of step S1 is shown in Fig. 2 and comprises the following steps:
Step S101: split every document of the document set into sentences at sentence-end marks, one sentence per line;
Step S102: record the position of each sentence;
Step S103: copy the sentence-split content of every document in the document set into one document, so that the collection is merged and each sentence of the merged document occupies one line;
Step S104: segment every sentence line of the merged document into words and remove stop words.
Further, the sentence position described in step S102 is denoted h_{n,i}, where h_{n,i} represents the position of the i-th sentence within the n-th document, text_n represents the n-th document, and len(text_n) represents the number of sentences contained in the n-th document.
Step S2 specifically comprises the following steps:
Step S201: preprocess all documents of the large corpus by steps S101 to S104 of step S1, and train the PV-DM (distributed memory model of sentence vectors) in doc2vec with the preprocessed large-corpus documents;
Step S202: import the target documents, preprocessed by steps S101 to S104 of step S1, into the trained model to obtain the sentence vectors.
Wherein, training the PV-DM model in step S201 specifically comprises the following steps:
(1) in the preprocessed large-corpus documents, initialize every sentence line and all words as k-dimensional vectors, and input the word vectors corresponding to the context of a word w (the C words before and after w), together with the sentence vector of the sentence containing w, into the deep neural network model;
(2) sum these input vectors in the hidden layer; the accumulated vector serves as the input of the output layer;
(3) the output layer corresponds to a binary tree whose leaf nodes are the words appearing in the large corpus: a Huffman tree is constructed with the number of occurrences of each word in the large corpus as its weight, each word corresponds to a leaf node of the tree, and each branch of the tree can be regarded as a binary classification; on the path from the root node to the leaf node of word w, the label of each tree node is 1-p_j, where p_j is the code of the j-th node on the path; every tree node other than the root node and the leaf nodes corresponds to a vector of the same length as the sentence vector, called an auxiliary vector, used to assist in training the model;
(4) train the model by gradient ascent, continually updating the sentence vectors, word vectors and auxiliary vectors, finally obtaining the trained distributed memory model of the sentence vectors.
The objective function of the neural network training is

$$\mathcal{L}=\sum_{doc}\;\sum_{sentence\in doc}\;\sum_{w\in sentence}\log p\bigl(w\mid Context(sentence,w)\bigr)$$

where sentence is a sentence, doc is a preprocessed document, w is a word, and Context(sentence, w) consists of the context words of w together with the sentence in which w occurs.
In step S3 the sub-topic documents are generated by spectral clustering, which specifically comprises the following steps:
Step S301: build the similarity matrix W between all sentences, the kernel function being the Gaussian kernel

$$W_{i,j}=\exp\left(-\frac{\lVert x_i-x_j\rVert^2}{2\sigma^2}\right)$$

where W_{i,j} is the similarity between sentences x_i and x_j, and σ is the Gaussian radius;
Step S302: compute the Laplacian matrix

L = D - W

where D is the diagonal matrix whose n-th diagonal element is the sum of the elements of the n-th row of W;
Step S303: build the normalized Laplacian matrix D^{-1/2} L D^{-1/2};
Step S304: compute the k smallest eigenvalues of D^{-1/2} L D^{-1/2} and the corresponding eigenvectors V;
Step S305: arrange the eigenvectors by column into an eigenmatrix and normalize each of its rows to unit length to form the matrix F, i.e. every row vector of F has modulus 1;
Step S306: treat every row of F as a k-dimensional sample and cluster the samples into C classes with the K-means algorithm;
Step S307: save the sentences corresponding to the vectors in the C classes as C sub-topic documents.
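For reference, scikit-learn's SpectralClustering bundles the same affinity-matrix / Laplacian / k-means pipeline into a single call; a brief, hedged usage sketch (mapping the Gaussian radius σ to gamma = 1/(2σ²)) is:

```python
# One-call alternative to the manual steps S301-S307 using scikit-learn;
# X is the (n_sentences, dim) sentence-vector matrix, C the class count,
# and sigma the Gaussian radius (all assumed to be defined by the caller).
from sklearn.cluster import SpectralClustering

labels = SpectralClustering(n_clusters=C, affinity='rbf',
                            gamma=1.0 / (2 * sigma ** 2)).fit_predict(X)
```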
Step S4 specifically comprises:
within each sub-topic document, establishing a sentence relation graph model with the sentences as nodes and the similarities between sentences as edges.
Further, the cosine similarity sim(x_i, x_j) is computed as

$$sim(x_i,x_j)=\frac{x_i\cdot x_j}{\lVert x_i\rVert\,\lVert x_j\rVert}$$

where x_i and x_j are two sentence vectors.
Step S5 specifically comprises:
initializing the weight of each sentence and iteratively updating the sentence weights according to the relation graph model established in step S4 with the following formula:

$$S(i)=(1-d)+d\sum_{j\in\delta(i)}\frac{S(j)}{|\delta(j)|}$$

where S(i) is the weight of sentence i, δ(i) is the set of all sentences in the same sub-topic document whose similarity with sentence i exceeds the set threshold, |δ(j)| is the number of sentences in the same sub-topic document whose similarity with sentence j exceeds the threshold, S(j) is the weight of sentence j, and d is the damping coefficient, set to 0.85.
Further, in step S5 the sentence weights are initialized to 1 and the similarity threshold used for δ(i) is set to 0.05.
Further, step S6 specifically comprises:
extracting the sentence of maximum weight from each sub-topic document and, following the positions recorded in step S102 of step S1, combining the sentences into the summary in the order in which they appear in the original documents.
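Putting steps S1-S6 together, a hypothetical end-to-end driver reusing the sketch functions introduced above (preprocess, train_sentence_vectors, spectral_cluster, sentence_weights are our illustrative names, not the patent's) might look as follows:

```python
# Hedged end-to-end sketch of the summarization pipeline (steps S1-S6).
import numpy as np

def summarize(documents, corpus_tokens, stopwords, C=5):
    sentences, positions, tokens = preprocess(documents, stopwords)       # S1
    X = np.array(train_sentence_vectors(corpus_tokens, tokens))          # S2
    labels = spectral_cluster(X, C)                                      # S3
    picked = []
    for c in range(C):                                                   # S4-S5 per sub-topic
        idx = np.where(labels == c)[0]
        picked.append(idx[np.argmax(sentence_weights(X[idx]))])          # S6: top sentence
    picked.sort(key=lambda i: positions[i])      # restore original document order
    return ''.join(sentences[i] for i in picked)
```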
To further illustrate the multi-document summary extraction method of the present invention, below is the summary generated from three documents related to the death of Wu Qingyuan:
Sina.com sports news: Wu Qingyuan, who rose meteorically and revolutionized the game, has passed away at the age of one hundred; the eternal legend of the Go world has quietly departed. Wu Qingyuan was born on June 12, 1914 in Fujian, China, the third son of his family. The following year he was awarded third dan by the Japanese Go Institute, and in 1950 he attained ninth dan. Mr. Wu Qingyuan devoted himself to developing the way of Go and to guiding and supporting the younger generation. As the foremost player of the golden age of Japanese Go, he was known in Japan as the "Showa Go Sage". In 1961 Wu Qingyuan was injured in a traffic accident, gradually faded from the front line, and formally retired in 1984. Before Wu Qingyuan, no player in the Go world had ever reached his height. In 2014 the Go community held a grand celebration of Mr. Wu's hundredth birthday; among Go players the world over, only Wu Qingyuan has possessed such honors. Wu Qingyuan's family prepared to hold a farewell ceremony for him on December 3. Wu Qingyuan once said: "After one hundred I will still play Go, and after death I will still play Go in the universe." Pursuing the way of Go, Mr. Wu had long since seen through life and death.
The foregoing descriptions are merely preferred embodiments of the application and are not intended to limit the application; for those skilled in the art, the application may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the application shall be included within the protection scope of the application.

Claims (10)

1. A multi-document summary extraction method based on sentence vectors, characterized by comprising the following steps:
S1: preprocessing the document set from which the summary is to be extracted;
S2: generating sentence vectors by training a doc2vec model;
S3: clustering the sentence vectors and saving the corresponding sentences as sub-topic documents;
S4: establishing a sentence relation graph model within each sub-topic document;
S5: calculating the sentence weights within each sub-topic document according to the relation graph model established in step S4;
S6: extracting and ordering sentences to form the summary.
2. The multi-document summary extraction method based on sentence vectors according to claim 1, characterized in that S1 comprises the following steps:
Step S101: splitting every document of the document set into sentences according to sentence-end marks, recording the split sentences one sentence per line;
Step S102: recording the position of each sentence;
Step S103: copying the sentence-split content of every document in the document set into one document, so that the collection is merged and each sentence of the merged document occupies one line;
Step S104: segmenting every sentence line of the merged document into words and removing stop words.
3. The multi-document summary extraction method based on sentence vectors according to claim 1, characterized in that step S2 comprises the following steps:
Step S201: preprocessing all documents of the large corpus by steps S101 to S104 of step S1, inputting the preprocessed large-corpus documents into the distributed memory model of sentence vectors (PV-DM) in doc2vec, and training the PV-DM model;
Step S202: inputting the target documents, preprocessed by steps S101 to S104 of step S1, into the trained sentence-vector PV-DM model to obtain the sentence vectors.
4. The multi-document summary extraction method based on sentence vectors according to claim 3, characterized in that training the distributed memory model PV-DM of the sentence vectors in step S201 comprises the following steps:
(2011) in the preprocessed large-corpus documents, initializing every sentence line and all words as k-dimensional vectors, and inputting the word vectors corresponding to the context of a word w, together with the sentence vector of the sentence containing w, into the deep neural network model;
(2012) summing the input vectors in the hidden layer of the deep neural network model, the accumulated vector serving as the input of the output layer;
(2013) the output layer of the deep neural network model corresponding to a binary tree whose leaf nodes are the words of the large corpus: a Huffman tree is constructed with the number of occurrences of each word in the large corpus as its weight, each word corresponds to a leaf node of the tree, and each branch of the tree is regarded as a binary classification; on the path from the root node to the leaf node of word w, the label of each tree node is 1-p_j, where p_j is the code of the j-th node on the path; every tree node other than the root node and the leaf nodes corresponds to an auxiliary vector of the same length as the sentence vector, used to assist in training the model;
(2014) continually updating the sentence vectors, word vectors and auxiliary vectors by gradient ascent, finally obtaining the trained distributed memory model PV-DM of the sentence vectors.
5. The multi-document summary extraction method based on sentence vectors according to claim 1, characterized in that the sentence vectors in step S3 are clustered by spectral clustering.
6. The multi-document summary extraction method based on sentence vectors according to claim 5, characterized in that step S3 comprises the following steps:
Step S301: building the similarity matrix W between all sentence vectors, the kernel function being the Gaussian kernel function;
Step S302: computing the Laplacian matrix L;
Step S303: building the normalized Laplacian matrix;
Step S304: computing the k smallest eigenvalues of the normalized Laplacian matrix and the corresponding eigenvectors V;
Step S305: arranging the eigenvectors by column into an eigenmatrix and normalizing each of its rows to unit length to form the matrix F, i.e. every row vector of F has modulus 1;
Step S306: treating every row of F as a k-dimensional sample and clustering the samples into C classes with the K-means algorithm;
Step S307: saving the sentences corresponding to the vectors in the C classes as C sub-topic documents.
7. The multi-document summary extraction method based on sentence vectors according to claim 1, characterized in that step S4 specifically comprises:
within each sub-topic document, establishing a sentence relation graph model with the sentences as nodes and the similarities between sentences as edges.
8. The multi-document summary extraction method based on sentence vectors according to claim 2, characterized in that step S6 specifically comprises: extracting the sentence of maximum weight from each sub-topic document and combining the extracted sentences into the summary according to the document positions obtained in step S102 of step S1.
9. A multi-document summary automatic extraction system based on sentence vectors, characterized by comprising: a memory, a processor, and computer instructions stored in the memory and run on the processor, the computer instructions, when run by the processor, completing the steps of the method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that a computer program runs thereon, the computer program, when run by a processor, completing the steps of the method of any one of claims 1-8.
CN201810045090.4A 2018-01-17 2018-01-17 Multi-document abstract automatic extraction method and system based on sentence vectors Expired - Fee Related CN108090049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810045090.4A CN108090049B (en) 2018-01-17 2018-01-17 Multi-document abstract automatic extraction method and system based on sentence vectors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810045090.4A CN108090049B (en) 2018-01-17 2018-01-17 Multi-document abstract automatic extraction method and system based on sentence vectors

Publications (2)

Publication Number Publication Date
CN108090049A (en) 2018-05-29
CN108090049B CN108090049B (en) 2021-02-05

Family

ID=62181661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810045090.4A Expired - Fee Related CN108090049B (en) 2018-01-17 2018-01-17 Multi-document abstract automatic extraction method and system based on sentence vectors

Country Status (1)

Country Link
CN (1) CN108090049B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897896A (en) * 2018-07-13 2018-11-27 深圳追科技有限公司 Keyword abstraction method based on intensified learning
CN108959269A (en) * 2018-07-27 2018-12-07 首都师范大学 A kind of sentence auto ordering method and device
CN109325109A (en) * 2018-08-27 2019-02-12 中国人民解放军国防科技大学 Attention encoder-based extraction type news abstract generating device
CN109582967A (en) * 2018-12-03 2019-04-05 深圳前海微众银行股份有限公司 Public sentiment abstract extraction method, apparatus, equipment and computer readable storage medium
CN109829161A (en) * 2019-01-30 2019-05-31 延边大学 A kind of method of multilingual autoabstract
CN109885683A (en) * 2019-01-29 2019-06-14 桂林远望智能通信科技有限公司 A method of the generation text snippet based on K-means model and neural network model
CN109902284A (en) * 2018-12-30 2019-06-18 中国科学院软件研究所 A kind of unsupervised argument extracting method excavated based on debate
CN109902168A (en) * 2019-01-25 2019-06-18 北京创新者信息技术有限公司 A kind of valuation of patent method and system
CN109977196A (en) * 2019-03-29 2019-07-05 云南电网有限责任公司电力科学研究院 A kind of detection method and device of magnanimity document similarity
CN110162778A (en) * 2019-04-02 2019-08-23 阿里巴巴集团控股有限公司 The generation method and device of text snippet
CN110362823A (en) * 2019-06-21 2019-10-22 北京百度网讯科技有限公司 The training method and device of text generation model are described
CN110399606A (en) * 2018-12-06 2019-11-01 国网信息通信产业集团有限公司 A kind of unsupervised electric power document subject matter generation method and system
CN110737768A (en) * 2019-10-16 2020-01-31 信雅达系统工程股份有限公司 Text abstract automatic generation method and device based on deep learning and storage medium
CN111813925A (en) * 2020-07-14 2020-10-23 混沌时代(北京)教育科技有限公司 Semantic-based unsupervised automatic summarization method and system
CN111914083A (en) * 2019-05-10 2020-11-10 腾讯科技(深圳)有限公司 Statement processing method, device and storage medium
US10902191B1 (en) * 2019-08-05 2021-01-26 International Business Machines Corporation Natural language processing techniques for generating a document summary
WO2021042529A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Article abstract automatic generation method, device, and computer-readable storage medium
CN112784043A (en) * 2021-01-18 2021-05-11 辽宁工程技术大学 Aspect-level emotion classification method based on gated convolutional neural network
CN112949299A (en) * 2021-02-26 2021-06-11 深圳市北科瑞讯信息技术有限公司 Method and device for generating news manuscript, storage medium and electronic device
CN113220853A (en) * 2021-05-12 2021-08-06 燕山大学 Automatic generation method and system for legal questions

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101187919A (en) * 2006-11-16 2008-05-28 北大方正集团有限公司 Method and system for abstracting batch single document for document set
US7398196B1 (en) * 2000-09-07 2008-07-08 Intel Corporation Method and apparatus for summarizing multiple documents using a subsumption model
CN101231634A (en) * 2007-12-29 2008-07-30 中国科学院计算技术研究所 Autoabstract method for multi-document
CN104778157A (en) * 2015-03-02 2015-07-15 华南理工大学 Multi-document abstract sentence generating method
CN104834735A (en) * 2015-05-18 2015-08-12 大连理工大学 Automatic document summarization extraction method based on term vectors
CN107357899A (en) * 2017-07-14 2017-11-17 吉林大学 Based on the short text sentiment analysis method with product network depth autocoder

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7398196B1 (en) * 2000-09-07 2008-07-08 Intel Corporation Method and apparatus for summarizing multiple documents using a subsumption model
CN101187919A (en) * 2006-11-16 2008-05-28 北大方正集团有限公司 Method and system for abstracting batch single document for document set
CN101231634A (en) * 2007-12-29 2008-07-30 中国科学院计算技术研究所 Autoabstract method for multi-document
CN104778157A (en) * 2015-03-02 2015-07-15 华南理工大学 Multi-document abstract sentence generating method
CN104834735A (en) * 2015-05-18 2015-08-12 大连理工大学 Automatic document summarization extraction method based on term vectors
CN107357899A (en) * 2017-07-14 2017-11-17 吉林大学 Based on the short text sentiment analysis method with product network depth autocoder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘欣等 (Liu Xin et al.): "基于PV_DM模型的多文档摘要方法" (Multi-document summarization method based on the PV-DM model), 《计算机应用与软件》 (Computer Applications and Software) *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897896A (en) * 2018-07-13 2018-11-27 深圳追科技有限公司 Keyword abstraction method based on intensified learning
CN108959269A (en) * 2018-07-27 2018-12-07 首都师范大学 A kind of sentence auto ordering method and device
CN108959269B (en) * 2018-07-27 2019-07-05 首都师范大学 A kind of sentence auto ordering method and device
CN109325109A (en) * 2018-08-27 2019-02-12 中国人民解放军国防科技大学 Attention encoder-based extraction type news abstract generating device
CN109325109B (en) * 2018-08-27 2021-11-19 中国人民解放军国防科技大学 Attention encoder-based extraction type news abstract generating device
CN109582967A (en) * 2018-12-03 2019-04-05 深圳前海微众银行股份有限公司 Public sentiment abstract extraction method, apparatus, equipment and computer readable storage medium
CN109582967B (en) * 2018-12-03 2023-08-18 深圳前海微众银行股份有限公司 Public opinion abstract extraction method, device, equipment and computer readable storage medium
CN110399606A (en) * 2018-12-06 2019-11-01 国网信息通信产业集团有限公司 A kind of unsupervised electric power document subject matter generation method and system
CN110399606B (en) * 2018-12-06 2023-04-07 国网信息通信产业集团有限公司 Unsupervised electric power document theme generation method and system
CN109902284A (en) * 2018-12-30 2019-06-18 中国科学院软件研究所 A kind of unsupervised argument extracting method excavated based on debate
CN109902168A (en) * 2019-01-25 2019-06-18 北京创新者信息技术有限公司 A kind of valuation of patent method and system
US11847152B2 (en) 2019-01-25 2023-12-19 Beijing Innovator Information Technology Co., Ltd. Patent evaluation method and system that aggregate patents based on technical clustering
CN109885683A (en) * 2019-01-29 2019-06-14 桂林远望智能通信科技有限公司 A method of the generation text snippet based on K-means model and neural network model
CN109885683B (en) * 2019-01-29 2022-12-02 桂林远望智能通信科技有限公司 Method for generating text abstract based on K-means model and neural network model
CN109829161B (en) * 2019-01-30 2023-08-04 延边大学 Method for automatically abstracting multiple languages
CN109829161A (en) * 2019-01-30 2019-05-31 延边大学 A kind of method of multilingual autoabstract
CN109977196A (en) * 2019-03-29 2019-07-05 云南电网有限责任公司电力科学研究院 A kind of detection method and device of magnanimity document similarity
CN110162778A (en) * 2019-04-02 2019-08-23 阿里巴巴集团控股有限公司 The generation method and device of text snippet
CN111914083A (en) * 2019-05-10 2020-11-10 腾讯科技(深圳)有限公司 Statement processing method, device and storage medium
CN110362823A (en) * 2019-06-21 2019-10-22 北京百度网讯科技有限公司 The training method and device of text generation model are described
CN110362823B (en) * 2019-06-21 2023-07-28 北京百度网讯科技有限公司 Training method and device for descriptive text generation model
US10902191B1 (en) * 2019-08-05 2021-01-26 International Business Machines Corporation Natural language processing techniques for generating a document summary
WO2021042529A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Article abstract automatic generation method, device, and computer-readable storage medium
CN110737768B (en) * 2019-10-16 2022-04-08 信雅达科技股份有限公司 Text abstract automatic generation method and device based on deep learning and storage medium
CN110737768A (en) * 2019-10-16 2020-01-31 信雅达系统工程股份有限公司 Text abstract automatic generation method and device based on deep learning and storage medium
CN111813925A (en) * 2020-07-14 2020-10-23 混沌时代(北京)教育科技有限公司 Semantic-based unsupervised automatic summarization method and system
CN112784043A (en) * 2021-01-18 2021-05-11 辽宁工程技术大学 Aspect-level emotion classification method based on gated convolutional neural network
CN112949299A (en) * 2021-02-26 2021-06-11 深圳市北科瑞讯信息技术有限公司 Method and device for generating news manuscript, storage medium and electronic device
CN113220853B (en) * 2021-05-12 2022-10-04 燕山大学 Automatic generation method and system for legal questions
CN113220853A (en) * 2021-05-12 2021-08-06 燕山大学 Automatic generation method and system for legal questions

Also Published As

Publication number Publication date
CN108090049B (en) 2021-02-05

Similar Documents

Publication Publication Date Title
CN108090049A (en) Multi-document summary extraction method and system based on sentence vector
Abdullah et al. SEDAT: sentiment and emotion detection in Arabic text using CNN-LSTM deep learning
Zhao et al. Disease named entity recognition from biomedical literature using a novel convolutional neural network
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
Quan et al. An efficient framework for sentence similarity modeling
CN109635280A (en) A kind of event extraction method based on mark
CN106202010A (en) The method and apparatus building Law Text syntax tree based on deep neural network
Yan et al. Named entity recognition by using XLNet-BiLSTM-CRF
CN110083710A (en) It is a kind of that generation method is defined based on Recognition with Recurrent Neural Network and the word of latent variable structure
CN108875809A (en) The biomedical entity relationship classification method of joint attention mechanism and neural network
CN111222318B (en) Trigger word recognition method based on double-channel bidirectional LSTM-CRF network
Tang et al. Deep sequential fusion LSTM network for image description
CN110347796A (en) Short text similarity calculating method under vector semantic tensor space
Ma et al. Data augmentation for chinese text classification using back-translation
CN114925205B (en) GCN-GRU text classification method based on contrast learning
Thattinaphanich et al. Thai named entity recognition using Bi-LSTM-CRF with word and character representation
CN113535897A (en) Fine-grained emotion analysis method based on syntactic relation and opinion word distribution
Jia et al. Attention in character-based BiLSTM-CRF for Chinese named entity recognition
Yan et al. MoGCN: Mixture of gated convolutional neural network for named entity recognition of chinese historical texts
CN112818698A (en) Fine-grained user comment sentiment analysis method based on dual-channel model
Jin et al. WordTransABSA: Enhancing Aspect-based Sentiment Analysis with masked language modeling for affective token prediction
CN115858736A (en) Emotion text generation method based on emotion prompt fine adjustment
Sun et al. Chinese microblog sentiment classification based on convolution neural network with content extension method
Wang Research on the art value and application of art creation based on the emotion analysis of art
Shuang et al. Combining word order and cnn-lstm for sentence sentiment classification

Legal Events

Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
CF01: Termination of patent right due to non-payment of annual fee (granted publication date: 20210205)