CN108090049A - Multi-document summary extraction method and system based on sentence vector - Google Patents
- Publication number
- CN108090049A CN108090049A CN201810045090.4A CN201810045090A CN108090049A CN 108090049 A CN108090049 A CN 108090049A CN 201810045090 A CN201810045090 A CN 201810045090A CN 108090049 A CN108090049 A CN 108090049A
- Authority
- CN
- China
- Prior art keywords
- sentence
- document
- vector
- sub-topics
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
Abstract
The invention discloses a multi-document summary extraction method and system based on sentence vectors, comprising the following steps: S1, preprocess the document set; S2, generate sentence vectors by training a doc2vec model; S3, cluster the sentences into sub-topic documents; S4, build a sentence relation graph model in each sub-topic document; S5, compute sentence weights; S6, extract sentences and order them to form the summary. The present invention trains a doc2vec model on a large corpus so that every sentence of the target document set can be represented as a vector; spectral clustering groups the sentences into sub-topics, and one sentence is extracted from each sub-topic, which avoids the problem of sentence redundancy; the extracted sentences are arranged into the summary according to their positions in the original documents, improving the coherence of the summary.
Description
Technical field
The present invention relates to the field of computer text mining, and more particularly to a multi-document summary extraction method and system based on sentence vectors.
Background technology
Automatic document summarization uses a computer to condense and refine text for the user, providing an overview of its content. By briefly reading the summary, the user can grasp the key content of the full text, which greatly improves the efficiency of acquiring and understanding information. Single-document summarization automatically generates, by algorithm, a summary of one document's main content. Since Luhn proposed a method for automatically generating summaries in 1958, research on single-document summarization has developed rapidly, and by now its results have reached a generally accepted level. Multi-document summarization, in contrast, generates a comprehensive summary of the main content of several different documents. To date, multi-document summarization techniques have been closely combined with algorithms from artificial intelligence, and in recent years increasingly with evolutionary and deep learning algorithms.
Yan et al. first applied deep learning to text summarization: the input layer is a word-frequency vector, the hidden layers are composed of restricted Boltzmann machines, and important sentences are finally selected by dynamic programming to form the summary. Rush performed abstractive summarization of the source documents with deep learning, encoding the source document with a convolutional network and generating the summary with a feed-forward neural network with contextual attention. In 2016 Google open-sourced Textsum, an automatic summarization module based on deep learning, in its deep learning framework TensorFlow. Multi-document summarization can be divided into extractive and abstractive summarization according to whether the sentences forming the summary are taken from the original text. Extractive summarization mainly assesses the importance of the sentences of the original documents and then selects key sentences to form the summary. Abstractive summarization mainly extracts word information from the original documents and then organizes the words into sentences to form the summary.
At present the implementation of abstractive summarization is overly complex: machines understand natural language insufficiently, considerable manual involvement is required, and the approach is still at an early stage and improving only slowly. Extractive summarization is the method in common use. Among graph-model-based text similarity measures, the maximum common subgraph method and the edge-weight analogy method are the most common. There are also methods that measure similarity with the eigenvectors corresponding to the left singular values of the text graph matrix, which in essence amounts to PCA dimensionality reduction under the assumption of zero sample mean. The main problems of existing extractive summarization methods are sentence redundancy and poor sentence cohesion.
Content of the invention
To address the deficiencies of the prior art, in which the extracted sentences are redundant and their ordering is chaotic, the present invention proposes a multi-document summary extraction method based on sentence vectors, providing accurate and more readable document summaries.
The technical solution adopted by the present invention is:
A multi-document summary extraction method based on sentence vectors comprises the following steps:
S1: preprocess the document set whose summary is to be extracted;
S2: generate sentence vectors by training a doc2vec model;
S3: cluster the sentence vectors and save the corresponding sentences as sub-topic documents;
S4: build a sentence relation graph model in each sub-topic document;
S5: compute sentence weights in each sub-topic document according to the relation graph model built in step S4;
S6: extract sentences and order them to form the summary.
Further, S1 comprises the following steps:
Step S101: split every document of the document set into sentences at sentence-end marks, recording the split sentences one sentence per line;
Step S102: record the position of each sentence;
Step S103: copy the content of every split document into one document to merge the document set, the merged document holding one sentence per line;
Step S104: tokenize every sentence line of the merged document and remove stop words.
Further, the position of each sentence recorded in step S102 of step S1 is expressed by h_{n,i}, where h_{n,i} represents the position of the i-th sentence of the n-th document, text_n represents the n-th document, and len(text_n) represents the number of sentences contained in the n-th document.
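The preprocessing steps S101 to S104 can be sketched in Python. The sentence-splitting regular expression, the whitespace tokenizer (standing in for a real Chinese word segmenter), and the (document, sentence) position tuples are illustrative assumptions, not the patent's exact procedure:

```python
import re

def preprocess(documents, stopwords=frozenset()):
    """Split each document into sentences, record positions, merge, tokenize.

    `documents` is a list of raw document strings. Positions are stored as
    (document index, sentence index) tuples, an assumed concrete form of the
    h_{n,i} positions of step S102.
    """
    positions = {}   # sentence text -> (doc index, sentence index)
    merged = []      # one sentence per line, across all documents
    for n, text in enumerate(documents):
        # S101: split at sentence-end marks (Latin and CJK forms)
        sents = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]
        for i, sent in enumerate(sents):
            positions[sent] = (n, i)      # S102: record the sentence position
            merged.append(sent)           # S103: merge into one document
    # S104: tokenize every line and drop stop words
    tokenized = [[w for w in s.split() if w not in stopwords] for s in merged]
    return merged, positions, tokenized
```

A later step can then sort extracted sentences by their stored positions to restore original document order.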
Further, step S2 comprises the following steps:
Step S201: preprocess all documents of the large corpus by steps S101 to S104 of step S1, input the preprocessed large-corpus documents into the distributed memory model PV-DM of sentence vectors in doc2vec, and train the PV-DM model;
Step S202: input the target documents, preprocessed by steps S101 to S104 of step S1, into the trained PV-DM model of sentence vectors to obtain the sentence vectors.
Training the distributed memory model PV-DM of sentence vectors in step S201 comprises the following steps:
(2011) in the preprocessed large-corpus documents, initialize every sentence line and every word as k-dimensional vectors; input the word vectors corresponding to the context of a word w, together with the sentence vector of the sentence containing w, into the deep neural network model;
(2012) sum the input vectors in the hidden layer of the deep neural network model; the accumulated vector serves as the input of the output layer;
(2013) the output layer of the deep neural network model corresponds to a binary tree, namely a Huffman tree whose leaf nodes are the words of the large corpus and whose weights are the numbers of occurrences of the words in the large corpus. Each word corresponds to a leaf node of the tree, and each branching in the tree is regarded as one binary classification. On the path from the root node to the leaf node of word w, the label of each tree node is 1 - p_j, where p_j is the code of the j-th node on the path. Every tree node except the root node and the leaf nodes corresponds to an auxiliary vector of the same length as the sentence vector, used to assist in training the model;
(2014) continually correct the sentence vectors, word vectors, and auxiliary vectors by gradient ascent, finally obtaining the trained distributed memory model PV-DM of sentence vectors.
The context of the word w consists of the C words before and after w.
The objective function of the neural network training is

L = Σ_{sentence ∈ doc} Σ_{w ∈ sentence} log p(w | Context(sentence, w))

where sentence is a sentence, doc is a preprocessed document, w is a word, and Context(sentence, w) consists of the context words of w and the sentence containing w.
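Steps (2011) to (2014) amount to one hierarchical-softmax gradient-ascent step per target word. A minimal NumPy sketch follows, in which the learning rate, array shapes, and function names are assumptions rather than the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pv_dm_step(sent_vec, ctx_vecs, path, codes, theta, lr=0.025):
    """One PV-DM gradient-ascent step for a target word w (a sketch).

    sent_vec: (k,) sentence vector; ctx_vecs: (2C, k) context word vectors;
    path: indices of the inner Huffman-tree nodes from the root to w's leaf;
    codes: the code p_j of each node on the path; theta: (nodes, k) auxiliary
    vectors. All shapes and the learning rate are illustrative assumptions.
    """
    x = sent_vec + ctx_vecs.sum(axis=0)      # step (2012): summed hidden input
    grad_x = np.zeros_like(x)
    for node, p_j in zip(path, codes):
        label = 1 - p_j                      # step (2013): label is 1 - p_j
        q = sigmoid(x @ theta[node])         # binary classification at the node
        g = lr * (label - q)                 # gradient-ascent step size
        grad_x += g * theta[node]            # accumulate gradient w.r.t. input
        theta[node] += g * x                 # correct the auxiliary vector
    sent_vec += grad_x                       # step (2014): correct sentence vector
    ctx_vecs += grad_x                       # ...and every context word vector
    return sent_vec, ctx_vecs, theta
```

Iterating this update over every word of the corpus, and at inference time freezing theta and the word vectors while updating only a fresh sentence vector, reproduces the train/infer split of steps S201 and S202.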
Further, the clustering of sentence vectors in step S3 uses spectral clustering.
Further, step S3 comprises the following steps:
Step S301: build the similarity matrix W between all sentence vectors, the kernel being the Gaussian kernel:

W_{i,j} = exp(-||x_i - x_j||^2 / (2σ^2))

where W_{i,j} is the similarity between sentences x_i and x_j and σ is the Gaussian radius;
Step S302: compute the Laplacian matrix L:
L = D - W
where D is the diagonal matrix whose n-th diagonal element is the sum of the n-th row of W;
Step S303: build the normalized Laplacian matrix D^{-1/2} L D^{-1/2};
Step S304: compute its k smallest eigenvalues and the corresponding eigenvectors V;
Step S305: arrange the eigenvectors by columns into the eigenmatrix and normalize every row to unit length to form the matrix F, i.e. every row vector of F has modulus 1;
Step S306: treat every row of F as a k-dimensional sample and cluster with the K-means algorithm into C classes;
Step S307: save the sentences corresponding to the vectors of the C classes as C sub-topic documents.
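Steps S301 to S307 can be sketched with NumPy and scikit-learn. The function name, the default Gaussian radius, setting k equal to C, and the use of scikit-learn's `KMeans` are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster_sentences(X, C, sigma=1.0, random_state=0):
    """Spectral clustering of sentence vectors (rows of X) into C sub-topics.

    Follows S301-S307: Gaussian-kernel affinity, normalized Laplacian,
    smallest eigenvectors, row-normalized feature matrix, then k-means.
    """
    # S301: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    # S302: L = D - W, with D the diagonal degree matrix
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # S303: normalized Laplacian D^{-1/2} L D^{-1/2}
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt
    # S304: the C smallest eigenvalues and their eigenvectors
    vals, vecs = np.linalg.eigh(L_norm)      # ascending eigenvalue order
    V = vecs[:, :C]
    # S305: eigenmatrix with every row normalized to unit modulus
    F = V / np.linalg.norm(V, axis=1, keepdims=True)
    # S306: k-means on the rows of F, C clusters
    labels = KMeans(n_clusters=C, n_init=10, random_state=random_state).fit_predict(F)
    return labels                            # S307: group sentences by label
```

The returned labels decide which sub-topic document each sentence is saved into.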
Further, step S4 specifically comprises:
in each sub-topic document, build the sentence relation graph model with the sentences as nodes and the similarities between sentences as edges.
Further, the similarity between sentences is computed as the cosine of the angle between their sentence vectors; the cosine value sim(x_i, x_j) is computed as:

sim(x_i, x_j) = (x_i · x_j) / (||x_i|| · ||x_j||)

where x_i and x_j are two sentence vectors.
Further, the specific steps of step S5 comprise:
initialize the weight of every sentence, then iteratively update the sentence weights according to the relation graph model built in step S4:

S(i) = (1 - d) + d · Σ_{j ∈ δ(i)} S(j) / |δ(j)|

where S(i) is the weight of sentence i, δ(i) is the set of all sentences in the same sub-topic document whose similarity to sentence i exceeds the given threshold, |δ(j)| is the number of sentences in the same sub-topic document whose similarity to sentence j exceeds the given threshold, S(j) is the weight of sentence j, and d is the damping coefficient, set to 0.85.
Further, in step S5 the sentence weights are initialized to 1, and the similarity threshold used in δ(i) is set to 0.05.
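Steps S4 and S5 together amount to a PageRank-style iteration over a cosine-similarity graph. A sketch under the stated defaults (weights initialized to 1, threshold 0.05, d = 0.85); the iteration count is an assumption:

```python
import numpy as np

def sentence_weights(X, threshold=0.05, d=0.85, iters=50):
    """Rank the sentences of one sub-topic document (rows of X are vectors).

    Edges connect sentence pairs whose cosine similarity exceeds `threshold`
    (the set delta(i)); weights start at 1 and are updated with damping
    d = 0.85 as described in step S5.
    """
    n = X.shape[0]
    # sim(x_i, x_j) = x_i . x_j / (||x_i|| ||x_j||), the cosine similarity
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    sim = (X @ X.T) / (norms @ norms.T)
    adj = ((sim > threshold) & ~np.eye(n, dtype=bool)).astype(float)
    S = np.ones(n)                        # weights initialized to 1
    deg = np.maximum(adj.sum(axis=1), 1)  # |delta(j)|, guarded against zero
    for _ in range(iters):
        # S(i) = (1 - d) + d * sum over j in delta(i) of S(j) / |delta(j)|
        S = (1 - d) + d * adj @ (S / deg)
    return S
```

A fixed iteration count stands in for a convergence test; since d < 1 the iteration converges.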
Further, step S6 specifically comprises: extracting the sentence with the largest weight from each sub-topic document, and combining the extracted sentences into the summary according to the sentence positions in the documents obtained by step S102 of step S1.
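Step S6 can be sketched as follows; the container layouts (`clusters`, `weights`, `positions`) are assumptions about how the earlier steps hand over their results:

```python
def extract_summary(clusters, weights, positions):
    """S6: take the highest-weight sentence from each sub-topic and order the
    picks by their original (document, sentence) position from step S102.

    `clusters` maps a sub-topic id to its list of sentences, `weights` maps a
    sentence to its score, and `positions` maps a sentence to (doc, index).
    """
    picked = [max(sents, key=lambda s: weights[s]) for sents in clusters.values()]
    picked.sort(key=lambda s: positions[s])   # restore original document order
    return " ".join(picked)
```

Sorting by the recorded positions is what gives the summary its original-text ordering and hence its coherence.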
A multi-document summary automatic extraction system based on sentence vectors comprises a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of any of the above methods are completed.
A computer-readable storage medium carries a computer program; when the computer program is run by a processor, the steps of any of the above methods are completed.
Description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present application; the illustrative embodiments of the application and their explanation serve to explain the application and do not constitute an improper limitation of it.
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the flow chart of pre-treatment step of the present invention.
Specific embodiment
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the application belongs.
Fig. 1 is the flow chart of the present invention, comprising the following steps:
S1: preprocess the document set;
S2: generate sentence vectors by training a doc2vec model;
S3: cluster the sentence vectors and save the corresponding sentences as sub-topic documents;
S4: build a sentence relation graph model in each sub-topic document;
S5: compute sentence weights in each sub-topic document;
S6: extract sentences and order them to form the summary.
Specifically, the implementation of step S1 is shown in Fig. 2 and comprises the following steps:
Step S101: split every document of the document set into sentences at sentence-end marks, one sentence per line;
Step S102: record the position of each sentence;
Step S103: copy the content of every split document into one document to merge the document set, the merged document holding one sentence per line;
Step S104: tokenize every sentence line of the merged document and remove stop words.
Further, the sentence position described in step S102 is expressed by h_{n,i}, where h_{n,i} represents the position of the i-th sentence of the n-th document, text_n represents the n-th document, and len(text_n) represents the number of sentences contained in the n-th document.
Step S2 specifically comprises the following steps:
Step S201: preprocess all documents of the large corpus by steps S101 to S104 of step S1 and train the PV-DM (distributed memory model of sentence vectors) in doc2vec on the preprocessed large-corpus documents;
Step S202: feed the target documents, preprocessed by steps S101 to S104 of step S1, into the trained model to obtain the sentence vectors.
Training the PV-DM model in step S201 specifically comprises the following steps:
(1) in the preprocessed large-corpus documents, initialize every sentence line and every word as k-dimensional vectors; input the word vectors corresponding to the context of a word w (the C words before and after w), together with the sentence vector of the sentence containing w, into the deep neural network model;
(2) sum the input vectors in the hidden layer; the accumulated vector serves as the input of the output layer;
(3) the output layer corresponds to a binary tree: a Huffman tree whose leaf nodes are the words occurring in the large corpus and whose weights are the numbers of occurrences of the words in the large corpus. Each word corresponds to a leaf node of the tree, and each branching in the tree can be regarded as one binary classification. On the path from the root node to the leaf node of word w, the label of each tree node is 1 - p_j, where p_j is the code of the j-th node on the path. Every tree node except the root node and the leaf nodes corresponds to a vector of the same length as the sentence vector, called the auxiliary vector, used to assist in training the model;
(4) train the model by gradient ascent, continually correcting the sentence vectors, word vectors, and auxiliary vectors, finally obtaining the trained distributed memory model of sentence vectors.
The objective function of the neural network training is

L = Σ_{sentence ∈ doc} Σ_{w ∈ sentence} log p(w | Context(sentence, w))

where sentence is a sentence, doc is a preprocessed document, w is a word, and Context(sentence, w) consists of the context words of w and the sentence containing w.
In step S3, the generation of the sub-topic documents by clustering uses spectral clustering and specifically comprises the following steps:
Step S301: build the similarity matrix W between all sentences, the kernel being the Gaussian kernel:

W_{i,j} = exp(-||x_i - x_j||^2 / (2σ^2))

where W_{i,j} is the similarity between sentences x_i and x_j and σ is the Gaussian radius;
Step S302: compute the Laplacian matrix L:
L = D - W
where D is the diagonal matrix whose n-th diagonal element is the sum of the n-th row of W;
Step S303: build the normalized Laplacian matrix D^{-1/2} L D^{-1/2};
Step S304: compute its k smallest eigenvalues and the corresponding eigenvectors V;
Step S305: arrange the eigenvectors by columns into the eigenmatrix and normalize every row to unit length to form the matrix F, i.e. every row vector of F has modulus 1;
Step S306: treat every row of F as a k-dimensional sample and cluster with the K-means algorithm into C classes;
Step S307: save the sentences corresponding to the vectors of the C classes as C documents.
Step S4 specifically comprises:
in each sub-topic document, build the sentence relation graph model with the sentences as nodes and the similarities between sentences as edges.
Further, the cosine similarity sim(x_i, x_j) is computed as:

sim(x_i, x_j) = (x_i · x_j) / (||x_i|| · ||x_j||)

where x_i and x_j are two sentence vectors.
The specific steps of step S5 comprise:
initialize the weight of every sentence, then iteratively update the sentence weights according to the relation graph model built in step S4 using the following formula:

S(i) = (1 - d) + d · Σ_{j ∈ δ(i)} S(j) / |δ(j)|

where S(i) is the weight of sentence i, δ(i) is the set of all sentences in the same sub-topic document whose similarity to sentence i exceeds the given threshold, |δ(j)| is the number of sentences in the same sub-topic document whose similarity to sentence j exceeds the given threshold, S(j) is the weight of sentence j, and d is the damping coefficient, set to 0.85.
Further, in step S5 the sentence weights are initialized to 1, and the similarity threshold used in δ(i) is set to 0.05.
Further, step S6 specifically comprises:
extracting the sentence with the largest weight from each sub-topic document, and combining the extracted sentences into the summary according to their positions in the original documents, as recorded by step S102 of step S1.
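The whole S1-S6 pipeline can be sketched end to end on toy English data. TF-IDF sentence vectors stand in for the doc2vec vectors of step S2, and plain k-means stands in for spectral clustering, so this is a structural sketch of the method under stated substitutions, not the patent's exact procedure:

```python
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def summarize(documents, C=2, threshold=0.05, d=0.85):
    """End-to-end sketch of steps S1-S6 on whitespace-tokenized text."""
    # S1: split into sentences and record original positions
    sents, pos = [], []
    for n, text in enumerate(documents):
        for i, s in enumerate(t for t in re.split(r"(?<=[.!?])\s+", text) if t):
            sents.append(s)
            pos.append((n, i))
    # S2 (substitute): TF-IDF sentence vectors instead of doc2vec
    X = TfidfVectorizer().fit_transform(sents).toarray()
    # S3 (substitute): k-means into C sub-topics instead of spectral clustering
    labels = KMeans(n_clusters=C, n_init=10, random_state=0).fit_predict(X)
    # S4 + S5: cosine graph and damped weight iteration inside each sub-topic
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    sim = (X @ X.T) / (norms @ norms.T + 1e-12)
    picked = []
    for c in range(C):
        idx = np.where(labels == c)[0]
        adj = ((sim[np.ix_(idx, idx)] > threshold)
               & ~np.eye(len(idx), dtype=bool)).astype(float)
        S = np.ones(len(idx))
        deg = np.maximum(adj.sum(axis=1), 1)
        for _ in range(30):
            S = (1 - d) + d * adj @ (S / deg)
        picked.append(idx[int(np.argmax(S))])  # S6: best sentence per sub-topic
    picked.sort(key=lambda k: pos[k])          # S6: restore original order
    return " ".join(sents[k] for k in picked)
```

With C sub-topics the summary always contains C sentences, one per sub-topic, in their original document order.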
To further illustrate the multi-document summary extraction method of the present invention, below is the result of generating a summary from three documents about the death of Wu Qingyuan:
Sina.com sports news: Wu Qingyuan, the hundred-year-old master and eternal legend of the Go world, has gently passed away. Wu Qingyuan was born on June 12, 1914 in Fujian Province, China, the third son of his family. The following year he was awarded the third dan by the Japanese Go Academy, and in 1950 he attained the ninth dan. Mr. Wu Qingyuan did his utmost to develop the way of Go and to guide and support the younger generation. As the first great champion of the golden age of Japanese Go, he was known in Japan as the "Showa Go sage". In 1961 Wu Qingyuan was injured in a traffic accident and gradually faded from the front line, until formally retiring in 1984. Before Wu Qingyuan, no Go player had ever reached his height. In 2014 the Chinese Go community held a grand hundredth-birthday celebration for Mr. Wu; in the whole world, only Wu Qingyuan possessed such honors. The family of Wu Qingyuan plans to hold a farewell ceremony for him on December 3. Wu Qingyuan once said: "After my hundredth year I will still play Go, and after death I will still play Go in the universe." Pursuing the way of Go, Mr. Wu had long since seen through life and death.
The foregoing are merely the preferred embodiments of the application and do not limit it; for those skilled in the art, the application may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of the application shall be included within its scope of protection.
Claims (10)
1. A multi-document summary extraction method based on sentence vectors, characterized in that it comprises the following steps:
S1: preprocessing the document set whose summary is to be extracted;
S2: generating sentence vectors by training a doc2vec model;
S3: clustering the sentence vectors and saving the corresponding sentences as sub-topic documents;
S4: building a sentence relation graph model in each sub-topic document;
S5: computing sentence weights in each sub-topic document according to the relation graph model built in step S4;
S6: extracting sentences and ordering them to form the summary.
2. The multi-document summary extraction method based on sentence vectors as claimed in claim 1, characterized in that S1 comprises the following steps:
Step S101: splitting every document of the document set into sentences at sentence-end marks, recording the split sentences one sentence per line;
Step S102: recording the position of each sentence;
Step S103: copying the content of every split document into one document to merge the document set, the merged document holding one sentence per line;
Step S104: tokenizing every sentence line of the merged document and removing stop words.
3. The multi-document summary extraction method based on sentence vectors as claimed in claim 1, characterized in that step S2 comprises the following steps:
Step S201: preprocessing all documents of the large corpus by steps S101 to S104 of step S1, inputting the preprocessed large-corpus documents into the distributed memory model PV-DM of sentence vectors in doc2vec, and training the PV-DM model;
Step S202: inputting the target documents, preprocessed by steps S101 to S104 of step S1, into the trained PV-DM model of sentence vectors to obtain the sentence vectors.
4. The multi-document summary extraction method based on sentence vectors as claimed in claim 3, characterized in that training the distributed memory model PV-DM of sentence vectors in step S201 comprises the following steps:
(2011) in the preprocessed large-corpus documents, initializing every sentence line and every word as k-dimensional vectors, and inputting the word vectors corresponding to the context of a word w, together with the sentence vector of the sentence containing w, into the deep neural network model;
(2012) summing the input vectors in the hidden layer of the deep neural network model, the accumulated vector serving as the input of the output layer;
(2013) the output layer of the deep neural network model corresponding to a binary tree, namely a Huffman tree whose leaf nodes are the words of the large corpus and whose weights are the numbers of occurrences of the words in the large corpus; each word corresponds to a leaf node of the tree, each branching in the tree is regarded as one binary classification, on the path from the root node to the leaf node of word w the label of each tree node is 1 - p_j, where p_j is the code of the j-th node on the path, and every tree node except the root node and the leaf nodes corresponds to an auxiliary vector of the same length as the sentence vector, used to assist in training the model;
(2014) continually correcting the sentence vectors, word vectors, and auxiliary vectors by gradient ascent, finally obtaining the trained distributed memory model PV-DM of sentence vectors.
5. The multi-document summary extraction method based on sentence vectors as claimed in claim 1, characterized in that the clustering of sentence vectors in step S3 uses spectral clustering.
6. The multi-document summary extraction method based on sentence vectors as claimed in claim 5, characterized in that step S3 comprises the following steps:
Step S301: building the similarity matrix W between all sentence vectors, the kernel being the Gaussian kernel;
Step S302: computing the Laplacian matrix L;
Step S303: building the normalized Laplacian matrix;
Step S304: computing the k smallest eigenvalues of the normalized Laplacian matrix and the corresponding eigenvectors V;
Step S305: arranging the eigenvectors by columns into the eigenmatrix, and normalizing every row to unit length to form the matrix F, i.e. every row vector of F has modulus 1;
Step S306: treating every row of the matrix F as a k-dimensional sample and clustering with the K-means algorithm into C classes;
Step S307: saving the sentences corresponding to the vectors of the C classes as C sub-topic documents.
7. The multi-document summary extraction method based on sentence vectors as claimed in claim 1, characterized in that step S4 specifically comprises:
in each sub-topic document, building the sentence relation graph model with the sentences as nodes and the similarities between sentences as edges.
8. The multi-document summary extraction method based on sentence vectors as claimed in claim 2, characterized in that step S6 specifically comprises: extracting the sentence with the largest weight from each sub-topic document, and combining the extracted sentences into the summary according to the sentence positions in the documents obtained by step S102 of step S1.
9. A multi-document summary automatic extraction system based on sentence vectors, characterized by comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, the computer instructions, when run by the processor, completing the steps of the method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that a computer program runs thereon, the computer program, when run by a processor, completing the steps of the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810045090.4A CN108090049B (en) | 2018-01-17 | 2018-01-17 | Multi-document abstract automatic extraction method and system based on sentence vectors |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108090049A true CN108090049A (en) | 2018-05-29 |
CN108090049B CN108090049B (en) | 2021-02-05 |
Family
ID=62181661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810045090.4A Expired - Fee Related CN108090049B (en) | 2018-01-17 | 2018-01-17 | Multi-document abstract automatic extraction method and system based on sentence vectors |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108090049B (en) |
- 2018-01-17 CN CN201810045090.4A patent/CN108090049B/en not_active Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7398196B1 (en) * | 2000-09-07 | 2008-07-08 | Intel Corporation | Method and apparatus for summarizing multiple documents using a subsumption model |
CN101187919A (en) * | 2006-11-16 | 2008-05-28 | 北大方正集团有限公司 | Method and system for abstracting batch single document for document set |
CN101231634A (en) * | 2007-12-29 | 2008-07-30 | 中国科学院计算技术研究所 | Automatic multi-document summarization method |
CN104778157A (en) * | 2015-03-02 | 2015-07-15 | 华南理工大学 | Multi-document abstract sentence generating method |
CN104834735A (en) * | 2015-05-18 | 2015-08-12 | 大连理工大学 | Automatic document summarization extraction method based on term vectors |
CN107357899A (en) * | 2017-07-14 | 2017-11-17 | 吉林大学 | Short text sentiment analysis method based on a convolutional-network deep autoencoder |
Non-Patent Citations (1)
Title |
---|
Liu Xin et al., "Multi-document summarization method based on the PV_DM model", Computer Applications and Software *
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108897896A (en) * | 2018-07-13 | 2018-11-27 | 深圳追科技有限公司 | Keyword extraction method based on reinforcement learning |
CN108959269A (en) * | 2018-07-27 | 2018-12-07 | 首都师范大学 | Automatic sentence ordering method and device |
CN108959269B (en) * | 2018-07-27 | 2019-07-05 | 首都师范大学 | Automatic sentence ordering method and device |
CN109325109A (en) * | 2018-08-27 | 2019-02-12 | 中国人民解放军国防科技大学 | Extractive news summary generation device based on attention encoder |
CN109325109B (en) * | 2018-08-27 | 2021-11-19 | 中国人民解放军国防科技大学 | Extractive news summary generation device based on attention encoder |
CN109582967A (en) * | 2018-12-03 | 2019-04-05 | 深圳前海微众银行股份有限公司 | Public sentiment abstract extraction method, apparatus, equipment and computer readable storage medium |
CN109582967B (en) * | 2018-12-03 | 2023-08-18 | 深圳前海微众银行股份有限公司 | Public opinion abstract extraction method, device, equipment and computer readable storage medium |
CN110399606A (en) * | 2018-12-06 | 2019-11-01 | 国网信息通信产业集团有限公司 | Unsupervised electric power document topic generation method and system |
CN110399606B (en) * | 2018-12-06 | 2023-04-07 | 国网信息通信产业集团有限公司 | Unsupervised electric power document topic generation method and system |
CN109902284A (en) * | 2018-12-30 | 2019-06-18 | 中国科学院软件研究所 | Unsupervised argument extraction method based on debate mining |
CN109902168A (en) * | 2019-01-25 | 2019-06-18 | 北京创新者信息技术有限公司 | Patent valuation method and system |
US11847152B2 (en) | 2019-01-25 | 2023-12-19 | Beijing Innovator Information Technology Co., Ltd. | Patent evaluation method and system that aggregate patents based on technical clustering |
CN109885683A (en) * | 2019-01-29 | 2019-06-14 | 桂林远望智能通信科技有限公司 | Method for generating text abstract based on K-means model and neural network model |
CN109885683B (en) * | 2019-01-29 | 2022-12-02 | 桂林远望智能通信科技有限公司 | Method for generating text abstract based on K-means model and neural network model |
CN109829161B (en) * | 2019-01-30 | 2023-08-04 | 延边大学 | Multilingual automatic summarization method |
CN109829161A (en) * | 2019-01-30 | 2019-05-31 | 延边大学 | Multilingual automatic summarization method |
CN109977196A (en) * | 2019-03-29 | 2019-07-05 | 云南电网有限责任公司电力科学研究院 | Method and device for detecting similarity of massive documents |
CN110162778A (en) * | 2019-04-02 | 2019-08-23 | 阿里巴巴集团控股有限公司 | Text summary generation method and device |
CN111914083A (en) * | 2019-05-10 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Sentence processing method, device and storage medium |
CN110362823A (en) * | 2019-06-21 | 2019-10-22 | 北京百度网讯科技有限公司 | Training method and device for descriptive text generation model |
CN110362823B (en) * | 2019-06-21 | 2023-07-28 | 北京百度网讯科技有限公司 | Training method and device for descriptive text generation model |
US10902191B1 (en) * | 2019-08-05 | 2021-01-26 | International Business Machines Corporation | Natural language processing techniques for generating a document summary |
WO2021042529A1 (en) * | 2019-09-02 | 2021-03-11 | 平安科技(深圳)有限公司 | Article abstract automatic generation method, device, and computer-readable storage medium |
CN110737768B (en) * | 2019-10-16 | 2022-04-08 | 信雅达科技股份有限公司 | Deep-learning-based automatic text summary generation method, device, and storage medium |
CN110737768A (en) * | 2019-10-16 | 2020-01-31 | 信雅达系统工程股份有限公司 | Deep-learning-based automatic text summary generation method, device, and storage medium |
CN111813925A (en) * | 2020-07-14 | 2020-10-23 | 混沌时代(北京)教育科技有限公司 | Semantic-based unsupervised automatic summarization method and system |
CN112784043A (en) * | 2021-01-18 | 2021-05-11 | 辽宁工程技术大学 | Aspect-level emotion classification method based on gated convolutional neural network |
CN112949299A (en) * | 2021-02-26 | 2021-06-11 | 深圳市北科瑞讯信息技术有限公司 | Method and device for generating news manuscript, storage medium and electronic device |
CN113220853B (en) * | 2021-05-12 | 2022-10-04 | 燕山大学 | Automatic generation method and system for legal questions |
CN113220853A (en) * | 2021-05-12 | 2021-08-06 | 燕山大学 | Automatic generation method and system for legal questions |
Also Published As
Publication number | Publication date |
---|---|
CN108090049B (en) | 2021-02-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108090049A (en) | Multi-document summary extraction method and system based on sentence vector | |
Abdullah et al. | SEDAT: sentiment and emotion detection in Arabic text using CNN-LSTM deep learning | |
Zhao et al. | Disease named entity recognition from biomedical literature using a novel convolutional neural network | |
CN111241294B (en) | Relationship extraction method of graph convolution network based on dependency analysis and keywords | |
Quan et al. | An efficient framework for sentence similarity modeling | |
CN109635280A (en) | Event extraction method based on annotation | |
CN106202010A (en) | Method and apparatus for building legal text syntax trees based on deep neural networks | |
Yan et al. | Named entity recognition by using XLNet-BiLSTM-CRF | |
CN110083710A (en) | Word definition generation method based on recurrent neural network and latent variable structure | |
CN108875809A (en) | Biomedical entity relation classification method combining attention mechanism and neural network | |
CN111222318B (en) | Trigger word recognition method based on double-channel bidirectional LSTM-CRF network | |
Tang et al. | Deep sequential fusion LSTM network for image description | |
CN110347796A (en) | Short text similarity calculation method in word vector semantic tensor space | |
Ma et al. | Data augmentation for chinese text classification using back-translation | |
CN114925205B (en) | GCN-GRU text classification method based on contrast learning | |
Thattinaphanich et al. | Thai named entity recognition using Bi-LSTM-CRF with word and character representation | |
CN113535897A (en) | Fine-grained emotion analysis method based on syntactic relation and opinion word distribution | |
Jia et al. | Attention in character-based BiLSTM-CRF for Chinese named entity recognition | |
Yan et al. | MoGCN: Mixture of gated convolutional neural network for named entity recognition of chinese historical texts | |
CN112818698A (en) | Fine-grained user comment sentiment analysis method based on dual-channel model | |
Jin et al. | WordTransABSA: Enhancing Aspect-based Sentiment Analysis with masked language modeling for affective token prediction | |
CN115858736A (en) | Emotion text generation method based on emotion prompt fine adjustment | |
Sun et al. | Chinese microblog sentiment classification based on convolution neural network with content extension method | |
Wang | Research on the art value and application of art creation based on the emotion analysis of art | |
Shuang et al. | Combining word order and cnn-lstm for sentence sentiment classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210205 |