CN108376131A - Keyword abstraction method based on seq2seq deep neural network models - Google Patents
- Publication number
- CN108376131A CN108376131A CN201810211285.1A CN201810211285A CN108376131A CN 108376131 A CN108376131 A CN 108376131A CN 201810211285 A CN201810211285 A CN 201810211285A CN 108376131 A CN108376131 A CN 108376131A
- Authority
- CN
- China
- Prior art keywords
- word
- document
- neural network
- keyword
- term vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
Abstract
The present invention relates to the field of computing, and in particular to a keyword extraction method based on a seq2seq (sequence-to-sequence) deep neural network model. The method first extracts target information through a preprocessing module, then converts and tags it in a word-vector conversion module and a part-of-speech tagging module respectively, passes the result through a candidate-word weight computation module to obtain a candidate word sequence, and finally obtains keywords through a candidate-word screening module. By treating the document vector as the average of its word vectors and combining word vectors and the document vector into a single vector representation for each word, the invention can better analyze the importance of each word to the document and select keywords that better represent its gist. It also broadens the scope of keyword extraction, overcoming the inability of existing extraction techniques to predict keywords outside the vocabulary or absent from the source document.
Description
Technical field
The present invention relates to the field of computing, and in particular to a keyword extraction method based on a seq2seq deep neural network model.
Background technology
Development with computer and network technologies and the arrival in big data epoch, digitized file is just with surprising
Speed madness increase, a large amount of human contacts to information be all to exist with electrical file form.It is vast as the open sea in face of these
Information, people can automatically identify the keyword that can most represent article purport there is an urgent need to machine, us helped quickly to understand
Article main contents, to save reading, processing and utilize the time of these electronic documents.
These techniques are collectively known as keyword extraction (Keyword Extraction). Keyword extraction refers to quickly obtaining from a document several words or phrases that represent its topic, serving as a distilled summary of the document's main content. Through keywords, people can quickly understand the main content of a document and efficiently grasp its subject. Keywords are widely used in fields such as news reporting and scientific publishing, allowing people to manage and retrieve documents efficiently. Keyword extraction has therefore become an important research topic in the field of text processing.
Many web-page keyword extraction methods already exist. Most start from features such as a word's frequency of occurrence, its position within the full text, and the semantics of the word itself. The methods in use fall roughly into the following classes: statistics-based methods, machine-learning methods, and natural-language-processing methods.
However, all of these methods have shortcomings. For keyword extraction, they evaluate and rank the candidate keywords of a text and then take the top N words as the page's keywords. Yet among those N keywords, not all words are truly related to the text's theme, while among the candidates that were not extracted there remain words highly relevant to the theme, so the precision and recall of keyword extraction are not high.
Among searched keywords, roughly half do not come from the source document, yet existing keyword extraction techniques can only select candidate keywords from the source document. They therefore cannot predict keywords absent from the source document, and cannot use near-synonyms of document words as keywords, which seriously affects the precision of keyword extraction.
At the same time, existing keyword extraction techniques can only choose candidate words from a vocabulary of limited size; when the number of distinct words in the documents far exceeds the vocabulary size, the inability to predict out-of-vocabulary words likewise affects the precision of keyword extraction.
When choosing candidate keywords, existing keyword extraction methods usually rely on features obtained by machine learning. However, these features can only estimate the importance of each word from statistics such as its frequency of occurrence in the document; they cannot reveal the full semantics hidden in the document's content.
Summary of the invention
The present invention overcomes at least one of the above-mentioned deficiencies (defects) of the prior art by providing a keyword extraction method based on a seq2seq deep neural network model.
In order to solve the above technical problems, the technical solution of the present invention is as follows.
A keyword extraction method based on a seq2seq (sequence-to-sequence) deep neural network model, the method comprising the following steps:
S1. Import the document to be processed and the corpus into the preprocessing module for extraction;
S2. Feed the information produced by the preprocessing module into the word-vector conversion module and the part-of-speech tagging module respectively; the word-vector conversion module performs word-vector conversion, and part-of-speech tagging is performed in the part-of-speech tagging module;
S3. Pass the word-vector-converted and POS-tagged information into the candidate-word weight computation module to obtain a candidate word sequence;
S4. Pass the resulting candidate word sequence into the candidate-word screening module to obtain suitable keywords.
Further, the preprocessing module performs Chinese word segmentation, English stemming, and stop-word removal on the corpus and the document to be processed; the Chinese segmentation results enter the word-vector conversion module, and the English stems enter the part-of-speech tagging module.
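As a rough illustration of the preprocessing step for English text, the sketch below combines tokenization, stop-word removal, and stemming. The stop-word list and the suffix-stripping stemmer are placeholders: a real pipeline would use a full stop-word list, a Porter-style stemmer such as NLTK's, and a segmenter such as jieba for Chinese.

```python
import re

# Hypothetical minimal stop-word list; a production system would use a full one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for"}

def crude_stem(word):
    """Very rough English stemming by suffix stripping (a Porter-style
    stemmer would normally be used instead)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```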
Further, the word-vector conversion module converts the words produced by the preprocessing module into word-vector (word embedding) form. On the basis of the word-vector representation framework word2vec, the module randomly selects words from the document and averages their vectors to serve as the document vector, and then lets word vectors and the document vector participate in training and prediction together as a whole.
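The document-vector construction just described (averaging the vectors of randomly selected words) can be sketched as follows. Here `word_vectors` is assumed to come from a trained word2vec-style model, and the random sampling itself is left to the caller:

```python
def document_vector(word_vectors, sampled_words):
    """Average the vectors of a sample of the document's words to form a
    document vector (the per-word vectors would normally come from a
    trained word2vec-style model)."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for w in sampled_words:
        for i, v in enumerate(word_vectors[w]):
            total[i] += v
    return [t / len(sampled_words) for t in total]
```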
Further, the word-vector conversion module uses the following formula to obtain the probability of a given word from its context and document information:
Wherein c is the context word vector, x is the document vector, U is the mapping matrix from the neural network's input layer to its hidden layer, V is the mapping matrix from the hidden layer to the output layer, w is the predicted target word, and T is the length of the document.
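The formula itself was an image in the original filing and did not survive extraction. A doc2vec-style softmax form consistent with the symbols defined above is one plausible reconstruction (an assumption, not the patent's verbatim equation; the concatenation [c_t; x] of context and document vectors is likewise assumed):

```latex
p(w_t \mid c_t, x) \;=\;
\frac{\exp\!\big(V_{w_t}^{\top}\, U\,[c_t; x]\big)}
     {\sum_{w'} \exp\!\big(V_{w'}^{\top}\, U\,[c_t; x]\big)},
\qquad t = 1, \dots, T
```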
Further, the document vector x is processed by the following formula: the seq2seq deep neural network model drops each component of a word's vector with probability q and, to avoid introducing bias, normalizes the retained dimensions:
The optimization objective of the word-vector conversion module can be expressed by the following formula, where the first term is still the likelihood of observing the target word given the context and document semantics, and the second term is a data-dependent regularization:
Further, the data-dependent regularization is expressed by the following formula and becomes the regularization term,
Wherein σ can be viewed as a logistic regression. This regularization term tends to penalize high-frequency words: a high-frequency word is sampled with higher probability, so the term's value is larger for it; and for the coefficient σ(1−σ), the more accurate the word vector and the larger the predicted probability, the smaller this coefficient becomes. This shows mathematically that the regularization term can indeed help optimize the model.
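The component-dropping step described above resembles dropout with renormalization. A sketch under that assumption follows; the patent does not show its exact normalization, so the 1/(1−q) rescaling here (which keeps the expected value unchanged, inverted-dropout style) is an assumed choice:

```python
import random

def drop_components(vector, q, rng=random):
    """Drop each component of a word/document vector with probability q and
    rescale the survivors by 1/(1-q), so the vector's expectation is
    preserved (an assumed, inverted-dropout-style normalization)."""
    return [0.0 if rng.random() < q else v / (1.0 - q) for v in vector]
```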
Further, the part-of-speech tagging module performs part-of-speech tagging on the words output by the preprocessing module using the NLTK (Natural Language Toolkit) package from the Python libraries.
Further, the candidate-word weight computation module is a seq2seq model comprising an encoder and a decoder. Both the input and the output of the seq2seq model are sequences, the lengths of the input and output sequences are variable, and the encoder and decoder are recurrent neural networks (RNNs).
Further, an attention mechanism and a copy mechanism are added to the recurrent neural network so that the network can predict keywords outside the vocabulary and outside the source document. The probability of a predicted word can therefore be expressed by the following formula:
p(y_t | y_{1,...,t-1}, x) = p_g(y_t | y_{1,...,t-1}, x) + p_c(y_t | y_{1,...,t-1}, x)
The first term is the prediction formula of a conventional recurrent neural network, in which a softmax classifier outputs the probability of every word in the vocabulary from the hidden state and the previously predicted words; the second term is the copy mechanism, which accounts for the importance of each word in the document and can be expressed by the following formula:
Wherein ψ is the set of all words in the source document, σ is a nonlinear function, W_c is a trainable parameter matrix, and Z is the sum of all scores, used for normalization.
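The two-term prediction probability above can be illustrated by jointly normalizing generation-mode and copy-mode scores, in the spirit of CopyNet-style decoders. This is a simplified sketch, not the patent's exact parameterization; the score tables stand in for the softmax logits and the copy scores over source positions:

```python
import math

def combined_word_probability(generate_scores, copy_scores, word):
    """Combine generate-mode and copy-mode scores into one probability by
    exponentiating and normalizing over both score tables together
    (Z plays the role of the shared normalizer)."""
    z = sum(math.exp(s) for s in generate_scores.values()) + \
        sum(math.exp(s) for s in copy_scores.values())
    p_g = math.exp(generate_scores[word]) / z if word in generate_scores else 0.0
    p_c = math.exp(copy_scores[word]) / z if word in copy_scores else 0.0
    return p_g + p_c
```

A word that appears in both tables (in the vocabulary and in the source document) accumulates probability from both modes, which is how such decoders favor copying salient source words.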
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
(1) In the word-vector conversion module, the present invention treats the document vector as the average of word vectors randomly selected from the document, and combines the word vector and the document vector into a single vector representation for each word. This takes into account a word's meaning in different contexts, so the importance of each word to the document can be analyzed better and keywords that better represent the document's gist can be selected.
(2) In the candidate-word selection module, the present invention adds the word vectors of the document information as an input to the module, introducing more external information for keyword extraction from the document to be processed, and also adds the attention and copy mechanisms. This broadens the scope of keyword extraction and overcomes the inability of existing extraction techniques to predict keywords outside the vocabulary or absent from the source document.
(3) The accuracy of keyword discovery is greatly improved, solving the problem that keywords absent from the source document could not be selected, while broadening the search scope so that semantics hidden behind the keywords can be revealed.
Description of the drawings
Fig. 1 is a schematic diagram of the keyword extraction method based on the seq2seq deep neural network model.
Fig. 2 is a workflow diagram of the keyword extraction method based on the seq2seq deep neural network model.
Detailed description of embodiments
The attached figures are only for illustrative purposes and should not be construed as limiting this patent; those skilled in the art will appreciate that certain well-known structures and their explanations may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
The present invention proposes a keyword extraction method based on a seq2seq deep neural network model; its operating steps are shown in Fig. 1:
S1. Import the document to be processed and the corpus into the preprocessing module for extraction.
S2. Feed the information produced by the preprocessing module into the word-vector conversion module and the part-of-speech tagging module respectively; the word-vector conversion module performs word-vector conversion, and part-of-speech tagging is performed in the part-of-speech tagging module.
S3. Pass the word-vector-converted and POS-tagged information into the candidate-word weight computation module to obtain a candidate word sequence.
S4. Pass the resulting candidate word sequence into the candidate-word screening module to obtain suitable keywords.
Here the preprocessing module performs Chinese word segmentation, English stemming, and stop-word removal on the corpus and the document to be processed; the Chinese segmentation results enter the word-vector conversion module, and the English stems enter the part-of-speech tagging module.
The word-vector conversion module converts the words produced by the preprocessing module into word-vector (word embedding) form. The technique it uses is similar to the existing word-vector representation framework word2vec: on the basis of word2vec, words are randomly selected from the document and their vectors averaged to serve as the document vector, and word vectors and the document vector then participate in training and prediction together as a whole. This vector representation not only expresses part of the document's semantics as a vector; the random-selection mechanism also reduces the number of trainable parameters, greatly lowering training complexity. Random selection is, moreover, itself a form of regularization and helps optimize the model's predictions.
The word-vector conversion module uses Formula 1 to obtain the probability of a given word from its context and document information:
Wherein c is the context word vector, x is the document vector, U is the mapping matrix from the neural network's input layer to its hidden layer, V is the mapping matrix from the hidden layer to the output layer, w is the predicted target word, and T is the length of the document.
The document vector x_d in the word-vector conversion module is processed by Formula 2, which drops each component of a word's vector with probability q and, to avoid introducing bias, normalizes the retained dimensions:
The optimization objective of the word-vector conversion module can be expressed as Formula 3: the first term is still the likelihood of observing the target word given the context and document semantics, and the second term is a data-dependent regularization.
The regularization term is equivalent to Formula 4, where σ can be viewed as a logistic regression. This regularization term tends to penalize high-frequency words: a high-frequency word is sampled with higher probability, so the term's value is larger for it; and for the coefficient σ(1−σ), the more accurate the word vector and the larger the predicted probability, the smaller this coefficient becomes, which shows mathematically that the regularization term can help optimize the model.
The part-of-speech tagging module performs part-of-speech tagging on the words output by the preprocessing module using the NLTK natural-language-processing toolkit from the Python libraries.
The candidate-word weight computation module is a seq2seq model whose input and output are both sequences, and the lengths of the input and output sequences are variable. The seq2seq model consists of an encoder and a decoder, and we use recurrent neural networks (RNNs) as both. We feed each word of a sentence into the encoder, which then outputs a semantic vector for the entire sentence. Because a recurrent network takes every previous input into account at each step, the output semantic vector can in principle contain the information of the whole sentence, and we can treat it as a semantic representation of the sentence, that is, a sentence vector. In the decoder, we then gradually unfold the information contained in the sentence vector produced by the encoder.
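The encoder's folding of a whole sentence into one state can be illustrated with a single-unit Elman-style recurrence. The scalar inputs and fixed weights below are purely illustrative (an untrained toy, not the patent's network):

```python
import math

def rnn_encode(features, w_in=0.5, w_rec=0.9):
    """Minimal single-unit Elman-style RNN encoder: fold a sequence of
    scalar word features into one 'sentence' state."""
    h = 0.0
    for x in features:
        # Each update mixes the new input with the state carrying all
        # previous inputs, which is why the final h summarizes the sequence.
        h = math.tanh(w_in * x + w_rec * h)
    return h
```

Changing any earlier input changes the final state, which is the property the text appeals to when calling the encoder output a sentence vector.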
By adding the attention mechanism and the copy mechanism to the recurrent neural network in the candidate-word weight computation module, the network can predict keywords outside the vocabulary and outside the source document. The probability of a predicted word can therefore be expressed as Formula 5:
p(y_t | y_{1,...,t-1}, x) = p_g(y_t | y_{1,...,t-1}, x) + p_c(y_t | y_{1,...,t-1}, x) (Formula 5)
The first term is the prediction formula of a conventional recurrent neural network, in which a softmax classifier outputs the probability of every word in the vocabulary from the hidden state and the previously predicted words; the second term is the copy mechanism, which accounts for the importance of each word in the document and can be expressed as Formula 6:
Wherein ψ is the set of all words in the source document, σ is a nonlinear function, W_c is a trainable parameter matrix, and Z is the sum of all scores, used for normalization.
The decoder of the candidate-word weight computation module therefore differs from a conventional RNN in the following respects. When generating a word it has two modes, a generate mode and a copy mode, and the final model combines the two through the probabilistic model of a selection network: in generate mode it produces a word much like a conventional RNN decoder, while in copy mode it obtains the word's location in the input from a positional softmax. When updating its state, it uses the word predicted at time t−1 to update the state at time t, and also considers the hidden states of that word's specific positions in the word matrix.
The candidate-word screening module screens the candidate word sequence produced by the candidate-word weight computation module: it keeps keywords with suitable parts of speech while excluding keywords consisting of digits or single characters, keywords that are prefixes of other keywords, and duplicates.
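The screening rules just described can be sketched as a filter over POS-tagged candidates. The allowed tag set below is illustrative; the tag names follow the Penn Treebank convention that NLTK's tagger uses:

```python
def screen_candidates(candidates, keep_pos=("NN", "NNS", "JJ")):
    """Filter (word, pos) candidates: keep allowed parts of speech, drop
    digits and single characters, drop duplicates and any candidate that
    is a prefix of another candidate."""
    words = [w for w, pos in candidates
             if pos in keep_pos and len(w) > 1 and not w.isdigit()]
    result = []
    for w in words:
        if w in result:
            continue  # duplicate
        if any(other != w and other.startswith(w) for other in words):
            continue  # prefix of another candidate
        result.append(w)
    return result
```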
Embodiment 1
A selected text is fed into the keyword extraction system of the present invention and a keyword extraction experiment is run, as shown in Fig. 2:
"Towards content-based relevance ranking for video search. Most existing web video search engines index videos by file names, URLs, and surrounding texts. These types of video metadata roughly describe the whole video in an abstract level without taking the rich content, such as semantic content descriptions and speech within the video. In this paper we propose a novel relevance ranking approach for Web-based video search using both video metadata and rich content. To leverage real content into ranking, the videos are segmented into shots, which are smaller and more semantic-meaningful retrievable units. With video metadata and content information of shots, we developed an integrated ranking approach, which achieves improved ranking performance."
After segmentation and part-of-speech tagging with the default set of retained parts of speech, the keywords obtained by this system and by a conventional RNN model are compared in two respects: keywords located in the source document and keywords outside it. The benchmark keywords are: video metadata, integrated ranking, relevance ranking, content based ranking, video segmentation. The results are as follows:
1. Keywords in the source document: 1. information retrieval; 2. video search; 3. ranking; 4. relevance ranking; 5. relevance ranking; 6. video metadata; 7. intergrated ranking; 8. web video; 9. web video search; 10. rich content
2. Keywords outside the source document: 1. video retrieval; 2. web search; 3. content ranking; 4. content based retrieval; 5. content retrieval; 6. video indexing; 7. relevance feedback; 8. content based ranking; 9. semantic web; 10. video segmentation
Embodiment 2
Several existing keyword extraction algorithms are compared, using the F-score as the performance metric and predicting the top 5 and top 10 keywords; the results are as follows. The proposed keyword extraction algorithm and model (CopyRNN, a recurrent neural network with a copy mechanism) performs best on every data set.
Embodiment 3
An extraction experiment is run for keywords outside the source document. Since the other algorithms cannot predict keywords beyond the source document, the comparison is made only against the algorithm using a conventional recurrent neural network, predicting the top 10 and top 50 keywords and using recall as the evaluation metric; the results are as follows. The proposed keyword extraction algorithm and model (CopyRNN) achieves higher recall on every data set, showing that it can more accurately predict keywords outside the source document.
It can be seen that the keyword extraction system proposed by this invention can not only extract keywords present in the source document but also predicts keywords outside it well; compared with existing keyword extraction techniques, the results achieved by the system of this invention are more reasonable and efficient.
Claims (9)
1. A keyword extraction method based on a seq2seq deep neural network model, characterized in that the method comprises the following steps:
S1. Import the document to be processed and the corpus into the preprocessing module for extraction;
S2. Feed the information produced by the preprocessing module into the word-vector conversion module and the part-of-speech tagging module respectively; the word-vector conversion module performs word-vector conversion, and part-of-speech tagging is performed in the part-of-speech tagging module;
S3. Pass the word-vector-converted and POS-tagged information into the candidate-word weight computation module to obtain a candidate word sequence;
S4. Pass the resulting candidate word sequence into the candidate-word screening module to obtain suitable keywords.
2. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the preprocessing module performs Chinese word segmentation, English stemming, and stop-word removal on the corpus and the document to be processed.
3. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the word-vector conversion module converts the words processed by the preprocessing module into word-vector form; on the basis of the word-vector representation framework word2vec, the module randomly selects words from the document and averages their vectors to serve as the document vector, and then lets word vectors and the document vector participate in training and prediction together as a whole.
4. The keyword extraction method based on a seq2seq deep neural network model according to claim 3, characterized in that the word-vector conversion module uses the following formula to obtain the probability of a given word from its context and document information:
Wherein c is the context word vector, x is the document vector, U is the mapping matrix from the neural network's input layer to its hidden layer, v is the mapping matrix from the hidden layer to the output layer, w is the predicted target word, and T is the length of the document.
5. The keyword extraction method based on a seq2seq deep neural network model according to claim 4, characterized in that the document vector x is processed by the following formula, which drops each component of a word's vector with probability q and, to avoid introducing bias, normalizes the retained dimensions:
The optimization objective of the word-vector conversion module can be expressed by the following formula:
The first term is the likelihood of observing the target word given the context and document semantics, and the second term is a data-dependent regularization.
6. The keyword extraction method based on a seq2seq deep neural network model according to claim 5, characterized in that the data-dependent regularization is expressed by the following formula, referred to as the regularization term:
Wherein σ can be viewed as a logistic regression. The regularization term tends to penalize high-frequency words, because a high-frequency word is sampled with higher probability, making the term's value larger; and for the coefficient σ(1−σ), the more accurate the word vector and the larger the predicted probability, the smaller the regularization term becomes.
7. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the part-of-speech tagging module performs part-of-speech tagging on the words output by the preprocessing module using the NLTK natural-language-processing toolkit from the Python libraries.
8. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the candidate-word weight computation module is a seq2seq model comprising an encoder and a decoder; both the input and the output of the seq2seq model are sequences, the lengths of the input and output sequences are variable, and the encoder and decoder are recurrent neural networks.
9. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that an attention mechanism and a copy mechanism are added to the recurrent neural network so that the network can predict keywords outside the vocabulary and outside the source document; the probability of a predicted word can be expressed by the following formula:
p(y_t | y_{1,...,t-1}, x) = p_g(y_t | y_{1,...,t-1}, x) + p_c(y_t | y_{1,...,t-1}, x)
The first term is the prediction formula of a conventional recurrent neural network, in which a softmax classifier outputs the probability of every word in the vocabulary from the hidden state and the previously predicted words; the second term is the copy mechanism, which accounts for the importance of each word in the document and can be expressed by the following formula:
Wherein ψ is the set of all words in the source document, σ is a nonlinear function, W_c is a trainable parameter matrix, and Z is the sum of all scores, used for normalization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810211285.1A CN108376131A (en) | 2018-03-14 | 2018-03-14 | Keyword abstraction method based on seq2seq deep neural network models |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810211285.1A CN108376131A (en) | 2018-03-14 | 2018-03-14 | Keyword abstraction method based on seq2seq deep neural network models |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108376131A true CN108376131A (en) | 2018-08-07 |
Family
ID=63018752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810211285.1A Pending CN108376131A (en) | 2018-03-14 | 2018-03-14 | Keyword abstraction method based on seq2seq deep neural network models |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108376131A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893410A (en) * | 2015-11-18 | 2016-08-24 | 乐视网信息技术(北京)股份有限公司 | Keyword extraction method and apparatus |
US20170091318A1 (en) * | 2015-09-29 | 2017-03-30 | Kabushiki Kaisha Toshiba | Apparatus and method for extracting keywords from a single document |
CN106919646A (en) * | 2017-01-18 | 2017-07-04 | 南京云思创智信息科技有限公司 | Chinese text summarization generation system and method |
CN106997344A (en) * | 2017-03-31 | 2017-08-01 | 成都数联铭品科技有限公司 | Keyword abstraction system |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165950A (en) * | 2018-08-10 | 2019-01-08 | 哈尔滨工业大学(威海) | A kind of abnormal transaction identification method based on financial time series feature, equipment and readable storage medium storing program for executing |
CN109299470A (en) * | 2018-11-01 | 2019-02-01 | 成都数联铭品科技有限公司 | The abstracting method and system of trigger word in textual announcement |
CN109299470B (en) * | 2018-11-01 | 2024-02-09 | 成都数联铭品科技有限公司 | Method and system for extracting trigger words in text bulletin |
CN109670035A (en) * | 2018-12-03 | 2019-04-23 | 科大讯飞股份有限公司 | A kind of text snippet generation method |
WO2020155769A1 (en) * | 2019-01-30 | 2020-08-06 | 平安科技(深圳)有限公司 | Method and device for establishing keyword generation model |
CN109948089A (en) * | 2019-02-21 | 2019-06-28 | 中国海洋大学 | A kind of method and device for extracting Web page text |
CN109992774A (en) * | 2019-03-25 | 2019-07-09 | 北京理工大学 | The key phrase recognition methods of word-based attribute attention mechanism |
CN109933806A (en) * | 2019-04-01 | 2019-06-25 | 长沙理工大学 | A kind of repetition generation method, system, equipment and computer readable storage medium |
CN109933806B (en) * | 2019-04-01 | 2024-01-30 | 长沙理工大学 | Method, system, equipment and computer readable storage medium for generating duplicate description |
CN110069611A (en) * | 2019-04-12 | 2019-07-30 | 武汉大学 | A kind of the chat robots reply generation method and device of theme enhancing |
CN110069611B (en) * | 2019-04-12 | 2021-05-04 | 武汉大学 | Topic-enhanced chat robot reply generation method and device |
CN110119765A (en) * | 2019-04-18 | 2019-08-13 | 浙江工业大学 | A kind of keyword extracting method based on Seq2seq frame |
CN111859940B (en) * | 2019-04-23 | 2024-05-14 | 北京嘀嘀无限科技发展有限公司 | Keyword extraction method and device, electronic equipment and storage medium |
CN111859940A (en) * | 2019-04-23 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Keyword extraction method and device, electronic equipment and storage medium |
CN110110330B (en) * | 2019-04-30 | 2023-08-11 | 腾讯科技(深圳)有限公司 | Keyword extraction method based on text and computer equipment |
CN110110330A (en) * | 2019-04-30 | 2019-08-09 | 腾讯科技(深圳)有限公司 | Text based keyword extracting method and computer equipment |
CN110263122A (en) * | 2019-05-08 | 2019-09-20 | 北京奇艺世纪科技有限公司 | A kind of keyword acquisition methods, device and computer readable storage medium |
CN112446206A (en) * | 2019-08-16 | 2021-03-05 | 阿里巴巴集团控股有限公司 | Menu title generation method and device |
CN110991612A (en) * | 2019-11-29 | 2020-04-10 | 交通银行股份有限公司 | Message analysis method of international routine real-time reasoning model based on word vector |
CN111192567A (en) * | 2019-12-27 | 2020-05-22 | 青岛海信智慧家居系统股份有限公司 | Method and device for generating interaction information of intelligent equipment |
CN111477320A (en) * | 2020-03-11 | 2020-07-31 | 北京大学第三医院(北京大学第三临床医学院) | Construction system of treatment effect prediction model, treatment effect prediction system and terminal |
CN111477320B (en) * | 2020-03-11 | 2023-05-30 | 北京大学第三医院(北京大学第三临床医学院) | Treatment effect prediction model construction system, treatment effect prediction system and terminal |
CN111737401A (en) * | 2020-06-22 | 2020-10-02 | 首都师范大学 | Key phrase prediction method based on Seq2set2Seq framework |
CN111737401B (en) * | 2020-06-22 | 2023-03-24 | 北方工业大学 | Key phrase prediction method based on Seq2set2Seq framework |
CN112163405A (en) * | 2020-09-08 | 2021-01-01 | 北京百度网讯科技有限公司 | Question generation method and device |
CN112464656A (en) * | 2020-11-30 | 2021-03-09 | 科大讯飞股份有限公司 | Keyword extraction method and device, electronic equipment and storage medium |
CN112464656B (en) * | 2020-11-30 | 2024-02-13 | 中国科学技术大学 | Keyword extraction method, keyword extraction device, electronic equipment and storage medium |
CN112800757B (en) * | 2021-04-06 | 2021-07-09 | 杭州远传新业科技有限公司 | Keyword generation method, device, equipment and medium |
CN112800757A (en) * | 2021-04-06 | 2021-05-14 | 杭州远传新业科技有限公司 | Keyword generation method, device, equipment and medium |
WO2023060795A1 (en) * | 2021-10-12 | 2023-04-20 | 平安科技(深圳)有限公司 | Automatic keyword extraction method and apparatus, and device and storage medium |
CN114021440B (en) * | 2021-10-28 | 2022-07-12 | 中航机载系统共性技术有限公司 | FPGA (field programmable Gate array) time sequence simulation verification method and device based on MATLAB (matrix laboratory) |
CN114021440A (en) * | 2021-10-28 | 2022-02-08 | 中航机载系统共性技术有限公司 | FPGA (field programmable Gate array) time sequence simulation verification method and device based on MATLAB (matrix laboratory) |
CN115809665A (en) * | 2022-12-13 | 2023-03-17 | 杭州电子科技大学 | Unsupervised keyword extraction method based on bidirectional multi-granularity attention mechanism |
CN115809665B (en) * | 2022-12-13 | 2023-07-11 | 杭州电子科技大学 | Unsupervised keyword extraction method based on bidirectional multi-granularity attention mechanism |
CN116011633A (en) * | 2022-12-23 | 2023-04-25 | 浙江苍南仪表集团股份有限公司 | Regional gas consumption prediction method, regional gas consumption prediction system, regional gas consumption prediction equipment and Internet of things cloud platform |
CN116011633B (en) * | 2022-12-23 | 2023-08-18 | 浙江苍南仪表集团股份有限公司 | Regional gas consumption prediction method, regional gas consumption prediction system, regional gas consumption prediction equipment and Internet of things cloud platform |
CN117150046A (en) * | 2023-09-12 | 2023-12-01 | 广东省华南技术转移中心有限公司 | Automatic task decomposition method and system based on context semantics |
CN117150046B (en) * | 2023-09-12 | 2024-03-15 | 广东省华南技术转移中心有限公司 | Automatic task decomposition method and system based on context semantics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108376131A (en) | Keyword abstraction method based on seq2seq deep neural network models | |
CN103605665B (en) | Keyword based evaluation expert intelligent search and recommendation method | |
CN111460092B (en) | Multi-document-based automatic complex problem solving method | |
CN113591483A (en) | Document-level event argument extraction method based on sequence labeling | |
CN105528437B (en) | A kind of question answering system construction method extracted based on structured text knowledge | |
CN112818694A (en) | Named entity recognition method based on rules and improved pre-training model | |
JP5216063B2 (en) | Method and apparatus for determining categories of unregistered words | |
CN110888991B (en) | Sectional type semantic annotation method under weak annotation environment | |
CN112307182B (en) | Question-answering system-based pseudo-correlation feedback extended query method | |
CN108710672B (en) | Theme crawler method based on incremental Bayesian algorithm | |
CN109670014A (en) | A kind of Authors of Science Articles name disambiguation method of rule-based matching and machine learning | |
Pérez-Sancho et al. | Genre classification using chords and stochastic language models | |
CN110569355B (en) | Viewpoint target extraction and target emotion classification combined method and system based on word blocks | |
CN113032552B (en) | Text abstract-based policy key point extraction method and system | |
CN110008309A (en) | A kind of short phrase picking method and device | |
CN111061939A (en) | Scientific research academic news keyword matching recommendation method based on deep learning | |
Haque et al. | Literature review of automatic single document text summarization using NLP | |
CN114997288A (en) | Design resource association method | |
CN112926340A (en) | Semantic matching model for knowledge point positioning | |
CN107562774A (en) | Generation method, system and the answering method and system of rare foreign languages word incorporation model | |
Rani et al. | Telugu text summarization using LSTM deep learning | |
Uzun et al. | Automatically discovering relevant images from web pages | |
CN111859090A (en) | Method for obtaining plagiarism source document based on local matching convolutional neural network model facing source retrieval | |
Hashim et al. | An implementation method for Arabic keyword tendency using decision tree | |
KR102724394B1 (en) | Method and apparatus for analyzing articles through keyword extraction, and computer programs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20180807 |