CN108376131A - Keyword abstraction method based on seq2seq deep neural network models - Google Patents


Info

Publication number
CN108376131A
CN108376131A CN201810211285.1A
Authority
CN
China
Prior art keywords
word
document
neural network
keyword
term vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810211285.1A
Other languages
Chinese (zh)
Inventor
李弘艺
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201810211285.1A priority Critical patent/CN108376131A/en
Publication of CN108376131A publication Critical patent/CN108376131A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks


Abstract

The present invention relates to the field of computing, and more particularly to a keyword extraction method based on a seq2seq (sequence-to-sequence) deep neural network model. In the method, the target information is first extracted by a preprocessing module, then converted and tagged by a word-vector conversion module and a part-of-speech tagging module respectively; a candidate-word weight computation module then produces a candidate word sequence, and a candidate-word screening module selects the keywords. By treating the document vector as the mean of word vectors and combining the word vector with the document vector as the vector representation of each word, the invention can better analyze the importance of each word to the document and select keywords that better represent the document's gist. It also widens the search range of keyword extraction, overcoming the inability of existing extraction techniques to predict keywords outside the vocabulary or keywords absent from the source document.

Description

Keyword extraction method based on a seq2seq deep neural network model
Technical field
The present invention relates to the field of computing, and in particular to a keyword extraction method based on a seq2seq deep neural network model.
Background technology
Development with computer and network technologies and the arrival in big data epoch, digitized file is just with surprising Speed madness increase, a large amount of human contacts to information be all to exist with electrical file form.It is vast as the open sea in face of these Information, people can automatically identify the keyword that can most represent article purport there is an urgent need to machine, us helped quickly to understand Article main contents, to save reading, processing and utilize the time of these electronic documents.
These technologies are referred to as keyword abstraction (Keyword Extraction) at present, keyword abstraction refer to quickly from Multiple words or phrase that can represent document subject matter are obtained in document, as a kind of refining summary to the document main contents. The main contents of document can be understood quickly by keyword people, efficiently hold document subject matter.Keyword extensive use In fields such as news report, scientific papers, so as to allow people's efficiently management and retrieval document.Therefore, keyword abstraction Become an important research hot spot in text-processing field at present.
The starting point of current existing many webpage keywords extracting methods, these methods is mostly the appearance frequency of word Region residing for full text of rate, word, word itself semantic feature.The method of use substantially has following a few classes:Based on statistics Method, the method for machine learning, the method for natural language processing.
But all there is deficiencies for these methods, wherein to keyword extraction, evaluate the candidate keywords of text, and After sequence, keyword of the extraction top n word as webpage, but in this N number of keyword, and not all word be all really with The relevant keyword of text theme, and in the candidate keywords not being extracted, but still have some and text theme very phase The word of pass so that the accuracy rate and recall rate of keyword extraction be not high.
In searching keyword, there is the keyword of half not come from source document, and existing keyword extraction techniques Candidate keywords, therefore the unpredictable keyword not in source document can only be selected from source document, it can not will be in document The near synonym of word leverage the accuracy rate of keyword extraction as keyword.
Existing keyword extraction techniques also can only choose candidate word from the vocabulary of certain scale simultaneously, when document word When language scale is far more than vocabulary scale, cannot predict the word other than vocabulary then can influence the accuracy rate of keyword extraction.
Existing keyword abstraction method is when choosing candidate keywords, it will usually consider the feature that machine learning obtains, However these features can only carry out the importance that statistics finds each word by the frequency of occurrences to word in document, it can not It is enough to disclose the Complete Semantics being hidden in document content.
Summary of the invention
The present invention overcomes at least one of the defects of the prior art described above and provides a keyword extraction method based on a seq2seq deep neural network model.
To solve the above technical problems, the technical solution of the present invention is as follows:
A keyword extraction method based on a seq2seq (sequence-to-sequence) deep neural network model, comprising the following steps:
S1. Import the document to be processed and the corpus into the preprocessing module for extraction;
S2. The information produced by the preprocessing module enters the word-vector conversion module and the part-of-speech tagging module respectively: the word-vector conversion module performs word-vector conversion, and the part-of-speech tagging module performs part-of-speech tagging;
S3. The converted and tagged information enters the candidate-word weight computation module, which produces a candidate word sequence;
S4. The candidate word sequence enters the candidate-word screening module, which outputs the final keywords.
Further, the preprocessing module performs Chinese word segmentation, English stemming, and stop-word removal on the corpus and the document to be processed; the segmented Chinese words enter the word-vector conversion module, and the English stems enter the part-of-speech tagging module.
Further, the word-vector conversion module converts the words produced by the preprocessing module into word embeddings. On the basis of the word-vector representation framework word2vec, it averages the vectors of randomly selected words in the document to obtain the document vector, and then the word vector and the document vector participate jointly in training and prediction.
Further, the word-vector conversion module uses the context and the document information to obtain the probability of a word by the following formula:
where c is the context word vector, x is the document vector, U is the mapping matrix from the input layer to the hidden layer of the neural network, V is the mapping matrix from the hidden layer to the output layer, w is the predicted target word, and T is the length of the document.
Further, the document vector x is processed by the following formula: the seq2seq deep neural network model drops each component of a word's vector with probability q and, to avoid introducing bias, normalizes the retained dimensions:
The optimization objective of the word-vector conversion module can be expressed by the following formula, in which the first term is still the likelihood of observing the target word given the context and the document semantics, and the second term is a data-dependent regularizer:
Further, the data-dependent regularization is expressed by the following formula, called the regularization term:
where σ can be viewed as a logistic regression. This regularizer tends to penalize high-frequency words: a high-frequency word is sampled with higher probability, so the value of the regularization term is larger. And for the coefficient σ(1−σ), the more accurate the word vectors and the larger the predicted probability, the smaller this coefficient becomes, which confirms mathematically that the regularization term helps optimize the model.
Further, the part-of-speech tagging module uses the NLTK (Natural Language Toolkit) package of the python library to tag the parts of speech of the words output by the preprocessing module.
Further, the candidate-word weight computation module is a seq2seq model comprising an encoder and a decoder; both its input and output are sequences of variable length, and the encoder and decoder are recurrent neural networks (RNNs).
Further, an attention mechanism and a copy mechanism are added to the recurrent neural network, so that the network can predict keywords outside the vocabulary and outside the source document; the probability of a predicted word can therefore be expressed by the following formula:
p(yt | y1,…,t−1, x) = pg(yt | y1,…,t−1, x) + pc(yt | y1,…,t−1, x)
The first term is the prediction formula of a traditional recurrent neural network: a softmax classifier outputs the probability of every current word according to the hidden state and the previously predicted words. The second term is the copy mechanism, which accounts for the importance of each word in the document and can be expressed by the following formula:
where ψ is the set of all words in the source document, σ is a nonlinear function, Wc is a trainable parameter matrix, and Z is the sum of all scores, used for normalization.
Compared with the prior art, the technical solution of the present invention has the following advantages:
(1) In the word-vector conversion module, the invention treats the document vector as the mean of word vectors randomly selected from the document, and combines the word vector with the document vector as the representation of each word. This takes into account a word's semantics in different contexts, better analyzes each word's importance to the document, and selects keywords that better represent the document's gist.
(2) In the candidate-word module, the invention adds the document vector as an input of the module, introducing more external information for the keyword extraction of the document, and adds the attention mechanism and the copy mechanism, thereby widening the search range of keyword extraction and overcoming the inability of existing extraction techniques to predict keywords outside the vocabulary or absent from the source document.
(3) The accuracy of keyword discovery is substantially improved, solving the problem that keywords absent from the source document could not be chosen, while the search range is widened and the semantics hidden behind the keywords can be revealed.
Description of the drawings
Fig. 1 is the keyword abstraction method schematic diagram based on seq2seq deep neural network models.
Fig. 2 is the keyword abstraction method work flow diagram based on seq2seq deep neural network models.
Detailed description of the embodiments
The accompanying drawings are for illustration only and shall not be construed as limiting the patent;
Those skilled in the art will appreciate that certain well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
The present invention proposes a keyword extraction method based on a seq2seq deep neural network model; its steps are shown in Fig. 1:
S1. Import the document to be processed and the corpus into the preprocessing module for extraction.
S2. The information produced by the preprocessing module enters the word-vector conversion module and the part-of-speech tagging module respectively: the word-vector conversion module performs word-vector conversion, and the part-of-speech tagging module performs part-of-speech tagging.
S3. The converted and tagged information enters the candidate-word weight computation module, which produces a candidate word sequence.
S4. The candidate word sequence enters the candidate-word screening module, which outputs the final keywords.
The preprocessing module performs Chinese word segmentation, English stemming, and stop-word removal on the corpus and the document to be processed; the segmented Chinese words enter the word-vector conversion module, and the English stems enter the part-of-speech tagging module.
The word-vector conversion module converts the words produced by the preprocessing module into word embeddings. The technique it uses is similar to the existing word-vector representation framework word2vec: on the basis of word2vec, the vectors of randomly selected words in the document are averaged to form the document vector, and then the word vector and the document vector participate jointly in training and prediction. This vector representation not only expresses the document semantics as part of the vector; the random-selection mechanism also reduces the number of trainable parameters, greatly lowering training complexity. Random selection is moreover a form of regularization, which helps optimize the model's predictions.
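The averaging-and-combination step described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation; the vocabulary size, dimensions, sample size k, and function names are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: vocabulary of 10 words, 8-dimensional word vectors.
embeddings = rng.normal(size=(10, 8))
doc_word_ids = [1, 3, 3, 5, 7, 2, 9]  # the document, as vocabulary indices

def document_vector(word_ids, k=4):
    """Mean of the vectors of k randomly selected document words."""
    sample = rng.choice(word_ids, size=min(k, len(word_ids)), replace=False)
    return embeddings[sample].mean(axis=0)

def word_representation(word_id, doc_vec):
    """Combine the word vector and the document vector into one representation."""
    return np.concatenate([embeddings[word_id], doc_vec])

doc_vec = document_vector(doc_word_ids)
rep = word_representation(3, doc_vec)
print(rep.shape)  # (16,)
```

Averaging only a random sample (rather than all document words) is what keeps the number of effective parameters small, as the paragraph above notes.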
The word-vector conversion module uses the context and the document information to obtain the probability of a word by formula 1:
where c is the context word vector, x is the document vector, U is the mapping matrix from the input layer to the hidden layer of the neural network, V is the mapping matrix from the hidden layer to the output layer, w is the predicted target word, and T is the length of the document.
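Since the image of formula 1 is not reproduced in this text, the following is only a hedged reconstruction of the computation as described: the input (context vector combined with document vector) is mapped to a hidden layer by U, the hidden layer is mapped to vocabulary scores by V, and a softmax yields the probability of each word. The tanh nonlinearity and concatenation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, dim, hidden = 10, 8, 6

c = rng.normal(size=dim)                # context word vector
x = rng.normal(size=dim)                # document vector
U = rng.normal(size=(hidden, 2 * dim))  # input -> hidden mapping
V = rng.normal(size=(vocab, hidden))    # hidden -> output mapping

def word_probabilities(c, x):
    h = np.tanh(U @ np.concatenate([c, x]))  # hidden representation
    scores = V @ h
    e = np.exp(scores - scores.max())        # numerically stable softmax
    return e / e.sum()

p = word_probabilities(c, x)
print(p.sum())  # probabilities over the vocabulary sum to 1
```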
The document vector x_d in the word-vector conversion module is processed by formula 2, which drops each component of a word's vector with probability q and, to avoid introducing bias, normalizes the retained dimensions:
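The component-dropout step with probability q can be sketched as below. Rescaling the surviving components by 1/(1−q), so that the expected value of the vector is unchanged, is an assumption about how the normalization in formula 2 is carried out:

```python
import numpy as np

rng = np.random.default_rng(2)

def drop_and_rescale(vec, q=0.3):
    """Drop each component with probability q and rescale the survivors
    by 1/(1-q) so the expected value of the vector is unchanged."""
    keep = rng.random(vec.shape) >= q
    return vec * keep / (1.0 - q)

x = np.ones(8)
print(drop_and_rescale(x))  # surviving entries equal 1/(1-q), about 1.429
```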
The optimization objective of the word-vector conversion module can be expressed by formula 3: the first term is still the likelihood of observing the target word given the context and the document semantics, and the second term is a data-dependent regularizer.
The regularization term is equivalent to formula 4, where σ can be viewed as a logistic regression. The regularizer tends to penalize high-frequency words: a high-frequency word is sampled with higher probability, so the value of the regularization term is larger. And for the coefficient σ(1−σ), the more accurate the word vectors and the larger the predicted probability, the smaller this coefficient becomes, which confirms mathematically that the regularization term helps optimize the model.
The part-of-speech tagging module uses the NLTK natural-language-processing toolkit of the python library to tag the parts of speech of the words output by the preprocessing module.
The candidate-word weight computation module is a seq2seq model whose input and output are both sequences, with variable input and output lengths. The seq2seq model consists of an encoder and a decoder, for both of which we use recurrent neural networks (RNNs). We feed each word of a sentence into the encoder, which then outputs a semantic vector for the whole sentence. Because a recurrent network takes the input of every previous step into account, this semantic vector can in theory contain the information of the entire sentence, and we can treat it as a semantic representation of the sentence, that is, a sentence vector. The decoder then gradually parses out the information contained in the sentence vector produced by the encoder.
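A minimal vanilla-RNN encoder illustrating how the final hidden state can serve as the sentence vector. Dimensions, initialization, and the tanh cell are arbitrary choices for the sketch; a real implementation would use trained GRU or LSTM cells:

```python
import numpy as np

rng = np.random.default_rng(3)
dim, hidden = 8, 6

Wx = rng.normal(size=(hidden, dim)) * 0.1     # input -> hidden weights
Wh = rng.normal(size=(hidden, hidden)) * 0.1  # hidden -> hidden (recurrence)

def encode(word_vectors):
    """Run a vanilla RNN over the sentence; the final hidden state is the
    sentence (semantic) vector handed to the decoder."""
    h = np.zeros(hidden)
    for v in word_vectors:
        h = np.tanh(Wx @ v + Wh @ h)  # each step folds in all previous input
    return h

sentence = [rng.normal(size=dim) for _ in range(5)]
sentence_vector = encode(sentence)
print(sentence_vector.shape)  # (6,)
```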
By adding the attention mechanism and the copy mechanism to the recurrent neural network of the candidate-word weight computation module, the network can predict keywords outside the vocabulary and outside the source document; the probability of a predicted word can therefore be expressed as formula 5:
p(yt | y1,…,t−1, x) = pg(yt | y1,…,t−1, x) + pc(yt | y1,…,t−1, x) (formula 5)
The first term is the prediction formula of a traditional recurrent neural network: a softmax classifier outputs the probability of every current word according to the hidden state and the previously predicted words. The second term is the copy mechanism, which accounts for the importance of each word in the document and can be expressed as formula 6:
where ψ is the set of all words in the source document, σ is a nonlinear function, Wc is a trainable parameter matrix, and Z is the sum of all scores, used for normalization.
The decoder of the candidate-word weight computation module therefore differs from a traditional RNN in the following respects. When generating a word there are two modes, a generate mode and a copy mode, and the final model combines the two through the probabilistic model of a selection network. In generate mode it behaves like a traditional RNN decoder and generates a word; in copy mode it obtains the position information of the word at the input end from a positional softmax. When updating the state, the word predicted at time t−1 is used to update the state at time t, and the hidden state of the specific position in the word matrix is taken into account.
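The two-mode prediction of formula 5 can be sketched as a mixture of generate scores and copy scores under one shared normalizer Z. The scoring functions below are simplified stand-ins for the trained networks; in particular the vector `w_copy` replaces the parameter matrix Wc for brevity, and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
vocab, hidden = 6, 5
source_positions = [2, 4, 2]          # source-document words, as vocab ids
W_gen = rng.normal(size=(vocab, hidden))
w_copy = rng.normal(size=hidden)      # simplified per-position copy scorer

def predict(state, encoder_states):
    # Generate-mode scores: one per vocabulary word.
    gen = np.exp(W_gen @ state)
    # Copy-mode scores: one per source position, through a nonlinearity.
    copy = np.exp(np.tanh(encoder_states @ w_copy))
    Z = gen.sum() + copy.sum()        # shared normalizer over both modes
    p = gen / Z                       # p_g: generate-mode probability
    for pos, word in enumerate(source_positions):
        p[word] += copy[pos] / Z      # p_c: add copy mass to source words
    return p

state = rng.normal(size=hidden)
encoder_states = rng.normal(size=(len(source_positions), hidden))
p = predict(state, encoder_states)
print(p.sum())  # the mixture is a proper distribution, summing to 1
```

Words that occur in the source document receive probability mass from both modes, which is what lets the model favor in-document words without being restricted to them.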
The candidate-word screening module filters the candidate word sequence produced by the candidate-word weight computation module: it keeps keywords of suitable parts of speech, and excludes keywords consisting of digits or a single character, keywords that are prefixes of other keywords, and duplicates.
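A plain-Python sketch of the screening rules just listed. The allowed part-of-speech tag set is an assumed example (Penn Treebank noun and adjective tags), and the candidate list format is hypothetical:

```python
def screen_candidates(candidates, allowed_pos=frozenset({"NN", "NNS", "JJ"})):
    """Filter (word, pos) candidates: keep allowed parts of speech; drop
    digits and single characters, prefixes of other keywords, and repeats."""
    kept, seen = [], set()
    words = [w for w, _ in candidates]
    for word, pos in candidates:
        if pos not in allowed_pos:
            continue
        if len(word) <= 1 or word.isdigit():
            continue
        if word in seen:
            continue
        # Drop a keyword that is a proper prefix of another candidate.
        if any(other != word and other.startswith(word) for other in words):
            continue
        seen.add(word)
        kept.append(word)
    return kept

cands = [("ranking", "NN"), ("ranking", "NN"), ("rank", "NN"),
         ("7", "CD"), ("x", "NN"), ("video", "NN")]
print(screen_candidates(cands))  # ['ranking', 'video']
```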
Embodiment 1
The selected text is input into the keyword extraction system of the present invention and a keyword extraction experiment is carried out, as shown in Fig. 2: "Towards content-based relevance ranking for video search. Most existing web video search engines index videos by file names, URLs, and surrounding texts. These type of video metadata roughly describe the whole video in an abstract level without taking the rich content, such as semantic content descriptions and speech within the video. In this paper we propose a novel relevance ranking approach for Web-based video search using both video metadata and rich content. To leverage real content into ranking, the videos are segmented into shots, which are smaller and more semantic-meaningful retrievable units. With video metadata and content information of shots, we developed an integrated ranking approach, which achieves improved ranking performance." After word segmentation and part-of-speech tagging, the default parts of speech to keep are set, and the keywords obtained by this system and by a traditional RNN model are compared in two respects: keywords within the source document and keywords outside it. The results are as follows; the benchmark keywords are: video metadata, integrated ranking, relevance ranking, content based ranking, video segmentation:
1. Keywords within the source document: 1. information retrieval; 2. video search; 3. ranking; 4. relevance ranking; 5. relevance ranking; 6. video metadata; 7. intergrated ranking; 8. web video; 9. web video search; 10. rich content
2. Keywords outside the source document: 1. video retrieval; 2. web search; 3. content ranking; 4. content based retrieval; 5. content retrieval; 6. video indexing; 7. relevance feedback; 8. content based ranking; 9. semantic web; 10. video segmentation
Embodiment 2
Several existing keyword extraction algorithms are compared, using the F-score as the performance metric and predicting the top 5 and top 10 keywords; the results are as follows. It can be seen that the keyword extraction algorithm and model we propose (CopyRNN, a recurrent neural network with a copy mechanism) performs best on every data set.
Embodiment 3
An extraction experiment is carried out for keywords outside the source document. Since the other algorithms cannot obtain keywords beyond the source document, comparison is made only with an algorithm using a traditional recurrent neural network, predicting the top 10 and top 50 keywords and using recall as the evaluation metric; the results are as follows. It can be seen that the proposed keyword extraction algorithm and model (CopyRNN) achieves higher recall on every data set, showing that it can more accurately predict keywords outside the source document.
It can be seen that the keyword extraction system proposed by the invention not only extracts keywords present in the source document but also predicts keywords outside it well; compared with existing keyword extraction techniques, the results achieved by the system of the present invention are more reasonable and efficient.

Claims (9)

1. A keyword extraction method based on a seq2seq deep neural network model, characterized in that the method comprises the following steps:
S1. importing the document to be processed and the corpus into the preprocessing module for extraction;
S2. the information produced by the preprocessing module entering the word-vector conversion module and the part-of-speech tagging module respectively, the word-vector conversion module performing word-vector conversion and the part-of-speech tagging module performing part-of-speech tagging;
S3. the converted and tagged information entering the candidate-word weight computation module, which produces a candidate word sequence;
S4. the candidate word sequence entering the candidate-word screening module, which outputs the final keywords.
2. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the preprocessing module performs Chinese word segmentation, English stemming, and stop-word removal on the corpus and the document to be processed.
3. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the word-vector conversion module converts the words produced by the preprocessing module into word vectors; on the basis of the word-vector representation framework word2vec, it averages the vectors of randomly selected words in the document to obtain the document vector, and then the word vector and the document vector participate jointly in training and prediction.
4. The keyword extraction method based on a seq2seq deep neural network model according to claim 3, characterized in that the word-vector conversion module uses the context and the document information to obtain the probability of a word by the following formula:
where c is the context word vector, x is the document vector, U is the mapping matrix from the input layer to the hidden layer of the neural network, V is the mapping matrix from the hidden layer to the output layer, w is the predicted target word, and T is the length of the document.
5. The keyword extraction method based on a seq2seq deep neural network model according to claim 4, characterized in that the document vector x is processed by the following formula, dropping each component of a word's vector with probability q and, to avoid introducing bias, normalizing the retained dimensions:
The optimization objective of the word-vector conversion module can be expressed by the following formula:
where the first term is the likelihood of observing the target word given the context and the document semantics, and the second term is a data-dependent regularizer.
6. The keyword extraction method based on a seq2seq deep neural network model according to claim 5, characterized in that the data-dependent regularization is expressed by the following formula, called the regularization term:
where σ can be viewed as a logistic regression; the regularizer tends to penalize high-frequency words, because a high-frequency word is sampled with higher probability and the value of the regularization term is correspondingly larger, and for the coefficient σ(1−σ), the more accurate the word vectors and the larger the predicted probability, the smaller the regularization term.
7. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the part-of-speech tagging module uses the NLTK natural-language-processing toolkit of the python library to tag the parts of speech of the words output by the preprocessing module.
8. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that the candidate-word weight computation module is a seq2seq model comprising an encoder and a decoder; both the input and the output of the seq2seq model are sequences of variable length, and the encoder and decoder are recurrent neural networks.
9. The keyword extraction method based on a seq2seq deep neural network model according to claim 1, characterized in that an attention mechanism and a copy mechanism are added to the recurrent neural network, so that the network can predict keywords outside the vocabulary and outside the source document; the probability of a predicted word can be expressed by the following formula:
p(yt | y1,…,t−1, x) = pg(yt | y1,…,t−1, x) + pc(yt | y1,…,t−1, x)
where the first term is the prediction formula of a traditional recurrent neural network, that is, a softmax classifier outputs the probability of every current word according to the hidden state and the previously predicted words; and the second term is the copy mechanism, which accounts for the importance of each word in the document and can be expressed by the following formula:
where ψ is the set of all words in the source document, σ is a nonlinear function, Wc is a trainable parameter matrix, and Z is the sum of all scores, used for normalization.
CN201810211285.1A 2018-03-14 2018-03-14 Keyword abstraction method based on seq2seq deep neural network models Pending CN108376131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810211285.1A CN108376131A (en) 2018-03-14 2018-03-14 Keyword abstraction method based on seq2seq deep neural network models


Publications (1)

Publication Number Publication Date
CN108376131A (en) 2018-08-07

Family

ID=63018752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810211285.1A Pending CN108376131A (en) 2018-03-14 2018-03-14 Keyword abstraction method based on seq2seq deep neural network models

Country Status (1)

Country Link
CN (1) CN108376131A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893410A (en) * 2015-11-18 2016-08-24 乐视网信息技术(北京)股份有限公司 Keyword extraction method and apparatus
US20170091318A1 (en) * 2015-09-29 2017-03-30 Kabushiki Kaisha Toshiba Apparatus and method for extracting keywords from a single document
CN106919646A (en) * 2017-01-18 2017-07-04 南京云思创智信息科技有限公司 Chinese text summarization generation system and method
CN106997344A (en) * 2017-03-31 2017-08-01 成都数联铭品科技有限公司 Keyword abstraction system

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165950A (en) * 2018-08-10 2019-01-08 哈尔滨工业大学(威海) A kind of abnormal transaction identification method based on financial time series feature, equipment and readable storage medium storing program for executing
CN109299470A (en) * 2018-11-01 2019-02-01 成都数联铭品科技有限公司 The abstracting method and system of trigger word in textual announcement
CN109299470B (en) * 2018-11-01 2024-02-09 成都数联铭品科技有限公司 Method and system for extracting trigger words in text bulletin
CN109670035A (en) * 2018-12-03 2019-04-23 科大讯飞股份有限公司 A kind of text snippet generation method
WO2020155769A1 (en) * 2019-01-30 2020-08-06 平安科技(深圳)有限公司 Method and device for establishing keyword generation model
CN109948089A (en) * 2019-02-21 2019-06-28 中国海洋大学 A kind of method and device for extracting Web page text
CN109992774A (en) * 2019-03-25 2019-07-09 北京理工大学 The key phrase recognition methods of word-based attribute attention mechanism
CN109933806A (en) * 2019-04-01 2019-06-25 长沙理工大学 A kind of repetition generation method, system, equipment and computer readable storage medium
CN109933806B (en) * 2019-04-01 2024-01-30 长沙理工大学 Method, system, equipment and computer readable storage medium for generating duplicate description
CN110069611A (en) * 2019-04-12 2019-07-30 武汉大学 A kind of the chat robots reply generation method and device of theme enhancing
CN110069611B (en) * 2019-04-12 2021-05-04 武汉大学 Topic-enhanced chat robot reply generation method and device
CN110119765A (en) * 2019-04-18 2019-08-13 浙江工业大学 A kind of keyword extracting method based on Seq2seq frame
CN111859940B (en) * 2019-04-23 2024-05-14 北京嘀嘀无限科技发展有限公司 Keyword extraction method and device, electronic equipment and storage medium
CN111859940A (en) * 2019-04-23 2020-10-30 北京嘀嘀无限科技发展有限公司 Keyword extraction method and device, electronic equipment and storage medium
CN110110330B (en) * 2019-04-30 2023-08-11 腾讯科技(深圳)有限公司 Keyword extraction method based on text and computer equipment
CN110110330A (en) * 2019-04-30 2019-08-09 腾讯科技(深圳)有限公司 Text based keyword extracting method and computer equipment
CN110263122A (en) * 2019-05-08 2019-09-20 北京奇艺世纪科技有限公司 A kind of keyword acquisition methods, device and computer readable storage medium
CN112446206A (en) * 2019-08-16 2021-03-05 阿里巴巴集团控股有限公司 Menu title generation method and device
CN110991612A (en) * 2019-11-29 2020-04-10 交通银行股份有限公司 Message analysis method of international routine real-time reasoning model based on word vector
CN111192567A (en) * 2019-12-27 2020-05-22 青岛海信智慧家居系统股份有限公司 Method and device for generating interaction information of intelligent equipment
CN111477320A (en) * 2020-03-11 2020-07-31 北京大学第三医院(北京大学第三临床医学院) Construction system of treatment effect prediction model, treatment effect prediction system and terminal
CN111477320B (en) * 2020-03-11 2023-05-30 北京大学第三医院(北京大学第三临床医学院) Treatment effect prediction model construction system, treatment effect prediction system and terminal
CN111737401A (en) * 2020-06-22 2020-10-02 首都师范大学 Key phrase prediction method based on Seq2set2Seq framework
CN111737401B (en) * 2020-06-22 2023-03-24 北方工业大学 Key phrase prediction method based on Seq2set2Seq framework
CN112163405A (en) * 2020-09-08 2021-01-01 北京百度网讯科技有限公司 Question generation method and device
CN112464656A (en) * 2020-11-30 2021-03-09 科大讯飞股份有限公司 Keyword extraction method and device, electronic equipment and storage medium
CN112464656B (en) * 2020-11-30 2024-02-13 中国科学技术大学 Keyword extraction method, keyword extraction device, electronic equipment and storage medium
CN112800757B (en) * 2021-04-06 2021-07-09 杭州远传新业科技有限公司 Keyword generation method, device, equipment and medium
CN112800757A (en) * 2021-04-06 2021-05-14 杭州远传新业科技有限公司 Keyword generation method, device, equipment and medium
WO2023060795A1 (en) * 2021-10-12 2023-04-20 平安科技(深圳)有限公司 Automatic keyword extraction method and apparatus, and device and storage medium
CN114021440B (en) * 2021-10-28 2022-07-12 中航机载系统共性技术有限公司 FPGA (field programmable Gate array) time sequence simulation verification method and device based on MATLAB (matrix laboratory)
CN114021440A (en) * 2021-10-28 2022-02-08 中航机载系统共性技术有限公司 FPGA (field programmable Gate array) time sequence simulation verification method and device based on MATLAB (matrix laboratory)
CN115809665A (en) * 2022-12-13 2023-03-17 杭州电子科技大学 Unsupervised keyword extraction method based on bidirectional multi-granularity attention mechanism
CN115809665B (en) * 2022-12-13 2023-07-11 杭州电子科技大学 Unsupervised keyword extraction method based on bidirectional multi-granularity attention mechanism
CN116011633A (en) * 2022-12-23 2023-04-25 浙江苍南仪表集团股份有限公司 Regional gas consumption prediction method, regional gas consumption prediction system, regional gas consumption prediction equipment and Internet of things cloud platform
CN116011633B (en) * 2022-12-23 2023-08-18 浙江苍南仪表集团股份有限公司 Regional gas consumption prediction method, regional gas consumption prediction system, regional gas consumption prediction equipment and Internet of things cloud platform
CN117150046A (en) * 2023-09-12 2023-12-01 广东省华南技术转移中心有限公司 Automatic task decomposition method and system based on context semantics
CN117150046B (en) * 2023-09-12 2024-03-15 广东省华南技术转移中心有限公司 Automatic task decomposition method and system based on context semantics

Similar Documents

Publication Publication Date Title
CN108376131A (en) Keyword abstraction method based on seq2seq deep neural network models
CN103605665B Keyword-based intelligent search and recommendation method for evaluation experts
CN111460092B (en) Multi-document-based automatic complex problem solving method
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN105528437B Question answering system construction method based on structured text knowledge extraction
CN112818694A (en) Named entity recognition method based on rules and improved pre-training model
JP5216063B2 (en) Method and apparatus for determining categories of unregistered words
CN110888991B Segmented semantic annotation method in a weak-annotation environment
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
CN108710672B Topic crawler method based on an incremental Bayesian algorithm
CN109670014A Scientific-article author name disambiguation method based on rule matching and machine learning
Pérez-Sancho et al. Genre classification using chords and stochastic language models
CN110569355B Combined viewpoint target extraction and target sentiment classification method and system based on word blocks
CN113032552B (en) Text abstract-based policy key point extraction method and system
CN110008309A Short phrase extraction method and device
CN111061939A (en) Scientific research academic news keyword matching recommendation method based on deep learning
Haque et al. Literature review of automatic single document text summarization using NLP
CN114997288A (en) Design resource association method
CN112926340A (en) Semantic matching model for knowledge point positioning
CN107562774A Generation method and system for low-resource-language word embedding models, and question answering method and system
Rani et al. Telugu text summarization using LSTM deep learning
Uzun et al. Automatically discovering relevant images from web pages
CN111859090A Source-retrieval-oriented method for obtaining plagiarism source documents based on a local-matching convolutional neural network model
Hashim et al. An implementation method for Arabic keyword tendency using decision tree
KR102724394B1 (en) Method and apparatus for analyzing articles through keyword extraction, and computer programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2018-08-07