CN110532390A - A news keyword extraction method based on NER and complex network features - Google Patents
- Publication number
- CN110532390A
- Authority
- CN
- China
- Prior art keywords
- node
- word
- centrality
- ner
- network
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a news keyword extraction method based on NER and complex network features. The method combines named entity recognition (NER) with the complex network (CN) properties of natural language and proposes a novel keyword extraction method called NER-CN (NER combined with complex network features). It first annotates sentences with named entity recognition. It then constructs a text complex network and extracts keywords according to the global metric of each node in the network. Compared with conventional methods, the proposed method significantly improves the precision, recall and F1 score of text classification.
Description
Technical field
The present invention studies the network properties of words in Chinese news texts. It combines named entity recognition (NER) with the complex network (CN) properties of natural language to propose a novel keyword extraction algorithm, NER-CN (NER combined with complex network features). The invention belongs to the field of natural language processing (NLP).
Background art
In recent years, data has grown explosively and computing power has increased. As a result, users face higher technical demands when they need to extract useful information quickly from massive data. Feature selection is an important step in text analysis, and its performance is crucial to classification results.
Traditional text feature methods include TF-IDF, TextRank, LDA and information gain. However, language itself is complex. When these methods extract text features, they tend to ignore the structural information of the text and to produce a large amount of redundant information. To preserve that structural information, researchers map the community information and semantic structure of natural language onto complex networks, and this has become a popular research direction.
Keyword extraction is a foundational technique in natural language processing and has been studied in depth in China and abroad in recent years.

- Amancio D R. Probing the Topological Properties of Complex Networks Modeling Short Written Texts [J]. PLoS One, 2015, 10(2): e0118394. The paper proposes analyzing short texts with complex network methods and concepts. It extracts sub-texts in a principled way and builds grammar-based word co-occurrence networks to analyze the complex network properties of dynamic short texts. Text classification experiments with SVM verify the superiority of the method.
- De Arruda H F, Costa L D F, Amancio D R. Using complex networks for text classification: Discriminating informative and imaginative documents [J]. EPL (Europhysics Letters), 2016, 113(2): 28007. The paper investigates how to use features obtained from the Text REtrieval Conference (TREC) effectively in classification tasks. The authors perform supervised classification to distinguish informative from imaginative documents. Using a network model that describes the local topological and dynamical properties of function words, they substantially improve text classification accuracy.
- Tang Jun. Application of complex networks in news web page keyword extraction [J]. Journal of Yunnan Minzu University (Natural Sciences Edition), 2012, 21(4). The paper introduces the properties of weighted complex networks into keyword extraction. It analyzes the characteristics and node weights of news web page documents and describes the clustering coefficient and centrality of weighted directed networks. Drawing on the strengths of traditional algorithms, it proposes an improved method for automatically extracting news keywords, and experiments show that the algorithm is feasible.
- Zhan Z J, Lin F, Yang X P. Keyword Extraction of Document Based on Weighted Complex Network [J]. Advanced Materials Research, 2011, 403-408: 2146-2151. The paper studies the complex network features of Chinese text and proposes an automatic keyword extraction algorithm for Chinese documents based on those features. Building on the small-world structure of language networks and on results from complex network theory, it extracts keywords from the feature values of word nodes in the document's language network. Experiments show a higher mean precision than the traditional TF-IDF algorithm.
These studies advance the application of complex networks to text to some extent, but one problem remains. News reports on specific topics often contain special entities such as place names, person names and dates, and traditional keyword extraction algorithms cannot extract this entity information effectively.
Summary of the invention
Purpose of the invention: to overcome the deficiencies of the prior art, the present invention provides a news keyword extraction method based on NER and complex network features. It also proposes a network model for representing relationships between entities in a document.

1. A small corpus is first constructed and used for training on the text.
2. A text complex network is built from syntactic relations and word co-occurrence relations.
3. The degree centrality and closeness centrality of nodes in the network are analyzed together to create a global metric formula for network nodes.
4. Single-character words in the text interfere with keyword extraction, so single-character words and common stop words are removed from the network nodes first. The global metric of each node is then calculated.
Technical solution: to achieve the above purpose, the present invention adopts the following technical solution:
A news keyword extraction method based on NER and complex network features, comprising the following steps:
Step 1: collect news content and generate a file list from it under the original corpus folder. Set a filtering threshold and filter out content whose character count is below it. For each corpus item, match the url and content with regular expressions. Store all news content in separate txt files according to the category in the url;
Step 2: segment the content of each txt file with Jieba word segmentation and remove stop words;
Step 3: annotate sentences with a neural-network-based named entity recognition method. Then perform date recognition, person name recognition and place name recognition to extract the important named entities in the text;
Step 4: build the word complex network. Assign a numeric code to each word obtained in Step 3 and use the codes as nodes. Use a distance of 2 as the range of the word association relation, i.e., words within a distance of 2 of each other are connected by an edge. Loop over every sentence to apply this rule;
$V=\{v_1,v_2,\dots,v_N\}$ is a set of N nodes, $(v_i,v_j)$ denotes the edge between nodes $v_i\in V$ and $v_j\in V$, and $G(V,E)$ is the graph with node set V and edge set E. The degree centrality $DC_i$ of node $v_i$ is:
where $k_i$ is the degree of the node, i.e., the number of nodes connected to it, and N is the number of nodes;
Compute the average distance from node $v_i$ to all nodes in the network, denoted $d_i$:
where $d_{ij}$ is the distance from node $v_i$ to node $v_j$. The average distance L between all nodes in the network is calculated with the following formula:
The reciprocal of $d_i$ is defined as the closeness centrality of node $v_i$, denoted $CC_i$:
Closeness centrality and degree centrality are combined into a new node importance index, the global metric. The global metric of node $v_i$ is denoted $CM_i$:
$$CM_i = \alpha\, DC_i + \beta\, CC_i \qquad (5)$$
where $DC_i$ is the degree centrality of node $v_i$ and $CC_i$ is its closeness centrality. α is the adjustable degree-centrality parameter, β is the adjustable closeness-centrality parameter, and α + β = 1. Keywords are extracted according to the resulting global metric.
Preferably, the filtering threshold in Step 1 is 30.
Preferably, the neural-network-based named entity recognition method in Step 3 uses a BiLSTM_CRF model for sequence labeling with both character and word embeddings. From the input layer upward, the model consists of the following layers:

- a look-up layer, which maps each character $x_i$ of the sentence from a one-hot vector to a low-dimensional dense vector;
- a bidirectional LSTM layer, which automatically extracts sentence features;
- a CRF layer, which performs sentence-level sequence labeling;
- an output layer.

The place-name recognition corpus is built from the spans tagged "ns" in a part-of-speech tagged corpus, and named entity recognition uses the BiLSTM-CRF model.
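In a BiLSTM-CRF tagger, the CRF layer usually decodes the best tag sequence with the Viterbi algorithm. The decoder combines per-position emission scores (here, the BiLSTM outputs) with tag-to-tag transition scores. The sketch below shows only that decoding step, in plain Python. The BiLSTM itself would need a deep learning framework and is omitted. The BIO tag set, the score values and the name `viterbi_decode` are illustrative assumptions, not the patent's trained model.

```python
from __future__ import annotations


def viterbi_decode(emissions: list[list[float]], transitions: list[list[float]],
                   start: list[float], tags: list[str]) -> list[str]:
    """Highest-scoring tag path under a linear-chain CRF.

    emissions[t][k]   score of tag k at position t (e.g. a BiLSTM output)
    transitions[j][k] score of moving from tag j to tag k
    start[k]          score of starting the sentence with tag k
    """
    n_tags = len(tags)
    score = [start[k] + emissions[0][k] for k in range(n_tags)]
    backpointers: list[list[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for k in range(n_tags):
            # best previous tag j for arriving at tag k at position t
            best_j = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
            bp.append(best_j)
        score = new_score
        backpointers.append(bp)
    # follow the back-pointers from the best final tag
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for bp in reversed(backpointers):
        best = bp[best]
        path.append(best)
    path.reverse()
    return [tags[k] for k in path]
```

Large negative transition or start scores, such as `O -> I-LOC`, act as hard constraints that keep the decoded BIO sequence well-formed.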
Preferably, the adjustable degree-centrality parameter α = 0.4 and the adjustable closeness-centrality parameter β = 0.6.
Compared with the prior art, the present invention has the following beneficial effects:
The keyword extraction of the invention achieves good results in Chinese news text classification. Compared with the TF-IDF algorithm and the traditional text complex network algorithm, the keywords it extracts clearly improve the precision, recall and F1 score of the classification results.
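For reference, the TF-IDF baseline mentioned above can be sketched as follows. Each term is scored by its frequency in the document times the logarithm of its inverse document frequency, and the top-k terms are taken as keywords. This unsmoothed variant and the name `tfidf_keywords` are assumptions, since the patent does not specify which TF-IDF formulation was used for comparison.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf_keywords(docs: list[list[str]], k: int = 5) -> list[list[str]]:
    """Top-k terms of each tokenized document by TF-IDF (unsmoothed IDF)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    result = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
        result.append(ranked[:k])
    return result
```

Terms that occur in every document get an IDF of zero. This is one reason purely statistical baselines can miss frequent but important named entities.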
Description of the drawings
Fig. 1 Optimized flowchart of linguistic network construction
Fig. 2 Algorithm flowchart of keyword extraction
Specific embodiment
The present invention is further explained below with reference to the drawings and specific embodiments. These examples only illustrate the invention and do not limit its scope. After reading this disclosure, modifications of the invention in various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.
The news keyword extraction method based on NER and complex network features studies the network properties of words in Chinese news texts. It combines named entity recognition (NER) with the complex network (CN) properties of natural language to propose a novel keyword extraction algorithm, NER-CN (NER combined with complex network features). The algorithm first annotates sentences with named entity recognition. It then constructs a text complex network and extracts keywords according to the global metric of each node in the network. Experimental results show that the proposed algorithm significantly improves the precision, recall and F1 score of text classification. The experiments are coded mainly in Python 3 and comprise the following steps:
Step1: collect news content and generate a file list from it under the original corpus folder. The threshold is set to 30, and all content with fewer characters than the threshold is filtered out. For each corpus item, the url and content are matched with regular expressions. All news content is stored in separate txt files according to the category in the url.
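Step1 can be sketched as follows. The sketch assumes a SogouCS-style dump in which each record carries `<url>` and `<content>` tags. The patent only states that url and content are matched with regular expressions, so the pattern and the helper names `split_by_category` and `write_category_files` are hypothetical. The category is taken from the first label of the host name, as in the gongyi.sohu.com example given in the experiment description.

```python
from __future__ import annotations

import re
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlparse

# Assumed record layout (SogouCS-style): <doc><url>...</url> ... <content>...</content></doc>
DOC_RE = re.compile(r"<url>(?P<url>.*?)</url>.*?<content>(?P<content>.*?)</content>", re.S)


def category_from_url(url: str) -> str:
    """First label of the host name, e.g. http://gongyi.sohu.com/... -> 'gongyi'."""
    host = urlparse(url).netloc or url.split("/")[0]
    return host.split(".")[0]


def split_by_category(raw: str, threshold: int = 30) -> dict[str, list[str]]:
    """Group contents by URL category, dropping contents shorter than `threshold` characters."""
    groups: dict[str, list[str]] = defaultdict(list)
    for m in DOC_RE.finditer(raw):
        content = m.group("content").strip()
        if len(content) < threshold:
            continue  # filter short items (threshold 30 in the embodiment)
        groups[category_from_url(m.group("url").strip())].append(content)
    return dict(groups)


def write_category_files(groups: dict[str, list[str]], out_dir: str) -> None:
    """Write one UTF-8 txt file per category, one news item per line (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for category, contents in groups.items():
        (out / f"{category}.txt").write_text("\n".join(contents), encoding="utf-8")
```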
Step2: word segmentation. The corpus processed here is Chinese, so Jieba is used for Chinese word segmentation. Stop words are then removed. Many words occur very frequently in Chinese articles but carry no practical meaning for classifying them. Words such as "only" and "should" waste storage space and computation time and may also affect the final classification results.
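A minimal sketch of the stop-word step follows. It assumes the text has already been segmented, for example with `jieba.lcut(text)`, and that a stop-word list is available as a set. The helpers `clean_tokens` and `load_stopwords` are hypothetical. The optional single-character filter corresponds to the removal of single-character words described in the summary of the invention.

```python
from __future__ import annotations


def clean_tokens(tokens: list[str], stopwords: set[str], drop_single_chars: bool = True) -> list[str]:
    """Remove blank tokens, stop words and (optionally) single-character words."""
    kept = []
    for tok in tokens:
        tok = tok.strip()
        if not tok or tok in stopwords:
            continue
        if drop_single_chars and len(tok) == 1:
            continue  # single-character words are dropped before building the network
        kept.append(tok)
    return kept


def load_stopwords(path: str) -> set[str]:
    """Read a UTF-8 stop-word list, one word per line (hypothetical file)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}
```

Typical use would be `clean_tokens(jieba.lcut(text), load_stopwords("stopwords.txt"))`, where the file name is a placeholder.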
Step3: annotate sentences with a neural-network-based named entity recognition method. Then perform tasks such as date recognition, person name recognition and place name recognition to extract the important named entities in the text.
Named entity recognition is a basic task of natural language processing. It is an essential component of many NLP technologies, such as information extraction, information retrieval, machine translation and question answering systems. In most cases, dictionaries cannot fully cover uncommon place names, person names and similar information, and the naming rules of these entities vary widely. Recognizing these words is a particularly fundamental part of NLP tasks, so it must be handled as a separate task. It is usually completed during early lexical processing, and this step is called named entity recognition (NER). Chinese word formation is flexible and variable, which makes Chinese NER more challenging. Research on recognizing entities such as person names and place names in news corpora has achieved considerable results.
The present invention uses a neural-network-based NER method. A BiLSTM_CRF model performs sequence labeling with character and word embeddings. From the input layer upward, the model consists of the following layers:

- a look-up layer, which maps each character $x_i$ of the sentence from a one-hot vector to a low-dimensional dense vector;
- a bidirectional LSTM layer, which automatically extracts sentence features;
- a CRF layer, which performs sentence-level sequence labeling;
- an output layer.

Here the place-name recognition corpus is built from the spans tagged "ns" in a part-of-speech tagged corpus. For example, from "[Hong Kong/ns special/a administrative-region/n]ns" we can directly extract "Hong Kong Special Administrative Region"; the "ns" inside the brackets is no longer treated as a separate place name. Named entity recognition here uses the BiLSTM-CRF model. Taking organization names as an example, the tagging rules are defined as shown in the table:
Table 1 Organization name entity tags
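The "ns" corpus construction described above can be sketched as a conversion into character-level BIO tags for training. The input is a People's Daily-style POS-tagged line, using the bracket format shown in the example; this format is assumed here. The `B-LOC`/`I-LOC`/`O` scheme and the name `to_char_bio` are also assumptions, since the patent's own tag definitions are in Table 1, which is not reproduced here.

```python
from __future__ import annotations

import re

# A bracketed compound "[w1/t1 w2/t2]ns" or a plain token "w/t" (assumed People's Daily-style format).
TOKEN_RE = re.compile(r"\[(?P<inner>[^\]]+)\](?P<outer>\w+)|(?P<word>\S+?)/(?P<tag>\w+)")


def _emit(pairs: list[tuple[str, str]], text: str, is_entity: bool, label: str) -> None:
    """Append one (character, tag) pair per character using the BIO scheme."""
    for i, ch in enumerate(text):
        if is_entity:
            pairs.append((ch, ("B-" if i == 0 else "I-") + label))
        else:
            pairs.append((ch, "O"))


def to_char_bio(line: str, target: str = "ns", label: str = "LOC") -> list[tuple[str, str]]:
    """Turn one POS-tagged line into character-level BIO tags for `target` entities."""
    pairs: list[tuple[str, str]] = []
    for m in TOKEN_RE.finditer(line):
        if m.group("inner") is not None:
            inner = [tok.rsplit("/", 1) for tok in m.group("inner").split()]
            if m.group("outer") == target:
                # The whole bracket is one place name; inner "ns" tags are not used separately.
                _emit(pairs, "".join(parts[0] for parts in inner), True, label)
            else:
                # Other compounds (e.g. "]nt") fall back to their inner word tags.
                for parts in inner:
                    _emit(pairs, parts[0], len(parts) == 2 and parts[1] == target, label)
        else:
            _emit(pairs, m.group("word"), m.group("tag") == target, label)
    return pairs
```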
Step4: build the word complex network. The words obtained in the previous step are assigned numeric codes, and the codes are used as nodes. Here a distance of 2 is used as the range of the word association relation, i.e., words within a distance of 2 of each other are connected by an edge. This rule is applied to each sentence in a loop.
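Step4 can be sketched as follows. The sketch assumes that "distance 2" means two words whose positions in the same sentence differ by at most 2. It also assumes the network is undirected and unweighted and that repeated words share one node. `encode_words` and `build_cooccurrence_graph` are hypothetical names.

```python
from __future__ import annotations


def encode_words(sentences: list[list[str]]) -> tuple[dict[str, int], list[list[int]]]:
    """Give every distinct word an integer code; return the vocabulary and coded sentences."""
    vocab: dict[str, int] = {}
    coded = [[vocab.setdefault(w, len(vocab)) for w in sent] for sent in sentences]
    return vocab, coded


def build_cooccurrence_graph(coded_sentences: list[list[int]], window: int = 2) -> dict[int, set[int]]:
    """Undirected word network: link words at most `window` positions apart in a sentence."""
    adj: dict[int, set[int]] = {}
    for sent in coded_sentences:
        for i, u in enumerate(sent):
            adj.setdefault(u, set())
            for v in sent[i + 1 : i + 1 + window]:
                if v != u:  # no self-loops when a word repeats nearby
                    adj[u].add(v)
                    adj.setdefault(v, set()).add(u)
    return adj
```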
Step5: compute the degree centrality and closeness centrality of each node, normalize them, and take their weighted sum to obtain the node's global metric. This calculation is repeated for every node.
The academic community generally regards degree centrality as an important local metric in complex networks and closeness centrality as an important global metric. Here these two indices, which emphasize different aspects, are combined to propose a keyword extraction algorithm based on complex network features.
Let $V=\{v_1,v_2,\dots,v_N\}$ be a set of N nodes, let $(v_i,v_j)$ denote the edge between nodes $v_i\in V$ and $v_j\in V$, and let $G(V,E)$ be the graph with node set V and edge set E. The degree centrality $DC_i$ of node $v_i$ is:
where $k_i$ is the degree of the node, i.e., the number of nodes connected to it, and N is the number of nodes. Degree centrality reflects the local properties of a node.
A node's neighbors show only its local position. To capture its relationship with the remaining, more distant nodes, and thus its global properties, we can compute the average distance from node $v_i$ to all nodes in the network, denoted $d_i$:
where $d_{ij}$ is the distance from node $v_i$ to node $v_j$. The average distance L between all nodes in the network is then calculated with the following formula:
$d_i$ reflects the relative importance of node $v_i$ in the network: the smaller $d_i$, the closer $v_i$ is to the other nodes. Here the reciprocal of $d_i$ is defined as the closeness centrality of $v_i$, denoted $CC_i$:
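The formulas for $DC_i$, $d_i$, $L$ and $CC_i$ are not legible in this copy. With the normalizations common in the complex-network literature, they would read as follows. These normalizations are an assumption, since some texts average over all N nodes instead of N-1.

$$DC_i = \frac{k_i}{N-1}$$

$$d_i = \frac{1}{N-1}\sum_{j \neq i} d_{ij}$$

$$L = \frac{1}{N(N-1)}\sum_{i \neq j} d_{ij}$$

$$CC_i = \frac{1}{d_i} = \frac{N-1}{\sum_{j \neq i} d_{ij}}$$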
The degree centrality of a node reflects its aggregation within a local range and embodies its local properties in the network. Generally, the larger a node's degree centrality, the more neighbors it has and the more important it is locally. However, degree centrality alone only extracts words with a large local degree. It ignores words that have a large influence on the network as a whole but a low degree centrality. Closeness centrality is a global network feature: a node with high closeness centrality is nearest to all other nodes and therefore lies spatially at the center. Combining these two indices with their different emphases, we propose a new node importance index, the global metric, and denote the global metric of node $v_i$ by $CM_i$:
$$CM_i = \alpha\, DC_i + \beta\, CC_i \qquad (5)$$
where $DC_i$ is the degree centrality of node $v_i$ and $CC_i$ is its closeness centrality. α and β are adjustable parameters with α + β = 1 (α = 0.4 and β = 0.6 in the present invention).
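The definitions above can be combined into one scoring routine. The sketch below scores each node with $CM_i = \alpha DC_i + \beta CC_i$ and ranks words by that score. It relies on the following assumptions:

- The graph is an adjacency-set graph like the one built in Step4.
- Shortest-path distances are computed by breadth-first search.
- $DC_i = k_i/(N-1)$.
- $CC_i$ is the reciprocal of the mean distance to the other nodes. A news word network may be disconnected, so that mean is taken over reachable nodes only; the patent does not address this case.

With these normalizations, $DC_i$ and $CC_i$ both already lie in [0, 1]. `global_metric` and `top_keywords` are hypothetical names.

```python
from __future__ import annotations

from collections import deque


def bfs_distances(adj: dict[int, set[int]], source: int) -> dict[int, int]:
    """Shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_metric(adj: dict[int, set[int]], alpha: float = 0.4, beta: float = 0.6) -> dict[int, float]:
    """CM_i = alpha * DC_i + beta * CC_i for every node (alpha + beta must equal 1)."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    n = len(adj)
    scores = {}
    for node, neighbours in adj.items():
        dc = len(neighbours) / (n - 1) if n > 1 else 0.0  # DC_i = k_i / (N - 1)
        others = [d for v, d in bfs_distances(adj, node).items() if v != node]
        # CC_i = 1 / mean distance, averaged over reachable nodes only (assumption)
        cc = len(others) / sum(others) if others else 0.0
        scores[node] = alpha * dc + beta * cc
    return scores


def top_keywords(adj: dict[int, set[int]], id_to_word: dict[int, str], k: int = 10,
                 alpha: float = 0.4, beta: float = 0.6) -> list[str]:
    """The k words with the highest global metric."""
    scores = global_metric(adj, alpha, beta)
    ranked = sorted(scores, key=lambda v: (-scores[v], v))  # ties broken by word code
    return [id_to_word[v] for v in ranked[:k]]
```

With the Step4 vocabulary, `id_to_word` can be built as `{code: word for word, code in vocab.items()}`.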
The 4,000 web news samples used here all come from the Sogou corpus. Python 3.6 is used as the experimental tool, and all character encodings are UTF-8. Text classification requires samples and labels. The sample data could be the news headline plus the news content, but for simplicity only the news content is used here. The class label of each news item is taken from its URL: for example, the URL gongyi.sohu.com shows that the news belongs to the category "public welfare". The ratio of training to test samples in this experiment is 4:1, i.e., 3,200 documents are used for training and 800 for testing.
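A sketch of the 4:1 split follows. It assumes each document is paired with its URL-derived label and that the split is stratified by category, so that every class keeps the same ratio. The patent only states the overall 3,200/800 split. `stratified_split` is a hypothetical name, and the fixed seed only makes the example reproducible.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_split(labeled_docs: list[tuple[str, str]], test_ratio: float = 0.2, seed: int = 42):
    """Split (label, text) pairs so that every category keeps a train:test ratio of 4:1."""
    by_label: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for label, text in labeled_docs:
        by_label[label].append((label, text))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: list[tuple[str, str]] = []
    test: list[tuple[str, str]] = []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```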
Table 2 Data distribution
Table 3 Experimental environment
Evaluation results:
In text classification tasks, precision P, recall R and the F1 score are commonly used as evaluation indices, and this experiment uses them as well.
The meanings of the parameters in the above formulas are shown in the following table.
Table 4 Parameter meanings
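The standard definitions are P = TP/(TP+FP), R = TP/(TP+FN) and F1 = 2PR/(P+R). Here TP, FP and FN count the true positives, false positives and false negatives for a class. The sketch below computes these metrics per class and as a macro average. The patent does not say which averaging it reports, so macro averaging is an assumption, and `precision_recall_f1` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def precision_recall_f1(y_true: list[str], y_pred: list[str]):
    """Per-class (P, R, F1) tuples and their macro average."""
    labels = sorted(set(y_true) | set(y_pred))
    tp: Counter[str] = Counter()
    fp: Counter[str] = Counter()
    fn: Counter[str] = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = (prec, rec, f1)
    macro = tuple(sum(v[i] for v in per_class.values()) / len(labels) for i in range(3))
    return per_class, macro
```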
The present invention studies the network properties of words in Chinese news texts. It combines named entity recognition (NER) with the complex network (CN) properties of natural language to propose a novel keyword extraction algorithm, NER-CN (NER combined with complex network features). The algorithm first annotates sentences with named entity recognition. It then constructs a text complex network and extracts keywords according to the global metric of each node in the network. Compared with conventional methods, the proposed algorithm significantly improves the precision, recall and F1 score of text classification.
The above is only a preferred embodiment of the present invention. Those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention.
Claims (4)
1. A news keyword extraction method based on NER and complex network features, characterized by comprising the following steps:
Step 1: collect news content and generate a file list from it under the original corpus folder. Set a filtering threshold and filter out content whose character count is below it. For each corpus item, match the url and content with regular expressions. Store all news content in separate txt files according to the category in the url;
Step 2: segment the content of each txt file with Jieba word segmentation and remove stop words;
Step 3: annotate sentences with a neural-network-based named entity recognition method. Then perform date recognition, person name recognition and place name recognition to extract the important named entities in the text;
Step 4: build the word complex network. Assign a numeric code to each word obtained in Step 3 and use the codes as nodes. Use a distance of 2 as the range of the word association relation, i.e., words within a distance of 2 of each other are connected by an edge. Loop over every sentence to apply this rule;
$V=\{v_1,v_2,\dots,v_N\}$ is a set of N nodes, $(v_i,v_j)$ denotes the edge between nodes $v_i\in V$ and $v_j\in V$, and $G(V,E)$ is the graph with node set V and edge set E. The degree centrality $DC_i$ of node $v_i$ is:
where $k_i$ is the degree of the node, i.e., the number of nodes connected to it, and N is the number of nodes;
Compute the average distance from node $v_i$ to all nodes in the network, denoted $d_i$:
where $d_{ij}$ is the distance from node $v_i$ to node $v_j$. The average distance L between all nodes in the network is calculated with the following formula:
The reciprocal of $d_i$ is defined as the closeness centrality of node $v_i$, denoted $CC_i$:
Closeness centrality and degree centrality are combined into a new node importance index, the global metric. The global metric of node $v_i$ is denoted $CM_i$:
$$CM_i = \alpha\, DC_i + \beta\, CC_i \qquad (5)$$
where $DC_i$ is the degree centrality of node $v_i$ and $CC_i$ is its closeness centrality. α is the adjustable degree-centrality parameter, β is the adjustable closeness-centrality parameter, and α + β = 1. Keywords are extracted according to the resulting global metric.
2. The news keyword extraction method based on NER and complex network features according to claim 1, characterized in that the filtering threshold in Step 1 is 30.
3. The news keyword extraction method based on NER and complex network features according to claim 2, characterized in that the neural-network-based named entity recognition method in Step 3 uses a BiLSTM_CRF model for sequence labeling with both character and word embeddings. From the input layer upward, the model consists of a look-up layer, a bidirectional LSTM layer, a CRF layer and an output layer. The look-up layer maps each character $x_i$ of the sentence from a one-hot vector to a low-dimensional dense vector, the bidirectional LSTM layer automatically extracts sentence features, and the CRF layer performs sentence-level sequence labeling. The place-name recognition corpus is built from the spans tagged "ns" in a part-of-speech tagged corpus, and named entity recognition uses the BiLSTM-CRF model.
4. The news keyword extraction method based on NER and complex network features according to claim 3, characterized in that the adjustable degree-centrality parameter α = 0.4 and the adjustable closeness-centrality parameter β = 0.6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910790303.0A CN110532390B (en) | 2019-08-26 | 2019-08-26 | News keyword extraction method based on NER and complex network characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910790303.0A CN110532390B (en) | 2019-08-26 | 2019-08-26 | News keyword extraction method based on NER and complex network characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110532390A true CN110532390A (en) | 2019-12-03 |
CN110532390B CN110532390B (en) | 2022-07-29 |
Family
ID=68664200
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910790303.0A Active CN110532390B (en) | 2019-08-26 | 2019-08-26 | News keyword extraction method based on NER and complex network characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532390B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110103682A1 (en) * | 2009-10-29 | 2011-05-05 | Xerox Corporation | Multi-modality classification for one-class classification in social networks |
CN104933032A (en) * | 2015-06-29 | 2015-09-23 | 电子科技大学 | Method for extracting keywords of blog based on complex network |
CN106202042A (en) * | 2016-07-06 | 2016-12-07 | 中央民族大学 | A kind of keyword abstraction method based on figure |
CN106503101A (en) * | 2016-10-14 | 2017-03-15 | 五邑大学 | Electric business customer service automatically request-answering system sentence keyword extracting method |
Non-Patent Citations (3)
Title |
---|
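Tang Jun: "Application of complex networks in news web page keyword extraction", Journal of Yunnan Minzu University (Natural Sciences Edition) * |
Zhang Li: "Research on automatic extraction of keywords and text summaries in text mining", China Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology * |
Li Jingyue et al.: "An improved TFIDF method for web page keyword extraction", Computer Applications and Software * |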
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178076A (en) * | 2019-12-19 | 2020-05-19 | 成都欧珀通信科技有限公司 | Named entity identification and linking method, device, equipment and readable storage medium |
CN111178076B (en) * | 2019-12-19 | 2023-08-08 | 成都欧珀通信科技有限公司 | Named entity recognition and linking method, device, equipment and readable storage medium |
CN111339250A (en) * | 2020-02-20 | 2020-06-26 | 北京百度网讯科技有限公司 | Mining method of new category label, electronic equipment and computer readable medium |
CN111339250B (en) * | 2020-02-20 | 2023-08-18 | 北京百度网讯科技有限公司 | Mining method for new category labels, electronic equipment and computer readable medium |
CN112241492A (en) * | 2020-10-22 | 2021-01-19 | 西安石油大学 | Early identification method for multi-source heterogeneous online network topics |
CN112241492B (en) * | 2020-10-22 | 2023-04-07 | 西安石油大学 | Early identification method for multi-source heterogeneous online network topics |
CN112307364A (en) * | 2020-11-25 | 2021-02-02 | 哈尔滨工业大学 | Character representation-oriented news text place extraction method |
CN112307364B (en) * | 2020-11-25 | 2021-10-29 | 哈尔滨工业大学 | Character representation-oriented news text place extraction method |
CN112948527A (en) * | 2021-02-23 | 2021-06-11 | 云南大学 | Improved TextRank keyword extraction method and device |
CN117408651A (en) * | 2023-12-15 | 2024-01-16 | 辽宁省网联数字科技产业有限公司 | On-line compiling method and system for bidding scheme based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
CN110532390B (en) | 2022-07-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110532390A (en) | A news keyword extraction method based on NER and complex network features | |
CN106919673B (en) | Text mood analysis system based on deep learning | |
Zhang et al. | Enhancing cross-target stance detection with transferable semantic-emotion knowledge | |
Schmitz | Inducing ontology from flickr tags | |
CN107315738B (en) | An innovation degree assessment method for text information | |
CN105005594B (en) | Abnormal microblog users recognition methods | |
Li et al. | News text classification model based on topic model | |
CN105243129A (en) | Commodity property characteristic word clustering method | |
CN107180026B (en) | Event phrase learning method and device based on word embedding semantic mapping | |
CN106997344A (en) | Keyword abstraction system | |
CN101739430A (en) | Method for training and classifying text emotion classifiers based on keyword | |
CN101770580A (en) | Training method and classification method of cross-field text sentiment classifier | |
CN112464669B (en) | Stock entity word disambiguation method, computer device, and storage medium | |
CN110134792A (en) | Text recognition method, device, electronic equipment and storage medium | |
CN109614626A (en) | Keyword Automatic method based on gravitational model | |
CN105869058B (en) | A method for extracting user portraits with a multilayer latent variable model | |
Wong et al. | Hot item mining and summarization from multiple auction web sites | |
Kocayusufoglu et al. | Riser: Learning better representations for richly structured emails | |
CN107357895A (en) | A processing method for bag-of-words-based text representation | |
Chen et al. | Sentiment classification of tourism based on rules and LDA topic model | |
CN114997288A (en) | Design resource association method | |
Biemann | Unsupervised part-of-speech tagging in the large | |
CN113934835A (en) | Retrieval type reply dialogue method and system combining keywords and semantic understanding representation | |
Laboreiro et al. | Determining language variant in microblog messages | |
CN110196910A (en) | Corpus classification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||