CN110532390A - News keyword extraction method based on NER and complex network features - Google Patents

News keyword extraction method based on NER and complex network features

Info

Publication number
CN110532390A
CN110532390A CN201910790303.0A CN201910790303A CN110532390A CN 110532390 A CN110532390 A CN 110532390A CN 201910790303 A CN201910790303 A CN 201910790303A CN 110532390 A CN110532390 A CN 110532390A
Authority
CN
China
Prior art keywords
node
word
centrality
ner
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910790303.0A
Other languages
Chinese (zh)
Other versions
CN110532390B (en)
Inventor
纪明轩
宋玉蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201910790303.0A
Publication of CN110532390A
Application granted
Publication of CN110532390B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news keyword extraction method based on NER and complex network features. It combines named entity recognition (Named Entities Recognition, NER) with the complex network (Complex Networks, CN) characteristics of natural language and proposes a new keyword extraction method, the NER-CN method, which couples NER with complex network features. The method first labels each sentence and performs named entity recognition, then builds a complex network of the text and extracts keywords according to the global metric of the nodes in the network. Compared with conventional methods, the proposed keyword extraction method significantly improves the precision, recall and F1 score of text classification.

Description

News keyword extraction method based on NER and complex network features
Technical field
The present invention studies the network characteristics of words in Chinese news text. It combines named entity recognition (Named Entities Recognition, NER) with the complex network (Complex Networks, CN) characteristics of natural language and proposes a new keyword extraction algorithm, the NER-CN algorithm, which couples NER with complex network features. The invention belongs to the technical field of NLP.
Background art
In recent years, the explosive growth of data and the increase in computing power have raised the technical requirements for letting users quickly extract useful information from massive data. Feature selection is an important step in text analysis, and its performance has a particularly strong effect on classification results.
Traditional text feature methods include TF-IDF, TextRank, LDA and information gain. Because of the complexity of language itself, these methods tend to ignore the structural information of the text and to produce a large amount of redundant information. To retain the structural information in text, mapping the community information and semantic structure of natural language onto complex networks has become an active research direction.
Keyword extraction is a foundation of natural language processing, and it has been studied in depth at home and abroad in recent years.
The document Amancio D R. Probing the Topological Properties of Complex Networks Modeling Short Written Texts [J]. PLoS One, 2014, 10(2): e0118394 proposes that short texts can be analysed with the methods and concepts of complex networks. It extracts sub-texts in a reasonable manner, builds syntax-based word co-occurrence networks, analyses the complex network characteristics of dynamic short texts, and verifies the superiority of the method through text classification experiments with an SVM algorithm.
The document De Arruda H F, Costa L D F, Amancio D R. Using complex networks for text classification: Discriminating informative and imaginative documents [J]. EPL (Europhysics Letters), 2016, 113(2): 28007 discusses how to use features obtained from the Text REtrieval Conference (TREC) effectively in classification tasks. The authors perform supervised classification to distinguish informative from imaginative documents, using a network model that describes the local topological and dynamic characteristics of function words, and substantially improve the accuracy of text classification.
The document Tang Jun. Application of complex networks in news web page keyword extraction [J]. Journal of Yunnan Minzu University (Natural Science Edition), 2012, 21(4) introduces the characteristics of weighted complex networks into keyword extraction. It analyses the characteristics of news web page documents and the node weights, describes the clustering coefficient and centrality of weighted directed networks and, drawing on the advantages of traditional algorithms, proposes an improved method for automatically extracting news keywords. Experiments show that the algorithm is feasible.
The document Zhan Z J, Lin F, Yang X P. Keyword Extraction of Document Based on Weighted Complex Network [J]. Advanced Materials Research, 2011, 403-408: 2146-2151 studies the complex network features of Chinese text and proposes an automatic keyword extraction algorithm for Chinese documents based on complex network features. Building on the small-world structure of language networks and theoretical results on complex networks, it extracts keywords from the feature values of the word nodes in the document's language network. The experimental results show that the algorithm has a higher average precision than the traditional TF-IDF algorithm.
Although the studies above have advanced the application of complex networks to text to some extent, the following problem remains: news reports on specific topics often contain particular entities such as place names, person names and dates, and traditional keyword extraction algorithms cannot extract this entity information effectively.
Summary of the invention
Object of the invention: to overcome the deficiencies of the prior art, the present invention provides a news keyword extraction method based on NER and complex network features, and proposes a network model for representing the relationships between the entities in a document. A small corpus is first built to complete text training, and a complex network of the text is built from the syntactic relations and the co-occurrence relations between words. The degree centrality and closeness centrality of the nodes in the network are analysed together to create a formula for the global metric of a network node. Because single-character words in the text interfere with keyword extraction, single-character words and common stop words are first removed from the network nodes, and the global metric of each node is then calculated.
Technical solution: to achieve the above object, the present invention adopts the following technical solution:
A news keyword extraction method based on NER and complex network features, comprising the following steps:
Step 1: Collect news content and generate a file list from the news content under the raw corpus folder. Set a filtering threshold; content with fewer characters than the filtering threshold is filtered out. For each corpus entry, the url and the content are matched with regular expressions. All news content is stored in separate txt files according to the category in the url;
Step 2: Segment the content of each txt file with jieba word segmentation and remove stop words;
Step 3: Label the sentences with a neural-network-based named entity recognition method, then perform date recognition, person name recognition and place name recognition to extract the important named entities in the text;
Step 4: Build the word complex network. The words obtained in step 3 are numerically encoded and the encoded results are used as nodes. A distance of 2 is used as the word association distance, i.e. an edge connects any two words whose distance is within 2. Each sentence is processed in a loop;
V={ v1,v2…vnIt is a set for having N number of node, (vi,vj) indicate node vi∈ V and vjSide between ∈ V, G (V, E) be using V as node set, withFor the figure of line set, node viDegree centrality DCi Are as follows:
Wherein, kiFor the degree of node, i.e., the number of coupled node, N is the number of nodes;
Calculate a node viInto network, the average value of the distance of all nodes, is denoted as di, that is, have:
Wherein, dijIt is node viTo node vjDistance, in network, distance average L between all nodes is with following Formula is calculated:
By diInverse be defined as node viClose to centrality, with mark CCiTo indicate:
It will be combined to obtain new pitch point importance evaluation index overall situation metric close to centrality and degree centrality, it will Node viGlobal metric mark CMiIt indicates:
CMi=α DCi+βCCi (5)
Wherein, DCiFor nodes viDegree centrality, CCiFor node viClose to centrality, α is that degree centrality can Adjustment parameter, β are to extract keyword according to obtained global metric close to centrality customized parameter, and alpha+beta=1.
Preferably, the filtering threshold in step 1 is 30.
Preferably, the neural-network-based named entity recognition method in step 3 performs sequence labelling with a BiLSTM-CRF model, using character embeddings and word embeddings. Starting from the input layer, the layers of the model are, in order, the look-up layer, the bidirectional LSTM layer, the CRF layer and the output layer. The look-up layer maps each word x_i in the sentence from a one-hot vector to a low-dimensional dense word vector, the bidirectional LSTM layer automatically extracts sentence features, and the CRF layer performs sentence-level sequence labelling. The place name recognition corpus is built from the parts of a part-of-speech tagged corpus that are labelled "ns", and named entities are recognised with the BiLSTM-CRF model.
Preferably, the adjustable parameter for degree centrality is α = 0.4 and the adjustable parameter for closeness centrality is β = 0.6.
Compared with the prior art, the present invention has the following beneficial effects:
The keyword extraction of the invention achieves good results in the classification of Chinese news text. Compared with the TF-IDF algorithm and the traditional algorithm for building a text complex network, the keywords extracted by the invention significantly improve the precision, recall and F1 score of the classification results.
Description of the drawings
Fig. 1 is a flow chart of the optimised construction of the language network.
Fig. 2 is a flow chart of the keyword extraction algorithm.
Detailed description of the embodiments
The present invention is further explained below with reference to the drawings and specific embodiments. These examples only illustrate the invention and do not limit its scope; after reading the present invention, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.
A news keyword extraction method based on NER and complex network features studies the network characteristics of words in Chinese news text, combines named entity recognition (Named Entities Recognition, NER) with the complex network (Complex Networks, CN) characteristics of natural language, and proposes a new keyword extraction algorithm, the NER-CN algorithm. The algorithm first labels each sentence and performs named entity recognition, then builds a complex network of the text and extracts keywords according to the global metric of the nodes in the network. The experimental results show that the proposed keyword extraction algorithm significantly improves the precision, recall and F1 score of text classification. The experiments were coded mainly in Python 3 and comprise the following steps:
Step 1: Collect news content and generate a file list from the news content under the raw corpus folder. The threshold is set to 30, and all content with fewer characters than the threshold is filtered out. For each corpus entry, the url and the content are matched with regular expressions. All news content is stored in separate txt files according to the category in the url.
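Step 1 can be sketched as follows. This is a minimal illustration, not the patent's code: the `<url>…</url><content>…</content>` record layout, the two regular expressions and the function name `group_by_category` are assumptions introduced for the example; only the threshold of 30 and the idea of reading the category from the url come from the text.

```python
import re
from typing import Dict, List

MIN_CHARS = 30  # filtering threshold stated in the text

# Hypothetical record layout: "<url>...</url><content>...</content>"
RECORD_RE = re.compile(r"<url>(.*?)</url>\s*<content>(.*?)</content>", re.S)
# Assumption: the category is the first label of the host name,
# e.g. "gongyi" in http://gongyi.sohu.com/...
CATEGORY_RE = re.compile(r"https?://([a-z0-9]+)\.")


def group_by_category(raw: str, min_chars: int = MIN_CHARS) -> Dict[str, List[str]]:
    """Drop short items and group the remaining contents by url category."""
    grouped: Dict[str, List[str]] = {}
    for url, content in RECORD_RE.findall(raw):
        content = content.strip()
        if len(content) < min_chars:
            continue  # fewer characters than the threshold: filtered out
        match = CATEGORY_RE.match(url.strip())
        if match is None:
            continue  # url without a recognisable category
        grouped.setdefault(match.group(1), []).append(content)
    return grouped
```

Each key of the returned dictionary would then correspond to one txt file of the kind described in the step.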
Step 2: Word segmentation. The corpus processed here is Chinese, so jieba word segmentation is used for Chinese word segmentation. Stop words are then removed. Many words occur in large numbers in Chinese articles but have no practical significance for classifying the article. Words such as "only" and "should", for example, waste both space and time in the calculation and may also affect the final classification result.
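The filtering part of Step 2 might look like the sketch below. It assumes the sentence has already been segmented into tokens (for example by jieba, which is not imported here so that the example stays self-contained); the stop-word set and the function name `clean_tokens` are hypothetical. The removal of single-character words follows the summary of the invention.

```python
# Hypothetical stop-word list; a real run would load a full list from a file.
STOP_WORDS = {"应该", "但是", "因为"}


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Remove stop words and single-character words from a token list."""
    return [
        token
        for token in tokens
        if len(token) > 1 and token not in stop_words
    ]
```

Dropping single-character tokens before the network is built keeps high-frequency particles from becoming hub nodes.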
Step 3: Label the sentences with a neural-network-based named entity recognition method, then perform tasks such as date recognition, person name recognition and place name recognition to extract the important named entities in the text.
Named entity recognition is a basic task of natural language processing and an essential component of many natural language processing technologies such as information extraction, information retrieval, machine translation and question answering. In most cases a dictionary cannot fully cover uncommon place names, person names and similar information, and the naming rules of these entities vary widely. Recognising these words is nevertheless a fundamental and important part of natural language processing, so their recognition has to be handled as a separate task, usually during early lexical processing. This step is called named entity recognition (Named Entities Recognition, NER). Chinese word formation is flexible and changeable, which makes the Chinese NER task more challenging. Research on named entity recognition for entities such as person names and place names in news corpora has achieved considerable results.
The present invention uses a neural-network-based NER method that performs sequence labelling with a BiLSTM-CRF model, using character embeddings and word embeddings. Starting from the input layer, the layers of the model are, in order, the look-up layer (which maps each word x_i in the sentence from a one-hot vector to a low-dimensional dense word vector), the bidirectional LSTM layer (which automatically extracts sentence features), the CRF layer (which performs sentence-level sequence labelling) and the output layer. The place name recognition corpus is built here from the parts of a part-of-speech tagged corpus that are labelled "ns". For example, from "[Hong Kong/ns Special/a Administrative Region/n]ns" we can directly extract "Hong Kong Special Administrative Region" (the "ns" inside the brackets is no longer treated as a separate place name). Named entities are recognised here with the BiLSTM-CRF model. Taking organisation names as an example, the rules are defined as shown in the following table:
Table 1: Named entity tags for organisation names
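The way place names are pulled out of a part-of-speech tagged corpus, as described for the bracketed "ns" example, can be illustrated with the sketch below. The `word/tag` token format with a bracketed compound followed by `ns` is assumed from that example; the regular expressions and the function name `extract_place_names` are hypothetical, and the BiLSTM-CRF model itself is not shown.

```python
import re

# A bracketed compound labelled as a place name, e.g. "[香港/ns 特别/a 行政区/n]ns"
COMPOUND_RE = re.compile(r"\[([^\]]+)\]\s*ns")
# A single "word/tag" token
TOKEN_RE = re.compile(r"(\S+?)/([a-z]+)")


def extract_place_names(tagged_sentence):
    """Return compound place names first, then the remaining single 'ns' tokens."""
    places = []

    def take_compound(match):
        # Join the words inside the brackets; inner tags (including an inner
        # "ns") are discarded so they are not counted as separate place names.
        words = [word for word, _tag in TOKEN_RE.findall(match.group(1))]
        places.append("".join(words))
        return " "

    remainder = COMPOUND_RE.sub(take_compound, tagged_sentence)
    places.extend(word for word, tag in TOKEN_RE.findall(remainder) if tag == "ns")
    return places
```

The extracted strings would serve as labelled place-name spans when the recognition corpus is assembled.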
Step 4: Build the word complex network. The words obtained in the previous step are numerically encoded and the encoded results are used as nodes. A distance of 2 is used here as the word association distance, i.e. an edge connects any two words whose distance is within 2. Each sentence is processed in a loop.
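A minimal sketch of Step 4 is given below. It assumes that "distance" means the difference between token positions within one sentence and that the network is undirected and unweighted; the function name `build_cooccurrence_graph` and the adjacency-set representation are choices made for the example.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Encode words as integer ids and link words at most `window` positions apart."""
    word_to_id = {}
    adjacency = defaultdict(set)
    for tokens in sentences:
        # Numeric encoding: each new word receives the next free integer id.
        ids = [word_to_id.setdefault(token, len(word_to_id)) for token in tokens]
        for position, u in enumerate(ids):
            adjacency[u]  # make sure isolated words still become nodes
            for v in ids[position + 1 : position + 1 + window]:
                if u != v:  # no self-loops for repeated words
                    adjacency[u].add(v)
                    adjacency[v].add(u)
    return word_to_id, dict(adjacency)
```

For the sentence `["a", "b", "c", "d"]` the first word is linked to the second and third but not to the fourth, which lies three positions away.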
Step 5: Calculate the degree centrality and closeness centrality of each node, normalise them, and compute their weighted sum to obtain the global metric of the node. The calculation is repeated for every node.
Degree centrality is generally regarded in the literature as an important local measure in complex networks, and closeness centrality as an important global measure. These two indices with different emphases are combined here to propose a keyword extraction algorithm based on complex network features.
If V={ v1,v2…vnIt is a set for having N number of node, (vi,vj) indicate node vi∈ V and vjBetween ∈ V Side, if G (V, E) be using V as node set, withFor the figure of line set, node viDegree center Property DCiAre as follows:
In formula, kiFor the degree of node, i.e., the number of coupled node, N is the number of nodes.In degree Disposition is the local feature of node.
Other than neighbor node, in order to obtain a node in a network with the relationship of remaining remote node, to mention The global property of node is taken out, it can be by calculating a node viInto network, the average value of the distance of all nodes, is denoted as di, Have:
In formula, dijIt is node viTo node vjDistance.In network, the distance average between all nodes is also then used Following formula is calculated:
diReflect node viRelative importance in a network: diIt is worth smaller, illustrates node viAt a distance from other nodes It is smaller.Herein by diInverse be defined as node viClose to centrality, with mark CCiTo indicate:
Aggregation of the degree centrality reflection node of node in subrange, it is special to embody the part of node in a network Property.Under normal conditions, as soon as the degree centrality of node is bigger, and neighbor node is more, then the node in subrange just With quite high importance.However only using the degree centrality of node as evaluation index, local model in network can only be extracted The bigger word of interior degree is enclosed, to have ignored larger to network entire effect but spend the not high word of centrality.Close to center Property be a network characterization of overall importance, if node has very high close to centrality, illustrate this point distance Any other point is all nearest, is spatially also embodied on center.The different index of above two emphasis is tied It closes, set forth herein a kind of new pitch point importance evaluation index overall situation metric, we are by node viGlobal metric note Number CMiIt indicates:
CMi=α DCi+βCCi (5)
Wherein, DCiFor nodes viDegree centrality, CCiFor node viClose to centrality.α, β are adjustable ginseng Number, and alpha+beta=1.(α=0.4, β=0.6 in the present invention).
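The global metric can be computed with a short breadth-first search, as in the sketch below. It assumes an undirected, unweighted and connected network stored as a dictionary of neighbour sets, takes DC_i = k_i / (N − 1) and CC_i = N / Σ_j d_ij, and uses the weights α = 0.4 and β = 0.6 given in the text. The function names and the handling of isolated nodes (closeness set to 0) are assumptions of the example.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in hops) from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distances:
                distances[v] = distances[u] + 1
                queue.append(v)
    return distances


def global_metric(adjacency, alpha=0.4, beta=0.6):
    """CM_i = alpha * DC_i + beta * CC_i for every node of the network."""
    n = len(adjacency)
    scores = {}
    for node, neighbours in adjacency.items():
        dc = len(neighbours) / (n - 1) if n > 1 else 0.0
        total_distance = sum(bfs_distances(adjacency, node).values())
        # Assumption: a node with no reachable neighbour gets closeness 0.
        cc = n / total_distance if total_distance > 0 else 0.0
        scores[node] = alpha * dc + beta * cc
    return scores


def top_keywords(scores, k):
    """The k nodes with the largest global metric."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

On the three-node path a-b-c the middle node has both the highest degree and the smallest total distance, so it ranks first. On a disconnected network the sum only covers reachable nodes, which inflates closeness; a real implementation would need an explicit rule for that case.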
The 4000 web news samples used here all come from the Sogou corpus. Python 3.6 is used as the experimental tool, and all text is encoded in UTF-8. Text classification requires samples and labels. The sample data could be the news headline plus the news content; for simplicity only the news content is used here. The class label of a news item can be taken from its URL. For example, the URL gongyi.sohu.com shows that the news item belongs to the category "public welfare". The ratio of training samples to test samples in this experiment is 4:1, i.e. 3200 documents are used as training samples and 800 documents as test documents.
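The 4:1 split can be reproduced with a few lines. The sketch below is only one way to do it: the text does not say whether the samples were shuffled or stratified by category, so the seeded shuffle and the function name `split_train_test` are assumptions.

```python
import random


def split_train_test(samples, train_parts=4, test_parts=1, seed=0):
    """Shuffle the samples reproducibly and split them train_parts : test_parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]
```

With 4000 documents this yields 3200 training documents and 800 test documents, matching the figures above.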
Table 2: Data distribution
Table 3: Experimental environment
Evaluation results:
Text classification tasks are usually evaluated with the precision P (Precision), the recall R (Recall) and the F1 score. The same indices are used as evaluation indices in this experiment.
The meaning of the parameters in the formulas is shown in the following table.
Table 4: Meaning of the parameters
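The three evaluation indices can be computed per category as sketched below, using the standard definitions P = TP / (TP + FP), R = TP / (TP + FN) and F1 = 2PR / (P + R). The symbols TP, FP and FN and the function name are assumptions of this example and may differ from the parameter names used in Table 4.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one category, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for truth, pred in pairs if pred == positive and truth == positive)
    fp = sum(1 for truth, pred in pairs if pred == positive and truth != positive)
    fn = sum(1 for truth, pred in pairs if pred != positive and truth == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Averaging the per-category values over all news categories would give a single figure for each classifier being compared.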
The present invention studies the network characteristics of words in Chinese news text, combines named entity recognition (Named Entities Recognition, NER) with the complex network (Complex Networks, CN) characteristics of natural language, and proposes a new keyword extraction algorithm, the NER-CN algorithm. The algorithm first labels each sentence and performs named entity recognition, then builds a complex network of the text and extracts keywords according to the global metric of the nodes in the network. Compared with conventional methods, the proposed keyword extraction algorithm significantly improves the precision, recall and F1 score of text classification.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the invention.

Claims (4)

1. A news keyword extraction method based on NER and complex network features, characterised by comprising the following steps:
Step 1: Collect news content and generate a file list from the news content under the raw corpus folder. Set a filtering threshold; content with fewer characters than the filtering threshold is filtered out. For each corpus entry, the url and the content are matched with regular expressions. All news content is stored in separate txt files according to the category in the url;
Step 2: Segment the content of each txt file with jieba word segmentation and remove stop words;
Step 3: Label the sentences with a neural-network-based named entity recognition method, then perform date recognition, person name recognition and place name recognition to extract the important named entities in the text;
Step 4: Build the word complex network. The words obtained in step 3 are numerically encoded and the encoded results are used as nodes. A distance of 2 is used as the word association distance, i.e. an edge connects any two words whose distance is within 2. Each sentence is processed in a loop;
V={ v1,v2…vnIt is a set for having N number of node, (vi,vj) indicate node vi∈ V and vjSide between ∈ V, G (V, E) Be using V as node set, withFor the figure of line set, node viDegree centrality DCiAre as follows:
Wherein, kiFor the degree of node, i.e., the number of coupled node, N is the number of nodes;
Calculate a node viInto network, the average value of the distance of all nodes, is denoted as di, that is, have:
Wherein, dijIt is node viTo node vjDistance, in network, distance average L between all nodes with following formula into Row calculates:
By diInverse be defined as node viClose to centrality, with mark CCiTo indicate:
It will be combined to obtain new pitch point importance evaluation index overall situation metric close to centrality and degree centrality, by node viGlobal metric mark CMiIt indicates:
CMi=α DCi+βCCi (5)
Wherein, DCiFor nodes viDegree centrality, CCiFor node viClose to centrality, α is that degree centrality is adjustable Parameter, β are to extract keyword according to obtained global metric close to centrality customized parameter, and alpha+beta=1.
2. The news keyword extraction method based on NER and complex network features according to claim 1, characterised in that the filtering threshold in step 1 is 30.
3. The news keyword extraction method based on NER and complex network features according to claim 2, characterised in that the neural-network-based named entity recognition method in step 3 performs sequence labelling with a BiLSTM-CRF model, using character embeddings and word embeddings; starting from the input layer, the layers of the model are, in order, the look-up layer, the bidirectional LSTM layer, the CRF layer and the output layer; the look-up layer maps each word x_i in the sentence from a one-hot vector to a low-dimensional dense word vector, the bidirectional LSTM layer automatically extracts sentence features, and the CRF layer performs sentence-level sequence labelling; the place name recognition corpus is built from the parts of a part-of-speech tagged corpus that are labelled "ns", and named entities are recognised with the BiLSTM-CRF model.
4. The news keyword extraction method based on NER and complex network features according to claim 3, characterised in that the adjustable parameter for degree centrality is α = 0.4 and the adjustable parameter for closeness centrality is β = 0.6.
CN201910790303.0A 2019-08-26 2019-08-26 News keyword extraction method based on NER and complex network characteristics Active CN110532390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910790303.0A CN110532390B (en) 2019-08-26 2019-08-26 News keyword extraction method based on NER and complex network characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910790303.0A CN110532390B (en) 2019-08-26 2019-08-26 News keyword extraction method based on NER and complex network characteristics

Publications (2)

Publication Number Publication Date
CN110532390A true CN110532390A (en) 2019-12-03
CN110532390B CN110532390B (en) 2022-07-29

Family

ID=68664200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910790303.0A Active CN110532390B (en) 2019-08-26 2019-08-26 News keyword extraction method based on NER and complex network characteristics

Country Status (1)

Country Link
CN (1) CN110532390B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178076A (en) * 2019-12-19 2020-05-19 成都欧珀通信科技有限公司 Named entity identification and linking method, device, equipment and readable storage medium
CN111339250A (en) * 2020-02-20 2020-06-26 北京百度网讯科技有限公司 Mining method of new category label, electronic equipment and computer readable medium
CN112241492A (en) * 2020-10-22 2021-01-19 西安石油大学 Early identification method for multi-source heterogeneous online network topics
CN112307364A (en) * 2020-11-25 2021-02-02 哈尔滨工业大学 Character representation-oriented news text place extraction method
CN112948527A (en) * 2021-02-23 2021-06-11 云南大学 Improved TextRank keyword extraction method and device
CN117408651A (en) * 2023-12-15 2024-01-16 辽宁省网联数字科技产业有限公司 On-line compiling method and system for bidding scheme based on artificial intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110103682A1 (en) * 2009-10-29 2011-05-05 Xerox Corporation Multi-modality classification for one-class classification in social networks
CN104933032A (en) * 2015-06-29 2015-09-23 电子科技大学 Method for extracting keywords of blog based on complex network
CN106202042A (en) * 2016-07-06 2016-12-07 中央民族大学 A kind of keyword abstraction method based on figure
CN106503101A (en) * 2016-10-14 2017-03-15 五邑大学 Electric business customer service automatically request-answering system sentence keyword extracting method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
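Tang Jun: "Application of complex networks in news web page keyword extraction", Journal of Yunnan Minzu University (Natural Science Edition) *
Zhang Li: "Research on automatic extraction of keywords and text summaries in text mining", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology *
Li Jingyue et al.: "An improved TFIDF method for web page keyword extraction", Computer Applications and Software *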

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178076A (en) * 2019-12-19 2020-05-19 成都欧珀通信科技有限公司 Named entity identification and linking method, device, equipment and readable storage medium
CN111178076B (en) * 2019-12-19 2023-08-08 成都欧珀通信科技有限公司 Named entity recognition and linking method, device, equipment and readable storage medium
CN111339250A (en) * 2020-02-20 2020-06-26 北京百度网讯科技有限公司 Mining method of new category label, electronic equipment and computer readable medium
CN111339250B (en) * 2020-02-20 2023-08-18 北京百度网讯科技有限公司 Mining method for new category labels, electronic equipment and computer readable medium
CN112241492A (en) * 2020-10-22 2021-01-19 西安石油大学 Early identification method for multi-source heterogeneous online network topics
CN112241492B (en) * 2020-10-22 2023-04-07 西安石油大学 Early identification method for multi-source heterogeneous online network topics
CN112307364A (en) * 2020-11-25 2021-02-02 哈尔滨工业大学 Character representation-oriented news text place extraction method
CN112307364B (en) * 2020-11-25 2021-10-29 哈尔滨工业大学 Character representation-oriented news text place extraction method
CN112948527A (en) * 2021-02-23 2021-06-11 云南大学 Improved TextRank keyword extraction method and device
CN117408651A (en) * 2023-12-15 2024-01-16 辽宁省网联数字科技产业有限公司 On-line compiling method and system for bidding scheme based on artificial intelligence

Also Published As

Publication number Publication date
CN110532390B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110532390A (en) A kind of news keyword extracting method based on NER and Complex Networks Feature
CN106919673B (en) Text mood analysis system based on deep learning
Zhang et al. Enhancing cross-target stance detection with transferable semantic-emotion knowledge
Schmitz Inducing ontology from flickr tags
CN107315738B (en) A kind of innovation degree appraisal procedure of text information
CN105005594B (en) Abnormal microblog users recognition methods
Li et al. News text classification model based on topic model
CN105243129A (en) Commodity property characteristic word clustering method
CN107180026B (en) Event phrase learning method and device based on word embedding semantic mapping
CN106997344A (en) Keyword abstraction system
CN101739430A (en) Method for training and classifying text emotion classifiers based on keyword
CN101770580A (en) Training method and classification method of cross-field text sentiment classifier
CN112464669B (en) Stock entity word disambiguation method, computer device, and storage medium
CN110134792A (en) Text recognition method, device, electronic equipment and storage medium
CN109614626A (en) Keyword Automatic method based on gravitational model
CN105869058B (en) A kind of method that multilayer latent variable model user portrait extracts
Wong et al. Hot item mining and summarization from multiple auction web sites
Kocayusufoglu et al. Riser: Learning better representations for richly structured emails
CN107357895A (en) A kind of processing method of the text representation based on bag of words
Chen et al. Sentiment classification of tourism based on rules and LDA topic model
CN114997288A (en) Design resource association method
Biemann Unsupervised part-of-speech tagging in the large
CN113934835A (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
Laboreiro et al. Determining language variant in microblog messages
CN110196910A (en) A kind of method and device of corpus classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant