CN103324700A - Noumenon concept attribute learning method based on Web information - Google Patents


Info

Publication number
CN103324700A
Authority
CN
China
Prior art keywords
word
concept attribute
document
attribute
candidate
Legal status
Granted
Application number
CN2013102292298A
Other languages
Chinese (zh)
Other versions
CN103324700B (en)
Inventor
王俊丽 (Wang Junli)
王志成 (Wang Zhicheng)
赵卫东 (Zhao Weidong)
梁梅连 (Liang Meilian)
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
2013-06-08
Filing date
2013-06-08
Publication date
2013-09-25
Application filed by Tongji University
Priority to CN201310229229.8A
Publication of CN103324700A
Application granted
Publication of CN103324700B
Status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the field of ontology learning, and in particular to an ontology concept attribute learning method based on Web information. In this technical solution the Web serves as the corpus: linguistic patterns are built and used as queries to the Google search engine; web page snippets and the corresponding source URLs are extracted to build a candidate concept attribute word library; a text set built from the source URLs of the candidate words serves as the input of LDA; the training parameters of the LDA model are obtained by Gibbs sampling; the attribute candidate library is pruned and merged according to the results of running the LDA model; and the final concept attribute word set is determined. The method can obtain the concept attribute set of an ontology accurately and effectively, making automatic or semi-automatic ontology construction possible.

Description

An ontology concept attribute learning method based on Web information
Technical field
The present invention relates to ontology learning technology and the Internet technical field, and in particular to an ontology concept attribute learning method based on Web information.
Background art
The Semantic Web has long been a hot topic in computer science research. Its research focuses mainly on how to represent information on the Web in a form that machines can understand and process, that is, how to give it semantics. An ontology, as a modeling tool that can describe conceptual models at the semantic and knowledge levels, is the core and key of semantic description in the Semantic Web. At present, ontologies are widely used as an important source of domain knowledge support in intelligent information processing tasks such as knowledge engineering, information retrieval and question answering systems.
Ontology learning is the automatic or semi-automatic acquisition of the desired ontology knowledge from existing data resources by means of techniques such as machine learning, statistical methods and natural language processing. Because fully automatic knowledge acquisition is still impractical, ontology learning is usually a semi-automatic process carried out under user guidance.
When building ontology conceptual knowledge, describing a conceptual model requires not only the concept noun but also a description of the objective attributes of the entities that the concept reflects; these attributes are called concept attributes. Ontology attributes are an important component in the construction and application of domain ontology knowledge bases, and they are a focus of basic research on the automatic or semi-automatic construction of such knowledge bases. Related research at home and abroad currently concentrates on extracting ontology concept instances and attributes, or on extracting concept attribute-value pairs, and has made some progress. Research methods for ontology concept attribute extraction fall mainly into three categories:
Rule-based methods: a set of pattern rules based on words, parts of speech and semantics is first constructed and stored. During attribute extraction, linguistic knowledge is used to match the sentence fragment to be processed against the patterns stored in the rule set; if the match succeeds, the sentence is considered to express the relation of the corresponding pattern. Rule-based methods require domain experts to take part in deriving the pattern rules, which is costly and lacks portability across domains;
Statistical machine learning methods: statistics-based machine learning is currently widely used in concept attribute extraction. A machine learning algorithm is first trained on a manually annotated corpus to build a classifier model; the classifier is then used to make predictions on unannotated text, thereby recognizing predefined categories. This approach is now widely used and has achieved considerable results.
Methods based on semi-structured/structured documents: extracting concept attributes by analyzing the structure of semi-structured or structured data documents is another major approach to concept attribute extraction today. Its drawback is that it suits only documents with a relatively fixed and complete format and lacks generalization ability.
Summary of the invention
The purpose of the present invention is to provide an ontology concept attribute learning method based on Web information that combines techniques based on linguistic patterns with techniques based on probability and statistics, and applies the LDA model to the concept attribute selection stage of the ontology, so as to generate ontology concept attributes more accurately and effectively.
To achieve the above purpose, the present invention proposes a hybrid method for ontology concept attribute learning that combines rules with machine learning and is independent of document structure. Lexico-syntactic patterns are used to build a pattern set; candidate concept attribute words are extracted using the Web as the corpus; a text set built from the extraction results serves as the input of the LDA model; Gibbs sampling is used to obtain the training parameters of the LDA model; and after the LDA model has been run, the ontology's candidate concept attribute dictionary is pruned and merged according to the extraction results to obtain the final concept attribute set.
The present invention provides the following technical solution:
An ontology concept attribute learning method based on Web information, characterized by comprising the following steps:
(1) Construction of the lexico-syntactic pattern set. Starting from an existing basic set of linguistic patterns, lexico-semantic patterns are used to build and merge an augmented pattern set of verb forms that express the inclusion relation; the final pattern set expressing concept attributes is established as part of the input of the candidate concept attribute extraction algorithm.
(2) Construction of the candidate concept attribute library. The Google search engine serves as the Web data source (corpus). A linguistic pattern set is first built and used as the query input to Google, and the corresponding set of web page snippets and set of source URLs are extracted. Candidate attribute words are then obtained from the returned snippets by word frequency statistics (the higher a word's frequency, the more likely it is an attribute word), and the candidate concept attribute word set is obtained after simple screening.
(3) Construction of the text set. For each attribute word in the candidate dictionary, its corresponding source URLs are kept and those web pages are fetched. The fetched web document set is preprocessed with Apache's open-source OpenNLP toolkit, mainly for part-of-speech tagging.
(4) LDA pruning and merging of the concept attribute set. The LDA model is run on the input text set, combined with the results of Gibbs sampling parameter estimation. The candidate concept attribute dictionary is pruned and merged according to the extraction results of several LDA iterations to obtain the final concept attribute set.
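The text does not give a concrete pruning and merging rule for step (4). The sketch below shows one plausible reading, purely as an assumption: LDA is run several times, a candidate word is kept only if it appears among the `top_k` words of some topic in at least `min_runs` runs, and simple singular/plural variants are merged under one key. The function names, the thresholds and the plural-stripping heuristic are all hypothetical.

```python
from collections import Counter


def normalize(word):
    """Hypothetical merge key: lowercase and strip a simple plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w


def prune_and_merge(candidates, runs, top_k=10, min_runs=2):
    """Keep candidate words that LDA ranks highly in enough runs.

    candidates: candidate attribute words from step (2).
    runs: one entry per LDA run; each run is a list of topics, and each
          topic is a list of words sorted by descending probability.
    """
    votes = Counter()
    for topics in runs:
        # One vote per run for every word in the top_k of any topic.
        votes.update({normalize(w) for topic in topics for w in topic[:top_k]})
    kept = {}
    for word in candidates:
        key = normalize(word)
        if votes[key] >= min_runs:
            kept.setdefault(key, word)  # merge variants, keep first spelling
    return sorted(kept.values())
```

For example, with three runs in which "engine" (or "engines") reaches a topic's top two words every time while "color" does so only once, `prune_and_merge(["engine", "engines", "wheel", "price", "color"], runs, top_k=2, min_runs=2)` keeps "engine" once and drops "color". Voting across runs is a design choice meant to damp the randomness of individual Gibbs chains.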
In the above ontology concept attribute learning method, step (2) specifically comprises:
1) For each pattern $p_i$ in the pattern set $P$, issue the query $p_i$ to Google;
2) For each page $n$ of the $N$ result pages returned for query $p_i$, if the query result is enclosed in <em></em> tags, extract the corresponding web page snippet $S_i$ and the corresponding source URL $U_i$; repeat until every query in the pattern set $P$ has been processed;
3) For each snippet $S_i$ in the snippet set, compute the word frequencies $C_{W_i}$ and remove the non-nouns $W_n$.
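As a minimal sketch of sub-steps 2) and 3), assuming the search results have already been retrieved as (snippet HTML, URL) pairs (the search call itself is out of scope, and no real Google API is shown), the code below keeps only snippets that contain <em> highlighting, records their source URLs, and counts word frequencies. The function names are hypothetical, and the `is_noun` predicate is a stand-in for a real part-of-speech tagger.

```python
import re
from collections import Counter

EM_TAG = re.compile(r"<em>(.*?)</em>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z]+")


def collect_snippets(results):
    """results: (snippet_html, url) pairs from one or more result pages.

    Keeps only results whose snippet contains <em>...</em> highlighting and
    returns the plain-text snippets S_i with their source URLs U_i.
    """
    snippets, urls = [], []
    for html, url in results:
        if EM_TAG.search(html):
            snippets.append(ANY_TAG.sub("", html))  # strip the markup
            urls.append(url)
    return snippets, urls


def word_frequencies(snippets, is_noun):
    """Count word frequencies over all snippets, dropping non-nouns."""
    counts = Counter()
    for text in snippets:
        counts.update(w for w in WORD.findall(text.lower()) if is_noun(w))
    return counts
```

Passing a set's `__contains__` as `is_noun` (for example `{"engine", "car", "price"}.__contains__`) is enough to try the pipeline on toy data; the most frequent surviving words are the candidate attribute words.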
In the above ontology concept attribute learning method, step (3) specifically comprises:
1) For each $U_i$ in the URL set, fetch the corresponding web page content and save it as document $d_i$;
2) Preprocess each document $d_i$ in the document set $D$ with OpenNLP;
3) If the part of speech of word $w_i$ is NN/NNS/NNP/NNPS, extract $w_i$; continue until the document set $D$ has been processed.
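OpenNLP itself is a Java library, so the sketch below assumes tagging has already been done and shows only the noun filter of sub-step 3) over (token, tag) pairs. NN, NNS, NNP and NNPS are the Penn Treebank noun tags named in the text; lowercasing the kept tokens is an added assumption, and `noun_documents` is a hypothetical name.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags


def noun_documents(tagged_docs):
    """tagged_docs: {doc_id: [(token, tag), ...]} as produced by a POS tagger.

    Returns {doc_id: [noun tokens]}, the noun-only text set fed to LDA.
    Tokens are lowercased so that "Engine" and "engine" share one type.
    """
    return {
        doc_id: [tok.lower() for tok, tag in tokens if tag in NOUN_TAGS]
        for doc_id, tokens in tagged_docs.items()
    }
```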
In the above ontology concept attribute learning method, step (4) specifically comprises:
1) At the topic level, for each topic $z$ in the topic set $T$, draw the mixture parameter $\varphi_z \sim \mathrm{Dir}(\beta)$;
2) At the document level, for each document $d$ in the document set $D$, draw the mixture parameter $\theta_d \sim \mathrm{Dir}(\alpha)$, and draw a value from a Poisson distribution as the document length, i.e. the length of each document $N_d \sim \mathrm{Poiss}(\xi)$;
3) At the word level, under the conditions of 2), for each word $n$ in the word set $N_d$ of document $d$, draw a topic $z_{d,n} \sim \mathrm{Mult}(\theta_d)$ and draw a term $w_{d,n} \sim \mathrm{Mult}(\varphi_{z_{d,n}})$;
4) Repeat steps 1), 2) and 3), which together constitute the random generative process, until all $D$ documents have been traversed.
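The generative process in sub-steps 1) to 4) can be simulated with the Python standard library alone. This is a minimal sketch, assuming symmetric scalar priors `alpha` and `beta`; it draws each topic's word distribution phi once and reuses it for every document, which is the usual reading of LDA. All function names are hypothetical.

```python
import math
import random


def dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_corpus(num_topics, vocab, num_docs, alpha, beta, xi, seed=0):
    rng = random.Random(seed)
    # 1) topic level: phi_z ~ Dir(beta) for every topic z
    phi = [dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # 2) document level: N_d ~ Poiss(xi), theta_d ~ Dir(alpha)
        n_d = poisson(xi, rng)
        theta = dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(n_d):
            # 3) word level: z_{d,n} ~ Mult(theta_d), w_{d,n} ~ Mult(phi_z)
            z = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=phi[z])[0])
        docs.append(words)
    return phi, docs
```

Sampling synthetic documents like this is mainly useful for checking that an inference routine recovers topics it was given; the method itself works in the opposite direction, estimating phi and theta from observed words.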
The technical solution of the present invention uses the Web as a corpus to address the data sparsity problem that often arises in pattern learning, and uses the LDA model to prune and merge the candidate concept attribute dictionary, which can significantly improve the accuracy of the extraction results. This makes semi-automatic ontology construction possible and lays a foundation for automated ontology construction.
Brief description of the drawings
Fig. 1 is the model architecture diagram of the ontology concept attribute learning of the present invention;
Fig. 2 is the overall framework diagram of the model architecture in Fig. 1;
Fig. 3 is the structure diagram of the LDA model in Fig. 2;
Fig. 4 shows the attribute extraction results obtained in the car domain with the model architecture of Fig. 1.
Embodiment
As shown in the model architecture diagram of Fig. 1, the ontology concept attribute learning method according to a specific embodiment of the present invention comprises the following steps:
1) Lexico-syntactic pattern set construction module
Module function: the linguistic pattern set is a necessary input for the Google queries, so the pattern set must be built first.
Based on existing natural language processing techniques, a set of pattern rules based on words, parts of speech and semantics (i.e., linguistic patterns) is constructed, and relations of interest in the text are identified by pattern matching. This embodiment studies one kind of linguistic pattern, the lexico-syntactic pattern (lexical-syntactic patterns). Starting from an existing basic set of linguistic patterns, lexico-semantic patterns are used to build and merge an augmented pattern set of verb forms that express the inclusion relation, and the final pattern set expressing concept attributes is established as part of the input of the candidate concept attribute word extraction algorithm.
Pattern matching can be understood intuitively from the following example: if the target string is cdabfdbab and the pattern string is ab, then pattern matching finds the substrings identical to the pattern string starting at positions 3 and 8 of the target string. In this embodiment, car is selected as the concept topic, and its concept attribute detection patterns are shown in Table 1.
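The substring example above can be reproduced with `str.find`. The helper below (a hypothetical name) reports 1-based start positions, matching the positions 3 and 8 quoted in the text, and also reports overlapping matches.

```python
def match_positions(target: str, pattern: str) -> list[int]:
    """Return the 1-based start position of every occurrence of pattern."""
    positions = []
    start = target.find(pattern)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = target.find(pattern, start + 1)  # allow overlapping matches
    return positions


print(match_positions("cdabfdbab", "ab"))  # [3, 8]
```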
Table 1 Concept attribute detection patterns (table image not reproduced)
Here, NP in the general patterns can be any concept noun (car in this embodiment), and the words in bold in the examples are attribute candidate words for car.
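Table 1's actual patterns are not reproduced here, so the sketch below uses three hypothetical attribute patterns ("the X of the NP", "NP's X", "NP has a X") purely for illustration. It shows how NP is filled with a concept noun and how the slot X yields attribute candidate words; the pattern list and the function names are assumptions, not the patent's Table 1.

```python
import re

# Hypothetical attribute patterns; "{np}" is replaced by the concept noun and
# the named group "attr" captures the attribute candidate word.
PATTERNS = [
    r"\bthe (?P<attr>[a-z]+) of (?:the|a|an) {np}\b",
    r"\b{np}'s (?P<attr>[a-z]+)\b",
    r"\b{np} (?:has|have) (?:a|an) (?P<attr>[a-z]+)\b",
]


def instantiate(np):
    """Fill the concept noun into every pattern and compile it."""
    return [re.compile(p.format(np=re.escape(np)), re.IGNORECASE) for p in PATTERNS]


def candidate_attributes(text, np):
    """Return attribute candidates found by each pattern, in pattern order."""
    found = []
    for regex in instantiate(np):
        found.extend(m.group("attr").lower() for m in regex.finditer(text))
    return found


text = "The price of the car is high. The car's engine is quiet. A car has a trunk."
print(candidate_attributes(text, "car"))  # ['price', 'engine', 'trunk']
```

In the method itself the instantiated patterns serve as search queries rather than being matched against local text; matching them locally, as here, only illustrates what the slot X is meant to capture.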
2) Candidate concept attribute dictionary construction module
Module function: Web-based candidate concept attribute extraction, which builds the candidate concept attribute dictionary.
The Google search engine serves as the Web data source (corpus); the linguistic pattern set is used as the query input to Google, and the corresponding set of web page snippets and set of source URLs are extracted. Candidate attribute words are then obtained from the returned snippets by word frequency statistics (the higher a word's frequency, the more likely it is an attribute word), and the candidate concept attribute word set is obtained after simple screening.
Table 2 shows, from the output of the candidate concept attribute extraction algorithm in this embodiment, some of the extracted web page snippets, the candidate attribute words and their word frequencies.
Table 2 Partial web page extraction results (table image not reproduced)
In this embodiment, after candidate attribute words have been extracted from the Web with the linguistic patterns, words with a non-noun part of speech are removed, because concept attribute words are all nouns; this yields the candidate attribute dictionary.
3) Text set construction module
Module function: the candidate attribute dictionary cannot yet be taken as the final attribute word set; the LDA model is further needed to extract the words related to concept attributes. The text set is an important input of the LDA model.
The Web-based candidate concept attribute extraction described above yields not only the candidate attribute dictionary but also the set of source URLs. For each attribute word in the candidate dictionary, its corresponding source URLs are kept and those web pages are fetched.
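Keeping the source URLs for each candidate word amounts to a small inverted index. A minimal sketch, assuming the snippets and URLs are aligned lists as produced in step (2), and assuming a URL is kept for a word when the word occurs in that URL's snippet (the text does not state the exact matching rule); the function names are hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z]+")


def urls_by_candidate(snippets, urls, candidates):
    """Map each candidate attribute word to the source URLs whose snippet mentions it.

    snippets[i] was taken from urls[i]; candidates is the screened dictionary.
    """
    wanted = {c.lower() for c in candidates}
    index = defaultdict(set)
    for text, url in zip(snippets, urls):
        for word in set(WORD.findall(text.lower())) & wanted:
            index[word].add(url)
    return {word: sorted(found) for word, found in index.items()}


def urls_to_fetch(index):
    """Deduplicated URLs to download as the documents d_i of the text set."""
    return sorted({url for found in index.values() for url in found})
```

Deduplicating before fetching means a page that supports several candidate words is downloaded once.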
The fetched web document set undergoes basic preprocessing, such as part-of-speech tagging, with Apache's open-source OpenNLP toolkit. The text set composed of the nouns serves as part of the LDA model input. Combined with the results of Gibbs sampling parameter estimation, the LDA model can then be used to extract attribute words.
4) LDA module for pruning and merging the candidate attribute library
Module function: the candidate concept attribute dictionary is pruned and merged using the extraction results of the LDA model, which improves the accuracy of the attribute learning results. The algorithm can be expressed in pseudocode as follows:
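The text does not spell out its Gibbs sampler. The sketch below is the standard collapsed Gibbs sampler for LDA, assuming symmetric priors `alpha` and `beta`, a fixed number of topics, and the noun-only text set as input; it estimates phi and theta from the final counts. The default hyperparameter values, the iteration count and the function names are illustrative assumptions.

```python
import random


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric Dirichlet priors.

    docs: list of token lists (for example the noun-only text set).
    Returns (vocab, phi, theta), where phi[k][v] estimates p(word v | topic k)
    and theta[d][k] estimates p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # token totals per topic
    z = []                              # current topic of every token

    def assign(d, v, k, delta):
        n_dk[d][k] += delta
        n_kw[k][v] += delta
        n_k[k] += delta

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            assign(d, index[w], k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                assign(d, v, z[d][i], -1)  # remove the token's current topic
                # Full conditional p(z_i = k | z_-i, w), up to a constant.
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][v] + beta) / (n_k[k] + V * beta)
                    for k in range(K)
                ]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                assign(d, v, z[d][i], +1)

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_words(vocab, phi, n=5):
    """The n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]
```

The per-topic lists from `top_words` are the kind of LDA output a pruning step can compare against the candidate dictionary.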
i. At the topic level, for each topic $z$ in the topic set $T$, draw a multinomial distribution from a Dirichlet prior with parameter $\beta$, i.e. draw the mixture parameter $\varphi_z \sim \mathrm{Dir}(\beta)$;
ii. At the document level, for each document $d$ in the document set $D$, draw a value from a Poisson distribution as the document length, i.e. the length of each document $N_d \sim \mathrm{Poiss}(\xi)$; then draw, from a Dirichlet prior with parameter $\alpha$, a multinomial distribution giving the probability of each topic in document $d$, i.e. draw the mixture parameter $\theta_d \sim \mathrm{Dir}(\alpha)$;
iii. At the word level, under the conditions of ii, for the $n$-th word of document $d$, first draw a topic $z_{d,n} \sim \mathrm{Mult}(\theta_d)$ from the document's topic distribution, and then draw a word from the multinomial word distribution of that topic, i.e. for each word $n$ in the word set $N_d$ of $d$, draw the term $w_{d,n} \sim \mathrm{Mult}(\varphi_{z_{d,n}})$;
iv. Repeat steps i, ii and iii, which together constitute the random generative process, until all $D$ documents have been traversed.
In the above algorithm, $w$ is the observed data; $\varphi$, $\theta$ and $z$ are latent variables to be estimated; and $\alpha$ and $\beta$ are constant hyperparameters of the model, namely the Dirichlet priors on $\theta$ and $\varphi$, respectively. The specific variables are described in Table 3.
Table 3 Meanings of the LDA model parameters (table image not reproduced)
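For reference, a common way to carry out the Gibbs sampling parameter estimation mentioned above is the standard collapsed Gibbs update for LDA, shown here under the assumption of symmetric priors (the text does not state its exact update). Here $n_{d,k}$ counts the tokens of document $d$ assigned to topic $k$, $n_{k,w}$ counts the assignments of word $w$ to topic $k$, $n_k = \sum_w n_{k,w}$, $V$ is the vocabulary size, $K = |T|$ is the number of topics, and the superscript $\neg i$ means the counts exclude token $i$.

$$
p(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w}) \propto \left(n_{d,k}^{\neg i} + \alpha\right) \frac{n_{k,w_i}^{\neg i} + \beta}{n_k^{\neg i} + V\beta}
$$

$$
\hat{\varphi}_{k,w} = \frac{n_{k,w} + \beta}{n_k + V\beta}, \qquad \hat{\theta}_{d,k} = \frac{n_{d,k} + \alpha}{N_d + K\alpha}
$$

Both estimates are normalized by construction: summing $\hat{\varphi}_{k,w}$ over the $V$ words gives $(n_k + V\beta)/(n_k + V\beta) = 1$, and likewise for $\hat{\theta}_{d,k}$ over the $K$ topics.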
Finally, the LDA model is run with car as the example concept, and the extraction results are shown in Table 4. Using the proposed ontology concept attribute learning method based on Web information, the concept attribute word set of this domain was extracted in this embodiment.

Claims (4)

1. An ontology concept attribute learning method based on Web information, characterized by comprising the following steps:
(1) Construction of the lexico-syntactic pattern set:
Starting from an existing basic set of linguistic patterns, lexico-semantic patterns are used to build and merge an augmented pattern set of verb forms that express the inclusion relation, and the final pattern set expressing concept attributes is established as part of the input of the candidate concept attribute extraction algorithm;
(2) Construction of the candidate concept attribute library:
The Google search engine serves as the Web data source; a linguistic pattern set is first built and used as the query input to Google, and the corresponding set of web page snippets and set of source URLs are extracted; candidate attribute words are then obtained from the returned snippets by word frequency statistics, and the candidate concept attribute word set is obtained after screening;
(3) Construction of the text set:
For each attribute word in the candidate dictionary, its corresponding source URLs are kept and those web pages are fetched; the fetched web document set is preprocessed with Apache's open-source OpenNLP toolkit, which performs part-of-speech tagging;
(4) LDA pruning and merging of the concept attribute set:
The LDA model is run on the input text set, combined with the results of Gibbs sampling parameter estimation; the candidate concept attribute dictionary is pruned and merged according to the extraction results of several LDA iterations to obtain the final concept attribute set.
2. The ontology concept attribute learning method based on Web information according to claim 1, characterized in that step (2) specifically comprises:
1) For each pattern $p_i$ in the pattern set $P$, issue the query $p_i$ to Google;
2) For each page $n$ of the $N$ result pages returned for query $p_i$, if the query result is enclosed in <em></em> tags, extract the corresponding web page snippet $S_i$ and the corresponding source URL $U_i$; repeat until every query in the pattern set $P$ has been processed;
3) For each snippet $S_i$ in the snippet set, compute the word frequencies $C_{W_i}$ and remove the non-nouns $W_n$.
3. The ontology concept attribute learning method based on Web information according to claim 1, characterized in that step (3) specifically comprises:
1) For each $U_i$ in the URL set, fetch the corresponding web page content and save it as document $d_i$;
2) Preprocess each document $d_i$ in the document set $D$ with OpenNLP;
3) If the part of speech of word $w_i$ is NN/NNS/NNP/NNPS, extract $w_i$; continue until the document set $D$ has been processed.
4. The ontology concept attribute learning method based on Web information according to claim 1, characterized in that step (4) specifically comprises:
1) At the topic level, for each topic $z$ in the topic set $T$, draw the mixture parameter $\varphi_z \sim \mathrm{Dir}(\beta)$;
2) At the document level, for each document $d$ in the document set $D$, draw the mixture parameter $\theta_d \sim \mathrm{Dir}(\alpha)$, and draw a value from a Poisson distribution as the document length, i.e. the length of each document $N_d \sim \mathrm{Poiss}(\xi)$;
3) At the word level, under the conditions of 2), for each word $n$ in the word set $N_d$ of document $d$, draw a topic $z_{d,n} \sim \mathrm{Mult}(\theta_d)$ and draw a term $w_{d,n} \sim \mathrm{Mult}(\varphi_{z_{d,n}})$;
4) Repeat steps 1), 2) and 3), which together constitute the random generative process, until all $D$ documents have been traversed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310229229.8A CN103324700B (en) 2013-06-08 2013-06-08 Noumenon concept attribute learning method based on Web information

Publications (2)

Publication Number Publication Date
CN103324700A true CN103324700A (en) 2013-09-25
CN103324700B CN103324700B (en) 2017-02-01

Family

ID=49193443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310229229.8A Active CN103324700B (en) 2013-06-08 2013-06-08 Noumenon concept attribute learning method based on Web information

Country Status (1)

Country Link
CN (1) CN103324700B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810156A (en) * 2014-01-17 2014-05-21 浙江大学 Method for extracting text information through secondary semantic annotation
CN103942274A (en) * 2014-03-27 2014-07-23 东莞中山大学研究院 Labeling system and method for biological medical treatment image on basis of LDA
CN103984681A (en) * 2014-03-31 2014-08-13 同济大学 News event evolution analysis method based on time sequence distribution information and topic model
CN104021222A (en) * 2014-06-26 2014-09-03 深圳信息职业技术学院 Labeling algorithm for biomedical image based on invisible dirichlet model
CN104636466A (en) * 2015-02-11 2015-05-20 中国科学院计算技术研究所 Entity attribute extraction method and system oriented to open web page
WO2017024553A1 (en) * 2015-08-12 2017-02-16 浙江核新同花顺网络信息股份有限公司 Information emotion analysis method and system
CN106919997A (en) * 2015-12-28 2017-07-04 航天信息股份有限公司 A kind of customer consumption Forecasting Methodology of the ecommerce based on LDA
CN107133283A (en) * 2017-04-17 2017-09-05 北京科技大学 A kind of Legal ontology knowledge base method for auto constructing
CN108090231A (en) * 2018-01-12 2018-05-29 北京理工大学 A kind of topic model optimization method based on comentropy
CN111149117A (en) * 2017-09-28 2020-05-12 甲骨文国际公司 Gradient-based automatic adjustment of machine learning and deep learning models
CN111460079A (en) * 2020-03-06 2020-07-28 华南理工大学 Topic generation method based on concept information and word weight
CN112395889A (en) * 2019-08-01 2021-02-23 林超伦 Machine-synchronized translation
CN113312910A (en) * 2021-05-25 2021-08-27 华南理工大学 Ontology learning method, system, device and medium based on topic model

Citations (2)

Publication number Priority date Publication date Assignee Title
US20100306046A1 (en) * 2009-06-01 2010-12-02 Click Group, Inc. Fast building and monitoring system and method for search engine marketing
CN102542027A (en) * 2011-12-22 2012-07-04 北京航空航天大学深圳研究院 Construction method of data integration system for studying ontology based on relation schema

Non-Patent Citations (2)

Title
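YU JING et al.: "An Ontology Term Extracting Method Based on Latent Dirichlet Allocation", 2012 Fourth International Conference on Multimedia Information Networking and Security *
FU KUI (傅魁): "Research on Web-Based Ontology Learning" (基于Web的本体学习研究), China Doctoral Dissertations Full-text Database, Information Science and Technology *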

Cited By (21)

Publication number Priority date Publication date Assignee Title
CN103810156B (en) * 2014-01-17 2017-01-18 浙江大学 Method for extracting text information through secondary semantic annotation
CN103810156A (en) * 2014-01-17 2014-05-21 浙江大学 Method for extracting text information through secondary semantic annotation
CN103942274A (en) * 2014-03-27 2014-07-23 东莞中山大学研究院 Labeling system and method for biological medical treatment image on basis of LDA
CN103942274B (en) * 2014-03-27 2017-11-14 东莞中山大学研究院 A kind of labeling system and method for the biologic medical image based on LDA
CN103984681A (en) * 2014-03-31 2014-08-13 同济大学 News event evolution analysis method based on time sequence distribution information and topic model
CN104021222A (en) * 2014-06-26 2014-09-03 深圳信息职业技术学院 Labeling algorithm for biomedical image based on invisible dirichlet model
CN104636466A (en) * 2015-02-11 2015-05-20 中国科学院计算技术研究所 Entity attribute extraction method and system oriented to open web page
WO2017024553A1 (en) * 2015-08-12 2017-02-16 浙江核新同花顺网络信息股份有限公司 Information emotion analysis method and system
US11481422B2 (en) 2015-08-12 2022-10-25 Hithink Royalflush Information Network Co., Ltd Method and system for sentiment analysis of information
US11868386B2 (en) 2015-08-12 2024-01-09 Hithink Royalflush Information Network Co., Ltd. Method and system for sentiment analysis of information
US10437871B2 (en) 2015-08-12 2019-10-08 Hithink Royalflush Information Network Co., Ltd. Method and system for sentiment analysis of information
US10831808B2 (en) 2015-08-12 2020-11-10 Hithink Royalflush Information Network Co., Ltd. Method and system for sentiment analysis of information
CN106919997A (en) * 2015-12-28 2017-07-04 航天信息股份有限公司 A kind of customer consumption Forecasting Methodology of the ecommerce based on LDA
CN107133283A (en) * 2017-04-17 2017-09-05 北京科技大学 A kind of Legal ontology knowledge base method for auto constructing
CN111149117A (en) * 2017-09-28 2020-05-12 甲骨文国际公司 Gradient-based automatic adjustment of machine learning and deep learning models
CN111149117B (en) * 2017-09-28 2023-09-19 甲骨文国际公司 Gradient-based automatic tuning of machine learning and deep learning models
CN108090231A (en) * 2018-01-12 2018-05-29 北京理工大学 A kind of topic model optimization method based on comentropy
CN112395889A (en) * 2019-08-01 2021-02-23 林超伦 Machine-synchronized translation
CN111460079B (en) * 2020-03-06 2023-03-28 华南理工大学 Topic generation method based on concept information and word weight
CN111460079A (en) * 2020-03-06 2020-07-28 华南理工大学 Topic generation method based on concept information and word weight
CN113312910A (en) * 2021-05-25 2021-08-27 华南理工大学 Ontology learning method, system, device and medium based on topic model

Also Published As

Publication number Publication date
CN103324700B (en) 2017-02-01

Similar Documents

Publication Publication Date Title
CN103324700B (en) Noumenon concept attribute learning method based on Web information
CN105528437B (en) A kind of question answering system construction method extracted based on structured text knowledge
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN103440287B (en) A kind of Web question and answer searching system based on product information structure
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
CN102200975B (en) Vertical search engine system using semantic analysis
CN110298033A (en) Keyword corpus labeling trains extracting tool
CN101710343A (en) Body automatic build system and method based on text mining
CN105022725A (en) Text emotional tendency analysis method applied to field of financial Web
CN106126620A (en) Method of Chinese Text Automatic Abstraction based on machine learning
CN101901247A (en) Vertical engine searching method and system for domain body restraint
CN105528422A (en) Focused crawler processing method and apparatus
CN105512347A (en) Information processing method based on geographic topic model
Hu et al. A survey of state-of-the-art short text matching algorithms
CN114997288A (en) Design resource association method
Hassan et al. Automatic document topic identification using wikipedia hierarchical ontology
CN105677684A (en) Method for making semantic annotations on content generated by users based on external data sources
CN102982063A (en) Control method based on tuple elaboration of relation keywords extension
CN115730078A (en) Event knowledge graph construction method and device for class case retrieval and electronic equipment
Yang et al. A topic-specific web crawler with web page hierarchy based on HTML Dom-Tree
Sabty et al. Techniques for named entity recognition on arabic-english code-mixed data
Mahajani et al. Ranking-based sentence retrieval for text summarization
Jiang et al. Bidirectional LSTM-CRF models for keyword extraction in Chinese sport news
CN112115269A (en) Webpage automatic classification method based on crawler
Kardana et al. A novel approach for keyword extraction in learning objects using text mining and WordNet

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant