CN103853823B - Online encyclopedia oriented entity attribute extraction method and system - Google Patents


Info

Publication number
CN103853823B
Authority
CN
China
Prior art keywords
entity attribute
attribute
rule
page
entity
Legal status
Active
Application number
CN201410065743.7A
Other languages
Chinese (zh)
Other versions
CN103853823A
Inventor
程学旗
贾岩涛
张泽慧
王元卓
冯凯
熊锦华
许洪波
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
2014-02-26
Filing date
2014-02-26
Publication date
2017-01-18
Application filed by Institute of Computing Technology of CAS
Priority to CN201410065743.7A
Publication of CN103853823A
Application granted
Publication of CN103853823B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The invention provides an entity attribute extraction method and system for online encyclopedias. The method comprises: selecting a page from the set T of online encyclopedia web page texts to be extracted, and extracting the entity attribute expression rules of that page to obtain a current rule set; using the current rule set to extract entity attributes from T, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the newly extracted rule set as the current rule set, and repeating this process k times to obtain a final rule set; and performing entity attribute extraction on T using the final rule set. The entity attribute extraction method provided by the invention adapts to changes in text structure, is applicable to various online encyclopedias, and achieves high recall and high accuracy.

Description

Entity attribute extraction method and system for online encyclopedias
Technical field
The present invention relates to the field of information technology, and more particularly to an entity attribute extraction method and system for online encyclopedias.
Background
An online encyclopedia, also known as a network encyclopedia, is an encyclopedia published on the Internet for users to consult; online encyclopedias may be open or non-open. Users can look up a wide range of information resources quickly and conveniently through online encyclopedias. Moreover, because Internet users take part in building open encyclopedias, the information in online encyclopedias is more open and transparent, and richer and more complete. Well-known open online encyclopedias include Wikipedia, Popular Encyclopedia, Baidu Baike, and Hudong Baike (Interactive Encyclopedia).
Online encyclopedias describe all kinds of entities for users to query. An entity is an objective thing in the real world, that is, anything in the real world that can be distinguished and identified. An entity can be a tangible object or an abstract event. Entity attributes are the basic characteristics of an entity; they help people understand the entity comprehensively and objectively, and the more entity attributes are known, the more detailed the description of the entity. Entity attribute extraction is therefore of great importance.
Online encyclopedias describe entities comprehensively and in detail, and entities correspond one-to-one to their description pages. In addition, online encyclopedia pages follow certain structural regularities: each entity page has its own section describing the entity's attributes, and this attribute description section is usually semi-structured text that is easy to extract. At present, entity attributes are mainly extracted from online encyclopedias with rule-based (template-based) methods. However, because each online encyclopedia has a different text structure, the rules for extracting entity attributes also differ between encyclopedias, so existing entity attribute extraction methods usually target a single online encyclopedia and cannot be applied to others.
Summary of the invention
To solve the above problems, the present invention provides an entity attribute extraction method for online encyclopedias, the method comprising:
Step 1) selecting a page from the online encyclopedia web page text set T to be extracted, and extracting the entity attribute expression rules of the page to obtain a current rule set;
Step 2) using the current rule set to extract entity attributes from T, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the extracted rule set as the current rule set, and repeating this process k times to obtain a final rule set, where k is a non-negative integer;
Step 3) performing entity attribute extraction on T using the final rule set.
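Steps 1) to 3) amount to a bootstrapping loop. The Python sketch below shows only that control flow, under the assumption that the rule and attribute extractors are supplied as callables, since the patent leaves their internals open; the names `extract_with_bootstrapping`, `extract_attributes`, and `extract_rules` are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Set

def extract_with_bootstrapping(
    pages: Iterable[str],
    seed_rules: Set[Hashable],
    extract_attributes: Callable[[List[str], Set[Hashable]], Set[Hashable]],
    extract_rules: Callable[[List[str], Set[Hashable]], Set[Hashable]],
    k: int,
) -> Set[Hashable]:
    """Hypothetical driver for Steps 1) to 3); the extractors are caller-supplied."""
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    corpus = list(pages)     # the text set T
    rules = set(seed_rules)  # Step 1): rules learned from a single page
    for _ in range(k):       # Step 2): k rounds of attribute and rule extraction
        attributes = extract_attributes(corpus, rules)
        rules = extract_rules(corpus, attributes)  # new rules replace the current set
    return extract_attributes(corpus, rules)       # Step 3): final extraction over T
```

With k = 0 the seed rules from Step 1) are used directly for the final extraction. The weight-based filtering of rules described in the embodiments is deliberately left out of this sketch.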
In one embodiment, step 1) comprises:
Step 11) selecting a page from the online encyclopedia web page text set T to be extracted;
Step 12) annotating the entity attributes of the page to obtain an entity attribute set U;
Step 13) extracting the entity attribute expression rules of the page according to the entity attribute set U, to obtain a rule set R.
In one embodiment, step 13) further comprises:
assigning a weight to each entity attribute expression rule in R according to the position at which the rule occurs in the page, where the weight of a rule occurring in the attribute description part of the page is greater than the weight of a rule that occurs in the non-attribute description part of the page but not in the attribute description part.
In one embodiment, step 2) comprises:
Step 21) using the rule set R to extract entity attributes from the online encyclopedia web page text set T to be extracted;
Step 22) obtaining an entity attribute set U' from the extracted entity attributes according to the position at which each entity attribute occurs in the page and the weight of the entity attribute expression rule that extracted it;
Step 23) extracting the entity attribute expression rules of T according to the entity attribute set U', to obtain a rule set R';
Step 24) updating R to R' and returning to step 21), until this process has been repeated k times, to obtain the final rule set, where k is a non-negative integer.
In one embodiment, step 22) comprises:
Step a) assigning a weight to each entity attribute according to the position at which it occurs in the page and the weight of the entity attribute expression rule that extracted it;
Step b) selecting the n entity attributes with the highest weights to obtain the entity attribute set U', where n is a positive integer.
In a further embodiment, step a) comprises:
assigning the weight α1*β to an entity attribute that occurs in the attribute description part of the page; and
assigning the weight α2*β to an entity attribute that occurs in the non-attribute description part of the page but not in the attribute description part;
where β is the weight of the entity attribute expression rule that extracted the entity attribute, and α2 < α1.
In one embodiment, step 22) further comprises merging the entity attribute set U into U'.
In a further embodiment, step 24) further comprises updating U to U' when returning to step 21).
In one embodiment, step 23) also include:
The position being occurred in the page according to entity attribute expression rule, to every entity attribute table in regular collection r' Reach rule and assign weight;Wherein, occur in attribute description part in the page entity attribute expression rule weight be more than occur in In the page non-attribute description part and do not appear in attribute description part entity attribute expression rule weight.
In a further embodiment, step 24) also include:
In extracting the entity attribute expression rule obtaining, using regular for the expression of weighted value highest m entity attribute as Final regular collection;Wherein m is positive integer.
According to one embodiment of the present invention, an entity attribute extraction system for online encyclopedias is also provided, the system comprising:
a rule extraction device, configured to select a page from the online encyclopedia web page text set T to be extracted and extract the entity attribute expression rules of the page, to obtain a current rule set;
a new rule generation device, configured to use the current rule set to extract entity attributes from T, extract the entity attribute expression rules of T according to the extracted entity attributes, take the extracted rule set as the current rule set, and repeat this process k times to obtain a final rule set, where k is a non-negative integer; and
an entity attribute extraction device, configured to perform entity attribute extraction on T using the final rule set.
The present invention adaptively supplements and refines the rules used to extract entity attributes, and then uses the supplemented and refined rules to extract all entity attributes of the pages. The entity attribute extraction method provided by the invention adapts to changes in text structure, is applicable to various online encyclopedias, and achieves high recall and high accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the entity attribute extraction method for online encyclopedias according to an embodiment of the invention;
Fig. 2 is a flow chart of a method for extracting a rule set from an entity attribute set according to an embodiment of the invention;
Fig. 3 is a flow chart of a method for generating a new rule set according to an embodiment of the invention; and
Fig. 4 is a flow chart of a method for extracting an entity attribute set using a rule set according to an embodiment of the invention.
Detailed description of embodiments
The present invention is described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein serve only to explain the invention and are not intended to limit it.
According to one embodiment of the present invention, an entity attribute extraction method for online encyclopedias is provided. Referring to Fig. 1, in brief, the method includes: obtaining online encyclopedia web page texts, preprocessing the web page texts, annotating entity attributes, extracting rules, generating new rules, and extracting the entity attributes of the online encyclopedia web page texts. These steps are described in turn below:
Step S101: obtain a number of online encyclopedia web page texts
Those skilled in the art will understand that online encyclopedia web page texts (also called page texts) can be obtained with a web crawler (spider) or through a third-party application API.
Step S102: preprocess the acquired web page texts
In one embodiment, preprocessing removes unwanted data such as all spaces in the web page text and the content inside <ref> tags; after preprocessing, a web page text set T of a certain scale is obtained for extraction.
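A minimal Python sketch of this preprocessing, assuming MediaWiki-style `<ref>` markup; the function name `preprocess` is hypothetical. Only ordinary spaces are removed and line breaks are kept, on the assumption that line-based extraction rules still need line boundaries.

```python
import re

# Self-closing references such as <ref name="a"/> carry no content.
_SELF_CLOSING_REF = re.compile(r"<ref[^>]*/>")
# Paired references <ref ...>...</ref>; DOTALL lets the content span lines.
_PAIRED_REF = re.compile(r"<ref[^>]*>.*?</ref>", re.DOTALL)

def preprocess(page_text: str) -> str:
    """Drop <ref> content and every space character from one page text."""
    # Run the self-closing pattern first; otherwise the paired pattern could
    # treat <ref .../> as an opening tag and swallow text up to a later </ref>.
    text = _SELF_CLOSING_REF.sub("", page_text)
    text = _PAIRED_REF.sub("", text)
    return text.replace(" ", "")
```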
Step S103: annotate entity attributes
In one embodiment, a page μ can be chosen arbitrarily from the web page text set T, and the entity attributes in the attribute description part of page μ are annotated to obtain an entity attribute set U. An entity attribute may include an attribute name and an attribute value (represented, for example, as a key-value pair). In one embodiment, the annotation principle is that the entity attribute set U should cover as many of the entity attributes appearing in the attribute description part as possible.
Step S104: extract rules
According to the entity attribute set U obtained in the previous step, the entity attribute expression rules (rules for short) in page μ are extracted to obtain a rule set R. As shown in Fig. 2, this step includes the following sub-steps:
4-1) Use the entity attribute set U to extract the rules in page μ, and remove duplicate rules to obtain the rule set R.
Those skilled in the art will understand that various existing techniques can be used to extract the rules in a page from its entity attributes.
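Because the rule-learning technique is left open, the following Python sketch shows one hypothetical option, not the patent's own method: a rule is the literal context around an annotated name/value pair, namely the characters before the name, the separator between name and value, and the characters after the value. The names `Rule`, `induce_rules`, and `apply_rule` and the default context width are all assumptions.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical rule: (text before the name, name/value separator, text after the value).
Rule = Tuple[str, str, str]
Attribute = Tuple[str, str]  # (attribute name, attribute value)

def induce_rules(page: str, attributes: Iterable[Attribute], width: int = 1) -> Set[Rule]:
    """Learn context rules from annotated attributes; the set removes duplicates (4-1)."""
    rules: Set[Rule] = set()
    for name, value in attributes:
        # The name, then a short separator of 1 to 5 characters, then the value.
        pattern = re.escape(name) + r"(.{1,5}?)" + re.escape(value)
        for m in re.finditer(pattern, page):
            prefix = page[max(0, m.start() - width):m.start()]
            suffix = page[m.end():m.end() + width]
            rules.add((prefix, m.group(1), suffix))
    return rules

def apply_rule(page: str, rule: Rule) -> Set[Attribute]:
    """Extract every (name, value) pair whose surrounding text matches the rule."""
    prefix, sep, suffix = rule
    # The lookahead leaves the suffix unconsumed so adjacent matches can share it.
    pattern = (re.escape(prefix) + r"(\w+?)" + re.escape(sep)
               + r"(.+?)(?=" + re.escape(suffix) + ")")
    return {(m.group(1), m.group(2)) for m in re.finditer(pattern, page)}
```

For example, annotating only `name=Alice` in an infobox yields the rule `("|", "=", "\n")`, which then also picks up every other `|key=value` line; `\w` matches CJK characters in Python 3, so the same rule works on Chinese attribute names.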
4-2) Assign a weight β to each rule in the rule set R according to the position at which the rule occurs in the page.
The weight of a rule occurring in the attribute description part of the page is greater than that of a rule occurring only in the non-attribute description part and not in the attribute description part. This follows from the higher confidence of the attribute description part compared with the non-attribute description part: as is known to those skilled in the art, rules occurring in the attribute description part (i.e., inside the infobox) are generally more accurate, while rules occurring in the non-attribute description part (i.e., outside the infobox) are generally less accurate. In one embodiment, rule weights are assigned according to the following cases:
If a rule occurs in the attribute description part of page μ, it is assigned the weight β1;
If a rule occurs in the non-attribute description part of page μ, it is assigned the weight β2;
If a rule occurs in both the attribute description part and the non-attribute description part of page μ, it is treated as occurring in the attribute description part and assigned the weight β1. Here 0 < β2 < β1 ≤ 1; β1 and β2 can be any numbers in (0, 1] satisfying β2 < β1.
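The three cases reduce to one test: any occurrence in the attribute description part earns β1, otherwise β2. A minimal Python sketch; the function name and the location labels `"attribute"` and `"body"` are hypothetical, and the defaults are the experimental values reported later in this document (β1 = 1, β2 = 0.8).

```python
from typing import Set

def rule_weight(locations: Set[str], beta1: float = 1.0, beta2: float = 0.8) -> float:
    """Weight of one rule given the page parts it occurs in.

    locations: hypothetical labels, e.g. {"attribute"}, {"body"} or both.
    """
    if not 0 < beta2 < beta1 <= 1:
        raise ValueError("expected 0 < beta2 < beta1 <= 1")
    # Occurring in the attribute description part wins, including the mixed case.
    return beta1 if "attribute" in locations else beta2
```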
Step S105: generate new rules
In general, this step first uses the rule set R obtained from page μ to extract entity attributes from the web page text set T, yielding a new entity attribute set U', and then uses U' to extract the rules from all web pages in T, yielding a new rule set R'. R' is then treated as R and U' as U, and this process (entity attribute extraction followed by rule extraction over T) is repeated. After k rounds (k is a non-negative integer), the resulting rule set is filtered to obtain the new rule set R_l. As shown in Fig. 3, generating new rules includes:
5-1) Use the rule set R to extract entity attributes from the web page text set T.
In the first round, R is the rule set obtained in step S104. As shown in Fig. 4, entity attribute extraction includes the following sub-steps:
a) Use the rule set R to extract entity attributes from T and remove duplicates, obtaining the entity attribute set U_r.
b) Assign a weight α to each entity attribute in U_r according to the position at which it occurs.
In one embodiment, entity attribute weights are assigned according to the following cases:
If an entity attribute occurs in the attribute description part of the page (i.e., the entity attribute description section), it is assigned the weight α1*β;
If an entity attribute occurs in the non-attribute description part of the page, it is assigned the weight α2*β;
If an entity attribute occurs in both the attribute description part and the non-attribute description part of the page, it is treated as occurring in the attribute description part and assigned the weight α1*β. Here β is the weight of the rule that extracted this entity attribute (i.e., a rule in the rule set R), and 0 < α2 < α1 ≤ 1.
Those skilled in the art will understand that entity attribute weights are set this way because the attribute description part of a page has higher confidence: entity attributes appearing in the attribute description part are more accurate, whereas entity attributes appearing in the non-attribute description part are less accurate. α1 and α2 can be any numbers in (0, 1] satisfying α2 < α1.
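A minimal Python sketch of the α*β weighting; the function name is hypothetical and the defaults are the experimental values reported later (α1 = 1, α2 = 0.8). It assumes each attribute is scored with the weight β of the single rule that extracted it.

```python
def attribute_weight(in_attribute_part: bool, rule_beta: float,
                     alpha1: float = 1.0, alpha2: float = 0.8) -> float:
    """Weight alpha * beta of one extracted entity attribute.

    in_attribute_part is True whenever the attribute occurs in the attribute
    description part, including when it also occurs elsewhere on the page.
    """
    if not 0 < alpha2 < alpha1 <= 1:
        raise ValueError("expected 0 < alpha2 < alpha1 <= 1")
    alpha = alpha1 if in_attribute_part else alpha2
    return alpha * rule_beta
```

With the experimental settings (α1 = β1 = 1, α2 = β2 = 0.8) the only possible weights are 1.0, 0.8 and 0.64: an attribute found outside the attribute description part by a rule that also occurs only outside it scores 0.8 × 0.8 = 0.64, below every attribute that has at least one factor from the attribute description part.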
5-2) From the entity attribute set U_r, select the n entity attributes with the highest confidence (i.e., the highest weights) as the new entity attribute set U'.
In one embodiment, the n highest-weighted entity attributes can be merged with the entity attribute set U to form the new entity attribute set U'.
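Sub-step 5-2) and its merge variant can be sketched in Python as follows; the function name `select_attributes` and the dictionary representation of weighted attributes are assumptions.

```python
import heapq
from typing import Dict, Hashable, Optional, Set

def select_attributes(weights: Dict[Hashable, float], n: int,
                      seed: Optional[Set[Hashable]] = None) -> Set[Hashable]:
    """Keep the n highest-weighted attributes of U_r, optionally merged with U."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top = heapq.nlargest(n, weights, key=lambda attr: weights[attr])
    selected = set(top)
    if seed is not None:
        selected |= seed  # embodiment variant: merge the previous set U into U'
    return selected
```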
5-3) Extract the rules in all pages to be extracted according to the new entity attribute set U', which includes the following sub-steps:
a) Use U' to extract rules from the web page text set T, and remove duplicates among the extracted rules to obtain the new rule set R'.
b) Assign a weight β to each rule in the rule set R'.
As in step S104, rule weights are assigned as follows:
If a rule occurs in the attribute description part of the page, it is assigned the weight β1;
If a rule occurs in the non-attribute description part of the page, it is assigned the weight β2;
If a rule occurs in both the attribute description part and the non-attribute description part, it is treated as occurring in the attribute description part and assigned the weight β1. Here 0 < β2 < β1 ≤ 1.
5-4) Take the new entity attribute set U' as U and the new rule set R' as R, and repeat steps 5-1) to 5-3) for k rounds to obtain the rule set R_k.
5-5) From the rule set R_k, select the m rules with the highest weights β as the final rule set R_l.
In one embodiment, the rules in R_k can be sorted in descending order of weight and the top m% of them taken as the final rule set R_l.
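The top-m% variant can be sketched in Python as below. The function name is hypothetical, and rounding the kept count up (so that at least one rule survives) is an assumption the text does not specify.

```python
import math
from typing import Dict, Hashable, List

def top_percent_rules(weights: Dict[Hashable, float], m_percent: float) -> List[Hashable]:
    """Sort rules by weight in descending order and keep the first m% as R_l."""
    if not 0 < m_percent <= 100:
        raise ValueError("m_percent must be in (0, 100]")
    ranked = sorted(weights, key=lambda rule: weights[rule], reverse=True)
    keep = math.ceil(len(ranked) * m_percent / 100)  # rounding up is an assumption
    return ranked[:keep]
```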
Step S106: use the rule set R_l to extract entity attributes from the web page text set T and remove impurities, obtaining the final entity attribute set.
Here, removing impurities means removing unwanted data from the attribute names and attribute values in the results, and removing duplicate entity attributes.
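What counts as an impurity is not specified. The Python sketch below assumes, for Wikipedia wikitext, that impurities are leftover link markup (`[[target|label]]`) and emphasis quotes; the regular expressions and the names `clean_text` and `clean_attributes` are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical impurities: leftover MediaWiki link and emphasis markup.
_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")  # [[target|label]] -> label
_EMPHASIS = re.compile(r"'{2,}")                           # ''italic'', '''bold'''

def clean_text(text: str) -> str:
    """Strip link and emphasis markup from one attribute name or value."""
    text = _LINK.sub(r"\1", text)
    text = _EMPHASIS.sub("", text)
    return text.strip()

def clean_attributes(pairs: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Clean names and values, drop empty pairs, and de-duplicate via a set."""
    cleaned = set()
    for name, value in pairs:
        name, value = clean_text(name), clean_text(value)
        if name and value:
            cleaned.add((name, value))
    return cleaned
```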
According to one embodiment of the present invention, an entity attribute extraction system for online encyclopedias is also provided, comprising a rule extraction device, a new rule generation device, and an entity attribute extraction device.
The rule extraction device is configured to select a page from the online encyclopedia web page text set T to be extracted and extract the entity attribute expression rules of that page, to obtain a current rule set.
The new rule generation device is configured to use the current rule set to extract entity attributes from T, extract the entity attribute expression rules of T according to the extracted entity attributes, take the extracted rule set as the current rule set, and repeat this process k times to obtain a final rule set.
The entity attribute extraction device is configured to perform entity attribute extraction on T using the final rule set.
To verify the effectiveness of the entity attribute extraction method and system for online encyclopedias provided by the invention, the inventors ran experiments on the Chinese Wikipedia dataset of June 25, 2013, with the following experimental parameters:
1000 entity pages were chosen arbitrarily as the set of pages to be extracted. In a Wikipedia page, the entity attribute description part is located at the beginning of the page text; it starts with the "{{" marker and ends with the "}}" marker. The parameters were β1 = 1, β2 = 0.8, α1 = 1, α2 = 0.8, and n = 100; because the experimental dataset is small, k = 5 and m = 50.
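Because infoboxes commonly contain nested templates, the "{{" ... "}}" boundary described above is found more reliably by counting brace pairs than by searching for the first "}}". A minimal Python sketch; the function name is hypothetical, and it simply takes the first top-level "{{" block, whereas real articles may place other templates before the infobox.

```python
from typing import Optional, Tuple

def attribute_section_span(wikitext: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first top-level {{...}} block, or None.

    Nested templates such as {{birth date|...}} are handled by counting brace
    pairs; the end index is exclusive.
    """
    start = wikitext.find("{{")
    if start == -1:
        return None
    depth, i = 0, start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return start, i
        else:
            i += 1
    return None  # unbalanced braces
```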
The experiments gave the following results: the number of entity attributes extracted from each page exceeded the number of entity attributes in that page's attribute description part, and the average accuracy exceeded 90%.
It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Therefore, the scope of the claimed technical solution is not limited by any particular exemplary teaching given.

Claims (10)

1. An entity attribute extraction method for online encyclopedias, comprising:
Step 1) selecting a page from the online encyclopedia web page text set T to be extracted, extracting the entity attribute expression rules of the page, and assigning a weight to each entity attribute expression rule according to the position at which the rule occurs in the page, to obtain a current rule set; wherein the weight of an entity attribute expression rule occurring in the attribute description part of the page is greater than the weight of an entity attribute expression rule that occurs in the non-attribute description part of the page and does not occur in the attribute description part;
Step 2) using the current rule set to extract entity attributes from the online encyclopedia web page text set T to be extracted, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the extracted rule set as the current rule set, and repeating this process k times to obtain a final rule set; wherein k is a non-negative integer;
Step 3) performing entity attribute extraction on T using the final rule set.
2. The method according to claim 1, wherein step 1) comprises:
Step 11) selecting a page from the online encyclopedia web page text set T to be extracted;
Step 12) annotating the entity attributes of the page to obtain an entity attribute set U;
Step 13) extracting the entity attribute expression rules of the page according to the entity attribute set U, to obtain a rule set R.
3. The method according to claim 2, wherein step 2) comprises:
Step 21) using the rule set R to extract entity attributes from the online encyclopedia web page text set T to be extracted;
Step 22) obtaining an entity attribute set U' from the extracted entity attributes according to the position at which each entity attribute occurs in the page and the weight of the entity attribute expression rule that extracted it;
Step 23) extracting the entity attribute expression rules of T according to the entity attribute set U', to obtain a rule set R';
Step 24) updating R to R' and returning to step 21), until this process has been repeated k times, to obtain the final rule set; wherein k is a non-negative integer.
4. The method according to claim 3, wherein step 22) comprises:
Step a) assigning a weight to each entity attribute according to the position at which it occurs in the page and the weight of the entity attribute expression rule that extracted it;
Step b) selecting the n entity attributes with the highest weights to obtain the entity attribute set U'; wherein n is a positive integer.
5. The method according to claim 4, wherein step a) comprises:
assigning the weight α1*β to an entity attribute that occurs in the attribute description part of the page; and
assigning the weight α2*β to an entity attribute that occurs in the non-attribute description part of the page and does not occur in the attribute description part;
wherein β is the weight of the entity attribute expression rule that extracted the entity attribute, and α2 < α1.
6. The method according to claim 3, wherein step 22) further comprises:
merging the entity attribute set U into U'.
7. The method according to claim 6, wherein step 24) further comprises:
updating U to U' when returning to step 21).
8. The method according to claim 3, wherein step 23) further comprises:
assigning a weight to each entity attribute expression rule in the rule set R' according to the position at which the rule occurs in the page; wherein the weight of an entity attribute expression rule occurring in the attribute description part of the page is greater than the weight of an entity attribute expression rule that occurs in the non-attribute description part of the page and does not occur in the attribute description part.
9. The method according to claim 8, wherein step 24) further comprises:
taking, from the extracted entity attribute expression rules, the m entity attribute expression rules with the highest weights as the final rule set; wherein m is a positive integer.
10. An entity attribute extraction system for online encyclopedias, comprising:
a rule extraction device, configured to select a page from the online encyclopedia web page text set T to be extracted, extract the entity attribute expression rules of the page, and assign a weight to each entity attribute expression rule according to the position at which the rule occurs in the page, to obtain a current rule set; wherein the weight of an entity attribute expression rule occurring in the attribute description part of the page is greater than the weight of an entity attribute expression rule that occurs in the non-attribute description part of the page and does not occur in the attribute description part;
a new rule generation device, configured to use the current rule set to extract entity attributes from the online encyclopedia web page text set T to be extracted, extract the entity attribute expression rules of T according to the extracted entity attributes, take the extracted rule set as the current rule set, and repeat this process k times to obtain a final rule set; wherein k is a non-negative integer; and
an entity attribute extraction device, configured to perform entity attribute extraction on T using the final rule set.
CN201410065743.7A 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system Active CN103853823B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410065743.7A CN103853823B 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410065743.7A CN103853823B 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system

Publications (2)

Publication Number Publication Date
CN103853823A 2014-06-11
CN103853823B 2017-01-18

Family

ID=50861478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410065743.7A Active CN103853823B 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system

Country Status (1)

Country Link
CN (1) CN103853823B

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345625B * 2017-01-25 2022-09-30 Beijing Sogou Technology Development Co., Ltd. Information mining method and device for information mining
CN109344346A * 2018-08-14 2019-02-15 Guangzhou Shenma Mobile Information Technology Co., Ltd. Webpage information extracting method and device
CN112434530A * 2019-08-06 2021-03-02 Fujitsu Limited Information processing apparatus, information processing method, and computer program

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1588371A * 2004-09-08 2005-03-02 孟小峰 Forming method for package device
CN101464905A * 2009-01-08 2009-06-24 Institute of Computing Technology, Chinese Academy of Sciences Web page information extraction system and method
CN102262658A * 2011-07-13 2011-11-30 Northeastern University Method for extracting web data from bottom to top based on entity
CN102495847A * 2011-11-16 2012-06-13 Zhejiang Panshi Information Technology Co., Ltd. Network commodity information extraction method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20080065621A1 * 2006-09-13 2008-03-13 Kenneth Alexander Ellis Ambiguous entity disambiguation method
US8768960B2 * 2009-01-20 2014-07-01 Microsoft Corporation Enhancing keyword advertising using online encyclopedia semantics

Non-Patent Citations (3)

Title
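Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning; Bing L et al.; Proceedings of the sixth ACM international conference on Web search and data mining; 2013-12-31; pp. 567-576 *
Entity relation extraction method based on Wikipedia and pattern clustering (基于维基百科和模式聚类的实体关系抽取方法); 张苇如 et al.; Journal of Chinese Information Processing (中文信息学报); 2012-03-31; pp. 75-81 *
Rule-based attribute extraction for persons in encyclopedias (基于规则的百科人物属性抽取); 李红亮 et al.; Journal of Integration Technology (集成技术); 2013-05-31; pp. 1-4 *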

Also Published As

Publication number Publication date
CN103853823A 2014-06-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant