CN103853823A - Online encyclopedia oriented entity attribute extraction method and system - Google Patents

Publication number
CN103853823A
Authority
CN
China
Prior art keywords
entity attribute
attribute
page
rule
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410065743.7A
Other languages
Chinese (zh)
Other versions
CN103853823B (en)
Inventor
程学旗
贾岩涛
张泽慧
王元卓
冯凯
熊锦华
许洪波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201410065743.7A
Publication of CN103853823A
Application granted
Publication of CN103853823B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an online encyclopedia oriented entity attribute extraction method and system. The method comprises the steps: selecting a page from an online encyclopedia webpage text set T to be extracted, and extracting entity attribute expression rules of the page, so as to obtain a current rule set; extracting entity attribute of the online encyclopedia webpage text set T to be extracted by using the current rule set, extracting entity attribute expression rules of the T according to the entity attribute obtained through extracting, taking a rule set obtained through extracting as a current rule set, and repeating the process for k times, so as to obtain a final rule set; carrying out entity attribute extraction on the T by using the final rule set. The entity attribute extraction method provided by the invention can adapt to the change of text structures, is applicable to various online encyclopedias and has the effects of high recall rate and high accuracy.

Description

An online encyclopedia oriented entity attribute extraction method and system
Technical field
The present invention relates to the field of information technology, and in particular to an online encyclopedia oriented entity attribute extraction method and system.
Background
An online encyclopedia, also called a network encyclopedia, is an encyclopedia on the Internet that netizens can consult; network encyclopedias are either open or non-open. Users can use an online encyclopedia to query all kinds of information resources conveniently and promptly. Because netizens take part in building open encyclopedias, the information in them is more open and transparent, and also richer and more complete. Well-known open network encyclopedias include Wikipedia, Popular Encyclopedia, Baidu Baike, and Hudong Baike.
An online encyclopedia describes all kinds of entities for users to look up. An entity is an objective thing in the real world, that is, anything in the real world that can be distinguished and identified. An entity can be a tangible object or an abstract event. Entity attributes are the basic characteristics of an entity. They help people understand the entity comprehensively and objectively, and the more entity attributes there are, the more detailed the description of the entity becomes. Entity attribute extraction is therefore of great significance.
An online encyclopedia describes entities comprehensively and in detail, and each entity in an online encyclopedia corresponds one-to-one to the page that describes it. In addition, the page structure of an online encyclopedia follows certain regularities: each entity page has a separate section that describes the entity's attributes, and this attribute description section is usually semi-structured text, which is convenient to extract from. At present, rule-based (template-based) methods are mainly used to extract entity attributes from online encyclopedias. However, because the text structure differs from one online encyclopedia to another, the rules used to extract entity attributes also differ. Existing entity attribute extraction methods therefore usually target one particular online encyclopedia and cannot be applied to others.
Summary of the invention
To solve the above problem, the present invention provides an online encyclopedia oriented entity attribute extraction method, the method comprising:
Step 1), selecting a page from the online encyclopedia web page text set T to be extracted, and extracting the entity attribute expression rules of that page to obtain a current rule set;
Step 2), using the current rule set to extract entity attributes from the online encyclopedia web page text set T to be extracted, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the rule set so obtained as the current rule set, and repeating this process k times to obtain a final rule set, where k is a nonnegative integer;
Step 3), using the final rule set to extract entity attributes from T.
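The three steps above amount to a bootstrapping loop, which can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `induce_rules` and `apply_rules` are hypothetical callables supplied by the caller (the text does not fix how rules are represented or induced), and a rule is treated here as an opaque string.

```python
from typing import Callable, Iterable, List, Set, Tuple

Attribute = Tuple[str, str]   # (attribute name, attribute value)
Rule = str                    # opaque rule representation (assumption)


def bootstrap_rules(
    pages: Iterable[str],
    seed_page: str,
    seed_attributes: Set[Attribute],
    induce_rules: Callable[[List[str], Set[Attribute]], Set[Rule]],
    apply_rules: Callable[[List[str], Set[Rule]], Set[Attribute]],
    k: int,
) -> Set[Rule]:
    """Return the final rule set after k refinement rounds (k >= 0)."""
    if k < 0:
        raise ValueError("k must be a nonnegative integer")
    page_list = list(pages)
    # Step 1: rules induced from one annotated page form the current rule set.
    rules = induce_rules([seed_page], seed_attributes)
    # Step 2: alternate attribute extraction and rule induction over T.
    for _ in range(k):
        attributes = apply_rules(page_list, rules)
        rules = induce_rules(page_list, attributes)
    return rules
```

Step 3 is then one more call of `apply_rules` with the returned rule set. With k = 0 the loop body never runs, so the final rule set is simply the one induced from the seed page.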
In one embodiment, step 1) comprises:
Step 11), selecting a page from the online encyclopedia web page text set T to be extracted;
Step 12), annotating the entity attributes of that page to obtain an entity attribute set U;
Step 13), extracting the entity attribute expression rules of that page according to the entity attribute set U to obtain a rule set R.
In one embodiment, step 13) further comprises:
Assigning a weight to each entity attribute expression rule in R according to where the rule appears in the page, wherein an entity attribute expression rule that appears in the attribute description section of the page has a greater weight than one that appears in the non-attribute description section of the page and does not appear in the attribute description section.
In one embodiment, step 2) comprises:
Step 21), using rule set R to extract entity attributes from the online encyclopedia web page text set T to be extracted;
Step 22), obtaining an entity attribute set U' from the extracted entity attributes, according to where each entity attribute appears in its page and the weight of the entity attribute expression rule that extracted it;
Step 23), extracting the entity attribute expression rules of T according to the entity attribute set U' to obtain a rule set R';
Step 24), updating R to R' and returning to step 21) until this process has been repeated k times, to obtain the final rule set, where k is a nonnegative integer.
In one embodiment, step 22) comprises:
Step a), assigning a weight to each entity attribute according to where the entity attribute appears in the page and the weight of the entity attribute expression rule that extracted it;
Step b), selecting the n entity attributes with the highest weights to obtain the entity attribute set U', where n is a positive integer.
In a further embodiment, step a) comprises:
Assigning the weight α_1 × β to an entity attribute that appears in the attribute description section of the page; and
Assigning the weight α_2 × β to an entity attribute that appears in the non-attribute description section of the page and does not appear in the attribute description section;
where β is the weight of the entity attribute expression rule that extracted the entity attribute, and α_2 < α_1.
In one embodiment, step 22) further comprises: merging the entity attribute set U into U'.
In a further embodiment, step 24) further comprises: updating U to U' when returning to step 21).
In one embodiment, step 23) further comprises:
Assigning a weight to each entity attribute expression rule in rule set R' according to where the rule appears in the page, wherein an entity attribute expression rule that appears in the attribute description section of the page has a greater weight than one that appears in the non-attribute description section of the page and does not appear in the attribute description section.
In a further embodiment, step 24) further comprises:
Among the extracted entity attribute expression rules, taking the m rules with the highest weights as the final rule set, where m is a positive integer.
According to an embodiment of the invention, an online encyclopedia oriented entity attribute extraction system is also provided, the system comprising:
A rule acquisition device, for selecting a page from the online encyclopedia web page text set T to be extracted and extracting the entity attribute expression rules of that page to obtain a current rule set;
A new rule generation device, for using the current rule set to extract entity attributes from the online encyclopedia web page text set T to be extracted, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the rule set so obtained as the current rule set, and repeating this process k times to obtain a final rule set, where k is a nonnegative integer; and
An entity attribute extraction device, for using the final rule set to extract entity attributes from T.
The present invention can adaptively supplement and refine the rules used for extracting entity attributes, and then use the supplemented and refined rules to extract all entity attributes of the pages. The entity attribute extraction method provided by the invention adapts to changes in text structure, is applicable to various online encyclopedias, and achieves a high recall rate and high accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of an online encyclopedia oriented entity attribute extraction method according to an embodiment of the invention;
Fig. 2 is a flowchart of a method for extracting a rule set from an entity attribute set according to an embodiment of the invention;
Fig. 3 is a flowchart of a method for generating a new rule set according to an embodiment of the invention; and
Fig. 4 is a flowchart of a method for extracting an entity attribute set using a rule set according to an embodiment of the invention.
Detailed description
The present invention is described below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
According to an embodiment of the invention, an online encyclopedia oriented entity attribute extraction method is provided. Referring to Fig. 1, in brief, the method comprises: obtaining online encyclopedia web page text, preprocessing the web page text, annotating entity attributes, obtaining rules, generating new rules, and extracting the entity attributes of the online encyclopedia web page text. These steps are described in turn below:
Step S101, obtaining a certain number of online encyclopedia web page texts
Those skilled in the art will understand that the web page text (also called page text) of an online encyclopedia can be obtained with a web crawler or a third-party API.
Step S102, preprocessing the obtained web page text
In one embodiment, preprocessing includes removing unneeded data from the web page text, such as all spaces and the content of <ref> tags. Preprocessing yields a web page text set T to be extracted of a certain size.
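The preprocessing of step S102 could look roughly like the sketch below. It assumes the page text is wikitext held in a Python string; the function name `preprocess` and the choice to strip spaces and tabs while keeping line breaks are illustrative assumptions, since the text only says that spaces and <ref> content are removed.

```python
import re

# Self-closing references such as <ref name="a"/>.
_REF_SELF_CLOSING = re.compile(r"<ref\b[^>]*/>", re.IGNORECASE)
# Paired references such as <ref ...>citation text</ref>.
_REF_PAIRED = re.compile(r"<ref\b[^>]*>.*?</ref\s*>", re.IGNORECASE | re.DOTALL)
# ASCII space, tab, and the full-width (ideographic) space.
_SPACES = re.compile(r"[ \t\u3000]+")


def preprocess(page_text: str) -> str:
    """Drop <ref> tags with their content, then remove spaces."""
    # Remove self-closing tags first so they cannot open a paired match.
    text = _REF_SELF_CLOSING.sub("", page_text)
    text = _REF_PAIRED.sub("", text)
    return _SPACES.sub("", text)
```

Removing all spaces suits Chinese page text, where spaces carry little meaning; for a language that separates words with spaces this step would need to be reconsidered.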
Step S103, annotating entity attributes
In one embodiment, a page μ can be chosen arbitrarily from the web page text set T to be extracted, and the entity attributes in the attribute description section of page μ are annotated to obtain an entity attribute set U. An entity attribute may comprise an attribute name and an attribute value (represented, for example, as a key-value pair). In one embodiment, the annotation guideline is that the entity attribute set U should cover as many as possible of the entity attributes that appear in the attribute description section.
Step S104, obtaining rules
According to the entity attribute set U obtained in the previous step, the entity attribute expression rules (rules for short) in page μ are extracted to obtain a rule set R. As shown in Fig. 2, this step comprises the following sub-steps:
4-1), using the entity attribute set U to extract the rules in page μ and removing duplicate rules to obtain the rule set R.
Those skilled in the art will understand that various existing techniques can be used to extract the rules in a page from its entity attributes.
4-2), assigning a weight β to each rule in rule set R according to where the rule appears in the page.
A rule that appears in the attribute description section of the page has a greater weight than a rule that appears only in the non-attribute description section. This follows from the attribute description section having higher confidence and the non-attribute description section lower confidence: as those skilled in the art know, rules that appear in the attribute description section (that is, inside the attribute box, or infobox) are generally more accurate, while rules that appear in the non-attribute description section (outside the attribute box) are generally less accurate. In one embodiment, assigning weights to rules covers the following cases:
If a rule appears in the attribute description section of page μ, the rule is assigned the weight β_1;
If a rule appears in the non-attribute description section of page μ, the rule is assigned the weight β_2;
If a rule appears in both the attribute description section and the non-attribute description section of page μ, it is treated as appearing in the attribute description section and is assigned the weight β_1. Here 0 < β_2 < β_1 ≤ 1; β_1 and β_2 can be any numbers in (0, 1] that satisfy β_2 < β_1.
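The three cases collapse to a single test: a rule gets β_1 as soon as any of its occurrences lies in the attribute description section, and β_2 otherwise. A minimal sketch, assuming each occurrence of a rule is recorded as a boolean flag (a representation chosen here for illustration); the default values 1 and 0.8 are the ones reported in the experiment later in this document.

```python
from typing import Iterable


def rule_weight(
    occurrences_in_attribute_section: Iterable[bool],
    beta_1: float = 1.0,
    beta_2: float = 0.8,
) -> float:
    """Weight of one rule, given one flag per place the rule occurs.

    A flag is True when that occurrence lies inside the attribute
    description section and False when it lies outside.
    """
    if not 0 < beta_2 < beta_1 <= 1:
        raise ValueError("weights must satisfy 0 < beta_2 < beta_1 <= 1")
    flags = list(occurrences_in_attribute_section)
    if not flags:
        raise ValueError("a rule must occur at least once")
    # Any occurrence inside the section wins, which covers the 'both' case.
    return beta_1 if any(flags) else beta_2
```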
Step S105, generating new rules
In general, this step first uses the rule set R obtained from page μ to extract entity attributes from the web page text set T to be extracted, obtaining a new entity attribute set U', and then uses U' to extract rules from all pages to be extracted in T, obtaining a new rule set R'. R' is then used as R and U' as U, and this process (entity attribute extraction and rule extraction over the set T) is repeated. After k rounds (k is a nonnegative integer), the resulting rule set is filtered to obtain a new rule set R_l. As shown in Fig. 3, the step of generating new rules comprises:
5-1), using rule set R to extract entity attributes from the web page text set T to be extracted.
In the first round of computation, rule set R is the rule set obtained in step S104. As shown in Fig. 4, the entity attribute extraction process comprises the following sub-steps:
A), using rule set R to extract entity attributes from the web page text set T to be extracted and removing duplicates to obtain an entity attribute set U_r.
B), assigning a weight α to each entity attribute in U_r according to where the entity attribute appears.
In one embodiment, assigning weights to entity attributes covers the following cases:
If an entity attribute appears in the attribute description section of the page (that is, the entity attribute description section), the entity attribute is assigned the weight α_1 × β;
If an entity attribute appears in the non-attribute description section of the page, the entity attribute is assigned the weight α_2 × β;
If an entity attribute appears in both the attribute description section and the non-attribute description section of the page, it is treated as appearing in the attribute description section and is assigned the weight α_1 × β. Here β is the weight of the rule (a rule in rule set R) that extracted the entity attribute, and 0 < α_2 < α_1 ≤ 1.
Those skilled in the art will understand that entity attribute weights are set this way because the attribute description section of a page has higher confidence: entity attributes that appear in the attribute description section are therefore more accurate, and entity attributes that appear in the non-attribute description section are less accurate. α_1 and α_2 can be any numbers in (0, 1] that satisfy α_2 < α_1.
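The entity attribute weight is thus the product of a position factor (α_1 or α_2) and the weight β of the extracting rule. A minimal sketch, with a hypothetical function name and with defaults taken from the experiment reported later in this document (α_1 = 1, α_2 = 0.8):

```python
def attribute_weight(
    in_attribute_section: bool,
    rule_beta: float,
    alpha_1: float = 1.0,
    alpha_2: float = 0.8,
) -> float:
    """Weight of an extracted entity attribute: alpha * beta.

    in_attribute_section is True when the attribute occurs in the
    attribute description section (including the 'both' case).
    rule_beta is the weight of the rule that extracted the attribute.
    """
    if not 0 < alpha_2 < alpha_1 <= 1:
        raise ValueError("weights must satisfy 0 < alpha_2 < alpha_1 <= 1")
    alpha = alpha_1 if in_attribute_section else alpha_2
    return alpha * rule_beta


# The four combinations with beta_1 = 1 and beta_2 = 0.8:
for inside in (True, False):
    for beta in (1.0, 0.8):
        print(inside, beta, round(attribute_weight(inside, beta), 2))
```

With these values an attribute found outside the attribute description section by a rule that itself was only seen outside it ends up with the lowest weight, 0.8 × 0.8 = 0.64, so it is the first to be dropped when only the top n attributes are kept.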
5-2), from the entity attribute set U_r, selecting according to weight α the n entity attributes with the highest confidence (that is, the highest weights) as the new entity attribute set U'.
In one embodiment, the n entity attributes with the highest weights can be merged with the entity attribute set U to form the new entity attribute set U'.
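Selecting the top n attributes and optionally merging them with U can be sketched as below. The dictionary-based representation and the function name are assumptions made for illustration; how ties between equal weights are broken is not specified in the text and is left to `heapq.nlargest` here.

```python
import heapq
from typing import Dict, Iterable, Set, Tuple

Attribute = Tuple[str, str]  # (attribute name, attribute value)


def select_top_attributes(
    weighted: Dict[Attribute, float],
    n: int,
    seed: Iterable[Attribute] = (),
) -> Set[Attribute]:
    """Return U': the n highest-weighted attributes, merged with seed (U)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top = heapq.nlargest(n, weighted.items(), key=lambda item: item[1])
    # Passing the previous set U as seed implements the merging embodiment.
    return {attribute for attribute, _ in top} | set(seed)
```

Leaving `seed` empty gives the plain top-n embodiment; passing the current U keeps the manually annotated attributes in play in every round.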
5-3), extracting the rules in all pages to be extracted according to the new entity attribute set U', comprising the following sub-steps:
A), using U' to extract rules from the web page text set T to be extracted and removing duplicates from the extracted rules to obtain a new rule set R'.
B), assigning a weight β to each rule in rule set R'.
As in step S104, assigning weights to rules comprises:
If a rule appears in the attribute description section of the page, the rule is assigned the weight β_1;
If a rule appears in the non-attribute description section of the page, the rule is assigned the weight β_2;
If a rule appears in both the attribute description section and the non-attribute description section, it is treated as appearing in the attribute description section and is assigned the weight β_1. Here 0 < β_2 < β_1 ≤ 1.
5-4), taking the new entity attribute set U' as U and the new rule set R' as R, and repeating steps 5-1) to 5-3) for k rounds to obtain a rule set R_k.
5-5), from the rule set R_k, taking according to weight β the m rules with the highest weights as the final rule set R_l.
In one embodiment, the rules in R_k can be sorted by weight in descending order and the top m% of the rules taken as the final rule set R_l.
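Both ways of filtering R_k, by a count m and by a percentage m%, can be sketched as follows. The function names are hypothetical, and rounding the percentage cut-off up is an assumption; the text does not say how a fractional count is handled.

```python
import math
from typing import Dict, List


def top_m_rules(rule_weights: Dict[str, float], m: int) -> List[str]:
    """Return the m rules with the highest weight, best first."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    ranked = sorted(rule_weights, key=rule_weights.get, reverse=True)
    return ranked[:m]


def top_percent_rules(rule_weights: Dict[str, float], percent: float) -> List[str]:
    """Return the top `percent` % of rules after a descending sort."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ranked = sorted(rule_weights, key=rule_weights.get, reverse=True)
    keep = math.ceil(len(ranked) * percent / 100)  # round up (assumption)
    return ranked[:keep]
```

Because rule weights take only the two values β_1 and β_2, many rules tie; a real implementation would probably need a secondary criterion, such as how often a rule fired, to order rules within the same weight.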
Step S106, using rule set R_l to extract entity attributes from the web page text set T to be extracted, and removing impurities to obtain the final entity attribute set.
It should be understood that impurities here are unneeded data: impurities in the attribute names and attribute values of the results are removed, and duplicate entity attributes are removed.
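A cleanup pass of this kind might look like the sketch below. What counts as an impurity is not spelled out in the text, so the markup pattern (wiki link brackets, bold and italic quote runs, leftover HTML tags) is purely an assumption for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed impurities: [[ ]], runs of two or more quotes, and HTML-like tags.
_IMPURITY = re.compile(r"\[\[|\]\]|'{2,}|<[^>]+>")


def clean_attributes(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Strip impurities from names and values, drop empties and duplicates."""
    seen = set()
    cleaned = []
    for name, value in pairs:
        name = _IMPURITY.sub("", name).strip()
        value = _IMPURITY.sub("", value).strip()
        if not name or not value:
            continue  # nothing meaningful left after cleaning
        if (name, value) in seen:
            continue  # duplicate entity attribute
        seen.add((name, value))
        cleaned.append((name, value))
    return cleaned
```

Deduplicating after cleaning, rather than before, matters: two raw attributes that differ only in markup collapse into one entry.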
According to an embodiment of the invention, an online encyclopedia oriented entity attribute extraction system is also provided, comprising a rule acquisition device, a new rule generation device, and an entity attribute extraction device.
The rule acquisition device is used for selecting a page from the online encyclopedia web page text set T to be extracted and extracting the entity attribute expression rules of that page to obtain a current rule set.
The new rule generation device is used for using the current rule set to extract entity attributes from the online encyclopedia web page text set T to be extracted, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the rule set so obtained as the current rule set, and repeating this process k times to obtain a final rule set.
The entity attribute extraction device is used for using the final rule set to extract entity attributes from T.
To verify the effectiveness of the online encyclopedia oriented entity attribute extraction method and system provided by the invention, the inventors ran an experiment on the Chinese Wikipedia data set of June 25, 2013, with the following parameters:
1000 entity pages were chosen arbitrarily as the page set to be extracted. In a Wikipedia page, the entity attribute description section is located at the beginning of the page text, starting with the "{{" marker and ending with the "}}" marker. β_1 was set to 1, β_2 to 0.8, α_1 to 1, α_2 to 0.8, and n to 100; because the experimental data set is small, k was set to 5 and m to 50.
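Locating the attribute description section by its "{{" and "}}" markers can be sketched as follows. The text only names the two markers; counting nesting depth, so that templates nested inside the section do not end it early, is an assumption added here, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def find_attribute_section(wikitext: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first balanced {{ ... }} block."""
    start = wikitext.find("{{")
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return start, i  # end offset is exclusive
        else:
            i += 1
    return None  # opening marker without a matching close


def in_attribute_section(offset: int, span: Optional[Tuple[int, int]]) -> bool:
    """True when a character offset falls inside the located section."""
    return span is not None and span[0] <= offset < span[1]
```

The offsets returned here are what a weighting step would consult to decide whether a rule or attribute occurrence lies inside or outside the attribute description section.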
The experiment produced the following result: the number of entity attributes extracted from each page exceeded the number of entity attributes in that page's attribute description section, and the average accuracy reached over 90%.
It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Therefore, the scope of the claimed technical solution is not limited by any specific exemplary teaching given.

Claims (11)

1. An online encyclopedia oriented entity attribute extraction method, comprising:
Step 1), selecting a page from the online encyclopedia web page text set T to be extracted, and extracting the entity attribute expression rules of that page to obtain a current rule set;
Step 2), using the current rule set to extract entity attributes from the online encyclopedia web page text set T to be extracted, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the rule set so obtained as the current rule set, and repeating this process k times to obtain a final rule set, where k is a nonnegative integer;
Step 3), using the final rule set to extract entity attributes from T.
2. The method according to claim 1, wherein step 1) comprises:
Step 11), selecting a page from the online encyclopedia web page text set T to be extracted;
Step 12), annotating the entity attributes of that page to obtain an entity attribute set U;
Step 13), extracting the entity attribute expression rules of that page according to the entity attribute set U to obtain a rule set R.
3. The method according to claim 2, wherein step 13) further comprises:
Assigning a weight to each entity attribute expression rule in R according to where the rule appears in the page, wherein an entity attribute expression rule that appears in the attribute description section of the page has a greater weight than one that appears in the non-attribute description section of the page and does not appear in the attribute description section.
4. The method according to claim 3, wherein step 2) comprises:
Step 21), using rule set R to extract entity attributes from the online encyclopedia web page text set T to be extracted;
Step 22), obtaining an entity attribute set U' from the extracted entity attributes, according to where each entity attribute appears in its page and the weight of the entity attribute expression rule that extracted it;
Step 23), extracting the entity attribute expression rules of T according to the entity attribute set U' to obtain a rule set R';
Step 24), updating R to R' and returning to step 21) until this process has been repeated k times, to obtain the final rule set, where k is a nonnegative integer.
5. The method according to claim 4, wherein step 22) comprises:
Step a), assigning a weight to each entity attribute according to where the entity attribute appears in the page and the weight of the entity attribute expression rule that extracted it;
Step b), selecting the n entity attributes with the highest weights to obtain the entity attribute set U', where n is a positive integer.
6. The method according to claim 5, wherein step a) comprises:
Assigning the weight α_1 × β to an entity attribute that appears in the attribute description section of the page; and
Assigning the weight α_2 × β to an entity attribute that appears in the non-attribute description section of the page and does not appear in the attribute description section;
where β is the weight of the entity attribute expression rule that extracted the entity attribute, and α_2 < α_1.
7. The method according to claim 4, wherein step 22) further comprises:
Merging the entity attribute set U into U'.
8. The method according to claim 7, wherein step 24) further comprises:
Updating U to U' when returning to step 21).
9. The method according to claim 4, wherein step 23) further comprises:
Assigning a weight to each entity attribute expression rule in rule set R' according to where the rule appears in the page, wherein an entity attribute expression rule that appears in the attribute description section of the page has a greater weight than one that appears in the non-attribute description section of the page and does not appear in the attribute description section.
10. The method according to claim 9, wherein step 24) further comprises:
Among the extracted entity attribute expression rules, taking the m rules with the highest weights as the final rule set, where m is a positive integer.
11. An online encyclopedia oriented entity attribute extraction system, comprising:
A rule acquisition device, for selecting a page from the online encyclopedia web page text set T to be extracted and extracting the entity attribute expression rules of that page to obtain a current rule set;
A new rule generation device, for using the current rule set to extract entity attributes from the online encyclopedia web page text set T to be extracted, extracting the entity attribute expression rules of T according to the extracted entity attributes, taking the rule set so obtained as the current rule set, and repeating this process k times to obtain a final rule set, where k is a nonnegative integer; and
An entity attribute extraction device, for using the final rule set to extract entity attributes from T.
CN201410065743.7A 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system Active CN103853823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410065743.7A CN103853823B (en) 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410065743.7A CN103853823B (en) 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system

Publications (2)

Publication Number Publication Date
CN103853823A (en) 2014-06-11
CN103853823B (en) 2017-01-18

Family

ID=50861478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410065743.7A Active CN103853823B (en) 2014-02-26 2014-02-26 Online encyclopedia oriented entity attribute extraction method and system

Country Status (1)

Country Link
CN (1) CN103853823B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345625A (en) * 2017-01-25 2018-07-31 北京搜狗科技发展有限公司 A kind of information mining method and device, a kind of device for information excavating
CN109344346A (en) * 2018-08-14 2019-02-15 广州神马移动信息科技有限公司 Webpage information extracting method and device
CN112434530A (en) * 2019-08-06 2021-03-02 富士通株式会社 Information processing apparatus, information processing method, and computer program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1588371A (en) * 2004-09-08 2005-03-02 孟小峰 Forming method for package device
US20080065621A1 (en) * 2006-09-13 2008-03-13 Kenneth Alexander Ellis Ambiguous entity disambiguation method
CN101464905A (en) * 2009-01-08 2009-06-24 中国科学院计算技术研究所 Web page information extraction system and method
US20100185689A1 (en) * 2009-01-20 2010-07-22 Microsoft Corporation Enhancing Keyword Advertising Using Wikipedia Semantics
CN102262658A (en) * 2011-07-13 2011-11-30 东北大学 Method for extracting web data from bottom to top based on entity
CN102495847A (en) * 2011-11-16 2012-06-13 浙江盘石信息技术有限公司 Network commodity information extraction method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BING L等: "Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning", 《PROCEEDINGS OF THE SIXTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING》 *
张苇如等: "基于维基百科和模式聚类的实体关系抽取方法", 《中文信息学报》 *
李红亮等: "基于规则的百科人物属性抽取", 《集成技术》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345625A (en) * 2017-01-25 2018-07-31 北京搜狗科技发展有限公司 A kind of information mining method and device, a kind of device for information excavating
CN108345625B (en) * 2017-01-25 2022-09-30 北京搜狗科技发展有限公司 Information mining method and device for information mining
CN109344346A (en) * 2018-08-14 2019-02-15 广州神马移动信息科技有限公司 Webpage information extracting method and device
CN112434530A (en) * 2019-08-06 2021-03-02 富士通株式会社 Information processing apparatus, information processing method, and computer program

Also Published As

Publication number Publication date
CN103853823B (en) 2017-01-18

Similar Documents

Publication Publication Date Title
Pournarakis et al. A computational model for mining consumer perceptions in social media
CN103778207B (en) The topic method for digging of the news analysiss based on LDA
JP6661790B2 (en) Method, apparatus and device for identifying text type
CN102722709B (en) Method and device for identifying garbage pictures
CN105930425A (en) Personalized video recommendation method and apparatus
CN102253996B (en) Multi-visual angle stagewise image clustering method
CN106250550A (en) A kind of method and apparatus of real time correlation news content recommendation
CN103559199B (en) Method for abstracting web page information and device
CN104239539A (en) Microblog information filtering method based on multi-information fusion
CN103336766A (en) Short text garbage identification and modeling method and device
CN102855312A (en) Domain-and-theme-oriented Web service clustering method
CN104731768A (en) Incident location extraction method oriented to Chinese news texts
CN103761239A (en) Method for performing emotional tendency classification to microblog by using emoticons
CN103425686B (en) A kind of information issuing method and device
CN103294664A (en) Method and system for discovering new words in open fields
CN105975639B (en) Search result ordering method and device
CN103793501A (en) Theme community discovery method based on social network
CN104199846A (en) Comment subject term clustering method based on Wikipedia
CN112183078B (en) Text abstract determining method and device
CN103810251A (en) Method and device for extracting text
CN106296288A (en) A kind of commodity method of evaluating performance under assessing network text guiding
CN107392392A (en) Microblogging forwarding Forecasting Methodology based on deep learning
Bai et al. SR-LDA: Mining effective representations for generating service ecosystem knowledge maps
CN103853823A (en) Online encyclopedia oriented entity attribute extraction method and system
CN105138684A (en) Information processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant