CN104102738B - Method and device for expanding an entity library - Google Patents


Info

Publication number
CN104102738B
Authority
CN
China
Prior art keywords
entity
entity word
field
word
library
Legal status
Active
Application number
CN201410364026.4A
Other languages
Chinese (zh)
Other versions
CN104102738A (en)
Inventor
梁爽 (Liang Shuang)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
2014-07-28
Filing date
2014-07-28
Publication date
2018-04-27
2014-07-28 Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
2014-07-28 Priority to CN201410364026.4A
2014-10-15 Publication of CN104102738A
Application granted
2018-04-27 Publication of CN104102738B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method and device for expanding an entity library. The method includes: acquiring structured data from a resource library; identifying entity words from the field contents of preset meaning fields of the structured data; screening the entity words according to preset rules; and, if a screened entity word does not appear in the entity library, adding it to the entity library so as to expand the library. This improves the accuracy of the entity words added when expanding an entity library.

Description

Method and device for expanding an entity library
Technical field
The present invention relates to the field of Internet information processing technology, and in particular to a method and device for expanding an entity library.
Background
With the continuous development of communication and network technology, people increasingly use the Internet to search for all kinds of knowledge and information. Content providers on the Internet have built platforms on which all users can equally browse, create, and improve content.
Examples include Baidu Baike, Wikipedia, and Hudong Baike, where Internet users can find the comprehensive, accurate, and objective definitional information they need, and contribute content that other users can query and browse on similar topics as knowledge or reference. For example, an entry is the basic unit of content on an encyclopedia website; an entry has one or more single topics and describes a thing, a person, or a combination of knowledge content on a particular topic. Encyclopedia websites contain a vast number of entries. These entries can greatly improve the accuracy and coverage of retrieval, and they facilitate extracting structured data from web pages for vertical search, yielding more precise information.
As information spreads widely and the content people exchange keeps broadening, new terms emerge constantly. Discovering valuable entries at scale and expanding the entity library of an encyclopedia website is an important goal of encyclopedia products. A common approach analyzes existing data, uses text segmentation to find entity words that may be present in the text, determines which of these entity words already exist in the encyclopedia entity library and which do not, and adds the missing ones to the library. However, this approach suffers from inaccurate text segmentation and inaccurate attribute recognition.
Summary of the invention
In view of this, embodiments of the present invention provide a method and device for expanding an entity library, to overcome the inaccurate text segmentation and attribute recognition of existing approaches to expanding encyclopedia entity libraries.
In a first aspect, an embodiment of the present invention provides a method for expanding an entity library, including:
acquiring structured data from a resource library;
identifying entity words from the field contents of preset meaning fields of the structured data;
screening the entity words according to preset rules;
if a screened entity word does not appear in the entity library, adding the entity word to the entity library so as to expand it.
In a second aspect, an embodiment of the present invention further provides a device for expanding an entity library, including:
a structured data recognition unit, configured to acquire structured data from a resource library;
an entity word recognition unit, configured to identify entity words from the field contents of preset meaning fields of the structured data;
an entity word screening unit, configured to screen the entity words according to preset rules;
an entity word adding unit, configured to add a screened entity word to the entity library if it does not appear there, so as to expand the entity library.
In the technical solution of the embodiments of the present invention, structured data is acquired from a resource library, entity words are identified from the field contents of preset meaning fields, and after screening, the entity words not yet in the entity library are added to it. Because the preset meaning fields of structured data already segment the content into units that each correspond to a particular meaning, entity words can be extracted from them with a higher success rate, which improves the accuracy of the entity words added to the library.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from the content of the embodiments and these drawings without creative effort.
Fig. 1 is a flowchart of the method for expanding an entity library according to Embodiment 1 of the present invention;
Fig. 2 is a screenshot of a first example table contained in an example entry in Baidu Baike;
Fig. 3 is a screenshot of a second example table contained in an example entry in Baidu Baike;
Fig. 4 is a flowchart of the method for expanding an entity library according to Embodiment 2 of the present invention;
Fig. 5 is a structural diagram of the device for expanding an entity library according to Embodiment 3 of the present invention.
Detailed description of embodiments
To make the technical problem solved, the technical solutions adopted, and the technical effects achieved by the present invention clearer, the technical solutions of the embodiments are described in further detail below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The technical solutions of the present invention are further described below with reference to the drawings and specific embodiments.
Embodiment 1
Fig. 1 is a flowchart of the method for expanding an entity library provided by Embodiment 1 of the present invention. This embodiment applies to expanding an entity library using structured data in a resource library. An entity word in this embodiment means a noun or pronoun, and may further be restricted to nouns and pronouns that satisfy preset conditions. The entity library is a database storing information related to each entity word, through which related data about entity words can be provided to users. For example, in an encyclopedia an entity word is the subject name of an entry; the entry is the basic unit of content on an encyclopedia website and contains the entity word, an explanation of it, and information related to it. Entity libraries of other categories, such as a music entity library or a commodity entity library, can likewise use song titles, commodity names, and so on as entity words, and store detailed related data for each entity word, such as the background of a piece of music or the place of origin of a commodity.
The method of this embodiment can be executed by a device for expanding an entity library configured in a server. As shown in Fig. 1, the method for expanding an entity library in this embodiment includes:
S101: acquire structured data from a resource library.
Structured data is data stored separately in at least one preset meaning field and can usually be expressed logically with a two-dimensional table structure. Data in a relational database is structured data; in files, structured data includes tables, charts, reports, and similarly structured data. The data in a preset meaning field satisfies that field's preset meaning and therefore shares certain characteristics, for example all being person names or all being addresses. Structured storage thus pre-divides the data by preset meaning field, so each piece of data carries certain attribute characteristics.
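The two-dimensional table view of structured data can be sketched as follows. This is a minimal illustration, assuming each row is a mapping from a preset meaning field (column name) to its content; the helper name `column` and the sample values are hypothetical placeholders, with only the field names borrowed from the Fig. 2 example.

```python
from typing import Dict, List

# A piece of structured data viewed as a 2-D table: each row maps a preset
# meaning field (a column name) to that row's content for the field.
Row = Dict[str, str]


def column(table: List[Row], field: str) -> List[str]:
    """Return the non-empty contents of one preset meaning field (one column)."""
    return [row[field] for row in table if row.get(field)]


# Hypothetical table; field names follow the Fig. 2 example, values are placeholders.
filmography: List[Row] = [
    {"Title": "Film A", "Role played": "Character P", "Director": "Director X"},
    {"Title": "Film B", "Role played": "", "Director": "Director Y"},
]

print(column(filmography, "Director"))     # ['Director X', 'Director Y']
print(column(filmography, "Role played"))  # ['Character P']
```

Because every value in a column shares that field's meaning, reading one column already yields candidates of a single attribute type, which is the property the method relies on.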
The resource library in this embodiment can be any form of data source, such as a database, a file package, a web page resource library, or an electronic document, as long as structured data can be obtained from it and entity words to be added to the entity library can be mined from that structured data.
Because the purpose of this embodiment is to expand an entity library, the content of the resource library used is preferably highly relevant to the content of the entity library. Moreover, the related data that introduces an entity word in the entity library often mentions many other strongly associated entity words, which makes it well suited as a tool for expansion. For example, to expand an encyclopedia entity library, an encyclopedia resource library is preferably used as the resource library. Taking singers as an example, the related data introducing the entity word "Liu Dehua" (Andy Lau) is likely to mention many entity words associated with this singer, such as other celebrities, songs, and films; searching the related structured data of existing entity words and screening out entity words for expansion therefore has a higher success rate.
S102: identify entity words from the field contents of preset meaning fields of the structured data.
Because structured data can be expressed logically with a two-dimensional table structure, the field contents of the same field (that is, one column of the structured data) generally belong to the same category. When an entity library needs to be expanded, this embodiment can, according to the category of entity words to be added and with reference to the expansion target, either set a condition on field names or enumerate the fields that meet the expansion target. The preset meaning fields that meet the expansion target are then selected from the acquired structured data, the contents of the selected fields are obtained, and entity words are identified from those contents. If entity words cannot be identified directly from the contents of some field, the contents can first be segmented and entity word identification performed afterwards.
For example, if the target is to expand entity words of the person category, the condition can check whether a field name contains words such as "person", "member", "people", or "actor"; alternatively, field names that meet the target, such as "Role played", "Director", "Co-stars", and "Singer", can be enumerated. Taking enumerated field names as an example, the three fields "Role played", "Director", and "Co-stars" can be selected as preset meaning fields from the "Filmography" table in the structured data of the encyclopedia entry "Liu Dehua", as shown in Fig. 2. The field "Singer" can also be selected as a preset meaning field from the "Works created for others" table in the same entry, as shown in Fig. 3.
Entity words can be identified directly from the contents of the "Role played", "Director", and "Singer" fields, whereas the contents extracted from the "Co-stars" field must first be split into separate names before entity words can be identified.
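The two field-selection modes (a keyword condition on field names, or an enumerated list of field names) and the splitting of a multi-valued cell can be sketched as below. This is a hedged sketch: the function names are hypothetical, the keyword list is taken from the example above, and the separator set is an assumption, since the patent does not say which delimiters a co-star list uses.

```python
import re
from typing import Iterable, List, Optional, Set

# Keyword condition for the person category, taken from the example in the text.
PERSON_KEYWORDS = ("person", "member", "people", "actor")

# Assumed separators for multi-valued cells; the patent does not list them.
SEPARATORS = re.compile(r"[、,，;；/]+")


def select_fields(field_names: Iterable[str],
                  keywords: Iterable[str] = (),
                  enumerated: Optional[Set[str]] = None) -> List[str]:
    """Keep fields that are enumerated, or whose name contains a keyword."""
    enumerated = enumerated or set()
    lowered = tuple(k.lower() for k in keywords)
    return [name for name in field_names
            if name in enumerated or any(k in name.lower() for k in lowered)]


def split_cell(cell: str) -> List[str]:
    """Cut a multi-valued cell (such as a co-star list) into candidate entity words."""
    return [part.strip() for part in SEPARATORS.split(cell) if part.strip()]


columns = ["Title", "Role played", "Director", "Co-stars", "Release year"]
print(select_fields(columns, enumerated={"Role played", "Director", "Co-stars"}))
print(select_fields(["Lead actor", "Year", "Cast members"], keywords=PERSON_KEYWORDS))
print(split_cell("Actor A、Actor B, Actor C"))
```

The first call keeps the three enumerated fields, the second keeps "Lead actor" and "Cast members" by keyword match, and the third returns three separate names. Substring matching on field names is a deliberately simple stand-in for whatever condition a real deployment would configure.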
S103: screen the entity words according to preset rules.
The preset rules can be set according to the expansion target of the entity library. For example, entity words whose number of characters exceeds a preset threshold can be filtered out, entity words on a blacklist can be filtered out, and/or entity words of preset kinds (such as those containing serial numbers, times, or special symbols) can be filtered out.
It should be noted that the preset rules may include screening rules applied to the contents of all preset meaning fields, and may also include screening rules specific to the contents of each individual preset meaning field.
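The screening step, with rules shared by all fields plus optional per-field rules, might look like the following sketch. The threshold, blacklist, and "preset kind" pattern are hypothetical values chosen for illustration; the patent leaves them to be set per expansion target.

```python
import re
from typing import Callable, Dict, List, Optional

MAX_CHARS = 20                   # hypothetical length threshold
BLACKLIST = {"unknown", "n/a"}   # hypothetical blacklist
# Hypothetical "preset kind" test: digits (serial numbers, years, times) or special symbols.
PRESET_KIND = re.compile(r"[0-9#*@&%$<>{}\[\]]")


def passes_common_rules(word: str) -> bool:
    """Rules applied to the contents of every preset meaning field."""
    return (len(word) <= MAX_CHARS
            and word.lower() not in BLACKLIST
            and PRESET_KIND.search(word) is None)


def screen(words: List[str], field: str,
           field_rules: Optional[Dict[str, Callable[[str], bool]]] = None) -> List[str]:
    """Apply the common rules, then the rule registered for this field (if any)."""
    field_rule = (field_rules or {}).get(field, lambda w: True)
    return [w for w in words if passes_common_rules(w) and field_rule(w)]


candidates = ["Director X", "No. 3 Studio", "unknown", "A name far longer than twenty chars"]
print(screen(candidates, "Director"))  # ['Director X']
print(screen(["Actor A", "Bob"], "Co-stars", {"Co-stars": lambda w: " " not in w}))  # ['Bob']
```

Keeping the common rules separate from the per-field dictionary mirrors the note above: one rule set for all preset meaning fields, and an optional rule for each individual field.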
S104: if a screened entity word does not appear in the entity library, add the entity word to the entity library so as to expand it.
After entity words have been obtained through the operations up to S103, it is still necessary to determine whether each entity word already appears in the entity library; the entity words that do not appear in the library are added to it.
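A minimal sketch of S104, assuming the entity library can be modelled as a dictionary keyed by entity word (the function name `expand_library` and the empty placeholder record are hypothetical):

```python
from typing import Dict, Iterable, List


def expand_library(library: Dict[str, dict], candidates: Iterable[str]) -> List[str]:
    """Add each screened entity word missing from the library; return the words added."""
    added: List[str] = []
    for word in candidates:
        if word not in library:   # membership check against existing entities
            library[word] = {}    # empty record; related data would be filled in elsewhere
            added.append(word)
    return added


library = {"Film A": {}}
print(expand_library(library, ["Film A", "Film B", "Film B"]))  # ['Film B']
```

Because each word is inserted as soon as it is found missing, a repeated candidate is not added twice even without a separate deduplication pass.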
In the technical solution of this embodiment, structured data is acquired from a resource library, entity words are identified from the field contents of preset meaning fields, and after screening, the entity words not in the entity library are added to it. This can eliminate ambiguity in entity words and narrow the scope of identification within the structured data. Because the preset meaning fields of structured data already segment the content and each corresponds to a particular meaning, entity words are extracted with a higher success rate, which improves the accuracy and efficiency of entity word identification and therefore of entity library expansion.
Embodiment 2
Fig. 4 is a flowchart of the method for expanding an entity library according to Embodiment 2 of the present invention. Taking the expansion of an encyclopedia entity library with structured data from an encyclopedia resource library as an example, this embodiment discloses a method for expanding an entity library. As shown in Fig. 4, the method includes:
S401: acquire structured data from the encyclopedia entity library.
Preferably, the resource library can be the encyclopedia entity library itself; that is, entity words are mined from within the encyclopedia entity library to expand it.
In general, for convenient retrieval and data management, the existing entity words in an encyclopedia entity library are classified into categories such as songs, films, persons, nature, culture, geography, history, life, society, art, economy, science and technology, and sports, and some categories are further divided into subcategories. Therefore, to improve the hit rate, the operation of acquiring structured data from the resource library can preferably acquire structured data only from the categories of the encyclopedia entity library that are associated with the category of the entity words to be added. For example, if entity words of the film category need to be added and the categories associated with films are the film category and the person category, structured data needs to be acquired only from those two categories. This narrows the search scope for structured data and thereby improves the efficiency of expanding the entity library.
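Restricting S401 to associated categories can be sketched as a lookup table plus a filter. The association table below is hypothetical except for the film example (film maps to film and person), which comes from the text; entry titles are placeholders.

```python
from typing import Dict, List, Set

# Hypothetical association table: target category -> library categories worth mining.
# Only the "film" row comes from the example in the text.
ASSOCIATED_CATEGORIES: Dict[str, Set[str]] = {
    "film": {"film", "person"},
    "song": {"song", "person"},
}


def entries_to_mine(entries: List[dict], target: str) -> List[dict]:
    """Keep only entries whose category is associated with the target category."""
    allowed = ASSOCIATED_CATEGORIES.get(target, {target})
    return [e for e in entries if e["category"] in allowed]


entries = [
    {"title": "Entry 1", "category": "film"},
    {"title": "Entry 2", "category": "person"},
    {"title": "Entry 3", "category": "geography"},
]
print([e["title"] for e in entries_to_mine(entries, "film")])  # ['Entry 1', 'Entry 2']
```

When a target has no configured associations, the sketch falls back to mining only the target's own category.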
S402: obtain the preset meaning fields of the structured data.
When the encyclopedia entity library needs to be expanded, this embodiment can, according to the category of entity words to be added and with reference to the expansion target, set a condition on field names or enumerate the fields that meet the expansion target, and select from the acquired structured data the preset meaning fields that meet the target; for example, fields such as time and address can be filtered out. The contents of the selected fields are then obtained, and entity words are identified from them.
S403: obtain the field contents of the preset meaning fields of the structured data.
If entity words cannot be identified directly from the contents of some field, the contents can first be segmented and entity word identification performed afterwards.
S404: filter out field contents that have internal links.
An internal link in this embodiment refers to a link within the entity library: if related data for an entity word exists in the entity library, then whenever that entity word appears in the related data of other entity words, an internal link to it can be established, so that users can conveniently find its related data. For example, in an encyclopedia entity library, internal links can be added inside each entry to the existing entries it mentions, so that users can follow them to the pages and categories of those other entries. In the "Role played" column of the "Filmography" table in the structured data of the encyclopedia entry "Liu Dehua" (shown in Fig. 2), some field contents have internal links and some do not (the circled content in Fig. 2). Content with internal links already exists as entity words in the encyclopedia and does not need to be added. Therefore, to improve efficiency, such content can be filtered out after the field contents are obtained and before entity word identification.
For example, entity words of the person category are identified from the three preset meaning fields "Role played", "Director", and "Co-stars" of the "Filmography" table (shown in Fig. 2) in the structured data of the encyclopedia entry "Liu Dehua". After these field contents are obtained, those with internal links are filtered out, and only the content without internal links (the circled content in Fig. 2) is kept. Similarly, entity words of the song category are identified from the "Song title" column of the "Works created for others" table (shown in Fig. 3) in the same entry; after the field contents with internal links are filtered out, only the circled content without internal links (shown in Fig. 3) is kept. Pre-screening by filtering out field contents that have internal links narrows the scope of entity word identification and thus improves efficiency.
S405: identify entity words from the filtered field contents.
S406: screen the entity words according to preset rules.
S407: deduplicate the entity words.
It should be noted that this operation can be performed either after or before screening. Deduplicating the identified entity words further reduces the number of entity words to be processed in operation S408 and avoids adding the same word twice.
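The claim that deduplication can run before or after screening holds whenever each screening rule judges one word at a time. The sketch below checks this with an order-preserving deduplication; the rule `not_a_song` is a hypothetical stand-in for the preset rules of S406.

```python
from typing import Callable, Iterable, List


def deduplicate(words: Iterable[str]) -> List[str]:
    """Remove repeated entity words, keeping the first occurrence of each."""
    return list(dict.fromkeys(words))


def screen(words: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Keep the words that pass a per-word rule."""
    return [w for w in words if keep(w)]


def not_a_song(word: str) -> bool:
    # Hypothetical per-word rule standing in for the preset rules of S406.
    return not word.startswith("Song")


words = ["Film B", "Song C", "Film B", "Person D", "Film B"]
print(deduplicate(screen(words, not_a_song)))  # ['Film B', 'Person D']
print(screen(deduplicate(words), not_a_song))  # ['Film B', 'Person D']
```

Both orders give the same list; deduplicating first simply means the rules are evaluated on fewer words.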
S408: if an entity word does not appear among the entity words of the encyclopedia, add it to the encyclopedia entity library.
Taking the expansion of an encyclopedia entity library with structured data from an encyclopedia resource library as an example, this embodiment builds on Embodiment 1 by adding the operation of filtering out field contents that have internal links and the operation of deduplicating entity words, which further improves the efficiency of expanding the entity library.
Embodiment 3
Fig. 5 is a structural diagram of the device for expanding an entity library according to Embodiment 3 of the present invention. As shown in Fig. 5, the device for expanding an entity library in this embodiment includes:
a structured data recognition unit 501, configured to acquire structured data from a resource library;
an entity word recognition unit 502, configured to identify entity words from the field contents of preset meaning fields of the structured data;
an entity word screening unit 503, configured to screen the entity words according to preset rules;
an entity word adding unit 504, configured to add a screened entity word to the entity library if it does not appear there, so as to expand the entity library.
Further, the resource library is an encyclopedia resource library.
Further, the entity word recognition unit 502 is specifically configured to:
obtain the field contents of the preset meaning fields of the structured data;
if the field contents have no internal link in the resource library, identify entity words from the field contents.
Further, the entity word screening unit 503 is specifically configured to:
filter out entity words that meet at least one of the following: the number of characters in the entity word exceeds a preset threshold; the entity word is on a blacklist; the entity word contains a predetermined symbol; the entity word belongs to a preset kind.
Further, the entity word screening unit 503 is additionally configured to deduplicate the entity words before they are added to the entity library.
The device for expanding an entity library provided in this embodiment can execute the methods for expanding an entity library provided in Embodiments 1 and 2 of the present invention, and has the functional modules and beneficial effects corresponding to those methods.
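One way to picture units 501 to 504 working together is a single class with one method per unit. This is a hypothetical composition, not the patented device: the data formats, the one-entity-word-per-cell simplification, and the length rule used in the example are assumptions, and the internal-link filter of unit 502 is omitted for brevity.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, str]


class EntityLibraryExpander:
    """Hypothetical composition of units 501-504 from Embodiment 3."""

    def __init__(self, library: Set[str], fields: List[str],
                 keep: Callable[[str], bool]) -> None:
        self.library = library   # names already in the entity library
        self.fields = fields     # preset meaning fields to mine, in a fixed order
        self.keep = keep         # screening rule applied to each entity word

    def acquire(self, resource: Iterable[List[Row]]) -> List[Row]:
        """Unit 501: flatten the tables of the resource library into rows."""
        return [row for table in resource for row in table]

    def recognize(self, rows: List[Row]) -> List[str]:
        """Unit 502: read entity words from the selected fields (one per non-empty cell)."""
        return [row[f].strip() for row in rows for f in self.fields
                if row.get(f, "").strip()]

    def screen(self, words: List[str]) -> List[str]:
        """Unit 503: apply the screening rule, then deduplicate (order preserved)."""
        return list(dict.fromkeys(w for w in words if self.keep(w)))

    def add(self, words: List[str]) -> List[str]:
        """Unit 504: add the words missing from the library; return them."""
        new = [w for w in words if w not in self.library]
        self.library.update(new)
        return new

    def run(self, resource: Iterable[List[Row]]) -> List[str]:
        return self.add(self.screen(self.recognize(self.acquire(resource))))


library = {"Director X"}
expander = EntityLibraryExpander(library, ["Director", "Role played"],
                                 keep=lambda w: len(w) <= 20)
resource = [[
    {"Title": "Film A", "Director": "Director X", "Role played": "Character P"},
    {"Title": "Film B", "Director": "Director Y", "Role played": ""},
]]
print(expander.run(resource))  # ['Character P', 'Director Y']
```

Keeping each unit as a separate method makes it easy to swap in a different screening rule or field list without touching the other units, which matches the modular split shown in Fig. 5.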
All or part of the technical solutions provided in the above embodiments can be implemented by software programming, with the software program stored in a readable storage medium, such as a hard disk, optical disc, or floppy disk in a computer.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the invention. Therefore, although the invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the invention is determined by the appended claims.

Claims (10)

  1. A method for expanding an entity library, characterized by comprising:
    acquiring structured data from a resource library, the structured data being data stored separately in at least one preset meaning field and expressed logically with a two-dimensional table structure;
    identifying entity words from the field contents of preset meaning fields of the structured data, wherein the preset meaning fields are the fields meeting an expansion target that are selected from the acquired structured data by setting a condition on fields, or by enumerating the fields that meet the expansion target, according to the category of entity words to be added and with reference to the expansion target;
    screening the entity words according to preset rules;
    if a screened entity word does not appear in the entity library, adding the entity word to the entity library so as to expand it.
  2. The method according to claim 1, characterized in that the resource library is an encyclopedia resource library.
  3. The method according to claim 2, characterized in that the operation of identifying entity words from the field contents of the preset meaning fields of the structured data specifically comprises:
    obtaining the field contents of the preset meaning fields of the structured data;
    if the field contents have no internal link in the resource library, identifying entity words from the field contents.
  4. The method according to claim 1, characterized in that the operation of screening the entity words according to preset rules specifically comprises:
    filtering out entity words that meet at least one of the following: the number of characters in the entity word exceeds a preset threshold; the entity word is on a blacklist; the entity word contains a predetermined symbol; the entity word belongs to a preset kind.
  5. The method according to claim 1, characterized by further comprising, before the operation of adding the entity word to the entity library: deduplicating the entity words.
  6. A device for expanding an entity library, characterized by comprising:
    a structured data recognition unit, configured to acquire structured data from a resource library, the structured data being data stored separately in at least one preset meaning field and expressed logically with a two-dimensional table structure;
    an entity word recognition unit, configured to identify entity words from the field contents of preset meaning fields of the structured data, wherein the preset meaning fields are the fields meeting an expansion target that are selected from the acquired structured data by setting a condition on fields, or by enumerating the fields that meet the expansion target, according to the category of entity words to be added and with reference to the expansion target;
    an entity word screening unit, configured to screen the entity words according to preset rules;
    an entity word adding unit, configured to add a screened entity word to the entity library if it does not appear there, so as to expand the entity library.
  7. The device according to claim 6, characterized in that the resource library is an encyclopedia resource library.
  8. The device according to claim 7, characterized in that the entity word recognition unit is specifically configured to:
    obtain the field contents of the preset meaning fields of the structured data;
    if the field contents have no internal link in the resource library, identify entity words from the field contents.
  9. The device according to claim 6, characterized in that the entity word screening unit is specifically configured to filter out entity words that meet at least one of the following: the number of characters in the entity word exceeds a preset threshold; the entity word is on a blacklist; the entity word contains a predetermined symbol; the entity word belongs to a preset kind.
  10. The device according to claim 6, characterized in that the entity word screening unit is further configured to deduplicate the entity words before the operation of adding them to the entity library.
CN201410364026.4A 2014-07-28 2014-07-28 Method and device for expanding an entity library Active CN104102738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410364026.4A CN104102738B (en) 2014-07-28 2014-07-28 Method and device for expanding an entity library

Publications (2)

Publication Number Publication Date
CN104102738A CN104102738A (en) 2014-10-15
CN104102738B CN104102738B (en) 2018-04-27

Family

ID=51670891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410364026.4A Active CN104102738B (en) 2014-07-28 2014-07-28 A kind of method and device for expanding entity storehouse

Country Status (1)

Country Link
CN (1) CN104102738B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106168947A (en) * 2016-07-01 2016-11-30 北京奇虎科技有限公司 A kind of related entities method for digging and system
CN110309355B (en) * 2018-06-15 2023-05-16 腾讯科技(深圳)有限公司 Content tag generation method, device, equipment and storage medium

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
JP2005056361A (en) * 2003-08-07 2005-03-03 Sony Corp Information processor and method, program, and storage medium
US7698293B2 (en) * 2005-01-28 2010-04-13 Microsoft Corporation System and methods for capturing structure of data models using entity patterns
CN101901235B (en) * 2009-05-27 2013-03-27 国际商业机器公司 Method and system for document processing
JP5315368B2 (en) * 2011-02-28 2013-10-16 株式会社日立製作所 Document processing device
CN103106189B (en) * 2011-11-11 2016-04-27 北京百度网讯科技有限公司 A kind of method and apparatus excavating synonym attribute word
CN102495892A (en) * 2011-12-09 2012-06-13 北京大学 Webpage information extraction method
CN103425660B (en) * 2012-05-15 2017-10-17 北京百度网讯科技有限公司 The acquisition methods and device of a kind of entry
CN103440287B (en) * 2013-08-14 2016-12-28 广东工业大学 A kind of Web question and answer searching system based on product information structure

Also Published As

Publication number Publication date
CN104102738A (en) 2014-10-15

Similar Documents

Publication Publication Date Title
US10180967B2 (en) Performing application searches
CN101364239B (en) Method for auto constructing classified catalogue and relevant system
CN108268580A (en) The answering method and device of knowledge based collection of illustrative plates
CN104111941B (en) The method and apparatus that information is shown
US10713291B2 (en) Electronic document generation using data from disparate sources
US20200301987A1 (en) Taste extraction curation and tagging
JP2009093653A (en) Refining search space responding to user input
CN104978332B (en) User-generated content label data generation method, device and correlation technique and device
JP2009093651A (en) Modeling topics using statistical distribution
Yao et al. Bursty event detection from collaborative tags
JP2009093650A (en) Selection of tag for document by paragraph analysis of document
US20170109358A1 (en) Method and system of determining enterprise content specific taxonomies and surrogate tags
Baralis et al. Analysis of twitter data using a multiple-level clustering strategy
Faralli et al. Automatic acquisition of a taxonomy of microblogs users’ interests
JP2009093647A (en) Determination for depth of word and document
CN111708805A (en) Data query method and device, electronic equipment and storage medium
CN110222194A (en) Data drawing list generation method and relevant apparatus based on natural language processing
CN107977420A (en) The abstract extraction method, apparatus and readable storage medium storing program for executing of a kind of evolved document
US10650191B1 (en) Document term extraction based on multiple metrics
CN108255963A (en) A kind of control method and device of the News Retrieval based on internet
Bhardwaj et al. A novel approach for content extraction from web pages
CN104102738B (en) A kind of method and device for expanding entity storehouse
CN104102739B (en) A kind of method and device for expanding entity storehouse
CN104239314A (en) Search word expanding method and system
JP2008210335A (en) Consciousness system construction system, consciousness system construction method, and consciousness system construction program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant