CN104133868B - A strategy for integrating the classification of vertical crawler data - Google Patents

A strategy for integrating the classification of vertical crawler data

Info

Publication number
CN104133868B
Authority
CN
China
Prior art keywords
classification
crawler
category
data
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410347438.7A
Other languages
Chinese (zh)
Other versions
CN104133868A (en)
Inventor
Wu Meihong (吴梅红)
Hong Zhiling (洪志令)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-07-21
Filing date
2014-07-21
Publication date
2018-01-05
Application filed by Xiamen University
Priority to CN201410347438.7A
Publication of CN104133868A
Application granted
Publication of CN104133868B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A strategy for integrating the classification of vertical crawler data, comprising two parts: a category system with its mapping mechanism, and a dynamic category data integration mechanism. The category system and its mapping mechanism comprise the following steps: 1) building a reference category system; 2) building the category systems of the crawl target websites; 3) building the category system mapping mechanism. The strategy effectively integrates the category systems of data that a vertical crawler has fetched and parsed from unstructured pages, preserves the integrity of each source category system, and also allows the source category systems to be tracked dynamically.

Description

A strategy for integrating the classification of vertical crawler data
Technical field
The present invention relates to the technical field of vertical search engines, and more particularly to a strategy for integrating the classification of vertical crawler data.
Background
With the explosive growth of web information, search engines have become increasingly valuable and are now an indispensable tool for network users, providing information navigation and query services. A search engine aggregates the many web resources on the internet, returns pages relevant to the user's query keywords ranked by relevance, and serves as the entry point to the whole internet. At present, general-purpose search engines are the main providers of query services, but their generality means they cannot satisfy professionals' need for precise information in specialized fields. Users' information needs are diverse, so search engine services must also be segmented to provide more accurate industry information for different fields, and this has driven the rapid development of vertical search.
A single search on a general-purpose search engine may return tens of thousands or even hundreds of millions of results. Although this is already heavily filtered relative to the vastness of the Web, it is still far more than a user can browse, which creates a large mismatch between the user's query request and the search service and wastes effort. Vertical search engines emerged in response to this problem. As more specialized and targeted search engines, they search only for information in a specific field. Because the field covered is narrow, the amount of information is much smaller, which reduces the difficulty of collecting it and improves its quality.
A vertical search engine is a specialized search engine for a particular field. It collects and integrates a certain class of specialized information on the internet, performs targeted extraction according to the structure of that information, processes the unstructured information into structured information, and returns it to the user as a list of results.
The most critical part of a search engine is the web page collector, which consists of a crawler program (also known as a web fetcher, web robot, or web spider). A crawler is a program that automatically and continuously searches for web pages on the network. Given only a small number of start pages, the collector roams the network along the hyperlinks on each page according to certain rules, gathering resource information until the whole site has been traversed. Its performance largely determines the scale of the search engine site.
At the scale of today's internet, a crawler running on a single machine cannot search the whole Web in a reasonable time, so the crawlers in use today all run in parallel across multiple machines and are called distributed crawlers. A crawler controller provides centralized control: it manages all crawler nodes and ensures that web resources are crawled by the individual crawlers in a balanced way and without duplication. A page analyzer filters the crawled pages, removing large amounts of HTML tags and junk information to obtain the valuable page content.
Existing general search strategies try to obtain as much data as possible, but they process that data at a relatively low level. The prominent problems are: too much invalid information (noisy data), too little valid information, valid information that is unstructured, and no personalized optimization of the returned results.
For example, web directory (site navigation) websites such as hao123.com and 2345.com each have their own URL category system. When we want to integrate the resources of these websites, crawling them with a vertical crawler and parsing the unstructured data yields a series of category data, and we then face questions such as: How can these category systems be integrated effectively? How can the integrity of each source category system be preserved? How can the source category systems be tracked dynamically? Solving these problems is the focus and difficulty of current research.
Summary of the invention
The main object of the present invention is to overcome the above drawbacks of the prior art by proposing a strategy for integrating the classification of vertical crawler data.
The present invention adopts the following technical solution:
A strategy for integrating the classification of vertical crawler data, characterized in that it comprises two parts, a category system with its mapping mechanism and a dynamic category data integration mechanism, wherein the category system and its mapping mechanism comprise the following steps:
1) building a reference category system;
2) building the category systems of the crawl target websites;
3) building the category system mapping mechanism.
Preferably, step 1) builds the reference category system using a three-level classification scheme. The system includes a category ID, a major category, a middle category and a minor category, and the reference category system has its corresponding category content.
Preferably, step 2) builds the category systems of multiple crawl target websites using the three-level classification scheme. The category system of each crawl target website includes a category ID, a major category, a middle category and a minor category, and each has its corresponding category content.
Preferably, in step 2), a corresponding category system is created for each target website to be crawled. During crawling, the category data obtained by parsing the unstructured pages is placed in the category system of the respective crawl target website, and the detailed information of the specific content is placed in the corresponding category content.
Preferably, the category content includes the specific information of a website and its source website.
Preferably, in step 3), building the category system mapping mechanism means that the reference category system provides the specific names of its minimum categories, the category system of each crawl target website likewise provides the specific names of its minimum categories, and mapping relations are established by matching these minimum category names.
Preferably, the dynamic category data integration mechanism means that the categories obtained by the crawler are integrated according to the following cases:
A. A newly added category already exists in the reference category system: add the category to the category system of the crawl target website, and insert the new category ID thus obtained at the corresponding mapping position in the mapping structure that was built;
B. A newly added category does not exist in the reference category system: first add the category and obtain a new category ID, then search for a semantically similar category by category name. If a category with high semantic similarity is found, insert the new category ID at the corresponding mapping position in the mapping structure; otherwise, add a new category and add a new mapping relation to the mapping structure;
C. Categories are merged in the reference category system: merge the corresponding categories in the mapping structure;
D. A category is split in the reference category system: D1. split the category in the reference category system into two categories, one of which replaces the original category while the other is added as a new category; D2. manually reclassify the categories of each crawl target website that were mapped to the original reference category, and map them to the two new categories;
E. A category is added to or deleted from the reference category system: after a new category is added to the reference category system, place the new category ID and its minimum category in the corresponding mapping structure; for deletion, simply delete the relevant category from the reference category system.
Preferably, in case B, high semantic similarity is determined mainly from the similarity of the two category names. For two category names W1 and W2, where W1 comprises the concept set {S11, S12, …, S1n} and W2 comprises the concept set {S21, S22, …, S2m}, the similarity of W1 and W2 is given by [formula not reproduced in this text], where n and m denote the number of concepts corresponding to the two category names being compared. Each concept in a concept set is described by four sememe features: the first basic sememe description, the other basic sememe descriptions, the relational sememe description, and the relational symbol description. The similarity of two concepts is the weighted sum of the partial similarities of these four features, the partial similarities being computed from the path distance between sememes: [formula not reproduced in this text], where β_i (i = 1, 2, 3, 4) are the weight coefficients of the four features of a concept, which are adjustable parameters satisfying [constraint not reproduced in this text].
As can be seen from the above description, compared with the prior art, the present invention has the following advantages:
The invention can provide Internet users with accurate and effective information, so that they do not need to search repeatedly and expend extra effort. The new method for integrating the classification of vertical crawler data offers a new strategy for the problems of crawler data collection: collected data that is disorganized and lacks hierarchy, an unstable collection process, and low efficiency.
Brief description of the drawings
Fig. 1 is a structural diagram of the strategy system of the present invention.
Detailed description of the embodiments
The invention is further described below by way of specific embodiments.
Referring to Fig. 1, a strategy for integrating the classification of vertical crawler data comprises two parts, a category system with its mapping mechanism and a dynamic category data integration mechanism, wherein the category system and its mapping mechanism comprise the following steps:
1) building a reference category system;
2) building the category systems of the crawl target websites;
3) building the category system mapping mechanism.
In step 1), the reference category system is the category system of the integrated website. It serves as the benchmark against which the category systems of the other websites are aligned. It can be built using a three-level classification scheme, with the structure shown in Table 1. It has four attributes: category ID, major category, middle category and minor category. The category ID is generated automatically when a category is added, and the major, middle and minor categories are denoted M_Ai, M_Bj and M_Ck respectively, where i, j and k are the auto-incrementing serial numbers of the major, middle and minor categories. The major, middle and minor categories can be entered manually, or the category system of one of the websites to be crawled can be taken as a reference and then modified.
Under the reference category system there is corresponding category content information, as shown in Table 2. The category content serves as the source of the specific information of the integrated website. X, Y, Z and so on in Table 2 denote source websites. The category content information also records the source website of each item, which makes it easier to trace the origin of the information later.
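The four attributes of Table 1 and the content records of Table 2 can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names and sample values are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Category:
    category_id: int   # generated automatically when the category is added
    major: str         # major category (M_A)
    middle: str        # middle category (M_B)
    minor: str         # minor category (M_C), the minimum category


@dataclass
class ContentItem:
    category_id: int
    info: str          # specific information, e.g. a site name or URL
    source: str        # source website, kept so the item can be traced later


@dataclass
class CategorySystem:
    name: str
    categories: dict = field(default_factory=dict)   # category_id -> Category
    contents: list = field(default_factory=list)     # list of ContentItem
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def add_category(self, major, middle, minor):
        """Add a three-level category and return its auto-generated ID."""
        cid = next(self._ids)
        self.categories[cid] = Category(cid, major, middle, minor)
        return cid

    def add_content(self, category_id, info, source):
        """Attach one content item, with its source website, to a category."""
        if category_id not in self.categories:
            raise KeyError(f"unknown category id {category_id}")
        self.contents.append(ContentItem(category_id, info, source))


# Hypothetical sample data.
reference = CategorySystem("reference")
news_id = reference.add_category("Media", "News", "Domestic news")
reference.add_content(news_id, "news.example", source="X")
```

Keeping `source` on every content item is what makes the later tracing described in the text possible.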
In step 2), for the multiple target websites to be crawled, we preserve each site's complete category system structure as far as possible. A structure similar to the reference system is used, as shown in Table 3. It again has four attributes: category ID, major category, middle category and minor category. The category ID is generated automatically when a category is added, and the major, middle and minor categories are denoted M_Ai, M_Bj and M_Ck respectively, where i, j and k are the auto-incrementing serial numbers of the major, middle and minor categories. A corresponding category system is created for each target website to be crawled. During crawling, the category data obtained by parsing the unstructured pages is first placed in the category system of the respective crawl target website, and the details of the specific content are placed in the corresponding category content, as shown in Table 4. The category content includes the specific information of a website and its source website.
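As a rough illustration of this step, the sketch below creates one category system per target website and files each parsed record under that site's own categories. The record layout, the function names and the sample sites are hypothetical; the patent does not specify them.

```python
from itertools import count


def make_site_system(site):
    """Create an empty category system for one crawl target website."""
    return {"site": site, "categories": {}, "contents": [], "_ids": count(1)}


def store_record(system, record):
    """Place one parsed record into the site's own category system."""
    path = (record["major"], record["middle"], record["minor"])
    # Reuse the category ID if this three-level path was seen before.
    for cid, existing in system["categories"].items():
        if existing == path:
            break
    else:
        cid = next(system["_ids"])
        system["categories"][cid] = path
    system["contents"].append(
        {"category_id": cid, "info": record["info"], "source": system["site"]}
    )
    return cid


# Hypothetical parsed output for two sites, X and Y.
parsed = {
    "X": [
        {"major": "Life", "middle": "Shopping", "minor": "Online malls",
         "info": "shop-a.example"},
        {"major": "Life", "middle": "Shopping", "minor": "Online malls",
         "info": "shop-b.example"},
    ],
    "Y": [
        {"major": "Services", "middle": "Buy", "minor": "E-commerce",
         "info": "shop-a.example"},
    ],
}

systems = {site: make_site_system(site) for site in parsed}
for site, records in parsed.items():
    for record in records:
        store_record(systems[site], record)
```

Each site keeps its own names and IDs untouched, which is how the source category systems stay intact before any mapping is applied.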
In step 3), once the reference category system and the target website category systems have been built, the category system mapping relations are built. As shown in Table 5, the mapping relations are defined mainly on the minimum categories of each category system. The reference category system provides its category IDs and minimum categories, and the category system of each crawl target website, such as websites X, Y and Z, likewise provides its category IDs and minimum categories. The mapping relations are then built by matching category names (exact or similar match).
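The name-matching step can be sketched as follows. This is an assumption-laden illustration: the string ratio from Python's `difflib.SequenceMatcher` stands in for the "similar match", whereas the patent's own similarity measure is the HowNet-based one described for case B. The threshold of 0.8 and the sample names are also hypothetical.

```python
from difflib import SequenceMatcher


def build_mapping(reference_leaves, site_leaves, threshold=0.8):
    """Map each site's minimum categories onto reference minimum categories.

    reference_leaves: {ref_id: name}
    site_leaves: {site: {site_id: name}}
    Returns {ref_id: {site: [site_id, ...]}}.
    """
    mapping = {ref_id: {} for ref_id in reference_leaves}
    for site, leaves in site_leaves.items():
        for site_id, name in leaves.items():
            best_id, best_score = None, 0.0
            for ref_id, ref_name in reference_leaves.items():
                # Character-level ratio as a stand-in for semantic similarity.
                score = SequenceMatcher(
                    None, name.lower(), ref_name.lower()
                ).ratio()
                if score > best_score:
                    best_id, best_score = ref_id, score
            if best_id is not None and best_score >= threshold:
                mapping[best_id].setdefault(site, []).append(site_id)
    return mapping


reference_leaves = {1: "Online shopping", 2: "Domestic news"}
site_leaves = {
    "X": {10: "Online shopping", 11: "Domestic News"},
    "Y": {20: "Online-shopping", 21: "Lottery"},
}
mapping = build_mapping(reference_leaves, site_leaves)
```

In this sample, site Y's "Lottery" matches nothing and stays unmapped; handling such leftovers is what the dynamic integration cases are for.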
With the above category system and its mapping mechanism, all the data obtained by the crawler's parsing can be preserved completely, so that no data is lost. At the same time, both the information content and the categories can be traced back to their sources, which greatly facilitates later maintenance and management of the system.
Table 5: Category system mapping relations
The dynamic category data integration mechanism of the present invention means that the categories obtained by the crawler are integrated according to the following cases:
A. A newly added category already exists in the reference category system: add the category to the category system of the crawl target website, and insert the new category ID thus obtained at the corresponding mapping position in the mapping structure that was built. That is, when the category obtained by the crawler already exists in the reference category system, it is only necessary to add the category to the structure of Table 3 and insert the new category ID thus obtained at the corresponding mapping position in the structure of Table 5.
B. A newly added category does not exist in the reference category system: when the category obtained by the crawler does not exist in the reference category system, first add the category to Table 3 and obtain a new category ID, then search Table 1 by category name for a semantically similar category. If a category with high semantic similarity is found, insert the new category ID at the corresponding mapping position in the mapping structure of Table 5; otherwise, add a new category to Table 1 and add a new mapping relation to the mapping structure of Table 5.
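Cases A and B can be read as one decision procedure. The sketch below is a simplified illustration under stated assumptions: the dictionaries stand in for Tables 1, 3 and 5, the similarity function is passed in by the caller, and `toy_similarity` (a word-overlap score) is only a placeholder for the HowNet-based measure.

```python
def integrate_new_category(ref_leaves, site_leaves, mapping, site, name,
                           similarity, threshold=0.8):
    """Handle a category newly seen on `site` (cases A and B).

    ref_leaves: {ref_id: name}; site_leaves: {site: {site_id: name}}
    mapping: {ref_id: {site: [site_id, ...]}}
    similarity: callable(name_a, name_b) -> float in [0, 1]
    """
    # Both cases start by adding the category to the site's own system.
    leaves = site_leaves.setdefault(site, {})
    site_id = max(leaves, default=0) + 1
    leaves[site_id] = name

    # Case A: a reference category with the same name already exists.
    ref_id = next((r for r, n in ref_leaves.items() if n == name), None)

    if ref_id is None:
        # Case B: look for the most similar reference category name.
        scored = [(similarity(name, n), r) for r, n in ref_leaves.items()]
        best_score, best_id = max(scored, default=(0.0, None))
        if best_id is not None and best_score >= threshold:
            ref_id = best_id
        else:
            # No close match: extend the reference system itself.
            ref_id = max(ref_leaves, default=0) + 1
            ref_leaves[ref_id] = name
            mapping[ref_id] = {}

    mapping.setdefault(ref_id, {}).setdefault(site, []).append(site_id)
    return ref_id, site_id


def toy_similarity(a, b):
    """Placeholder: Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


ref_leaves = {1: "Online shopping"}
site_leaves = {"X": {1: "Online shopping"}}
mapping = {1: {"X": [1]}}

case_a = integrate_new_category(
    ref_leaves, site_leaves, mapping, "Y", "Online shopping", toy_similarity)
case_b = integrate_new_category(
    ref_leaves, site_leaves, mapping, "Y", "Lottery", toy_similarity)
```

Passing the similarity function as a parameter keeps the integration logic independent of how semantic similarity is actually computed.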
High semantic similarity is determined mainly from the similarity of the two category names. The calculation, described below, relies on the method provided by HowNet. In HowNet, "concepts" and "sememes" are the two main means of expressing semantics. Each word can be expressed as several concepts, and each concept is described by sememes; a sememe is the most basic, indivisible unit of meaning used to describe a concept.
For two category names W1 and W2, if W1 consists of the concept set {S11, S12, …, S1n} and W2 consists of the concept set {S21, S22, …, S2m}, then the similarity of W1 and W2 is as follows: [formula not reproduced in this text]
In this way, the similarity between two words is reduced to the similarity between two concepts. The description of a concept can be represented as a feature structure containing four features: the first basic sememe description, the other basic sememe descriptions, the relational sememe description, and the relational symbol description. The overall similarity of two concepts is then the weighted sum of the partial similarities of these four features: [formula not reproduced in this text]
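The two formulas referred to in this passage did not survive in this text. As a hedged reconstruction only, assuming the patent follows the commonly used HowNet-based word similarity calculation (the max aggregation and the normalization of the weights are assumptions, not confirmed by the source), they could be written as:

```latex
% Word-level similarity: best match over all concept pairs (assumed form)
\mathrm{Sim}(W_1, W_2) = \max_{1 \le i \le n,\; 1 \le j \le m} \mathrm{Sim}(S_{1i}, S_{2j})

% Concept-level similarity: weighted sum of the four partial similarities
\mathrm{Sim}(S_1, S_2) = \sum_{i=1}^{4} \beta_i \, \mathrm{Sim}_i(S_1, S_2),
\qquad \beta_1 + \beta_2 + \beta_3 + \beta_4 = 1 \quad \text{(assumed normalization)}
```

Here Sim_i denotes the partial similarity of the i-th feature, in the order in which the four features are listed in the text.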
where β_i (i = 1, 2, 3, 4) are the weight coefficients of the four features of a concept; they are adjustable parameters satisfying [constraint not reproduced in this text]. Since all concepts are ultimately described by sememes, the problem reduces to the similarity between sememes. Because all sememes form a tree-shaped hierarchy according to hypernym-hyponym relations, and in a tree there is exactly one path between any two nodes, the similarity of two sememes can be computed from the path distance between them.
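To make the path-distance idea concrete, here is a minimal sketch over a toy sememe tree. Everything specific in it is an assumption: the tree is invented, the distance-to-similarity mapping alpha / (d + alpha) and the values of alpha and the beta weights are illustrative choices, and taking the maximum over concept pairs is an assumed aggregation.

```python
# Toy sememe tree, child -> parent (hypothetical; HowNet's tree is far larger).
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "artifact": "thing",
}


def ancestors(sememe):
    """Return the chain from a sememe up to the root, inclusive."""
    chain = []
    while sememe is not None:
        chain.append(sememe)
        sememe = PARENT[sememe]
    return chain


def path_distance(a, b):
    """Number of edges on the unique tree path between two sememes."""
    up_a = ancestors(a)
    depth_in_b = {node: steps for steps, node in enumerate(ancestors(b))}
    for steps_a, node in enumerate(up_a):
        if node in depth_in_b:          # first common ancestor
            return steps_a + depth_in_b[node]
    raise ValueError("sememes are not in the same tree")


def sememe_similarity(a, b, alpha=1.6):
    """Map distance to (0, 1]; alpha is an assumed tuning parameter."""
    return alpha / (path_distance(a, b) + alpha)


def concept_similarity(partials, betas=(0.5, 0.2, 0.17, 0.13)):
    """Weighted sum of the four partial similarities (assumed weights)."""
    return sum(beta * s for beta, s in zip(betas, partials))


def word_similarity(concepts_1, concepts_2, concept_sim):
    """Best match over all concept pairs (assumed aggregation)."""
    return max(concept_sim(s1, s2) for s1 in concepts_1 for s2 in concepts_2)
```

With this toy tree, `path_distance("human", "animal")` is 2 because the two sememes share the parent "animate", and identical sememes have similarity 1.0.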
C. Categories are merged in the reference category system: to merge two categories in the reference category system, it is only necessary to merge the corresponding reference categories in the structure of Table 5. Suppose that, in the structure of Table 5, M_C5 and M_C6 are to be merged; it is then only necessary to give M_C5 and M_C6 the same name and to give ID5 and ID6 the new ID value produced by the merge.
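A merge of this kind might look as follows. This is a sketch under assumptions: the dictionaries stand in for the reference system and Table 5, the new ID is taken as the next unused integer, and the sample names are invented.

```python
def merge_reference_categories(ref_leaves, mapping, id_a, id_b, merged_name):
    """Merge two reference categories under one new ID (case C).

    ref_leaves: {ref_id: name}; mapping: {ref_id: {site: [site_id, ...]}}
    Returns the new reference ID that replaces id_a and id_b.
    """
    new_id = max(ref_leaves) + 1
    merged_sites = {}
    for old_id in (id_a, id_b):
        del ref_leaves[old_id]
        # Carry over every site category that pointed at the old ID.
        for site, site_ids in mapping.pop(old_id, {}).items():
            merged_sites.setdefault(site, []).extend(site_ids)
    ref_leaves[new_id] = merged_name
    mapping[new_id] = merged_sites
    return new_id


ref_leaves = {5: "Films", 6: "Movies", 7: "Music"}
mapping = {5: {"X": [51]}, 6: {"X": [52], "Y": [61]}, 7: {"Y": [62]}}
new_id = merge_reference_categories(ref_leaves, mapping, 5, 6, "Film and movies")
```

Only the reference side and the mapping change; the site-side category systems are not touched, which matches the statement that the merge is done in the mapping structure.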
D. A category is split in the reference category system: D1. split the category in the reference category system into two categories, one of which replaces the original category while the other is added as a new category; D2. manually reclassify the categories of each crawl target website that were mapped to the original reference category, and map them to the two new categories. Splitting a category in the reference classification is comparatively laborious. Suppose that, in the structure of Table 5, category M_C8 is to be split; the following two steps are then required:
D1. Split the reference category into two categories, one of which replaces the original category M_C8, and add the other as a new category, for example M_C9.
D2. Manually reclassify the categories of each crawl target website that were mapped to the original reference category, and map them to the new M_C8 and M_C9 categories.
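Steps D1 and D2 can be sketched as below. The manual reclassification of D2 cannot be automated, so in this illustration it is represented by a caller-supplied `assign` function; that function, the dictionary layout and the sample names are all hypothetical.

```python
def split_reference_category(ref_leaves, mapping, old_id,
                             kept_name, new_name, assign):
    """Split reference category old_id in two (case D).

    assign: callable(site, site_id) -> "kept" or "new"; it stands in for
    the manual reclassification of step D2.
    Returns the ID of the newly added category.
    """
    # D1: one category replaces the original, the other is newly added.
    ref_leaves[old_id] = kept_name
    new_id = max(ref_leaves) + 1
    ref_leaves[new_id] = new_name

    # D2: redistribute every site category mapped to the original.
    old_rows = mapping.get(old_id, {})
    mapping[old_id], mapping[new_id] = {}, {}
    for site, site_ids in old_rows.items():
        for site_id in site_ids:
            target = old_id if assign(site, site_id) == "kept" else new_id
            mapping[target].setdefault(site, []).append(site_id)
    return new_id


ref_leaves = {8: "Games"}
mapping = {8: {"X": [81, 82], "Y": [91]}}
# Hypothetical manual decisions for each affected site category.
decisions = {("X", 81): "kept", ("X", 82): "new", ("Y", 91): "new"}
new_id = split_reference_category(
    ref_leaves, mapping, 8, "PC games", "Mobile games",
    lambda site, site_id: decisions[(site, site_id)])
```

Every site category mapped to the original has to be revisited, which is why the number of manual decisions grows with the number of crawled sites.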
This case is rather special, and splitting is generally not recommended. When building the reference classification, one should therefore refer to existing category systems as far as possible and make the classification granularity as fine as possible.
E. A category is added to or deleted from the reference category system: after a new category is added to the reference category system, place the new category ID and its minimum category in the corresponding mapping structure of Table 5; for deletion, simply delete the relevant category from the reference category system.
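Case E is the simplest of the five. In the sketch below, the dictionaries again stand in for the reference system and Table 5. One point is an assumption rather than something the text states: on deletion the sketch also removes the mapping row and returns it, so the caller can decide what to do with the site categories that pointed at the deleted entry.

```python
def add_reference_category(ref_leaves, mapping, name):
    """Add a reference category and an empty mapping row for it."""
    new_id = max(ref_leaves, default=0) + 1
    ref_leaves[new_id] = name
    mapping[new_id] = {}
    return new_id


def delete_reference_category(ref_leaves, mapping, ref_id):
    """Delete a reference category; site-side categories are left untouched.

    Returns the orphaned mapping row (an assumed convenience).
    """
    del ref_leaves[ref_id]
    return mapping.pop(ref_id, {})


ref_leaves = {1: "Online shopping"}
mapping = {1: {"X": [10]}}
added_id = add_reference_category(ref_leaves, mapping, "Lottery")
orphaned = delete_reference_category(ref_leaves, mapping, 1)
```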
The above is only a specific embodiment of the present invention, but the design concept of the invention is not limited to it. Any insubstantial change made to the invention using this concept shall be regarded as an act infringing the scope of protection of the present invention.

Claims (6)

  1. A strategy for integrating the classification of vertical crawler data, characterized in that it comprises two parts, a category system with its mapping mechanism and a dynamic category data integration mechanism, wherein the category system and its mapping mechanism comprise the following steps:
    1) building a reference category system using a three-level classification scheme, the system including a category ID, a major category, a middle category and a minor category, and the reference category system having its corresponding category content;
    2) building the category systems of the crawl target websites;
    3) building the category system mapping mechanism;
    the dynamic category data integration mechanism means that the categories obtained by the crawler are integrated according to the following cases:
    A. a newly added category already exists in the reference category system: add the category to the category system of the crawl target website, and insert the new category ID thus obtained at the corresponding mapping position in the mapping structure that was built;
    B. a newly added category does not exist in the reference category system: first add the category and obtain a new category ID, then search for a semantically similar category by category name; if a category with high semantic similarity is found, insert the new category ID at the corresponding mapping position in the mapping structure; otherwise, add a new category and add a new mapping relation to the mapping structure;
    C. categories are merged in the reference category system: merge the corresponding categories in the mapping structure;
    D. a category is split in the reference category system: D1. split the category in the reference category system into two categories, one of which replaces the original category while the other is added as a new category; D2. manually reclassify the categories of each crawl target website that were mapped to the original reference category, and map them to the two new categories;
    E. a category is added to or deleted from the reference category system: after a new category is added to the reference category system, place the new category ID and its minimum category in the corresponding mapping structure; for deletion, simply delete the relevant category from the reference category system.
  2. The strategy for integrating the classification of vertical crawler data as claimed in claim 1, characterized in that step 2) builds the category systems of multiple crawl target websites using the three-level classification scheme, the category system of each crawl target website includes a category ID, a major category, a middle category and a minor category, and the category system of each crawl target website has its corresponding category content.
  3. The strategy for integrating the classification of vertical crawler data as claimed in claim 2, characterized in that, in step 2), a corresponding category system is created for each target website to be crawled; during crawling, the category data obtained by parsing the unstructured pages is placed in the category system of the respective crawl target website, and the detailed information of the specific content is placed in the corresponding category content.
  4. The strategy for integrating the classification of vertical crawler data as claimed in claim 1 or 2, characterized in that the category content includes the specific information of a website and its source website.
  5. The strategy for integrating the classification of vertical crawler data as claimed in claim 1, characterized in that, in step 3), building the category system mapping mechanism means that the reference category system provides the specific names of its minimum categories, the category system of each crawl target website likewise provides the specific names of its minimum categories, and mapping relations are established by matching these minimum category names.
  6. The strategy for integrating the classification of vertical crawler data as claimed in claim 1, characterized in that, in case B, high semantic similarity is determined mainly from the similarity of the two category names: for two category names W1 and W2, where W1 comprises the concept set {S11, S12, …, S1n} and W2 comprises the concept set {S21, S22, …, S2m}, the similarity of W1 and W2 is given by [formula not reproduced in this text], where n and m denote the number of concepts corresponding to the two category names being compared; each concept in a concept set is described by four sememe features: the first basic sememe description, the other basic sememe descriptions, the relational sememe description, and the relational symbol description; the similarity of two concepts is the weighted sum of the partial similarities of these four features, the partial similarities being computed from the path distance between sememes: [formula not reproduced in this text], where β_i (i = 1, 2, 3, 4) are the weight coefficients of the four features of a concept, which are adjustable parameters satisfying [constraint not reproduced in this text].

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410347438.7A CN104133868B (en) 2014-07-21 2014-07-21 A strategy for integrating the classification of vertical crawler data

Publications (2)

Publication Number Publication Date
CN104133868A 2014-11-05
CN104133868B 2018-01-05

Family

ID=51806546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410347438.7A Expired - Fee Related CN104133868B (en) 2014-07-21 2014-07-21 A strategy for integrating the classification of vertical crawler data

Country Status (1)

Country Link
CN (1) CN104133868B (en)

Also Published As

Publication number Publication date
CN104133868A (en) 2014-11-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wu Meihong

Inventor after: Hong Zhiling

Inventor before: Hong Zhiling

Inventor before: Wu Meihong

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180105

Termination date: 20180721
