CN104750754A - Website industry classification method and server - Google Patents

Website industry classification method and server

Info

Publication number
CN104750754A
Authority
CN
China
Prior art keywords
website
sorted
notional word
information
server
Legal status
Pending
Application number
CN201310753049.XA
Other languages
Chinese (zh)
Inventor
高宁
杨莹
Current Assignee
BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd
Original Assignee
BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd
Application filed by BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd
Priority to CN201310753049.XA
Publication of CN104750754A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a server for classifying the industry to which a website belongs. In the method, the server acquires the web page content information of a website to be classified; performs word segmentation on all text in that content to generate a set of notional words; matches every notional word in the set against preset keywords for each industry category and counts how many times each category's keywords occur in the set; and determines the industry category of the website from the proportions of those occurrence counts. The method and server effectively solve the prior-art problem that judging the industry category of websites manually consumes considerable labor and is inefficient.

Description

Website industry classification method and server
Technical field
The present invention relates to information technology, and in particular to a method and a server for classifying the industry to which a website belongs.
Background
With the development of Internet technology, the number of domestic websites has grown rapidly. These websites provide a wide range of services to Internet users and span many industries: for example, the corporate websites of enterprises in various sectors, and government websites that offer online government services or information inquiry. If the specific industry of each domestic website could be identified, websites in the same industry category could be located from that industry information, which would greatly benefit the categorization of website information and the quality of search-engine results.
In the prior art, the industry of each website is judged manually, which not only consumes considerable labor but is also inefficient.
Summary of the invention
The invention provides a method and a server for classifying the industry to which a website belongs, to solve the technical problem in the prior art that manually judging the industry of each website consumes considerable labor and is inefficient.
In one aspect, an embodiment of the present invention provides a method for classifying the industry to which a website belongs, comprising:
a server acquires web page content information of a website to be classified;
the server performs word segmentation on all text contained in the web page content information, to generate a set of notional words corresponding to the web page content information;
the server matches all notional words in that set against preset keywords corresponding to each industry category, and determines the number of times the keywords of each industry category occur in the set; and
the server determines the industry category of the website to be classified according to the proportions of the numbers of times the keywords of the different industry categories occur in the set.
In another aspect, an embodiment of the present invention provides a server, comprising:
an acquisition module, configured to acquire web page content information of a website to be classified;
a word segmentation module, configured to perform word segmentation on all text contained in the web page content information, to generate a set of notional words corresponding to the web page content information;
a matching module, configured to match all notional words in that set against preset keywords corresponding to each industry category, and to determine the number of times the keywords of each industry category occur in the set; and
a determination module, configured to determine the industry category of the website to be classified according to the proportions of the numbers of times the keywords of the different industry categories occur in the set.
With the method and server provided by the invention, the server acquires the web page content information of a website to be classified, segments all of its text into a set of notional words, matches those words against the preset keywords of each industry category, counts how many times each category's keywords occur in the set, and determines the website's industry category from the proportions of those counts. The industry of each website can thus be judged efficiently without extensive manual labor.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the method for classifying the industry to which a website belongs, provided by the invention;
Fig. 2 is a flowchart of an embodiment, provided by the invention, that determines the industry of a website from the suffix of its domain name;
Fig. 3 is a flowchart of an embodiment, provided by the invention, that determines the industry of a website from its registrant information;
Fig. 4 is a flowchart of an embodiment, provided by the invention, that determines the industry of a website from the website name;
Fig. 5 is a flowchart of an embodiment, provided by the invention, that determines the industry of a website from the description information of its homepage;
Fig. 6 is a schematic structural diagram of an embodiment of a server provided by the invention.
Detailed description of the embodiments
Fig. 1 is a flowchart of an embodiment of the method for classifying the industry to which a website belongs. The steps of the method may be executed by a server capable of acquiring information related to websites. As shown in Fig. 1, the method specifically comprises:
S101: the server acquires the web page content information of the website to be classified.
The server obtains the web page content information of the website to be classified by means of an existing web information scraping tool, such as a web crawler (a program or script that captures information from the website). The web page content information covers the content of all web pages of the website, including text, images, and so on.
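The patent leaves the crawler unspecified. As a minimal sketch of the step that follows fetching, the code below pulls the visible text out of one HTML page using only the Python standard library `html.parser`, skipping `<script>` and `<style>` contents. The function name `extract_page_text` is hypothetical, fetching and image handling are left out, and a production crawler would also follow links across the whole site.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # keep non-empty text that is not inside a script/style element
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_page_text(html: str) -> str:
    """Return the visible text of one HTML page as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The per-page strings can then be concatenated into the site-level "web page content information" that step S102 segments.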
S102: the server performs word segmentation on all text contained in the web page content information, to generate the corresponding set of notional words.
After acquiring the web page content information of a website to be classified, the server segments all of its text with a word segmentation tool, generating the set of notional words corresponding to that website's web page content. The set contains all notional words that describe the web page content of the website.
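Chinese text has no spaces between words, so a segmentation step is needed before any keyword matching. The patent only says "a word segmentation tool"; the sketch below swaps in the classic forward maximum matching (FMM) algorithm against a hypothetical dictionary, then keeps dictionary words that are not in a hypothetical stopword list as a crude stand-in for selecting notional (content) words. A real system would more likely use a trained segmenter with part-of-speech tags.

```python
def segment_fmm(text, dictionary, max_len=None):
    """Forward maximum matching: at each position take the longest
    dictionary word; unmatched characters become single-char tokens."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # try the longest candidate first, shrinking down to one character
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def notional_words(text, dictionary, stopwords):
    """Keep dictionary words that are not stopwords (hypothetical filter
    standing in for part-of-speech based notional-word selection)."""
    return [t for t in segment_fmm(text, dictionary)
            if t in dictionary and t not in stopwords]
```

For example, with a dictionary containing 服务器, 服务器租用 and 虚拟主机, the text 我们提供服务器租用和虚拟主机 segments into 服务器租用 rather than 服务器 + 租用, because FMM prefers the longest match.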
S103: the server matches all notional words in the set against the preset keywords corresponding to each industry category, and determines the number of times the keywords of each industry category occur in the set.
The keywords of each industry category are extracted in advance by the server from notional-word statistics over the web page content of a large number of websites whose industries have already been classified. Each industry category corresponds to a certain number of keywords, and these keywords indicate with high probability that a website containing them belongs to the corresponding category. For example, in this embodiment the server divides websites on the network in advance into multiple industry categories, including industry and agriculture, electronic services, culture and sports, newspapers, advertising and media, e-commerce, machinery and equipment, and IT services. Taking the IT services industry as an example, its keywords may include: server rental, server hosting, smart bandwidth rental, dual-line servers, rack rental, website hosting, virtual hosts, and so on.
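The patent does not say how the keywords are chosen from these statistics. One plausible sketch, under stated assumptions: count each notional word per industry over the pre-classified sites, keep words whose occurrences fall mostly within one industry ("purity" of at least a hypothetical `min_purity`), then take the `top_n` most frequent of those. Both parameters and the function name `build_keyword_lexicon` are hypothetical.

```python
from collections import Counter


def build_keyword_lexicon(labeled_sites, top_n=50, min_purity=0.8):
    """labeled_sites: iterable of (industry, notional_words) pairs for sites
    whose industry is already known. Returns {industry: set of keywords}."""
    per_industry = {}
    overall = Counter()
    for industry, words in labeled_sites:
        per_industry.setdefault(industry, Counter()).update(words)
        overall.update(words)
    lexicon = {}
    for industry, counts in per_industry.items():
        # drop generic words (e.g. "company") that are spread across industries
        pure = [(w, c) for w, c in counts.items() if c / overall[w] >= min_purity]
        # most frequent first; ties broken alphabetically for determinism
        pure.sort(key=lambda wc: (-wc[1], wc[0]))
        lexicon[industry] = {w for w, _ in pure[:top_n]}
    return lexicon
```

The purity filter is a design choice: without it, words common to every site would dominate each industry's keyword list.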
The server matches all notional words in the set for the website to be classified against the preset keywords of each industry category, and determines how many times each category's keywords occur in the set. For example, after segmenting all text of the web page content of the website abc.com into a set of notional words and matching those words against the preset keywords of each category, the server finds that the IT services keywords "server rental", "server hosting" and "smart bandwidth rental" occur 1, 2 and 3 times respectively; it therefore determines that the IT services keywords occur 6 times in total in the set for abc.com.
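The counting in this step can be sketched as follows, assuming the notional words are already segmented and the lexicon is a mapping from industry name to a set of keywords (a hypothetical in-memory format). The keywords are English renderings of the patent's examples, and `count_industry_hits` is a hypothetical name.

```python
from collections import Counter


def count_industry_hits(words, lexicon):
    """Return {industry: total occurrences of that industry's keywords in words}."""
    freq = Counter(words)  # missing keys count as 0
    return {industry: sum(freq[k] for k in keywords)
            for industry, keywords in lexicon.items()}


lexicon = {
    "IT services": {"server rental", "server hosting",
                    "smart bandwidth rental", "virtual host"},
    "E-commerce": {"group-buying site"},
}
# the abc.com example: 1 + 2 + 3 keyword occurrences
words = (["server rental"] + ["server hosting"] * 2
         + ["smart bandwidth rental"] * 3 + ["homepage"])
print(count_industry_hits(words, lexicon))
# {'IT services': 6, 'E-commerce': 0}
```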
S104: the server determines the industry category of the website to be classified according to the proportions of the numbers of times the keywords of each industry category occur in the set.
By counting how many times the keywords of each industry category occur in the notional-word set of the current website, the server determines the ratio between the counts of the different categories and decides, from the relative sizes of these counts, which category the website finally belongs to. In general, the larger a category's share of keyword occurrences, the more likely it is to be the actual category of the website. In this embodiment, the server specifically takes the industry category whose keywords occur most often in the set as the industry of the website to be classified. In practice, however, the keywords of a few categories may occur far more often than those of all other categories while being roughly balanced among themselves. For example, two categories may account for 40% and 36% of all keyword occurrences in the set, together 76%. In such a case, this scheme may assign both categories to the current website.
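The decision rule can be sketched with the share of category i defined as n_i / (n_1 + ... + n_k), where n_i is that category's keyword count. The patent does not give a threshold for the "roughly balanced" case, so `co_share=0.3` below is a hypothetical parameter: when two or more categories each reach it, all of them are returned; otherwise only the most frequent category is.

```python
def decide_industries(hits, co_share=0.3):
    """hits: {industry: keyword occurrences}. Returns the chosen industries,
    most frequent first, or [] when no keyword matched at all."""
    total = sum(hits.values())
    if total == 0:
        return []
    shares = {ind: n / total for ind, n in hits.items()}
    ranked = sorted(shares, key=lambda ind: -shares[ind])
    # co-dominant case: every category whose share reaches the threshold
    strong = [ind for ind in ranked if shares[ind] >= co_share]
    return strong if len(strong) > 1 else ranked[:1]
```

With counts of 40, 36 and 24, the first two categories have shares 0.40 and 0.36 and are both returned, matching the 40% / 36% example; with counts 6 and 0, only the first category is returned.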
In the method provided by the invention, the server acquires the web page content information of a website to be classified, segments all of its text into a set of notional words, matches those words against the preset keywords of each industry category, counts how many times each category's keywords occur in the set, and determines the website's industry category from the proportions of those counts. The industry of each website can thus be judged efficiently without extensive manual labor.
Building on the method shown in Fig. 1, before the server acquires the web page content information of the website to be classified, this scheme may further match relevant information of the website in four respects, namely the suffix of its domain name, its registrant information, its website name, and the description information of its homepage, and thereby determine the industry category of the website. After these four respects have all been applied, the website may further be classified with the method shown in Fig. 1. During classification, if the industry of the current website still cannot be determined after applying one of the four methods or a combination of several of them, the website can be classified directly with the method of the embodiment shown in Fig. 1. This embodiment places no restriction on how many methods are combined or in what order.
The methods for classifying a website by each of these four respects are described in detail below.
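The fallback logic above amounts to a cascade: cheap checks run first, and the full content analysis of Fig. 1 runs only if none of them yields an industry. A minimal sketch, assuming each check is a hypothetical callable that takes a dict of site information and returns an industry name or `None`; since the patent leaves the number and order of checks open, they are passed in as a sequence.

```python
from typing import Callable, Optional, Sequence

# a classifier takes site information and returns an industry or None
Classifier = Callable[[dict], Optional[str]]


def classify_site(site: dict, quick_checks: Sequence[Classifier],
                  content_classifier: Classifier) -> Optional[str]:
    """Try the quick checks in the given order; fall back to content analysis."""
    for check in quick_checks:
        industry = check(site)
        if industry is not None:
            return industry  # first check that succeeds decides
    return content_classifier(site)  # Fig. 1 method as the fallback
```

For instance, a suffix check placed first would answer directly for an ".edu.cn" host, while a ".com" host with no registrant or name match would reach the content classifier.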
Fig. 2 is a flowchart of an embodiment that determines the industry of a website from the suffix of its domain name. As shown in Fig. 2, the method specifically comprises:
S201: the server acquires the domain suffix of the website to be classified.
The server obtains the domain suffix of the website to be classified by means of an existing web information scraping tool, such as a web crawler program or script.
S202: the server matches the domain suffix of the website to be classified against the preset domain suffixes corresponding to each industry category.
The server extracts in advance the domain suffixes of a large number of websites whose industries have already been classified, and stores each extracted suffix, grouped by category, as a domain suffix of the corresponding industry. For example, the suffix ".edu.cn" corresponds to education websites, and ".mil.cn" to military websites.
By matching the domain suffix of the website to be classified against the preset suffixes of each industry category, the server can determine the industry of the current website directly.
S203: if an identical domain suffix is matched, the server determines the industry category corresponding to that suffix as the industry of the website to be classified.
When the server matches the domain suffix of the current website against the preset suffixes of each industry category and finds an identical suffix, it takes the industry category of the matched suffix as the industry of the website. If no identical suffix is found among the preset suffixes of any category, the server considers the classification of this website to have failed and proceeds to acquire the website's web page content information, so as to perform the steps of the method shown in Fig. 1.
The method of this embodiment, which determines the industry of a website from the suffix of its domain name, improves the efficiency of classifying websites by industry.
Fig. 3 is a flowchart of an embodiment that determines the industry of a website from its registrant information. As shown in Fig. 3, the method specifically comprises:
S301: the server acquires the registrant information of the website to be classified.
The server obtains the registrant information of the website to be classified by means of an existing web information scraping tool, such as a web crawler program or script.
S302: the server performs word segmentation on all text contained in the registrant information, to generate the corresponding set of notional words.
After acquiring the registrant information of a website to be classified, the server segments all of its text with a word segmentation tool, generating the set of notional words corresponding to that website's registrant information. The set contains all notional words that describe the registrant of the website.
S303: the server matches all notional words in the set for the registrant information against the preset keywords corresponding to each industry category.
For an explanation of the preset keywords of each industry category, see the corresponding content of step S103.
S304: if the keywords of an industry category match notional words in the set for the registrant information, the server determines the industry category with the largest number of matched notional words as the industry of the website to be classified.
The server matches all notional words in the set for the registrant information of the website against the preset keywords of each industry category, determines how many times each category's keywords occur in that set, and takes the category whose keywords occur most often as the industry of the website. For example, if the registrant is Beijing Machinery Plant, then of all the notional words in the corresponding set, only "machinery" matches a keyword of the machinery and equipment category. The keywords of machinery and equipment therefore occur once in the set, and those of every other category occur zero times, so the server determines machinery and equipment as the industry of the website. Conversely, if none of the notional words in the set matches any of the preset keywords of any category, the server considers the classification of this website to have failed and proceeds to acquire the website's web page content information, so as to perform the steps of the method shown in Fig. 1.
The method of this embodiment, which determines the industry of a website from its registrant information, improves the efficiency of classifying websites by industry.
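Steps S303 and S304 can be sketched as below. The same matching also fits the website-name (Fig. 4) and homepage-description (Fig. 5) embodiments, since only the input field differs. The English keywords are illustrative renderings of the patent's examples, and `classify_by_field` is a hypothetical name. When two categories tie, Python's `max` keeps the first one in dict order; the patent does not specify tie handling.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_by_field(words: Iterable[str], lexicon: dict) -> Optional[str]:
    """Match the notional words of a short field (e.g. registrant name)
    against each industry's keywords; return the industry with the most
    matches, or None when nothing matches (fall back to Fig. 1)."""
    freq = Counter(words)
    hits = {ind: sum(freq[k] for k in kws) for ind, kws in lexicon.items()}
    best = max(hits, key=hits.get, default=None)
    if best is None or hits[best] == 0:
        return None
    return best
```

With the Beijing Machinery Plant example, the words "Beijing", "machinery" and "plant" give one hit for the machinery category and zero elsewhere, so that category is returned.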
Fig. 4 is a flowchart of an embodiment that determines the industry of a website from the website name. As shown in Fig. 4, the method specifically comprises:
S401: the server acquires the name of the website to be classified.
The server obtains the name of the website to be classified by means of an existing web information scraping tool, such as a web crawler program or script.
S402: the server performs word segmentation on all text contained in the website name, to generate the corresponding set of notional words.
After acquiring the name of a website to be classified, the server segments all of its text with a word segmentation tool, generating the set of notional words corresponding to the website name. The set contains all notional words that describe the name of the website.
S403: the server matches all notional words in the set for the website name against the preset keywords corresponding to each industry category.
For an explanation of the preset keywords of each industry category, see the corresponding content of step S103.
S404: if the keywords of an industry category match notional words in the set for the website name, the server determines the industry category with the largest number of matched notional words as the industry of the website to be classified.
The server matches all notional words in the set for the name of the website against the preset keywords of each industry category, determines how many times each category's keywords occur in that set, and takes the category whose keywords occur most often as the industry of the website. For example, for a website named "XXX Group-Buying Net", of all the notional words in the corresponding set only "group-buying net" matches a keyword of the e-commerce category. The e-commerce keywords therefore occur once in the set and those of every other category zero times, so the server determines e-commerce as the industry of the website. Conversely, if none of the notional words in the set matches any of the preset keywords of any category, the server considers the classification of this website to have failed and proceeds to acquire the website's web page content information, so as to perform the steps of the method shown in Fig. 1.
The method of this embodiment, which determines the industry of a website from its name, improves the efficiency of classifying websites by industry.
Fig. 5 is a flowchart of an embodiment that determines the industry of a website from the description information of its homepage. As shown in Fig. 5, the method specifically comprises:
S501: the server acquires the description information of the homepage of the website to be classified; this description information comprises multiple key fields that describe the homepage.
The server obtains the homepage description information of the website to be classified by means of an existing web information scraping tool, such as a web crawler program or script. The homepage description information is a "summary" made up of multiple key fields that web developers write, when developing the website, to describe the homepage as a whole, including its title, its field and its functions; it is embedded in the website's script information.
S502, all words comprised in multiple critical field information in the homepage face of website to be sorted are carried out word segmentation processing by server, with the notional word set that the descriptor in the homepage face generating website to be sorted is corresponding;
After server gets the descriptor in homepage face of website to be sorted, the all Word messages comprised in the descriptor in this homepage face are carried out word segmentation processing by participle instrument, thus the notional word set that the descriptor generating the homepage face of each website to be sorted is corresponding.All notional words of the descriptor of the first content of pages for describing this website to be sorted are contained in this notional word set.
S503, the keyword that all notional words comprised in notional word set corresponding for the descriptor in the homepage face of website to be sorted are corresponding with the every profession and trade classification preset mates by server;
Wherein, the explanation for keyword corresponding to every profession and trade classification preset can see the corresponding contents of step 103.
S504, the notional word that the notional word comprised in the notional word set corresponding with the descriptor in the homepage face of website to be sorted if exist in the keyword that every profession and trade classification is corresponding matches, then categorys of employment maximum for the notional word number of this coupling is defined as the category of employment belonging to website to be sorted by server;
The keyword that all notional words comprised in notional word set corresponding for the descriptor in the homepage face of above-mentioned website to be sorted are corresponding with above-mentioned default every profession and trade classification mates by server, determine the number of times that keyword corresponding to each category of employment occurs in the notional word set that the descriptor in this homepage face is corresponding, and the number of times of the keyword matched is occurred that maximum categorys of employment is defined as the category of employment belonging to website to be sorted.Certainly, if all notional words comprised in the notional word set that the descriptor in the homepage face of above-mentioned website to be sorted is corresponding all do not match identical notional word in the keyword that above-mentioned default every profession and trade classification is corresponding, then server thinks that this is treated classifieds website and carries out trade classification failure, and determine the web page content information obtaining website to be sorted, to perform the step of the sorting technique of industry belonging to website shown in Fig. 1.
The method of this embodiment judges the industry of a website from its homepage description information, which improves the efficiency of website industry classification.
This scheme further provides another method of classifying the industry of a website. It builds on the method shown in Fig. 1 and adds the following after step 104:
The server extracts feature words from the web page content information of the website whose industry category has now been determined. A feature word is a word that can be used to judge and describe the industry category of any website containing it.
The server adds the feature words to the keywords of the industry category to which the website belongs. The server then uses them as an additional basis when it classifies websites later.
This method extracts feature words from the web page content information of a website whose industry category has been determined. It adds those words to the keywords of that industry category. This enlarges the keyword vocabulary of each industry category and improves the accuracy and efficiency of later industry classification.
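The feature-word update above can be sketched as follows. The section does not say how feature words are chosen. For illustration only, this sketch assumes they are the most frequent notional words on the classified page that are not yet a keyword of any industry. The function names and the `top_k`/`min_count` thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def extract_feature_words(
    notional_words: List[str],
    industry_keywords: Dict[str, Set[str]],
    top_k: int = 3,
    min_count: int = 2,
) -> List[str]:
    """Hypothetical extraction rule: among the top_k most frequent notional
    words that are not yet a keyword of any industry, keep those occurring
    at least min_count times."""
    known: Set[str] = set().union(*industry_keywords.values())
    counts = Counter(w for w in notional_words if w not in known)
    return [w for w, c in counts.most_common(top_k) if c >= min_count]


def update_keywords(
    industry_keywords: Dict[str, Set[str]], industry: str, feature_words: List[str]
) -> None:
    """Add the feature words to the keyword set of the determined industry."""
    industry_keywords.setdefault(industry, set()).update(feature_words)
```

Excluding words that are already keywords of any industry stops one page from pulling another industry's vocabulary into its own category.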
One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium. When executed, it performs the steps of the above method embodiments. The storage medium may be any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Fig. 6 is a schematic structural diagram of an embodiment of a server provided by the invention. The server can perform the method steps shown in Fig. 1 and comprises:
Acquisition module 61, configured to obtain the web page content information of the website to be classified;
Word segmentation module 62, configured to perform word segmentation on all characters in the web page content information to generate the corresponding notional word set;
Matching module 63, configured to match all notional words in the notional word set against the preset keywords of each industry category, and to determine the number of times the keywords of each industry category occur in the notional word set;
Determination module 64, configured to determine the industry category of the website to be classified from the proportion of the occurrence counts of each industry category's keywords in the notional word set.
Specifically, the server of this embodiment carries out the website industry classification method as follows:
Acquisition module 61 obtains the web page content information of the website to be classified. Details of this acquisition are given in step 101.
Word segmentation module 62 performs word segmentation on all characters in the web page content information obtained by acquisition module 61, generating the corresponding notional word set. Details are given in step 102.
Matching module 63 matches all notional words in the notional word set against the preset keywords of each industry category. It also determines the number of times the keywords of each industry category occur in the set. Details are given in step 103.
Determination module 64 determines the industry category of the website to be classified from the proportion of these occurrence counts in the notional word set. Details are given in step 104.
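The section does not specify how the word segmentation of step 102 (module 62) works. One possible stand-in is dictionary-based forward maximum matching, a common technique for Chinese text. The sketch below uses it and treats every dictionary word that is not on a hypothetical function-word list as a notional word. The dictionary, the function-word list and the `max_len` window are all assumptions.

```python
from typing import List, Set


def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary
    word of at most max_len characters, otherwise a single character."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def notional_words(
    text: str, dictionary: Set[str], function_words: Set[str], max_len: int = 4
) -> List[str]:
    """Keep dictionary words that are not function words. Duplicates are
    preserved so that the matching step can count occurrences."""
    return [
        t for t in fmm_segment(text, dictionary, max_len)
        if t in dictionary and t not in function_words
    ]
```

Example: take the dictionary {我们, 提供, 在线, 教育, 课程, 在线教育} and the function word 我们. The text 我们提供在线教育课程 segments into 我们 / 提供 / 在线教育 / 课程 and yields the notional words 提供, 在线教育, 课程. The longer entry 在线教育 wins over 在线 + 教育.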
Further, determination module 64 decides the industry category from these proportions by selecting the industry category whose keywords occur most often in the notional word set of the web page content information. That category is determined as the industry category of the website to be classified.
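Below is a minimal sketch of the matching and determination steps (modules 63 and 64). It assumes that the proportion is each industry's keyword-hit count divided by the total number of notional words. The section does not define the denominator, but it is the same for every industry, so the selected category is the one with the most keyword occurrences either way. Returning `None` when no keyword matches is an assumed way of signalling a failed classification.

```python
from typing import Dict, List, Optional, Set


def keyword_proportions(
    notional_words: List[str], industry_keywords: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Matching step: for each industry, the share of notional words
    (counted with repetition) that are among that industry's keywords."""
    if not notional_words:
        return {}
    total = len(notional_words)
    return {
        industry: sum(1 for w in notional_words if w in keywords) / total
        for industry, keywords in industry_keywords.items()
    }


def determine_industry(
    notional_words: List[str], industry_keywords: Dict[str, Set[str]]
) -> Optional[str]:
    """Determination step: pick the industry with the largest proportion,
    or None when no keyword of any industry occurs at all."""
    proportions = keyword_proportions(notional_words, industry_keywords)
    best = max(proportions, key=proportions.get, default=None)
    if best is None or proportions[best] == 0:
        return None
    return best
```

On a tie, `max` returns the first industry in dictionary order. The section does not say how ties are broken.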
Further, building on the method shown in Fig. 1, the server of this embodiment may attempt classification from four other sources before acquisition module 61 obtains the web page content information. The sources are the domain name suffix, the registered entity information of the website, the website name, and the homepage description information. The server matches each source against the corresponding preset information to try to determine the industry category of the website. If none of the four sources determines the industry category, the website is classified by the method shown in Fig. 1. In practice, one of these four methods may be applied alone, or several may be combined. If the industry category still cannot be determined afterwards, the website can be classified directly by the method of the embodiment shown in Fig. 1. This embodiment does not limit how many methods are combined or the order in which they are applied.
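The fallback order described above can be sketched as a chain of classifiers. Each pre-check (domain suffix, registered entity information, website name, homepage description) is modelled as a hypothetical callable that returns an industry name or `None`. The first non-`None` answer wins. If every pre-check returns `None`, the web-page-content method of Fig. 1 is used. The embodiment does not fix which checks are combined or their order, so both are left to the caller.

```python
from typing import Callable, Optional, Sequence

# A classifier takes a website address and returns an industry name,
# or None when its information source cannot decide.
Classifier = Callable[[str], Optional[str]]


def classify_website(
    site: str,
    pre_checks: Sequence[Classifier],
    content_classifier: Classifier,
) -> Optional[str]:
    """Try each pre-check in order; fall back to the content-based method."""
    for check in pre_checks:
        industry = check(site)
        if industry is not None:
            return industry
    return content_classifier(site)
```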
The following describes how the server classifies a website using each of these four sources.
1. Classification according to the domain name suffix of the website:
Acquisition module 61 in the server obtains the domain name suffix of the website to be classified. Matching module 63 matches this suffix against the preset domain name suffixes of each industry category. If an identical suffix is found, determination module 64 determines the industry category corresponding to that suffix as the industry category of the website. If no identical suffix is found, determination module 64 instructs acquisition module 61 to obtain the web page content information of the website, and the server starts the classification method shown in Fig. 1. The principle of classifying by domain name suffix is described in the method steps of the embodiment shown in Fig. 2 and is not repeated here.
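Below is a minimal sketch of the domain-suffix check. The suffix table is a hypothetical example, because the section does not list the preset suffixes. Longer suffixes are tried first, so a specific entry such as `.edu.cn` takes precedence over any shorter entry that would also match.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical preset table: domain name suffix -> industry category.
SUFFIX_TABLE: Dict[str, str] = {
    ".edu.cn": "education",
    ".gov.cn": "government",
    ".edu": "education",
    ".gov": "government",
}


def classify_by_suffix(url: str, table: Dict[str, str] = SUFFIX_TABLE) -> Optional[str]:
    """Return the industry for the longest matching suffix, or None so that
    the caller can fall back to the content-based method."""
    # Prefix "//" for bare host names so urlsplit puts them in netloc.
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    for suffix in sorted(table, key=len, reverse=True):
        if host.endswith(suffix):
            return table[suffix]
    return None
```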
2. Classification according to the registered entity information of the website:
Acquisition module 61 in the server obtains the registered entity information of the website to be classified. Word segmentation module 62 performs word segmentation on all characters in that information to generate the corresponding notional word set. Matching module 63 matches all notional words in this set against the preset keywords of each industry category. If the keywords of one or more industry categories match notional words in the set, determination module 64 selects the industry category with the largest number of matched notional words. That category becomes the industry category of the website. If no keyword of any industry category matches a notional word in the set, determination module 64 instructs acquisition module 61 to obtain the web page content information of the website, and the server starts the classification method shown in Fig. 1. The principle of classifying by registered entity information is described in the method steps of the embodiment shown in Fig. 3 and is not repeated here.
3. Classification according to the name of the website:
Acquisition module 61 in the server obtains the website name information of the website to be classified. Word segmentation module 62 performs word segmentation on all characters in that information to generate the corresponding notional word set. Matching module 63 matches all notional words in this set against the preset keywords of each industry category. If the keywords of one or more industry categories match notional words in the set, determination module 64 selects the industry category with the largest number of matched notional words. That category becomes the industry category of the website. If no keyword of any industry category matches a notional word in the set, determination module 64 instructs acquisition module 61 to obtain the web page content information of the website, and the server starts the classification method shown in Fig. 1. The principle of classifying by website name is described in the method steps of the embodiment shown in Fig. 4 and is not repeated here.
4. Classification according to the homepage description information of the website:
Acquisition module 61 in the server obtains the homepage description information of the website to be classified. This information comprises a plurality of key fields that describe the website's homepage. Word segmentation module 62 performs word segmentation on all characters in these key fields to generate the notional word set corresponding to the homepage description information. Matching module 63 matches all notional words in this set against the preset keywords of each industry category. If the keywords of one or more industry categories match notional words in the set, determination module 64 selects the industry category with the largest number of matched notional words. That category becomes the industry category of the website. If no keyword of any industry category matches a notional word in the set, determination module 64 instructs acquisition module 61 to obtain the web page content information of the website, and the server starts the classification method shown in Fig. 1. The principle of classifying by homepage description information is described in the method steps of the embodiment shown in Fig. 5 and is not repeated here.
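The section does not name the key fields that make up the homepage description information. For illustration only, suppose they are the `<title>` element and the `keywords` and `description` `<meta>` tags. The sketch below collects them with Python's standard `html.parser` module. The collected strings would then be passed to the word segmentation step.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class HomepageFieldParser(HTMLParser):
    """Collect <title> text and the content of <meta name="keywords"> and
    <meta name="description"> (assumed key fields)."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.fields[name] = attr.get("content") or ""

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def extract_key_fields(html: str) -> Dict[str, str]:
    """Return the assumed key fields of a homepage, whitespace-stripped."""
    parser = HomepageFieldParser()
    parser.feed(html)
    parser.close()
    return {k: v.strip() for k, v in parser.fields.items()}
```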
Further, the server of this embodiment also comprises an extraction module and an update module, wherein:
The extraction module is configured to extract feature words from the web page content information of a website whose industry category has been determined;
The update module is configured to add the feature words to the keywords of the industry category to which that website belongs.
Specifically, once the server of this embodiment has determined the industry category of the website to be classified, the extraction module extracts feature words from the website's web page content information. A feature word is a word that can be used to judge and describe the industry category of any website containing it. The update module then adds these feature words to the keywords of that industry category. The server then uses them as a basis when it classifies websites later.
The server provided by the invention works as follows:
1. It obtains the web page content information of the website to be classified.
2. It performs word segmentation on all characters in that information to generate the corresponding notional word set.
3. It matches all notional words in the set against the preset keywords of each industry category and determines how many times the keywords of each industry category occur in the set.
4. It determines the industry category of the website from the proportions of these occurrence counts.
The scheme therefore determines the industry type of each website without substantial manual effort, which improves execution efficiency.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. The invention has been described in detail with reference to the foregoing embodiments. Nevertheless, those of ordinary skill in the art should understand that they may still modify the technical solutions described in those embodiments, or make equivalent replacements of some or all of their technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.

Claims (14)

1. A method for classifying the industry of a website, characterized by comprising:
A server obtains web page content information of a website to be classified;
The server performs word segmentation on all characters in the web page content information to generate a notional word set corresponding to the web page content information;
The server matches all notional words in the notional word set corresponding to the web page content information against preset keywords of each industry category, and determines the number of times the keywords of each industry category occur in that notional word set;
The server determines the industry category of the website to be classified according to the proportion of the number of times the keywords of each industry category occur in the notional word set corresponding to the web page content information.
2. The method according to claim 1, characterized in that the server determining the industry category of the website to be classified according to the proportion of the number of times the keywords of each industry category occur in the notional word set comprises:
The server determines the industry whose keywords occur most often in the notional word set corresponding to the web page content information as the industry category of the website to be classified.
3. The method according to claim 1 or 2, characterized in that, before the server obtains the web page content information of the website to be classified, the method comprises:
The server obtains domain name suffix information of the website to be classified;
The server matches the domain name suffix information of the website to be classified against preset domain name suffix information of each industry category;
If identical domain name suffix information is matched, the server determines the industry category corresponding to that domain name suffix information as the industry category of the website to be classified;
If no identical domain name suffix information is matched, the server proceeds to obtain the web page content information of the website to be classified.
4. The method according to claim 1 or 2, characterized in that, before the server obtains the web page content information of the website to be classified, the method further comprises:
The server obtains registered entity information of the website to be classified;
The server performs word segmentation on all characters in the registered entity information to generate a notional word set corresponding to the registered entity information;
The server matches all notional words in the notional word set corresponding to the registered entity information against the preset keywords of each industry category;
If the keywords of an industry category include notional words matching notional words in the notional word set corresponding to the registered entity information, the server determines the industry category with the largest number of matched notional words as the industry category of the website to be classified;
If the keywords of no industry category include a notional word matching any notional word in the notional word set corresponding to the registered entity information, the server proceeds to obtain the web page content information of the website to be classified.
5. The method according to claim 1 or 2, characterized in that, before the server obtains the web page content information of the website to be classified, the method comprises:
The server obtains website name information of the website to be classified;
The server performs word segmentation on all characters in the website name information to generate a notional word set corresponding to the website name information;
The server matches all notional words in the notional word set corresponding to the website name information against the preset keywords of each industry category;
If the keywords of an industry category include notional words matching notional words in the notional word set corresponding to the website name information, the server determines the industry category with the largest number of matched notional words as the industry category of the website to be classified;
If the keywords of no industry category include a notional word matching any notional word in the notional word set corresponding to the website name information, the server proceeds to obtain the web page content information of the website to be classified.
6. The method according to claim 1 or 2, characterized in that, before the server obtains the web page content information of the website to be classified, the method comprises:
The server obtains homepage description information of the website to be classified, the homepage description information comprising a plurality of key fields that describe the homepage of the website to be classified;
The server performs word segmentation on all characters in the plurality of key fields of the homepage of the website to be classified to generate a notional word set corresponding to the homepage description information;
The server matches all notional words in the notional word set corresponding to the homepage description information against the preset keywords of each industry category;
If the keywords of an industry category include notional words matching notional words in the notional word set corresponding to the homepage description information, the server determines the industry category with the largest number of matched notional words as the industry category of the website to be classified;
If the keywords of no industry category include a notional word matching any notional word in the notional word set corresponding to the homepage description information, the server proceeds to obtain the web page content information of the website to be classified.
7. The method according to claim 1 or 2, characterized in that, after the server determines the industry category of the website to be classified, the method further comprises:
The server extracts feature words from the web page content information of the website to be classified whose industry category has been determined;
The server adds the feature words to the keywords of the industry category to which the website to be classified belongs.
8. A server, characterized by comprising:
An acquisition module, configured to obtain web page content information of a website to be classified;
A word segmentation module, configured to perform word segmentation on all characters in the web page content information to generate a notional word set corresponding to the web page content information;
A matching module, configured to match all notional words in the notional word set corresponding to the web page content information against preset keywords of each industry category, and to determine the number of times the keywords of each industry category occur in that notional word set;
A determination module, configured to determine the industry category of the website to be classified according to the proportion of the number of times the keywords of each industry category occur in the notional word set corresponding to the web page content information.
9. The server according to claim 8, characterized in that
the determination module is specifically configured to determine the industry whose keywords occur most often in the notional word set corresponding to the web page content information as the industry category of the website to be classified.
10. The server according to claim 8 or 9, characterized in that
the acquisition module is configured to obtain domain name suffix information of the website to be classified;
the matching module is configured to match the domain name suffix information of the website to be classified against preset domain name suffix information of each industry category;
the determination module is configured to determine, if the matching module matches identical domain name suffix information, the industry category corresponding to that domain name suffix information as the industry category of the website to be classified;
the determination module is further configured to instruct, if the matching module does not match identical domain name suffix information, the acquisition module to obtain the web page content information of the website to be classified.
11. The server according to claim 8 or 9, characterized in that
the acquisition module is configured to obtain registered entity information of the website to be classified;
the word segmentation module is configured to perform word segmentation on all characters in the registered entity information to generate a notional word set corresponding to the registered entity information;
the matching module is configured to match all notional words in the notional word set corresponding to the registered entity information against the preset keywords of each industry category;
the determination module is configured to determine, if the keywords of an industry category include notional words matching notional words in the notional word set corresponding to the registered entity information, the industry category with the largest number of matched notional words as the industry category of the website to be classified;
the determination module is further configured to instruct, if the keywords of no industry category include a notional word matching any notional word in the notional word set corresponding to the registered entity information, the acquisition module to obtain the web page content information of the website to be classified.
12. The server according to claim 8 or 9, characterized in that
the acquisition module is configured to obtain website name information of the website to be classified;
the word segmentation module is configured to perform word segmentation on all characters in the website name information to generate a notional word set corresponding to the website name information;
the matching module is configured to match all notional words in the notional word set corresponding to the website name information against the preset keywords of each industry category;
the determination module is configured to determine, if the keywords of an industry category include notional words matching notional words in the notional word set corresponding to the website name information, the industry category with the largest number of matched notional words as the industry category of the website to be classified;
the determination module is further configured to instruct, if the keywords of no industry category include a notional word matching any notional word in the notional word set corresponding to the website name information, the acquisition module to obtain the web page content information of the website to be classified.
13. The server according to claim 8 or 9, characterized in that
the acquisition module is configured to obtain homepage description information of the website to be classified, the homepage description information comprising a plurality of key fields that describe the homepage of the website to be classified;
the word segmentation module is configured to perform word segmentation on all characters in the plurality of key fields of the homepage of the website to be classified to generate a notional word set corresponding to the homepage description information;
the matching module is configured to match all notional words in the notional word set corresponding to the homepage description information against the preset keywords of each industry category;
the determination module is configured to determine, if the keywords of an industry category include notional words matching notional words in the notional word set corresponding to the homepage description information, the industry category with the largest number of matched notional words as the industry category of the website to be classified;
the determination module is further configured to instruct, if the keywords of no industry category include a notional word matching any notional word in the notional word set corresponding to the homepage description information, the acquisition module to obtain the web page content information of the website to be classified.
14. The server according to claim 8 or 9, further comprising:
an extraction module, configured to extract feature words from the web page content information of a website to be classified whose industry category has been determined; and
an update module, configured to add the feature words to the keywords corresponding to the industry category to which that website belongs.
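The matching steps in claims 12 and 13 can be combined with the frequency-proportion rule from the abstract, as in the sketch below. It is a minimal illustration under stated assumptions and not the applicant's implementation. The following are all hypothetical:

- the keyword table and the stop-word list;
- the regex tokenizer, which stands in for a real Chinese word segmenter such as jieba;
- treating the `<title>`, `keywords` and `description` tags as the homepage's "key fields".

Chaining the site-name check and the homepage check in one function is also an assumption. The claims present each of them as a separate step that falls back to the web page content.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical preset keyword table: industry category -> keywords.
INDUSTRY_KEYWORDS = {
    "education": {"course", "tuition", "school", "exam"},
    "travel": {"hotel", "flight", "tour", "booking"},
}
# Hypothetical stop-word list; the remaining tokens play the role of "notional words".
STOPWORDS = {"a", "an", "and", "the", "of", "for", "our", "to"}


def notional_words(text):
    """Stand-in segmenter: lowercase alphanumeric tokens minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


class DescriptorParser(HTMLParser):
    """Collects <title> text and <meta name="keywords"/"description"> content (assumed key fields)."""

    def __init__(self):
        super().__init__()
        self.fields = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.fields.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields.append(data)


def most_matches(words, keywords):
    """Category with the most distinct matched words (claim 13 rule), or None if nothing matches."""
    hits = {cat: len(set(words) & kws) for cat, kws in keywords.items()}
    best = max(hits, key=hits.get)  # ties go to the first category in dict order
    return best if hits[best] > 0 else None


def largest_share(words, keywords):
    """Category whose keywords make up the largest share of all words (abstract's rule)."""
    if not words:
        return None
    counts = Counter(words)
    shares = {cat: sum(counts[k] for k in kws) / len(words) for cat, kws in keywords.items()}
    best = max(shares, key=shares.get)
    return best if shares[best] > 0 else None


def classify(site_name, homepage_html, page_text, keywords=INDUSTRY_KEYWORDS):
    """Try the site name, then the homepage descriptors, then the full page content."""
    category = most_matches(notional_words(site_name), keywords)
    if category is None:
        parser = DescriptorParser()
        parser.feed(homepage_html)
        parser.close()  # flush any buffered text
        descriptor_words = [w for f in parser.fields for w in notional_words(f)]
        category = most_matches(descriptor_words, keywords)
    if category is None:
        category = largest_share(notional_words(page_text), keywords)
    return category
```

For example, a site named "Acme" matches no keyword, so the homepage descriptors decide next. If they are empty as well, the share of keyword occurrences in the page text decides. Fetching the pages is out of scope here, so the HTML and page text are passed in as strings.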
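The keyword feedback loop of claim 14 can be illustrated with a similarly hedged sketch. The claims do not state how feature words are selected. The frequency rule below is a hypothetical choice: it keeps the `top_n` most common words that are not already keywords and that occur at least `min_count` times. TF-IDF or another weighting could be used instead. `notional_words` and `STOPWORDS` are illustrative stand-ins.

```python
from collections import Counter
import re

# Hypothetical stop-word list used to keep only content-bearing words.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "our", "to"}


def notional_words(text):
    """Stand-in segmenter: lowercase alphanumeric tokens minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def extract_feature_words(page_text, existing, top_n=3, min_count=2):
    """Hypothetical rule: the most frequent words on the page that are not yet keywords."""
    counts = Counter(w for w in notional_words(page_text) if w not in existing)
    return [w for w, c in counts.most_common(top_n) if c >= min_count]


def update_keywords(keywords, category, page_text, top_n=3, min_count=2):
    """Add the extracted feature words to the keyword set of the assigned category."""
    new_words = extract_feature_words(page_text, keywords[category], top_n, min_count)
    keywords[category].update(new_words)  # mutates the shared keyword table in place
    return new_words
```

With `min_count` above 1, a word seen only once on a single page is not promoted to a keyword. This damps noise in the feedback loop at the cost of slower vocabulary growth.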
CN201310753049.XA 2013-12-31 2013-12-31 Website industry classification method and server Pending CN104750754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310753049.XA CN104750754A (en) 2013-12-31 2013-12-31 Website industry classification method and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310753049.XA CN104750754A (en) 2013-12-31 2013-12-31 Website industry classification method and server

Publications (1)

Publication Number Publication Date
CN104750754A true CN104750754A (en) 2015-07-01

Family

ID=53590449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310753049.XA Pending CN104750754A (en) 2013-12-31 2013-12-31 Website industry classification method and server

Country Status (1)

Country Link
CN (1) CN104750754A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW586065B (en) * 2002-05-20 2004-05-01 Pchome Online Inc Automatic classification method of website and system thereof
CN101196923A (en) * 2006-11-28 2008-06-11 株式会社Opms Category-based advertising system and method
CN102567494A (en) * 2011-12-22 2012-07-11 北京亿赞普网络技术有限公司 Website classification method and device
CN102629282A (en) * 2012-05-03 2012-08-08 湖南神州祥网科技有限公司 Website classification method, device and system
CN103226578A (en) * 2013-04-02 2013-07-31 浙江大学 Method for identifying websites and finely classifying web pages in medical field

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557520A (en) * 2015-09-29 2017-04-05 百度在线网络技术(北京)有限公司 The recognition methods of the Type of website and device
CN105653651B (en) * 2015-12-29 2019-04-02 云南电网有限责任公司电力科学研究院 A kind of the discovery method for sorting and device of industrial sustainability
CN105653651A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Discovery and arrangement method and apparatus for industry website
CN105723367A (en) * 2016-01-07 2016-06-29 马岩 Network information sorting method and system
WO2017117781A1 (en) * 2016-01-07 2017-07-13 马岩 Network information classification method and system
CN107436890A (en) * 2016-05-26 2017-12-05 阿里巴巴集团控股有限公司 A kind of detection method and device of the Type of website
CN106250402A (en) * 2016-07-19 2016-12-21 杭州华三通信技术有限公司 A kind of Website classification method and device
CN108090090A (en) * 2016-11-23 2018-05-29 北京国双科技有限公司 Programme orientation method and apparatus
CN106874340A (en) * 2016-12-22 2017-06-20 新华三技术有限公司 A kind of web page address sorting technique and device
CN106874340B (en) * 2016-12-22 2020-12-18 新华三技术有限公司 Webpage address classification method and device
WO2018196561A1 (en) * 2017-04-25 2018-11-01 腾讯科技(深圳)有限公司 Label information generating method and device for application and storage medium
CN107169049A (en) * 2017-04-25 2017-09-15 腾讯科技(深圳)有限公司 The label information generation method and device of application
CN107169523A (en) * 2017-05-27 2017-09-15 鹏元征信有限公司 Automatically determine method, storage device and the terminal of the affiliated category of employment of mechanism
CN107491536A (en) * 2017-08-22 2017-12-19 广东小天才科技有限公司 A kind of examination question method of calibration, examination question calibration equipment and electronic equipment
CN107491536B (en) * 2017-08-22 2020-07-07 广东小天才科技有限公司 Test question checking method, test question checking device and electronic equipment
CN108053196A (en) * 2018-01-31 2018-05-18 四川民工加网络科技有限公司 A kind of recruitment methods of construction site
CN108536800A (en) * 2018-04-03 2018-09-14 有米科技股份有限公司 File classification method, system, computer equipment and storage medium
CN108536800B (en) * 2018-04-03 2022-04-19 有米科技股份有限公司 Text classification method, system, computer device and storage medium
CN109271481A (en) * 2018-08-31 2019-01-25 国网河北省电力有限公司沧州供电分公司 A kind of classification method, system and the terminal device of electric power demand information
CN109977328A (en) * 2019-03-06 2019-07-05 杭州迪普科技股份有限公司 A kind of URL classification method and device
CN111223496A (en) * 2020-01-03 2020-06-02 广东电网有限责任公司 Voice information classification method and device
CN111241240A (en) * 2020-01-08 2020-06-05 中国联合网络通信集团有限公司 Industry keyword extraction method and device
CN111241240B (en) * 2020-01-08 2023-08-15 中国联合网络通信集团有限公司 Industry keyword extraction method and device
CN111382385A (en) * 2020-02-21 2020-07-07 奇安信科技集团股份有限公司 Webpage affiliated industry classification method and device
CN111382385B (en) * 2020-02-21 2024-04-12 奇安信科技集团股份有限公司 Method and device for classifying industries of web pages
CN111784448A (en) * 2020-06-24 2020-10-16 支付宝(杭州)信息技术有限公司 Merchant data processing method and system
GB2601517A (en) * 2020-12-02 2022-06-08 Silver Bullet Media Services Ltd A method, apparatus and program for classifying subject matter of content in a webpage
TWI827984B (en) * 2021-10-05 2024-01-01 台灣大哥大股份有限公司 System and method for website classification

Similar Documents

Publication Publication Date Title
CN104750754A (en) Website industry classification method and server
CN102693271B (en) A kind of network information recommending method and system
CN101782919B (en) Web form data output method, device and form processing system
CN104504150A (en) News public opinion monitoring system
CN102567494B (en) Website classification method and device
CN102521248A (en) Network user classification method and device
CN103617266A (en) Personalized extension search method, device and system
CN104239298A (en) Text message recommendation method, server, browser and system
CN104217031A (en) Method and device for classifying users according to search log data of server
CN103455758A (en) Method and device for identifying malicious website
CN110457579B (en) Webpage denoising method and system based on cooperative work of template and classifier
CN103744856A (en) Method, device and system for linkage extended search
CN103902535A (en) Method, device and system for obtaining associational word
CN102314492A (en) Method and equipment for acquiring candidate document sections matched with target document section
CN103248677A (en) Internet behavior analysis system and working method thereof
CN103886092A (en) Method and device for providing terminal failure problem solutions
CN105138907A (en) Method and system for actively detecting attacked website
CN106294535A (en) The recognition methods of website and device
CN103440199A (en) Method and device for guiding test
CN104573033A (en) Dynamic URL filtering method and device
US11334592B2 (en) Self-orchestrated system for extraction, analysis, and presentation of entity data
US11250080B2 (en) Method, apparatus, storage medium and electronic device for establishing question and answer system
CN106874368B (en) RTB bidding advertisement position value analysis method and system
CN103399968B (en) A kind of micro-blog information acquisition method and system
CN104021124A (en) Method, device and system used for processing webpage data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150701