CN102880722B - Method and device for mining authoritative websites - Google Patents


Info

Publication number
CN102880722B
Authority
CN
China
Prior art keywords
website
query
authoritative
type
search
Prior art date
Legal status
Active
Application number
CN201210394980.9A
Other languages
Chinese (zh)
Other versions
CN102880722A (en)
Inventor
周步恋
雷大伟
石志伟
车天文
杨振东
王更生
王喜民
何宏靖
徐忆苏
Current Assignee
Shenzhen easou world Polytron Technologies Inc
Original Assignee
Shenzhen Yisou Science & Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Yisou Science & Technology Development Co Ltd
Priority to CN201210394980.9A
Publication of CN102880722A
Application granted
Publication of CN102880722B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a method and device for mining authoritative websites. The method comprises: classifying the search queries entered by users; searching authoritative search engines with the classified queries; and mining authoritative websites from the search results. The invention classifies user queries, submits each class of queries to authoritative search engines and crawls the results, scores the websites that appear in the crawled results, and finally mines the authoritative websites for each query type with the authoritative-website mining algorithm it provides. The invention automatically mines the authoritative websites relevant to different query types and can improve the speed and accuracy of authoritative-website mining.

Description

Method and device for mining authoritative websites
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for mining authoritative websites during information retrieval.
Background
With the development of Internet technology and the continuous growth of information, people's demand for online information keeps rising, and search engines have become an indispensable tool for obtaining it. When a user enters a query (search term), the search engine usually returns pages containing the query as search results.
There are many mature commercial search engines today. Through long-term accumulation of user experience they provide abundant information, and by reasonably using and further analyzing the result data they return, still more useful information can be mined.
An authoritative website is a website that is authoritative and representative in a given category or field, or that has relatively rich content resources. Distinguishing authoritative websites is very important in search engines, and in web page analysis, parsing, or conversion, authoritative-site data is among the most important data for testing and evaluation.
In existing search engines, web page authority is generally computed with PageRank. PageRank is part of Google's ranking algorithm: a method Google uses to express the rank or importance of a web page, and a measure Google uses to judge website quality.
Because PageRank is computed over all web pages, the computation is heavy, consumes substantial resources, and has a long cycle. Moreover, PageRank does not distinguish page types, so it cannot fully represent the authority of websites within a particular category. In many applications the authority of every page is not needed at all, only a small number of the most authoritative sites of a particular type, so computing authority for all pages often wastes computing resources.
Another traditional way of identifying authoritative websites is to collect candidates manually from navigation sites (such as hao123) and then sort and screen them by hand. Although manual curation is simple, it has obvious drawbacks: first, it requires considerable manpower; second, because Internet information resources are so varied, the categories and data volume that can be curated manually are limited and the process is slow; third, the results are heavily influenced by human factors and therefore cannot be effectively objective.
Summary of the invention
In view of this, an object of the present invention is to provide a method and device for mining authoritative websites.
The present invention is implemented through the following technical solutions:
A method for mining authoritative websites, comprising:
Dividing the search queries entered by users into different types;
Searching authoritative search engines with the queries of each type;
Mining authoritative websites from the search results;
Wherein mining authoritative websites from the search results comprises:
Parsing the top-ranked search results of each query in the authoritative search engines to obtain the corresponding URLs;
Converting the URLs into site names according to at least one site-name conversion rule;
Filtering the sites according to at least one stop-site filtering rule;
Scoring the filtered sites according to at least one site evaluation strategy;
Obtaining the authoritative websites from the scoring results;
Wherein obtaining the authoritative websites from the scoring results comprises:
Associating the scoring results with the table that records the site names to build an inverted index, and merging the scoring results of each query across multiple authoritative search engines;
Sorting the sites in the inverted index by their scores;
Obtaining the authoritative websites from the sorted order;
Wherein obtaining the authoritative websites from the sorted order comprises:
In descending order of the sites' scores, adding each site name to a site set A that is initially empty and adding the queries corresponding to that site name to a set B that is initially empty; when the ratio of the number of queries in set B to the total number of queries of the type reaches a predetermined threshold, no more data is added to sets A and B, and the site list in set A is then the authoritative websites corresponding to queries of that type.
Preferably, the search queries entered by users are divided into the following classification system: navigational, informational, and transactional;
Or the search queries entered by users are divided, according to the user's search intent, into the following classification system: navigational, informational, and resource.
More preferably, queries under a finer-grained classification system are classified by the following steps:
Matching the queries according to at least one matching rule;
Classifying the matched queries with an SVM (Support Vector Machine) model.
A device for mining authoritative websites, comprising:
A classification module, configured to divide the search queries entered by users into different types;
A data crawling module, configured to search authoritative search engines with the queries of each type;
A mining module, configured to mine authoritative websites from the search results;
Wherein the steps by which the mining module mines authoritative websites from the search results comprise:
Parsing the top-ranked search results of each query in the authoritative search engines to obtain the corresponding URLs;
Converting the URLs into site names according to at least one site-name conversion rule;
Filtering the sites according to at least one stop-site filtering rule;
Scoring the filtered sites according to at least one site evaluation strategy;
Obtaining the authoritative websites from the scoring results;
Wherein the steps by which the mining module obtains the authoritative websites from the scoring results comprise:
Associating the scoring results with the table that records the site names to build an inverted index, and merging the scoring results of each query across multiple authoritative search engines;
Sorting the sites in the inverted index by their scores;
Obtaining the authoritative websites from the sorted order;
Wherein the steps by which the mining module obtains the authoritative websites from the sorted order comprise:
In descending order of the sites' scores, adding each site name to a site set A that is initially empty and adding the queries corresponding to that site name to a set B that is initially empty; when the ratio of the number of queries in set B to the total number of queries of the type reaches a predetermined threshold, no more data is added to sets A and B, and the site list in set A is then the authoritative websites corresponding to queries of that type.
Preferably, the classification module divides the search queries entered by users into the following classification system: navigational, informational, and transactional;
Or,
The classification module divides the search queries entered by users, according to the user's search intent, into the following classification system: navigational, informational, and resource.
More preferably, the classification module classifies queries under a finer-grained classification system by the following steps:
Matching the queries according to at least one matching rule;
Classifying the matched queries with a support vector machine (SVM) model.
As the above technical solution shows, the present invention requires no manual intervention when mining authoritative websites: the process is fully automatic and fast. Because it integrates and analyzes the results of relatively authoritative search engines such as Yahoo, Google, and Baidu, the mined results are also highly authoritative, which ensures the usability of the authoritative websites finally mined.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of it. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the authoritative-website mining method provided by an embodiment of the present invention;
Fig. 2 is a block diagram of the authoritative-website mining device provided by an embodiment of the present invention;
Fig. 3 is a workflow diagram of the data crawling module in the authoritative-website mining device in an embodiment of the present invention.
Detailed description
To make the technical problem solved by the present invention, its technical solution, and its beneficial effects clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
The present invention classifies user queries, submits each class of queries to authoritative search engines and crawls the results, scores the websites that appear in the crawled results, and finally mines the authoritative websites for each query type with the authoritative-website mining algorithm it provides. The invention automatically mines the authoritative websites relevant to different query types and can improve the speed and accuracy of authoritative-website mining.
A method for mining authoritative websites provided by an embodiment of the present invention comprises the following steps:
S10: Classify the search queries entered by users;
S11: Search authoritative search engines with the classified queries;
S12: Mine authoritative websites from the search results.
In this embodiment, the search queries entered by users are divided into the following classification system: navigational, informational, and transactional;
Or the search queries entered by users are divided, according to the user's search intent, into the following classification system: navigational, informational, and resource.
More specifically, queries under a finer-grained classification system are classified by the following steps:
Matching the queries according to at least one matching rule;
Classifying the matched queries with a support vector machine (SVM) model.
Because the results retrieved for queries of different categories are often strongly correlated with the categories of websites, the embodiments of the present invention first determine the query classification system in order to mine the authoritative websites of a given category.
There is currently no clear standard for how queries should be classified, but much current work is influenced by the query classification proposed by Broder et al., who manually analyzed queries and divided them into three major categories: navigational, informational, and transactional.
The query classification proposed by Broder et al. is influential, and the embodiments of the present invention can likewise adopt it. However, research shows that this classification is still too coarse. Researchers at Yahoo refined it, dividing queries by user intent into navigational, informational, and resource categories; this classification is equally suitable for the present invention.
In a preferred implementation of the embodiments of the present invention, the search queries entered by users are divided into the following classification system: navigational, question-and-answer, and various fine-grained transactional categories (such as weather, games, novels, music, and lyrics);
User queries are then classified into these categories.
In the embodiments of the present invention, once the query classification system has been determined, the queries are classified concretely. The main methods for query classification are matching rules and models. Matching rules are generally well known to those skilled in the art; they mainly match queries against commonly used word lists and templates. After this matching step, the embodiments of the present invention classify the queries with an SVM model.
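The rules-then-model flow can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the word lists, the regex template, the category labels, and the training queries are hypothetical; tokens are split on whitespace (real Chinese queries would need word segmentation); and the SVM is a one-vs-rest linear SVM trained with the Pegasos sub-gradient method, chosen here only to keep the example dependency-free. The text does not specify its features or solver.

```python
import re
from collections import defaultdict

# Hypothetical rule tables; the text does not publish its word lists or templates.
WORD_LISTS = {
    "weather": {"weather", "forecast", "temperature"},
    "lyrics": {"lyrics"},
}
TEMPLATES = {
    "qa": re.compile(r"^(how|why|what)\b"),
}


def tokenize(query):
    # Assumption: whitespace tokens; Chinese queries would need word segmentation.
    return query.lower().split()


def match_rules(query):
    """Return a category if a word list or template matches, else None."""
    tokens = set(tokenize(query))
    for label, words in WORD_LISTS.items():
        if tokens & words:
            return label
    for label, pattern in TEMPLATES.items():
        if pattern.search(query.lower()):
            return label
    return None


class LinearSVM:
    """One-vs-rest linear SVM over bag-of-words counts, trained with Pegasos
    (stochastic sub-gradient descent on the L2-regularized hinge loss)."""

    def __init__(self, lam=0.01, epochs=20):
        self.lam = lam
        self.epochs = epochs
        self.weights = {}  # label -> {token: weight}

    def _train_binary(self, samples, target):
        w = defaultdict(float)
        t = 0
        for _ in range(self.epochs):
            for tokens, label in samples:
                t += 1
                eta = 1.0 / (self.lam * t)
                y = 1.0 if label == target else -1.0
                margin = y * sum(w[tok] for tok in tokens)
                for tok in list(w):  # shrink step from the L2 regularizer
                    w[tok] *= 1.0 - eta * self.lam
                if margin < 1.0:  # hinge loss active: step toward y * x
                    for tok in tokens:
                        w[tok] += eta * y
        return dict(w)

    def fit(self, queries, labels):
        samples = [(tokenize(q), y) for q, y in zip(queries, labels)]
        for target in sorted(set(labels)):
            self.weights[target] = self._train_binary(samples, target)
        return self

    def predict(self, query):
        tokens = tokenize(query)
        scores = {label: sum(w.get(tok, 0.0) for tok in tokens)
                  for label, w in self.weights.items()}
        return max(scores, key=scores.get)


def classify(query, svm):
    """Rules first; queries the rules cannot decide go to the SVM."""
    return match_rules(query) or svm.predict(query)


svm = LinearSVM().fit(
    ["baidu homepage", "sina official site", "download mp3 song", "free novel download"],
    ["navigational", "navigational", "resource", "resource"],
)
print(classify("beijing weather", svm))  # weather (word-list rule)
print(classify("sina homepage", svm))    # navigational (SVM)
```

Running the rules first keeps cheap, high-precision decisions out of the model, so the SVM only has to separate the queries that the word lists and templates leave ambiguous.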
Preferably, in step S12, mining authoritative websites from the search results comprises:
S121: Parse the top-ranked search results of each query in the authoritative search engines to obtain the corresponding URLs;
S122: Convert the URLs into site names according to at least one site-name conversion rule;
S123: Filter the sites according to at least one stop-site filtering rule;
S124: Score the filtered sites according to at least one site evaluation strategy;
S125: Obtain the authoritative websites from the scoring results.
In the above steps, the site-name conversion rules, stop-site filtering rules, and site evaluation strategies can all be taken from the prior art and are not described in detail here.
More preferably, in step S125, obtaining the authoritative websites from the scoring results comprises:
S1251: Associate the scoring results with the table that records the site names to build an inverted index, and merge the scoring results of each query across multiple authoritative search engines;
S1252: Sort the sites in the inverted index by their scores;
S1253: Obtain the authoritative websites from the sorted order.
More preferably, in step S1253, obtaining the authoritative websites from the sorted order comprises:
In descending order of the sites' scores, adding each site name to a site set A that is initially empty and adding the queries corresponding to that site name to a set B that is initially empty; when the ratio of the number of queries in set B to the total number of queries of the type reaches a predetermined threshold, no more data is added to sets A and B, and the site list in set A is then the authoritative websites corresponding to queries of that type.
For example, an authoritative-website mining method provided by an embodiment of the present invention comprises the following specific steps:
Step S101: Classify the search queries entered by users;
Step S102: Crawl search results from authoritative search engines for queries of a particular type;
Step S103: Parse the search results and save the top-ranked URLs (for example, the top N) in the results of each query;
Step S104: Convert the URLs in the results into site names;
Step S105: Filter out stop sites;
Step S106: Score the site names corresponding to each query according to a certain strategy;
Step S107: Build an inverted index from the scored query-to-site-name table, and merge the search results of the different authoritative search engines;
Step S108: Sort the inverted index by site score;
Step S109: Take the authoritative websites of the type from the inverted index according to a certain strategy.
More specifically, as shown in Fig. 1, the method comprises:
Step 201: Classify user queries;
Step 202: Crawl search results from authoritative search engines for queries belonging to a particular type.
In a specific implementation, the authoritative search engines may be Yahoo, Google, Baidu, and so on.
In this step, each query in each query class is submitted to each authoritative search engine to crawl the corresponding search results; for example, results may be crawled from the more authoritative Chinese-language search engines (Yahoo, Google).
Step 203: Parse the crawled search results and save the top N URLs in the results of each query.
In this step, the crawled result pages are parsed to extract their URLs, and the top N URLs are saved, for example the top 30.
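A minimal sketch of step 203 using Python's standard `html.parser`. It assumes, hypothetically, that each organic result link on a crawled page is an `<a>` tag carrying the CSS class `result`. Real result pages use engine-specific markup, so the selection rule would differ for each engine.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collects, in page order, the href of every <a> tag with a given class.

    The class name that marks organic results is an assumption; a real
    crawler needs one extraction rule per search engine.
    """

    def __init__(self, result_class="result"):
        super().__init__()
        self.result_class = result_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        classes = (attrs.get("class") or "").split()
        if self.result_class in classes and attrs.get("href"):
            self.urls.append(attrs["href"])


def top_n_urls(page_html, n=30, result_class="result"):
    """Return the first n result URLs of a result page, in ranking order."""
    parser = ResultLinkParser(result_class)
    parser.feed(page_html)
    parser.close()
    return parser.urls[:n]


page = ('<a class="result" href="http://a.com/x">A</a>'
        '<a href="/ad">ad</a>'
        '<a class="result" href="http://b.com/">B</a>')
print(top_n_urls(page, n=30))  # ['http://a.com/x', 'http://b.com/']
```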
Step 204: Convert the URLs in the search results into site names.
For example, in this embodiment, the search results of a query of a given type in 3 different authoritative search engines yield 3 ordered lists of 30 URLs each.
Each URL is then converted, replacing the full URL with its site name. For example, in this embodiment the conversion rule is: the default site name is the part of the URL before the first '/', and this default site name is used as the site name of the URL.
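The default site-name rule can be sketched as below. The scheme-stripping step is an assumption: applied literally to a full URL such as `http://www.example.com/a`, "the part before the first '/'" would be `http:`, so the sketch removes `http://` or `https://` first. Lower-casing the host is also an assumption.

```python
def default_site_name(url):
    """Site name = the part of the URL before the first '/'.

    Assumption: a leading http:// or https:// is removed first, otherwise
    the rule would return "http:" for every full URL.
    """
    for scheme in ("http://", "https://"):
        if url.lower().startswith(scheme):
            url = url[len(scheme):]
            break
    # Host names are case-insensitive, so normalize to lower case.
    return url.split("/", 1)[0].lower()


print(default_site_name("http://www.qidian.com/book/1.html"))  # www.qidian.com
print(default_site_name("baike.baidu.com/view/1.htm"))         # baike.baidu.com
```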
Step 205: Filter sites.
For example, in this embodiment, a stop-site list can be used to filter out stop sites.
In a specific implementation, filtering stop sites requires compiling a stop-site list, and the stop sites are removed from the sites used. For example, when mining novel sites, non-novel pages such as Baidu Baike account for a very high share of the URLs, so they are removed with the stop-site list.
The stop-site list can be compiled before mining. Alternatively, when no stop-site list exists, authoritative websites can first be mined, a stop-site list for the type summarized from those results, and the stop-site list then used to mine the authoritative websites again.
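Stop-site filtering and the two-pass bootstrap described above could look like this. `mine` and `pick_stop_sites` are hypothetical caller-supplied hooks: the text says the stop-site list is summarized from a first round of mined sites but does not say how, so that judgment is left to the hook. The toy miner and stop-site picker in the usage are illustrative only.

```python
def filter_stop_sites(ranked_sites, stop_sites):
    """Remove stop sites while preserving the ranking order of the rest."""
    return [site for site in ranked_sites if site not in stop_sites]


def mine_with_bootstrap(results_by_query, mine, pick_stop_sites, stop_sites=None):
    """Mine authoritative sites, deriving a stop-site list first if none exists.

    results_by_query: query -> ranked list of site names
    mine: function(results_by_query) -> list of candidate authoritative sites
    pick_stop_sites: function(first_pass_sites) -> iterable of stop sites
    """
    if stop_sites is None:
        # First pass without a stop list; summarize stop sites from its output.
        first_pass = mine(results_by_query)
        stop_sites = set(pick_stop_sites(first_pass))
    filtered = {query: filter_stop_sites(sites, stop_sites)
                for query, sites in results_by_query.items()}
    # Second pass on the filtered results.
    return mine(filtered), stop_sites


def toy_miner(results):
    """Illustrative miner: every distinct site, sorted by name."""
    return sorted({site for sites in results.values() for site in sites})


results = {
    "novel a": ["baike.baidu.com", "www.qidian.com"],
    "novel b": ["baike.baidu.com", "www.zongheng.com"],
}
sites, stops = mine_with_bootstrap(
    results, toy_miner, lambda found: {s for s in found if s.startswith("baike.")})
print(stops)  # {'baike.baidu.com'}
print(sites)  # ['www.qidian.com', 'www.zongheng.com']
```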
Step 206: Score the site names corresponding to each query according to a certain strategy.
The site name corresponding to each URL is then scored. For example, one scoring strategy provided in this embodiment is: the first result scores highest, and scores decrease for later results. If the top 30 results are kept, the strategy can be: the first result scores 28 points; scores then decrease by 2 points per result through the first 10, by 1 point per result through the middle 10, and the last 10 each score 1 point, that is:
Step 207: Build an inverted index from the scored query-to-site-name table, and merge the retrieval results of the several search engines.
In this step, an inverted index from site names to queries is built, so that each site name maps to its queries. The scores of all queries corresponding to each site name are summed, and the scores from the queries in the several authoritative search engines are accumulated as well. For example, in the embodiments of the present invention, if data is retrieved from Yahoo, Google, and Baidu, the scores from these 3 authoritative search engines are added together.
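Step 207, together with the sort in step 208, amounts to inverting a query-to-sites table and summing scores. This is a minimal sketch that assumes scores arrive as a nested mapping engine → query → {site: score}. That layout and the tie-break by site name are assumptions, not part of the text.

```python
from collections import defaultdict


def build_inverted_index(scored_by_engine):
    """scored_by_engine: engine -> query -> {site: score}.

    Returns site -> (total score summed over all queries and engines,
    set of queries whose results contained the site).
    """
    totals = defaultdict(float)
    queries = defaultdict(set)
    for per_query in scored_by_engine.values():
        for query, site_scores in per_query.items():
            for site, score in site_scores.items():
                totals[site] += score
                queries[site].add(query)
    return {site: (totals[site], queries[site]) for site in totals}


def sort_by_score(index):
    """Step 208: order sites by total score, highest first (ties by name)."""
    return sorted(index.items(), key=lambda item: (-item[1][0], item[0]))


scored = {
    "engine_a": {"q1": {"a.com": 28, "b.com": 26}},
    "engine_b": {"q1": {"b.com": 28}, "q2": {"a.com": 28}},
}
index = build_inverted_index(scored)
print([site for site, _ in sort_by_score(index)])  # ['a.com', 'b.com']
```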
Step 208: Sort the inverted index by site score.
In this step, the list of site names and their corresponding queries is sorted by site score.
Step 209: Take the authoritative websites of the type from the inverted index according to a certain strategy.
For example, in this embodiment, the strategy for obtaining authoritative websites from the inverted index is as follows:
In descending order of site score, each site name is added to a site set A that is initially empty, and the queries corresponding to that site name are added to a set B that is initially empty. When the number of queries in set B divided by the total number of queries of the type (that is, the total number of queries of this type submitted to the search engines for retrieval) reaches a threshold, no more data is added to sets A and B; the site list in set A is then the authoritative websites for this query type, or equivalently for websites of this type. In practice the threshold can be set according to actual conditions, for example 95%, but it should be set close to the classification accuracy for this query type, which helps ensure that the mined sites are consistent in type.
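The greedy selection in step 209 can be sketched as follows. It assumes the ranked index is a list of `(site, (score, queries))` pairs already sorted by descending score. The 95% default mirrors the example threshold in the text, and `total_queries` is the number of queries of the type submitted to the search engines.

```python
def select_authoritative_sites(ranked_index, total_queries, threshold=0.95):
    """Greedy coverage selection over sites sorted by descending score.

    ranked_index: list of (site, (score, set_of_queries)) pairs.
    Returns the site list of set A once set B covers enough queries.
    """
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    set_a, set_b = [], set()
    for site, (_score, queries) in ranked_index:
        set_a.append(site)   # site joins set A
        set_b |= queries     # its queries join set B
        if len(set_b) / total_queries >= threshold:
            break            # coverage reached: stop adding to A and B
    return set_a


ranked = [
    ("a.com", (56, {"q1", "q2"})),
    ("b.com", (54, {"q1", "q3"})),
    ("c.com", (10, {"q4"})),
]
print(select_authoritative_sites(ranked, total_queries=4, threshold=0.75))  # ['a.com', 'b.com']
```

Because set B is a set, a site whose queries are already covered adds to set A without advancing coverage; the loop still stops as soon as the ratio first reaches the threshold.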
An embodiment of the present invention further provides a device for mining authoritative websites which, as shown in Fig. 2, comprises:
A classification module 201, configured to classify the search queries entered by users;
The classification module 201 first obtains the queries entered by users and then classifies them;
A data crawling module 202, configured to search authoritative search engines with the classified queries;
The data crawling module 202 crawls the retrieval result pages from the authoritative search engines for the queries of each type;
A mining module 203, configured to mine authoritative websites from the search results.
The mining module 203 mines authoritative websites from the search results of queries of a given type, producing the list of authoritative sites for that query type.
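A compact sketch of how modules 201, 202, and 203 might be wired together. Everything here is illustrative: the engine callables stand in for real crawlers, the classifier is any function from query to type, and the mining module uses a placeholder 1/rank score rather than the point schedule of step 206. The class names are hypothetical.

```python
from collections import defaultdict


class ClassificationModule:
    """Module 201: groups user queries by type with a supplied classifier."""

    def __init__(self, classify):
        self.classify = classify

    def group(self, queries):
        groups = defaultdict(list)
        for query in queries:
            groups[self.classify(query)].append(query)
        return dict(groups)


class DataCrawlingModule:
    """Module 202: sends each query to every engine and keeps the top-N URLs."""

    def __init__(self, engines, top_n=30):
        self.engines = engines  # name -> function(query) -> ranked URL list
        self.top_n = top_n

    def crawl(self, queries):
        return {name: {q: fetch(q)[: self.top_n] for q in queries}
                for name, fetch in self.engines.items()}


class MiningModule:
    """Module 203: site names, stop-site filter, scoring, inverted index,
    and greedy coverage selection. Scoring is a placeholder (1 / rank)."""

    def __init__(self, stop_sites=frozenset(), threshold=0.95):
        self.stop_sites = set(stop_sites)
        self.threshold = threshold

    @staticmethod
    def site_name(url):
        return url.split("://", 1)[-1].split("/", 1)[0].lower()

    def mine(self, crawled, total_queries):
        totals, covered = defaultdict(float), defaultdict(set)
        for per_query in crawled.values():
            for query, urls in per_query.items():
                sites = [self.site_name(u) for u in urls]
                sites = [s for s in sites if s not in self.stop_sites]
                best = {}
                for rank, site in enumerate(sites, start=1):
                    best.setdefault(site, 1.0 / rank)  # keep earliest position
                for site, score in best.items():
                    totals[site] += score
                    covered[site].add(query)
        chosen, hit = [], set()
        for site in sorted(totals, key=lambda s: (-totals[s], s)):
            chosen.append(site)
            hit |= covered[site]
            if len(hit) / total_queries >= self.threshold:
                break
        return chosen


class AuthoritativeSiteMiner:
    """Wires modules 201 -> 202 -> 203 and returns the sites per query type."""

    def __init__(self, classifier, crawler, miner):
        self.classifier, self.crawler, self.miner = classifier, crawler, miner

    def run(self, queries):
        result = {}
        for qtype, group in self.classifier.group(queries).items():
            crawled = self.crawler.crawl(group)
            result[qtype] = self.miner.mine(crawled, total_queries=len(group))
        return result


fake_results = {  # stand-in for a real engine's ranked result URLs
    "novel a": ["http://www.qidian.com/1", "http://baike.baidu.com/x"],
    "novel b": ["http://www.qidian.com/2", "http://www.zongheng.com/3"],
    "sina news": ["http://news.sina.com.cn/"],
}
device = AuthoritativeSiteMiner(
    ClassificationModule(lambda q: "novel" if "novel" in q else "news"),
    DataCrawlingModule({"fake_engine": fake_results.__getitem__}),
    MiningModule(stop_sites={"baike.baidu.com"}),
)
result = device.run(list(fake_results))
print(result)  # {'novel': ['www.qidian.com'], 'news': ['news.sina.com.cn']}
```

Injecting the classifier and the engine callables keeps each module testable on its own, which matches the text's separation of classification, crawling, and mining into distinct units.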
Specifically, the classification module 201 divides the search queries entered by users into the following classification system: navigational, informational, and transactional; or,
The classification module 201 divides the search queries entered by users, according to the user's search intent, into the following classification system: navigational, informational, and resource.
Further, the classification module 201 classifies queries under a finer-grained classification system by the following steps:
1. Match the queries according to at least one matching rule;
2. Classify the matched queries with a support vector machine (SVM) model.
Specifically, the steps by which the mining module 203 mines authoritative websites from the search results comprise:
1. Parse the top-ranked search results of each query in the authoritative search engines to obtain the corresponding URLs;
2. Convert the URLs into site names according to at least one site-name conversion rule;
3. Filter the sites according to at least one stop-site filtering rule;
4. Score the filtered sites according to at least one site evaluation strategy;
5. Obtain the authoritative websites from the scoring results.
More preferably, the steps by which the mining module 203 obtains the authoritative websites from the scoring results comprise:
1. Associate the scoring results with the table that records the site names to build an inverted index, and merge the scoring results of each query across multiple authoritative search engines;
2. Sort the sites in the inverted index by their scores;
3. Obtain the authoritative websites from the sorted order.
More preferably, the steps by which the mining module 203 obtains the authoritative websites from the sorted order comprise:
In descending order of the sites' scores, adding each site name to a site set A that is initially empty and adding the queries corresponding to that site name to a set B that is initially empty; when the ratio of the number of queries in set B to the total number of queries of the type reaches a predetermined threshold, no more data is added to sets A and B, and the site list in set A is then the authoritative websites corresponding to queries of that type.
In the embodiments of the present invention, for the classification module 201:
The classification module 201 mainly classifies the queries entered by users. Because the retrieval results for queries of the same type tend to concentrate on the authoritative websites of that type, mining authoritative websites after classifying the queries helps improve the accuracy of the mining.
The methods used for query classification are rules and models. The query classification process comprises: first determining the query classification system; summarizing and inducing classification rules; and classifying with an SVM model the queries that the rules cannot easily distinguish. Query classification may use any method from the prior art, and the embodiments of the present invention do not limit this.
For the data crawling module 202:
Authoritative search engines have accumulated users over a long time, so their retrieval results are relatively authoritative and contain abundant information.
The main workflow of the data crawling module 202, shown in Fig. 3, comprises:
Step 301: The data crawling model provided in the embodiments of the present invention mainly crawls the retrieval results of queries from authoritative search engines; for example, results can be crawled from the more authoritative Chinese-language search engines such as Yahoo and Google.
Step 302: The crawled result pages are then parsed to extract their URLs, and the top N URLs are saved, for example the top 30.
Step 303: Finally, the URLs are converted into site names.
For the mining module 203:
First, the top-ranked search results of each query in the authoritative search engines are parsed to obtain the corresponding URLs, and the URLs are converted into site names according to at least one site-name conversion rule;
Stop sites are then filtered out;
Next, the 30 sites corresponding to each query are scored. The scoring principle is: the earlier a site name appears, the higher its score, and scores decrease more steeply near the top. When the same site appears more than once in the results of one query, the score of its earliest position is used;
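The scoring principle can be sketched as below. The rank-to-points formula is one reading of the schedule given for step 206: 28 points for rank 1, decreasing by 2 per rank through rank 10, then by 1 per rank, with ranks 21 to 30 at 1 point. The clamp that keeps rank 20 at 1 point instead of 0 is an assumption, made so that scores never increase with rank. Keeping a repeated site's earliest score follows the text.

```python
def rank_score(rank):
    """Points for a 1-based rank among the top 30 results (one reading).

    Ranks 1-10: 28, 26, ..., 10 (step 2).
    Ranks 11-20: 9, 8, ..., 1, clamped so rank 20 also gets 1 (assumption).
    Ranks 21-30: 1 each.
    """
    if rank < 1:
        raise ValueError("rank is 1-based")
    if rank <= 10:
        return 30 - 2 * rank
    if rank <= 20:
        return max(1, 20 - rank)
    return 1


def score_query_sites(ranked_sites):
    """Map each site in one query's ranked list to its score.

    A site that appears several times keeps the score of its earliest
    (best) position.
    """
    scores = {}
    for rank, site in enumerate(ranked_sites, start=1):
        scores.setdefault(site, rank_score(rank))
    return scores


print([rank_score(r) for r in (1, 2, 10, 11, 20, 21, 30)])  # [28, 26, 10, 9, 1, 1, 1]
print(score_query_sites(["a.com", "b.com", "a.com"]))       # {'a.com': 28, 'b.com': 26}
```

The steep step of 2 points in the first ten ranks encodes "scores decrease more steeply near the top", so a site in the first few positions outweighs several appearances in the tail.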
After the site names for each query have been scored, the query-to-site-name table is turned into an inverted index, so that each site name maps to its queries, and a site's scores across different queries are merged.
Finally, in descending order of each site name's query score, site names are added to a site set A that is initially empty, and the queries corresponding to each site name are added to a set B that is initially empty. When the number of queries in set B divided by the total number of queries of the type (that is, the total number of queries of this type submitted to the search engines for retrieval) reaches a threshold, no more data is added to sets A and B. The threshold can be set freely, for example 95%, but it should be close to the classification accuracy for this query type, which helps ensure that the mined sites are consistent in type. The site list in set A is then the mined authoritative websites of the type.
The above description shows and describes a preferred embodiment of the present invention. As stated above, it should be understood that the invention is not limited to the form disclosed herein and should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications, and environments, and can be changed within the scope of the inventive concept described herein through the above teachings or the technology or knowledge of related fields. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the protection scope of the appended claims.

Claims (6)

1. A method for mining authoritative websites, characterized by comprising:
classifying the search queries entered by users into different types;
searching authoritative search engines with the queries of each type;
mining authoritative websites from the search results;
wherein mining authoritative websites from the search results comprises:
parsing the top-ranked search results of each query in the authoritative search engines to obtain the corresponding URLs;
converting the URLs into site names according to at least one site-name conversion rule;
filtering the websites according to at least one inactive-website filtering rule;
scoring the filtered websites according to at least one website evaluation strategy;
obtaining authoritative websites from the scoring results;
wherein obtaining authoritative websites from the scoring results comprises:
associating the scoring results with the table recording the site names to build an inverted list, and merging the scoring results of each query across multiple authoritative search engines;
sorting the websites in the inverted list by their scores;
obtaining authoritative websites from the resulting order;
wherein obtaining authoritative websites from the resulting order comprises:
putting site names, in descending order of their scores, into a website set A that is initially empty, and putting the queries corresponding to each site name into a set B that is initially empty; when the ratio of the number of queries in set B to the total number of queries of this type reaches a predetermined threshold, stopping adding data to website set A and set B, the list of websites in website set A then being the authoritative websites corresponding to queries of this type.
2. The method for mining authoritative websites as claimed in claim 1, characterized in that the search queries entered by users are classified into the following taxonomy: navigational, informational and transactional;
or the search queries entered by users are classified, according to user search intent, into the following taxonomy: navigational, informational and resource.
3. The method for mining authoritative websites as claimed in claim 2, characterized in that queries of the region-class taxonomy are classified by the following steps:
matching queries according to at least one matching criterion;
classifying the matched queries with a support vector machine (SVM) model.
4. A device for mining authoritative websites, characterized by comprising:
a classification module, for classifying the search queries entered by users into different types;
a data capture module, for searching authoritative search engines with the queries of each type;
a mining module, for mining authoritative websites from the search results;
wherein the steps by which the mining module mines authoritative websites from the search results comprise:
parsing the top-ranked search results of each query in the authoritative search engines to obtain the corresponding URLs;
converting the URLs into site names according to at least one site-name conversion rule;
filtering the websites according to at least one inactive-website filtering rule;
scoring the filtered websites according to at least one website evaluation strategy;
obtaining authoritative websites from the scoring results;
wherein the steps by which the mining module obtains authoritative websites from the scoring results comprise:
associating the scoring results with the table recording the site names to build an inverted list, and merging the scoring results of each query across multiple authoritative search engines;
sorting the websites in the inverted list by their scores;
obtaining authoritative websites from the resulting order;
wherein the steps by which the mining module obtains authoritative websites from the resulting order comprise:
putting site names, in descending order of their scores, into a website set A that is initially empty, and putting the queries corresponding to each site name into a set B that is initially empty; when the ratio of the number of queries in set B to the total number of queries of this type reaches a predetermined threshold, stopping adding data to website set A and set B, the list of websites in website set A then being the authoritative websites corresponding to queries of this type.
5. The device for mining authoritative websites as claimed in claim 4, characterized in that
the classification module classifies the search queries entered by users into the following taxonomy: navigational, informational and transactional;
or,
the classification module classifies the search queries entered by users, according to user search intent, into the following taxonomy: navigational, informational and resource.
6. The device for mining authoritative websites as claimed in claim 5, characterized in that the classification module classifies queries of the region-class taxonomy by the following steps:
matching queries according to at least one matching criterion;
classifying the matched queries with a support vector machine (SVM) model.
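The per-query steps recited in claims 1 and 4 (parse the top-ranked results, convert URLs to site names, filter inactive websites, score the rest) can be sketched as follows. The claims leave every concrete rule open, so all of the following are hypothetical choices for illustration: the site-name rule (lowercase host with a leading `www.` dropped), the filter (a plain blocklist), the evaluation strategy (a result at 1-based rank r contributes 1/r), and the cutoff `top_n = 10`.

```python
from typing import Dict, List, Optional, Set
from urllib.parse import urlsplit


def url_to_site(url: str) -> Optional[str]:
    """Hypothetical site-name conversion rule: lowercase host, drop a leading 'www.'."""
    host = urlsplit(url).hostname  # already lowercased; None if there is no host
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def score_sites(
    ranked_urls: List[str],
    blocked_sites: Set[str],
    top_n: int = 10,
) -> Dict[str, float]:
    """Score the websites found in one query's top-N results.

    Assumed evaluation strategy: rank r (1-based) contributes 1/r, and a site
    that appears several times accumulates its contributions.
    """
    scores: Dict[str, float] = {}
    for rank, url in enumerate(ranked_urls[:top_n], start=1):
        site = url_to_site(url)
        if site is None or site in blocked_sites:  # assumed inactive-site filter
            continue
        scores[site] = scores.get(site, 0.0) + 1.0 / rank
    return scores


if __name__ == "__main__":
    results = [
        "https://www.a.com/x",
        "http://b.com/y",
        "https://a.com/z",
        "https://spam.com/",
        "garbage",
    ]
    print(score_sites(results, blocked_sites={"spam.com"}))
```

The output of this function, collected per query and per engine, is the kind of input the inverted-list and set A / set B steps operate on; a real system would replace each assumed rule with its own.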
CN201210394980.9A 2012-10-17 2012-10-17 Method and device for mining authoritative websites Active CN102880722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210394980.9A CN102880722B (en) 2012-10-17 2012-10-17 Method and device for mining authoritative websites

Publications (2)

Publication Number Publication Date
CN102880722A CN102880722A (en) 2013-01-16
CN102880722B CN102880722B (en) 2015-08-05

Family

ID=47482048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210394980.9A Active CN102880722B (en) 2012-10-17 2012-10-17 Method and device for mining authoritative websites

Country Status (1)

Country Link
CN (1) CN102880722B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388690A (en) * 2017-08-10 2019-02-26 阿里巴巴集团控股有限公司 Text searching method, inverted list generation method and system for text retrieval
CN107885857B (en) * 2017-11-17 2019-02-12 山东师范大学 A kind of search results pages user's behavior pattern mining method, apparatus and system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102339322A (en) * 2011-11-10 2012-02-01 武汉大学 Word meaning extracting method based on search interactive information and user search intention
CN102411626A (en) * 2011-12-13 2012-04-11 北京大学 Correlation fraction distribution-based method for classifying query intentions

Non-Patent Citations (3)

Title
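Zhang Sen et al. "A Survey of Query Intent Classification Techniques for Web Search" (Web 检索查询意图分类技术综述). Journal of Chinese Information Processing (中文信息学报), 2008-07-30, Vol. 22, No. 4, pp. 75-82. *
Wang Daling et al. "Dynamic Generalization of Web Pages Based on User Search Intent" (基于用户搜索意图的Web网页动态泛化). Journal of Software (软件学报), 2010-05-31, Vol. 21, No. 5, pp. 1083-1097. *
Sun Yuan. "Research on Authoritative Page Mining Algorithms Based on Unitary Transformation" (基于酉变换的权威页面挖掘算法研究). CNKI China Master's Theses Full-text Database (electronic journal), 2011, main text pp. 25-30, 39-49. *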

Also Published As

Publication number Publication date
CN102880722A (en) 2013-01-16

Similar Documents

Publication Publication Date Title
CN103365924B (en) A kind of method of internet information search, device and terminal
US7882099B2 (en) System and method for focused re-crawling of web sites
CN103092950B (en) A kind of network public-opinion geographic position real-time monitoring system and method
CN105468744B (en) Big data platform for realizing tax public opinion analysis and full text retrieval
CN101246499A (en) Network information search method and system
CN101261629A (en) Specific information searching method based on automatic classification technology
CN103744954B (en) Word relevancy network model establishing method and establishing device thereof
CN106980651B (en) Crawling seed list updating method and device based on knowledge graph
CN104699835A (en) Method and device used for determining webpages including POI (point of interest) data
CN102081604A (en) Search method for meta search engine and device thereof
CN102567494A (en) Website classification method and device
CN103218375A (en) POI (Point of Interest) information supplementing method and device
CN104331473A (en) Academic knowledge acquisition method and academic knowledge acquisition system based on knowledge network nodes
CN102411617A (en) Method for storing and inquiring a large quantity of URLs
CN103440328B (en) A kind of user classification method based on mouse behavior
CN104133868A (en) Strategy used for vertical crawler data classification and integration
CN101957860B (en) Method and device for releasing and searching information
CN102880722B (en) A kind of method for digging of authoritative website and device
CN102799638B (en) In-page navigation generation method facing barrier-free access to webpage contents
CN105095091A (en) Software defect code file locating method based on reverse index technology
CN105528432B (en) A kind of digital resource hot spot generation method and device
CN105528421A (en) Search dimension excavation method of query terms in mass data
CN104156458B (en) The extracting method and device of a kind of information
CN102368266B (en) Sorting method of unlabelled pictures for network search
CN103729374A (en) Information search method and search engine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Rooms 403-409, Block C, Building 5, Software Industry Base, Nanshan District, Shenzhen, Guangdong 518057, China

Patentee after: Shenzhen easou world Polytron Technologies Inc

Address before: A5501-A, Tower A, Union Square, intersection of Binhe Road and Caitian Road, Futian District, Shenzhen, Guangdong 518026, China

Patentee before: Shenzhen Yisou Science & Technology Development Co., Ltd.