CN102663077A - Web search results security sorting method based on Hits algorithm

Info

Publication number
CN102663077A
CN102663077A CN2012100951402A CN201210095140A CN102663077A CN 102663077 A CN102663077 A CN 102663077A CN 2012100951402 A CN2012100951402 A CN 2012100951402A CN 201210095140 A CN201210095140 A CN 201210095140A CN 102663077 A CN102663077 A CN 102663077A
Authority
CN
China
Prior art keywords
webpage
page
collection
carry out
expressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100951402A
Other languages
Chinese (zh)
Other versions
CN102663077B (en)
Inventor
陈志德
郭扬富
许力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Normal University
Original Assignee
Fujian Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Normal University
Priority to CN201210095140.2A
Publication of CN102663077A
Application granted
Publication of CN102663077B
Legal status: Expired - Fee Related

Abstract

The invention relates to the technical field of network security, and specifically to a Web search result security sorting method based on the Hits algorithm. The method comprises the following steps: establishing a malicious feature library $F = (f_1, f_2, f_3, \ldots, f_n)$, which contains $n$ kinds of feature codes of network viruses, trojans and vulnerabilities as they appear in webpages; expressing each feature code of the library as a vector of $m$ components, namely $f_x = (f_{x1}, f_{x2}, f_{x3}, \ldots, f_{xm})$, where $x \in \{1, 2, \ldots, n\}$ and $f_x \in F$, the weight of each component being denoted $f'_x$; and then combining a vector space model with the malicious feature library to sort webpage search results by security. The method lowers the ranking of malicious webpages in the search results and thereby reduces the probability of visiting insecure webpages.

Description

Web search result security sorting method based on the Hits algorithm
Technical field
The present invention relates to the field of network security technology, and in particular to a Web search result security sorting method based on the Hits algorithm.
Background
With the rapid development of the Internet, Web resources are growing exponentially, which makes them increasingly difficult to manage. Malicious webpages that conceal trojans, viruses, illegal advertisements and similar programs are now rampant on the Web. These pages use deceptive techniques and exploit the limitations of search engines so that some malicious pages appear near the top of search result rankings, which can seriously endanger the information security of users' computers and other terminals. Improving Web security has therefore become an urgent problem.
Summary of the invention
The object of the present invention is to provide a Web search result security sorting method based on the Hits algorithm that lowers the ranking of malicious webpages in search results and thereby reduces the probability of visiting unsafe webpages.
The technical solution adopted by the present invention is a Web search result security sorting method based on the Hits algorithm. A malicious feature library $F = (f_1, f_2, f_3, \ldots, f_n)$ is established, containing $n$ kinds of feature codes of network viruses, trojans and vulnerabilities as they appear in webpages. Each feature code $f_i$ of the library is expressed as a vector of $m$ components, namely $f_i = (f_{i1}, f_{i2}, f_{i3}, \ldots, f_{im})$, where $i \in \{1, 2, \ldots, n\}$ and $f_i \in F$. Then, based on the Hits algorithm, webpages are sorted by security as follows:
Step 1: A search query is submitted to a text-based search engine, and the first $t$ webpages of the returned results are taken as the root set $R$. The webpages referenced by $R$ and the webpages that reference $R$ are added to $R$; after intrinsic links and irrelevant links have been handled, the root set $R$ has been expanded into the set $G$. The Hub webpages in $G$ form the vertex set $V_1$, the Authority webpages form the vertex set $V_2$, and the hyperlinks from webpages in $V_1$ to webpages in $V_2$ form the edge set $E$, giving the bipartite directed graph $S = (V_1, V_2, E)$. For any vertex $v$ in $V_1$, $h(v)$ denotes the Hub value of webpage $v$; for any vertex $u$ in $V_2$, $a(u)$ denotes the Authority value of webpage $u$. Initially $h(v) = a(u) = 1$.
Step 2: The I operation is applied to $u$ to update $a(u)$, and the O operation is applied to $v$ to update $h(v)$. The I and O operations are, respectively:
I operation:
[formula image not reproduced in this text]
O operation:
[formula image not reproduced in this text]
In the above formulas, one summation symbol (image not reproduced) denotes traversing the pages in $V_1$ and summing, and the other (image not reproduced) denotes traversing the pages in $V_2$ and summing. $Risk(F, u)$ and $Risk(F, v)$ are calculated by the following formulas:
[formula image not reproduced in this text]
[formula image not reproduced in this text]
In the above formulas, $\mu_i$ denotes the hazard factor of the $i$-th kind of feature code in the malicious feature library, with $\mu_i \in (0, 1)$. Page $u$ is a text collection expressed as the vector $u = (u_1, u_2, u_3, \ldots, u_p)$, and each component $u_k$ of page $u$ is expressed as a vector of $m$ components, namely $u_k = (u_{k1}, u_{k2}, u_{k3}, \ldots, u_{km})$, where $k \in \{1, 2, \ldots, p\}$ and $u_k \in u$. Likewise, page $v$ is a text collection expressed as the vector $v = (v_1, v_2, v_3, \ldots, v_p)$, and each component $v_k$ of page $v$ is expressed as a vector of $m$ components, namely $v_k = (v_{k1}, v_{k2}, v_{k3}, \ldots, v_{km})$, where $k \in \{1, 2, \ldots, p\}$ and $v_k \in v$.
Step 3: Following Step 2, the I operation is applied to all pages in vertex set $V_2$ and the O operation to all pages in vertex set $V_1$. Afterwards, $a(u)$ and $h(v)$ are normalized by the following formula:
[formula image not reproduced in this text]
In the above formula, $q$ denotes the number of in-link nodes.
Step 4: Steps 2 and 3 are repeated iteratively until $a(u)$ and $h(v)$ converge.
Step 5: Finally, the pages are sorted by security according to the $a(u)$ value of each page.
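Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration only: the patent's I and O formulas are images that are not reproduced in this text, so the sketch assumes that the risk enters as a multiplicative damping factor $(1 - Risk)$ on the ordinary Hits sums, and it assumes Euclidean normalization. The names `risk_weighted_hits`, `rank_by_authority`, `max_iter` and `tol` are hypothetical and do not come from the patent.

```python
import math


def _normalize(scores):
    """Scale a score dict to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: x / norm for page, x in scores.items()}


def risk_weighted_hits(edges, risk, max_iter=100, tol=1e-9):
    """edges: (hub, authority) pairs; risk: dict page -> Risk value in [0, 1]."""
    edges = list(edges)
    if not edges:
        return {}, {}
    hubs = sorted({v for v, _ in edges})
    auths = sorted({u for _, u in edges})
    h = dict.fromkeys(hubs, 1.0)   # Step 1: h(v) = 1
    a = dict.fromkeys(auths, 1.0)  # Step 1: a(u) = 1
    for _ in range(max_iter):
        # I operation (ASSUMED form): sum hub values of in-linking pages,
        # then damp by (1 - Risk(F, u)).
        new_a = dict.fromkeys(auths, 0.0)
        for v, u in edges:
            new_a[u] += h[v]
        for u in auths:
            new_a[u] *= 1.0 - risk.get(u, 0.0)
        # O operation (ASSUMED form): sum updated authority values of
        # linked-to pages, then damp by (1 - Risk(F, v)).
        new_h = dict.fromkeys(hubs, 0.0)
        for v, u in edges:
            new_h[v] += new_a[u]
        for v in hubs:
            new_h[v] *= 1.0 - risk.get(v, 0.0)
        # Step 3: normalize both score vectors.
        new_a, new_h = _normalize(new_a), _normalize(new_h)
        # Step 4: stop once neither vector changes by more than tol.
        delta = max(
            max(abs(new_a[u] - a[u]) for u in auths),
            max(abs(new_h[v] - h[v]) for v in hubs),
        )
        a, h = new_a, new_h
        if delta < tol:
            break
    return a, h


def rank_by_authority(a):
    """Step 5: pages in descending order of authority value."""
    return sorted(a, key=lambda page: -a[page])
```

With two hubs that both link to a clean page and to a page whose assumed risk is 0.8, the risky page receives one fifth of the clean page's authority value and is ranked below it, whereas plain Hits would score the two pages equally.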
The beneficial effect of the invention is that, on the basis of the Hits algorithm, the risk level of a webpage is evaluated by combining a vector space model with the malicious feature library. By limiting the Authority value of malicious webpages, the method lowers their ranking in search results, thereby reducing the probability of visiting unsafe webpages and strengthening Web security.
Brief description of the drawings
Fig. 1 is a schematic diagram of the working principle of an embodiment of the invention.
Detailed description of the embodiments
In the Web search result security sorting method based on the Hits algorithm according to the present invention, a malicious feature library $F = (f_1, f_2, f_3, \ldots, f_n)$ is established, containing $n$ kinds of feature codes of network viruses, trojans and vulnerabilities as they appear in webpages. Each feature code $f_i$ of the library is expressed as a vector of $m$ components, namely $f_i = (f_{i1}, f_{i2}, f_{i3}, \ldots, f_{im})$, where $i \in \{1, 2, \ldots, n\}$ and $f_i \in F$. Then, based on the Hits algorithm, webpages are sorted by security as follows:
Step 1: A search query is submitted to a text-based search engine, and the first $t$ webpages of the returned results are taken as the root set $R$. The webpages referenced by $R$ and the webpages that reference $R$ are added to $R$; after intrinsic links and irrelevant links have been handled, the root set $R$ has been expanded into the set $G$. The Hub webpages in $G$ form the vertex set $V_1$, the Authority webpages form the vertex set $V_2$, and the hyperlinks from webpages in $V_1$ to webpages in $V_2$ form the edge set $E$, giving the bipartite directed graph $S = (V_1, V_2, E)$. For any vertex $v$ in $V_1$, $h(v)$ denotes the Hub value of webpage $v$; for any vertex $u$ in $V_2$, $a(u)$ denotes the Authority value of webpage $u$. Initially $h(v) = a(u) = 1$.
Step 2: The I operation is applied to $u$ to update $a(u)$, and the O operation is applied to $v$ to update $h(v)$. The I and O operations are, respectively:
I operation:
[formula image not reproduced in this text]
O operation:
[formula image not reproduced in this text]
In the above formulas, one summation symbol (image not reproduced) denotes traversing the pages in $V_1$ and summing, and the other denotes traversing the pages in $V_2$ and summing. $Risk(F, u)$ and $Risk(F, v)$ are calculated by the following formula:
[formula image not reproduced in this text]
In the above formula, $\mu_i$ denotes the hazard factor of the $i$-th kind of feature code in the malicious feature library, with $\mu_i \in (0, 1)$. Page $u$ is a text collection expressed as the vector $u = (u_1, u_2, u_3, \ldots, u_p)$, and each component $u_k$ of page $u$ is expressed as a vector of $m$ components, namely $u_k = (u_{k1}, u_{k2}, u_{k3}, \ldots, u_{km})$, where $k \in \{1, 2, \ldots, p\}$ and $u_k \in u$. Likewise, page $v$ is a text collection expressed as the vector $v = (v_1, v_2, v_3, \ldots, v_p)$, and each component $v_k$ of page $v$ is expressed as a vector of $m$ components, namely $v_k = (v_{k1}, v_{k2}, v_{k3}, \ldots, v_{km})$, where $k \in \{1, 2, \ldots, p\}$ and $v_k \in v$.
Step 3: Following Step 2, the I operation is applied to all pages in vertex set $V_2$ and the O operation to all pages in vertex set $V_1$. Afterwards, $a(u)$ and $h(v)$ are normalized by the following formulas:
[formula image not reproduced in this text]
[formula image not reproduced in this text]
In the above formulas, $q$ denotes the number of in-link nodes.
Step 4: Steps 2 and 3 are repeated iteratively until $a(u)$ and $h(v)$ converge.
Step 5: Finally, the pages are sorted by security according to the $a(u)$ value of each page.
The related content of the present invention is described further below.
1. Hits algorithm
The Hits algorithm is a webpage link analysis algorithm proposed by Kleinberg of IBM. Given a search query, it finds the authoritative pages relevant to the topic through link analysis. The basic idea is to derive a weight for each webpage from the link structure and thereby determine the authority of the page. The Hits algorithm divides webpages into two types: pages that are authoritative on a given topic, called authority pages, and pages that link these authority pages together, called hub pages. The Hits algorithm defines two important weights:
Authority: the weighted number of times an authoritative webpage is referenced by other webpages, i.e. the weighted in-degree of that page. The more often a webpage is referenced, the larger its weighted in-degree and its Authority value, and the more important the page.
Hub: the weighted number of other webpages that a webpage points to, i.e. the weighted out-degree of that page; a hub provides links to a set of authoritative webpages. The larger the weighted out-degree of a webpage, the larger its Hub value. Hubs implicitly indicate which pages are authoritative on a topic.
Ideally, the result set obtained for a search query (the symbols for the query and the result set are images not reproduced in this text) has the following characteristics:
(1) the result set is relatively small;
(2) the result set is rich in relevant webpages;
(3) the result set contains most of the most valuable authority pages.
For a specific search query, the focused subgraph for that query is constructed as follows:
A text-based search engine (such as Hotbot or AltaVista) is used to retrieve the results for the query, and the top-ranked results (the symbol for their number is an image not reproduced) are taken as the root set (Root Set). The root set satisfies characteristics (1) and (2) but falls far short of characteristic (3), so it needs to be expanded.
The root set is expanded in two ways. First, all pages that pages in the root set point to are added; in the graph model, these are the targets of the directed edges starting from the root set, and the number of pages added this way is not limited. Second, of the pages that link to each page in the root set, an arbitrary subset of limited size is added; the limit is usually set to 50, and if a page has no more than 50 in-linking pages, all of them are taken. Adding these pages to the original root set forms the base set (Base Set). Such a set satisfies the above three characteristics fairly well; its size is generally between 1000 and 5000 pages.
To improve the quality of the computation, the base set is processed further. Links are divided into two kinds: a link between two pages under different domain names is called a transverse link, and a link between two pages under the same domain name is called an intrinsic link.
Intrinsic links serve only for navigation within a site and convey almost no authority between webpages, so links of this kind are deleted from the base set. Some irrelevant links, such as advertisements, are also removed. The resulting set can be regarded as a focused subgraph that satisfies the above three characteristics. The hub and authority values are then computed, and the pages are sorted by their final converged authority values to obtain the desired result.
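The expansion and link filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the link tables `out_links` and `in_links` are hypothetical inputs, "any" in-linking pages is interpreted as the first `d` entries, two pages are treated as being under the same domain name when their URL host names are equal, and the removal of irrelevant links such as advertisements is not modeled.

```python
from urllib.parse import urlparse


def build_base_set(root, out_links, in_links, d=50):
    """Expand the root set into the base set.

    root: list of page URLs returned by the text search engine.
    out_links / in_links: dict mapping a URL to a list of URLs
    (hypothetical link tables supplied by a crawler or index).
    """
    base = set(root)
    for page in root:
        # Every page that a root page points to is added, without limit.
        base.update(out_links.get(page, []))
        # At most d of the pages that point to a root page are added.
        base.update(in_links.get(page, [])[:d])
    return base


def transverse_edges(base, out_links):
    """Keep only links inside the base set that cross host names."""
    edges = []
    for src in sorted(base):
        for dst in out_links.get(src, []):
            same_host = urlparse(src).netloc == urlparse(dst).netloc
            if dst in base and not same_host:
                edges.append((src, dst))
    return edges
```

Comparing full host names is a simplification; a production system would more likely compare registered domains, so that `www.example.com` and `blog.example.com` count as the same site.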
Authorities and hubs reinforce each other: a good hub page points to many good authorities, and a good authority page is pointed to by many good hubs.
The focused subgraph is expressed as a bipartite directed graph $S = (V_1, V_2, E)$. For any vertex $v$ in $V_1$, $h(v)$ denotes the Hub value of the webpage; for any vertex $u$ in $V_2$, $a(u)$ denotes the Authority value of the webpage. First, the I operation is applied to $u$ and the O operation is applied to $v$, updating $a(u)$ and $h(v)$ respectively, and the values are then normalized. The I and O operations below are repeated in this way until $a(u)$ and $h(v)$ converge.
I operation:
[formula image not reproduced in this text]
O operation:
[formula image not reproduced in this text]
In the above formulas, one summation symbol (image not reproduced) denotes traversing the pages in $V_1$ and summing, and the other (image not reproduced) denotes traversing the pages in $V_2$ and summing.
After each iteration, $a(u)$ and $h(v)$ must be normalized:
[formula image not reproduced in this text]
[formula image not reproduced in this text]
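Because the formula images are missing from this text, the block below restates the I operation, the O operation and the normalization as they are usually written for the Hits algorithm in Kleinberg's formulation. It is offered as a likely reading and not as a transcription of the patent's images; in particular, the use of $q$ as the index of the normalizing sums is an assumption based on the symbol named in Step 3.

```latex
\begin{align}
  \text{I operation:}\quad  a(u) &= \sum_{v \,:\, (v,u) \in E} h(v), \\
  \text{O operation:}\quad  h(v) &= \sum_{u \,:\, (v,u) \in E} a(u), \\
  \text{normalization:}\quad
  a(u) &\leftarrow \frac{a(u)}{\sqrt{\sum_{q \in V_2} a(q)^2}}, \qquad
  h(v) \leftarrow \frac{h(v)}{\sqrt{\sum_{q \in V_1} h(q)^2}}.
\end{align}
```

In this form the first sum runs over the hub pages in $V_1$ that link to $u$ and the second over the authority pages in $V_2$ that $v$ links to, which matches the description of the two summation symbols.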
2. Web security based on the Hits algorithm
The security model works mainly by matching the malicious feature library against page source code. A model similar to the vector space model (VSM, Vector Space Model) is used to compute the similarity between the feature codes and a page, i.e. the risk. In this model a document is represented as a vector, the feature codes in the document are represented by the components of the vector, and the value of each component is its weight.
[formula image not reproduced in this text]
In the above formula, the two operands (symbol image not reproduced) are the feature vectors of the malicious feature library and of the document, respectively; the remaining symbols (images not reproduced) denote the dimension of the feature vectors and the value of a feature vector in a given dimension.
Similarly, the similarity between the risk library $F$ and a document $D$ can be used to evaluate the risk of a page:
[formula image not reproduced in this text]
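The similarity computation can be illustrated with the cosine measure that vector space models normally use. The patent's own similarity and risk formulas are images that are not reproduced here, so both functions below are assumptions: `cosine_similarity` is the standard VSM measure, and `page_risk` uses one hypothetical aggregation (the best match of each feature code over the page components, weighted by its hazard factor $\mu_i$, with the maximum taken over all feature codes).

```python
import math


def cosine_similarity(x, y):
    """Standard VSM cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_x * norm_y)


def page_risk(features, hazards, page_components):
    """ASSUMED aggregation, not the patent's formula.

    features: list of m-dimensional feature-code vectors f_i.
    hazards: list of hazard factors mu_i, one per feature code.
    page_components: list of m-dimensional component vectors u_k of a page.
    """
    risk = 0.0
    for f_i, mu_i in zip(features, hazards):
        best = max(
            (cosine_similarity(f_i, u_k) for u_k in page_components),
            default=0.0,
        )
        risk = max(risk, mu_i * best)
    return risk
```

With non-negative weights the cosine lies in $[0, 1]$, so under this assumed aggregation the risk never exceeds the largest hazard factor and therefore stays below 1.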
The above are preferred embodiments of the present invention. All changes made according to the technical solution of the present invention whose resulting function does not go beyond the scope of that technical solution fall within the scope of protection of the present invention.

Claims (1)

1. A Web search result security sorting method based on the Hits algorithm, characterized in that: a malicious feature library $F = (f_1, f_2, f_3, \ldots, f_n)$ is established, containing $n$ kinds of feature codes of network viruses, trojans and vulnerabilities as they appear in webpages; each feature code $f_i$ of the library is expressed as a vector of $m$ components, namely $f_i = (f_{i1}, f_{i2}, f_{i3}, \ldots, f_{im})$, where $i \in \{1, 2, \ldots, n\}$ and $f_i \in F$; then, based on the Hits algorithm, webpages are sorted by security as follows:
Step 1: A search query is submitted to a text-based search engine, and the first $t$ webpages of the returned results are taken as the root set $R$. The webpages referenced by $R$ and the webpages that reference $R$ are added to $R$; after intrinsic links and irrelevant links have been handled, the root set $R$ has been expanded into the set $G$. The Hub webpages in $G$ form the vertex set $V_1$, the Authority webpages form the vertex set $V_2$, and the hyperlinks from webpages in $V_1$ to webpages in $V_2$ form the edge set $E$, giving the bipartite directed graph $S = (V_1, V_2, E)$. For any vertex $v$ in $V_1$, $h(v)$ denotes the Hub value of webpage $v$; for any vertex $u$ in $V_2$, $a(u)$ denotes the Authority value of webpage $u$. Initially $h(v) = a(u) = 1$.
Step 2: The I operation is applied to $u$ to update $a(u)$, and the O operation is applied to $v$ to update $h(v)$. The I and O operations are, respectively:
I operation:
[formula image not reproduced in this text]
O operation:
[formula image not reproduced in this text]
In the above formulas, one summation symbol (image not reproduced) denotes traversing the pages in $V_1$ and summing, and the other (image not reproduced) denotes traversing the pages in $V_2$ and summing. $Risk(F, u)$ and $Risk(F, v)$ are calculated by the following formulas:
[formula image not reproduced in this text]
[formula image not reproduced in this text]
In the above formulas, $\mu_i$ denotes the hazard factor of the $i$-th kind of feature code in the malicious feature library, with $\mu_i \in (0, 1)$. Page $u$ is a text collection expressed as the vector $u = (u_1, u_2, u_3, \ldots, u_p)$, and each component $u_k$ of page $u$ is expressed as a vector of $m$ components, namely $u_k = (u_{k1}, u_{k2}, u_{k3}, \ldots, u_{km})$, where $k \in \{1, 2, \ldots, p\}$ and $u_k \in u$. Likewise, page $v$ is a text collection expressed as the vector $v = (v_1, v_2, v_3, \ldots, v_p)$, and each component $v_k$ of page $v$ is expressed as a vector of $m$ components, namely $v_k = (v_{k1}, v_{k2}, v_{k3}, \ldots, v_{km})$, where $k \in \{1, 2, \ldots, p\}$ and $v_k \in v$.
Step 3: Following Step 2, the I operation is applied to all pages in vertex set $V_2$ and the O operation to all pages in vertex set $V_1$. Afterwards, $a(u)$ and $h(v)$ are normalized by the following formulas:
[formula image not reproduced in this text]
[formula image not reproduced in this text]
In the above formulas, $q$ denotes the number of in-link nodes.
Step 4: Steps 2 and 3 are repeated iteratively until $a(u)$ and $h(v)$ converge.
Step 5: Finally, the pages are sorted by security according to the $a(u)$ value of each page.
CN201210095140.2A 2012-03-31 2012-03-31 Web search results security sorting method based on Hits algorithm Expired - Fee Related CN102663077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210095140.2A CN102663077B (en) 2012-03-31 2012-03-31 Web search results security sorting method based on Hits algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210095140.2A CN102663077B (en) 2012-03-31 2012-03-31 Web search results security sorting method based on Hits algorithm

Publications (2)

Publication Number Publication Date
CN102663077A true CN102663077A (en) 2012-09-12
CN102663077B CN102663077B (en) 2014-03-12

Family

ID=46772568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210095140.2A Expired - Fee Related CN102663077B (en) 2012-03-31 2012-03-31 Web search results security sorting method based on Hits algorithm

Country Status (1)

Country Link
CN (1) CN102663077B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101409634A (en) * 2007-10-10 2009-04-15 中国科学院自动化研究所 Quantitative analysis tools and method for internet news influence based on information retrieval

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DAWEI HONG ET AL: "Analysis of Web search algorithm HITS", 《INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE》 *
JUNG-HUN LEE ET AL: "Associated word extraction system for search query expansion based on hits", 《CCIS》 *
YANGFU GUO ET AL: "《Optimizing for Web Security Based on Search Engine》", 《2010 INTERNATIONAL CONFERENCE ON COMPUTER DESIGN AND APPLIATIONS (ICCDA 2010)》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014059852A1 (en) * 2012-10-17 2014-04-24 北京奇虎科技有限公司 Search server and search method
CN103761476A (en) * 2013-12-30 2014-04-30 北京奇虎科技有限公司 Characteristic extraction method and device
CN103761476B (en) * 2013-12-30 2016-11-09 北京奇虎科技有限公司 The method and device of feature extraction
CN108182186A (en) * 2016-12-08 2018-06-19 广东精点数据科技股份有限公司 A kind of Web page sequencing method based on random forests algorithm
CN107622048A (en) * 2017-09-06 2018-01-23 上海斐讯数据通信技术有限公司 A kind of text mode recognition method and system
CN107622048B (en) * 2017-09-06 2021-06-22 南京硅基智能科技有限公司 Text mode recognition method and system

Also Published As

Publication number Publication date
CN102663077B (en) 2014-03-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140312

Termination date: 20170331