CN103778217A - Current webpage list-based method and system for recommendation - Google Patents

Publication number
CN103778217A
Authority
CN
China
Prior art keywords
url
collected
webpage
module
web page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410024821.9A
Other languages
Chinese (zh)
Inventor
崔晶晶
林佳婕
吴鹏
马占国
李春华
刘立娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Priority to CN201410024821.9A
Publication of CN103778217A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of internet applications and provides a method and system for recommendation based on a current webpage list. The method uses a Bloom filter algorithm to check each acquired URL (uniform resource locator) and determine whether it already exists in the list of webpages collected so far. With this method and system, the addresses of webpages still to be collected can be obtained efficiently and accurately, webpages to be collected can be distinguished from already-collected webpages in real time, resources are fully used, and the result serves precise content recommendation and advertisement placement.

Description

Method and system for recommendation based on a current webpage list
Technical field
The present invention relates to the technical field of internet applications, and in particular to a method and system for recommendation based on a current webpage list.
Background art
In web content recommendation and advertisement placement, web information must be analyzed, and that information involves a massive number of URLs. To avoid collecting the same URL repeatedly, URLs that have already been collected must be distinguished from those that have not. Because the number of URLs is huge, this work costs considerable space and time. The prior art can already deduplicate URLs; the techniques currently in use are the following:
1. Storage and deduplication with a linked list or tree:
URLs are stored in a linked list or tree, and duplicates are detected by comparison. This scheme is simple to implement: a search plus a comparison function decides whether a URL is a duplicate.
2. Storage and deduplication with a hash table (HashTable):
A suitable hash function is chosen, and the plaintext URL is mapped by the hash function to a point in a bit array, so that it can be quickly determined whether a URL has already been crawled.
However, the prior art has the following defects:
With linked-list or tree storage, time and space efficiency are acceptable while the number of URLs is small. As the number of URLs grows, both retrieval time and space efficiency degrade: lookup time is O(n) for a linked list and O(log n) for a tree, and every URL must be stored in full. Given the massive number of webpages on the internet today, the time and space efficiency of this approach to URL storage and retrieval cannot meet production needs.
With hash-table storage, lookup time stays O(1), but hashing produces collisions. To lower the collision rate, the number of elements the hash table may hold must be limited. Assuming a good hash function and a bit array of length m, reducing the collision rate to, for example, 1% means the table can hold only about m/100 elements, which clearly lowers space efficiency.
Summary of the invention
(1) Technical problem to be solved by the invention:
This scheme uses a Bloom filter, which is better than a basic linked list in both space and time efficiency. Because it uses multiple hash functions, collisions are reduced, and it is also better than a hash table in space utilization and collision probability.
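The space-utilization claim can be made concrete with a small calculation. The sketch below is not from the patent; it assumes uniform hash functions and the standard approximations for collision probability, and the names `single_hash_capacity` and `bloom_capacity` are hypothetical. It compares how many URLs fit in the same m bits at a 1% misjudgment rate.

```python
import math


def single_hash_capacity(m: int, p: float) -> int:
    """Largest n for which one hash function over m bits keeps the
    collision probability 1 - e^(-n/m) at or below p."""
    return int(-m * math.log(1.0 - p))


def bloom_capacity(m: int, p: float) -> int:
    """Largest n a Bloom filter of m bits supports at false positive rate p
    when k is chosen optimally: n = m * (ln 2)^2 / (-ln p)."""
    return int(m * (math.log(2) ** 2) / (-math.log(p)))


m = 1_000_000   # hypothetical bit-array length
p = 0.01        # target misjudgment rate
single = single_hash_capacity(m, p)
bloom = bloom_capacity(m, p)
print(f"single hash: {single} URLs, Bloom filter: {bloom} URLs")
```

Under these assumptions the single-hash scheme holds roughly m/100 elements, in line with the figure given in the background section, while the Bloom filter holds roughly ten times as many in the same space.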
(2) Technical solution
To achieve the above object, the present invention proposes a method and system for recommendation based on a current webpage list. A Bloom filter is used to check each acquired URL and determine whether it already exists in the existing webpage list. The addresses of websites still to be collected can thus be obtained efficiently and accurately, webpages to be collected can be distinguished from collected ones in real time, and resources are fully used, in the service of precise content recommendation and advertisement placement.
A Bloom filter has good space and time efficiency and is used to test whether an element is a member of a set. It is a randomized data structure with high space utilization that represents a set with a bit array and can determine whether an element belongs to the set. The test can err only when it answers "in the set"; it never errs when it answers "not in the set". Each query therefore returns one of two results: "in the set (possibly wrong)" or "not in the set (definitely not in the set)". In other words, a Bloom filter may misjudge an element that does not belong to the set as belonging to it (a false positive), but it never misjudges an element that belongs to the set as not belonging to it. For URL deduplication, false positives are acceptable as long as their rate stays within a certain bound, which makes this a good fit for a Bloom filter.
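The patent does not quantify the false positive rate. The standard textbook approximation is given here for orientation, assuming k independent uniform hash functions, a bit array of m bits, and n inserted elements (the symbol n is an assumption; it does not appear in the source):

```latex
% Probability that a given bit is still 0 after n insertions
\Pr[\text{bit} = 0] = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}

% False positive probability: all k probed bits are 1
p_{\mathrm{FP}} \approx \left(1 - e^{-kn/m}\right)^{k}

% Number of hash functions that minimizes p_FP for given m and n
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2
```

For a fixed m and k, the rate rises as n grows, which is why the acceptable bound has to be decided when the filter is sized.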
Specifically, in one aspect, the invention provides a method for recommendation based on a current webpage list, characterized in that the method comprises the steps of:
S1: obtain the currently accessed URL;
S2: determine whether this URL has already been collected, using the Bloom filter algorithm to check the obtained URL; if yes, go to S3; if no, go to S4;
S3: query the data related to the URL, then go to S6;
S4: add the URL to the to-be-collected queue, that is, report it to the queue of uncrawled addresses to await collection by the crawler tool;
S5: obtain the information related to the URL;
S6: recommend content or place advertisements.
Preferably, in step S2, after the server receives the URL reported in step S1, it uses the Bloom filter algorithm to determine whether the webpage has already been collected. After a URL is obtained, the k corresponding bits are computed and each is checked for 0 or 1, where k is the number of hash functions the algorithm uses. If all k corresponding positions are 1, the URL is considered to exist in the existing set, that is, the webpage has been collected. If the value at any corresponding position is not 1, the URL is considered not to be in the existing set, that is, the webpage has not been collected.
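The k-bit check in step S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patent does not name its hash functions, so deriving the k positions from one SHA-256 digest by double hashing is an assumption, as are the class name `BloomFilter` and the parameter values.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over a bit array of m bits with k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, url: str) -> list:
        # Assumed hashing scheme: split one SHA-256 digest into two 64-bit
        # values and combine them (double hashing) to get k positions.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, url: str) -> None:
        # Set the k corresponding bits to 1.
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        # "Collected" only if all k corresponding bits are 1.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))


collected = BloomFilter(m=1 << 20, k=7)
collected.add("http://example.com/a")
print("http://example.com/a" in collected)  # True: an added URL is never reported as absent
print("http://example.com/b" in collected)  # almost certainly False for a nearly empty filter
```

The filter stores only bits, never the URLs themselves, so memory use is fixed by m regardless of URL length.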
Preferably, in step S3, the URL is sent to the context database, and information such as the website category is obtained from the context database.
Preferably, in step S5, the crawler tool takes a URL from the queue of uncollected addresses, crawls the related content of the corresponding webpage or website, and submits it to the data analyzer for content analysis; after analysis, the webpage or website is tagged with the relevant labels.
Preferably, in step S6, the server selects suitable content according to the website's category and other information and recommends or places it as media or advertising.
In another aspect, the invention provides a system for recommendation based on a current webpage list, characterized in that the system comprises the following modules:
M1: for obtaining the currently accessed URL;
M2: for determining whether this URL has already been collected, using the Bloom filter algorithm to check the obtained URL; if yes, hand over to M3; if no, hand over to M4;
M3: for querying the data related to the URL, then handing over to M6;
M4: for adding the URL to the to-be-collected queue, that is, reporting it to the queue of uncrawled addresses to await collection by the crawler tool;
M5: for obtaining the information related to the URL;
M6: for recommending content or placing advertisements.
Preferably, in module M2, after the server receives the URL reported in step S1, it uses the Bloom filter algorithm to determine whether the webpage has already been collected. After a URL is obtained, the k corresponding bits are computed and each is checked for 0 or 1, where k is the number of hash functions the algorithm uses. If all k corresponding positions are 1, the URL is considered to exist in the existing set, that is, the webpage has been collected. If the value at any corresponding position is not 1, the URL is considered not to be in the existing set, that is, the webpage has not been collected.
Preferably, in module M3, the URL is sent to the context database, and information such as the website category is obtained from the context database.
Preferably, in module M5, the crawler tool takes a URL from the queue of uncollected addresses, crawls the related content of the corresponding webpage or website, and submits it to the data analyzer for content analysis; after analysis, the webpage or website is tagged with the relevant labels.
Preferably, in module M6, the server selects suitable content according to the website's category and other information and recommends or places it as media or advertising.
(3) Technical effects
The invention is used to obtain a webpage list and can effectively deduplicate webpages.
The invention uses the Bloom filter algorithm to obtain the current webpage list, so the addresses of websites to be collected can be obtained efficiently and accurately.
The invention can process webpages to be collected and already-collected webpages in real time, making full use of resources in the service of precise content recommendation and advertisement placement.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for recommendation based on a current webpage list according to the invention;
Fig. 2 is a schematic structural diagram of the system for recommendation based on a current webpage list according to the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the scope of protection of the invention.
To overcome the above defects of the prior art, the invention provides a method and system for recommendation based on a current webpage list. The Bloom filter algorithm is used to classify the URL of each currently obtained webpage, so that URLs belonging to different sets are processed differently, achieving higher time efficiency and better space efficiency.
A Bloom filter is a binary-vector data structure with good space and time efficiency, used to test whether an element is a member of a set. The test can err only when it answers "in the set"; it never errs when it answers "not in the set", so each query returns either "in the set (possibly wrong)" or "not in the set (definitely not in the set)". To decide whether an element is in a set, the usual approach is to store all elements and find the answer by comparison; linked lists and trees both follow this idea. As the number of elements in the set grows, the space and time required grow with it, and retrieval becomes slower and slower. A hash-based method instead maps an element to one point in an array of length m: if that point is 1, the element is in the set; otherwise it is not. To address the collision problem of ordinary hashing, the Bloom filter algorithm uses k hash functions that map an element to k points: if all of the points are 1, the element is considered to be in the set; if any point is 0, the element is not in the set. The advantage of a Bloom filter is that insertion and query time are both constant, depending on k rather than on the number of stored elements; in addition, it answers queries without storing the elements themselves, which gives good confidentiality. Its shortcoming is equally clear: the more elements are inserted, the higher the probability of wrongly answering "in the set" (a false positive). For URL deduplication, however, false positives are acceptable as long as their rate stays within a certain bound, which makes this a good fit for a Bloom filter.
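The shortcoming described above, a false positive rate that rises with the number of inserted elements, can be observed directly. The following is a small experiment sketch, not part of the patent: the filter size (m = 4096, k = 3), the example URLs, and the SHA-256 double-hashing scheme are all assumptions chosen to keep the run short.

```python
import hashlib


def positions(item: str, m: int, k: int) -> list:
    # k bit positions from one SHA-256 digest via double hashing (assumed scheme).
    d = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def measured_false_positive_rate(n: int, m: int = 4096, k: int = 3, probes: int = 2000) -> float:
    bits = 0  # a Python int used as an m-bit array
    for i in range(n):
        for p in positions(f"http://example.com/page/{i}", m, k):
            bits |= 1 << p
    # Probe with URLs that were never inserted; every hit is a false positive.
    hits = sum(
        all(bits >> p & 1 for p in positions(f"http://example.com/other/{j}", m, k))
        for j in range(probes)
    )
    return hits / probes


rates = [measured_false_positive_rate(n) for n in (100, 500, 1500)]
print(rates)
```

With these assumed parameters the measured rate should climb from well under 1% at 100 insertions to a substantial fraction at 1500, which is the behavior the paragraph describes.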
On this basis, in one embodiment of the invention, as shown in Fig. 1, the method for recommendation based on a current webpage list mainly comprises the following steps:
S1: Obtain the currently accessed URL.
When a user visits a webpage, the browser reports the webpage URL to the server. The URL of the current webpage can be obtained by means such as an installed plug-in or access logs; this step can use existing URL acquisition techniques.
S2: Determine whether this URL has already been collected. If yes, go to step S3; otherwise go to step S4.
After the server receives the URL reported in step S1, it uses the Bloom filter algorithm to determine whether the webpage has already been collected.
Specifically, after a URL is obtained, it is checked as follows. Assuming k hash functions are used, the k corresponding bits are computed and each is checked for 0 or 1. If all k corresponding positions are 1, the URL is considered to exist in the existing set, that is, the webpage has been collected. If the value at any corresponding position is not 1, the URL is considered not to be in the existing set, that is, the webpage has not been collected. As elements are inserted, more bits in the Bloom filter are set and collisions become more likely. When a new element arrives and satisfies the "in the set" condition, that is, all of its corresponding bits are 1, there are two possibilities: either the element really is in the set and there is no misjudgment, or a hash collision has occurred and the element is misjudged although it was never in the set. The probability of misjudgment therefore rises as more elements are inserted. Compared with the existing single-hash-function algorithm, however, this approach greatly reduces collisions and misjudgments and effectively improves space efficiency. For URL deduplication, this misjudgment rate is acceptable as long as it stays within a certain bound.
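The "certain bound" on the misjudgment rate in step S2 can be turned into concrete filter parameters. The patent gives no sizing rule, so the sketch below uses the standard Bloom filter formulas under the assumption of uniform, independent hash functions; the function name `size_bloom_filter` and the example figures are hypothetical.

```python
import math


def size_bloom_filter(n: int, p: float) -> tuple:
    """Return (m, k): bits and hash functions for n expected URLs at
    target false positive rate p, using m = -n ln p / (ln 2)^2 and
    k = (m / n) ln 2."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


m, k = size_bloom_filter(n=1_000_000, p=0.01)
print(f"{m} bits (about {m / 8 / 1024 / 1024:.2f} MiB), {k} hash functions")
```

For one million URLs at a 1% bound this comes to a little under 10 bits per URL, far less than storing the URL strings themselves.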
S3: Query the data related to the URL. The URL is sent to the context database, information such as the website category is obtained from the context database, and the flow goes to step S6.
S4: Add the URL to the to-be-collected queue. The URL is reported to the queue of uncrawled addresses to await collection by the crawler tool.
S5: Obtain the information related to the URL. The crawler tool takes a URL from the queue of uncollected addresses, crawls the related content of the corresponding webpage or website, and submits it to the data analyzer for content analysis. After analysis, the webpage or website is tagged with the relevant labels, for example by classifying the site as a shopping website, consulting website, news website, or another website type.
S6: Recommend content or place advertisements. The server selects suitable content according to the website's category and other information and recommends or places it as media or advertising.
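Steps S1 to S6 can be tied together in a short sketch. Everything concrete here is an assumption made for illustration: the in-memory dictionary standing in for the context database, the `classify` callback standing in for the crawler and data analyzer, the filter parameters, and the choice to mark a URL in the filter when it is queued (the patent does not say at which point a URL counts as collected).

```python
import hashlib
from collections import deque
from typing import Callable, Optional

M_BITS, K_HASHES = 1 << 20, 7   # hypothetical Bloom filter parameters
bits = bytearray(M_BITS // 8)   # bit array marking URLs already handed to the crawler
context_db = {}                 # stand-in for the context database: URL -> site category
crawl_queue = deque()           # queue of URLs waiting for the crawler tool


def positions(url: str) -> list:
    # k positions from one SHA-256 digest via double hashing (assumed scheme).
    d = hashlib.sha256(url.encode("utf-8")).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % M_BITS for i in range(K_HASHES)]


def seen(url: str) -> bool:
    return all(bits[p // 8] >> (p % 8) & 1 for p in positions(url))


def mark(url: str) -> None:
    for p in positions(url):
        bits[p // 8] |= 1 << (p % 8)


def handle_visit(url: str) -> Optional[str]:
    """S1-S4, S6: return the category to recommend against, or None if nothing is known yet."""
    if seen(url):                       # S2: Bloom filter check
        return context_db.get(url)      # S3 then S6; None on a false positive or a pending crawl
    mark(url)                           # assumption: mark when queued, to avoid duplicate queueing
    crawl_queue.append(url)             # S4
    return None


def crawl_one(classify: Callable[[str], str]) -> None:
    """S5: take one queued URL, analyze it, and store its label."""
    url = crawl_queue.popleft()
    context_db[url] = classify(url)


first = handle_visit("http://shop.example.com/")    # unseen: queued for crawling
crawl_one(lambda url: "shopping")                    # hypothetical analyzer result
second = handle_visit("http://shop.example.com/")   # seen: category looked up
print(first, second)
```

Because the filter can report a false positive, the lookup in S3 must tolerate a URL that is not in the database; here that case simply yields no recommendation.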
Those of ordinary skill in the art will understand that all or some of the steps of the method in the above embodiment can be carried out by hardware under the instruction of a program. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method; the storage medium can be ROM/RAM, a magnetic disk, an optical disc, a memory card, and so on. Accordingly, those skilled in the art will understand that, corresponding to the method, the invention also includes a system for recommendation based on a current webpage list. As shown in Fig. 2, corresponding to the method steps above, the system comprises:
Acquisition module, for obtaining the currently accessed URL.
When a user visits a webpage, the acquisition module has the browser report the webpage URL to the server. The URL of the current webpage can be obtained by means such as an installed plug-in or access logs; existing URL acquisition techniques can be used here.
Judgment module, for determining whether this URL has already been collected. If yes, the URL is handled by the query module; otherwise, the URL is handed to the queue module.
After the server receives the URL obtained by the acquisition module, it uses the Bloom filter algorithm to determine whether the webpage has already been collected.
Specifically, after a URL is obtained, it is checked as follows. Assuming k hash functions are used, the k corresponding bits are computed and each is checked for 0 or 1. If all k corresponding positions are 1, the URL is considered to exist in the existing set, that is, the webpage has been collected. If the value at any corresponding position is not 1, the URL is considered not to be in the existing set, that is, the webpage has not been collected. As elements are inserted, more bits in the Bloom filter are set and collisions become more likely. When a new element arrives and satisfies the "in the set" condition, that is, all of its corresponding bits are 1, there are two possibilities: either the element really is in the set and there is no misjudgment, or a hash collision has occurred and the element is misjudged although it was never in the set. The probability of misjudgment therefore rises as more elements are inserted. Compared with the existing single-hash-function algorithm, however, this approach greatly reduces collisions and misjudgments and effectively improves space efficiency. For URL deduplication, this misjudgment rate is acceptable as long as it stays within a certain bound.
Query module, for querying the data related to the URL. The URL is sent to the context database, information such as the website category is obtained from the context database, and processing is handed over to the recommendation module.
Queue module, for adding the URL to the to-be-collected queue. The URL is reported to the queue of uncrawled addresses to await collection by the crawler tool.
Collection module, for obtaining the information related to the URL. The crawler tool takes a URL from the queue of uncollected addresses, crawls the related content of the corresponding webpage or website, and submits it to the data analyzer for content analysis. After analysis, the webpage or website is tagged with the relevant labels, for example by classifying the site as a shopping website, consulting website, news website, or another website type.
Recommendation module, for recommending content or placing advertisements. The server selects suitable content according to the website's category and other information and recommends or places it as media or advertising.
The method and system for obtaining a webpage list proposed by the invention can effectively collect popular webpages and improve collection efficiency.
Although the invention has been described above in connection with preferred embodiments, those skilled in the art will appreciate that the method and system of the invention are not limited to the embodiments described in the detailed description, and that various modifications, additions, and substitutions can be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for recommendation based on a current webpage list, characterized in that the method comprises the steps of:
S1: obtaining the currently accessed URL;
S2: using a Bloom filter algorithm to determine whether the webpage corresponding to the URL has already been collected; if yes, going to step S3; otherwise, going to step S4;
S3: querying the related data of the URL, then going to step S6;
S4: adding the URL to the to-be-collected queue, that is, reporting it to the queue of uncrawled addresses to await collection by a crawler tool;
S5: using the crawler tool to obtain the related data of the URL;
S6: performing content recommendation or placement according to the related data of the URL.
2. The method of claim 1, characterized in that, in step S2, using the Bloom filter algorithm to determine whether the webpage corresponding to the URL has already been collected specifically comprises:
after a URL is obtained, computing the k corresponding bits and checking whether each is 0 or 1, where k is the number of hash functions the algorithm uses; if all k corresponding positions are 1, the URL is considered to exist in the existing set, that is, the corresponding webpage has been collected; if the value at any corresponding position is not 1, the URL is considered not to be in the existing set, that is, the corresponding webpage has not been collected.
3. The method of claim 1, characterized in that, in step S3, the URL is sent to a context database, and website category information is obtained from the context database.
4. The method of claim 1, characterized in that, in step S5, the crawler tool takes a URL from the queue of uncollected addresses, crawls the related content of the corresponding webpage or website, and submits it to a data analyzer for content analysis, and the webpage or website is tagged with the relevant labels after analysis.
5. The method of claim 1, characterized in that, in step S6, the server selects suitable content according to the website category and other information and recommends or places it.
6. A system for recommendation based on a current webpage list, characterized in that the system comprises:
an acquisition module, for obtaining the currently accessed URL;
a judgment module, for using a Bloom filter algorithm to determine whether the webpage corresponding to the URL has already been collected; if yes, the URL is handled by the query module; otherwise, the URL is handled by the queue module;
a query module, for querying the related data of the URL and then handing over to the recommendation module;
a queue module, for adding the URL to the to-be-collected queue, that is, reporting it to the queue of uncrawled addresses to await collection by a crawler tool;
a collection module, for using the crawler tool to obtain the related data of the URL;
a recommendation module, for performing content recommendation or placement according to the related data of the URL.
7. The system of claim 6, characterized in that, in the judgment module, using the Bloom filter algorithm to determine whether the webpage corresponding to the URL has already been collected specifically comprises:
after a URL is obtained, computing the k corresponding bits and checking whether each is 0 or 1, where k is the number of hash functions the algorithm uses; if all k corresponding positions are 1, the URL is considered to exist in the existing set, that is, the corresponding webpage has been collected; if the value at any corresponding position is not 1, the URL is considered not to be in the existing set, that is, the corresponding webpage has not been collected.
8. The system of claim 6, characterized in that, in the query module, the URL is sent to a context database, and website category information is obtained from the context database.
9. The system of claim 6, characterized in that, in the collection module, the crawler tool takes a URL from the queue of uncollected addresses, crawls the related content of the corresponding webpage or website, and submits it to a data analyzer for content analysis, and the webpage or website is tagged with the relevant labels after analysis.
10. The system of claim 6, characterized in that, in the recommendation module, the server selects suitable content according to the website category and other information and recommends or places it.
CN201410024821.9A 2014-01-20 2014-01-20 Current webpage list-based method and system for recommendation Pending CN103778217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410024821.9A CN103778217A (en) 2014-01-20 2014-01-20 Current webpage list-based method and system for recommendation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410024821.9A CN103778217A (en) 2014-01-20 2014-01-20 Current webpage list-based method and system for recommendation

Publications (1)

Publication Number Publication Date
CN103778217A true CN103778217A (en) 2014-05-07

Family

ID=50570452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410024821.9A Pending CN103778217A (en) 2014-01-20 2014-01-20 Current webpage list-based method and system for recommendation

Country Status (1)

Country Link
CN (1) CN103778217A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150356196A1 (en) * 2014-06-04 2015-12-10 International Business Machines Corporation Classifying uniform resource locators
CN105630980A (en) * 2015-12-25 2016-06-01 北京奇虎科技有限公司 Game recommending strategy obtaining method and device
CN106126648A (en) * 2016-06-23 2016-11-16 华南理工大学 A kind of based on the distributed merchandise news reptile method redo log
CN109472637A (en) * 2018-10-18 2019-03-15 微梦创科网络科技(中国)有限公司 A kind of user throws advertisement optimization method and device surely
CN110020058A (en) * 2017-12-30 2019-07-16 中国移动通信集团贵州有限公司 Information processing method, device, equipment and medium
CN110781386A (en) * 2019-10-10 2020-02-11 支付宝(杭州)信息技术有限公司 Information recommendation method and device, and bloom filter creation method and device
CN110968578A (en) * 2018-09-28 2020-04-07 中建水务环保有限公司 Sewage treatment process recommendation method and device
CN111209458A (en) * 2018-11-22 2020-05-29 顺丰科技有限公司 Data processing system and method for web crawler

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928292B2 (en) * 2014-06-04 2018-03-27 International Business Machines Corporation Classifying uniform resource locators
US20160179929A1 (en) * 2014-06-04 2016-06-23 International Business Machines Corporation Classifying uniform resource locators
US9928301B2 (en) * 2014-06-04 2018-03-27 International Business Machines Corporation Classifying uniform resource locators
US20150356196A1 (en) * 2014-06-04 2015-12-10 International Business Machines Corporation Classifying uniform resource locators
US9569522B2 (en) * 2014-06-04 2017-02-14 International Business Machines Corporation Classifying uniform resource locators
US9582565B2 (en) * 2014-06-04 2017-02-28 International Business Machines Corporation Classifying uniform resource locators
US20170103138A1 (en) * 2014-06-04 2017-04-13 International Business Machines Corporation Classifying uniform resource locators
US20170109429A1 (en) * 2014-06-04 2017-04-20 International Business Machines Corporation Classifying uniform resource locators
CN105630980A (en) * 2015-12-25 2016-06-01 北京奇虎科技有限公司 Game recommending strategy obtaining method and device
CN105630980B (en) * 2015-12-25 2019-05-28 北京奇虎科技有限公司 Game recommdation strategy acquisition methods and device
CN106126648A (en) * 2016-06-23 2016-11-16 华南理工大学 A kind of based on the distributed merchandise news reptile method redo log
CN106126648B (en) * 2016-06-23 2019-04-09 华南理工大学 It is a kind of based on the distributed merchandise news crawler method redo log
CN110020058A (en) * 2017-12-30 2019-07-16 中国移动通信集团贵州有限公司 Information processing method, device, equipment and medium
CN110968578B (en) * 2018-09-28 2023-04-25 中建生态环境集团有限公司 Sewage treatment process recommendation method and device
CN110968578A (en) * 2018-09-28 2020-04-07 中建水务环保有限公司 Sewage treatment process recommendation method and device
CN109472637A (en) * 2018-10-18 2019-03-15 微梦创科网络科技(中国)有限公司 A kind of user throws advertisement optimization method and device surely
CN111209458A (en) * 2018-11-22 2020-05-29 顺丰科技有限公司 Data processing system and method for web crawler
CN110781386A (en) * 2019-10-10 2020-02-11 支付宝(杭州)信息技术有限公司 Information recommendation method and device, and bloom filter creation method and device

Similar Documents

Publication Publication Date Title
CN103778217A (en) Current webpage list-based method and system for recommendation
US9317613B2 (en) Large scale entity-specific resource classification
US20240111818A1 (en) Method for training isolation forest, and method for recognizing web crawler
CN101819573B (en) Self-adaptive network public opinion identification method
CN105404699A (en) Method, device and server for searching articles of finance and economics
US20140189525A1 (en) User behavior models based on source domain
CN106021583B (en) Statistical method and system for page flow data
CN110602045B (en) Malicious webpage identification method based on feature fusion and machine learning
CN102567494B (en) Website classification method and device
CN111125086B (en) Method, device, storage medium and processor for acquiring data resources
CN101814083A (en) Automatic webpage classification method and system
CN107800591A (en) A kind of analysis method of unified daily record data
CN105224636A (en) A kind of data access method and device
CN103246664A (en) Web page retrieval method and device
WO2011041345A1 (en) Identification disambiguation in databases
US20140358867A1 (en) De-duplication deployment planning
US20150302088A1 (en) Method and System for Providing Personalized Content
CN110546633A (en) Named entity based category tag addition for documents
CN108959550B (en) User focus mining method, device, equipment and computer readable medium
CN104699837A (en) Method, device and server for selecting illustrated pictures of web pages
CN116015842A (en) Network attack detection method based on user access behaviors
CN103455491A (en) Method and device for classifying search terms
CN103605744A (en) Method and device for analyzing website searching engine traffic data
CN114090643A (en) Recruitment information recommendation method, device, equipment and storage medium
CN116127047B (en) Method and device for establishing enterprise information base

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Cui Jingjing

Inventor after: Lin Jiajie

Inventor after: Wu Peng

Inventor after: Ma Zhanguo

Inventor after: Li Chunhua

Inventor before: Cui Jingjing

Inventor before: Lin Jiajie

Inventor before: Wu Peng

Inventor before: Ma Zhanguo

Inventor before: Li Chunhua

Inventor before: Liu Lina

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: CUI JINGJING LIN JIAJIE WU PENG MA ZHANGUO LI CHUNHUA LIU LINA TO: CUI JINGJING LIN JIAJIE WU PENG MA ZHANGUO LI CHUNHUA

RJ01 Rejection of invention patent application after publication

Application publication date: 20140507
