CN106126716A - Data crawling method and device - Google Patents

Data crawling method and device

Info

Publication number
CN106126716A
Authority
CN
China
Prior art keywords
contents producer
data
producer
contents
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610511377.2A
Other languages
Chinese (zh)
Inventor
姚光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201610511377.2A
Publication of CN106126716A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a data crawling method and device. Identification information of at least one content producer is obtained and stored in advance. According to the identification information of the at least one content producer, at least one content producer personal homepage is determined, in one-to-one correspondence with the content producers. For each content producer, all data produced by that content producer is crawled from that content producer's personal homepage. Applying the embodiments of the invention makes it possible to crawl comprehensive data.

Description

Data crawling method and device
Technical field
The present invention relates to the field of network information search, and in particular to a data crawling method and device.
Background
With the rapid development of the network, the World Wide Web has become the carrier of massive amounts of information, and people use search tools to retrieve what they need from this massive data. The results returned by search tools contain a large amount of data that users do not care about, and finding the data that users do care about within these results has become a difficult problem. Against this background, crawler systems that perform targeted crawling of relevant web resources have emerged. According to a set crawl target, such a system selectively accesses web pages or links on the World Wide Web to obtain the required data.
When existing crawler systems crawl microblog or video data, they generally crawl based on search keywords or on list pages. Keyword-based crawling mainly involves calling the search interface of the crawler system, entering the search keyword, and downloading the search results; the URLs of content detail pages are then extracted from the search results and downloaded. The main drawback of this approach is that the number of search results is limited, so the crawled data is not comprehensive. Crawling based on list pages is likewise limited by the number of list pages, so it also suffers from incomplete data.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a data crawling method and device for crawling comprehensive data.
To achieve the above purpose, an embodiment of the present invention discloses a data crawling method in which identification information of at least one content producer is obtained and stored in advance. The method includes:
determining, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
for each content producer, crawling, in that content producer's personal homepage, all data produced by the content producer.
Preferably, obtaining the identification information of a content producer includes:
extracting the identification information of the content producer from a results page of a keyword search;
or
extracting the identification information of the content producer from a target website, based on a homepage depth crawling scheme.
Preferably, the method further includes:
for each content producer, determining, according to all the crawled data produced by the content producer, the frequency at which the content producer produces data;
crawling, at the determined frequency, in the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
Preferably, the method further includes:
determining the priority of each content producer according to users' evaluation information on the crawled data produced by the content producers;
crawling, in descending order of priority, in the content producer personal homepages, the data produced by the content producers that has not yet been crawled.
To achieve the above purpose, an embodiment of the present invention discloses a data crawling device. The device includes:
an obtaining module, configured to obtain and store in advance the identification information of at least one content producer;
a first determining module, configured to determine, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
a first crawling module, configured to crawl, for each content producer, in that content producer's personal homepage, all data produced by the content producer.
Preferably, the obtaining module is specifically configured to:
extract and store the identification information of at least one content producer from a results page of a keyword search;
or
extract and store the identification information of at least one content producer from a target website, based on a homepage depth crawling scheme.
Preferably, the device further includes a second determining module and a second crawling module.
The second determining module is configured to determine, for each content producer, according to all the crawled data produced by the content producer, the frequency at which the content producer produces data.
The second crawling module is configured to crawl, at the determined frequency, in the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
Preferably, the device further includes a third determining module and a third crawling module.
The third determining module is configured to determine the priority of each content producer according to users' evaluation information on the crawled data produced by the content producers.
The third crawling module is configured to crawl, in descending order of priority, in the content producer personal homepages, the data produced by the content producers that has not yet been crawled.
As can be seen from the above technical solutions, in the data crawling method and device provided by the embodiments of the present invention, identification information of at least one content producer is obtained and stored in advance; according to the identification information of the at least one content producer, at least one content producer personal homepage is determined, in one-to-one correspondence with the content producers; and for each content producer, all data produced by the content producer is crawled in that content producer's personal homepage. With the technical solution provided by the embodiments, once the identification information of a content producer has been obtained, all data produced by that content producer can be crawled according to this identification information, so that comprehensive data is crawled.
Of course, a method or device implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data crawling method provided by an embodiment of the present invention;
Fig. 2 is another schematic flowchart of the data crawling method provided by an embodiment of the present invention;
Fig. 3 is yet another schematic flowchart of the data crawling method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data crawling device provided by an embodiment of the present invention;
Fig. 5 is another schematic structural diagram of the data crawling device provided by an embodiment of the present invention;
Fig. 6 is yet another schematic structural diagram of the data crawling device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some of the embodiments of the present invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
To solve the problems of the prior art, the embodiments of the present invention provide a data crawling method and device, each of which is described in detail below.
It should be noted that the data crawling method and device provided by the embodiments of the present invention are applicable to crawler systems. In practice, the identification information of at least one content producer, such as the personal ID or account name of a video uploader, is stored in advance for use in subsequent crawling. This step forms the identifiers corresponding to the content being searched and prepares for comprehensive content crawling.
Fig. 1 is a schematic flowchart of a data crawling method provided by an embodiment of the present invention, which includes the following steps:
S101: according to the identification information of the at least one content producer, determine at least one content producer personal homepage in one-to-one correspondence with the content producers.
Specifically, in practice, the identification information of at least one content producer is obtained and stored in advance. The identification information may be a name, an ID number, an account, or the like; the embodiments of the present invention do not limit its specific form.
Specifically, the identification information of a content producer may be extracted from a results page of a keyword search.
For example, a search with "abcdefghijk" as the keyword yields the page "http://www.yyy.com/movies/key=abcdefghijk". Suppose this page corresponds to 3 video uploaders whose IDs are "AAAA1", "AAAA2" and "AAAA3". The IDs of all these video uploaders are extracted, and the IDs "AAAA1", "AAAA2" and "AAAA3" serve as the identification information of the content producers. If an ID has a corresponding account name, the account name may also be extracted as the identification information of the content producer. The embodiments of the present invention aim to extract one piece of identification information corresponding to each content producer and place no restriction on its type, as long as a one-to-one correspondence between the identification information and the content producer can be achieved.
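The extraction step in this example can be sketched as follows. This is a minimal sketch under stated assumptions: the patent does not describe the markup of the results page, so the `data-uploader-id` attribute, the sample HTML, and the names `UploaderIdParser` and `extract_uploader_ids` are all hypothetical.

```python
from html.parser import HTMLParser


class UploaderIdParser(HTMLParser):
    """Collects uploader IDs from a search results page.

    Assumption: each result links to its uploader through an <a> element
    carrying a hypothetical ``data-uploader-id`` attribute.
    """

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        uploader_id = dict(attrs).get("data-uploader-id")
        # Keep first-seen order and skip duplicates, so that each
        # content producer is represented by exactly one identifier.
        if uploader_id and uploader_id not in self.ids:
            self.ids.append(uploader_id)


def extract_uploader_ids(html):
    """Return the distinct uploader IDs found in the given HTML text."""
    parser = UploaderIdParser()
    parser.feed(html)
    return parser.ids


# Hypothetical results page for the keyword "abcdefghijk".
SAMPLE_RESULTS = """
<ul>
  <li><a data-uploader-id="AAAA1" href="/ID=AAAA1">video 1</a></li>
  <li><a data-uploader-id="AAAA2" href="/ID=AAAA2">video 2</a></li>
  <li><a data-uploader-id="AAAA1" href="/ID=AAAA1">video 3</a></li>
  <li><a data-uploader-id="AAAA3" href="/ID=AAAA3">video 4</a></li>
</ul>
"""

print(extract_uploader_ids(SAMPLE_RESULTS))
```

Deduplicating at extraction time is one way to keep the one-to-one correspondence between identification information and content producer that the text requires.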
Specifically, the identification information of a content producer may also be extracted from a target website, based on a homepage depth crawling scheme.
For example, the target website has the homepage http://www.xyz.com. Crawling starts from this website and obtains all of its video uploaders. Suppose there are 5 video uploaders whose IDs are "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5". All the IDs are extracted, and the IDs "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5" are the identification information of all the content producers of this website. Using the ID as the content producer identifier is merely an example and does not constitute a limitation of the present invention.
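The patent does not spell out the homepage depth crawling scheme. One plausible reading is a breadth-first crawl from the site homepage that stops after a fixed number of link hops, sketched below. The function names, the `max_depth` parameter, the injected `fetch_links` callable, the toy link graph `SITE`, and the `/ID=` URL convention used to recognise a producer page are assumptions for illustration.

```python
from collections import deque


def collect_producer_ids(start_url, fetch_links, extract_id, max_depth=2):
    """Breadth-first crawl from start_url, following at most max_depth hops.

    fetch_links(url) returns the outgoing links of a page (hypothetical).
    extract_id(url) returns a producer ID, or None if the URL is not a
    producer page (hypothetical).
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    ids = []
    while queue:
        url, depth = queue.popleft()
        producer_id = extract_id(url)
        if producer_id is not None and producer_id not in ids:
            ids.append(producer_id)
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return ids


def id_from_url(url):
    """Assumed convention: producer pages look like .../ID=<id>."""
    marker = "/ID="
    return url.split(marker, 1)[1] if marker in url else None


# Toy link graph standing in for real HTTP requests.
SITE = {
    "http://www.xyz.com": ["http://www.xyz.com/movies", "http://www.xyz.com/ID=aaa1"],
    "http://www.xyz.com/movies": ["http://www.xyz.com/ID=aaa2", "http://www.xyz.com/ID=aaa1"],
    "http://www.xyz.com/ID=aaa2": ["http://www.xyz.com/ID=aaa3"],
}


def toy_fetch_links(url):
    return SITE.get(url, [])


print(collect_producer_ids("http://www.xyz.com", toy_fetch_links, id_from_url, max_depth=2))
```

With a depth limit of 2 the page for "aaa3" is not reached in this toy graph, which shows the trade-off: a larger depth finds more producers at the cost of more requests.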
In practice, after the identifier of a content producer has been obtained and saved, the homepage corresponding to that identifier, that is, the corresponding personal homepage, is determined. Take the content producer ID "aaa1" as an example: assuming the homepage corresponding to "aaa1" is "http://www.xyz.com/ID=aaa1", then "http://www.xyz.com/ID=aaa1" is determined to be the personal homepage corresponding to content producer "aaa1".
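The mapping from identifier to personal homepage can be sketched as a simple URL template. The template below follows the example URL in the text; real sites use their own URL schemes, and the names `HOMEPAGE_TEMPLATE`, `homepage_for` and `build_homepage_map` are hypothetical.

```python
from urllib.parse import quote

# Assumed URL scheme, taken from the example "http://www.xyz.com/ID=aaa1".
HOMEPAGE_TEMPLATE = "http://www.xyz.com/ID={producer_id}"


def homepage_for(producer_id):
    """Return the personal homepage URL for one producer ID."""
    # Percent-encode the ID so unusual characters cannot break the URL.
    return HOMEPAGE_TEMPLATE.format(producer_id=quote(producer_id, safe=""))


def build_homepage_map(producer_ids):
    """Map each distinct producer ID to exactly one homepage URL."""
    return {pid: homepage_for(pid) for pid in producer_ids}


print(build_homepage_map(["aaa1", "aaa2"]))
```

Because a dictionary keyed by ID holds one URL per ID, the result keeps the one-to-one correspondence between content producers and personal homepages.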
S102: for each content producer, crawl, in that content producer's personal homepage, all data produced by the content producer.
Those skilled in the art will understand that the personal homepage obtained contains all the information of the content producer, so extracting from it yields comprehensive information. For example, the personal homepage corresponding to content producer ID "aaa1" is "http://www.xyz.com/ID=aaa1", and this homepage contains all the data uploaded by content producer "aaa1". Searching this personal homepage yields all the data produced by "aaa1", which is then crawled in full. Crawling the data of a web page is prior art and is not described further here.
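The text treats page crawling itself as prior art. As a minimal sketch of what crawling all data from a personal homepage might involve, the code below assumes the homepage lists its items over numbered pages and walks them until an empty page is returned. The pagination model, the injected `fetch_page` callable, and the `max_pages` safety limit are assumptions.

```python
def crawl_all_items(homepage_url, fetch_page, max_pages=1000):
    """Collect every item listed on a producer's personal homepage.

    fetch_page(homepage_url, page_number) is a hypothetical callable that
    returns the list of items on that page, or an empty list past the end.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(homepage_url, page)
        if not batch:
            break  # an empty page means everything has been read
        items.extend(batch)
    return items


# Toy pages standing in for real HTTP responses.
TOY_PAGES = {1: ["video-1", "video-2"], 2: ["video-3"]}


def toy_fetch_page(homepage_url, page):
    return TOY_PAGES.get(page, [])


print(crawl_all_items("http://www.xyz.com/ID=aaa1", toy_fetch_page))
```

Unlike a keyword search, whose result count is capped, this loop ends only when the homepage itself has no more items, which is the property the method relies on for completeness.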
It can be seen that, by applying the embodiment shown in Fig. 1, once the identification information of a content producer has been obtained, all data produced by that content producer can be crawled according to this identification information, so that comprehensive data is crawled.
Fig. 2 is another schematic flowchart of the data crawling method provided by an embodiment of the present invention, in which S103 and S104 are added on the basis of the embodiment shown in Fig. 1.
S103: for each content producer, determine, according to all the crawled data produced by the content producer, the frequency at which the content producer produces data.
Those skilled in the art will understand that all the crawled data is extracted and then analyzed. For example, after all the data produced by content producer "aaa1" has been crawled, the frequency of data updates is analyzed. Suppose the data is updated once every 2 days; the crawl frequency for the personal homepage of content producer "aaa1" is then set to once every 2 days.
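The patent does not say how the update frequency is derived from the crawled data. One simple option, shown here purely as an assumption, is to take the mean gap between consecutive publication times. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def estimate_update_interval(publish_times):
    """Return the mean gap between consecutive publication times.

    Returns None when fewer than two timestamps are available, since no
    gap can be measured in that case.
    """
    times = sorted(publish_times)
    if len(times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    # sum() needs a timedelta start value to add timedeltas together.
    return sum(gaps, timedelta()) / len(gaps)


# Hypothetical publication times of the items crawled for "aaa1".
AAA1_TIMES = [
    datetime(2016, 6, 1, 14, 0),
    datetime(2016, 6, 3, 14, 0),
    datetime(2016, 6, 5, 14, 0),
]

print(estimate_update_interval(AAA1_TIMES))
```

A median or a minimum gap would be equally valid choices; the mean is used only because it matches the "once every 2 days" example directly.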
S104: at the determined frequency, crawl, in the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
For example, for the content producer whose ID is "aaa1", the crawl frequency has been set to once every 2 days. Assuming the last crawl took place at 14:00 on June 5, 2016, the next crawl takes place at 14:00 on June 7, 2016, and it crawls only the data updated between 14:00 on June 5, 2016 and 14:00 on June 7, 2016; that is, an incremental crawl is performed. Incremental crawling itself is prior art and is not described further here.
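The scheduling arithmetic and the incremental filter in this example can be sketched as follows. The item structure (a dictionary with a `published` timestamp) and the function names are assumptions; the dates are the ones used in the text.

```python
from datetime import datetime, timedelta


def next_crawl_time(last_crawl, interval):
    """Return when the next crawl of a homepage is due."""
    return last_crawl + interval


def select_new_items(items, last_crawl, now):
    """Keep only the items published after the last crawl, up to now."""
    return [item for item in items if last_crawl < item["published"] <= now]


LAST_CRAWL = datetime(2016, 6, 5, 14, 0)
INTERVAL = timedelta(days=2)
NEXT_CRAWL = next_crawl_time(LAST_CRAWL, INTERVAL)

# Hypothetical items listed on the homepage of "aaa1".
ITEMS = [
    {"title": "old video", "published": datetime(2016, 6, 4, 9, 0)},
    {"title": "new video", "published": datetime(2016, 6, 6, 20, 0)},
]

print(NEXT_CRAWL)
print([item["title"] for item in select_new_items(ITEMS, LAST_CRAWL, NEXT_CRAWL)])
```

Using a half-open window (after the last crawl, up to and including the current one) avoids fetching the same item in two consecutive crawls.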
It can be seen that, by applying the embodiment shown in Fig. 2, once the frequency at which a content producer produces data has been obtained, incremental crawling is performed in the content producer's personal homepage at the determined frequency, which lowers the crawl frequency and the crawl workload while still ensuring that comprehensive data is crawled.
Fig. 3 is yet another schematic flowchart of the data crawling method provided by an embodiment of the present invention, in which S105 and S106 are added on the basis of the embodiment shown in Fig. 1.
S105: determine the priority of each content producer according to users' evaluation information on the crawled data produced by the content producers.
In practice, crawled data often carries attention information such as click count, number of likes, and number of comments, which reflects how much attention the data receives. On a video website, for example, each video may have comments, user ratings, a like count, and, as the opposite of likes, dislike scores. This attention information is obtained while the data is crawled, and one or more kinds of it are used as the criterion for assigning priorities to videos. For example, priorities may be assigned by like count, with more likes meaning higher priority; by user rating, with a higher rating meaning higher priority; or by dislike score, with a higher score meaning lower priority.
For example, with click count as the criterion, suppose the click counts of "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5" are 900, 700, 300, 800 and 100 respectively. The priorities from high to low are then: level 1 "aaa1", level 2 "aaa4", level 3 "aaa2", level 4 "aaa3", level 5 "aaa5". This is merely an example; the embodiments of the present invention do not limit the specific criterion for assigning priorities.
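The click-count example above amounts to sorting producers by a score and numbering the result. A minimal sketch follows, in which the function name `rank_producers` is hypothetical and the click counts are the ones given in the text.

```python
def rank_producers(click_counts):
    """Assign priority levels, with level 1 going to the most-clicked producer.

    click_counts maps a producer ID to its total click count.
    """
    ordered = sorted(click_counts, key=lambda pid: click_counts[pid], reverse=True)
    return {pid: level for level, pid in enumerate(ordered, start=1)}


CLICKS = {"aaa1": 900, "aaa2": 700, "aaa3": 300, "aaa4": 800, "aaa5": 100}

print(rank_producers(CLICKS))
```

Any other attention signal from the text, such as like count or user rating, could be passed in place of the click counts; for a dislike score the sort direction would be reversed.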
S106: in descending order of priority, crawl, in the content producer personal homepages, the data produced by the content producers that has not yet been crawled.
In practice, once the priorities have been assigned, the content producers are sorted, and those with higher priority are crawled first. For example, after sorting by priority, the content producers with IDs "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5" have the following priority order: level 1 "aaa1", level 2 "aaa4", level 3 "aaa2", level 4 "aaa3", level 5 "aaa5". When crawling starts, the data of the personal homepage http://www.xyz.com/ID=aaa1, which corresponds to the level 1 content producer "aaa1", is crawled first, followed in turn by level 2, level 3, and so on. The priority ordering provided by the embodiments of the present invention is merely an example and does not constitute a limitation of the present invention.
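Crawling in priority order can be sketched with a priority queue. This is one possible arrangement rather than the patent's stated implementation: the function name, the injected `crawl` callable, and the use of `heapq` are assumptions.

```python
import heapq


def crawl_in_priority_order(priority_levels, homepages, crawl):
    """Crawl each producer's homepage, lowest level number first.

    priority_levels maps producer ID to level (1 is the highest priority).
    homepages maps producer ID to personal homepage URL.
    crawl(url) is a hypothetical callable that performs the actual crawl.
    Returns the producer IDs in the order they were crawled.
    """
    heap = [(level, pid) for pid, level in priority_levels.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _level, pid = heapq.heappop(heap)
        crawl(homepages[pid])
        order.append(pid)
    return order


LEVELS = {"aaa1": 1, "aaa4": 2, "aaa2": 3, "aaa3": 4, "aaa5": 5}
HOMEPAGES = {pid: "http://www.xyz.com/ID=" + pid for pid in LEVELS}
VISITED = []

print(crawl_in_priority_order(LEVELS, HOMEPAGES, VISITED.append))
```

A heap is convenient when producers are added or re-ranked while crawling is in progress; for a fixed list, a single sort gives the same order.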
To some extent, a higher priority reflects more popular information and greater attention, and people's desire to obtain such information is generally stronger. To present it to the public as early as possible, high-priority information is therefore crawled first, which ensures the timeliness of the information.
It can be seen that, by applying the embodiment shown in Fig. 3, once the priorities of the content producers have been obtained, the content producer personal homepages are crawled in descending order of the determined priorities, which ensures that the information of high-priority content producers is crawled first.
Fig. 4 is a schematic structural diagram of a data crawling device provided by an embodiment of the present invention, which may include an obtaining module 201, a first determining module 202, and a first crawling module 203.
The obtaining module 201 is configured to obtain and store in advance the identification information of at least one content producer.
Specifically, in practice, the obtaining module 201 is specifically configured to:
extract and store the identification information of at least one content producer from a results page of a keyword search;
or
extract and store the identification information of at least one content producer from a target website, based on a homepage depth crawling scheme.
The first determining module 202 is configured to determine, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers.
The first crawling module 203 is configured to crawl, for each content producer, in that content producer's personal homepage, all data produced by the content producer.
It can be seen that, by applying the embodiment shown in Fig. 4, once the identification information of a content producer has been obtained, all data produced by that content producer can be crawled according to this identification information, so that comprehensive data is crawled.
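The three modules of Fig. 4 can be pictured as three small classes wired together. The class and method names below are hypothetical, the URL template follows the example in the description, and the `fetch_all` callable stands in for the actual page crawling, which the patent treats as prior art.

```python
class ObtainingModule:
    """Module 201: obtains and stores producer identification information."""

    def __init__(self):
        self._ids = []

    def store(self, producer_ids):
        for pid in producer_ids:
            if pid not in self._ids:  # one stored entry per producer
                self._ids.append(pid)

    def stored_ids(self):
        return list(self._ids)


class FirstDeterminingModule:
    """Module 202: maps each stored ID to exactly one personal homepage."""

    def __init__(self, template):
        self._template = template  # assumed URL template with one {} slot

    def homepages(self, producer_ids):
        return {pid: self._template.format(pid) for pid in producer_ids}


class FirstCrawlingModule:
    """Module 203: crawls all data from each personal homepage."""

    def __init__(self, fetch_all):
        self._fetch_all = fetch_all  # hypothetical callable: url -> list of items

    def crawl(self, homepages):
        return {pid: self._fetch_all(url) for pid, url in homepages.items()}


def run_device(producer_ids, template, fetch_all):
    """Run modules 201, 202 and 203 in sequence."""
    obtaining = ObtainingModule()
    obtaining.store(producer_ids)
    homepages = FirstDeterminingModule(template).homepages(obtaining.stored_ids())
    return FirstCrawlingModule(fetch_all).crawl(homepages)


# Toy data standing in for real homepages.
TOY_SITE = {
    "http://www.xyz.com/ID=aaa1": ["video-1", "video-2"],
    "http://www.xyz.com/ID=aaa2": ["video-3"],
}


def toy_fetch_all(url):
    return TOY_SITE.get(url, [])


print(run_device(["aaa1", "aaa2", "aaa1"], "http://www.xyz.com/ID={}", toy_fetch_all))
```

Keeping the fetcher as an injected callable lets the same structure be exercised without network access.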
Fig. 5 is another schematic structural diagram of the data crawling device provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 4, the embodiment shown in Fig. 5 adds a second determining module 204 and a second crawling module 205.
The second determining module 204 is configured to determine, for each content producer, according to all the crawled data produced by the content producer, the frequency at which the content producer produces data.
The second crawling module 205 is configured to crawl, at the determined frequency, in the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
It can be seen that, by applying the embodiment shown in Fig. 5, once the frequency at which a content producer produces data has been obtained, incremental crawling is performed in the content producer's personal homepage at the determined frequency, which lowers the crawl frequency and the crawl workload while still ensuring that comprehensive data is crawled.
Fig. 6 is yet another schematic structural diagram of the data crawling device provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 4, the embodiment shown in Fig. 6 adds a third determining module 206 and a third crawling module 207.
The third determining module 206 is configured to determine the priority of each content producer according to users' evaluation information on the crawled data produced by the content producers.
The third crawling module 207 is configured to crawl, in descending order of priority, in the content producer personal homepages, the data produced by the content producers that has not yet been crawled.
It can be seen that, by applying the embodiment shown in Fig. 6, once the priorities of the content producers have been obtained, the content producer personal homepages are crawled in descending order of the determined priorities, which ensures that the information of high-priority content producers is crawled first.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the description of the method embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (8)

1. A data crawling method, characterized in that identification information of at least one content producer is obtained and stored in advance; the method comprises:
determining, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
for each content producer, crawling, in that content producer's personal homepage, all data produced by the content producer.
2. The method according to claim 1, characterized in that obtaining the identification information of a content producer comprises:
extracting the identification information of the content producer from a results page of a keyword search;
or
extracting the identification information of the content producer from a target website, based on a homepage depth crawling scheme.
3. The method according to claim 1, characterized in that the method further comprises:
for each content producer, determining, according to all the crawled data produced by the content producer, the frequency at which the content producer produces data;
crawling, at the determined frequency, in the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
4. The method according to claim 1, characterized in that the method further comprises:
determining the priority of each content producer according to users' evaluation information on the crawled data produced by the content producers;
crawling, in descending order of priority, in the content producer personal homepages, the data produced by the content producers that has not yet been crawled.
5. A data crawling device, characterized in that the device comprises:
an obtaining module, configured to obtain and store in advance identification information of at least one content producer;
a first determining module, configured to determine, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
a first crawling module, configured to crawl, for each content producer, in that content producer's personal homepage, all data produced by the content producer.
6. The device according to claim 5, characterized in that the obtaining module is specifically configured to:
extract and store the identification information of at least one content producer from a results page of a keyword search;
or
extract and store the identification information of at least one content producer from a target website, based on a homepage depth crawling scheme.
7. The device according to claim 5, characterized in that the device further comprises a second determining module and a second crawling module;
the second determining module is configured to determine, for each content producer, according to all the crawled data produced by the content producer, the frequency at which the content producer produces data;
the second crawling module is configured to crawl, at the determined frequency, in the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
8. The device according to claim 5, characterized in that the device further comprises a third determining module and a third crawling module;
the third determining module is configured to determine the priority of each content producer according to users' evaluation information on the crawled data produced by the content producers;
the third crawling module is configured to crawl, in descending order of priority, in the content producer personal homepages, the data produced by the content producers that has not yet been crawled.
CN201610511377.2A 2016-06-30 2016-06-30 A kind of data crawling method and device Pending CN106126716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610511377.2A CN106126716A (en) 2016-06-30 2016-06-30 A kind of data crawling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610511377.2A CN106126716A (en) 2016-06-30 2016-06-30 A kind of data crawling method and device

Publications (1)

Publication Number Publication Date
CN106126716A true CN106126716A (en) 2016-11-16

Family

ID=57468084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610511377.2A Pending CN106126716A (en) 2016-06-30 2016-06-30 A kind of data crawling method and device

Country Status (1)

Country Link
CN (1) CN106126716A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090249451A1 (en) * 2008-03-31 2009-10-01 Yahoo!, Inc. Access to Trusted User-Generated Content Using Social Networks
CN102521337A (en) * 2011-12-08 2012-06-27 华中科技大学 Academic community system based on massive knowledge network
CN103092999A (en) * 2013-02-22 2013-05-08 人民搜索网络股份公司 Webpage crawling cycle adjusting method and device
CN104063448A (en) * 2014-06-18 2014-09-24 华东师范大学 Distributed type microblog data capturing system related to field of videos

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
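Luo Yishu (罗一纾): "Research on Related Technologies of Microblog Crawlers" (微博爬虫的相关技术研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *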

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729344A (en) * 2017-07-26 2018-02-23 上海壹账通金融科技有限公司 Website data crawling method, device, computer equipment and readable storage medium
WO2019019673A1 (en) * 2017-07-26 2019-01-31 深圳壹账通智能科技有限公司 Website data crawling method and apparatus, computer device and readable storage medium
CN107729344B (en) * 2017-07-26 2020-08-28 深圳壹账通智能科技有限公司 Website data crawling method and device, computer equipment and readable storage medium
CN109388736A (en) * 2018-09-21 2019-02-26 真相网络科技(北京)有限公司 Response scheduling method in crawler system

Similar Documents

Publication Publication Date Title
US8347231B2 (en) Methods, systems, and computer program products for displaying tag words for selection by users engaged in social tagging of content
US10402479B2 (en) Method, server, browser, and system for recommending text information
US20150278359A1 (en) Method and apparatus for generating a recommendation page
CN102184230A (en) Method and device for displaying search results
CN104102639B (en) Popularization triggering method based on text classification and device
CN103942712A (en) Product similarity based e-commerce recommendation system and method thereof
CN106415537A (en) Inserting native application search results into web search results
CN105074700A (en) Generating search results containing state links to applications
CN104462293A (en) Search processing method and method and device for generating search result ranking model
CN102306171A (en) Method and equipment for providing network access suggestions and network search suggestions
CN104035966A (en) Method and device for providing extended search terms
CN101097578A (en) Network resource searching method and system
CN104751354B (en) A kind of advertisement crowd screening technique
CN103699576A (en) Method and device used for providing searching results
WO2014194689A1 (en) Method, server, browser, and system for recommending text information
CN102708174A (en) Method and device for displaying rich media information in browser
CN105975537A (en) Sorting method and device of application program
EP2786282A2 (en) Temporal visualization of query results
CN105302876A (en) Regular expression based URL filtering method
CN108959580A (en) A kind of optimization method and system of label data
CN103186666A (en) Method, device and equipment for searching based on favorites
CN108763313A (en) On-line training method, server and the storage medium of model
CN103106234A (en) Searching method and device of webpage content
CN104090923A (en) Method and device for displaying rich media information in browser
CN101894109A (en) Database building method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161116
