CN106126716A - A kind of data crawling method and device - Google Patents
- Publication number
- CN106126716A
- Authority
- CN
- China
- Prior art keywords
- contents producer
- data
- producer
- contents
- identification information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention discloses a data crawling method and device. Identification information of at least one content producer is obtained and stored in advance. According to the identification information of the at least one content producer, at least one content producer personal homepage is determined, in one-to-one correspondence with the content producers. For each content producer, all data produced by that content producer is crawled from the producer's personal homepage. Embodiments of the invention can thereby crawl comprehensive data.
Description
Technical field
The present invention relates to the field of network information search, and in particular to a data crawling method and device.
Background art
With the rapid development of networks, the World Wide Web has become the carrier of massive amounts of information, and people retrieve from this mass of data by means of search tools. The results a search tool returns contain a large amount of data the user does not care about, and finding the data the user does care about within those results becomes a difficult problem. Crawler systems that directionally capture relevant web resources arose in response: according to a set crawl target, they selectively access web pages or links on the World Wide Web to obtain the required data.
When crawling microblog or video data, existing crawler systems generally crawl either by search keyword or by list page. Keyword-based crawling mainly involves calling the crawler system's search interface, entering a search keyword, and downloading the search results; content detail page URLs are then extracted from the search results and downloaded. The main drawback of this approach is that the number of search results is limited, so the crawled data is incomplete. List-page crawling is likewise limited by the number of list pages, so it too yields incomplete data.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a data crawling method and device so as to crawl comprehensive data.
To achieve the above purpose, an embodiment of the invention discloses a data crawling method in which identification information of at least one content producer is obtained and stored in advance. The method includes:
determining, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
for each content producer, crawling, from that content producer's personal homepage, all data produced by the content producer.
Preferably, obtaining the identification information of a content producer includes:
extracting the identification information of the content producer from the results page of a keyword search;
or
extracting the identification information of the content producer from a target website using a homepage-based depth crawling scheme.
Preferably, the method further includes:
for each content producer, determining, from all the crawled data produced by the content producer, the frequency at which the content producer produces data;
at the determined frequency, crawling, from the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
Preferably, the method further includes:
determining the priority of each content producer according to users' evaluation information on the crawled data produced by that content producer;
in order of priority from high to low, crawling, from each content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
To achieve the above purpose, an embodiment of the invention also discloses a data crawling device. The device includes:
an obtaining module, configured to obtain and store in advance the identification information of at least one content producer;
a first determining module, configured to determine, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
a first crawling module, configured to crawl, for each content producer, all data produced by the content producer from that content producer's personal homepage.
Preferably, the obtaining module is specifically configured to:
extract and store the identification information of at least one content producer from the results page of a keyword search;
or
extract and store the identification information of at least one content producer from a target website using a homepage-based depth crawling scheme.
Preferably, the device further includes a second determining module and a second crawling module.
The second determining module is configured to determine, for each content producer, the frequency at which the content producer produces data from all the crawled data produced by that content producer.
The second crawling module is configured to crawl, at the determined frequency and from the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
Preferably, the device further includes a third determining module and a third crawling module.
The third determining module is configured to determine the priority of each content producer according to users' evaluation information on the crawled data produced by that content producer.
The third crawling module is configured to crawl, in order of priority from high to low and from each content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
As can be seen from the above technical solutions, in the data crawling method and device provided by the embodiments of the present invention, identification information of at least one content producer is obtained and stored in advance; according to that identification information, at least one content producer personal homepage in one-to-one correspondence with the content producers is determined; and, for each content producer, all data produced by the content producer is crawled from that producer's personal homepage. With this technical solution, once the identification information of a content producer has been obtained, all data produced by that content producer can be crawled according to the identification information, so that comprehensive data is crawled.
Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the data crawling method provided by an embodiment of the present invention;
Fig. 2 is another schematic flowchart of the data crawling method provided by an embodiment of the present invention;
Fig. 3 is yet another schematic flowchart of the data crawling method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the data crawling device provided by an embodiment of the present invention;
Fig. 5 is another schematic structural diagram of the data crawling device provided by an embodiment of the present invention;
Fig. 6 is yet another schematic structural diagram of the data crawling device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
To solve the problems of the prior art, embodiments of the present invention provide a data crawling method and device, each described in detail below.
It should be noted that the data crawling method and device provided by the embodiments are applicable to crawler systems. In practice, the identification information of at least one content producer, such as a personal ID or the account name of a video uploader, is stored in advance for use in subsequent crawling. This step builds identifiers corresponding to the content to be searched and prepares for comprehensive content crawling.
Fig. 1 is a schematic flowchart of the data crawling method provided by an embodiment of the present invention, which comprises the following steps:
S101: according to the identification information of the at least one content producer, determine at least one content producer personal homepage in one-to-one correspondence with the content producers.
Specifically, in practice, the identification information of at least one content producer is obtained and stored in advance. The identification information may be a name, an ID number, an account, and so on; the embodiments do not limit its concrete form.
Specifically, the identification information of a content producer can be extracted from the results page of a keyword search.
For example, searching with "abcdefghijk" as the keyword yields the page "http://www.yyy.com/movies/key=abcdefghijk". Suppose this page corresponds to 3 video uploaders whose IDs are "AAAA1", "AAAA2" and "AAAA3". All of these uploader IDs are extracted, and the IDs "AAAA1", "AAAA2" and "AAAA3" serve as the identification information of the content producers. If an ID has a corresponding account name, that account name can also be extracted as the identification information of the content producer. The embodiments aim to extract one piece of identification information corresponding to each content producer and place no restriction on its type, as long as the identification information and the content producer correspond one to one.
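The extraction step above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not describe the markup of the results page, so the `data-uid` attribute, the sample HTML, and the names `UploaderIdParser` and `extract_producer_ids` are all hypothetical.

```python
from html.parser import HTMLParser


class UploaderIdParser(HTMLParser):
    """Collects uploader IDs from <a> tags carrying a hypothetical data-uid attribute."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        uid = dict(attrs).get("data-uid")
        # Keep each ID once so it maps one-to-one to a content producer.
        if uid and uid not in self.ids:
            self.ids.append(uid)


def extract_producer_ids(html):
    """Return the distinct uploader IDs found in a search results page."""
    parser = UploaderIdParser()
    parser.feed(html)
    return parser.ids


# Stand-in for the page returned by the keyword search (assumed markup).
SAMPLE_PAGE = """
<ul>
  <li><a data-uid="AAAA1" href="/v/1">clip 1</a></li>
  <li><a data-uid="AAAA2" href="/v/2">clip 2</a></li>
  <li><a data-uid="AAAA1" href="/v/3">clip 3</a></li>
  <li><a data-uid="AAAA3" href="/v/4">clip 4</a></li>
</ul>
"""

producer_ids = extract_producer_ids(SAMPLE_PAGE)
```

Deduplicating while preserving order keeps one identifier per producer even when the same uploader appears several times in the results.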
Specifically, the identification information of a content producer can also be extracted from a target website using a homepage-based depth crawling scheme.
For example, the target website has the homepage http://www.xyz.com. Crawling starts from this site and obtains all of its video uploaders. Suppose there are 5 video uploaders with IDs "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5". All of the IDs are extracted, and the IDs "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5" are the identification information of all content producers on the website. Using the ID as the content producer identifier is merely exemplary and does not limit the invention.
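The patent does not spell out the homepage-based depth crawling scheme. One possible reading is a depth-limited traversal that starts at the site homepage and records every producer ID it meets; the sketch below assumes that reading. The in-memory `LINKS` graph stands in for real page fetches, and the `/ID=` URL pattern and the name `collect_producer_ids` are assumptions made for illustration.

```python
import re
from collections import deque

# Assumed URL shape of a producer page, modeled on the "ID=aaa1" example.
ID_PATTERN = re.compile(r"/ID=(\w+)$")


def collect_producer_ids(start, links, max_depth):
    """Breadth-first walk from the homepage, at most max_depth links deep."""
    seen = {start}
    queue = deque([(start, 0)])
    ids = []
    while queue:
        url, depth = queue.popleft()
        match = ID_PATTERN.search(url)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return ids


# Hypothetical link graph: page URL -> URLs linked from that page.
LINKS = {
    "http://www.xyz.com": [
        "http://www.xyz.com/hot",
        "http://www.xyz.com/ID=aaa1",
    ],
    "http://www.xyz.com/hot": [
        "http://www.xyz.com/ID=aaa2",
        "http://www.xyz.com/ID=aaa1",
    ],
}

shallow_ids = collect_producer_ids("http://www.xyz.com", LINKS, max_depth=1)
deeper_ids = collect_producer_ids("http://www.xyz.com", LINKS, max_depth=2)
```

A larger depth limit reaches producers that are only linked from inner pages, at the cost of visiting more pages.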
In practice, after the content producer identifiers have been obtained and saved, the homepage corresponding to each identifier, that is, the corresponding personal homepage, is determined. Taking content producer ID "aaa1" as an example, suppose the homepage corresponding to "aaa1" is "http://www.xyz.com/ID=aaa1"; then "http://www.xyz.com/ID=aaa1" is determined to be the personal homepage corresponding to content producer "aaa1".
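As a minimal sketch of this mapping, the snippet below builds one homepage URL per stored ID. The `/ID=<id>` URL template is taken from the example in the text; real sites will use their own URL schemes, and the function name `build_homepage_map` is hypothetical.

```python
from urllib.parse import quote


def build_homepage_map(site, producer_ids):
    """Map each producer ID to exactly one personal homepage URL."""
    # quote() escapes characters that are unsafe in a URL path segment.
    return {pid: f"{site}/ID={quote(pid)}" for pid in producer_ids}


homepages = build_homepage_map("http://www.xyz.com", ["aaa1", "aaa2", "aaa3"])
```

Because the dictionary is keyed by ID, the one-to-one correspondence between producers and homepages holds by construction.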
S102: for each content producer, crawl, from that content producer's personal homepage, all data produced by the content producer.
Those skilled in the art will understand that the personal homepage obtained contains all of the content producer's information, so extracting from it yields comprehensive information. For example, the personal homepage of content producer ID "aaa1" is "http://www.xyz.com/ID=aaa1", and it contains all of the data that content producer "aaa1" has uploaded. Searching this personal homepage gives all of the data produced by "aaa1", which is then crawled in full. Crawling the data of a web page is prior art and is not described further here.
It can be seen that, with the embodiment of Fig. 1, once the identification information of a content producer has been obtained, all data produced by that content producer can be crawled according to the identification information, so that comprehensive data is crawled.
Fig. 2 is another schematic flowchart of the data crawling method provided by an embodiment of the present invention, which adds S103 and S104 to the embodiment shown in Fig. 1.
S103: for each content producer, determine the frequency at which the content producer produces data from all the crawled data produced by that content producer.
Those skilled in the art will understand that all the crawled data is analyzed after extraction. For example, after all the data produced by content producer "aaa1" has been crawled, the frequency of data updates is analyzed. Suppose the data is updated once every 2 days; the crawl frequency for the personal homepage of content producer "aaa1" is then set to once every 2 days.
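The patent does not say how the update frequency is computed from the crawled data. One simple option, shown here purely as an assumption, is to take the median gap between consecutive publication timestamps; the function name `estimate_update_interval` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def estimate_update_interval(published):
    """Estimate how often a producer publishes, as the median gap between items."""
    times = sorted(published)
    if len(times) < 2:
        return None  # not enough history to estimate an interval
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return timedelta(seconds=median(gaps))


# Hypothetical publication times of items crawled from the homepage of "aaa1".
history = [
    datetime(2016, 6, 1, 14, 0),
    datetime(2016, 6, 3, 14, 0),
    datetime(2016, 6, 5, 14, 0),
]

interval = estimate_update_interval(history)
```

The median is less sensitive than the mean to a single long pause in uploads, which is why it is used in this sketch.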
S104: at the determined frequency, crawl, from the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
For example, for the content producer with ID "aaa1", after its crawl frequency has been set to once every 2 days, suppose the most recent crawl took place at 14:00 on June 5, 2016. The next crawl then takes place at 14:00 on June 7, 2016, and it crawls only the data updated between 14:00 on June 5, 2016 and 14:00 on June 7, 2016; that is, an incremental crawl is performed. Incremental crawling itself is prior art and is not described further here.
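The scheduling arithmetic in this example can be sketched as follows. The item list and the names `next_crawl_time` and `select_new_items` are hypothetical; a real crawler would read publication times from the homepage rather than from a hard-coded list.

```python
from datetime import datetime, timedelta


def next_crawl_time(last_crawl, interval):
    """The next crawl is due one interval after the previous crawl."""
    return last_crawl + interval


def select_new_items(items, last_crawl, now):
    """Keep only items published after the last crawl and up to the current crawl."""
    return [title for title, published in items if last_crawl < published <= now]


last_crawl = datetime(2016, 6, 5, 14, 0)
due = next_crawl_time(last_crawl, timedelta(days=2))

# Hypothetical (title, publication time) pairs listed on the personal homepage.
items = [
    ("old clip", datetime(2016, 6, 4, 9, 0)),
    ("new clip 1", datetime(2016, 6, 6, 10, 0)),
    ("new clip 2", datetime(2016, 6, 7, 13, 0)),
]

increment = select_new_items(items, last_crawl, due)
```

Only the two items published inside the window are selected, so data crawled earlier is not fetched again.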
It can be seen that, with the embodiment of Fig. 2, once the data production frequency of a content producer has been obtained, incremental crawling of the producer's personal homepage is carried out at the determined frequency. This lowers the crawl frequency and reduces the crawl workload while still ensuring that comprehensive data can be crawled.
Fig. 3 is yet another schematic flowchart of the data crawling method provided by an embodiment of the present invention, which adds S105 and S106 to the embodiment shown in Fig. 1.
S105: determine the priority of each content producer according to users' evaluation information on the crawled data produced by that content producer.
In practice, crawled data often carries attention information such as click counts, like counts and comment counts, which reflects how much attention the data receives. On a video website, each video may be accompanied by comments, user ratings, a like count and, as the opposite of likes, dislike ratings. This attention information is obtained while the data is crawled, and one or more kinds of it are used as the criterion for assigning video priorities. For example, priority can be assigned by like count, where more likes mean higher priority; by user rating, where a higher score means higher priority; or by dislike rating, where a higher score means lower priority.
For example, ranking by click count, suppose the click counts for "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5" are 900, 700, 300, 800 and 100 respectively. The priority order from high to low is then: level 1 "aaa1", level 2 "aaa4", level 3 "aaa2", level 4 "aaa3", level 5 "aaa5". This is merely exemplary; the embodiments do not limit the concrete criterion for assigning priorities.
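The click-count ranking in this example can be reproduced with a short sketch. The function name `rank_by_clicks` is hypothetical, and click count is only one of the criteria the text allows.

```python
def rank_by_clicks(click_counts):
    """Assign priority levels (1 = highest) by descending click count."""
    ordered = sorted(click_counts, key=lambda pid: click_counts[pid], reverse=True)
    return {pid: level for level, pid in enumerate(ordered, start=1)}


# Click counts from the example in the text.
CLICKS = {"aaa1": 900, "aaa2": 700, "aaa3": 300, "aaa4": 800, "aaa5": 100}

levels = rank_by_clicks(CLICKS)
```

Swapping the sort key for a like count or a user rating would give the other ranking criteria mentioned above; for a dislike rating, `reverse=True` would be dropped so that higher scores rank lower.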
S106: in order of priority from high to low, crawl, from each content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
In practice, the content producers are sorted once the priorities have been assigned, and the highest priority is crawled first. For example, after prioritization, the content producers with IDs "aaa1", "aaa2", "aaa3", "aaa4" and "aaa5" have the priority order: level 1 "aaa1", level 2 "aaa4", level 3 "aaa2", level 4 "aaa3", level 5 "aaa5". When crawling starts, the data of the personal homepage http://www.xyz.com/ID=aaa1, which corresponds to the level 1 content producer "aaa1", is crawled first, followed in turn by levels 2, 3 and so on. The prioritization given here is merely exemplary and does not limit the invention.
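One way to realize this ordering is a priority queue keyed by level, sketched below. The use of `heapq`, the function name `crawl_order`, and the `/ID=` URL template are illustrative assumptions; the patent only requires that higher-priority producers be crawled first.

```python
import heapq


def crawl_order(priority_levels, site="http://www.xyz.com"):
    """Return personal homepage URLs ordered from highest to lowest priority."""
    # Level 1 is the highest priority, so a min-heap pops it first.
    heap = [(level, pid) for pid, level in priority_levels.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _level, pid = heapq.heappop(heap)
        order.append(f"{site}/ID={pid}")
    return order


# Priority levels from the example in the text.
LEVELS = {"aaa1": 1, "aaa4": 2, "aaa2": 3, "aaa3": 4, "aaa5": 5}

urls = crawl_order(LEVELS)
```

A heap also lets a running crawler insert newly ranked producers without re-sorting the whole list.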
To some extent, a higher priority reflects information that is more popular and receives more attention, which people are generally more eager to obtain. To present such information to the public as early as possible, high-priority information is crawled first, which ensures its timeliness.
It can be seen that, with the embodiment of Fig. 3, once the priorities of the content producers have been obtained, the producers' personal homepages are crawled in order of the determined priority from high to low, which ensures that the information of high-priority content producers is crawled first.
Fig. 4 is a schematic structural diagram of the data crawling device provided by an embodiment of the present invention. The device may include an obtaining module 201, a first determining module 202 and a first crawling module 203.
The obtaining module 201 is configured to obtain and store in advance the identification information of at least one content producer.
Specifically, in practice, the obtaining module 201 is configured to:
extract and store the identification information of at least one content producer from the results page of a keyword search;
or
extract and store the identification information of at least one content producer from a target website using a homepage-based depth crawling scheme.
The first determining module 202 is configured to determine, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers.
The first crawling module 203 is configured to crawl, for each content producer, all data produced by the content producer from that content producer's personal homepage.
It can be seen that, with the embodiment shown in Fig. 4, once the identification information of a content producer has been obtained, all data produced by that content producer can be crawled according to the identification information, so that comprehensive data is crawled.
Fig. 5 is another schematic structural diagram of the data crawling device provided by an embodiment of the present invention. The embodiment shown in Fig. 5 adds a second determining module 204 and a second crawling module 205 to the embodiment shown in Fig. 4.
The second determining module 204 is configured to determine, for each content producer, the frequency at which the content producer produces data from all the crawled data produced by that content producer.
The second crawling module 205 is configured to crawl, at the determined frequency and from the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
It can be seen that, with the embodiment shown in Fig. 5, once the data production frequency of a content producer has been obtained, incremental crawling of the producer's personal homepage is carried out at the determined frequency. This lowers the crawl frequency and reduces the crawl workload while still ensuring that comprehensive data can be crawled.
Fig. 6 is yet another schematic structural diagram of the data crawling device provided by an embodiment of the present invention. The embodiment shown in Fig. 6 adds a third determining module 206 and a third crawling module 207 to the embodiment shown in Fig. 4.
The third determining module 206 is configured to determine the priority of each content producer according to users' evaluation information on the crawled data produced by that content producer.
The third crawling module 207 is configured to crawl, in order of priority from high to low and from each content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
It can be seen that, with the embodiment shown in Fig. 6, once the priorities of the content producers have been obtained, the producers' personal homepages are crawled in order of the determined priority from high to low, which ensures that the information of high-priority content producers is crawled first.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are substantially similar to the method embodiments, they are described more briefly; see the corresponding parts of the method embodiments.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention falls within its scope of protection.
Claims (8)
1. A data crawling method, characterized in that identification information of at least one content producer is obtained and stored in advance, the method including:
determining, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
for each content producer, crawling, from that content producer's personal homepage, all data produced by the content producer.
2. The method according to claim 1, characterized in that obtaining the identification information of a content producer includes:
extracting the identification information of the content producer from the results page of a keyword search;
or
extracting the identification information of the content producer from a target website using a homepage-based depth crawling scheme.
3. The method according to claim 1, characterized in that the method further includes:
for each content producer, determining, from all the crawled data produced by the content producer, the frequency at which the content producer produces data;
at the determined frequency, crawling, from the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
4. The method according to claim 1, characterized in that the method further includes:
determining the priority of each content producer according to users' evaluation information on the crawled data produced by that content producer;
in order of priority from high to low, crawling, from each content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
5. A data crawling device, characterized in that the device includes:
an obtaining module, configured to obtain and store in advance the identification information of at least one content producer;
a first determining module, configured to determine, according to the identification information of the at least one content producer, at least one content producer personal homepage in one-to-one correspondence with the content producers;
a first crawling module, configured to crawl, for each content producer, all data produced by the content producer from that content producer's personal homepage.
6. The device according to claim 5, characterized in that the obtaining module is specifically configured to:
extract and store the identification information of at least one content producer from the results page of a keyword search;
or
extract and store the identification information of at least one content producer from a target website using a homepage-based depth crawling scheme.
7. The device according to claim 5, characterized in that the device further includes a second determining module and a second crawling module;
the second determining module is configured to determine, for each content producer, the frequency at which the content producer produces data from all the crawled data produced by that content producer;
the second crawling module is configured to crawl, at the determined frequency and from the content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
8. The device according to claim 5, characterized in that the device further includes a third determining module and a third crawling module;
the third determining module is configured to determine the priority of each content producer according to users' evaluation information on the crawled data produced by that content producer;
the third crawling module is configured to crawl, in order of priority from high to low and from each content producer's personal homepage, the data produced by the content producer that has not yet been crawled.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610511377.2A CN106126716A (en) | 2016-06-30 | 2016-06-30 | A kind of data crawling method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610511377.2A CN106126716A (en) | 2016-06-30 | 2016-06-30 | A kind of data crawling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106126716A true CN106126716A (en) | 2016-11-16 |
Family
ID=57468084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610511377.2A Pending CN106126716A (en) | 2016-06-30 | 2016-06-30 | A kind of data crawling method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106126716A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729344A (en) * | 2017-07-26 | 2018-02-23 | 上海壹账通金融科技有限公司 | Website data crawling method, device, computer equipment and readable storage medium storing program for executing |
CN109388736A (en) * | 2018-09-21 | 2019-02-26 | 真相网络科技(北京)有限公司 | Response scheduling method in crawler system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090249451A1 (en) * | 2008-03-31 | 2009-10-01 | Yahoo!, Inc. | Access to Trusted User-Generated Content Using Social Networks |
CN102521337A (en) * | 2011-12-08 | 2012-06-27 | 华中科技大学 | Academic community system based on massive knowledge network |
CN103092999A (en) * | 2013-02-22 | 2013-05-08 | 人民搜索网络股份公司 | Webpage crawling cycle adjusting method and device |
CN104063448A (en) * | 2014-06-18 | 2014-09-24 | 华东师范大学 | Distributed type microblog data capturing system related to field of videos |
Non-Patent Citations (1)
Title |
---|
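Luo Yishu: "Research on Related Technologies of Microblog Crawlers", China Master's Theses Full-text Database, Information Science and Technology * |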
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729344A (en) * | 2017-07-26 | 2018-02-23 | 上海壹账通金融科技有限公司 | Website data crawling method and device, computer equipment and readable storage medium |
WO2019019673A1 (en) * | 2017-07-26 | 2019-01-31 | 深圳壹账通智能科技有限公司 | Website data crawling method and apparatus, computer device and readable storage medium |
CN107729344B (en) * | 2017-07-26 | 2020-08-28 | 深圳壹账通智能科技有限公司 | Website data crawling method and device, computer equipment and readable storage medium |
CN109388736A (en) * | 2018-09-21 | 2019-02-26 | 真相网络科技(北京)有限公司 | Response scheduling method in crawler system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8347231B2 (en) | Methods, systems, and computer program products for displaying tag words for selection by users engaged in social tagging of content | |
US10402479B2 (en) | Method, server, browser, and system for recommending text information | |
US20150278359A1 (en) | Method and apparatus for generating a recommendation page | |
CN102184230A (en) | Method and device for displaying search results | |
CN104102639B (en) | Popularization triggering method based on text classification and device | |
CN103942712A (en) | Product similarity based e-commerce recommendation system and method thereof | |
CN106415537A (en) | Inserting native application search results into web search results | |
CN105074700A (en) | Generating search results containing state links to applications | |
CN104462293A (en) | Search processing method and method and device for generating search result ranking model | |
CN102306171A (en) | Method and equipment for providing network access suggestions and network search suggestions | |
CN104035966A (en) | Method and device for providing extended search terms | |
CN101097578A (en) | Network resource searching method and system | |
CN104751354B (en) | A kind of advertisement crowd screening technique | |
CN103699576A (en) | Method and device used for providing searching results | |
WO2014194689A1 (en) | Method, server, browser, and system for recommending text information | |
CN102708174A (en) | Method and device for displaying rich media information in browser | |
CN105975537A (en) | Sorting method and device of application program | |
EP2786282A2 (en) | Temporal visualization of query results | |
CN105302876A (en) | Regular expression based URL filtering method | |
CN108959580A (en) | A kind of optimization method and system of label data | |
CN103186666A (en) | Method, device and equipment for searching based on favorites | |
CN108763313A (en) | On-line training method, server and the storage medium of model | |
CN103106234A (en) | Searching method and device of webpage content | |
CN104090923A (en) | Method and device for displaying rich media information in browser | |
CN101894109A (en) | Database building method and device |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161116 |