CN106656860A - Multi-site HTTP access frequency control method - Google Patents

Multi-site HTTP access frequency control method

Info

Publication number
CN106656860A
Authority
CN
China
Prior art keywords
task
queue
frequency
frequency control
site http
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610920014.4A
Other languages
Chinese (zh)
Inventor
于鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Mass Information Technology Ltd By Share Ltd
Original Assignee
Tianjin Mass Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-10-21
Filing date
2016-10-21
Publication date
2017-05-10

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling

Abstract

The invention relates to a multi-site HTTP access frequency control method. After a download task is received, its frequency is configured and the task is placed in a download task queue; the queues are then polled and their tasks downloaded. Once the system starts processing, a task is read from the head of the first queue and the records are checked for a frequency state for that task. If a frequency state is recorded, the method determines whether the task can be executed at the current time; if it can, the task is downloaded, its finish time and waiting time are recorded, the task is deleted from the queue, and the method decides whether to continue. If it continues, a task is read from the head of the next queue and that task's frequency state is queried in turn. If no frequency state is recorded, the task is downloaded, its finish time and waiting time are recorded, the task is deleted from the queue, and the method decides whether to continue; if not, the thread ends. The method thereby solves the problems caused by frequency control.

Description

A multi-site HTTP access frequency control method
Technical field
The present invention belongs to the field of control, and in particular relates to a multi-site HTTP access frequency control method.
Background art
With the rapid development of networks, the World Wide Web has become the carrier of a vast amount of information, and extracting and using this information efficiently has become a major challenge. HTTP access is the main method of extracting information from the network. As anti-crawler mechanisms continue to strengthen, controlling HTTP access frequency has become the main means of keeping websites from restricting access, but frequency control in turn raises the problem of how efficiently resources are used.
At present, the problems faced are: 1. Frequency control is not always necessary: not every website requires frequency control. 2. Frequency control interferes across tasks: if one task in a single download task queue is subject to frequency control, the tasks behind it cannot be downloaded in time. 3. Frequency control partitioning: one website may have different subdomains that each have their own frequency, while multiple websites may share one frequency.
Summary of the invention
The present invention provides a multi-site HTTP access frequency control method to solve the problems caused by frequency control.
A multi-site HTTP access frequency control method, comprising: after a download task is received, configuring its frequency, placing the task in a download task queue, polling the queued tasks, and finally downloading them. After the system thread starts, a task is read from the head of the first queue and the records are checked for a frequency state for that task. If a frequency state is recorded, it is determined whether the task can be executed at the current time; if so, the task is downloaded, its end time and waiting time are recorded, the task is deleted from the queue, and it is confirmed whether to continue; if so, a task is read from the head of the next queue and that task's frequency state is queried in turn. If no frequency state is recorded, the task is downloaded, its end time and waiting time are recorded, the task is deleted from the queue, and it is confirmed whether to continue; if not, the thread ends.
Further, regular expressions are used in the frequency rules to divide frequency units. Regular expressions are highly flexible: a rule can divide by URL subdomain, or several domain names can together form one rule.
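The two kinds of division described above can be illustrated with a minimal sketch. The patent does not give any concrete patterns, so the host names, the unit labels, and the helper `frequency_unit` below are hypothetical, chosen only to show one subdomain with its own frequency unit and two sites sharing a single unit.

```python
import re

# Hypothetical patterns, matched against the start of the task URL.
# One subdomain that has its own frequency unit:
SUBDOMAIN_UNIT = re.compile(r"^https?://img\.example\.com(?:[:/]|$)")
# Two different sites (and any of their subdomains) sharing one frequency unit:
SHARED_UNIT = re.compile(
    r"^https?://(?:[\w-]+\.)*(?:example\.org|example\.net)(?:[:/]|$)"
)


def frequency_unit(url):
    """Return the label of the frequency unit a URL belongs to, or None."""
    if SUBDOMAIN_UNIT.match(url):
        return "img-subdomain"
    if SHARED_UNIT.match(url):
        return "shared-sites"
    return None  # no rule matches: this URL is not frequency controlled
```

Under these assumed patterns, `http://img.example.com/a.png` falls in its own unit, `https://www.example.org/x` and `https://example.net` share a unit, and `http://www.example.com/` matches no rule.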
Further, in the frequency unit queues, specifying frequency rules causes the tasks of different frequency units to be placed in different queues.
Further, in multi-queue scheduling, the tasks of different frequency units are in different queues, and the scheduler checks whether the task of each queue can be executed.
Description of the drawings
Fig. 1 is the system information architecture diagram of the multi-site HTTP access frequency control method.
Fig. 2 is the system processing flowchart of the multi-site HTTP access frequency control method.
Specific embodiment
Embodiment: A multi-site HTTP access frequency control method, comprising: after a download task is received, configuring its frequency, placing the task in a download task queue, polling the queued tasks, and finally downloading them. After the system process starts, a task is read from the head of the first queue and the records are checked for a frequency state for that task. If a frequency state is recorded, it is determined whether the task can be executed at the current time; if so, the task is downloaded, its end time and waiting time are recorded, the task is deleted from the queue, and it is confirmed whether to continue; if so, a task is read from the head of the next queue and that task's frequency state is queried in turn. If no frequency state is recorded, the task is downloaded, its end time and waiting time are recorded, the task is deleted from the queue, and it is confirmed whether to continue; if not, the thread ends.
A frequency rule contains a regular expression, a minimum waiting time, and a maximum waiting time. The regular expression is matched against task URLs to distinguish the different frequency control units; the minimum waiting time is the shortest time that must elapse before a task is executed, and the maximum waiting time is the longest time that must elapse before a task is executed. The frequency control system contains multiple groups of frequency rules. When a download task is received, the regular expressions in the rules map the task to the corresponding task queue.
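A frequency rule and the mapping of tasks to queues might be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the class name `FrequencyRule`, the example rules, the first-match-wins order, and the `uncontrolled` queue for URLs that match no rule are all hypothetical.

```python
import re
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyRule:
    name: str        # hypothetical label for the frequency unit
    pattern: str     # regular expression matched against the task URL
    min_wait: float  # minimum seconds to wait before a task is executed
    max_wait: float  # maximum seconds to wait before a task is executed


# Example rule groups; the hosts and waiting times are made up.
RULES = [
    FrequencyRule("example-images", r"^https?://img\.example\.com/", 1.0, 3.0),
    FrequencyRule("example-shared", r"^https?://(?:www\.)?example\.(?:org|net)/", 5.0, 10.0),
]

UNCONTROLLED = "uncontrolled"  # assumed queue for URLs that match no rule


def route(url, rules=RULES):
    """Return the name of the queue for a URL (assumption: first match wins)."""
    for rule in rules:
        if re.search(rule.pattern, url):
            return rule.name
    return UNCONTROLLED


def enqueue_all(urls, rules=RULES):
    """Distribute task URLs into one FIFO queue per frequency unit."""
    queues = defaultdict(deque)
    for url in urls:
        queues[route(url, rules)].append(url)
    return queues
```

Keeping one queue per frequency unit means a task that must wait only delays tasks of its own unit, which is the separation the rules are meant to provide.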
The scheduling flow records, for each frequency unit, the end time of its last download and the time it must wait. It polls each task queue and determines whether the task satisfies the required waiting interval at the current time: if the task can be executed, it is taken from the queue and downloaded; if not, the next task is polled. After a task completes, the scheduling flow records the task's download end time and calculates, according to the frequency rule, how long the next task must wait.
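The polling behavior described above can be sketched as a small scheduler. The class and parameter names are hypothetical, the clock and download function are injected so the example can run without network access, and drawing the next wait uniformly between the minimum and maximum waiting times is an assumption: the patent only says the wait is calculated according to the frequency rule.

```python
import random
from collections import deque


class Scheduler:
    """Polls per-unit queues; a unit that must still wait is skipped."""

    def __init__(self, queues, waits, download, clock):
        self.queues = queues      # unit name -> deque of task URLs
        self.waits = waits        # unit name -> (min_wait, max_wait); absent = no control
        self.download = download  # callable(url), supplied by the caller
        self.clock = clock        # callable() -> current time in seconds
        self.state = {}           # unit name -> (last_end_time, required_wait)

    def runnable(self, unit):
        if unit not in self.state:  # no frequency state recorded: run at once
            return True
        last_end, wait = self.state[unit]
        return self.clock() - last_end >= wait

    def poll_once(self):
        """Visit the head of every queue once; return the number of downloads."""
        done = 0
        for unit, queue in self.queues.items():
            if not queue or not self.runnable(unit):
                continue  # move on to the next queue instead of blocking
            self.download(queue.popleft())
            if unit in self.waits:
                # Record the end time and the wait required before the next task.
                # Assumption: the wait is drawn uniformly from [min_wait, max_wait].
                low, high = self.waits[unit]
                self.state[unit] = (self.clock(), random.uniform(low, high))
            done += 1
        return done
```

A caller would invoke `poll_once` repeatedly, for example with `time.monotonic` as the clock, and stop once every queue is empty. Because a waiting unit is skipped rather than slept on, tasks in the other queues keep downloading in the meantime.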
Regular expressions are used in the frequency rules to divide frequency units. Regular expressions are highly flexible: a rule can divide by URL subdomain, or several domain names can together form one rule.
In the frequency unit queues, specifying frequency rules causes the tasks of different frequency units to be placed in different queues.
In multi-queue scheduling, the tasks of different frequency units are in different queues, and the scheduler checks whether the task of each queue can be executed. This ensures that the tasks in each queue are executed in time and thereby maximizes resource utilization.
Although an embodiment of the present invention has been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to this embodiment without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims.

Claims (4)

1. A multi-site HTTP access frequency control method, characterized in that: after a download task is received, its frequency is configured, the task is placed in a download task queue, the queued tasks are polled, and finally they are downloaded; after system processing starts, a task is read from the head of the first queue and the records are checked for a frequency state for that task; if a frequency state is recorded, it is determined whether the task can be executed at the current time; if so, the task is downloaded, its end time and waiting time are recorded, the task is deleted from the queue, and it is confirmed whether to continue; if so, a task is read from the head of the next queue and that task's frequency state is queried in turn; if no frequency state is recorded, the task is downloaded, its end time and waiting time are recorded, the task is deleted from the queue, and it is confirmed whether to continue; if not, the thread ends.
2. The multi-site HTTP access frequency control method according to claim 1, characterized in that: regular expressions are used in the frequency rules to divide frequency units; regular expressions are highly flexible, so a rule can divide by URL subdomain, or several domain names can together form one rule.
3. The multi-site HTTP access frequency control method according to claim 1, characterized in that: in the frequency unit queues, specifying frequency rules causes the tasks of different frequency units to be placed in different queues.
4. The multi-site HTTP access frequency control method according to claim 1, characterized in that: in multi-queue scheduling, the tasks of different frequency units are in different queues, and the scheduler checks whether the task of each queue can be executed.
CN201610920014.4A 2016-10-21 2016-10-21 Multi-site HTTP access frequency control method Pending CN106656860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610920014.4A CN106656860A (en) 2016-10-21 2016-10-21 Multi-site HTTP access frequency control method

Publications (1)

Publication Number Publication Date
CN106656860A true CN106656860A (en) 2017-05-10

Family

ID=58856086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610920014.4A Pending CN106656860A (en) 2016-10-21 2016-10-21 Multi-site HTTP access frequency control method

Country Status (1)

Country Link
CN (1) CN106656860A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298622A (en) * 2011-08-11 2011-12-28 中国科学院自动化研究所 Search method for focused web crawler based on anchor text and system thereof
CN102902785A (en) * 2012-09-29 2013-01-30 合一网络技术(北京)有限公司 Webpage information acquisition system and method
CN103873597A (en) * 2014-04-15 2014-06-18 厦门市美亚柏科信息股份有限公司 Distributed webpage downloading method and system
CN105260388A (en) * 2015-09-11 2016-01-20 广州极数宝数据服务有限公司 Optimization method of distributed vertical crawler service system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154431A (en) * 2018-01-17 2018-06-12 北京网信云服信息科技有限公司 A kind of target raises condition processing method and device
CN108154431B (en) * 2018-01-17 2021-07-06 北京网信云服信息科技有限公司 Target recruitment state processing method and device

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication. Application publication date: 20170510