CN113840000A - Distributed network downloading method and device for massive large files - Google Patents

Distributed network downloading method and device for massive large files

Info

Publication number
CN113840000A
Authority
CN
China
Prior art keywords
downloading
task state
downloaded
task
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111109211.5A
Other languages
Chinese (zh)
Inventor
邱江飞
娄伟贞
朱梅
吴敬超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong EHualu Information Technology Co ltd
Original Assignee
Shandong EHualu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong EHualu Information Technology Co ltd
Publication of CN113840000A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5018 Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a distributed network downloading method and device for massive large files. The method comprises the following steps: submitting a download address at a control node; monitoring a task state according to the download address; performing multithreaded downloading according to the task state to obtain downloaded data; and judging whether the downloaded data needs to be downloaded again. The invention addresses the following problems in the prior art: although a download program can run continuously on a server, it is limited by the network, storage, and other resources of a single server, so the download period is long when the number of data files is very large; commands such as curl and wget do not support multithreading, so downloading a single large file is slow and inefficient; and these commands do not support remote RPC calls, which hinders remote monitoring and the scheduling of download tasks.

Description

Distributed network downloading method and device for massive large files
Technical Field
The invention relates to the field of data downloading, in particular to a distributed network downloading method and device for massive large files.
Background
With the continuous development of intelligent technology, people increasingly use smart devices in daily life, work, and study. These technologies have improved quality of life and made study and work more efficient.
In many scientific research fields, such as atmospheric science, hydrology, oceanography, environmental simulation, and geophysics, data sets are usually published on the Internet as files in specific formats for shared access; common data formats include netCDF, HDF, and GRIB. Data in these fields is usually organized by data type and time: some data sets store one file per day, others one file per month or per year. The files cover several years, several decades, or even hundreds of years and contain different observation indexes, so the number of data files is very large, a single file can reach tens of gigabytes, and the total volume is very large.
To acquire these files, an automated script is generally run on a server, and the required files are fetched continuously with network commands such as curl and wget until all files have been downloaded. Although the download program can run continuously on the server, it is limited by the network, storage, and other resources of a single server, so the download period is long when a large number of data files must be downloaded. Commands such as curl and wget do not support multithreading, so downloading a single large file is slow and inefficient; they also do not support remote RPC calls, which hinders remote monitoring and the scheduling of download tasks. In addition, for security reasons, websites often throttle or block IP addresses that crawl the site frequently or whose download traffic exceeds a threshold. Downloading a large amount of data from a single point can therefore easily get the IP address blocked, after which the data can no longer be downloaded.
The invention aims to improve the efficiency of downloading massive large files by combining a distributed network with multithreaded downloading. Distributing the many download tasks across a plurality of cooperating servers shortens the total download time. Multithreaded downloading increases the download speed of a single file. With a distributed cluster, a task whose IP address has been restricted can be migrated to another node, which avoids task failure caused by the blocking of a single IP address.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a distributed network downloading method and device for massive large files, which at least solve the following problems in the prior art: although a download program can run continuously on a server, it is limited by the network, storage, and other resources of a single server, so the download period is long when the number of data files is very large; commands such as curl and wget do not support multithreading, so downloading a single large file is slow and inefficient; and these commands do not support remote RPC calls, which hinders remote monitoring and the scheduling of download tasks.
According to an aspect of the embodiments of the present invention, a distributed network downloading method for massive large files is provided, including: submitting a download address at a control node; monitoring a task state according to the download address; performing multithreaded downloading according to the task state to obtain downloaded data; and judging whether the downloaded data needs to be downloaded again.
Optionally, after monitoring the task state according to the download address, the method further includes: acquiring the busy condition according to the task state.
Optionally, the task state includes: a busy condition and an idle condition.
Optionally, after judging whether the downloaded data needs to be downloaded again, the method further includes: repeating the method from the step of monitoring the task state according to the download address.
According to another aspect of the embodiments of the present invention, a distributed network downloading apparatus for massive large files is also provided, including: a download module, configured to submit a download address at the control node; a monitoring module, configured to monitor the task state according to the download address; a multithreading module, configured to perform multithreaded downloading according to the task state to obtain downloaded data; and a judging module, configured to judge whether the downloaded data needs to be downloaded again.
According to another aspect of the embodiments of the present invention, a non-volatile storage medium is further provided. The non-volatile storage medium includes a stored program, and the program, when running, controls the device in which the non-volatile storage medium is located to execute the distributed network downloading method for massive large files.
According to another aspect of the embodiments of the present invention, an electronic device is also provided, including a processor and a memory. The memory stores computer-readable instructions, and the processor is configured to execute the computer-readable instructions, which, when executed, perform the distributed network downloading method for massive large files.
In the embodiments of the invention, a download address is submitted at the control node; the task state is monitored according to the download address; multithreaded downloading is performed according to the task state to obtain downloaded data; and whether the downloaded data needs to be downloaded again is judged. This approach solves the following problems in the prior art: although a download program can run continuously on a server, it is limited by the network, storage, and other resources of a single server, so the download period is long when the number of data files is very large; commands such as curl and wget do not support multithreading, so downloading a single large file is slow and inefficient; and these commands do not support remote RPC calls, which hinders remote monitoring and the scheduling of download tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a distributed network downloading method for massive large files according to an embodiment of the present invention;
FIG. 2 is a block diagram of a distributed network downloading apparatus for massive large files according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, a method embodiment of a distributed network downloading method for massive large files is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
Example one
Fig. 1 is a flowchart of a distributed network downloading method for massive large files according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: submitting a download address at the control node.
Step S104: monitoring the task state according to the download address.
Step S106: performing multithreaded downloading according to the task state to obtain downloaded data.
Step S108: judging whether the downloaded data needs to be downloaded again.
Optionally, after monitoring the task state according to the download address, the method further includes: acquiring the busy condition according to the task state.
Optionally, the task state includes: a busy condition and an idle condition.
Optionally, after judging whether the downloaded data needs to be downloaded again, the method further includes: repeating the method from the step of monitoring the task state according to the download address.
Specifically, the embodiment of the present invention is implemented by the following steps.
In this technical scheme, servers are divided into three types according to the tasks they execute: control nodes, monitoring nodes, and download nodes.
The control node is responsible for resolving all the download links contained in the original link and adding the resulting tasks to the task queue.
The monitoring node is responsible for monitoring the busy state, download progress, task state, and similar information of each download node, selecting a relatively idle machine, and distributing the tasks in the task queue.
The download node is responsible for performing the actual download tasks. Each download node needs to run the aria2 program. aria2 is a download program that supports multithreading, resuming interrupted downloads, and downloading a file from multiple sources or protocols, all of which speed up downloading. The RPC interface built into aria2 makes it convenient to check the progress and status of a task.
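As an illustration of how a task might be handed to aria2 on a download node, the following minimal sketch builds an `aria2.addUri` JSON-RPC request that asks for several parallel connections and for resuming a partial file. It is not taken from the patent: the endpoint URL, the connection count of 8, the example file URL, and the helper names `build_add_uri_request` and `call_aria2` are assumptions made for illustration, and the sketch assumes aria2 was started with its RPC interface enabled.

```python
import json
import urllib.request

# Assumed endpoint: aria2c started with --enable-rpc listens on port 6800 by default.
ARIA2_RPC_URL = "http://localhost:6800/jsonrpc"


def build_add_uri_request(url, target_dir, connections=8, secret=None, request_id="1"):
    """Build the JSON-RPC body for aria2.addUri (hypothetical helper)."""
    options = {
        "dir": target_dir,                              # directory the file is written to
        "split": str(connections),                      # number of parallel segments
        "max-connection-per-server": str(connections),  # connections per source server
        "continue": "true",                             # resume a partially downloaded file
    }
    params = [[url], options]
    if secret is not None:
        # If aria2 was started with --rpc-secret, the token is the first parameter.
        params.insert(0, "token:" + secret)
    return {"jsonrpc": "2.0", "id": request_id, "method": "aria2.addUri", "params": params}


def call_aria2(payload, rpc_url=ARIA2_RPC_URL):
    """POST a JSON-RPC payload to a running aria2 instance and return the decoded reply."""
    request = urllib.request.Request(
        rpc_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) a request for an example file; the URL is a placeholder.
payload = build_add_uri_request("https://example.org/data/file.nc", "/data/downloads")
print(json.dumps(payload, indent=2))
```

Sending the payload with `call_aria2(payload)` requires a running aria2 instance; on success the reply's `result` field holds the GID that identifies the download in later status queries.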
Step one: a download address is submitted at the control node; for example, the address is the root directory of a file server. The control node obtains the download addresses of all files through web page parsing and recursive calls, and puts them into the task queue.
Step two: the monitoring node periodically collects and monitors the execution state and progress of the tasks in the task queue and the busy degree of the download nodes.
Step three: when the monitoring node detects that a new task has been added to the queue, it obtains the busy state of each download node and preferentially assigns the new task to an idle node. The queue follows the first-in-first-out principle. When all nodes are busy, the new task waits in the queue.
Step four: after receiving a task, the download node hands it to aria2 for multithreaded downloading.
Step five: the monitoring node communicates with the download nodes through remote RPC calls, periodically updates the downloaded data and its download state, and stores the node and task state information in ZooKeeper. When the data has been downloaded completely, the task is removed from the queue.
Step six: if the data download fails, the current node retries. When the retry limit is exceeded, the monitoring node marks the task state as failed, records the current callback status of the download together with the size and position downloaded so far, and then moves on to the next task.
Step seven: if the monitoring node detects that the IP address of the current node has been blocked by the target website, it assigns the failed task to another host node with a different IP address, which reads the already downloaded content from shared storage and continues downloading from the last download position, thereby avoiding the resource waste caused by repeated downloading.
Step nine: steps two through eight are repeated until all tasks have been executed.
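The scheduling behaviour described in the steps above (first-in-first-out queue, preference for idle nodes, bounded retries, and migration away from a blocked IP) can be sketched as a small in-memory simulation. This is a hedged illustration only: the class names `Task`, `Node`, and `Scheduler`, the per-node `capacity`, and the default `max_retries` are hypothetical, and the patent's RPC communication, ZooKeeper storage, and shared storage are not modelled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class Task:
    url: str
    retries: int = 0
    state: str = "queued"  # queued, running, done, or failed
    banned_nodes: set = field(default_factory=set)


@dataclass(eq=False)
class Node:
    name: str
    capacity: int = 2  # hypothetical limit on concurrent tasks per node
    running: list = field(default_factory=list)

    def is_idle(self):
        return len(self.running) < self.capacity


class Scheduler:
    def __init__(self, nodes, max_retries=3):
        self.nodes = nodes
        self.queue = deque()  # first in, first out
        self.max_retries = max_retries

    def submit(self, url):
        task = Task(url)
        self.queue.append(task)
        return task

    def dispatch(self):
        """Assign queued tasks, oldest first, to the least-loaded idle node."""
        waiting = deque()
        while self.queue:
            task = self.queue.popleft()
            candidates = [n for n in self.nodes
                          if n.is_idle() and n.name not in task.banned_nodes]
            if not candidates:
                waiting.append(task)  # every usable node is busy: keep waiting
                continue
            node = min(candidates, key=lambda n: len(n.running))
            node.running.append(task)
            task.state = "running"
        self.queue = waiting

    def report_success(self, node, task):
        node.running.remove(task)
        task.state = "done"

    def report_failure(self, node, task):
        """Retry on the same node until the retry limit is exceeded."""
        task.retries += 1
        if task.retries > self.max_retries:
            node.running.remove(task)
            task.state = "failed"

    def report_ban(self, node, task):
        """The node's IP was blocked: requeue the task for a different node."""
        node.running.remove(task)
        task.banned_nodes.add(node.name)
        task.state = "queued"
        self.queue.appendleft(task)


nodes = [Node("node-a", capacity=1), Node("node-b", capacity=1)]
scheduler = Scheduler(nodes, max_retries=1)
t1 = scheduler.submit("https://example.org/a.nc")
t2 = scheduler.submit("https://example.org/b.nc")
t3 = scheduler.submit("https://example.org/c.nc")
scheduler.dispatch()                 # t1 and t2 start; t3 waits
scheduler.report_ban(nodes[0], t1)   # node-a is blocked for t1
scheduler.dispatch()                 # t3 takes node-a; t1 waits for node-b
scheduler.report_success(nodes[1], t2)
scheduler.dispatch()                 # t1 resumes on node-b
print(t1.state, t2.state, t3.state)
```

One design choice in this sketch is that a task excluded from a blocked node does not hold up the tasks behind it, so strict first-in-first-out order is relaxed in that one case.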
This embodiment solves the following problems in the prior art: although a download program can run continuously on a server, it is limited by the network, storage, and other resources of a single server, so the download period is long when a large number of data files must be downloaded; commands such as curl and wget do not support multithreading, so downloading a single large file is slow and inefficient; and these commands do not support remote RPC calls, which hinders remote monitoring and the scheduling of download tasks.
Example two
Fig. 2 is a block diagram of a distributed network downloading apparatus for massive large files according to an embodiment of the present invention. As shown in Fig. 2, the apparatus includes:
a download module 20, configured to submit a download address at the control node;
a monitoring module 22, configured to monitor the task state according to the download address;
a multithreading module 24, configured to perform multithreaded downloading according to the task state to obtain downloaded data; and
a judging module 26, configured to judge whether the downloaded data needs to be downloaded again.
Optionally, after monitoring the task state according to the download address, the method further includes: acquiring the busy condition according to the task state.
Optionally, the task state includes: a busy condition and an idle condition.
Optionally, after judging whether the downloaded data needs to be downloaded again, the method further includes: repeating the method from the step of monitoring the task state according to the download address.
Specifically, the embodiment of the present invention is implemented by the following steps.
In this technical scheme, servers are divided into three types according to the tasks they execute: control nodes, monitoring nodes, and download nodes.
The control node is responsible for resolving all the download links contained in the original link and adding the resulting tasks to the task queue.
The monitoring node is responsible for monitoring the busy state, download progress, task state, and similar information of each download node, selecting a relatively idle machine, and distributing the tasks in the task queue.
The download node is responsible for performing the actual download tasks. Each download node needs to run the aria2 program. aria2 is a download program that supports multithreading, resuming interrupted downloads, and downloading a file from multiple sources or protocols, all of which speed up downloading. The RPC interface built into aria2 makes it convenient to check the progress and status of a task.
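To illustrate how progress and status might be read through the aria2 RPC interface, the sketch below builds an `aria2.tellStatus` request and condenses a reply into the fields a monitoring node would care about. The helper names, the summary layout, and the sample reply values are assumptions for illustration; aria2 returns lengths and speeds as strings, which is why they are converted with `int`.

```python
# Keys requested from aria2.tellStatus; aria2 returns all values as strings.
STATUS_KEYS = ["gid", "status", "totalLength", "completedLength", "downloadSpeed"]


def build_tell_status_request(gid, secret=None, request_id="1"):
    """Build the JSON-RPC body for aria2.tellStatus (hypothetical helper)."""
    params = [gid, STATUS_KEYS]
    if secret is not None:
        # If aria2 was started with --rpc-secret, the token is the first parameter.
        params.insert(0, "token:" + secret)
    return {"jsonrpc": "2.0", "id": request_id, "method": "aria2.tellStatus", "params": params}


def summarize_status(result):
    """Condense the 'result' object of a tellStatus reply for the monitoring node."""
    total = int(result.get("totalLength", "0"))
    done = int(result.get("completedLength", "0"))
    percent = round(100.0 * done / total, 1) if total > 0 else 0.0
    state = result.get("status", "unknown")
    return {
        "gid": result.get("gid"),
        "state": state,
        "percent": percent,
        "bytes_done": done,            # position to report if the task must be migrated
        "busy": state == "active",     # the node is currently transferring this task
        "finished": state == "complete",
        "failed": state == "error",
    }


# Made-up reply values, shaped like the 'result' object aria2 returns.
sample_result = {
    "gid": "2089b05ecca3d829",
    "status": "active",
    "totalLength": "1000",
    "completedLength": "250",
    "downloadSpeed": "100",
}
summary = summarize_status(sample_result)
print(summary)
```

A monitoring node could poll each download node with such requests at a fixed interval and treat a node with no `active` tasks as idle.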
Step one: a download address is submitted at the control node; for example, the address is the root directory of a file server. The control node obtains the download addresses of all files through web page parsing and recursive calls, and puts them into the task queue.
Step two: the monitoring node periodically collects and monitors the execution state and progress of the tasks in the task queue and the busy degree of the download nodes.
Step three: when the monitoring node detects that a new task has been added to the queue, it obtains the busy state of each download node and preferentially assigns the new task to an idle node. The queue follows the first-in-first-out principle. When all nodes are busy, the new task waits in the queue.
Step four: after receiving a task, the download node hands it to aria2 for multithreaded downloading.
Step five: the monitoring node communicates with the download nodes through remote RPC calls, periodically updates the downloaded data and its download state, and stores the node and task state information in ZooKeeper. When the data has been downloaded completely, the task is removed from the queue.
Step six: if the data download fails, the current node retries. When the retry limit is exceeded, the monitoring node marks the task state as failed, records the current callback status of the download together with the size and position downloaded so far, and then moves on to the next task.
Step seven: if the monitoring node detects that the IP address of the current node has been blocked by the target website, it assigns the failed task to another host node with a different IP address, which reads the already downloaded content from shared storage and continues downloading from the last download position, thereby avoiding the resource waste caused by repeated downloading.
Step nine: steps two through eight are repeated until all tasks have been executed.
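Step one of the procedure above, in which the control node resolves every file address under a root directory by parsing pages recursively, can be sketched as follows. This is a minimal illustration under stated assumptions: the server is assumed to return HTML index pages in which directory links end with "/", the page-fetching function is passed in by the caller so the sketch runs without network access, and the names `LinkCollector` and `collect_file_urls` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_file_urls(root_url, fetch_page):
    """Walk a directory-style listing recursively and queue every file URL."""
    task_queue = deque()
    seen = set()

    def visit(directory_url):
        if directory_url in seen:
            return  # guard against listing loops
        seen.add(directory_url)
        parser = LinkCollector()
        parser.feed(fetch_page(directory_url))
        for href in parser.hrefs:
            if href.startswith(("?", "#")):
                continue  # sort links and anchors, not files
            url = urljoin(directory_url, href)
            if not url.startswith(root_url):
                continue  # parent-directory and off-site links
            if url.endswith("/"):
                visit(url)  # assumed convention: directories end with "/"
            else:
                task_queue.append(url)

    visit(root_url)
    return task_queue


# Fake listing pages standing in for a real file server.
pages = {
    "https://example.org/data/":
        '<a href="../">Parent</a><a href="?C=N;O=D">Name</a>'
        '<a href="2020/">2020/</a><a href="readme.txt">readme.txt</a>',
    "https://example.org/data/2020/":
        '<a href="/data/">Up</a><a href="jan.nc">jan.nc</a><a href="feb.nc">feb.nc</a>',
}
queue = collect_file_urls("https://example.org/data/", lambda url: pages[url])
print(list(queue))
```

In a real deployment the `fetch_page` argument would perform an HTTP request, and the resulting queue would be handed to the monitoring node for distribution.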
This embodiment solves the following problems in the prior art: although a download program can run continuously on a server, it is limited by the network, storage, and other resources of a single server, so the download period is long when a large number of data files must be downloaded; commands such as curl and wget do not support multithreading, so downloading a single large file is slow and inefficient; and these commands do not support remote RPC calls, which hinders remote monitoring and the scheduling of download tasks.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A distributed network downloading method for massive large files, characterized by comprising:
submitting a download address at a control node;
monitoring a task state according to the download address;
performing multithreaded downloading according to the task state to obtain downloaded data; and
judging whether the downloaded data needs to be downloaded again.
2. The method of claim 1, wherein after said monitoring the task state according to the download address, the method further comprises:
acquiring the busy condition according to the task state.
3. The method of claim 1, wherein the task state comprises: a busy condition and an idle condition.
4. The method of claim 1, wherein after said judging whether the downloaded data needs to be downloaded again, the method further comprises:
repeating the method from the step of monitoring the task state according to the download address.
5. A distributed network downloading device for massive large files, characterized by comprising:
a download module, configured to submit a download address at a control node;
a monitoring module, configured to monitor a task state according to the download address;
a multithreading module, configured to perform multithreaded downloading according to the task state to obtain downloaded data; and
a judging module, configured to judge whether the downloaded data needs to be downloaded again.
6. The apparatus of claim 5, wherein the apparatus is further configured to:
acquire the busy condition according to the task state.
7. The apparatus of claim 5, wherein the task state comprises: a busy condition and an idle condition.
8. The apparatus of claim 5, further comprising:
a repeating module, configured to repeat the monitoring of the task state according to the download address.
9. A non-volatile storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the non-volatile storage medium is located to perform the method of any one of claims 1 to 4.
10. An electronic device comprising a processor and a memory; the memory has stored therein computer readable instructions for execution by the processor, wherein the computer readable instructions when executed perform the method of any one of claims 1 to 4.
CN202111109211.5A 2021-06-30 2021-09-22 Distributed network downloading method and device for massive large files Pending CN113840000A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021107381733 2021-06-30
CN202110738173 2021-06-30

Publications (1)

Publication Number Publication Date
CN113840000A true CN113840000A (en) 2021-12-24

Family

ID=78960432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111109211.5A Pending CN113840000A (en) 2021-06-30 2021-09-22 Distributed network downloading method and device for massive large files

Country Status (1)

Country Link
CN (1) CN113840000A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103248695A (en) * 2013-05-07 2013-08-14 北京奇虎科技有限公司 File downloading method and system and server node in CDN
CN105978960A (en) * 2016-05-06 2016-09-28 武汉烽火众智数字技术有限责任公司 Cloud scheduling system and method based on mass video structured processing
CN106412137A (en) * 2016-12-20 2017-02-15 北京并行科技股份有限公司 File downloading system and file downloading method
CN112311897A (en) * 2020-11-17 2021-02-02 腾讯科技(深圳)有限公司 Resource file downloading method, device, equipment and medium
US20210096911A1 (en) * 2020-08-17 2021-04-01 Essence Information Technology Co., Ltd Fine granularity real-time supervision system based on edge computing

Similar Documents

Publication Publication Date Title
CN107567696B (en) Automatic expansion of a group of resource instances within a computing cluster
EP3000040B1 (en) Determining and monitoring performance capabilities of a computer resource service
US8042112B1 (en) Scheduler for search engine crawler
US20180060361A1 (en) Efficient, automated distributed-search methods and systems
CN102880503A (en) Data analysis system and data analysis method
CN107391775A (en) A kind of general web crawlers model implementation method and system
CN105493095A (en) Adaptive and recursive filtering for sample submission
CN106815254A (en) A kind of data processing method and device
CN109271359A (en) Log information processing method, device, electronic equipment and readable storage medium storing program for executing
CN101556586A (en) Method, system and device of automatic data collection
CN107403110A (en) HDFS data desensitization method and device
CN107357885A (en) Method for writing data and device, electronic equipment, computer-readable storage medium
CN113391901A (en) RPA robot management method, device, equipment and storage medium
CN110134646B (en) Knowledge platform service data storage and integration method and system
CN110737814A (en) Crawling method and device for website data, electronic equipment and storage medium
CN109446441A (en) A kind of credible distributed capture storage system of general Web Community
CN111026945B (en) Multi-platform crawler scheduling method, device and storage medium
CN113840000A (en) Distributed network downloading method and device for massive large files
CN112364005A (en) Data synchronization method and device, computer equipment and storage medium
Xiang et al. Optimizing job reliability through contention-free, distributed checkpoint scheduling
CN110968420A (en) Scheduling method and device for multi-crawler platform, storage medium and processor
CN109101636A (en) A kind of method, apparatus and system carrying out data acquisition in cloud by visual configuration
CN105989151A (en) Webpage crawling method and apparatus
CN114647614A (en) System and method for efficient data collection for reporting in large-scale multi-tenant environments based on data access patterns
CN108958906A (en) task processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination