CN113840000A - Distributed network downloading method and device for massive large files - Google Patents
- Publication number
- CN113840000A (application CN202111109211.5A)
- Authority
- CN
- China
- Prior art keywords
- downloading
- task state
- downloaded
- task
- monitoring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5018—Thread allocation
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a distributed network downloading method and device for massive numbers of large files. The method comprises the following steps: submitting a download address at a control node; monitoring the task state according to the download address; performing multithreaded downloading according to the task state to obtain downloaded data; and determining whether the downloaded data needs to be downloaded again. The invention addresses the following problems in the prior art: although a download program can run continuously on a server, it is limited by the network, storage, and other resources of that single server, so the download period is long when a very large number of data files must be downloaded; and commands such as curl and wget do not support multithreaded downloading, so a single large file downloads slowly and inefficiently, and they do not support remote RPC calls, which hinders remote monitoring and scheduling of download tasks.
Description
Technical Field
The invention relates to the field of data downloading, in particular to a distributed network downloading method and device for massive large files.
Background
With the continuous development of intelligent technology, people increasingly use smart devices in daily life, work, and study. These technologies have improved quality of life and made study and work more efficient.
In many scientific research fields, such as atmospheric science, hydrology, oceanography, environmental simulation, and geophysics, data sets are usually published on the internet as files in specific formats for shared access. Common data formats include netCDF, HDF, and GRIB. Such data are usually organized by data type and by time: some providers store one day of data per file, others one month or one year. The files cover several years, several decades, or even hundreds of years and contain different observation indexes, so the number of data files is very large, a single file can reach tens of gigabytes, and the total volume is very large.
To acquire these files, an automated script is generally run on a server, and the required files are fetched continuously with network commands such as curl and wget until all files have been downloaded. Although the download program can run continuously on the server, it is limited by the network, storage, and other resources of that single server, so the download period is long when a large number of data files must be downloaded. Commands such as curl and wget do not support multithreaded downloading, so a single large file downloads slowly and inefficiently; they also do not support remote RPC calls, which hinders remote monitoring and scheduling of download tasks. In addition, for security reasons, websites often throttle or block IP addresses that crawl them frequently or whose download traffic exceeds a threshold. Downloading a large amount of data from a single point can easily get that IP blocked, after which the data can no longer be downloaded.
The invention aims to provide a method that improves the efficiency of downloading massive numbers of large files by combining a distributed network with multithreaded downloading. The total download time is shortened by distributing the many download tasks across a plurality of servers that work cooperatively. The download speed of a single file is improved by multithreaded downloading. With a distributed cluster, a task whose IP has been restricted can be migrated to another node, which avoids task failure caused by the blocking of a single IP.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a distributed network downloading method and device for massive large files, which at least address the following problems in the prior art: although a download program can run continuously on a server, it is limited by the network, storage, and other resources of that single server, so the download period is long when a very large number of data files must be downloaded; and commands such as curl and wget do not support multithreaded downloading, so a single large file downloads slowly and inefficiently, and they do not support remote RPC calls, which hinders remote monitoring and scheduling of download tasks.
According to an aspect of the embodiments of the present invention, a distributed network downloading method for massive large files is provided, including: submitting a download address at a control node; monitoring the task state according to the download address; performing multithreaded downloading according to the task state to obtain downloaded data; and determining whether the downloaded data needs to be downloaded again.
Optionally, after monitoring the task state according to the download address, the method further includes: acquiring the busy condition according to the task state.
Optionally, the task state includes: a busy condition and an idle condition.
Optionally, after determining whether the downloaded data needs to be downloaded again, the method further includes: returning to and repeating the step of monitoring the task state according to the download address.
According to another aspect of the embodiments of the present invention, a distributed network downloading apparatus for massive large files is also provided, including: a download module for submitting a download address at a control node; a monitoring module for monitoring the task state according to the download address; a multithreading module for performing multithreaded downloading according to the task state to obtain downloaded data; and a judging module for determining whether the downloaded data needs to be downloaded again.
According to another aspect of the embodiments of the present invention, a nonvolatile storage medium is further provided. The nonvolatile storage medium includes a stored program, and the program, when running, controls the device in which the nonvolatile storage medium is located to execute the distributed network downloading method for massive large files.
According to another aspect of the embodiments of the present invention, an electronic device is also provided, including a processor and a memory. The memory stores computer readable instructions, and the processor executes the computer readable instructions, which, when executed, perform the distributed network downloading method for massive large files.
In the embodiments of the invention, a download address is submitted at a control node; the task state is monitored according to the download address; multithreaded downloading is performed according to the task state to obtain downloaded data; and it is determined whether the downloaded data needs to be downloaded again. This addresses the prior-art problems that a download program running continuously on a single server is limited by that server's network, storage, and other resources, so the download period is long when a very large number of data files must be downloaded, and that commands such as curl and wget do not support multithreaded downloading, download a single large file slowly and inefficiently, and do not support remote RPC calls, which hinders remote monitoring and scheduling of download tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
Fig. 1 is a flowchart of a distributed network downloading method for massive large files according to an embodiment of the present invention;
Fig. 2 is a block diagram of a distributed network downloading apparatus for massive large files according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, a method embodiment of a distributed network downloading method for massive large files is provided. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system, for example as a set of computer executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
Example one
Fig. 1 is a flowchart of a distributed network downloading method for massive large files according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: submitting a download address at the control node.
Step S104: monitoring the task state according to the download address.
Step S106: performing multithreaded downloading according to the task state to obtain downloaded data.
Step S108: determining whether the downloaded data needs to be downloaded again.
Optionally, after monitoring the task state according to the download address, the method further includes: acquiring the busy condition according to the task state.
Optionally, the task state includes: a busy condition and an idle condition.
Optionally, after determining whether the downloaded data needs to be downloaded again, the method further includes: returning to and repeating the step of monitoring the task state according to the download address.
Specifically, the embodiment of the present invention is implemented by the following steps.
In this technical scheme, servers are divided into three types according to the kind of task they execute: control nodes, monitoring nodes, and download nodes.
The control node is responsible for parsing all download links contained in the original link and adding the resulting tasks to the task queue.
The monitoring node is responsible for monitoring the busy state, download progress, task state, and similar information of each download node, selecting a relatively idle machine, and distributing the tasks in the task queue.
The download node is responsible for performing the actual download tasks. Each download node needs to run the aria2 program. aria2 is a download program that supports multithreaded downloading, resuming interrupted downloads, and downloading a file from multiple sources or protocols to speed up the download. The RPC interface built into aria2 makes it convenient to check the progress and status of a task.
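As a rough illustration of why segmented (multithreaded) downloading speeds up a single large file, the sketch below splits a file of known size into contiguous byte ranges that separate connections could each request with an HTTP Range header. It shows the general idea only and is not aria2's internal algorithm; `split_ranges` and `range_header` are hypothetical helper names.

```python
from typing import List, Tuple


def split_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split bytes [0, total_size) into up to `parts` inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        return []
    parts = min(parts, total_size)          # never create empty segments
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` segments take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

Each worker thread would fetch one range and write it at the matching offset in the output file; in the scheme described here that work is delegated to aria2.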
Step one: a download address is submitted at the control node; for example, the address may be the root directory of a file server. The control node obtains the download addresses of all files through web page parsing and recursive calls, and puts them into the task queue.
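The recursive link collection performed by the control node could look roughly like the sketch below. It assumes the file server exposes plain HTML directory listings in which subdirectory links end with `/`; `collect_file_urls` and the injected `fetch_html` callable are hypothetical names, and the page fetcher is passed in so the traversal logic can be exercised without network access.

```python
from html.parser import HTMLParser
from typing import Callable, List, Set
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_file_urls(root_url: str, fetch_html: Callable[[str], str]) -> List[str]:
    """Recursively walk a directory listing and return the URLs of all files."""
    files: List[str] = []
    seen: Set[str] = set()

    def visit(url: str) -> None:
        if url in seen:
            return
        seen.add(url)
        parser = LinkParser()
        parser.feed(fetch_html(url))
        for href in parser.hrefs:
            if href.startswith(("?", "#")):
                continue                     # sort links and anchors, not files
            target = urljoin(url, href)
            if not target.startswith(root_url):
                continue                     # parent-directory or off-site link
            if target.endswith("/"):
                visit(target)                # subdirectory: recurse
            else:
                files.append(target)         # file: becomes a download task

    visit(root_url)
    return files
```

The returned list can be appended to the task queue in order, for example with `collections.deque(collect_file_urls(root, fetch))`.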
Step two: the monitoring node periodically collects and monitors the execution state and progress of the tasks in the task queue and the busy level of the download nodes.
Step three: when the monitoring node detects that a new task has been added to the queue, it obtains the busy state of each download node and preferentially assigns the new task to an idle node. The queue follows the first-in-first-out principle. When all nodes are busy, the new task waits in the queue.
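A minimal sketch of this first-in-first-out allocation follows. It assumes that "busy" means a node already runs `max_per_node` tasks and that the least-loaded node counts as the most idle one; both the threshold and the `assign_tasks` name are illustrative assumptions, not details given in the text.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def assign_tasks(
    queue: Deque[str], node_load: Dict[str, int], max_per_node: int
) -> List[Tuple[str, str]]:
    """Pop tasks in FIFO order and give each one to the least-loaded node.

    Stops as soon as every node is busy; remaining tasks stay in the queue.
    """
    assignments: List[Tuple[str, str]] = []
    while queue and node_load:
        node = min(node_load, key=node_load.get)   # most idle node
        if node_load[node] >= max_per_node:
            break                                   # all nodes are busy
        task = queue.popleft()                      # first in, first out
        node_load[node] += 1
        assignments.append((task, node))
    return assignments
```

Calling the function again on the next monitoring tick, after node loads have been refreshed, picks up the tasks that were left waiting.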
Step four: after receiving a task, the download node hands it to aria2 for multithreaded downloading.
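One way a download node might hand a task to aria2 is through aria2's JSON-RPC method `aria2.addUri`. The sketch below only builds the request body; it assumes aria2 was started with RPC enabled and would be reached at its JSON-RPC endpoint (by default `http://<host>:6800/jsonrpc`). The helper name, the connection count, and the download directory are illustrative assumptions.

```python
import json
from typing import List, Optional


def build_add_uri_request(
    uris: List[str],
    download_dir: str,
    connections: int = 8,
    secret: Optional[str] = None,
    request_id: str = "1",
) -> str:
    """Build the JSON-RPC body for aria2.addUri (option values are strings)."""
    options = {
        "dir": download_dir,
        "split": str(connections),
        # aria2 accepts at most 16 connections per server.
        "max-connection-per-server": str(min(connections, 16)),
        "continue": "true",      # resume a partially downloaded file
    }
    params: list = []
    if secret:
        params.append(f"token:{secret}")   # RPC secret token, if configured
    params.append(uris)                     # URIs pointing at the same file
    params.append(options)
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": "aria2.addUri", "params": params}
    )
```

Posting this body to the endpoint returns the GID of the new download, which the monitoring node can later use to query progress.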
Step five: the monitoring node communicates with the download nodes through remote procedure calls (RPC), periodically updates the download progress and download state of the data, and stores the node and task state information in ZooKeeper. When a download completes, the task is removed from the queue.
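The periodic status update could be sketched as follows, assuming the monitoring node has already called aria2's `aria2.tellStatus` and received its result struct, whose numeric fields arrive as strings. A plain dictionary stands in for ZooKeeper here, and the `/download/tasks/...` path and the function names are hypothetical.

```python
import json
from collections import deque
from typing import Deque, Dict


def summarize_status(result: Dict[str, str]) -> dict:
    """Turn an aria2.tellStatus result into a compact progress record."""
    total = int(result.get("totalLength", "0"))
    done = int(result.get("completedLength", "0"))
    percent = round(100.0 * done / total, 1) if total > 0 else 0.0
    state = result.get("status")          # e.g. active, waiting, error, complete
    return {
        "gid": result.get("gid"),
        "state": state,
        "completed_bytes": done,
        "total_bytes": total,
        "percent": percent,
        "finished": state == "complete",
    }


def record_status(
    store: Dict[str, str], queue: Deque[str], task_id: str, node: str, result: Dict[str, str]
) -> dict:
    """Save node and task state, and drop the task from the queue when finished."""
    summary = summarize_status(result)
    # `store` is a stand-in for a ZooKeeper znode write (assumption).
    store[f"/download/tasks/{task_id}"] = json.dumps({"node": node, **summary})
    if summary["finished"] and task_id in queue:
        queue.remove(task_id)
    return summary
```

In a real deployment the dictionary write would be replaced by a write through a ZooKeeper client.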
Step six: if a download fails, the current node retries it. When the retry limit is exceeded, the monitoring node marks the task state as failed, records the current download status together with the size and position downloaded so far, and then moves on to the next task.
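The retry behaviour can be sketched as a bounded loop. The callable `attempt_download`, which takes the current byte offset and returns a success flag and the new offset, is a hypothetical stand-in for the real download call, and the default of three retries is an assumption; the text does not state the limit.

```python
from typing import Callable, Tuple


def run_with_retries(
    task_id: str,
    attempt_download: Callable[[int], Tuple[bool, int]],
    max_retries: int = 3,
) -> dict:
    """Try a download once, then retry up to `max_retries` times.

    Returns a record with the final state and the byte offset reached, so a
    failed task can later be resumed from that position.
    """
    offset = 0
    for attempt in range(1, max_retries + 2):   # first try + retries
        ok, offset = attempt_download(offset)
        if ok:
            return {"task": task_id, "state": "complete", "attempts": attempt, "offset": offset}
    # Retry limit exceeded: mark as failed and keep the position reached.
    return {"task": task_id, "state": "failed", "attempts": max_retries + 1, "offset": offset}
```

The scheduler can store the returned record and proceed to the next queued task regardless of the outcome.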
Step seven: if the monitoring node detects that the IP of the current node has been banned by the target website, it assigns the failed task to another host node with a different IP. That node reads the already-downloaded content from shared storage and continues downloading from the last download position, which avoids the resource waste of downloading the same data again.
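A simplified sketch of this migration is shown below. It assumes the partial file in shared storage was written sequentially, so that its size equals the resume position, and it expresses the resume request as an HTTP Range header. When aria2 performs a segmented download it tracks progress in its own control file instead, so this is an illustration of the idea rather than of aria2's mechanism. All function names and the task record layout are hypothetical.

```python
import os
from typing import Dict, Optional, Set


def resume_offset(partial_path: str) -> int:
    """Bytes already present in shared storage (0 if nothing was saved)."""
    return os.path.getsize(partial_path) if os.path.exists(partial_path) else 0


def pick_replacement_node(nodes: Dict[str, str], banned_ips: Set[str]) -> Optional[str]:
    """Return the first node whose IP has not been banned, or None."""
    for name, ip in nodes.items():
        if ip not in banned_ips:
            return name
    return None


def plan_migration(
    task: Dict[str, str], nodes: Dict[str, str], banned_ips: Set[str], shared_dir: str
) -> Optional[dict]:
    """Choose a node with a different IP and compute where to resume."""
    node = pick_replacement_node(nodes, banned_ips)
    if node is None:
        return None                                  # no usable node right now
    offset = resume_offset(os.path.join(shared_dir, task["filename"]))
    # Open-ended range: "from `offset` to the end of the file".
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    return {"task": task["id"], "node": node, "offset": offset, "headers": headers}
```

If no node with an unbanned IP is available, the task would simply remain marked as failed until one becomes free.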
Step nine: steps two through eight are repeated until all tasks have been executed.
This embodiment addresses the prior-art problems that a download program running continuously on a single server is limited by that server's network, storage, and other resources, so the download period is long when a large number of data files must be downloaded, and that commands such as curl and wget do not support multithreaded downloading, download a single large file slowly and inefficiently, and do not support remote RPC calls, which hinders remote monitoring and scheduling of download tasks.
Example two
Fig. 2 is a block diagram of a distributed network downloading apparatus for massive large files according to an embodiment of the present invention. As shown in Fig. 2, the apparatus includes:
A download module 20 for submitting a download address at the control node.
A monitoring module 22 for monitoring the task state according to the download address.
A multithreading module 24 for performing multithreaded downloading according to the task state to obtain downloaded data.
A judging module 26 for determining whether the downloaded data needs to be downloaded again.
Optionally, after the task state is monitored according to the download address, the apparatus further acquires the busy condition according to the task state.
Optionally, the task state includes: a busy condition and an idle condition.
Optionally, after it is determined whether the downloaded data needs to be downloaded again, the apparatus returns to and repeats the monitoring of the task state according to the download address.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A distributed network downloading method for massive large files, characterized by comprising the following steps:
submitting a download address at a control node;
monitoring the task state according to the download address;
performing multithreaded downloading according to the task state to obtain downloaded data;
and determining whether the downloaded data needs to be downloaded again.
2. The method of claim 1, wherein after said monitoring the task state according to said download address, said method further comprises:
acquiring the busy condition according to the task state.
3. The method of claim 1, wherein the task state comprises: a busy condition and an idle condition.
4. The method of claim 1, wherein after said determining whether said downloaded data needs to be downloaded again, said method further comprises:
returning to and repeating the step of monitoring the task state according to the download address.
5. A distributed network downloading device for massive large files, characterized by comprising:
a download module for submitting a download address at a control node;
a monitoring module for monitoring the task state according to the download address;
a multithreading module for performing multithreaded downloading according to the task state to obtain downloaded data;
and a judging module for determining whether the downloaded data needs to be downloaded again.
6. The apparatus of claim 5, wherein the apparatus is further configured to:
acquire the busy condition according to the task state.
7. The apparatus of claim 5, wherein the task state comprises: a busy condition and an idle condition.
8. The apparatus of claim 5, further comprising:
a repeating module for repeating the monitoring of the task state according to the download address.
9. A non-volatile storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the non-volatile storage medium is located to perform the method of any one of claims 1 to 4.
10. An electronic device comprising a processor and a memory; the memory has stored therein computer readable instructions for execution by the processor, wherein the computer readable instructions when executed perform the method of any one of claims 1 to 4.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2021107381733 | 2021-06-30 | ||
CN202110738173 | 2021-06-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113840000A (en) | 2021-12-24 |
Family
ID=78960432
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111109211.5A Pending CN113840000A (en) | 2021-06-30 | 2021-09-22 | Distributed network downloading method and device for massive large files |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113840000A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103248695A (en) * | 2013-05-07 | 2013-08-14 | 北京奇虎科技有限公司 | File downloading method and system and server node in CDN |
CN105978960A (en) * | 2016-05-06 | 2016-09-28 | 武汉烽火众智数字技术有限责任公司 | Cloud scheduling system and method based on mass video structured processing |
CN106412137A (en) * | 2016-12-20 | 2017-02-15 | 北京并行科技股份有限公司 | File downloading system and file downloading method |
CN112311897A (en) * | 2020-11-17 | 2021-02-02 | 腾讯科技(深圳)有限公司 | Resource file downloading method, device, equipment and medium |
US20210096911A1 (en) * | 2020-08-17 | 2021-04-01 | Essence Information Technology Co., Ltd | Fine granularity real-time supervision system based on edge computing |
-
2021
- 2021-09-22 CN CN202111109211.5A patent/CN113840000A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107567696B (en) | Automatic expansion of a group of resource instances within a computing cluster | |
EP3000040B1 (en) | Determining and monitoring performance capabilities of a computer resource service | |
US8042112B1 (en) | Scheduler for search engine crawler | |
US20180060361A1 (en) | Efficient, automated distributed-search methods and systems | |
CN102880503A (en) | Data analysis system and data analysis method | |
CN107391775A (en) | A kind of general web crawlers model implementation method and system | |
CN105493095A (en) | Adaptive and recursive filtering for sample submission | |
CN106815254A (en) | A kind of data processing method and device | |
CN109271359A (en) | Log information processing method, device, electronic equipment and readable storage medium storing program for executing | |
CN101556586A (en) | Method, system and device of automatic data collection | |
CN107403110A (en) | HDFS data desensitization method and device | |
CN107357885A (en) | Method for writing data and device, electronic equipment, computer-readable storage medium | |
CN113391901A (en) | RPA robot management method, device, equipment and storage medium | |
CN110134646B (en) | Knowledge platform service data storage and integration method and system | |
CN110737814A (en) | Crawling method and device for website data, electronic equipment and storage medium | |
CN109446441A (en) | A kind of credible distributed capture storage system of general Web Community | |
CN111026945B (en) | Multi-platform crawler scheduling method, device and storage medium | |
CN113840000A (en) | Distributed network downloading method and device for massive large files | |
CN112364005A (en) | Data synchronization method and device, computer equipment and storage medium | |
Xiang et al. | Optimizing job reliability through contention-free, distributed checkpoint scheduling | |
CN110968420A (en) | Scheduling method and device for multi-crawler platform, storage medium and processor | |
CN109101636A (en) | A kind of method, apparatus and system carrying out data acquisition in cloud by visual configuration | |
CN105989151A (en) | Webpage crawling method and apparatus | |
CN114647614A (en) | System and method for efficient data collection for reporting in large-scale multi-tenant environments based on data access patterns | |
CN108958906A (en) | task processing method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||