CN112434205A - Data integration capturing method and system based on data site and computer equipment - Google Patents

Info

Publication number
CN112434205A
Authority
CN
China
Prior art keywords
data
task
capturing
site
capture
Legal status
Pending
Application number
CN202011369702.9A
Other languages
Chinese (zh)
Inventor
候彩云
Current Assignee
Beijing Second Hand Artificial Intelligence Technology Co ltd
Original Assignee
Beijing Second Hand Artificial Intelligence Technology Co ltd
Priority date
2020-11-30
Filing date
2020-11-30
Publication date
2021-03-02
Application filed by Beijing Second Hand Artificial Intelligence Technology Co ltd
Priority to CN202011369702.9A
Publication of CN112434205A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a data-site-based data integration capture method, system, and computer device. The method comprises: a task creation step, in which a data capture task is created according to a user's requirements; a task scheduling step, in which the data capture task is invoked, its task parameters are parsed, and a target data site and data capture information are determined; and a data capture step, in which target raw data are captured based on the target data site and the data capture information. The method requires no manual support, reduces labor cost, achieves one-time data capture based on user requirements and data sites, and improves the accuracy of data capture.

Description

Data integration capturing method and system based on data site and computer equipment
Technical Field
The invention relates to the technical field of communications, and in particular to a data-site-based data integration capture method and system, and a computer device.
Background
With the rapid development of networks and the growth of information, network data capture services have become an essential part of enterprise operations because they are very useful for obtaining accurate, relevant information. Data capture tools can extract useful information such as customer preferences, preferred locations, and competitors' policies.
Currently, the prior art meets users' data capture requirements in one of two ways: first, by manually searching website pages and capturing the relevant data; second, by using scattered, independent crawlers that each access the pages of individual websites to capture the relevant data.
However, capturing the relevant data manually or with scattered crawlers incurs high labor cost, makes it difficult to capture data in one pass according to user requirements, and yields low capture accuracy because the number of accesses a website permits is limited.
Disclosure of Invention
To solve the technical problems of high labor cost and low accuracy in prior-art data capture, the invention provides a data-site-based data integration capture method that requires no manual support, reduces labor cost, achieves one-time data capture based on user requirements and the data site, and improves data capture accuracy.
The invention provides a data-site-based data integration capture method comprising the following steps:
a task creation step, in which a data capture task is created according to the user's requirements;
a task scheduling step, in which the data capture task is invoked, the task parameters in the data capture task are parsed, and a target data site and data capture information are determined; and
a data capture step, in which target raw data are captured based on the target data site and the data capture information.
In the above data-site-based data integration capture method, the task creation step includes:
a primary task creation step, in which a primary data capture task is created according to the user's requirements; and
a secondary task creation step, in which the primary data capture task is decomposed and a plurality of secondary data capture tasks corresponding to it are created.
In the above data-site-based data integration capture method, the task creation step further includes:
a filtering step, in which, while a secondary data capture task is being created, the secondary data capture task to be created is filtered out if the existing secondary data capture tasks already include it.
The above data-site-based data integration capture method further comprises:
a data storage step, in which the data capture task, the target data site, the data capture information, and the target raw data are stored in a database, and the parsed target raw data are also stored in the database.
The above data-site-based data integration capture method further comprises:
a task judgment step, in which, after the task parameters are parsed, it is judged whether a secondary data capture task needs to be created; if so, the filtering step is executed; otherwise, the task scheduling step continues.
The above data-site-based data integration capture method further comprises:
an information matching step, in which, after the data capture information is determined, it is matched against the database; if the match succeeds, the target raw data corresponding to the data capture information are obtained from the database; if the match fails, the data capture step is executed.
The above data-site-based data integration capture method further comprises:
a statistical analysis step, in which the parsed target raw data are statistically analyzed to obtain a data capture result.
The above data-site-based data integration capture method further comprises:
a result display step, in which the data capture result is displayed.
The invention also provides a system for implementing the data-site-based data integration capture method, comprising:
a task creation unit, configured to create a data capture task according to the user's requirements;
a task scheduling unit, configured to schedule the data capture task, parse the task parameters in it, and determine a target data site and data capture information; and
a data capture unit, configured to capture target raw data based on the target data site and the data capture information.
The invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the data-site-based data integration capture method.
The technical effects or advantages of the invention are as follows:
the invention provides a data-site-based data integration capture method comprising a task creation step, a task scheduling step, and a data capture step. In the task creation step, a data capture task is created according to user requirements; in the task scheduling step, the data capture task is invoked, its task parameters are parsed, and a target data site and data capture information are determined; in the data capture step, target raw data are captured based on the target data site and the data capture information. In this way, no manual support is needed, labor cost is reduced, one-time data capture based on user requirements and data sites is achieved, and data capture accuracy is improved.
Drawings
Fig. 1 is a flowchart of a data-site-based data integration capture method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a system for implementing the data-site-based data integration capture method according to an embodiment of the present invention;
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
In the above figures:
10. bus; 11. processor; 12. memory; 13. communication interface.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict. Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. 
The terms "including," "comprising," "having," and any variations thereof used in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
The term "plurality" as used herein means two or more. "And/or" describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the preceding and following objects. The terms "first," "second," "third," and the like are used herein only to distinguish similar objects and do not denote a particular order.
The technical solution of the present invention will be described in detail below with reference to the specific embodiments and the accompanying drawings.
This embodiment provides a data-site-based data integration capture method comprising the following steps:
a task creation step, in which a data capture task is created according to the user's requirements;
a task scheduling step, in which the data capture task is invoked, the task parameters in it are parsed, and a target data site and data capture information are determined; and
a data capture step, in which target raw data are captured based on the target data site and the data capture information.
The data-site-based data integration capture method thus requires no manual support, reduces labor cost, achieves one-time data capture based on user requirements and the data site, and improves data capture accuracy.
Specifically, referring to Fig. 1, which is a flowchart of the data-site-based data integration capture method according to an embodiment of the present invention, the method comprises the following steps:
A task creation step S1, in which a data capture task is created according to the user's requirements.
In this embodiment, the task creation step S1 specifically includes:
a primary task creation step S11, in which a primary data capture task is created according to the user's requirements; and
a secondary task creation step S12, in which the primary data capture task is decomposed and a plurality of secondary data capture tasks corresponding to it are created.
The task creation step S1 further includes:
a filtering step S13, in which, while a secondary data capture task is being created, the secondary data capture task to be created is filtered out if the existing secondary data capture tasks already include it.
In other words, in this embodiment a secondary data capture task that is about to be created is skipped when an identical secondary task already exists, and is created only when no existing secondary task includes it.
In a specific application, suppose a user needs data related to a brand from a certain data site, including live-stream information, the corresponding streamer (anchor) information, brand sales information, sales trends of the brand's commodities, live-stream comments, and live-stream trends. Analyzing the user's requirements shows that the brand is the common demand, so a primary data capture task is created around the brand. The primary task records whether live-stream information, streamer information, commodity information, comment information, and so on are needed. The primary task is then decomposed into a plurality of corresponding secondary data capture tasks, covering live-stream information, streamer information, commodity information, comment information, and so on.
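The decomposition and filtering of steps S11 to S13 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this application: the class names, the `category` field, and the per-category flags are hypothetical, and an in-memory set stands in for the store of existing secondary data capture tasks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecondaryTask:
    """A secondary capture task (hypothetical fields): one brand, one data category."""
    brand: str
    category: str  # e.g. "live", "anchor", "commodity", "comment"


@dataclass
class PrimaryTask:
    """A primary task built around the common demand (the brand), with a
    hypothetical flag per data category saying whether it is needed."""
    brand: str
    wanted: dict = field(default_factory=dict)  # category -> bool


def decompose(primary: PrimaryTask, existing: set) -> list:
    """S12: split the primary task into secondary tasks.
    S13: skip any secondary task already present in `existing`."""
    created = []
    for category, needed in primary.wanted.items():
        if not needed:
            continue
        task = SecondaryTask(primary.brand, category)
        if task in existing:  # filtering step: an identical task already exists
            continue
        existing.add(task)
        created.append(task)
    return created


existing = {SecondaryTask("BrandX", "live")}
primary = PrimaryTask("BrandX", {"live": True, "anchor": True, "commodity": False, "comment": True})
print([t.category for t in decompose(primary, existing)])  # ['anchor', 'comment']
```

Because `SecondaryTask` is a frozen dataclass it is hashable, so the duplicate check in S13 is a constant-time set lookup rather than a scan of all existing tasks.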
A task scheduling step S2, in which the data capture task is invoked, its task parameters are parsed, and a target data site and data capture information are determined.
In a specific application, after one of the secondary data capture tasks is invoked, its task parameters are parsed to determine the target data site and the data capture information, which includes the brand, live-stream details, streamer details, and the like.
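The application does not specify how task parameters are encoded. As a minimal sketch only, the following assumes they are stored as a JSON object in which a hypothetical `site` key names the target data site and every other key is treated as data capture information.

```python
import json
from typing import NamedTuple


class ScheduledTask(NamedTuple):
    site: str           # target data site
    capture_info: dict  # data capture information (brand, detail types, ...)


def schedule(task_params: str) -> ScheduledTask:
    """S2: parse a secondary task's parameters (assumed JSON) and determine
    the target data site and the data capture information."""
    params = json.loads(task_params)
    if "site" not in params:  # hypothetical key naming the target data site
        raise ValueError("task parameters do not name a target data site")
    site = params.pop("site")
    return ScheduledTask(site=site, capture_info=params)


task = schedule('{"site": "https://example.com", "brand": "BrandX", "details": ["live", "anchor"]}')
print(task.site, task.capture_info)
```

Raising an error when no site is named keeps a malformed task from reaching the capture step, where it would otherwise waste an access to the data site.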
A data capture step S3, in which target raw data are captured based on the target data site and the data capture information.
In this embodiment, a crawler captures the target raw data: once the target data site and the data capture information are determined, a request URL is built from them and an HTTP request is sent to the target data site to capture the target raw data.
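The request step can be sketched with Python's standard `urllib` modules. This is an illustration under assumptions, not the crawler of this embodiment: the path `/api/live`, the query-parameter names, the `User-Agent` string, and the site `https://example.com` are placeholders, and a real data site defines its own interface.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def build_request_url(site: str, path: str, capture_info: dict) -> str:
    """Build the request URL from the target data site and the data capture
    information (path and parameter names are hypothetical)."""
    return urljoin(site, path) + "?" + urlencode(capture_info)


def capture(url: str, timeout: float = 10.0, opener=urlopen) -> bytes:
    """S3: send an HTTP GET request to the target data site and return the raw
    response body (the target raw data). `opener` is injectable for testing."""
    request = Request(url, headers={"User-Agent": "capture-sketch/0.1"})
    with opener(request, timeout=timeout) as response:
        return response.read()


url = build_request_url("https://example.com", "/api/live", {"brand": "BrandX", "page": 1})
print(url)  # https://example.com/api/live?brand=BrandX&page=1
```

Separating URL construction from the request keeps the network call in one place, which is where access counting or rate limiting would naturally be added.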
A data storage step S4, in which the data capture task, the target data site, the data capture information, and the target raw data are stored in a database, and the parsed target raw data are also stored in the database.
In this embodiment, the primary data capture task, the secondary data capture tasks, the target data site, the data capture information, and the target raw data are all stored in a MySQL database, as are the parsed target raw data.
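A minimal storage sketch follows. The embodiment uses MySQL; here Python's built-in `sqlite3` is swapped in so the example runs without a server, and with a MySQL driver the schema types and `?` placeholders would need adjusting. The table and column names are hypothetical.

```python
import json
import sqlite3

# Hypothetical schema; sqlite3 stands in for the MySQL database of the embodiment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS capture_task (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER,            -- NULL for a primary task, else its primary task's id
    site TEXT NOT NULL,           -- target data site
    capture_info TEXT NOT NULL    -- data capture information, JSON-encoded
);
CREATE TABLE IF NOT EXISTS raw_data (
    task_id INTEGER NOT NULL REFERENCES capture_task(id),
    body BLOB NOT NULL,           -- target raw data as captured
    parsed TEXT                   -- parsed target raw data, JSON-encoded
);
"""


def store(conn, site, capture_info, body, parent_id=None, parsed=None) -> int:
    """S4: persist a capture task with its target site, capture information,
    raw data and (optionally) parsed data; returns the new task id."""
    cur = conn.execute(
        "INSERT INTO capture_task (parent_id, site, capture_info) VALUES (?, ?, ?)",
        # sort_keys gives equal capture information an identical string form
        (parent_id, site, json.dumps(capture_info, sort_keys=True)),
    )
    task_id = cur.lastrowid
    conn.execute(
        "INSERT INTO raw_data (task_id, body, parsed) VALUES (?, ?, ?)",
        (task_id, body, None if parsed is None else json.dumps(parsed)),
    )
    conn.commit()
    return task_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
task_id = store(conn, "https://example.com", {"brand": "BrandX"}, b"<html></html>", parsed={"sales": 3})
```

Serializing the capture information with sorted keys means that later lookups, such as the information matching step, can compare stored and new capture information by simple string equality.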
A statistical analysis step S5, in which the parsed target raw data are statistically analyzed to obtain a data capture result.
A result display step S6, in which the data capture result is displayed.
In this embodiment, the data capture result is presented to the user.
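The application does not specify which statistics are computed or how the result is displayed. Purely as an illustration, the following sketch assumes parsed records carrying hypothetical `commodity` and `sales` fields, totals sales per commodity (step S5), and renders the result as tab-separated text (step S6).

```python
from collections import Counter


def analyze(parsed_records):
    """S5: aggregate parsed records into a capture result; here, total sales
    per commodity (the record fields are hypothetical)."""
    totals = Counter()
    for record in parsed_records:
        totals[record["commodity"]] += record["sales"]
    return totals


def display(result):
    """S6: render the capture result as plain-text lines, largest total first."""
    return "\n".join(f"{name}\t{total}" for name, total in result.most_common())


records = [
    {"commodity": "A", "sales": 5},
    {"commodity": "B", "sales": 7},
    {"commodity": "A", "sales": 4},
]
print(display(analyze(records)))
```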
To reduce the number of accesses to the data site, the data-site-based data integration capture method provided in this embodiment further includes the following steps:
A task judgment step S7, in which, after the task parameters are parsed, it is judged whether a secondary data capture task needs to be created; if so, the filtering step is executed; otherwise, the task scheduling step continues.
In a specific application, parsing a secondary data capture task may yield a new secondary data capture task, which must then be written back into the set of secondary data capture tasks. To reduce the number of accesses to the data site, it is judged whether this secondary task needs to be created: if it does, the filtering step S13 is executed; if not, the target data site and the data capture information are determined from the parsed task parameters.
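The interplay of the task judgment step S7 and the filtering step S13 can be sketched as a first-in, first-out work queue. This is a minimal illustration under assumptions: `parse` is a hypothetical callback returning the task's target site and capture information together with any newly discovered secondary tasks, and plain strings stand in for tasks.

```python
from collections import deque


def run_scheduler(initial_tasks, parse):
    """Process tasks first-in, first-out. `parse(task)` is a hypothetical callback
    returning (site_and_info, new_tasks), where new_tasks lists any secondary
    tasks discovered while parsing (empty if none need to be created)."""
    initial_tasks = list(initial_tasks)
    existing = set(initial_tasks)
    queue = deque(initial_tasks)
    scheduled = []
    while queue:
        task = queue.popleft()
        site_and_info, new_tasks = parse(task)
        # S7: new secondary tasks go through the filtering step S13,
        # which drops any task that already exists.
        for new_task in new_tasks:
            if new_task not in existing:
                existing.add(new_task)
                queue.append(new_task)
        # Scheduling continues with this task's site and capture information.
        scheduled.append(site_and_info)
    return scheduled


def demo_parse(task):
    # Hypothetical: parsing the live list reveals one detail task per live stream.
    if task == "live-list":
        return ("https://example.com", task), ["live-1", "live-2", "live-1"]
    return ("https://example.com", task), []


print(run_scheduler(["live-list"], demo_parse))
```

The duplicate `live-1` is discovered twice but scheduled once, which is exactly the saving in data-site accesses that step S7 aims for.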
An information matching step S8, in which, after the data capture information is determined, it is matched against the database; if the match succeeds, the target raw data corresponding to the data capture information are obtained from the database; if the match fails, the data capture step is executed.
In a specific application, once the data capture information is determined, capturing data according to it and the target data site sometimes requires key information that can be obtained only through additional requests to the data site. For example, capturing live-stream list information requires a signature corresponding to a user ID. Requesting that signature from the data site every time would waste accesses: the number of accesses to an interface is limited, and no further access is possible once the limit is exceeded. Performing the information matching step before capturing live-stream list information therefore greatly reduces the number of daily user-information accesses, so that larger demand can be met.
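The signature example amounts to a look-aside cache in front of a rate-limited interface. A minimal sketch, assuming a dict in place of the database; `fetch_signature`, `daily_limit`, `reset_day`, and the choice to raise an error once the limit is reached are all hypothetical.

```python
class SignatureCache:
    """Information matching (S8): look up key information (here, a per-user-ID
    signature) in the local store before requesting it from the data site."""

    def __init__(self, fetch_signature, daily_limit):
        self._fetch = fetch_signature  # hypothetical callable(user_id) -> signature
        self._limit = daily_limit      # hypothetical per-day access budget
        self._store = {}               # stands in for the database table
        self.remote_calls = 0

    def get(self, user_id):
        if user_id in self._store:     # match succeeded: reuse stored data
            return self._store[user_id]
        if self.remote_calls >= self._limit:
            raise RuntimeError("interface access limit reached")
        self.remote_calls += 1         # match failed: request from the data site
        signature = self._fetch(user_id)
        self._store[user_id] = signature
        return signature

    def reset_day(self):
        """Start a new day's access budget; stored signatures are kept."""
        self.remote_calls = 0


cache = SignatureCache(lambda user_id: f"sig-{user_id}", daily_limit=2)
cache.get("u1")
cache.get("u1")
print(cache.remote_calls)  # 1
```

Repeated requests for the same user ID cost one interface access in total, so the access budget is spent only on user IDs not yet seen.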
The data-site-based data integration capture method thus requires no manual support, reduces labor cost, achieves one-time data capture based on user requirements and the data site, and improves data capture accuracy.
Referring to Fig. 2, this embodiment further provides a system for implementing the data-site-based data integration capture method, comprising:
A task creation unit, configured to create a data capture task according to the user's requirements.
Specifically, the task creation unit creates a primary data capture task according to the user's requirements, decomposes it, and creates a plurality of secondary data capture tasks corresponding to it. While a secondary data capture task is being created, it is filtered out if the existing secondary data capture tasks already include it, which reduces the number of accesses to the data site.
A task scheduling unit, configured to schedule the data capture task, parse its task parameters, and determine a target data site and data capture information.
To reduce the number of accesses to the data site, after the task parameters are parsed it is judged whether a secondary data capture task needs to be created; if so, the task creation unit is executed; otherwise, the task scheduling unit continues.
A data capture unit, configured to capture target raw data based on the target data site and the data capture information.
In this embodiment, a crawler captures the target raw data. To reduce the number of accesses to the data site, the data capture task, the target data site, the data capture information, and the target raw data are stored in the database, together with the parsed target raw data. When the data capture information has been determined and data need to be captured according to it and the target data site, the data capture information is first matched against the database: if the match succeeds, the corresponding target raw data are obtained from the database; if it fails, the data capture unit is executed.
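How the three units cooperate can be sketched as three small classes wired in sequence. This is an illustrative sketch only: tasks are modeled as hypothetical `(brand, category)` tuples, the `fetch` callable stands in for the crawler, and a dict stands in for the database.

```python
class TaskCreationUnit:
    """Creates secondary tasks (brand, category) and filters out existing ones."""

    def __init__(self):
        self.existing = set()

    def create(self, brand, categories):
        created = []
        for category in categories:
            task = (brand, category)
            if task not in self.existing:  # filter already-existing secondary tasks
                self.existing.add(task)
                created.append(task)
        return created


class TaskSchedulingUnit:
    """Maps a secondary task to a target data site and data capture information."""

    def __init__(self, site):
        self.site = site

    def schedule(self, task):
        brand, category = task
        return self.site, {"brand": brand, "category": category}


class DataCaptureUnit:
    """Matches capture information against the database before capturing."""

    def __init__(self, fetch, database):
        self.fetch = fetch        # hypothetical callable(site, info) -> raw data
        self.database = database  # dict standing in for the database

    def capture(self, site, info):
        key = (site, tuple(sorted(info.items())))
        if key in self.database:  # match succeeded: reuse stored raw data
            return self.database[key]
        data = self.fetch(site, info)  # match failed: capture from the site
        self.database[key] = data
        return data


def run_system(brand, categories, creator, scheduler, capturer):
    """Wire the three units: create, then schedule, then match or capture."""
    return [capturer.capture(*scheduler.schedule(t)) for t in creator.create(brand, categories)]


capturer = DataCaptureUnit(lambda site, info: f"{info['category']}-data", {})
print(run_system("BrandX", ["live", "anchor"], TaskCreationUnit(),
                 TaskSchedulingUnit("https://example.com"), capturer))
```

The two savings are independent: the creation unit avoids re-creating a task, while the capture unit avoids re-requesting data even when a task is legitimately created again.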
The data-site-based data integration capture method thus requires no manual support, reduces labor cost, achieves one-time data capture based on user requirements and the data site, and improves data capture accuracy.
Referring to Fig. 3, this embodiment further provides a computer device comprising a memory 12, a processor 11, and a computer program stored in the memory 12 and executable on the processor 11; when executing the computer program, the processor 11 implements the data-site-based data integration capture method.
The device may comprise the processor 11 and the memory 12 storing computer program instructions. Specifically, the processor 11 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 12 may include mass storage for data or instructions. By way of example, and not limitation, the memory 12 may include a hard disk drive (HDD), a floppy disk drive, a solid-state drive (SSD), flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 12 may include removable or non-removable (fixed) media, and may be internal or external to the data processing apparatus. In a particular embodiment, the memory 12 is non-volatile memory. In particular embodiments, the memory 12 includes read-only memory (ROM) and random-access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these. The RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data output DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
The memory 12 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions executed by the processor 11.
The processor 11 reads and executes the computer program instructions stored in the memory 12 to implement any one of the above-mentioned data site-based data integration capture methods.
In some embodiments, the computer device may further include a communication interface 13 and a bus 10. As shown in Fig. 3, the processor 11, the memory 12, and the communication interface 13 are connected via the bus 10 and communicate with one another. The communication interface 13 implements communication between the modules, apparatuses, units, and/or devices in the embodiments of the present application, and may also exchange data with other components such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 10 includes hardware, software, or both, and couples the components of the electronic device to one another. The bus 10 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, the bus 10 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, the bus 10 may include one or more buses. Although specific buses are described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A data integration capture method based on a data site is characterized by comprising the following steps:
a task creating step, wherein a data capturing task is created according to the requirements of a user;
a task scheduling step, namely, calling the data capturing task, analyzing task parameters in the data capturing task, and determining a target data site and data capturing information;
and a data capturing step, wherein target original data are captured based on the target data site and the data capturing information.
2. The data integration crawling method based on the data site as claimed in claim 1, wherein the task creating step specifically comprises:
a primary task creating step, wherein a primary data capturing task is created according to the requirements of users;
and a secondary task creating step of decomposing the primary data grabbing task and creating a plurality of secondary data grabbing tasks corresponding to the primary data grabbing task.
3. The data integration crawling method based on the data site as claimed in claim 2, wherein the task creating step further comprises:
and a filtering step, in the process of creating the secondary data capture task, when the existing secondary data capture task comprises the secondary data capture task to be created, filtering the secondary data capture task to be created.
4. The data integration crawling method based on the data site as claimed in any one of claims 2 or 3, further comprising:
and a data storage step, storing the data capture task, the target site, the data capture information and the target original data in a database, and storing the analyzed target original data in the database.
5. The data integration crawling method based on the data site as claimed in claim 3, further comprising:
a task judging step of judging whether the secondary data capturing task needs to be created or not after the task parameters are analyzed, and executing the filtering step when the secondary data capturing task needs to be created; otherwise, continuing to execute the task scheduling step.
6. The data integration crawling method based on the data site as claimed in claim 4, further comprising:
an information matching step, namely matching the data capturing information with the database after the data capturing information is determined, and acquiring the target original data corresponding to the data capturing information based on the database when the data capturing information is successfully matched with the database; and when the data capture information is unsuccessfully matched with the database, executing the data capture step.
7. The data site-based data integration capturing method according to claim 4, further comprising:
a statistical analysis step of performing statistical analysis on the analyzed target original data to obtain a data capture result.
8. The data site-based data integration capturing method according to claim 7, further comprising:
a result display step of displaying the data capture result.
9. A system for implementing the data site-based data integration capturing method according to any one of claims 1 to 8, comprising:
a task creating unit, configured to create a data capture task according to user requirements;
a task scheduling unit, configured to schedule the data capture task, analyze the task parameters in the data capture task, and determine a target data site and data capture information; and
a data capturing unit, configured to capture target original data based on the target data site and the data capture information.
10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the data site-based data integration capturing method according to any one of claims 1 to 8.
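The dependent claims 2 to 8 together describe a pipeline: decompose a primary task into secondary tasks, filter duplicates, decide whether decomposition is needed, reuse stored data when the capture information matches the database, otherwise capture and store, then run statistics and display them. The Python sketch below models that flow under stated assumptions and is not the applicant's implementation. All names (`CaptureTask`, `PrimaryTask`, `DataStore`, `create_subtasks`, `capture`, `run`) are hypothetical; a task is assumed to be a (site, capture information) pair; an in-memory SQLite table stands in for the database; `fetch` is a caller-supplied stub in place of real site crawling; "analysis" is assumed to mean splitting comma-separated records; and the task judging criterion (decompose when more than one site/info pair is requested) is an assumption.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# All names below are hypothetical; the claims do not define data types or APIs.

@dataclass(frozen=True)
class CaptureTask:
    """A secondary data capture task: one target data site plus capture information."""
    site: str
    info: str  # assumed: a keyword or path identifying what to capture

@dataclass
class PrimaryTask:
    """A primary data capture task created from a user requirement (claim 2)."""
    requirement: str
    sites: List[str]
    infos: List[str]
    subtasks: List[CaptureTask] = field(default_factory=list)

class DataStore:
    """In-memory SQLite stand-in for the database of claims 4 and 6."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE raw (site TEXT, info TEXT, data TEXT, PRIMARY KEY (site, info))"
        )

    def lookup(self, task: CaptureTask) -> Optional[str]:
        row = self.conn.execute(
            "SELECT data FROM raw WHERE site = ? AND info = ?", (task.site, task.info)
        ).fetchone()
        return row[0] if row else None

    def save(self, task: CaptureTask, data: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO raw VALUES (?, ?, ?)", (task.site, task.info, data)
        )
        self.conn.commit()

def create_subtasks(primary: PrimaryTask, existing: Set[CaptureTask]) -> List[CaptureTask]:
    """Claims 2-3: decompose the primary task, filtering out subtasks that already exist."""
    created = []
    for site in primary.sites:
        for info in primary.infos:
            task = CaptureTask(site, info)
            if task in existing:  # filtering step
                continue
            existing.add(task)
            created.append(task)
    primary.subtasks = created
    return created

def capture(task: CaptureTask, store: DataStore, fetch: Callable[[str, str], str]) -> str:
    """Claim 6: reuse stored raw data when the capture information matches; else capture."""
    cached = store.lookup(task)
    if cached is not None:
        return cached
    data = fetch(task.site, task.info)  # data capturing step (fetch is a caller-supplied stub)
    store.save(task, data)  # data storage step (claim 4)
    return data

def run(primary: PrimaryTask, store: DataStore,
        fetch: Callable[[str, str], str], existing: Set[CaptureTask]) -> Counter:
    """Claims 5 and 7: judge whether to decompose, capture, then count records per site."""
    # Task judging step; assumed criterion: decompose when more than one site/info pair.
    if len(primary.sites) * len(primary.infos) > 1:
        tasks = create_subtasks(primary, existing)
    else:
        tasks = [CaptureTask(primary.sites[0], primary.infos[0])]
    stats: Counter = Counter()
    for task in tasks:
        raw = capture(task, store, fetch)
        records = [r for r in raw.split(",") if r]  # assumed "analysis": comma-separated records
        stats[task.site] += len(records)
    return stats

if __name__ == "__main__":
    def fake_fetch(site: str, info: str) -> str:
        return f"{info}-a,{info}-b"  # stands in for a real crawler

    store, existing = DataStore(), set()
    result = run(PrimaryTask("prices", ["site-a", "site-b"], ["phone"]), store, fake_fetch, existing)
    for site, count in sorted(result.items()):  # result display step (claim 8)
        print(f"{site}: {count} records")
```

Running the module prints `site-a: 2 records` and `site-b: 2 records`. Repeating the same two-site primary task with the same `existing` set creates no new subtasks because of the filtering step, and a single-pair task for `site-a`/`phone` is answered from the store without calling `fetch` again, which mirrors the information matching step of claim 6.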
CN202011369702.9A 2020-11-30 2020-11-30 Data integration capturing method and system based on data site and computer equipment Pending CN112434205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011369702.9A CN112434205A (en) 2020-11-30 2020-11-30 Data integration capturing method and system based on data site and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011369702.9A CN112434205A (en) 2020-11-30 2020-11-30 Data integration capturing method and system based on data site and computer equipment

Publications (1)

Publication Number Publication Date
CN112434205A (en) 2021-03-02

Family

ID=74698806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011369702.9A Pending CN112434205A (en) 2020-11-30 2020-11-30 Data integration capturing method and system based on data site and computer equipment

Country Status (1)

Country Link
CN (1) CN112434205A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069108A (en) * 2015-08-07 2015-11-18 新浪网技术(中国)有限公司 Method and device for massive data query based on PaaS (Platform as a Service) system
CN105243159A (en) * 2015-10-28 2016-01-13 福建亿榕信息技术有限公司 Visual script editor-based distributed web crawler system
CN105956175A (en) * 2016-05-24 2016-09-21 考拉征信服务有限公司 Webpage content crawling method and device
CN107145556A (en) * 2017-04-28 2017-09-08 安徽博约信息科技股份有限公司 General distributed parallel computing environment
CN109325161A (en) * 2018-09-11 2019-02-12 五八有限公司 Public opinion data capturing method, device, equipment and storage medium
CN109918557A (en) * 2019-03-12 2019-06-21 厦门商集网络科技有限责任公司 Web data crawling and merging method and computer-readable storage medium
CN110096666A (en) * 2019-05-08 2019-08-06 上海泰豪迈能能源科技有限公司 Data processing method and device
CN110555147A (en) * 2018-03-30 2019-12-10 上海媒科锐奇网络科技有限公司 Website data capturing method, device, equipment and medium
CN110765334A (en) * 2019-09-10 2020-02-07 北京字节跳动网络技术有限公司 Data capture method, system, medium and electronic device

Similar Documents

Publication Publication Date Title
CN108345642B (en) Method, storage medium and server for crawling website data by proxy IP
JP5353148B2 (en) Image information retrieving apparatus, image information retrieving method and computer program therefor
JP2010073114A6 (en) Image information retrieving apparatus, image information retrieving method and computer program therefor
EP3146698A1 (en) Method and system for acquiring web pages
TWI524302B (en) Method for performing merging control of feeds on at least one social network, and associated apparatus and associated computer program product
CN109600385B (en) Access control method and device
CN110069693B (en) Method and device for determining target page
CN110909229A (en) Webpage data acquisition and storage system based on simulated browser access
CN105302815B (en) Method and device for filtering uniform resource locators (URLs) of web pages
CN110968765B (en) Book searching method, computing device and computer storage medium
CN111770106A (en) Method, device, system, electronic device and storage medium for data threat analysis
CN106104550A (en) Site information extraction element, system, site information extracting method and site information extraction procedure
US20120166412A1 (en) Super-clustering for efficient information extraction
CN107885875B (en) Synonymy transformation method and device for search words and server
CN109522282B (en) Picture management method, device, computer device and storage medium
CN113989058A (en) Service generation method and device
CN112866279B (en) Webpage security detection method, device, equipment and medium
US8918406B2 (en) Intelligent analysis queue construction
CN112488552A (en) Method and system for constructing service index, electronic equipment and storage medium
CN112307386A (en) Information monitoring method, system, electronic device and computer readable storage medium
CN112434205A (en) Data integration capturing method and system based on data site and computer equipment
CN110990701A (en) Book searching method, computing device and computer storage medium
CN113274736B (en) Cloud game resource scheduling method, device, equipment and storage medium
CN113535338A (en) Interaction method, system, storage medium and electronic device for data access
CN113158044B (en) Method, system, terminal equipment and storage medium for on-line full-media reading

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination