CN112434205A - Data integration capturing method and system based on data site and computer equipment - Google Patents
- Publication number
- CN112434205A (application CN202011369702.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- capturing
- site
- capture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a data integration capturing method, system, and computer equipment based on a data site. The method comprises: a task creating step, in which a data capture task is created according to user requirements; a task scheduling step, in which the data capture task is scheduled, the task parameters in the data capture task are analyzed, and a target data site and data capture information are determined; and a data capturing step, in which target raw data are captured based on the target data site and the data capture information. The method requires no manual support, reduces labor cost, captures data in a single pass based on user requirements and the data site, and improves the accuracy of data capture.
Description
Technical Field
The invention relates to the technical field of communication, in particular to a data integration capturing method and system based on a data site and computer equipment.
Background
With the rapid development of networks and the explosion of information, web data capture services have become an indispensable part of enterprise operations because they help obtain accurate, relevant information. A data capture tool can extract useful information such as customer preferences, preferred locations, and competitor strategies.
In the prior art, users' data capture requirements are met in one of two ways: related data are captured by manually searching website pages, or scattered crawlers access the pages of individual websites to capture related data.
However, both manual capture and capture by scattered crawlers carry a high labor cost and make it difficult to capture all the data a user requires in a single pass. Because websites limit the number of accesses, the accuracy of data capture is also low.
Disclosure of Invention
In order to solve the technical problems of high labor cost and low data capturing accuracy in data capturing in the prior art, the invention provides a data integrated capturing method based on a data site, which does not need manual support, reduces labor cost, realizes one-time data capturing based on user requirements and the data site, and improves the data capturing accuracy.
The invention provides a data integration capturing method based on a data site, which comprises the following steps:
a task creating step, wherein a data capturing task is created according to the requirements of a user;
a task scheduling step, namely, calling the data capturing task, analyzing task parameters in the data capturing task, and determining a target data site and data capturing information;
and a data capturing step, wherein target original data are captured based on the target data site and the data capturing information.
The data integration capture method based on the data site includes, in the task creating step:
a primary task creating step, wherein a primary data capturing task is created according to the requirements of users;
and a secondary task creating step of decomposing the primary data grabbing task and creating a plurality of secondary data grabbing tasks corresponding to the primary data grabbing task.
In the above data integration capture method based on data sites, the task creating step further includes:
and a filtering step, in the process of creating the secondary data capture task, when the existing secondary data capture task comprises the secondary data capture task to be created, filtering the secondary data capture task to be created.
The data integration capture method based on the data site further comprises the following steps:
and a data storage step, storing the data capture task, the target site, the data capture information and the target original data in a database, and storing the analyzed target original data in the database.
The data integration capture method based on the data site further comprises the following steps:
a task judging step of judging whether the secondary data capturing task needs to be created or not after the task parameters are analyzed, and executing the filtering step when the secondary data capturing task needs to be created; otherwise, continuing to execute the task scheduling step.
The data integration capture method based on the data site further comprises the following steps:
an information matching step, namely matching the data capturing information with the database after the data capturing information is determined, and acquiring the target original data corresponding to the data capturing information based on the database when the data capturing information is successfully matched with the database; and when the data capture information is unsuccessfully matched with the database, executing the data capture step.
The data integration capture method based on the data site further comprises the following steps:
and a step of statistical analysis, wherein the analyzed target original data is subjected to statistical analysis to obtain a data capture result.
The data integration capture method based on the data site further comprises the following steps:
and a result display step, namely displaying the data capture result.
The invention also provides a system for implementing the data integration capturing method based on the data site, comprising:
the task creating unit is used for creating a data capturing task according to the user requirement;
the task scheduling unit is used for scheduling the data capturing task, analyzing task parameters in the data capturing task and determining a target data site and data capturing information;
and the data capturing unit is used for capturing target original data based on the target data site and the data capturing information.
The invention also provides a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the data integration capture method based on the data site.
The invention has the technical effects or advantages that:
the invention provides a data integration capturing method based on a data site, which comprises a task creating step, a task scheduling step and a data capturing step, wherein the task creating step comprises creating a data capturing task according to user requirements; the task scheduling step comprises the steps of calling the data capturing task, analyzing task parameters in the data capturing task, and determining a target data site and data capturing information; and the data capturing step comprises capturing target original data based on the target data site and the data capturing information. Through the mode, manual support is not needed, labor cost is reduced, one-time data capture based on user requirements and data sites is realized, and accuracy of data capture is improved.
Drawings
Fig. 1 is a flowchart of a data integration capture method based on a data site according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a system for implementing a data integration capture method based on a data site according to an embodiment of the present invention;
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
In the above figures:
10. bus; 11. processor; 12. memory; 13. communication interface.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict. Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. 
The terms "including," "comprising," "having," and any variations thereof, as used in the present application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The technical solution of the present invention will be described in detail below with reference to the specific embodiments and the accompanying drawings.
The embodiment provides a data integration capturing method based on a data site, which comprises the following steps:
a task creating step, wherein a data capturing task is created according to the requirements of a user;
a task scheduling step, namely, calling the data capturing task, analyzing task parameters in the data capturing task, and determining a target data site and data capturing information;
and a data capturing step, wherein target original data are captured based on the target data site and the data capturing information.
According to the data integration capturing method based on the data site, manual support is not needed, the labor cost is reduced, one-time capturing of data based on user requirements and the data site is achieved, and the accuracy of data capturing is improved.
Specifically, referring to fig. 1, fig. 1 is a flowchart of a data integration capture method based on a data site according to an embodiment of the present invention. The embodiment of the invention provides a data integration capturing method based on a data site, which comprises the following steps:
A task creating step S1, wherein a data capture task is created according to the requirements of the user.
In this embodiment, the task creating step S1 specifically includes:
a primary task creating step S11, wherein a primary data capturing task is created according to the user requirement;
a secondary task creating step S12 of decomposing the primary data grab task and creating a plurality of secondary data grab tasks corresponding to the primary data grab task.
In step S1, the task creating step further includes:
a filtering step S13, in the process of creating the secondary data capture task, when the existing secondary data capture task includes the secondary data capture task to be created, the secondary data capture task to be created is filtered.
In this embodiment, while secondary data capture tasks are being created, a secondary data capture task to be created is filtered out if the existing secondary data capture tasks already include it; if they do not include it, the secondary data capture task is created.
In a specific application, suppose a user needs data about a brand from a certain data site, including live broadcast information, the corresponding anchor information, brand sales information, sales trend information for the brand's commodities, live broadcast comment information, and live broadcast trend information. Analysis of the user requirements shows that the brand information is the common requirement, so a primary data capture task is created from the brand information. The primary data capture task records whether live broadcast information is needed, whether anchor information is needed, whether commodity information is needed, whether comment information is needed, and so on. The primary data capture task is then decomposed, and a plurality of corresponding secondary data capture tasks are created, covering live broadcast information, anchor information, commodity information, comment information, and so on.
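The decomposition and filtering described in steps S11 to S13 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `PrimaryTask`, `SecondaryTask`, and `create_secondary_tasks`, the flag dictionary, and the use of an in-memory set for duplicate detection are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecondaryTask:
    """One unit of capture work (hypothetical structure)."""
    site: str
    brand: str
    kind: str  # e.g. "live", "anchor", "commodity", "comment"


@dataclass
class PrimaryTask:
    """A user-level request: a brand plus flags for the data wanted."""
    site: str
    brand: str
    wanted: dict = field(default_factory=dict)  # kind -> bool


def create_secondary_tasks(primary, existing):
    """Decompose a primary task (S12), skipping tasks that already exist (S13)."""
    created = []
    for kind, needed in primary.wanted.items():
        if not needed:
            continue
        task = SecondaryTask(primary.site, primary.brand, kind)
        if task in existing:  # filtering step: task is already known
            continue
        existing.add(task)
        created.append(task)
    return created


existing_tasks = set()
primary = PrimaryTask(
    site="example-site",
    brand="BrandX",
    wanted={"live": True, "anchor": True, "commodity": False, "comment": True},
)
first = create_secondary_tasks(primary, existing_tasks)
second = create_secondary_tasks(primary, existing_tasks)  # everything is filtered
print([t.kind for t in first], len(second))
```

Making `SecondaryTask` a frozen dataclass gives it value-based equality and hashing, so membership in the set of existing tasks is enough to express the filtering condition.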
A task scheduling step S2, wherein the data capture task is scheduled, and after the task parameters in the data capture task are analyzed, a target data site and data capture information are determined.
In a specific application, after one of the secondary data capture tasks is scheduled, its task parameters are analyzed to determine the target data site and the data capture information, where the data capture information includes the brand, live broadcast detail information, anchor detail information, and so on.
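As an illustration of the parameter analysis in step S2, the sketch below splits a task's parameters into a target data site and the remaining data capture information. It assumes the parameters are stored as a JSON string with a `site` key; that storage format, the key names, and the function name `parse_task_parameters` are assumptions made for the example, since the text does not specify them.

```python
import json


def parse_task_parameters(raw_params):
    """Split raw task parameters into the target site and the capture info."""
    params = json.loads(raw_params)
    target_site = params["site"]  # hypothetical key naming the data site
    capture_info = {k: v for k, v in params.items() if k != "site"}
    return target_site, capture_info


raw = '{"site": "example-site", "brand": "BrandX", "kind": "live", "detail": true}'
site, info = parse_task_parameters(raw)
print(site, info)
```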
A data capturing step S3, wherein target raw data are captured based on the target data site and the data capture information.
In this embodiment, a crawler is used to capture the target raw data. After the target data site and the data capture information are determined, a request URL is constructed from them and an HTTP request is sent to the target data site to capture the target raw data.
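The URL construction and HTTP request in step S3 might look like the following sketch, which uses only the Python standard library. The endpoint table `SITE_ENDPOINTS`, the URL layout, the `User-Agent` value, and both function names are hypothetical; the patent does not describe the request format. The `opener` parameter exists only so the function can be exercised without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical mapping from a data-site name to its base endpoint.
SITE_ENDPOINTS = {"example-site": "https://example.com/api"}


def build_request_url(target_site, capture_info):
    """Combine the site endpoint and the capture info into a request URL."""
    base = SITE_ENDPOINTS[target_site]
    path = capture_info["kind"]
    query = urlencode(
        {k: v for k, v in sorted(capture_info.items()) if k != "kind"}
    )
    return f"{base}/{path}?{query}"


def capture_raw_data(url, opener=urlopen, timeout=10):
    """Issue the HTTP request and return the raw response body as bytes."""
    request = Request(url, headers={"User-Agent": "capture-sketch/0.1"})
    with opener(request, timeout=timeout) as response:
        return response.read()


url = build_request_url(
    "example-site", {"kind": "live", "brand": "BrandX", "page": 1}
)
print(url)
```

Sorting the query parameters keeps the generated URL stable for the same capture information, which is convenient if URLs are later compared or used as lookup keys.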
A data storage step S4, storing the data capture task, the target site, the data capture information, and the target raw data in a database, and storing the analyzed target raw data in the database.
In this embodiment, the primary data capture task, the secondary data capture task, the target site, the data capture information, and the target raw data are all stored in a MySQL database, and the analyzed target raw data are also stored in the MySQL database.
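A possible shape for the storage step S4 is sketched below. Note the substitution: the embodiment names MySQL, but this example uses Python's built-in `sqlite3` with an in-memory database so that it runs without a server. The table name `capture_record`, its columns, and the function `store_record` are assumptions for illustration only.

```python
import json
import sqlite3

# In-memory SQLite stands in for the MySQL database named in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE capture_record (
           task_id TEXT PRIMARY KEY,
           target_site TEXT NOT NULL,
           capture_info TEXT NOT NULL,
           raw_data TEXT,
           parsed_data TEXT
       )"""
)


def store_record(conn, task_id, target_site, capture_info, raw_data, parsed):
    """Persist the task, its site and capture info, and raw plus analyzed data."""
    conn.execute(
        "INSERT OR REPLACE INTO capture_record VALUES (?, ?, ?, ?, ?)",
        (
            task_id,
            target_site,
            json.dumps(capture_info, sort_keys=True),
            raw_data,
            json.dumps(parsed),
        ),
    )
    conn.commit()


store_record(
    conn,
    "t-1",
    "example-site",
    {"brand": "BrandX", "kind": "live"},
    '{"rooms": [{"id": 1}]}',
    {"room_ids": [1]},
)
row = conn.execute(
    "SELECT target_site, parsed_data FROM capture_record WHERE task_id = ?",
    ("t-1",),
).fetchone()
print(row)
```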
A statistical analysis step S5, wherein the analyzed target raw data are statistically analyzed to obtain a data capture result.
A result display step S6, wherein the data capture result is displayed.
In this embodiment, the data capture result is presented to the user.
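The text does not say which statistics are computed in step S5, so the following is only a hedged sketch of what the statistical analysis and display steps could look like. The record fields (`anchor`, `viewers`, `sales`), the sample values, and the chosen aggregates are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical analyzed records; field names and values are invented.
parsed_records = [
    {"anchor": "A", "viewers": 1200, "sales": 30},
    {"anchor": "B", "viewers": 800, "sales": 12},
    {"anchor": "A", "viewers": 1000, "sales": 18},
]


def analyze(records):
    """Aggregate analyzed records into a small capture-result summary."""
    sales_by_anchor = Counter()
    for record in records:
        sales_by_anchor[record["anchor"]] += record["sales"]
    return {
        "broadcasts": len(records),
        "mean_viewers": mean(r["viewers"] for r in records),
        "sales_by_anchor": dict(sales_by_anchor),
    }


result = analyze(parsed_records)
for key, value in result.items():  # stand-in for the result display step
    print(f"{key}: {value}")
```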
In order to reduce the number of accesses to the data site, the data integration capture method based on the data site provided in this embodiment further includes the following steps:
a task judgment step S7, after the task parameters are analyzed, judging whether the secondary data capture task needs to be created, and when the secondary data capture task needs to be created, executing the filtering step; otherwise, continuing to execute the task scheduling step.
In a specific application, analyzing a secondary data capture task may yield new secondary data capture tasks, which must be written back into the set of secondary data capture tasks. To reduce the number of accesses to the data site, it is judged whether a secondary data capture task needs to be created: if so, the filtering step S13 is executed; if not, the target data site and the data capture information are determined from the analyzed task parameters.
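The interplay between scheduling, the task judgment step S7, and the filtering step S13 can be sketched as a work queue with a set of already-known tasks. The tuple representation of tasks, the `parse_task` stub, and the function `schedule` are hypothetical; a real parser would derive new tasks from the task parameters rather than return fixed values.

```python
from collections import deque


def parse_task(task):
    """Hypothetical parser: a 'live' task reveals follow-up 'anchor' tasks."""
    kind, _key = task
    if kind == "live":
        return [("anchor", "a1"), ("anchor", "a2"), ("anchor", "a1")]
    return []


def schedule(initial_tasks):
    """Run tasks in order; queue newly discovered tasks only if unseen."""
    queue = deque(initial_tasks)
    seen = set(initial_tasks)
    executed = []
    while queue:
        task = queue.popleft()
        for new_task in parse_task(task):  # task judgment step (S7)
            if new_task in seen:  # filtering step (S13)
                continue
            seen.add(new_task)
            queue.append(new_task)
        executed.append(task)  # the data capturing step would run here
    return executed


order = schedule([("live", "BrandX"), ("anchor", "a2")])
print(order)
```

In this run the `live` task discovers three follow-up tasks, but only `("anchor", "a1")` is queued once; the other two are filtered because they are already known, so no duplicate site access would be made for them.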
An information matching step S8, after the data capture information is determined, matching the data capture information with the database, and when the data capture information is successfully matched with the database, acquiring the target original data corresponding to the data capture information based on the database; and when the data capture information is unsuccessfully matched with the database, executing the data capture step.
In a specific application, capturing data according to the data capture information and the target data site may require some key information that can only be obtained through additional requests to the data site. After the data capture information is determined, it is therefore first matched against the database: on a successful match, the corresponding target raw data are read from the database; on a failed match, the data capturing step is executed. For example, capturing live broadcast list information requires a signature corresponding to a user ID. Fetching that signature from the data site on every capture wastes accesses, and because the number of accesses to an interface is limited, further access is refused once the limit is exceeded. Performing the information matching step before capturing the live broadcast list information greatly reduces the number of daily accesses for user information, so that a larger demand can be met.
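The information matching step S8 behaves like a lookup-before-fetch cache. The sketch below illustrates that idea for the signature example, assuming a dictionary in place of the database and a stub function in place of the real request; `SignatureCache`, `fake_fetch`, and the `sig-` format are hypothetical.

```python
class SignatureCache:
    """Look up key information locally before asking the data site for it."""

    def __init__(self, fetch_from_site):
        self._store = {}  # stand-in for the database table
        self._fetch = fetch_from_site
        self.site_requests = 0

    def get(self, user_id):
        if user_id in self._store:  # match succeeded: no site access
            return self._store[user_id]
        self.site_requests += 1  # match failed: capture from the site
        signature = self._fetch(user_id)
        self._store[user_id] = signature
        return signature


def fake_fetch(user_id):
    """Hypothetical stand-in for the extra request that returns a signature."""
    return f"sig-{user_id}"


cache = SignatureCache(fake_fetch)
signatures = [cache.get(uid) for uid in ["u1", "u2", "u1", "u1"]]
print(signatures, cache.site_requests)
```

Four lookups result in only two site requests here, which is the access saving the paragraph describes. A production version would also need an expiry policy if signatures can become stale, a point the text does not address.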
According to the data integration capturing method based on the data site, manual support is not needed, the labor cost is reduced, one-time capturing of data based on user requirements and the data site is achieved, and the accuracy of data capturing is improved.
Referring to fig. 2, this embodiment further provides a system for implementing the data integration capture method based on the data site, including:
A task creating unit, used for creating a data capture task according to user requirements.
The task creating unit specifically creates a primary data capture task according to user requirements, decomposes the primary data capture task, and creates a plurality of secondary data capture tasks corresponding to the primary data capture task. While secondary data capture tasks are being created, a secondary data capture task to be created is filtered out if the existing secondary data capture tasks already include it, which reduces the number of accesses to the data site.
A task scheduling unit, used for scheduling the data capture task, analyzing the task parameters in the data capture task, and determining a target data site and data capture information.
To reduce the number of accesses to the data site, after the task parameters are analyzed, it is judged whether a secondary data capture task needs to be created. If so, the task creating unit is invoked; otherwise, the task scheduling unit continues to run.
A data capturing unit, used for capturing target raw data based on the target data site and the data capture information.
In this embodiment, a crawler is used to capture the target raw data. To reduce the number of accesses to the data site, the data capture task, the target site, the data capture information, and the target raw data are stored in the database, together with the analyzed target raw data. Once the data capture information is determined and data are to be captured according to it and the target data site, the data capture information is first matched against the database. On a successful match, the target raw data corresponding to the data capture information are obtained from the database; on a failed match, the data capturing unit is invoked.
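One way to picture how the three units cooperate is the sketch below, in which each unit is a small class and a driver function passes data between them. The class names mirror the unit names in the text, but every method name, the dictionary-based task format, and the injected `fetch` callable are assumptions; duplicate filtering and database matching are omitted to keep the example short.

```python
class TaskCreatingUnit:
    def create(self, requirement):
        """Turn a user requirement into a list of task-parameter dicts."""
        return [
            {"site": requirement["site"], "brand": requirement["brand"], "kind": k}
            for k in requirement["kinds"]
        ]


class TaskSchedulingUnit:
    def resolve(self, task):
        """Analyze task parameters into (target site, capture info)."""
        info = {k: v for k, v in task.items() if k != "site"}
        return task["site"], info


class DataCapturingUnit:
    def __init__(self, fetch):
        self._fetch = fetch  # injected so the sketch runs without a network

    def capture(self, site, info):
        return self._fetch(site, info)


def run(requirement, fetch):
    """Drive the three units in sequence for one user requirement."""
    creator = TaskCreatingUnit()
    scheduler = TaskSchedulingUnit()
    capturer = DataCapturingUnit(fetch)
    results = []
    for task in creator.create(requirement):
        site, info = scheduler.resolve(task)
        results.append(capturer.capture(site, info))
    return results


raw = run(
    {"site": "example-site", "brand": "BrandX", "kinds": ["live", "comment"]},
    fetch=lambda site, info: f"{site}:{info['kind']}:{info['brand']}",
)
print(raw)
```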
According to the data integration capturing method based on the data site, manual support is not needed, the labor cost is reduced, one-time capturing of data based on user requirements and the data site is achieved, and the accuracy of data capturing is improved.
Referring to fig. 3, the present embodiment further provides a computer device, which includes a memory 12, a processor 11, and a computer program stored on the memory 12 and executable on the processor 11, and when the processor 11 executes the computer program, the data integration capture method based on the data site is implemented.
The device may comprise a processor 11 and a memory 12 in which computer program instructions are stored. Specifically, the processor 11 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 12 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions executed by the processor 11.
The processor 11 reads and executes the computer program instructions stored in the memory 12 to implement any one of the above-mentioned data site-based data integration capture methods.
In some of these embodiments, the computer device may also include a communication interface 13 and a bus 10. Referring to fig. 3, the processor 11, the memory 12, and the communication interface 13 are connected via the bus 10 and communicate with one another. The communication interface 13 is used for communication between the modules, devices, units, and/or equipment in the embodiments of the present application. The communication interface 13 may also carry out data communication with other components, such as external equipment, image/data acquisition equipment, a database, external storage, and an image/data processing workstation.
The bus 10 includes hardware, software, or both, and couples the components of the electronic device to one another. The bus 10 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, the bus 10 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 10 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable bus or interconnect is contemplated.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A data integration capture method based on a data site is characterized by comprising the following steps:
a task creating step, wherein a data capturing task is created according to the requirements of a user;
a task scheduling step, namely, calling the data capturing task, analyzing task parameters in the data capturing task, and determining a target data site and data capturing information;
and a data capturing step, wherein target original data are captured based on the target data site and the data capturing information.
2. The data integration crawling method based on the data site as claimed in claim 1, wherein the task creating step specifically comprises:
a primary task creating step, wherein a primary data capturing task is created according to the requirements of users;
and a secondary task creating step of decomposing the primary data grabbing task and creating a plurality of secondary data grabbing tasks corresponding to the primary data grabbing task.
3. The data integration crawling method based on the data site as claimed in claim 2, wherein the task creating step further comprises:
and a filtering step, in the process of creating the secondary data capture task, when the existing secondary data capture task comprises the secondary data capture task to be created, filtering the secondary data capture task to be created.
4. The data integration crawling method based on the data site as claimed in any one of claims 2 or 3, further comprising:
and a data storage step, storing the data capture task, the target site, the data capture information and the target original data in a database, and storing the analyzed target original data in the database.
5. The data integration crawling method based on the data site as claimed in claim 3, further comprising:
a task judging step of judging whether the secondary data capturing task needs to be created or not after the task parameters are analyzed, and executing the filtering step when the secondary data capturing task needs to be created; otherwise, continuing to execute the task scheduling step.
6. The data integration crawling method based on the data site as claimed in claim 4, further comprising:
an information matching step, namely matching the data capturing information with the database after the data capturing information is determined, and acquiring the target original data corresponding to the data capturing information based on the database when the data capturing information is successfully matched with the database; and when the data capture information is unsuccessfully matched with the database, executing the data capture step.
7. The data integration crawling method based on the data site as claimed in claim 4, further comprising:
and a step of statistical analysis, wherein the analyzed target original data is subjected to statistical analysis to obtain a data capture result.
8. The data integration crawling method based on the data site as claimed in claim 7, further comprising:
and a result display step, namely displaying the data capture result.
9. A system for implementing the data integration capturing method based on the data site according to any one of claims 1 to 8, comprising:
a task creating unit, configured to create a data capture task according to a user requirement;
a task scheduling unit, configured to schedule the data capture task, analyze task parameters in the data capture task, and determine a target data site and data capture information; and
a data capture unit, configured to capture target original data based on the target data site and the data capture information.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the data integration capturing method based on the data site according to any one of claims 1 to 8.
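As a non-limiting illustration, the interplay of the filtering step (claim 3), the data storage step (claim 4), the task judging step (claim 5) and the information matching step (claim 6) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `CaptureTask`, `CaptureScheduler`, `fetch`, `expand` and `demo_expand` are hypothetical, the database is modeled as an in-memory dictionary, and the order in which the judging and matching steps run is an assumption, since the claims do not fix it.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CaptureTask:
    """Hypothetical task record: a target data site plus data capture information."""
    site: str
    info: str


class CaptureScheduler:
    """Illustrative scheduler; every name here is an assumption, not from the source."""

    def __init__(
        self,
        fetch: Callable[[CaptureTask], str],
        expand: Callable[[CaptureTask], list[CaptureTask]],
        database: dict[CaptureTask, str] | None = None,
    ) -> None:
        self.fetch = fetch            # data capture step (assumed callable)
        self.expand = expand          # derives secondary tasks from task parameters
        self.database = dict(database or {})  # stands in for the database
        self.known: set[CaptureTask] = set()  # tasks that already exist
        self.queue: deque[CaptureTask] = deque()
        self.fetched: list[CaptureTask] = []  # tasks that reached the capture step

    def create_task(self, task: CaptureTask) -> bool:
        # Filtering step: drop a task that is already among the existing tasks.
        if task in self.known:
            return False
        self.known.add(task)
        self.queue.append(task)
        return True

    def run(self) -> dict[CaptureTask, str]:
        while self.queue:
            task = self.queue.popleft()  # task scheduling step
            # Task judging step: an empty list means no secondary task is needed.
            for secondary in self.expand(task):
                self.create_task(secondary)
            # Information matching step: reuse stored data when the match succeeds.
            if task in self.database:
                continue
            # Data capture step, followed by the data storage step.
            self.database[task] = self.fetch(task)
            self.fetched.append(task)
        return self.database


def demo_expand(task: CaptureTask) -> list[CaptureTask]:
    # Hypothetical rule: a "list" task spawns detail tasks, one of them duplicated.
    if task.info == "list":
        return [
            CaptureTask(task.site, "detail-1"),
            CaptureTask(task.site, "detail-1"),
            CaptureTask(task.site, "detail-2"),
        ]
    return []
```

In this sketch, a scheduler seeded with a stored entry for `detail-2` would capture only the `list` and `detail-1` tasks: the duplicate `detail-1` is filtered at creation time, and `detail-2` is served from the stored data without a new capture.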
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011369702.9A CN112434205A (en) | 2020-11-30 | 2020-11-30 | Data integration capturing method and system based on data site and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112434205A (en) | 2021-03-02 |
Family ID: 74698806
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011369702.9A Pending CN112434205A (en) | 2020-11-30 | 2020-11-30 | Data integration capturing method and system based on data site and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112434205A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069108A (en) * | 2015-08-07 | 2015-11-18 | 新浪网技术(中国)有限公司 | Method and device for massive data query based on PaaS (Platform as a Service) system |
CN105243159A (en) * | 2015-10-28 | 2016-01-13 | 福建亿榕信息技术有限公司 | Visual script editor-based distributed web crawler system |
CN105956175A (en) * | 2016-05-24 | 2016-09-21 | 考拉征信服务有限公司 | Webpage content crawling method and device |
CN107145556A (en) * | 2017-04-28 | 2017-09-08 | 安徽博约信息科技股份有限公司 | General distributed parallel computing environment |
CN109325161A (en) * | 2018-09-11 | 2019-02-12 | 五八有限公司 | Public sentiment data grasping means, device, equipment and storage medium |
CN109918557A (en) * | 2019-03-12 | 2019-06-21 | 厦门商集网络科技有限责任公司 | A kind of web data crawls merging method and computer readable storage medium |
CN110096666A (en) * | 2019-05-08 | 2019-08-06 | 上海泰豪迈能能源科技有限公司 | The method and device of data processing |
CN110555147A (en) * | 2018-03-30 | 2019-12-10 | 上海媒科锐奇网络科技有限公司 | website data capturing method, device, equipment and medium thereof |
CN110765334A (en) * | 2019-09-10 | 2020-02-07 | 北京字节跳动网络技术有限公司 | Data capture method, system, medium and electronic device |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108345642B (en) | Method, storage medium and server for crawling website data by proxy IP | |
JP5353148B2 (en) | Image information retrieving apparatus, image information retrieving method and computer program therefor | |
JP2010073114A6 (en) | Image information retrieving apparatus, image information retrieving method and computer program therefor | |
EP3146698A1 (en) | Method and system for acquiring web pages | |
TWI524302B (en) | Method for performing merging control of feeds on at least one social network, and associated apparatus and associated computer program product | |
CN109600385B (en) | Access control method and device | |
CN110069693B (en) | Method and device for determining target page | |
CN110909229A (en) | Webpage data acquisition and storage system based on simulated browser access | |
CN105302815B (en) | The filter method and device of the uniform resource position mark URL of webpage | |
CN110968765B (en) | Book searching method, computing device and computer storage medium | |
CN111770106A (en) | Method, device, system, electronic device and storage medium for data threat analysis | |
CN106104550A (en) | Site information extraction element, system, site information extracting method and site information extraction procedure | |
US20120166412A1 (en) | Super-clustering for efficient information extraction | |
CN107885875B (en) | Synonymy transformation method and device for search words and server | |
CN109522282B (en) | Picture management method, device, computer device and storage medium | |
CN113989058A (en) | Service generation method and device | |
CN112866279B (en) | Webpage security detection method, device, equipment and medium | |
US8918406B2 (en) | Intelligent analysis queue construction | |
CN112488552A (en) | Method and system for constructing service index, electronic equipment and storage medium | |
CN112307386A (en) | Information monitoring method, system, electronic device and computer readable storage medium | |
CN112434205A (en) | Data integration capturing method and system based on data site and computer equipment | |
CN110990701A (en) | Book searching method, computing device and computer storage medium | |
CN113274736B (en) | Cloud game resource scheduling method, device, equipment and storage medium | |
CN113535338A (en) | Interaction method, system, storage medium and electronic device for data access | |
CN113158044B (en) | Method, system, terminal equipment and storage medium for on-line full-media reading | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||