CN113946454A - Webpage data processing method, system, equipment and storage medium - Google Patents

Webpage data processing method, system, equipment and storage medium

Info

Publication number
CN113946454A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111147850.0A
Other languages
Chinese (zh)
Inventor
董志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing College of Information Technology
Original Assignee
Nanjing College of Information Technology
Application filed by Nanjing College of Information Technology filed Critical Nanjing College of Information Technology
Priority to CN202111147850.0A priority Critical patent/CN113946454A/en
Publication of CN113946454A publication Critical patent/CN113946454A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G06F9/543 User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE]

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for processing webpage data, which comprises the following steps: acquiring webpage data and publishing the webpage data to Pulsar message middleware, wherein the Pulsar message middleware comprises Pulsar Functions; acquiring the webpage data through Pulsar Functions, parsing the webpage data according to preset settings to obtain target data, and publishing the target data to the Pulsar message middleware through Pulsar Functions; and acquiring the target data through Pulsar Functions, checking and converting the target data according to pre-specified rules to obtain conversion data, and publishing the conversion data to the Pulsar message middleware through Pulsar Functions. By connecting the data processing stages through Pulsar Functions, the invention unifies the data interaction process and improves data processing efficiency.

Description

Webpage data processing method, system, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, a system, a device, and a storage medium for processing web page data.
Background
Big data technology is a strong driving force for social development. It can collect, clean, store, label and model massive amounts of data and, combined with artificial intelligence and software technology, is used to design and develop intelligent application systems that support a variety of intelligent application scenarios. Within this process, collecting and cleaning massive amounts of Web data is foundational work for big data processing.
In existing technical solutions, the collection and cleaning work of a Web data processing system is usually developed with distributed technology and implemented as separate modules. These modules require complicated network communication with one another and suffer from non-uniform communication interfaces, non-uniform data formats and processing flows that are difficult to connect. As a result, the Web data processing system is difficult to maintain and its data processing efficiency is reduced.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a method, a system, equipment and a storage medium for processing webpage data.
In a first aspect, the present invention provides a method for processing web page data, the method comprising the steps of:
acquiring webpage data, and publishing the webpage data to Pulsar message middleware, wherein the Pulsar message middleware comprises Pulsar Functions;
acquiring the webpage data through Pulsar Functions, parsing the webpage data according to preset settings to obtain target data, and publishing the target data to the Pulsar message middleware through Pulsar Functions;
acquiring the target data through Pulsar Functions, checking and converting the target data according to pre-specified rules to obtain conversion data, and publishing the conversion data to the Pulsar message middleware through Pulsar Functions;
and acquiring the conversion data through Pulsar Functions, and storing the conversion data.
Further, the preset settings for parsing the webpage data comprise a task id, URL information of the website to be crawled, the name of the data to be crawled, and an xpath expression.
Further, the pre-specified rule for converting the target data includes a cleaning condition expression and a cleaning action function.
Further, the conversion data obtained through the Pulsar message middleware is stored in a specified CSV file.
In a second aspect, the present invention further provides a network data processing system, including a crawler module, an extraction module, a conversion module, and a storage module, where the modules are connected through Pulsar message middleware, and the Pulsar message middleware includes Pulsar Functions;
the crawler module is used for crawling data of a specified website and publishing the webpage data to the Pulsar message middleware;
the extraction module subscribes, through Pulsar Functions, to the webpage data published by the crawler module, parses the webpage data according to the settings to obtain target data, and publishes the target data to the Pulsar message middleware through Pulsar Functions;
the conversion module subscribes, through Pulsar Functions, to the target data published by the extraction module, receives the target data through Pulsar Functions, checks and converts the target data according to specified rules, and publishes the conversion data to the Pulsar message middleware through Pulsar Functions;
the storage module is used for subscribing to the conversion data published by the conversion module, receiving the conversion data through Pulsar Functions, and storing the conversion data.
Further, the crawler module, the extraction module, the conversion module and the storage module are registered with the Pulsar message middleware through a plug-in mechanism.
In a third aspect, the present invention further provides an apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the web page data processing method according to any one of the first aspect when executing the computer program.
In a fourth aspect, the present invention further provides a storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the web page data processing method according to any one of the first aspects.
Compared with the prior art, the invention has the following beneficial effects: the modules are connected through the plug-in mechanism of the Pulsar message middleware and therefore do not need to handle a complex network communication flow, and data interaction is carried out through Pulsar Functions, so that the data format and the processing flow are unified, the complexity of the system is reduced, and the data processing efficiency is improved.
Drawings
FIG. 1 is a schematic flow chart illustrating web page data processing according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a web page data processing system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a web page data processing system according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1:
As shown in FIG. 1, the invention provides a web page data processing method that implements data processing on top of Pulsar message middleware and comprises the steps of crawling web page data, extracting target data and obtaining conversion data.
Pulsar is a cloud-native, multi-tenant, high-performance message middleware. In contrast to traditional message middleware (e.g., RabbitMQ, RocketMQ), Pulsar not only supports conventional publish/subscribe functionality but also provides a plug-in mechanism through which third-party programs can be registered on Pulsar. Built on this plug-in mechanism, Pulsar also provides Pulsar Functions, a serverless computing platform that offers a simple data processing flow and makes data formats easy to unify, which can significantly improve data processing efficiency.
Specifically, the web page crawling step issues HTTP requests to the target website by calling Python's requests library or the Scrapy library, thereby obtaining the web page data. The obtained web page data is sent to the Pulsar message middleware through a publish function.
Correspondingly, the websites to be crawled are described by the following configuration:
[Figures: listing of the URL configuration for the websites to be crawled, with the fields taskid and URLList; images not reproduced]
Here, taskid is the task id and URLList is the set of target website URLs for that task. Each task id is globally unique within the system, and the URLList of each task id can contain multiple target website URLs.
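The configuration listing itself is not reproduced in this text, so the layout below is only an assumption reconstructed from the two field names, taskid and URLList. The ids, the URLs and the helper `load_url_config` are hypothetical placeholders; the sketch simply parses such a file and enforces the stated rule that each task id is unique.

```python
import json

# Hypothetical URL configuration. Only the field names "taskid" and
# "URLList" come from the description; all values are placeholders.
URL_CONFIG_TEXT = """
[
  {"taskid": "task-001",
   "URLList": ["https://example.com/a", "https://example.com/b"]},
  {"taskid": "task-002",
   "URLList": ["https://example.org/news"]}
]
"""


def load_url_config(text):
    """Parse the configuration and check that every taskid is unique."""
    tasks = json.loads(text)
    seen = set()
    for task in tasks:
        if task["taskid"] in seen:
            raise ValueError(f"duplicate taskid: {task['taskid']}")
        seen.add(task["taskid"])
    return tasks


url_config = load_url_config(URL_CONFIG_TEXT)
for task in url_config:
    # Print each task id with the number of URLs assigned to it.
    print(task["taskid"], len(task["URLList"]))
```

Keeping the uniqueness check in the loader means that later stages can use taskid as a lookup key without re-validating it.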
The method obtains the web page data through Pulsar Functions, calls a Python XPath library to match the HTML text of the web page data against the preset XPath expressions, extracts the target data, and sends the target data to the Pulsar message middleware through a publish function.
Correspondingly, the extraction rule for the web page data is defined as follows:
[Figure: listing of the extraction rule, with the fields taskid and xpathList; image not reproduced]
Each taskid corresponds to an xpathList, which is a set of XPath expressions. Each XPath expression parses one specific piece of data out of the web page, and the pieces of data are distinguished by the "dataName" field.
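As a rough illustration of how an xpathList could drive extraction, the sketch below applies each expression to a page and collects the results under their dataName. It makes several assumptions: the rule layout and the sample page are invented, and Python's standard `xml.etree.ElementTree` is used as a stand-in for the unspecified XPath library. ElementTree supports only a subset of XPath and needs well-formed markup, so a real crawler would likely use a more lenient HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction rule for one task; field names follow the text.
EXTRACT_RULE = {
    "taskid": "task-001",
    "xpathList": [
        {"dataName": "title", "xpath": ".//h1"},
        {"dataName": "price", "xpath": ".//span[@class='price']"},
    ],
}

# Well-formed sample page; real-world HTML is often not valid XML.
SAMPLE_HTML = (
    "<html><body><h1>Widget</h1>"
    "<span class='price'>9.99</span></body></html>"
)


def extract(html, rule):
    """Apply each XPath expression and collect dataName/dataValue pairs."""
    root = ET.fromstring(html)
    values = []
    for item in rule["xpathList"]:
        node = root.find(item["xpath"])
        # A missing node yields None instead of raising an error.
        text = node.text if node is not None else None
        values.append({"dataName": item["dataName"], "dataValue": text})
    return {"taskid": rule["taskid"], "xpathValue": values}


ext_data = extract(SAMPLE_HTML, EXTRACT_RULE)
print(ext_data)
```

The returned dictionary mirrors the {taskid, xpathValue} shape used for the extracted data in Example 2.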
The target data is obtained through Pulsar Functions and checked against the specified cleaning rules. A cleaning rule consists of a cleaning condition (condition) and a cleaning function (action). Target data that satisfies the cleaning condition is converted by the cleaning function to form conversion data, which is published to the Pulsar message middleware through Pulsar Functions.
Correspondingly, the cleaning rule for the target data is defined as follows:
name: rule name
description: rule description information
condition: rule execution condition
action: rule execution action
The name field holds the rule name, the description field holds the rule description information, the condition field holds the rule execution condition, and the action field holds the rule execution action. When the rule execution condition specified by condition is true, the rule execution action specified by action is executed.
An example of the contents of a cleaning rule file is as follows:
name: "rule"
description: "invalid data handling"
condition: isInValidFormat("data name 1")
action: update()
The meaning of the above configuration is as follows: the function isInValidFormat() determines whether the format of the data named "data name 1" is incorrect. If the format is incorrect, the function update() of the action field is called to process the data.
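The condition/action pairing of a cleaning rule can be sketched as two callables bundled with the rule metadata. Everything concrete here is an assumption made for illustration: the `CleaningRule` class, the notion that a valid value is a decimal number, and the choice to replace invalid values with an empty string. The source does not say what isInValidFormat() or update() actually do.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CleaningRule:
    name: str
    description: str
    data_name: str                    # dataName the rule applies to
    condition: Callable[[str], bool]  # True when the value needs cleaning
    action: Callable[[str], str]      # returns the cleaned value


def is_invalid_format(value):
    """Hypothetical check: a valid value parses as a decimal number."""
    try:
        float(value)
        return False
    except (TypeError, ValueError):
        return True


def update(value):
    """Hypothetical action: replace an invalid value with an empty string."""
    return ""


rule = CleaningRule(
    name="rule",
    description="invalid data handling",
    data_name="data name 1",
    condition=is_invalid_format,
    action=update,
)


def apply_rule(rule, data_name, value):
    """Run the action only when the rule targets this field and fires."""
    if data_name == rule.data_name and rule.condition(value):
        return rule.action(value)
    return value


print(apply_rule(rule, "data name 1", "abc"))
print(apply_rule(rule, "data name 1", "3.5"))
```

Values that pass the condition check, or that belong to a field the rule does not target, are returned unchanged.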
The invention obtains the conversion data through the Pulsar message middleware and stores it in the specified CSV file.
Further, the method uses the built-in Pulsar command pulsar-admin functions create to register the processing flows for crawling web page data, extracting target data and obtaining conversion data on the Pulsar message middleware.
Further, the processing flows for crawling web page data, extracting target data and obtaining conversion data implement the Function<T, R> interface of Pulsar Functions, override the method String apply(String input) with the @Override annotation, and perform the extraction and conversion of the data in the method apply.
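The way the registered functions hand data to one another through topics can be illustrated without a running broker. The sketch below is an in-memory stand-in and not the Pulsar API: `MiniBroker`, `register`, the topic names and the three placeholder stages are all hypothetical. In the described system each stage would instead be registered with pulsar-admin functions create.

```python
from collections import defaultdict


class MiniBroker:
    """In-memory stand-in for a message broker (not the Pulsar API)."""

    def __init__(self):
        self.functions = defaultdict(list)  # input topic -> [(fn, out topic)]
        self.sink = defaultdict(list)       # messages with no consumer

    def register(self, input_topic, fn, output_topic=None):
        """Attach a function to an input topic, optionally with an output."""
        self.functions[input_topic].append((fn, output_topic))

    def publish(self, topic, message):
        """Deliver a message and forward each result to the output topic."""
        handlers = self.functions.get(topic)
        if not handlers:
            self.sink[topic].append(message)
            return
        for fn, output_topic in handlers:
            result = fn(message)
            if output_topic is not None and result is not None:
                self.publish(output_topic, result)


stored = []


def extract_stage(message):
    return message.strip()       # placeholder for XPath extraction


def convert_stage(message):
    return message.upper()       # placeholder for rule-based cleaning


def store_stage(message):
    stored.append(message)       # placeholder for writing a CSV row


broker = MiniBroker()
broker.register("topic-a", extract_stage, "topic-b")
broker.register("topic-b", convert_stage, "topic-c")
broker.register("topic-c", store_stage)
broker.publish("topic-a", "  raw page  ")
print(stored)
```

Each stage sees only its input message and returns its output, which is the property that lets the stages share one data format and one processing flow.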
Example 2:
As shown in FIG. 2, the present invention further provides a web page data processing system, which includes a crawler module, an extraction module, a conversion module, and a storage module. The modules are registered with the Pulsar message middleware through a plug-in mechanism and implement the Pulsar Function interface.
The crawler module is used for crawling data of a specified website and publishing the webpage data to the Pulsar message middleware; the extraction module is used for subscribing to the webpage data published by the crawler module, parsing the webpage data according to the settings to obtain target data, and publishing the target data to the Pulsar message middleware; the conversion module is used for subscribing to the target data published by the extraction module, checking and converting the target data according to specified rules, and publishing the conversion data to the Pulsar message middleware; the storage module is used for subscribing to the conversion data published by the conversion module and storing the conversion data.
Specifically, the crawler module loads the json module, calls the function load() to read the URL configuration file urlConfig, and parses the taskid and URLList fields. The URL configuration file can be expressed as urlConfig = [{taskid1, URLList1}, {taskid2, URLList2}, …, {taskidN, URLListN}], where each URLList is a set of target website URLs and can be expressed as [URL1, URL2, …, URLN].
The crawler module loads the requests library, then traverses urlConfig = [{taskid1, URLList1}, {taskid2, URLList2}, …, {taskidN, URLListN}] and performs the following operations on each element K = {taskidK, URLListK} (K ranges over [1, N]) in urlConfig:
(1) the crawler module obtains the URL set [URL1, URL2, …, URLN] of the URLListK field and, for each element URLX (X ranges over [1, N]) in the URL set, calls the get() function of the requests module to send an HTTP request to URLX and obtain the corresponding HTML data. The HTML data set corresponding to the URL set can be expressed as [HTML1, HTML2, …, HTMLN];
(2) the crawler module loads the Pulsar client library and creates a producer object, through which the topic Topic A is created;
(3) the crawler module traverses the HTML data [HTML1, HTML2, …, HTMLN] and publishes {taskid, HTMLX} (X ranges over [1, N]) to Topic A in sequence.
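Steps (1) to (3) amount to a nested loop over tasks and URLs that wraps each page in a {taskid, HTML} message. The sketch below keeps network and broker access behind two injected callables so that it runs offline. The JSON message layout, the function name `crawl` and the stand-ins `fake_fetch` and `topic_a` are assumptions, not details from the source.

```python
import json


def crawl(url_config, fetch, publish):
    """Fetch every URL of every task and publish {taskid, html} messages."""
    count = 0
    for task in url_config:
        for url in task["URLList"]:
            html = fetch(url)
            publish(json.dumps({"taskid": task["taskid"], "html": html}))
            count += 1
    return count


# Offline stand-ins. With the libraries named in the text, fetch could
# wrap requests.get(url).text, and publish could wrap the send() method
# of a producer created by the Pulsar Python client for Topic A.
def fake_fetch(url):
    return f"<html><body>{url}</body></html>"


topic_a = []  # plays the role of Topic A

config = [
    {"taskid": "task-001",
     "URLList": ["https://example.com/a", "https://example.com/b"]},
    {"taskid": "task-002", "URLList": ["https://example.org/news"]},
]

sent = crawl(config, fake_fetch, topic_a.append)
print(sent, "messages published")
```

Carrying the taskid inside every message is what lets the downstream extraction stage pick the matching xpathList without any shared state.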
The extraction module loads the json module, calls the function load() to read the extraction configuration file extractConfig, and parses the taskid and xpathList fields. The extraction configuration file can be expressed as extractConfig = [{taskid1, xpathList1}, {taskid2, xpathList2}, …, {taskidN, xpathListN}], where each xpathList is a set of xpath expressions and can be expressed as [{dataName1, xpath1}, {dataName2, xpath2}, …, {dataNameN, xpathN}].
The extraction module implements the Pulsar Function interface, subscribes to Topic A in the interface's initialization function __init__(), and then overrides the interface's function process(). After receiving the data {taskid, HTMLX} (X ranges over [1, N]) sent by the crawler module, it traverses all elements of extractConfig; if the taskid field of an element x of extractConfig is the same as the received taskid, it obtains the xpathList field [{dataName1, xpath1}, {dataName2, xpath2}, …, {dataNameN, xpathN}] of element x.
The extraction module traverses all elements of the xpathList [{dataName1, xpath1}, {dataName2, xpath2}, …, {dataNameN, xpathN}] of element x. For each node {dataNameK, xpathK} (K ranges over [1, N]) of the xpathList field, it calls the function xpath() of the parsing library with xpathK as input to parse HTMLX, and records the parsed data as dataValueK (K ranges over [1, N]).
The extraction module creates a data set ExtData and stores each dataValueK (K ranges over [1, N]) together with the taskid in ExtData. ExtData can be expressed as {taskid, xpathValue}, where xpathValue can be expressed as [{dataName1, dataValue1}, {dataName2, dataValue2}, …, {dataNameN, dataValueN}].
The extraction module calls the function publish() of the Pulsar Function interface to send ExtData {taskid, xpathValue} to Topic B.
The conversion module loads the Java rule engine RulesEngine and, through the file() function of RulesEngine, instantiates the conversion configuration file RuleConfig.yml as RuleInstance. The condition field and the action field of RuleConfig.yml are converted into Java functions, denoted conditionFun and actionFun respectively, where the parameter of the conditionFun function is dataName. RuleInstance can be expressed as [{dataName1, conditionFun1, actionFun1}, {dataName2, conditionFun2, actionFun2}, …, {dataNameN, conditionFunN, actionFunN}].
The conversion module implements the Pulsar Function interface, subscribes to Topic B in the interface's initialization function __init__(), then overrides the interface's function process() and receives the data ExtData {taskid, xpathValue} sent by the extraction module, where xpathValue can be expressed as [{dataName1, dataValue1}, {dataName2, dataValue2}, …, {dataNameN, dataValueN}].
The conversion module traverses the xpathValue [{dataName1, dataValue1}, {dataName2, dataValue2}, …, {dataNameN, dataValueN}] of ExtData and performs the following operations on each element K = {dataNameK, dataValueK} (K ranges over [1, N]) of xpathValue: it traverses all elements X of RuleInstance and, if the dataName field of element X is the same as the dataNameK of element K, takes the conditionFunX and actionFunX of element X and calls conditionFunX() with the parameter dataValueK. If the format of dataValueK is incorrect or its value falls outside a given threshold range, conditionFunX(dataValueK) returns true; in that case the function actionFunX() is called to convert dataValueK, and the converted data is recorded as dataChangedValueK.
The conversion module creates ChangedData and stores each dataChangedValueK (K ranges over [1, N]) together with the taskid in ChangedData. ChangedData can be expressed as {taskid, changeValue}, where changeValue can be expressed as [{dataName1, dataChangedValue1}, {dataName2, dataChangedValue2}, …, {dataNameN, dataChangedValueN}].
The conversion module calls the function publish() of the Pulsar Function interface to send ChangedData {taskid, changeValue} to Topic C.
The storage module implements the Pulsar Function interface, subscribes to Topic C in the interface's initialization function __init__(), then overrides the interface's function process() and receives the data ChangedData {taskid, changeValue} sent by the conversion module, where changeValue can be expressed as [{dataName1, dataChangedValue1}, {dataName2, dataChangedValue2}, …, {dataNameN, dataChangedValueN}].
The storage module creates a crawl data CSV file for each taskid (different task ids correspond to different crawl data CSV files). It then calls the pandas library to create a DataFrame structure df, traverses all elements of changeValue, and inserts the data of the dataName and dataChangedValue fields of each element into df, which can be expressed as [{dataName1, dataChangedValue1}, {dataName2, dataChangedValue2}, …, {dataNameN, dataChangedValueN}].
The storage module calls the function to_csv() of the pandas library to save df to the corresponding crawl data CSV file.
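The final write step can be sketched as follows. The text describes a pandas DataFrame saved with to_csv(); to stay dependency-free, this sketch swaps in Python's standard csv module, which is a substitution and not the described implementation. The two-column layout, the helper names and the file naming scheme are assumptions.

```python
import csv
import io


def csv_name_for(taskid):
    """Hypothetical naming scheme: one CSV file per task id."""
    return f"crawl_{taskid}.csv"


def write_changed_data(changed_data, stream):
    """Write one row per {dataName, dataChangedValue} element."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["dataName", "dataChangedValue"])
    for item in changed_data["changeValue"]:
        writer.writerow([item["dataName"], item["dataChangedValue"]])


changed = {
    "taskid": "task-001",
    "changeValue": [
        {"dataName": "title", "dataChangedValue": "Widget"},
        {"dataName": "price", "dataChangedValue": "9.99"},
    ],
}

# An in-memory buffer stands in for open(csv_name_for(...), "w", newline="").
buffer = io.StringIO()
write_changed_data(changed, buffer)
print(csv_name_for(changed["taskid"]))
print(buffer.getvalue())
```

Writing to a stream instead of a fixed path keeps the function easy to test and leaves the per-task file choice to the caller.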
Example 3:
The invention also provides a device, which comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the webpage data processing method of Example 1 when executing the computer program.
Example 4:
The present invention also provides a storage medium in which a computer program is stored; when the computer program is executed by a processor, the web page data processing method of Example 1 is implemented.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (8)

1. A method for web page data processing, the method comprising the steps of:
acquiring webpage data, and publishing the webpage data to Pulsar message middleware, wherein the Pulsar message middleware comprises Pulsar Functions;
acquiring the webpage data through Pulsar Functions, parsing the webpage data according to preset settings to obtain target data, and publishing the target data to the Pulsar message middleware through Pulsar Functions;
acquiring the target data through Pulsar Functions, checking and converting the target data according to pre-specified rules to obtain conversion data, and publishing the conversion data to the Pulsar message middleware through Pulsar Functions;
and acquiring the conversion data through Pulsar Functions, and storing the conversion data.
2. The method for processing webpage data according to claim 1, wherein the preset settings for parsing the webpage data comprise a task id, URL information of the website to be crawled, the name of the data to be crawled, and an xpath expression.
3. The method for web page data processing according to claim 1, wherein the pre-specified rule for converting the target data includes a cleaning condition expression and a cleaning action function.
4. The method for processing webpage data according to claim 1, wherein the conversion data obtained through the Pulsar message middleware is saved to a designated CSV file.
5. A network data processing system, characterized by comprising a crawler module, an extraction module, a conversion module and a storage module, wherein the modules are connected through Pulsar message middleware, and the Pulsar message middleware comprises Pulsar Functions;
the crawler module is used for crawling data of a specified website and publishing the webpage data to the Pulsar message middleware;
the extraction module subscribes, through Pulsar Functions, to the webpage data published by the crawler module, parses the webpage data according to the settings to obtain target data, and publishes the target data to the Pulsar message middleware through Pulsar Functions;
the conversion module subscribes, through Pulsar Functions, to the target data published by the extraction module, receives the target data through Pulsar Functions, checks and converts the target data according to specified rules, and publishes the conversion data to the Pulsar message middleware through Pulsar Functions;
the storage module is used for subscribing to the conversion data published by the conversion module, receiving the conversion data through Pulsar Functions, and storing the conversion data.
6. The network data processing system of claim 5, wherein the crawler module, the extraction module, the conversion module, and the storage module are registered with the Pulsar message middleware via a plug-in mechanism.
7. An apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the web page data processing method of any one of claims 1 to 4 when executing the computer program.
8. A storage medium storing a computer program, wherein the computer program is executed by a processor to implement the web page data processing method according to any one of claims 1 to 4.
CN202111147850.0A 2021-09-29 2021-09-29 Webpage data processing method, system, equipment and storage medium Withdrawn CN113946454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111147850.0A CN113946454A (en) 2021-09-29 2021-09-29 Webpage data processing method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113946454A true CN113946454A (en) 2022-01-18

Family

ID=79329534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111147850.0A Withdrawn CN113946454A (en) 2021-09-29 2021-09-29 Webpage data processing method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113946454A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115018642A (en) * 2022-06-08 2022-09-06 国泰君安证券股份有限公司 System for realizing high-availability receiving and processing aiming at multi-source real-time market data
CN117111904A (en) * 2023-04-26 2023-11-24 领悦数字信息技术有限公司 Method and system for automatically converting web applications into serverless functions
CN117111904B (en) * 2023-04-26 2024-05-28 领悦数字信息技术有限公司 Method and system for automatically converting Web applications into serverless functions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220118