CN111125589B - Data acquisition method and device and computer readable storage medium - Google Patents

Info

Publication number
CN111125589B
Authority
CN
China
Prior art keywords
scheduling
template
module
analysis
data acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811283037.4A
Other languages
Chinese (zh)
Other versions
CN111125589A (en)
Inventor
李宇涵
曹六一
张丹
于晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Beijing Founder Electronics Co Ltd
Original Assignee
New Founder Holdings Development Co ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Founder Holdings Development Co ltd, Beijing Founder Electronics Co Ltd
Priority to CN201811283037.4A
Publication of CN111125589A
Application granted
Publication of CN111125589B
Legal status: Active

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The application provides a data acquisition method and device and a computer readable storage medium. In the method, a scheduling module obtains scheduling information from a scheduling template, where the scheduling template is written and stored in a dynamic template language; the scheduling module then generates a network request according to the scheduling information; a downloading module downloads web page source code according to the network request; and an analysis module processes the web page source code using an analysis template to obtain target data, where the analysis template corresponds to the scheduling template. The method improves the flexibility and the efficiency of the data acquisition process.

Description

Data acquisition method and device and computer readable storage medium
Technical Field
The present application relates to data processing technology, and in particular, to a data acquisition method and apparatus, and a computer readable storage medium.
Background
In recent years, with the rapid development of the internet, more and more people acquire and release information through the network, and the data volume and the data value in the internet are also increasing. Because the Internet occupies an important position in personal information acquisition channels nowadays, analysis of Internet big data has important application value for various industries. And the necessary premise for analyzing the Internet big data is to collect the related data.
Current acquisition systems generally take one of two forms: custom collection for a specific website, or template collection tailored to a certain class of websites. However, custom collection for a specific website requires a long development period and offers low scalability and poor flexibility; template collection tailored to one class of websites cannot be applied to websites of other types, so its extensibility is limited and its flexibility is poor.
Existing data acquisition approaches are therefore limited in applicability to varying degrees, and how to realize data acquisition across different application scenarios has become a technical problem to be solved in the field.
Disclosure of Invention
The application provides a data acquisition method and device and a computer readable storage medium, which aim to improve the flexibility of a data acquisition process and the data acquisition efficiency.
In a first aspect, the present application provides a data acquisition method, including:
the scheduling module acquires scheduling information in a scheduling template, and the scheduling template is written and stored in a dynamic template language;
the scheduling module generates a network request according to the scheduling information;
the downloading module downloads the webpage source codes according to the network request;
and the analysis module processes the webpage source code by using an analysis template to obtain target data, and the analysis template corresponds to the scheduling template.
In another possible design, the scheduling module obtains scheduling information in a scheduling template, including:
and the scheduling module calls a first analyzer and acquires the scheduling information obtained by the first analyzer analyzing the scheduling template.
In another possible design, the scheduling information further includes at least one of the following: portal information, frequency control, request mode, preprocessing method, subordinate data fusion mode, and additional data processing method.
In another possible design, the parsing module processes the web page source code using a parsing template to obtain target data, including:
and the analysis module calls a second analyzer and acquires the target data obtained by the second analyzer through processing the webpage source codes according to the analysis template.
In another possible design, the method further includes:
the analysis module sends the target data to the scheduling module;
and the scheduling module outputs or stores the target data according to the scheduling template.
In a second aspect, the present application provides a data acquisition device comprising: a scheduling module, a downloading module and an analyzing module; wherein:
the scheduling module is used for acquiring scheduling information in a scheduling template, and the scheduling template is written and stored in a dynamic template language;
the scheduling module is further used for generating a network request according to the scheduling information;
the downloading module is used for downloading the webpage source codes according to the network request;
the analysis module is used for processing the webpage source codes by utilizing an analysis template to obtain target data, and the analysis template corresponds to the scheduling template.
In another possible design, the scheduling module is specifically configured to:
and calling a first analyzer, and acquiring the scheduling information obtained by analyzing the scheduling template by the first analyzer.
In another possible design, the scheduling information further includes at least one of the following: portal information, frequency control, request mode, preprocessing method, subordinate data fusion mode, and additional data processing method.
In another possible design, the parsing module is specifically configured to:
calling a second analyzer, and acquiring the target data obtained by the second analyzer processing the webpage source codes according to the analysis template.
In another possible design, the parsing module is further configured to send the target data to the scheduling module;
the scheduling module is further used for outputting or storing the target data according to the scheduling template.
In a third aspect, the present application provides a data acquisition device comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of the first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored thereon,
the computer program being executable by a processor to implement the method according to any of the first aspects.
In the data acquisition scheme provided by the application, the scheduling module can realize scheduling control of the whole data acquisition flow according to the scheduling information, the analysis module can analyze the webpage source codes downloaded by the downloading module by adopting the analysis template matched with the scheduling information, and in the data acquisition process, the scheduling template is written and stored in a dynamic template language, so that the flexibility of the template can be effectively improved, the coverage rate and the accuracy of the scheduling template are improved, the labor cost required in Internet mass data acquisition is greatly reduced, and the data acquisition efficiency is also improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic flow chart of a data acquisition method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a data acquisition device according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of another data acquisition device according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the physical structure of a data acquisition device according to an embodiment of the present application.
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The specific application scene of the application is an internet data acquisition process. The existing data acquisition method can only be applied to data acquisition of specific websites or websites of one type, is difficult to apply and expand aiming at non-specific websites or websites of different types, and has low flexibility.
The application provides a data acquisition method, a data acquisition device and a computer readable storage medium, which aim to solve the above technical problems in the prior art. The underlying idea is as follows: the template structure is organized in a dynamic template language such as extensible markup language (Extensible Markup Language, XML), and corresponding parsing plug-ins are configured to interpret the templates, so that a template-based collection mode with high efficiency and high universality is realized.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example 1
The embodiment of the application provides a data acquisition method. Specifically, referring to fig. 1 and fig. 2, fig. 1 illustrates the data acquisition method, and fig. 2 illustrates a device that executes the method (hereinafter referred to as the data acquisition device). As shown in fig. 2, the data acquisition device includes a scheduling module 21, a downloading module 22 and an analysis module 23; when data acquisition is performed, these modules may operate according to the method shown in fig. 1.
Specifically, as shown in fig. 1, the method comprises the following steps:
S102, the scheduling module 21 acquires scheduling information in a scheduling template written and stored in a dynamic template language.
S104, the scheduling module 21 generates a network request according to the scheduling information.
S106, the downloading module 22 downloads the webpage source codes according to the network request.
S108, the analysis module 23 processes the webpage source codes by using the analysis template to obtain target data, wherein the analysis template corresponds to the scheduling template.
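Steps S102 to S108 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the dictionary-based templates, the field names (`entry`, `method`, `parser`), the regex-only parsing rules and the `fake_fetch` stand-in for a real HTTP download are all assumptions made so that the example runs offline.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class NetworkRequest:
    url: str
    method: str = "GET"


def get_scheduling_info(scheduling_template: Dict[str, str]) -> Dict[str, str]:
    # S102: the "template" here is already a dict (assumption); a real
    # template would be parsed from a dynamic template language such as XML.
    return dict(scheduling_template)


def generate_request(info: Dict[str, str]) -> NetworkRequest:
    # S104: assemble the network request from the scheduling information.
    return NetworkRequest(url=info["entry"], method=info.get("method", "GET"))


def download(request: NetworkRequest, fetch: Callable[[str], str]) -> str:
    # S106: `fetch` is injected so the sketch does not touch the network.
    return fetch(request.url)


def parse(source: str, parsing_template: Dict[str, str]) -> Dict[str, str]:
    # S108: each field maps to a regex with exactly one capture group.
    result = {}
    for field_name, pattern in parsing_template.items():
        match = re.search(pattern, source)
        result[field_name] = match.group(1) if match else ""
    return result


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for an HTTP client.
    return "<html><head><title>Hello</title></head><body><p>World</p></body></html>"


# Hypothetical templates; "parser" names the corresponding parsing template.
scheduling_template = {"entry": "http://example.invalid/news", "method": "GET", "parser": "news"}
parsing_templates = {"news": {"title": r"<title>(.*?)</title>", "body": r"<p>(.*?)</p>"}}

info = get_scheduling_info(scheduling_template)
request = generate_request(info)
source = download(request, fake_fetch)
target_data = parse(source, parsing_templates[info["parser"]])
print(target_data)
```

Injecting the fetch function keeps the downloading step separate from the scheduling and parsing steps, which mirrors the module split described above.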
First, a scheduling template and an analysis template shown in fig. 1 will be described.
In the embodiment of the application, the scheduling template is written and stored in a dynamic template language, and the analysis template can be written and stored in a dynamic template language, or can be written and stored in other modes, and the embodiment of the application is not particularly limited.
Taking a scheduling template as an example, before executing the method, the method may further include the following steps:
the scheduling templates are written and stored in a dynamic template language.
Specifically, this can be done by extracting data from websites of the same type, then generating and storing the template in a dynamic template language. In this process, the acquisition content can be extended without programming, which improves both acquisition development efficiency and the accuracy of the acquired data. An analysis template is written and stored in the dynamic template language in the same way, which will not be described again.
The dynamic template language according to the embodiment of the present application may include, but is not limited to: extensible markup language (Extensible Markup Language, XML). The language can be used to define the operations of several links in the data acquisition process, such as task scheduling, source code analysis, data transmission and data formatting.
In the embodiment of the application, the scheduling template carries scheduling information, which describes how a specific channel is to be scheduled. The scheduling information may include, but is not limited to, at least one of the following: entry information, frequency control, request mode, preprocessing method, subordinate data fusion mode, additional data processing method, and identification information of the corresponding analysis template. The identification information of a template uniquely identifies the template and may take forms including, but not limited to, a name.
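To make the idea of a scheduling template concrete, the following sketch shows one way such a template might look in XML and how a parser could turn it into scheduling information. The element and attribute names (`schedule`, `entry`, `frequency`, `request`, `parser`) are hypothetical; the source does not define a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for a scheduling template (not specified by the source).
SCHEDULING_TEMPLATE = """
<schedule name="news-channel">
  <entry>http://example.invalid/news/index.html</entry>
  <frequency seconds="600"/>
  <request method="GET"/>
  <parser ref="news-list"/>
</schedule>
"""


def parse_scheduling_template(xml_text: str) -> dict:
    """Read the scheduling information carried by the template."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),                      # identification information
        "entry": root.findtext("entry"),               # entry information
        "frequency_seconds": int(root.find("frequency").get("seconds")),
        "request_method": root.find("request").get("method"),
        "parsing_template": root.find("parser").get("ref"),
    }


scheduling_info = parse_scheduling_template(SCHEDULING_TEMPLATE)
print(scheduling_info)
```

Because the template is plain XML, adding a new channel under these assumptions means writing a new document rather than new program code.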
The parsing template is used for carrying parsing information of a specific format webpage, namely, describing a process of extracting effective data on the specific format webpage. The embodiment of the application is not limited to the analysis method indicated by the analysis template, and can be preset according to the requirement.
In a preferred implementation scenario, if the parsing template is written and stored in a dynamic template language, the parsing template may extend parsing functions to multiple dimensions in multiple ways, e.g., template parsing may support multiple different parsing modes such as xpath, css, regex, and may extend support for other parsing forms such as jsonpath, etc.
In another preferred implementation scenario, the parsing template may be capable of extracting data on the web page in a plurality of ways as described above, and formatting the extracted data. The format processing method may include, but is not limited to: at least one of splicing, cutting, combining and formatting.
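The combination of several extraction modes with post-extraction formatting can be sketched as follows. This is a minimal illustration using only the Python standard library, so it covers the regex mode and the limited XPath subset supported by `xml.etree.ElementTree`; CSS and jsonpath modes are omitted. The rule layout and the operation names (`strip`, `slice`, `join`) are assumptions, and the sample page is well-formed markup so that the XML parser accepts it.

```python
import re
import xml.etree.ElementTree as ET

PAGE = (
    "<html><body><h1> Breaking News </h1>"
    "<span class='date'>2018-10-30</span>"
    "<p>First.</p><p>Second.</p></body></html>"
)


def extract(source: str, mode: str, expr: str) -> list:
    """Extract raw values using the mode named in the parsing rule."""
    if mode == "regex":
        return re.findall(expr, source)
    if mode == "xpath":
        root = ET.fromstring(source)
        return [(node.text or "") for node in root.findall(expr)]
    raise ValueError(f"unsupported mode: {mode}")


def apply_format(values: list, op: str, arg=None) -> list:
    """Apply one formatting operation to the extracted values."""
    if op == "strip":          # cut surrounding whitespace
        return [v.strip() for v in values]
    if op == "slice":          # cut a substring
        start, end = arg
        return [v[start:end] for v in values]
    if op == "join":           # splice/combine several values into one
        return [arg.join(values)]
    raise ValueError(f"unsupported operation: {op}")


# Hypothetical parsing rules: one entry per target field.
RULES = {
    "title": {"mode": "xpath", "expr": ".//h1", "format": [("strip", None)]},
    "year": {"mode": "regex", "expr": r"class='date'>(\d{4}-\d{2}-\d{2})<",
             "format": [("slice", (0, 4))]},
    "body": {"mode": "xpath", "expr": ".//p", "format": [("join", " ")]},
}


def run_template(source: str, rules: dict) -> dict:
    result = {}
    for field_name, rule in rules.items():
        values = extract(source, rule["mode"], rule["expr"])
        for op, arg in rule.get("format", []):
            values = apply_format(values, op, arg)
        result[field_name] = values[0] if values else ""
    return result


target_data = run_template(PAGE, RULES)
print(target_data)
```

Supporting another mode in this sketch would only require one more branch in `extract`, which is one way to read the extensibility described above.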
In the embodiment of the present application, the parsing template used when executing the parsing flow corresponds to the scheduling template. In specific implementation, the correspondence between the analysis template and the scheduling template may be preset.
In one approach, the correspondence can be established through storage locations: any two scheduling templates have different storage locations, any two analysis templates also have different storage locations, and a scheduling template and an analysis template that correspond to each other may be stored at the same or different locations. Preferably, a scheduling template and its corresponding analysis template are stored in the same storage location.
Alternatively, a correspondence between the identification information of the two templates may be preset. For example, the identification information of the analysis template may be written into the scheduling information of the scheduling template; then, when the method steps shown in fig. 1 are carried out, the analysis template corresponding to the scheduling template can be determined by directly reading the identification information of the analysis template carried in the scheduling information.
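A correspondence based on identification information amounts to a lookup by name. The sketch below assumes that each scheduling template carries the name of its analysis template; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParsingTemplate:
    name: str                                   # unique identification information
    rules: dict = field(default_factory=dict)


@dataclass
class SchedulingTemplate:
    name: str
    entry: str
    parsing_template_name: str                  # identification of the matching template


class TemplateRegistry:
    """Holds parsing templates keyed by their unique names."""

    def __init__(self) -> None:
        self._parsing = {}

    def register(self, template: ParsingTemplate) -> None:
        if template.name in self._parsing:
            raise ValueError(f"duplicate template name: {template.name}")
        self._parsing[template.name] = template

    def resolve(self, scheduling: SchedulingTemplate) -> ParsingTemplate:
        # Read the identification carried in the scheduling information.
        return self._parsing[scheduling.parsing_template_name]


registry = TemplateRegistry()
news_parser = ParsingTemplate(name="news-list", rules={"title": ".//h1"})
registry.register(news_parser)

sched = SchedulingTemplate(
    name="news-channel",
    entry="http://example.invalid/news",
    parsing_template_name="news-list",
)
print(registry.resolve(sched).name)
```

Rejecting duplicate names at registration time is what keeps the identification information unique in this sketch.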
Alternatively, in a preferred implementation scenario, a data acquisition template may be preset, and the scheduling template and the parsing template are two parts of the data acquisition template. At this time, the data acquisition template may further include an analysis result template and a site information template.
The analysis result template describes all uses of the analysis results that the analysis template returns to the scheduling template, including format specifications for the final stored data and for intermediate information. It can define the result data type, the form in which results are stored, and other such information.
The site information template contains relevant information about the site to which the channel belongs and is used to identify the template level, so that the data acquisition device can quickly and conveniently look up and read the template.
Based on the foregoing architecture, in the embodiment of the present application, when the data acquisition task is specifically executed, the scheduling module 21 obtains the scheduling information carried in the scheduling template, and assembles a network request according to the instruction of the scheduling template, where the network request is used to request downloading of the web page source code. That is, the generated network request carries information about the source code of the web page that needs to be downloaded by the download module 22.
In the embodiment of the application, the problem of calling the template is considered, and the template analyzer can be arranged in the data acquisition device or a superior device where the data acquisition device is positioned.
In a possible design, referring to fig. 3, which shows the data flow of another data acquisition method: during data acquisition, the scheduling module 21 invokes the scheduling template and the parsing result template by calling a first parser 300 arranged in the superior device where the data acquisition device 200 is located, and the parsing module 23 invokes the parsing template by calling the second parser 400.
Further, the arrow in fig. 3 indicates the data transmission direction. Specifically, in the system shown in fig. 3, in the process of executing data collection, the scheduling module 21 sends scheduling information to the downloading module 22, after the downloading module 22 downloads the web page source code, the web page source code and the scheduling information are sent to the analysis module 23, and after analysis is completed, the analysis module sends the target data to the scheduling module 21.
In addition, the embodiment of the application does not specifically limit how the modules in the data acquisition device exchange data. If the scheduling module 21, the downloading module 22 and the parsing module 23 are different processing units in one processor, or are distributed over at least two processors in one data acquisition device, data can be transmitted through interfaces. Alternatively, the scheduling module 21, the downloading module 22 and the parsing module 23 may be integrated in the same processor or processing unit.
It should be noted that fig. 3 is a possible implementation manner, and is not meant to limit the present application.
In the embodiment of the application, the first parser can be used as a scheduling template parser for parsing the scheduling template to obtain scheduling information from the scheduling template.
In this implementation scenario, the step S102 may be implemented by the following means when it is specifically implemented:
and the scheduling module calls a first analyzer and acquires the scheduling information obtained by the first analyzer analyzing the scheduling template.
The second parser serves as a parse template parser, and is specifically configured to parse a parse template. In specific implementation, the second parser parses the web page source code according to the parsing template to obtain a parsing result, that is, target data. In a preferred design, the second parser is specifically configured to parse the parsing template, and may further be configured to perform post-formatting processing on the parsed candidate data, to finally obtain formatted target data.
For ease of understanding, the embodiment of the present application provides a possible implementation of the second parser: the second parser builds the web page source code into a Document Object Model (DOM) tree according to the parsing template, then traverses the DOM tree depth-first, parses each node in the DOM tree to obtain candidate data, and converts the candidate data into a standard format according to the parsing result template to obtain the target data.
In the embodiment of the application, the second parser can be set by adopting the idea of a state machine. That is, each instruction received by the parser changes the state of the parser, so that after traversing all nodes of the DOM tree, the data contained in the final state will be returned as a final result.
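The state-machine idea can be illustrated with a small depth-first traversal in which every node visit updates the parser's state and the final state holds the result. This is a sketch under simplifying assumptions: `xml.etree.ElementTree` stands in for a real HTML DOM builder (so the input must be well-formed), and the "instructions" are reduced to entering and leaving a node.

```python
import xml.etree.ElementTree as ET


class StateMachineParser:
    """Each node visit is treated as an instruction that changes the state."""

    def __init__(self, wanted_tags):
        self.wanted = set(wanted_tags)
        self.state = {"depth": 0, "visited": 0, "data": {}}

    def enter(self, node) -> None:
        self.state["depth"] += 1
        self.state["visited"] += 1
        if node.tag in self.wanted:
            text = (node.text or "").strip()
            self.state["data"].setdefault(node.tag, []).append(text)

    def leave(self, node) -> None:
        self.state["depth"] -= 1

    def run(self, root) -> dict:
        # Iterative depth-first (pre-order) traversal with an explicit stack.
        stack = [(root, False)]
        while stack:
            node, finished = stack.pop()
            if finished:
                self.leave(node)
                continue
            self.enter(node)
            stack.append((node, True))
            # Push children in reverse so the first child is visited first.
            for child in reversed(list(node)):
                stack.append((child, False))
        # The data held in the final state is returned as the result.
        return self.state["data"]


SOURCE = "<html><body><h1>Title</h1><div><p>A</p><p>B</p></div></body></html>"
machine = StateMachineParser(wanted_tags=["h1", "p"])
result = machine.run(ET.fromstring(SOURCE))
print(result)
```

Keeping all intermediate information in one state object means the traversal code itself stays independent of what a given template asks to extract.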
In this implementation scenario, the step S108 may be implemented by the following means:
and the analysis module calls a second analyzer and acquires the target data obtained by the second analyzer through processing the webpage source codes according to the analysis template.
To facilitate data processing, in the embodiment of the present application, the target data obtained by the data acquisition method may be output in a preset standard format. As previously described, this may be defined in the parsing template so that the second parser performs the format conversion process described above when parsing the data according to the parsing template.
In addition, in the embodiment of the present application, after obtaining the target data, the parsing module may further execute the following flow:
the analysis module sends the target data to the scheduling module;
and the scheduling module outputs or stores the target data according to the scheduling template.
In the embodiment of the application, the scheduling module realizes the global scheduling of the data acquisition process, so that the scheduling module can output or store the target data according to the indication of the scheduling template after receiving the target data sent by the analysis module. In addition, the scheduling module can also execute the next round of data acquisition after receiving the target data.
In a preferred implementation scenario, the target data parsed by the parsing module and sent to the scheduling module is data in a standard format, and then the scheduling module may further verify the format of the target data after receiving the target data.
At this time, as shown in fig. 3, the first parser may further be used as a parse result template parser, where the first parser may specifically use the parse result template to check the target data sent by the parse module, so as to finally realize control of the scheduling flow.
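Checking target data against a parsing result template could look like the sketch below. The result template is modeled here as a mapping from required field names to expected Python types, which is an assumption; the source only says that the template defines the result data type and format.

```python
# Hypothetical parsing result template: required fields and their types.
RESULT_TEMPLATE = {"title": str, "url": str, "published": str}


def check_target_data(data: dict, result_template: dict) -> list:
    """Return a list of problems; an empty list means the data passes."""
    errors = []
    for field_name, expected_type in result_template.items():
        if field_name not in data:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(data[field_name], expected_type):
            errors.append(
                f"{field_name}: expected {expected_type.__name__}, "
                f"got {type(data[field_name]).__name__}"
            )
    return errors


good = {"title": "Hello", "url": "http://example.invalid/a", "published": "2018-10-30"}
bad = {"title": "Hello", "url": 42}

print(check_target_data(good, RESULT_TEMPLATE))
print(check_target_data(bad, RESULT_TEMPLATE))
```

A scheduling module could use the returned list to decide whether to store the data or discard it, which is one way the check might feed back into control of the scheduling flow.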
The technical scheme provided by the embodiment of the application at least has the following technical effects:
in the data acquisition scheme provided by the embodiment of the application, the scheduling module can realize the scheduling control of the whole data acquisition flow according to the scheduling information, the analysis module can analyze the webpage source code downloaded by the downloading module by adopting the analysis template matched with the scheduling information, and in the data acquisition process, the scheduling template is written and stored in a dynamic template language, so that the flexibility of the template can be effectively improved, the coverage rate and the accuracy of the scheduling template are improved, the labor cost required in the acquisition of Internet mass data is greatly reduced, and the data acquisition efficiency is also improved.
Example two
Based on the data acquisition method provided in the first embodiment, the embodiment of the present application further provides an apparatus embodiment for implementing each step and method in the foregoing method embodiment.
An embodiment of the present application provides a data acquisition device. Referring to fig. 2 or fig. 3, the data acquisition device 200 includes: a scheduling module 21, a downloading module 22 and an analyzing module 23; wherein:
the scheduling module 21 is configured to obtain scheduling information in a scheduling template, where the scheduling template is written and stored in a dynamic template language;
the scheduling module 21 is further configured to generate a network request according to the scheduling information;
the downloading module 22 is configured to download the web page source code according to the network request;
the parsing module 23 is further configured to process the web page source code by using a parsing template, so as to obtain target data, where the parsing template corresponds to the scheduling template.
In one possible implementation scenario, the scheduling module 21 is specifically configured to:
and calling a first analyzer, and acquiring the scheduling information obtained by analyzing the scheduling template by the first analyzer.
In the embodiment of the present application, the scheduling information further includes at least one of the following information: portal information, frequency control, request mode, preprocessing method, subordinate data fusion mode, and additional data processing method.
In another possible implementation scenario, the parsing module 23 is specifically configured to:
call a second parser and obtain the target data obtained by the second parser processing the web page source code according to the parsing template.
In addition, in the embodiment of the present application, the parsing module 23 is further configured to send the target data to the scheduling module 21;
the scheduling module 21 is further configured to output or store the target data according to the scheduling template.
Moreover, referring to fig. 4, an embodiment of the present application provides a data acquisition device, the data acquisition device 400 includes:
a memory 410;
a processor 420; and
a computer program;
wherein the computer program is stored in the memory 410 and configured to be executed by the processor 420 to implement the method as described in the above embodiments.
In addition, as shown in fig. 4, a transceiver 430 is further provided in the data acquisition device 400, for performing data transmission or communication with other devices, which will not be described herein.
As shown in fig. 4, the memory 410, the processor 420 and the transceiver 430 are connected by a bus.
Furthermore, an embodiment of the present application provides a readable storage medium having stored thereon a computer program,
the computer program is executed by a processor to implement the method as described in embodiment one.
Since each module in this embodiment is capable of executing the method shown in embodiment one, a part of this embodiment which is not described in detail can be referred to the description related to embodiment one.
The technical scheme provided by the embodiment of the application at least has the following technical effects:
in the data acquisition scheme provided by the embodiment of the application, the scheduling module can realize the scheduling control of the whole data acquisition flow according to the scheduling information, the analysis module can analyze the webpage source code downloaded by the downloading module by adopting the analysis template matched with the scheduling information, and in the data acquisition process, the scheduling template is written and stored in a dynamic template language, so that the flexibility of the template can be effectively improved, the coverage rate and the accuracy of the scheduling template are improved, the labor cost required in the acquisition of Internet mass data is greatly reduced, and the data acquisition efficiency is also improved.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (6)

1. A method of data acquisition, comprising:
the scheduling module calls a first analyzer and acquires scheduling information obtained by analyzing a scheduling template by the first analyzer, wherein the scheduling template is written and stored in a dynamic template language;
the scheduling module generates a network request according to the scheduling information;
the downloading module downloads the webpage source codes according to the network request;
the analysis module calls a second analyzer and acquires target data obtained by processing the webpage source code by the second analyzer according to an analysis template, wherein the analysis template corresponds to the scheduling template, and the analysis template is written and stored in a dynamic template language;
the analysis module sends the target data to the scheduling module;
and the scheduling module outputs or stores the target data according to the scheduling template.
2. The method of claim 1, wherein the scheduling information further comprises at least one of: portal information, frequency control, request mode, preprocessing method, subordinate data fusion mode, and additional data processing method.
3. A data acquisition device, comprising: a scheduling module, a downloading module and an analysis module; wherein,
the scheduling module is used for calling a first analyzer and acquiring scheduling information obtained by analyzing a scheduling template by the first analyzer, wherein the scheduling template is written and stored in a dynamic template language;
the scheduling module is further used for generating a network request according to the scheduling information;
the downloading module is used for downloading the webpage source codes according to the network request;
the analysis module is used for calling a second analyzer and acquiring target data obtained by processing the webpage source codes by the second analyzer according to an analysis template, wherein the analysis template corresponds to the scheduling template, and the analysis template is written and stored in a dynamic template language;
the analysis module is further used for sending the target data to the scheduling module;
the scheduling module is further used for outputting or storing the target data according to the scheduling template.
4. The apparatus of claim 3, wherein the scheduling information further comprises at least one of: portal information, frequency control, request mode, preprocessing method, subordinate data fusion mode, and additional data processing method.
5. A data acquisition device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of claim 1 or 2.
6. A computer-readable storage medium, having a computer program stored thereon,
the computer program being executed by a processor to implement the method of claim 1 or 2.
CN201811283037.4A 2018-10-31 2018-10-31 Data acquisition method and device and computer readable storage medium Active CN111125589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811283037.4A CN111125589B (en) 2018-10-31 2018-10-31 Data acquisition method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111125589A (en) 2020-05-08
CN111125589B (en) 2023-09-05

Family

ID=70484996

Country Status (1)

Country Link
CN (1) CN111125589B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957816A (en) * 2009-07-13 2011-01-26 上海谐宇网络科技有限公司 Webpage metadata automatic extraction method and system based on multi-page comparison
CN102184184A (en) * 2011-04-07 2011-09-14 安徽博约信息科技有限责任公司 Method for acquiring webpage dynamic information
CN102651002A (en) * 2011-02-28 2012-08-29 腾讯科技(深圳)有限公司 Webpage information extracting method and system
CN103853770A (en) * 2012-12-03 2014-06-11 北大方正集团有限公司 Method and system for abstracting information of posts from forum website
CN104050281A (en) * 2014-06-26 2014-09-17 北京思特奇信息技术股份有限公司 Webpage information extraction method and device based on http protocol
US9049117B1 (en) * 2009-10-21 2015-06-02 Narus, Inc. System and method for collecting and processing information of an internet user via IP-web correlation
CN105045838A (en) * 2015-07-01 2015-11-11 华东师范大学 Network crawler system based on distributed storage system
CN107092632A (en) * 2017-02-09 2017-08-25 北京小度信息科技有限公司 Data processing method and device
CN107317724A (en) * 2017-06-06 2017-11-03 中证信用增进股份有限公司 Data collecting system and method based on cloud computing technology
CN107404493A (en) * 2017-08-21 2017-11-28 广州快充网络有限公司 New-energy automobile vehicle data packet parsing component and analytic method
CN107463634A (en) * 2017-07-17 2017-12-12 广州特道信息科技有限公司 web page text extracting method and device
CN107729564A (en) * 2017-11-13 2018-02-23 北京众荟信息技术股份有限公司 A kind of distributed focused web crawler web page crawl method and system
CN107798035A (en) * 2017-04-10 2018-03-13 平安科技(深圳)有限公司 A kind of data processing method and terminal
CN108304498A (en) * 2018-01-12 2018-07-20 深圳壹账通智能科技有限公司 Webpage data acquiring method, device, computer equipment and storage medium
CN108334634A (en) * 2018-02-27 2018-07-27 北京中关村科金技术有限公司 A kind of method, apparatus, equipment and the storage medium of extraction data information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015078231A1 (en) * 2013-11-26 2015-06-04 优视科技有限公司 Method for generating webpage template and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
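Research and Implementation of an Intelligent Crawler System in a Vertical Search Engine (垂直搜索引擎中智能爬虫系统的研究与实现); Wang Song (王松); Master's Theses Electronic Journal (《硕士电子期刊》); full text *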

Also Published As

Publication number Publication date
CN111125589A (en) 2020-05-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230627

Address after: 3007, Hengqin International Financial Center Building, No. 58 Huajin Street, Hengqin New District, Zhuhai City, Guangdong Province, 519030

Applicant after: New Founder Holdings Development Co., Ltd.

Applicant after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng building, 9 floor

Applicant before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Applicant before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

GR01 Patent grant