CN114692050A - Page parsing method and device, computer readable medium and electronic device - Google Patents

Info

Publication number
CN114692050A
Authority
CN
China
Prior art keywords
analysis
template
result
page
page data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210331904.7A
Other languages
Chinese (zh)
Inventor
赵智博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jindi Technology Co Ltd
Original Assignee
Beijing Jindi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jindi Technology Co Ltd
Priority to CN202210331904.7A
Publication of CN114692050A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure relates to a page parsing method, a page parsing device, a computer readable medium and an electronic device. The method comprises: acquiring collected page data, and calling a parsing template in a database to parse the page data to obtain a parsing result. The parsing template is determined as follows: a parsing template corresponding to a uniform resource locator is configured through an interface provided by a front end, each parsing template corresponding to a parsing rule and including the format in which the parsing result is returned; the parsing template is verified through the interface provided by the front end; and the parsing template that passes verification is converted into a specified format and stored in the database, the specified format including a character string format. A corresponding parsing template can thus be provided for each website, the pages of different websites can be parsed according to the parsing rules in their templates, and a large amount of complex page data from different websites can be managed in a centralized and unified manner.

Description

Page parsing method and device, computer readable medium and electronic device
Technical Field
The present disclosure relates to the field of data parsing, and in particular, to a page parsing method, an apparatus, a computer readable medium, and a computer device.
Background
The internet industry is developing rapidly. In the current big data era, the volume of information grows exponentially every year, and data has become indispensable. Major websites vary widely in structure, which gives rise to a large number of different and complicated data parsing and extraction rules. How to manage these data conveniently and uniformly is one of the problems to be solved at present.
Disclosure of Invention
The disclosure aims to provide a page parsing method, a page parsing device, a computer readable medium and computer equipment, which are used for solving the problem of page parsing.
In a first aspect, the present disclosure provides a page parsing method, including: acquiring collected page data;
calling an analysis template in a database to analyze the page data to obtain an analysis result; the determination method of the analysis template comprises the following steps: configuring an analysis template corresponding to the uniform resource locator through an interface provided by the front end; each analysis template corresponds to an analysis rule, and the analysis template comprises a format returned by an analysis result; verifying the analysis template through an interface provided by the front end; and converting the analysis template passing the verification into a specified format and storing the specified format into a database, wherein the specified format comprises a character string format.
Optionally, the step of configuring, through the interface provided by the front end, the parsing template corresponding to the uniform resource locator includes: configuring a corresponding analysis template according to the uniform resource locator of the page data; or configuring a corresponding analysis template according to the field of the page data.
Optionally, the step of verifying the parsing template through an interface provided by a front end includes: downloading a page corresponding to the uniform resource locator according to the uniform resource locator and the corresponding resolution template; analyzing the corresponding page to obtain a first analysis result; displaying the first analysis result on an interface provided by the front end; obtaining a verification result of the first analysis result based on the received user judgment result; the verification result comprises passing verification or failing verification.
Optionally, the step of verifying the parsing template through an interface provided by a front end includes: importing a python packet through a local debugging interface; acquiring a page corresponding to the uniform resource locator based on a method provided in the python packet; importing the corresponding page into a local analysis template for analysis to obtain a second analysis result; displaying the second analysis result on an interface provided by the front end; obtaining a verification result of the second analysis result based on user judgment; the verification result comprises passing verification or failing verification.
Optionally, the step of calling an analysis template in the database to analyze the page data to obtain an analysis result includes: extracting uniform resource locators of the page data; acquiring an analysis template corresponding to the uniform resource locator of the page data from a database; registering the corresponding analysis template in a memory; and calling the corresponding analysis template in the memory to analyze the page data to obtain the analysis result.
Optionally, the step of calling an analysis template in the database to analyze the page data to obtain an analysis result includes: when a callback field exists in the analysis result, putting the analysis result into a downloader to continue downloading until all the page data are downloaded, and storing the analysis result; and when the callback field does not exist in the analysis result, storing the analysis result.
Optionally, the step of storing the parsing result comprises: counting the resolution result of each uniform resource locator within a preset time period; the analysis result comprises analysis success and analysis failure; and judging whether the corresponding analysis template is abnormal or not according to the times of successful analysis and the times of failed analysis.
In a second aspect, the present disclosure provides a page resolution apparatus, including: the acquisition module is used for acquiring the acquired page data; the analysis module is used for calling an analysis template in a database to analyze the page data to obtain an analysis result; the configuration module is used for configuring an analysis template corresponding to the uniform resource locator through an interface provided by the front end; each analysis template corresponds to an analysis rule, and the analysis template comprises a format returned by an analysis result; the checking module is used for checking the analysis template through an interface provided by the front end; and the storage module is used for converting the analysis template passing the verification into a specified format and storing the specified format into a database, wherein the specified format comprises a character string format.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, implements the steps of the aforementioned page resolution method.
In a fourth aspect, the present disclosure provides an electronic device comprising: a storage device having a computer program stored thereon; and the processing device is used for executing the computer program in the storage device so as to realize the steps of the page analysis method.
Through the above technical solution, collected page data is acquired, and a parsing template in a database is called to parse the page data to obtain a parsing result. The parsing template is determined as follows: a parsing template corresponding to a uniform resource locator is configured through an interface provided by a front end, each parsing template corresponding to a parsing rule and including the format in which the parsing result is returned; the parsing template is verified through the interface provided by the front end; and the parsing template that passes verification is converted into a specified format and stored in the database, the specified format including a character string format. A corresponding parsing template can thus be provided for each website, the pages of different websites can be parsed according to the parsing rules in their templates, and a large amount of complicated page data from different websites can be managed in a centralized and unified manner.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
Fig. 1 is a schematic structural diagram of a computer system provided in an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart of a page resolution method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating sub-steps of step S102 according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram of a page resolution device according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device shown in an exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram of another electronic device shown in an exemplary embodiment of the present disclosure.
Description of the reference numerals
120 - terminal; 140 - server; 20 - page parsing device; 201 - acquisition module; 203 - parsing module; 205 - configuration module; 207 - verification module; 209 - storage module; 400 - electronic device; 401 - processor; 402 - memory; 403 - multimedia component; 404 - input/output (I/O) interface; 405 - communication component; 500 - electronic device; 522 - processor; 532 - memory; 526 - power supply component; 550 - communication component; 558 - input/output (I/O) interface.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 shows a schematic structure diagram of a computer system provided by an exemplary embodiment of the present disclosure, which includes a terminal 120 and a server 140.
The terminal 120 and the server 140 are connected to each other through a wired or wireless network.
The terminal 120 may include at least one of a smartphone, a laptop, a desktop, a tablet, a smart speaker, and a smart robot.
The terminal 120 includes a display; the display is used for displaying the analysis result of the page data.
The terminal 120 includes a first memory and a first processor. The first memory stores a first program; the first program is called and executed by the first processor to implement the page parsing method. The first memory may include, but is not limited to, the following: Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), and Electrically Erasable Programmable Read-Only Memory (EEPROM).
The first processor may be composed of one or more integrated circuit chips. Optionally, the first processor may be a general purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP). Optionally, the first processor may implement the page parsing method provided by the present disclosure by calling the first program.
The server 140 includes a second memory and a second processor. The second memory stores a second program, and the second program is called by the second processor to implement the page parsing method provided by the present disclosure. Optionally, the second memory may include, but is not limited to, the following: RAM, ROM, PROM, EPROM, EEPROM. Optionally, the second processor may be a general purpose processor, such as a CPU or NP.
The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited thereto.
Illustratively, a page parsing method provided in an exemplary embodiment of the present disclosure includes: providing a front-end interface at a terminal; acquiring collected page data, for example from a Kafka queue, where the page data is web page data such as news pages and bidding pages; and calling a parsing template in a database to parse the page data to obtain a parsing result. The parsing template is determined as follows: a parsing template corresponding to a uniform resource locator is configured through the interface provided by the front end, each parsing template corresponding to a parsing rule and including the format in which the parsing result is returned; the parsing template is verified through the interface provided by the front end; and the parsing template that passes verification is converted into a specified format and stored in the database, the specified format including a character string format.
Exemplarily, the terminal provides a front-end interface through which a user can configure a parsing template corresponding to a Uniform Resource Locator (URL), so that each URL corresponds to one parsing template. The parsing template is then verified: for example, the corresponding page is downloaded according to the URL, the page is parsed with the parsing template corresponding to that URL to obtain a parsing result, and the correctness of the parsing result is checked manually. If the parsing result is correct, the corresponding parsing template is correct and is determined to pass verification; if the parsing result is wrong, the corresponding parsing template is wrong and is determined not to pass verification. The verified parsing template is then stored in a database, which can be a database provided by the terminal or a cloud database provided by the server. The page parsing method provided by the disclosure can provide corresponding parsing templates for different websites, parse the pages of the different websites according to the parsing rules in the templates, and manage a large amount of complicated page data from the different websites in a centralized and unified manner.
Referring to fig. 2, fig. 2 is a flowchart of a page resolution method according to an exemplary embodiment of the disclosure. The method is performed by a computer device, for example, a terminal or a server in the computer system shown in fig. 1. The page resolution method shown in fig. 2 includes the following steps:
In step S101, collected page data is acquired.
Illustratively, the terminal shown in Fig. 1 may obtain the collected page data from a Kafka queue, where the page data is a large amount of varied web page data, such as news pages and bidding pages.
Kafka is a distributed, high-throughput, highly scalable message queue system used for communication among different services, processes and threads. In the present embodiment, the Kafka queue is used to collect the various page data.
In step S102, an analysis template in the database is called to analyze the page data, so as to obtain an analysis result.
The database may be a database provided by the terminal or a cloud database provided by the server, and is used for storing the parsing templates for the page data. It should be noted that different page data correspond to different URLs, and the parsing template corresponding to each piece of page data may be determined according to its URL. Illustratively, a user configures parsing templates corresponding to the URLs of page data through an interface provided by the front end, where each parsing template corresponds to a parsing rule used to parse the page data. For example, the page data is the source code of a web page, and the parsing rule of the parsing template parses that source code into data that the user can read directly, which serves as the final parsing result.
It should be noted that step S102 may further include sub-steps S1021, S1022 and S1023, and the determination manner of the analysis template will be described in detail in the sub-step of step S102. Referring to fig. 3, fig. 3 is a flowchart illustrating sub-steps of step S102 according to an exemplary embodiment of the present disclosure.
In sub-step S1021, a parsing template corresponding to the uniform resource locator is configured via an interface provided by the front end.
Illustratively, a user configures parsing templates corresponding to the URLs of page data through an interface provided by the front end, where each parsing template corresponds to a parsing rule used to parse the page data. For example, the page data is the source code of a web page, and the parsing rule of the parsing template parses that source code into JSON data that the user can read directly. The parsing template includes the format in which the parsing result is returned, such as JSON.
It should be noted that there are two ways to configure the parsing template corresponding to the URL of the page data. The first is to configure the parsing template according to the URL of the page data. Configuring by URL suits page data in multiple formats and cases where the extracted data needs further processing, for example when a news headline contains a character that the user does not need and that must be deleted.
The second is to configure the parsing template according to the fields of the page data. Taking news websites as an example, most of them need only a few fields, namely title, text and release time, so these fields can be fixed. Configuring the parsing template then only requires configuring a rule for each field; during parsing, the background loops over all the fixed fields and finally combines the per-field results into the final parsing result. Configuring by field suits pages with a uniform structure, such as news pages and bidding pages, where few fields are extracted and the extracted data needs no further processing. Because only the fixed fields need attention, this kind of configuration is convenient and takes little time.
If a parsing template configured according to the URL of the page data and a parsing template configured according to the fields of the page data both exist, the template configured according to the URL takes priority over the template configured according to the fields.
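The field-based configuration can be illustrated with a minimal sketch. The disclosure does not say how a field rule is expressed, so the regular expressions in `FIELD_RULES`, the sample markup and the function name `parse_by_fields` are all assumptions made for illustration.

```python
import json
import re

# Hypothetical rules: one regular expression per fixed field.
# The patent does not specify the rule language; regex is an assumption.
FIELD_RULES = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "text": r'<div class="content">(.*?)</div>',
    "release_time": r'<span class="time">(.*?)</span>',
}


def parse_by_fields(page_source, field_rules):
    """Loop over the fixed fields and merge the per-field results into JSON."""
    result = {}
    for field, pattern in field_rules.items():
        match = re.search(pattern, page_source, re.S)
        # A field whose rule does not match is recorded as None.
        result[field] = match.group(1).strip() if match else None
    return json.dumps(result, ensure_ascii=False)


SAMPLE_PAGE = """
<html><body>
<h1>Sample headline</h1>
<span class="time">2022-03-30</span>
<div class="content">Body of the article.</div>
</body></html>
"""

parsed = json.loads(parse_by_fields(SAMPLE_PAGE, FIELD_RULES))
```

Adding a new news site under this scheme would mean supplying three rules rather than writing a new parser, which is the convenience the field-based mode is described as offering.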
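The priority rule can be sketched as a two-stage lookup. The dictionaries below stand in for the database, and keying URL templates by host name and field templates by a page type are assumptions; the disclosure only states that the URL-configured template wins when both exist.

```python
from urllib.parse import urlparse

# Hypothetical in-memory stand-ins for templates stored in the database.
URL_TEMPLATES = {"news.example.com": "url_template_for_news_example"}
FIELD_TEMPLATES = {"news": "field_template_for_news_sites"}


def select_template(url, page_type):
    """Return the URL-configured template if one exists, else the field-configured one."""
    host = urlparse(url).netloc
    if host in URL_TEMPLATES:
        # The URL-configured template has the higher priority.
        return URL_TEMPLATES[host]
    # Fall back to the field-configured template; None if neither exists.
    return FIELD_TEMPLATES.get(page_type)
```

For example, `select_template("https://news.example.com/a/1", "news")` picks the URL-configured template even though a field-configured template for news pages is also available.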
In sub-step S1022, the parsing template is verified through an interface provided by the front end.
Illustratively, the parsing template can be verified through the interface provided by the front end in two modes: online verification and local verification.
The online verification mode comprises: downloading the page corresponding to the uniform resource locator according to the uniform resource locator and the corresponding parsing template; parsing the corresponding page to obtain a first parsing result; displaying the first parsing result on the interface provided by the front end; and obtaining a verification result for the first parsing result based on the received user judgment. Illustratively, the page data corresponding to a URL is downloaded according to the URL and the corresponding parsing template. The front-end interface provides a run button; when the button receives a user click, it runs the parsing template, which parses the downloaded page data. The parsing result is then displayed on the front-end interface and its correctness is checked manually. If the parsing result is correct, the corresponding parsing template is correct and is determined to pass verification; if the parsing result is wrong, the corresponding parsing template is wrong and is determined not to pass verification.
The local verification mode comprises: importing a Python package through a local debugging interface; acquiring the page corresponding to the uniform resource locator based on a method provided in the Python package; passing the corresponding page into a local parsing template for parsing to obtain a second parsing result; displaying the second parsing result on the interface provided by the front end; and obtaining a verification result for the second parsing result based on user judgment, the verification result being either passed or failed. Illustratively, a user imports a Python package through the local debugging interface. Methods in the package can download the page data corresponding to a URL through the local debugging interface, with the download itself performed by the background of that interface. The original page is then passed to the local parsing template for parsing, and the parsing result is stored and displayed on the front-end interface, where its correctness is checked manually. If the parsing result is correct, the corresponding parsing template is correct and is determined to pass verification; if the parsing result is wrong, the corresponding parsing template is wrong and is determined not to pass verification.
In sub-step S1023, the parsing template that passed verification is converted to a specified format and stored in a database.
In one implementation, the parsing template that passes verification can be converted into a character string format and stored in the database. With this storage mode, the string-format parsing template can be run directly when page data is parsed. For example, the string-format parsing template is registered in memory using Python's built-in exec function, and the registered method is then called with the page data passed in, yielding the final parsing result.
In another embodiment, the parsing template that passes verification may be converted into a file format and stored in the database. When page data is parsed, the file-format parsing template must first be compiled by Python into a directly runnable form, and the compiled template is then run to obtain the parsing result.
After the parsing result is obtained, it is stored in a database. The structural information of the target data table, such as the field names and field types, is obtained from the database, for example through an sql cache package in python. Some web pages have nested structures, for example requiring 2-3 clicks from a list page to reach a detail page, so a callback field (callback) is needed to indicate the next operation. Illustratively, when a callback field exists in the parsing result, the parsing result is put into a downloader, which continues downloading as the callback field indicates; once the new page data has been downloaded, it is parsed to obtain a new parsing result. If that result also contains a callback field, the step is repeated until all the page data has been downloaded, and all parsing results are then stored in the database. When no callback field exists in the parsing result, the parsing result is stored in the database directly.
The parsing results of each URL within a predetermined time period are counted, and the statistics are displayed on the front-end interface. Whether the corresponding parsing template is abnormal is judged from the number of successful parses and the number of failed parses. For example, if a parsing template records more than a predetermined number of parsing failures within ten minutes, the template is determined to be abnormal. The predetermined number may be set from experience or by another reasonable method.
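Both verification modes follow the same sequence: fetch the page, run the template, show the result, and record the user's verdict. The sketch below captures only that sequence. The function `verify_template`, the injected `download` and `judge` callables, and the `print` call standing in for the front-end display are all hypothetical; the actual Python package and debugging interface are not described in the disclosure.

```python
import re


def verify_template(url, template, download, judge):
    """Run one verification round and return 'passed' or 'failed'."""
    page = download(url)             # fetch the page for this URL
    result = template(page)          # run the parsing template on it
    print("Parsing result:", result) # stand-in for display on the front-end interface
    # The verdict comes from a human in the disclosure; here it is a callable.
    return "passed" if judge(result) else "failed"


def fake_download(url):
    """Hypothetical downloader that returns a fixed page instead of using the network."""
    return "<html><title>Hello</title></html>"


def title_template(page):
    """Hypothetical parsing template that extracts the page title."""
    match = re.search(r"<title>(.*?)</title>", page, re.S)
    return {"title": match.group(1) if match else None}


status = verify_template(
    "https://example.com/",
    title_template,
    fake_download,
    judge=lambda result: result["title"] == "Hello",
)
```

Passing the downloader and the judge as parameters keeps the same function usable for the online mode (server-side download) and the local mode (download through the debugging package).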
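A minimal sketch of the string-format approach follows, assuming the template source defines a function named `parse` and using an in-memory SQLite table as a stand-in for the database. The table layout, the function name and the sample template are assumptions, not details from the disclosure.

```python
import sqlite3

# Hypothetical template source, stored as a plain string.
TEMPLATE_SOURCE = '''
def parse(page_data):
    start = page_data.find("<title>") + len("<title>")
    end = page_data.find("</title>")
    return {"title": page_data[start:end]}
'''

# In-memory SQLite database standing in for the template database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE templates (url TEXT PRIMARY KEY, source TEXT)")
conn.execute(
    "INSERT INTO templates VALUES (?, ?)",
    ("https://example.com/news", TEMPLATE_SOURCE),
)


def load_template(connection, url):
    """Fetch the template string for a URL and register it in memory with exec."""
    row = connection.execute(
        "SELECT source FROM templates WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        raise KeyError(url)
    namespace = {}
    exec(row[0], namespace)  # defines parse() inside the namespace dict
    return namespace["parse"]


parse = load_template(conn, "https://example.com/news")
result = parse("<html><title>Hello</title></html>")
```

Because `exec` runs whatever code the string contains, this pattern is only appropriate when the stored templates come from trusted users, which is one reason a verification step before storage is useful.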
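The file-format variant can be sketched with Python's built-in `compile`, which turns source text into a code object that `exec` can then run. The temporary file and the `parse` function name are assumptions used to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical template, written to a temporary .py file to stand in for
# a file-format template retrieved from the database.
source = 'def parse(page_data):\n    return {"length": len(page_data)}\n'
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(source)
    path = handle.name

try:
    with open(path, encoding="utf-8") as handle:
        # Compile step: source text becomes a runnable code object.
        code = compile(handle.read(), path, "exec")
finally:
    os.remove(path)

namespace = {}
exec(code, namespace)  # run step: defines parse() in the namespace
file_result = namespace["parse"]("<html></html>")
```

Compared with the string-format mode, this adds a file read and an explicit compile step before the template can be called.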
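The callback loop can be sketched as a work queue. Everything concrete here is hypothetical: the `FAKE_SITE` dictionary replaces real downloads, the `parse` function replaces a real template, and treating the callback value as the next URL is an assumption, since the disclosure only says the callback field indicates the next operation.

```python
from collections import deque

# Hypothetical two-level site: a list page that points to a detail page.
FAKE_SITE = {
    "https://example.com/list": "LIST -> https://example.com/detail/1",
    "https://example.com/detail/1": "DETAIL body text",
}


def download(url):
    """Stand-in downloader that reads from FAKE_SITE instead of the network."""
    return FAKE_SITE[url]


def parse(page):
    """Stand-in template: list pages yield a callback, detail pages yield text."""
    if page.startswith("LIST"):
        return {"kind": "list", "callback": page.split("-> ")[1]}
    return {"kind": "detail", "text": page[len("DETAIL "):]}


def crawl(start_url):
    """Download and parse until no parsing result carries a callback field."""
    stored = []   # stand-in for rows written to the database
    seen = set()  # guards against callback cycles
    pending = deque([start_url])
    while pending:
        url = pending.popleft()
        if url in seen:
            continue
        seen.add(url)
        result = parse(download(url))
        next_url = result.get("callback")
        if next_url:
            # Callback present: hand the next URL back to the downloader.
            pending.append(next_url)
        stored.append(result)
    return stored


results = crawl("https://example.com/list")
```

The `seen` set is an addition not mentioned in the disclosure; without some such guard, two pages whose callbacks point at each other would loop forever.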
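A minimal sketch of the failure statistics follows, assuming a sliding ten-minute window and a threshold of three failures. The class name `TemplateMonitor`, the threshold value and the sliding-window interpretation are assumptions; the disclosure leaves the predetermined number open.

```python
import time
from collections import defaultdict, deque


class TemplateMonitor:
    """Counts parse successes and failures per URL within a time window."""

    def __init__(self, window_seconds=600, max_failures=3):
        self.window_seconds = window_seconds
        self.max_failures = max_failures
        self._events = defaultdict(deque)  # url -> deque of (timestamp, success)

    def record(self, url, success, now=None):
        """Record one parsing outcome for a URL."""
        now = time.time() if now is None else now
        self._events[url].append((now, success))

    def counts(self, url, now=None):
        """Return (successes, failures) recorded within the window."""
        now = time.time() if now is None else now
        events = self._events[url]
        # Drop events that have fallen out of the window.
        while events and events[0][0] <= now - self.window_seconds:
            events.popleft()
        failures = sum(1 for _, ok in events if not ok)
        return len(events) - failures, failures

    def is_abnormal(self, url, now=None):
        """A template is abnormal when failures in the window exceed the threshold."""
        _, failures = self.counts(url, now)
        return failures > self.max_failures


monitor = TemplateMonitor(window_seconds=600, max_failures=3)
for second in range(4):
    monitor.record("https://example.com/news", success=False, now=second)
monitor.record("https://example.com/news", success=True, now=5)
```

The explicit `now` argument exists so the window logic can be exercised with fixed timestamps; in normal use it would be omitted and the current time used.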
In summary, the page parsing method provided in the exemplary embodiment of the present disclosure includes: acquiring collected page data, and calling a parsing template in a database to parse the page data to obtain a parsing result. The parsing template is determined as follows: a parsing template corresponding to a uniform resource locator is configured through an interface provided by a front end, each parsing template corresponding to a parsing rule and including the format in which the parsing result is returned; the parsing template is verified through the interface provided by the front end; and the parsing template that passes verification is converted into a specified format and stored in the database, the specified format including a character string format. A corresponding parsing template can thus be provided for each website, the pages of different websites can be parsed according to the parsing rules in their templates, and a large amount of complicated page data from different websites can be managed in a centralized and unified manner.
Fig. 4 is a block diagram of a page resolution device according to an exemplary embodiment of the present disclosure. Referring to fig. 4, the apparatus 20 includes an obtaining module 201, a parsing module 203, a configuration module 205, a verification module 207, and a storage module 209.
The acquiring module 201 is configured to acquire collected page data;
the analysis module 203 is used for calling an analysis template in the database to analyze the page data to obtain an analysis result;
a configuration module 205, configured to configure, through an interface provided by a front end, an analysis template corresponding to the uniform resource locator; each analysis template corresponds to an analysis rule, and the analysis template comprises a format returned by an analysis result;
a verification module 207, configured to verify the parsing template through an interface provided by the front end;
the storage module 209 is configured to convert the parsing template that passes the verification into a specified format, and store the specified format in a database, where the specified format includes a character string format.
Optionally, the configuring module 205 is further configured to configure a corresponding parsing template according to the uniform resource locator of the page data;
or configuring a corresponding analysis template according to the field of the page data.
Optionally, the verification module 207 is further configured to download a page corresponding to the uniform resource locator according to the uniform resource locator and the corresponding parsing template;
analyzing the corresponding page to obtain a first analysis result;
displaying the first analysis result on an interface provided by the front end;
obtaining a verification result of the first analysis result based on the received user judgment result; the verification result comprises passing verification or failing verification.
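The verification steps above can be sketched as follows. This is only an illustration under stated assumptions: the downloader, the template, and the user's judgment are passed in as plain callables so that the example runs without a network or a front end. The names `verify_template`, `fake_fetch`, `fake_parse`, and `fake_reviewer` are hypothetical.

```python
import re

def verify_template(url, fetch, parse, ask_user):
    """Download the page, parse it, show the result, and record the verdict."""
    page = fetch(url)                  # download the page for this URL
    first_result = parse(page)         # apply the template under test
    approved = ask_user(first_result)  # stand-in for display plus user judgment
    return {"url": url, "result": first_result, "verified": bool(approved)}

# Stand-ins for the real downloader, template, and front-end reviewer.
def fake_fetch(url):
    return "<h1>Acme Ltd</h1>"

def fake_parse(page):
    match = re.search(r"<h1>(.*?)</h1>", page)
    return {"title": match.group(1) if match else None}

def fake_reviewer(result):
    # A human would inspect the displayed result; here a title is simply required.
    return result.get("title") is not None

report = verify_template(
    "https://example.com/company/42", fake_fetch, fake_parse, fake_reviewer
)
```

Only a template whose report has `verified` set to `True` would go on to be converted and stored.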
Optionally, the verification module 207 is further configured to import a Python package through a local debugging interface;
acquiring a page corresponding to the uniform resource locator based on a method provided in the Python package;
importing the corresponding page into a local analysis template for analysis to obtain a second analysis result;
displaying the second analysis result on an interface provided by the front end;
obtaining a verification result of the second analysis result based on user judgment; the verification result comprises passing verification or failing verification.
Optionally, the parsing module 203 is further configured to extract a uniform resource locator of the page data;
acquiring an analysis template corresponding to the uniform resource locator of the page data from a database;
registering the corresponding analysis template in a memory;
and calling the corresponding analysis template in the memory to analyze the page data to obtain the analysis result.
Optionally, the parsing module 203 is further configured to, when a callback field exists in the parsing result, put the parsing result into a downloader for continuous downloading until all the page data is downloaded, and store the parsing result;
and when the callback field does not exist in the analysis result, storing the analysis result.
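The callback handling above can be sketched as a simple loop. This is one possible reading, not the disclosed implementation: it assumes that the callback field holds the next uniform resource locator to download and that every intermediate result is stored. `crawl`, `PAGES`, and the `"callback"` key are hypothetical.

```python
from collections import deque

def crawl(start_url, download, parse):
    """Keep downloading while parsing results carry a callback field."""
    pending = deque([start_url])
    seen = set()
    stored = []
    while pending:
        url = pending.popleft()
        if url in seen:          # guard against callback cycles
            continue
        seen.add(url)
        result = parse(download(url))
        stored.append(result)    # store the parsing result
        next_url = result.get("callback")
        if next_url:             # callback present: hand it back to the downloader
            pending.append(next_url)
    return stored

# Fake site: the first page points to the second, which has no callback.
PAGES = {
    "p1": {"items": [1], "callback": "p2"},
    "p2": {"items": [2]},
}
stored = crawl("p1", PAGES.get, lambda page: page)
```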
Optionally, the storage module 209 is further configured to count the analysis results of each uniform resource locator within a predetermined time period; the analysis result comprises analysis success and analysis failure;
and judging whether the corresponding analysis template is abnormal or not according to the times of successful analysis and the times of failed analysis.
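The counting step above can be sketched as follows. The present disclosure does not specify how the success and failure counts are compared, so the failure-ratio threshold used here is an assumption, as are the names `abnormal_templates` and `max_failure_ratio` and the event tuple layout.

```python
from collections import Counter

def abnormal_templates(events, window_start, window_end, max_failure_ratio=0.5):
    """Flag URLs whose failure ratio inside the time window exceeds the threshold.

    `events` is an iterable of (timestamp, url, success) tuples.
    """
    succeeded = Counter()
    failed = Counter()
    for timestamp, url, success in events:
        if window_start <= timestamp < window_end:
            (succeeded if success else failed)[url] += 1
    flagged = []
    for url in sorted(set(succeeded) | set(failed)):
        total = succeeded[url] + failed[url]
        if failed[url] / total > max_failure_ratio:
            flagged.append(url)
    return flagged

events = [
    (10, "a", True),
    (20, "a", False),
    (30, "b", False),
    (40, "b", False),
    (500, "b", True),  # outside the window, so it is ignored
]
flagged = abnormal_templates(events, window_start=0, window_end=100)
```

A sudden rise in failures for one uniform resource locator typically suggests that the website changed its page layout and the template needs to be reconfigured.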
Fig. 5 is a block diagram illustrating an electronic device 400 according to an example embodiment. As shown in fig. 5, the electronic device 400 may include: a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
The processor 401 is configured to control the overall operation of the electronic device 400, so as to complete all or part of the steps in the above-mentioned page parsing method. The memory 402 is used to store various types of data to support operation of the electronic device 400, such as instructions for any application or method operating on the electronic device 400 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and so forth. The memory 402 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 402 or transmitted through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described page parsing method.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the page parsing method described above. For example, the computer readable storage medium may be the memory 402 described above including program instructions that are executable by the processor 401 of the electronic device 400 to perform the page parsing method described above.
Fig. 6 is a block diagram illustrating another electronic device 500 in accordance with an example embodiment. For example, the electronic device 500 may be provided as a server. Referring to Fig. 6, the electronic device 500 comprises one or more processors 522, and a memory 532 for storing computer programs executable by the processor 522. The computer programs stored in the memory 532 may include one or more modules that each correspond to a set of instructions. Further, the processor 522 may be configured to execute the computer program to perform the page parsing method described above.
Additionally, the electronic device 500 may also include a power component 526 and a communication component 550. The power component 526 may be configured to perform power management of the electronic device 500, and the communication component 550 may be configured to enable communication, e.g., wired or wireless communication, of the electronic device 500. In addition, the electronic device 500 may also include input/output (I/O) interfaces 558. The electronic device 500 may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, and so on.
In another exemplary embodiment, there is also provided a non-transitory computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the page parsing method described above. For example, the non-transitory computer readable storage medium may be the memory 532 described above including program instructions that are executable by the processor 522 of the electronic device 500 to perform the page parsing method described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the page parsing method described above when executed by the programmable apparatus.
It should be noted that all actions of acquiring signals, information, or data in the present disclosure are performed in compliance with the applicable data protection laws and policies of the country where the actions take place, and with the authorization of the owner of the corresponding device.
The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner. To avoid unnecessary repetition, the possible combinations are not described separately in the present disclosure.
In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as content disclosed by the present disclosure, as long as they do not depart from its spirit.

Claims (10)

1. A page parsing method, characterized by comprising the following steps:
acquiring collected page data;
calling an analysis template in a database to analyze the page data to obtain an analysis result; the determination mode of the analysis template comprises the following steps:
configuring an analysis template corresponding to the uniform resource locator through an interface provided by the front end; each analysis template corresponds to an analysis rule, and the analysis template comprises a format returned by an analysis result;
verifying the analysis template through an interface provided by the front end;
and converting the analysis template passing the verification into a specified format and storing the specified format into a database, wherein the specified format comprises a character string format.
2. The method of claim 1, wherein the step of configuring, via the interface provided by the front end, the parsing template corresponding to the uniform resource locator comprises:
configuring a corresponding analysis template according to the uniform resource locator of the page data;
or configuring a corresponding analysis template according to the field of the page data.
3. The method of claim 1, wherein the step of validating the parsing template through an interface provided by a front end comprises:
downloading a page corresponding to the uniform resource locator according to the uniform resource locator and the corresponding analysis template;
analyzing the corresponding page to obtain a first analysis result;
displaying the first analysis result on an interface provided by the front end;
obtaining a verification result of the first analysis result based on the received user judgment result; the verification result comprises passing verification or failing verification.
4. The method of claim 1, wherein the step of validating the parsing template through an interface provided by a front end comprises:
importing a Python package through a local debugging interface;
acquiring a page corresponding to the uniform resource locator based on a method provided in the Python package;
importing the corresponding page into a local analysis template for analysis to obtain a second analysis result;
displaying the second analysis result on an interface provided by the front end;
obtaining a verification result of the second analysis result based on user judgment; the verification result comprises passing verification or failing verification.
5. The method of claim 1, wherein the step of calling an analysis template in the database to analyze the page data to obtain an analysis result comprises:
extracting uniform resource locators of the page data;
acquiring an analysis template corresponding to the uniform resource locator of the page data from a database;
registering the corresponding analysis template in a memory;
and calling the corresponding analysis template in the memory to analyze the page data to obtain the analysis result.
6. The method of claim 1, wherein the step of invoking an analysis template in a database to analyze the page data to obtain an analysis result comprises:
when a callback field exists in the analysis result, putting the analysis result into a downloader to continue downloading until all the page data are downloaded, and storing the analysis result;
and when the callback field does not exist in the analysis result, storing the analysis result.
7. The method of claim 6, wherein the step of storing the parsed result is followed by:
counting the analysis results of each uniform resource locator within a preset time period; the analysis result comprises analysis success and analysis failure;
and judging whether the corresponding analysis template is abnormal or not according to the times of successful analysis and the times of failed analysis.
8. A page parsing apparatus, comprising:
the acquisition module is used for acquiring collected page data;
the analysis module is used for calling an analysis template in a database to analyze the page data to obtain an analysis result;
the configuration module is used for configuring an analysis template corresponding to the uniform resource locator through an interface provided by the front end; each analysis template corresponds to an analysis rule, and the analysis template comprises a format returned by an analysis result;
the checking module is used for checking the analysis template through an interface provided by the front end;
and the storage module is used for converting the analysis template passing the verification into a specified format and storing the specified format into a database, wherein the specified format comprises a character string format.
9. A computer-readable medium, on which a computer program is stored, which, when being executed by a processing means, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.
CN202210331904.7A 2022-03-30 2022-03-30 Page parsing method and device, computer readable medium and electronic device Withdrawn CN114692050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210331904.7A CN114692050A (en) 2022-03-30 2022-03-30 Page parsing method and device, computer readable medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210331904.7A CN114692050A (en) 2022-03-30 2022-03-30 Page parsing method and device, computer readable medium and electronic device

Publications (1)

Publication Number Publication Date
CN114692050A true CN114692050A (en) 2022-07-01

Family

ID=82141275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210331904.7A Withdrawn CN114692050A (en) 2022-03-30 2022-03-30 Page parsing method and device, computer readable medium and electronic device

Country Status (1)

Country Link
CN (1) CN114692050A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109727050A (en) * 2017-10-31 2019-05-07 北京国双科技有限公司 A kind of method and system obtaining monitoring of the advertisement analysis data
CN110020236A (en) * 2017-08-29 2019-07-16 北京国双科技有限公司 Web analysis method, apparatus, storage medium, processor and equipment
CN110764781A (en) * 2019-10-29 2020-02-07 厦门市美亚柏科信息股份有限公司 Method for automatically analyzing forum website data
CN113032655A (en) * 2021-04-14 2021-06-25 中国刑事警察学院 Method for extracting and fixing dark network electronic data
CN113934913A (en) * 2021-11-12 2022-01-14 盐城金堤科技有限公司 Data capture method and device, storage medium and electronic equipment
CN114238733A (en) * 2021-11-19 2022-03-25 北京天眼查科技有限公司 Key information extraction method and device, computer storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN108415832B (en) Interface automation test method, device, equipment and storage medium
US10467316B2 (en) Systems and methods for web analytics testing and web development
CN109683953B (en) Method and device for processing configuration file based on visual interface
CN109857992B (en) Medical data structured analysis method and device, readable medium and electronic equipment
CN112148674B (en) Log data processing method, device, computer equipment and storage medium
CN112187558B (en) Data verification method and device and electronic equipment
CN112084179B (en) Data processing method, device, equipment and storage medium
CN113382083B (en) Webpage screenshot method and device
CN109614327B (en) Method and apparatus for outputting information
US10594764B2 (en) Request cache to improve web applications performance
CN113590974B (en) Recommendation page configuration method and device, electronic equipment and computer readable medium
CN110851471A (en) Distributed log data processing method, device and system
CN109862074B (en) Data acquisition method and device, readable medium and electronic equipment
CN113590985B (en) Page jump configuration method and device, electronic equipment and computer readable medium
CN114692050A (en) Page parsing method and device, computer readable medium and electronic device
CN112783903B (en) Method and device for generating update log
CN114296793A (en) Anti-obfuscation method and device for obfuscated codes, readable medium and electronic device
CN112559278B (en) Method and device for acquiring operation data
CN113132447A (en) Reverse proxy method and system
CN112433752A (en) Page parsing method, device, medium and electronic equipment
CN113886216A (en) Interface test and tool configuration method, device, electronic equipment and storage medium
CN112579428A (en) Interface testing method and device, electronic equipment and storage medium
CN112988560A (en) Method and device for testing system robustness
CN116880901B (en) Application page analysis method, device, electronic equipment and computer readable medium
CN113608817B (en) Method and system for processing bullet frame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220701
