CN112579947A - Webpage element graph intercepting method and device and electronic equipment

Info

Publication number
CN112579947A
Authority
CN
China
Prior art keywords
picture
target webpage
target
webpage element
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910929747.8A
Other languages
Chinese (zh)
Inventor
满悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201910929747.8A
Publication of CN112579947A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for intercepting a webpage element graph, which comprises the following steps: when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage; determining the position of a target webpage element in the first picture; and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element. The invention also discloses a device for intercepting the webpage element graph and electronic equipment.

Description

Webpage element graph intercepting method and device and electronic equipment
Technical Field
The invention relates to the technical field of image interception, in particular to a method and a device for intercepting a webpage element graph and electronic equipment.
Background
With the rapid development of network technology, the World Wide Web has become a carrier of a vast amount of information, and extracting and using that information effectively has become a major challenge. Search engines, as tools that help people retrieve information, have become the entry point and guide through which users access the World Wide Web. General-purpose search engines have limitations, however: users from different fields and backgrounds often have different retrieval purposes and requirements, and the results returned by a general-purpose search engine include a large number of web pages that the user does not care about, so the data is noisy.
Therefore, in the related art, relevant web page resources are captured in a targeted manner by a web crawler. A web crawler is a program that automatically extracts web pages; it downloads web pages from the World Wide Web for a search engine and is an important component of the search engine. The web crawler selectively visits web pages and related links on the World Wide Web according to a set crawling target to acquire the required information, which greatly improves the accuracy of data search.
However, a web crawler in the related art can only obtain the source code (HTML) of a website from the World Wide Web, and cannot obtain a clear and intuitive image of the content of a webpage element.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, an electronic device, and a computer-readable storage medium for intercepting a webpage element graph, so as to solve the problem that a web crawler in the related art can only obtain the source code of a website from the World Wide Web and cannot obtain a clear and intuitive image of the content of a webpage element.
In order to achieve the above object, according to an aspect of the present invention, there is provided a method for intercepting a web page element map, including:
when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage;
determining the position of the target webpage element in the first picture;
and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element.
In an alternative, the determining the position of the target webpage element in the first picture includes:
acquiring parameter information of the target webpage element;
and determining the position of the target webpage element in the first picture according to the parameter information.
In an optional manner, the obtaining parameter information of the target web page element includes:
acquiring absolute top positioning of the target webpage element, absolute left positioning of the target webpage element, width of the target webpage element and height of the target webpage element;
the determining the position of the target webpage element in the first picture according to the parameter information includes:
and determining the position of the target webpage element in the first picture according to the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element and the height of the target webpage element.
In an optional manner, after the acquiring the first picture, the method further includes:
creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing to obtain an image of the target webpage element includes:
in the Canvas container, cropping the first picture according to the position of the target webpage element in the first picture, retaining the second picture corresponding to the target webpage element in the first picture, and thereby obtaining the image of the target webpage element.
In an optional manner, after cropping the first picture according to the position of the target webpage element in the first picture, retaining the second picture corresponding to the target webpage element in the first picture, and obtaining the image of the target webpage element, the method further includes:
in the Canvas container, converting the format of the image of the target webpage element to obtain character picture data;
and compressing the character picture data and submitting the compressed character picture data.
In an optional manner, before the obtaining the first picture when it is determined that the web page element of the web page includes the target web page element to be intercepted, the method further includes:
pre-configuring at least one CSS selector in a browser, wherein the CSS selector is used for describing the target webpage element to be intercepted and the operation to be executed on the target webpage element to be intercepted;
when determining that the webpage element of the webpage contains the target webpage element to be intercepted, acquiring a first picture, including:
and when the at least one CSS selector determines that the webpage elements of the webpage contain the target webpage elements to be intercepted, acquiring the first picture.
In an optional manner, before the obtaining the first picture, the method further includes:
monitoring the loading progress of webpage elements in a webpage;
the acquiring of the first picture includes:
and when the webpage elements are loaded, acquiring the first picture.
According to a second aspect of the present invention, there is provided an apparatus for intercepting a web page element map, comprising:
an obtaining module, configured to obtain a first picture when it is determined that a webpage element of a webpage contains a target webpage element to be intercepted, wherein the first picture is a webpage screenshot of the webpage;
a determining module, configured to determine a position of the target webpage element in the first picture;
and an extracting module, configured to extract a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, so as to obtain an image of the target webpage element.
In an optional manner, the obtaining module is further configured to obtain parameter information of the target web page element;
the determining module is configured to determine, according to the parameter information, a position of the target webpage element in the first picture.
In an optional manner, the obtaining module is specifically configured to obtain an absolute top location of the target web page element, an absolute left location of the target web page element, a width of the target web page element, and a height of the target web page element;
the determining module is specifically configured to determine the position of the target webpage element in the first picture according to the absolute top location of the target webpage element, the absolute left location of the target webpage element, the width of the target webpage element, and the height of the target webpage element.
In an alternative, the apparatus further comprises:
a creating module, configured to create a Canvas container after the first picture is obtained, and insert the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting module is specifically configured to clip the first picture according to the position of the target webpage element in the first picture in the Canvas container, reserve a second picture corresponding to the target webpage element in the first picture, and clip to obtain an image of the target webpage element.
In an alternative, the apparatus further comprises:
a format conversion module, configured to convert, in the Canvas container, the format of the image of the target webpage element to obtain character picture data, after the first picture has been cropped according to the position of the target webpage element in the first picture, the second picture corresponding to the target webpage element has been retained, and the image of the target webpage element has been obtained;
and a compression submitting module, configured to compress the character picture data and submit the compressed character picture data.
In an alternative, the apparatus further comprises:
a configuration module, configured to pre-configure at least one CSS selector in a browser, wherein the CSS selector is used for describing the target webpage element to be intercepted and the operation to be executed on the target webpage element to be intercepted;
the obtaining module is configured to obtain the first picture when the at least one CSS selector determines that the web page element of the web page includes the target web page element to be intercepted.
In an alternative, the apparatus further comprises:
a monitoring module, configured to monitor the loading progress of webpage elements in a webpage before the first picture is obtained;
the obtaining module is specifically configured to obtain the first picture when the webpage element is loaded.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a communication bus;
the memory is in communication connection with the processor through the communication bus;
the memory has stored therein computer-executable instructions for execution by the processor for performing the method provided in any of the alternative embodiments of the first aspect of the present invention.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-executable instructions for performing the method provided in any one of the alternative embodiments of the first aspect of the present invention when executed.
The invention provides a method, a device, electronic equipment and a computer-readable storage medium for intercepting a webpage element graph. The method comprises the following steps: when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage; determining the position of the target webpage element in the first picture; and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, thereby obtaining an image of the target webpage element. In this way, when a web crawler crawls a webpage, a screenshot of the webpage is obtained, the position of the target webpage element in that screenshot is determined, and the picture corresponding to the target webpage element is extracted from the screenshot, which yields an image of the target webpage element that presents the page element clearly and intuitively. This solves the problem in the related art that a web crawler can only obtain the source code of a website from the World Wide Web and cannot obtain a clear and intuitive image of the content of a webpage element, and it makes the information crawled by the web crawler more intuitive.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments of the present invention are described below so that the technical means of the present invention can be understood more clearly and so that its above and other objects, features, and advantages become more apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings.
FIG. 1 is a flowchart illustrating an implementation of a method for intercepting a web page element graph according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application;
FIG. 3 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application;
FIG. 4 is a schematic structural diagram of an intercepting apparatus of a web page element graph provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of the embodiments of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Fig. 1 is a flowchart illustrating an implementation of a method for intercepting a web page element diagram according to an embodiment of the present application.
Referring to FIG. 1, the method for intercepting a webpage element graph provided in an embodiment of the present application is described using the example of a web crawler crawling web resources on the World Wide Web. It can be understood that this embodiment is not limited to a web crawler crawling web resources on the World Wide Web and is also applicable to other screenshot operations. The method can be applied to computer equipment such as servers, personal computers, notebook computers, and desktop computers; in some possible implementations, the equipment may also be any other electronic device that integrates a processor and a memory, and this embodiment places no particular limitation on the specific form of the electronic device. The method comprises the following steps:
Step 101, when it is determined that a webpage element of a webpage includes a target webpage element to be intercepted, a first picture is obtained, where the first picture is a webpage screenshot of the webpage.
Specifically, when a web crawler is used to obtain web resources from the World Wide Web, the Uniform Resource Locator (URL) corresponding to the webpage to be crawled may be opened with a tool such as a browser. Some optional embodiments are explained using the Chrome browser as an example: the URL corresponding to the webpage to be crawled is opened in the Chrome browser, and the web crawler crawls the required web resources from the webpage opened at that URL. After the Chrome browser opens the URL, the web crawler may perform operations such as clicking and scrolling on the webpage elements of the opened webpage and extract the relevant information of the webpage elements, thereby determining the webpage elements to be intercepted. The Chrome browser is provided with a screenshot plug-in; after the webpage corresponding to the URL is opened, the screenshot plug-in of the Chrome browser is triggered to take a screenshot of the entire webpage corresponding to the URL, thereby obtaining the first picture.
Step 102, determining the position of the target webpage element in the first picture.
Specifically, when the web crawler, after performing operations such as page scrolling and webpage element clicking on the webpage corresponding to the URL to be crawled, determines that the webpage contains a target webpage element to be intercepted, the position of the target webpage element in the first picture is determined. This position may be any information that describes where the target webpage element lies in the first picture and the area it occupies there, such as the distance by which the top of the target webpage element is offset from the top of the webpage, the distance by which the left side of the target webpage element is offset from the left side of the webpage, and the width and height of the target webpage element. In one specific example, the height of the target webpage element may be determined as follows:
The distance by which the bottom of the target webpage element is offset from the top of the webpage is determined as a first distance X1, the distance by which the top of the target webpage element is offset from the top of the webpage is determined as a second distance X2, and the height of the target webpage element is the difference between the first distance X1 and the second distance X2. The width of the target webpage element can be determined in the same way as its height, and the position of the target webpage element in the first picture is thereby determined. It should be noted that the above is only an example, and the embodiments of the present application do not limit the specific manner of obtaining the height and width of the target webpage element.
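The height and width calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the names `EdgeOffsets`, `ElementRect` and `rectFromEdges` are hypothetical and do not come from the application, and all distances are assumed to be in pixels measured from the top-left corner of the webpage.

```typescript
// Hypothetical shapes for illustration; not defined by the application.
interface EdgeOffsets {
  top: number;    // X2: distance from the page top to the element's top edge
  bottom: number; // X1: distance from the page top to the element's bottom edge
  left: number;   // distance from the page left to the element's left edge
  right: number;  // distance from the page left to the element's right edge
}

interface ElementRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Derive the element's rectangle in the screenshot from its edge offsets.
function rectFromEdges(e: EdgeOffsets): ElementRect {
  const height = e.bottom - e.top; // X1 - X2
  const width = e.right - e.left;  // the same idea applied horizontally
  if (height < 0 || width < 0) {
    throw new RangeError("bottom/right must not be smaller than top/left");
  }
  return { left: e.left, top: e.top, width, height };
}
```

For example, an element whose top edge is 120 px and bottom edge 420 px from the page top has a height of 300 px.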
Step 103, extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing to obtain an image of the target webpage element.
Specifically, in this embodiment, after the position of the target webpage element in the first picture and the area occupying the first picture are determined, the first picture is clipped according to the position of the target webpage element in the first picture and the area occupying the first picture, so as to extract a second picture corresponding to the target webpage element in the first picture, and an image of the target webpage element is captured.
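The cropping in step 103 can be illustrated independently of any browser API. The sketch below assumes the screenshot is available as a raw RGBA pixel buffer (4 bytes per pixel, row-major); that representation, and the names `Bitmap`, `CropRect` and `cropBitmap`, are assumptions made for this example rather than anything specified in the application. The rectangle is clamped to the screenshot so that an element extending past the picture edge does not cause out-of-range reads.

```typescript
// Hypothetical types for illustration only.
interface CropRect { left: number; top: number; width: number; height: number; }
interface Bitmap { width: number; height: number; data: Uint8Array; } // RGBA, row-major

// Copy the pixels inside `rect` out of `src`, producing the "second picture".
function cropBitmap(src: Bitmap, rect: CropRect): Bitmap {
  // Clamp the requested rectangle to the bounds of the screenshot.
  const x0 = Math.max(0, Math.floor(rect.left));
  const y0 = Math.max(0, Math.floor(rect.top));
  const x1 = Math.min(src.width, Math.ceil(rect.left + rect.width));
  const y1 = Math.min(src.height, Math.ceil(rect.top + rect.height));
  const w = Math.max(0, x1 - x0);
  const h = Math.max(0, y1 - y0);

  const out = new Uint8Array(w * h * 4);
  for (let row = 0; row < h; row++) {
    // Byte offset of the first pixel of this row inside the source buffer.
    const start = ((y0 + row) * src.width + x0) * 4;
    out.set(src.data.subarray(start, start + w * 4), row * w * 4);
  }
  return { width: w, height: h, data: out };
}
```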
The method for intercepting a webpage element graph provided by this embodiment of the application comprises the following steps: when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage; determining the position of the target webpage element in the first picture; and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, thereby obtaining an image of the target webpage element. In this way, when a web crawler crawls a webpage, a screenshot of the webpage is obtained, the position of the target webpage element in that screenshot is determined, and the picture corresponding to the target webpage element is extracted from the screenshot, which yields an image of the target webpage element that presents the page element clearly and intuitively. This solves the problem in the related art that a web crawler can only obtain the source code of a website from the World Wide Web and cannot obtain a clear and intuitive image of the content of a webpage element, and it makes the information crawled by the web crawler more intuitive.
Fig. 2 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application.
Based on the foregoing embodiment, referring to fig. 2, a method for intercepting a web page element graph according to another embodiment of the present application includes the following steps:
Step 201, when it is determined that a webpage element of the webpage includes a target webpage element to be intercepted, a first picture is obtained, where the first picture is a webpage screenshot of the webpage.
Specifically, when a web crawler is used to obtain web resources from the World Wide Web, the Uniform Resource Locator (URL) corresponding to the webpage to be crawled may be opened with a tool such as a browser. Some optional embodiments are explained using the Chrome browser as an example: the URL corresponding to the webpage to be crawled is opened in the Chrome browser, and the web crawler crawls the required web resources from the webpage opened at that URL. After the Chrome browser opens the URL, the web crawler may perform operations such as clicking and scrolling on the webpage elements of the opened webpage and extract the relevant information of the webpage elements, thereby determining the webpage elements to be intercepted. In some optional manners, the crawling operation may be performed on the webpage to be crawled according to pre-configured crawling rules. For example, at least one CSS selector may be configured in an XML schema before crawling, where the at least one CSS selector is used to describe the target webpage element to be intercepted and the operation performed on it. The at least one CSS selector may describe the webpage elements that the web crawler needs to click and the operations it needs to execute, such as clicking, turning pages, and scrolling pages. This ensures that the web crawler crawls information resources on the World Wide Web only according to the webpage elements that need to be clicked and the operations that need to be executed, which saves crawling time and improves the efficiency of obtaining information resources.
Specifically, the Chrome browser is provided with a screenshot plug-in. After the webpage corresponding to the URL is opened, and when the at least one CSS selector determines that the webpage elements of the webpage contain the target webpage element to be intercepted, the screenshot plug-in of the Chrome browser is triggered to take a screenshot of the entire webpage corresponding to the URL, thereby obtaining the first picture.
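As a rough sketch of how pre-configured CSS selectors might gate the screenshot in step 201, the example below models each rule as a selector plus an operation and checks whether any rule marked for capture matches the page. The rule shape, the action names, and the functions `rulesToCapture` and `shouldTakeScreenshot` are all hypothetical; the application itself describes the rules as being configured in an XML schema and does not define their fields. `QueryRoot` is a minimal stand-in for anything that offers a `querySelector` method, such as a browser document.

```typescript
// Hypothetical rule format; the application does not specify these fields.
type RuleAction = "click" | "scroll" | "nextPage" | "capture";

interface CaptureRule {
  selector: string;   // CSS selector describing the webpage element
  action: RuleAction; // operation to perform on the matched element
}

// Minimal view of a page: anything exposing querySelector.
interface QueryRoot {
  querySelector(selector: string): object | null;
}

// Return the capture rules whose selector matches an element on the page.
function rulesToCapture(root: QueryRoot, rules: CaptureRule[]): CaptureRule[] {
  return rules.filter(
    (r) => r.action === "capture" && root.querySelector(r.selector) !== null
  );
}

// The screenshot is taken only when at least one target element is present.
function shouldTakeScreenshot(root: QueryRoot, rules: CaptureRule[]): boolean {
  return rulesToCapture(root, rules).length > 0;
}
```

Keeping the match check separate from the screenshot call means the same rule list could also drive the click and scroll operations, although that part is not shown here.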
Step 202, determining the position of the target webpage element in the first picture.
Specifically, when the web crawler, after performing operations such as page scrolling and webpage element clicking on the webpage corresponding to the URL to be crawled, determines that the webpage contains a target webpage element to be intercepted, the position of the target webpage element in the first picture is determined. This position may be any information that describes where the target webpage element lies in the first picture and the area it occupies there, such as the distance by which the top of the target webpage element is offset from the top of the webpage, the distance by which the left side of the target webpage element is offset from the left side of the webpage, and the width and height of the target webpage element.
In some optional manners, step 202, determining the position of the target webpage element in the first picture includes:
Acquiring the parameter information of the target webpage element.
Specifically, when it is determined that a target webpage element to be intercepted exists in the webpage, the parameter information of the target webpage element, such as offsetTop (absolute top positioning), offsetLeft (absolute left positioning), outerWidth (width), and outerHeight (height), can be acquired through jQuery (or native JavaScript). Here, offsetTop is the distance, in pixels, by which the top of the target webpage element is offset from the top of the webpage, including any distance that has been scrolled out of view; offsetLeft is the distance, in pixels, by which the left side of the target webpage element is offset from the left side of the webpage, likewise including any distance that has been scrolled out of view.
Determining the position of the target webpage element in the first picture according to the parameter information.
Specifically, in the embodiment of the application, the position and the area of the target webpage element in the first picture can be accurately determined by obtaining the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element and the height of the target webpage element, so that the image of the target webpage element can be accurately obtained.
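One way to obtain page-absolute parameter information with native JavaScript is sketched below. In the DOM, the native `offsetTop` and `offsetLeft` properties are measured relative to the element's `offsetParent`, so the sketch sums them along the `offsetParent` chain. The `OffsetNode` interface is a hypothetical minimal shape used so that the example runs outside a browser (a real `HTMLElement` would need a cast to fit it), and the result is approximate because ancestor borders and nested scroll containers are ignored. The application does not prescribe this particular calculation.

```typescript
// Minimal stand-in for the DOM offset properties (hypothetical shape).
interface OffsetNode {
  offsetTop: number;
  offsetLeft: number;
  offsetWidth: number;
  offsetHeight: number;
  offsetParent: OffsetNode | null;
}

interface ElementParams {
  top: number;    // absolute top positioning, in pixels
  left: number;   // absolute left positioning, in pixels
  width: number;
  height: number;
}

// Sum the offsets up the offsetParent chain to get page-absolute coordinates.
function readElementParams(el: OffsetNode): ElementParams {
  let top = 0;
  let left = 0;
  for (let node: OffsetNode | null = el; node !== null; node = node.offsetParent) {
    top += node.offsetTop;
    left += node.offsetLeft;
  }
  return { top, left, width: el.offsetWidth, height: el.offsetHeight };
}
```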
Step 203, creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container.
Specifically, in this embodiment, after the Canvas container is created, the first picture may be inserted into the Canvas container through the drawImage interface of the Canvas.
Step 204, in the Canvas container, cropping the first picture according to the position of the target webpage element in the first picture, retaining the second picture corresponding to the target webpage element in the first picture, and thereby obtaining an image of the target webpage element.
Specifically, in this embodiment, in the Canvas container, the first picture is clipped according to the absolute top positioning of the target web page element, the absolute left positioning of the target web page element, the width of the target web page element, and the height of the target web page element, the second picture corresponding to the target web page element in the first picture is retained, and the image of the target web page element is captured.
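Steps 203 and 204 can be combined in a single call to the nine-argument form of the Canvas 2D `drawImage` method, which copies a source rectangle of an image onto a destination rectangle of the canvas. The sketch below is an assumption about how this might be done, not the application's implementation: `CanvasLike`, `Ctx2DLike` and `drawElementRegion` are hypothetical names, and the interfaces are reduced to the few members used so the example can be exercised with a stand-in object outside a browser. It also assumes that screenshot pixels correspond one-to-one to the page coordinates in `p`; a scaled screenshot would need the rectangle multiplied by the scale factor first.

```typescript
// Minimal structural views of the Canvas API (hypothetical, for illustration).
interface Ctx2DLike {
  drawImage(
    image: unknown,
    sx: number, sy: number, sw: number, sh: number,
    dx: number, dy: number, dw: number, dh: number
  ): void;
}

interface CanvasLike {
  width: number;
  height: number;
  getContext(kind: "2d"): Ctx2DLike | null;
}

interface ElementRegion { top: number; left: number; width: number; height: number; }

// Size the canvas to the element and draw only the element's region into it.
function drawElementRegion(canvas: CanvasLike, screenshot: unknown, p: ElementRegion): void {
  canvas.width = p.width;
  canvas.height = p.height;
  const ctx = canvas.getContext("2d");
  if (ctx === null) {
    throw new Error("2D context unavailable");
  }
  // Source rectangle: the element inside the screenshot.
  // Destination rectangle: the whole canvas, starting at (0, 0).
  ctx.drawImage(screenshot, p.left, p.top, p.width, p.height, 0, 0, p.width, p.height);
}
```

After this call the canvas holds only the second picture, so whatever is exported from the canvas is the image of the target webpage element.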
It should be noted that this embodiment has the same or similar technical effects as the other embodiments of the present application, and details are not described in this embodiment.
Fig. 3 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application.
Based on the foregoing embodiment, referring to fig. 3, a method for intercepting a web page element graph according to another embodiment of the present application includes the following steps:
step 301, monitoring the loading progress of the webpage elements in the webpage.
Specifically, in this embodiment, when a web crawler is used to obtain web resources from the World Wide Web, a corresponding tool such as a browser may be used to open the Uniform Resource Locator (URL) of the webpage to be crawled. While the URL is being opened, the network requests may be counted to determine whether all webpage elements in the webpage corresponding to the URL have finished loading. In some application scenarios the webpage elements are loaded asynchronously, so some of them may not yet be fully loaded when the browser triggers the onload event. Monitoring the loading progress by counting network requests therefore ensures that the first picture is acquired only after the webpage elements have loaded, so that the first picture contains all webpage elements of the webpage and none are omitted.
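One way to picture request counting is the small tracker below. It is a sketch under assumptions: the class name `RequestCounter`, the caller-supplied timestamps, and the quiet window (`quietMs`, the time with no request activity before the page is treated as loaded) are all hypothetical, since the application only states that network requests are counted.

```javascript
// Hypothetical tracker: the page is treated as loaded once at least one
// request has been seen, none are pending, and no request activity has
// occurred for `quietMs` milliseconds.
class RequestCounter {
  constructor(quietMs = 500) {
    this.quietMs = quietMs;
    this.pending = 0;
    this.lastActivity = null; // timestamp of the most recent start/finish
  }

  // Call when the browser issues a network request.
  requestStarted(now) {
    this.pending += 1;
    this.lastActivity = now;
  }

  // Call when a request completes or fails.
  requestFinished(now) {
    this.pending = Math.max(0, this.pending - 1);
    this.lastActivity = now;
  }

  isLoaded(now) {
    return (
      this.lastActivity !== null &&
      this.pending === 0 &&
      now - this.lastActivity >= this.quietMs
    );
  }
}
```

The quiet window guards against asynchronous loaders that start a new request shortly after an earlier one finishes; how the start and finish callbacks are wired to the browser depends on the crawler and is left open here.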
Step 302, when the webpage element is loaded, a first picture is obtained.
Step 303, determining the position of the target webpage element in the first picture.
Step 304, creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container.
Step 305, in the Canvas container, cutting the first picture according to the position of the target webpage element in the first picture, retaining a second picture corresponding to the target webpage element in the first picture, and capturing an image of the target webpage element.
Step 306, converting the format of the image of the target webpage element in the Canvas container to obtain character picture data.
Specifically, in this embodiment, the image of the target webpage element is converted into a Base64 character string in the Canvas container to obtain the character picture data. Character picture data is convenient for later editing and use; for example, in some embodiments the data crawled by the web crawler from the World Wide Web is used for machine learning training, and in other embodiments the crawled data needs to be modified.
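The Base64 conversion can be illustrated with the standard `toDataURL` method of a canvas, which returns a data URL whose payload is Base64 encoded. The helper name `canvasToBase64` and the default `image/png` type are assumptions made for this sketch; the application does not name the image format.

```javascript
// Hypothetical helper: turn a canvas into { mimeType, base64 }.
// canvas.toDataURL() returns a string such as "data:image/png;base64,....".
function canvasToBase64(canvas, mimeType = "image/png") {
  const dataUrl = canvas.toDataURL(mimeType);
  const marker = ";base64,";
  const index = dataUrl.indexOf(marker);

  // An empty canvas yields "data:,", which carries no Base64 payload.
  if (!dataUrl.startsWith("data:") || index === -1) {
    throw new Error("Unexpected data URL format");
  }

  return {
    mimeType: dataUrl.slice("data:".length, index),
    base64: dataUrl.slice(index + marker.length),
  };
}
```

The returned `base64` string is plain text, which is what makes it straightforward to edit, store, or pass on to a later compression and submission step.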
Step 307, compressing the character picture data, and submitting the compressed character picture data.
Specifically, in this embodiment, after the character picture data corresponding to the image of the target webpage element is obtained, the character picture data is compressed and submitted to the background server through the receiving interface of the background server, for use by the background server.
It should be noted that this embodiment has the same or similar technical effects as the other embodiments of the present application, and details are not described in this embodiment.
Fig. 4 is a schematic structural diagram of an intercepting apparatus of a web page element graph according to an embodiment of the present application.
Based on the foregoing embodiment, referring to fig. 4, an apparatus 40 for intercepting a web page element map provided in an embodiment of the present application includes:
the acquiring module 41 is configured to acquire a first picture when it is determined that a webpage element of a webpage includes a target webpage element to be intercepted, where the first picture is a page screenshot of the webpage;
a determining module 42, configured to determine a position of the target webpage element in the first picture;
and the extracting module 43 is configured to extract a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capture an image of the target webpage element.
In an optional manner, the acquiring module 41 is further configured to acquire parameter information of the target webpage element;
and the determining module 42 is configured to determine a position of the target webpage element in the first picture according to the parameter information.
In some optional embodiments, the acquiring module 41 is specifically configured to acquire an absolute top position of the target webpage element, an absolute left position of the target webpage element, a width of the target webpage element, and a height of the target webpage element;
the determining module 42 is specifically configured to determine the position of the target webpage element in the first picture according to the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element, and the height of the target webpage element.
In some alternative embodiments, the apparatus 40 further comprises:
the creating module 44 is configured to create a Canvas container after the first picture is acquired, and insert the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting module 43 is specifically configured to, in the Canvas container, clip the first picture according to the position of the target webpage element in the first picture, retain the second picture corresponding to the target webpage element in the first picture, and capture an image of the target webpage element.
In some alternatives, the apparatus 40 further comprises:
the format conversion module 45 is configured to, after the first picture has been cut according to the position of the target webpage element in the first picture, the second picture corresponding to the target webpage element in the first picture has been retained, and the image of the target webpage element has been captured, convert the format of the image of the target webpage element in the Canvas container to obtain character picture data;
and a compression submitting module 46, configured to compress the character picture data and submit the compressed character picture data.
In some alternatives, the apparatus 40 further comprises:
a configuration module 47, configured to pre-configure at least one CSS selector in the browser, where the CSS selector is used to describe a target web page element to be intercepted and an operation performed on the target web page element to be intercepted;
the acquiring module 41 is configured to acquire the first picture when the at least one CSS selector determines that the webpage elements of the webpage include a target webpage element to be intercepted.
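The pre-configured CSS selectors handled by the configuration module can be pictured as a small rule list that is matched against the page. The rule format (`selector` plus `action`), the example selectors, and the function name `findTargets` are hypothetical; the application does not specify how the selectors or their operations are stored.

```javascript
// Hypothetical rule list: each rule names a target element by CSS selector
// and the operation to perform on it.
const captureRules = [
  { selector: "#sales-chart", action: "capture" },
  { selector: ".price-table", action: "capture" },
];

// `root` is any object exposing querySelectorAll, e.g. window.document.
// Returns one entry per matching element; an empty result means the page
// contains no target webpage element and no screenshot is needed.
function findTargets(root, rules) {
  const targets = [];
  for (const rule of rules) {
    for (const element of root.querySelectorAll(rule.selector)) {
      targets.push({ element, action: rule.action });
    }
  }
  return targets;
}
```

In a browser, `findTargets(document, captureRules).length > 0` would correspond to determining that the webpage contains a target webpage element to be intercepted.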
In some alternatives, the apparatus 40 further comprises:
the monitoring module 48 is configured to monitor a loading progress of a web page element in a web page before the first picture is acquired;
the acquiring module 41 is specifically configured to acquire the first picture when the webpage elements have finished loading.
It should be noted that the device embodiment provided in the present application and the method embodiment provided in the present application have the same or similar effects, and the description of the embodiment is omitted.
The intercepting device 40 of the webpage element graph comprises a processor and a memory, wherein the acquiring module 41, the determining module 42, the extracting module 43, the creating module 44, the format conversion module 45, the compression submitting module 46, the configuration module 47, the monitoring module 48, and the like are stored in the memory as program modules, and the processor executes the program modules stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program module from the memory. One or more kernels may be provided, and the interception method of the webpage element graph provided by any optional embodiment of the application is implemented by adjusting kernel parameters.
The embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed, the computer-executable instructions are used to implement the method for intercepting a web page element graph provided in any optional embodiment of the present application.
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Based on the foregoing embodiments, referring to fig. 5, an embodiment of the present application provides an electronic device 50, which includes a memory 51, a processor 52, and a communication bus 53;
the memory 51 is connected with the processor 52 in a communication way through a communication bus 53;
the memory 51 stores computer-executable instructions, and the processor 52 is configured to execute the computer-executable instructions, so as to implement the method for intercepting the element graph of the web page provided in any optional embodiment of the present application.
It should be noted that the device embodiment provided in the present application and the method embodiment provided in the present application have the same or similar effects, and the description of the embodiment is omitted.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to run a program that initializes the following method steps:
S12, when it is determined that the webpage elements of the webpage include a target webpage element to be intercepted, acquiring a first picture, where the first picture is a page screenshot of the webpage;
S14, determining the position of the target webpage element in the first picture;
S16, extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing an image of the target webpage element.
In some embodiments, S12, determining the position of the target web page element in the first picture includes:
S121, acquiring parameter information of the target webpage element;
and S122, determining the position of the target webpage element in the first picture according to the parameter information.
In some optional manners, S121, obtaining parameter information of the target webpage element includes:
S1211, acquiring the absolute top position of the target webpage element, the absolute left position of the target webpage element, the width of the target webpage element, and the height of the target webpage element;
S122, determining the position of the target webpage element in the first picture according to the parameter information, includes:
S1221, in the Canvas container, determining the position of the target webpage element in the first picture according to the absolute top position of the target webpage element, the absolute left position of the target webpage element, the width of the target webpage element, and the height of the target webpage element.
In some alternatives, S12, after acquiring the first picture, further includes:
S13, creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container;
S16, extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing an image of the target webpage element, includes:
S161, in the Canvas container, cutting the first picture according to the position of the target webpage element in the first picture, retaining a second picture corresponding to the target webpage element in the first picture, and capturing an image of the target webpage element.
In some optional manners, S161, cutting out the first picture according to the position of the target webpage element in the first picture, retaining a second picture corresponding to the target webpage element in the first picture, and after capturing the image of the target webpage element, further includes:
S18, converting the format of the image of the target webpage element in the Canvas container to obtain character picture data;
and S20, compressing the character picture data and submitting the compressed character picture data.
In some optional manners, S12, before acquiring the first picture when it is determined that the web page element of the web page includes the target web page element to be intercepted, further includes:
S10, pre-configuring at least one CSS selector in the browser, where the CSS selector is used to describe the target webpage element to be intercepted and the operation performed on the target webpage element to be intercepted;
S12, acquiring the first picture when it is determined that the webpage elements of the webpage include a target webpage element to be intercepted, further includes:
S123, when the at least one CSS selector determines that the webpage elements of the webpage include the target webpage element to be intercepted, acquiring the first picture.
In some alternatives, before acquiring the first picture, S12 further includes:
S11, monitoring the loading progress of the webpage elements in the webpage;
S12, acquiring the first picture, further includes:
and S124, acquiring the first picture when the webpage elements are loaded.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory among computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for intercepting a webpage element graph is characterized by comprising the following steps:
when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage;
determining the position of the target webpage element in the first picture;
and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element.
2. The method of claim 1, wherein the determining the position of the target web page element in the first picture comprises:
acquiring parameter information of the target webpage element;
and determining the position of the target webpage element in the first picture according to the parameter information.
3. The method of claim 2, wherein the obtaining parameter information of the target web page element comprises:
acquiring absolute top positioning of the target webpage element, absolute left positioning of the target webpage element, width of the target webpage element and height of the target webpage element;
the determining the position of the target webpage element in the first picture according to the parameter information includes:
and determining the position of the target webpage element in the first picture according to the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element and the height of the target webpage element.
4. The method of claim 1, wherein after the obtaining the first picture, the method further comprises:
creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing to obtain an image of the target webpage element includes:
in the Canvas container, the first picture is cut out according to the position of the target webpage element in the first picture, a second picture corresponding to the target webpage element in the first picture is reserved, and the image of the target webpage element is obtained through cutting out.
5. The method of claim 4, wherein the cropping the first picture according to the position of the target webpage element in the first picture, retaining a second picture corresponding to the target webpage element in the first picture, and after the cropping the image of the target webpage element, the method further comprises:
in the Canvas container, converting the format of the image of the target webpage element to obtain character picture data;
and compressing the character picture data and submitting the compressed character picture data.
6. The method according to any one of claims 1 to 5, wherein before the obtaining the first picture when the web page element of the web page is determined to contain the target web page element to be intercepted, the method further comprises:
at least one CSS selector is configured in advance in a browser, and the CSS selector is used for describing the target webpage element to be intercepted and the operation executed on the target webpage element to be intercepted;
when determining that the webpage element of the webpage contains the target webpage element to be intercepted, acquiring a first picture, including:
and when the at least one CSS selector determines that the webpage elements of the webpage contain the target webpage elements to be intercepted, acquiring the first picture.
7. The method of any of claims 1-5, wherein prior to obtaining the first picture, the method further comprises:
monitoring the loading progress of webpage elements in a webpage;
the acquiring of the first picture includes:
and when the webpage elements are loaded, acquiring the first picture.
8. An apparatus for intercepting a web page element graph, comprising:
an acquisition module, configured to acquire a first picture when it is determined that a webpage element of a webpage contains a target webpage element to be intercepted, wherein the first picture is a webpage screenshot of the webpage;
a determining module, configured to determine a position of the target webpage element in the first picture;
and the extraction module is used for extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing to obtain an image of the target webpage element.
9. An electronic device comprising a memory, a processor, and a communication bus;
the memory is in communication connection with the processor through the communication bus;
the memory has stored therein computer-executable instructions for execution by the processor for performing the method of any of claims 1-7.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed, perform the method of any one of claims 1-7.
CN201910929747.8A 2019-09-29 2019-09-29 Webpage element graph intercepting method and device and electronic equipment Pending CN112579947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910929747.8A CN112579947A (en) 2019-09-29 2019-09-29 Webpage element graph intercepting method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112579947A true CN112579947A (en) 2021-03-30

Family

ID=75110714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910929747.8A Pending CN112579947A (en) 2019-09-29 2019-09-29 Webpage element graph intercepting method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112579947A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113885978A (en) * 2021-09-17 2022-01-04 北京来也网络科技有限公司 Element screenshot method and device combining RPA and AI
CN115657916A (en) * 2022-12-20 2023-01-31 北京数智新天信息技术咨询有限公司 Method and device for acquiring e-commerce data and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021872A (en) * 2007-01-17 2007-08-22 深圳市光芒科技有限公司 Method for intercepting page content
FR2980605A1 (en) * 2011-09-27 2013-03-29 Myriad Group Ag METHOD FOR RETRIEVING A REPRESENTATION OF A ANNOTATED WEB DOCUMENT, COMPUTER PROGRAM AND ELECTRONIC DEVICE THEREFOR
CN104504090A (en) * 2014-12-26 2015-04-08 北京奇虎科技有限公司 Method and device for processing image in webpage
CN104834753A (en) * 2015-05-28 2015-08-12 百度在线网络技术(北京)有限公司 Webpage screenshot generating method and device
CN106354792A (en) * 2016-08-24 2017-01-25 北京小米移动软件有限公司 Webpage display method and device
CN106610829A (en) * 2015-10-26 2017-05-03 北京国双科技有限公司 Webpage screenshot method and device
CN107885848A (en) * 2017-11-10 2018-04-06 杭州美创科技有限公司 Web page screen-cutting method based on web technology
CN108959605A (en) * 2018-07-13 2018-12-07 彩讯科技股份有限公司 For the screenshot method of webpage, device, computer equipment and storage medium
CN110020070A (en) * 2017-09-28 2019-07-16 北京国双科技有限公司 Webpage circle selects data processing method, apparatus and system
CN110020044A (en) * 2017-09-22 2019-07-16 北京国双科技有限公司 A kind of crawling method and device of crawler

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination