CN112579947A - Webpage element graph intercepting method and device and electronic equipment - Google Patents
- Publication number
- CN112579947A (application CN201910929747.8A)
- Authority
- CN
- China
- Prior art keywords
- picture
- target webpage
- target
- webpage element
- webpage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method for intercepting a webpage element graph, which comprises the following steps: when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage; determining the position of a target webpage element in the first picture; and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element. The invention also discloses a device for intercepting the webpage element graph and electronic equipment.
Description
Technical Field
The invention relates to the technical field of image interception, in particular to a method and a device for intercepting a webpage element graph and electronic equipment.
Background
With the rapid development of network technology, the World Wide Web has become a carrier of a vast amount of information, and extracting and using that information effectively has become a major challenge. Search engines, as tools that help people retrieve information, have become the entrance and guide through which users access the World Wide Web. However, general-purpose search engines have certain limitations: users in different fields and with different backgrounds often have different retrieval purposes and requirements, while the results returned by a general-purpose search engine include a large number of web pages that the user does not care about, so the data is noisy.
Therefore, in the related art, relevant webpage resources are captured in a targeted manner by a web crawler. A web crawler is a program that automatically extracts web pages; it downloads web pages from the World Wide Web for a search engine and is an important component of the search engine. According to a configured crawling target, the web crawler selectively accesses web pages and related links on the World Wide Web to acquire the required information, which greatly improves the accuracy of data search.
However, a web crawler in the related art can only obtain the source code (HTML) of a website from the World Wide Web, and cannot obtain a clear and intuitive image of the content of a webpage element.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, an electronic device, and a computer-readable storage medium for intercepting a web page element graph, so as to solve the problem that a web crawler in the related art can only obtain the source code of a website from the World Wide Web and cannot obtain a clear and intuitive image of the content of a webpage element.
In order to achieve the above object, according to an aspect of the present invention, there is provided a method for intercepting a web page element map, including:
when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage;
determining the position of the target webpage element in the first picture;
and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element.
In an alternative, the determining the position of the target webpage element in the first picture includes:
acquiring parameter information of the target webpage element;
and determining the position of the target webpage element in the first picture according to the parameter information.
In an optional manner, the obtaining parameter information of the target web page element includes:
acquiring absolute top positioning of the target webpage element, absolute left positioning of the target webpage element, width of the target webpage element and height of the target webpage element;
the determining the position of the target webpage element in the first picture according to the parameter information includes:
and determining the position of the target webpage element in the first picture according to the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element and the height of the target webpage element.
In an optional manner, after the acquiring the first picture, the method further includes:
creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing to obtain an image of the target webpage element includes:
in the Canvas container, the first picture is cut out according to the parameter information, a second picture corresponding to the target webpage element in the first picture is reserved, and an image of the target webpage element is obtained through interception.
In an optional manner, after the cropping the first picture according to the position of the target web page element in the first picture, retaining a second picture corresponding to the target web page element in the first picture, and obtaining an image of the target web page element by means of the cropping, the method further includes:
in the Canvas container, converting the format of the image of the target webpage element to obtain character picture data;
and compressing the character picture data and submitting the compressed character picture data.
In an optional manner, before the obtaining the first picture when it is determined that the web page element of the web page includes the target web page element to be intercepted, the method further includes:
at least one CSS selector is configured in advance in a browser, and the CSS selector is used for describing the target webpage element to be intercepted and the operation executed on the target webpage element to be intercepted;
when determining that the webpage element of the webpage contains the target webpage element to be intercepted, acquiring a first picture, including:
and when the at least one CSS selector determines that the webpage elements of the webpage contain the target webpage elements to be intercepted, acquiring the first picture.
In an optional manner, before the obtaining the first picture, the method further includes:
monitoring the loading progress of webpage elements in a webpage;
the acquiring of the first picture includes:
and when the webpage elements are loaded, acquiring the first picture.
According to a second aspect of the present invention, there is provided an apparatus for intercepting a web page element map, comprising:
an obtaining module, configured to obtain a first picture when it is determined that a webpage element of a webpage contains a target webpage element to be intercepted, wherein the first picture is a webpage screenshot of the webpage;
a determining module, configured to determine a position of the target webpage element in the first picture;
and an extraction module, configured to extract, according to the position of the target webpage element in the first picture, a second picture corresponding to the target webpage element in the first picture, thereby intercepting an image of the target webpage element.
In an optional manner, the obtaining module is further configured to obtain parameter information of the target web page element;
the determining module is configured to determine, according to the parameter information, a position of the target webpage element in the first picture.
In an optional manner, the obtaining module is specifically configured to obtain an absolute top location of the target web page element, an absolute left location of the target web page element, a width of the target web page element, and a height of the target web page element;
the determining module is specifically configured to determine the position of the target webpage element in the first picture according to the absolute top location of the target webpage element, the absolute left location of the target webpage element, the width of the target webpage element, and the height of the target webpage element.
In an alternative, the apparatus further comprises:
a creating module, configured to create a Canvas container after the first picture is obtained, and insert the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting module is specifically configured to clip the first picture according to the position of the target webpage element in the first picture in the Canvas container, reserve a second picture corresponding to the target webpage element in the first picture, and clip to obtain an image of the target webpage element.
In an alternative, the apparatus further comprises:
a format conversion module, configured to, after the first picture has been cropped according to the position of the target webpage element in the first picture, the second picture corresponding to the target webpage element has been retained, and the image of the target webpage element has been intercepted, convert the format of the image of the target webpage element in the Canvas container to obtain character picture data;
and a compression submitting module, configured to compress the character picture data and submit the compressed character picture data.
In an alternative, the apparatus further comprises:
the configuration module is used for pre-configuring at least one CSS selector in a browser, and the CSS selector is used for describing the target webpage element to be intercepted and the operation executed on the target webpage element to be intercepted;
the obtaining module is configured to obtain the first picture when the at least one CSS selector determines that the web page element of the web page includes the target web page element to be intercepted.
In an alternative, the apparatus further comprises:
the monitoring module is used for monitoring the loading progress of webpage elements in a webpage before the first picture is acquired;
the obtaining module is specifically configured to obtain the first picture when the webpage element is loaded.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a communication bus;
the memory is in communication connection with the processor through the communication bus;
the memory has stored therein computer-executable instructions for execution by the processor for performing the method provided in any of the alternative embodiments of the first aspect of the present invention.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-executable instructions for performing the method provided in any one of the alternative embodiments of the first aspect of the present invention when executed.
The invention provides a method, a device, electronic equipment and a computer-readable storage medium for intercepting a webpage element graph, wherein the method for intercepting the webpage element graph comprises the following steps: when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage; determining the position of the target webpage element in the first picture; and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element. Therefore, when a web crawler is used for crawling a web page, a screenshot of that web page is obtained, the position of the target webpage element in the first picture is determined, and the picture corresponding to the target webpage element is extracted from the screenshot, so that an image of the target webpage element is intercepted. The image corresponding to the target webpage element shows the page element clearly and intuitively. This solves the problem that, in the related art, the web crawler can only obtain the source code of the website from the World Wide Web and cannot obtain a clear and intuitive image of the content of a webpage element, and it improves the intuitiveness of the information crawled by the web crawler.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings.
FIG. 1 is a flowchart illustrating an implementation of a method for intercepting a web page element graph according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application;
FIG. 3 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application;
FIG. 4 is a schematic structural diagram of an intercepting apparatus of a web page element graph provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of the embodiments of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Fig. 1 is a flowchart illustrating an implementation of a method for intercepting a web page element diagram according to an embodiment of the present application.
Referring to FIG. 1, the method for intercepting a web page element graph provided in this embodiment of the present application is described by taking a web crawler crawling web resources on the World Wide Web as an example. It can be understood that this embodiment is not limited to the web crawler's operation of crawling web resources on the World Wide Web, and is also applicable to other screenshot operations. The method can be applied to computer devices such as servers, personal computers, notebook computers and desktop computers; in some possible implementations, the device may also be another electronic device that integrates a processor and a memory, and the specific form of the electronic device is not particularly limited in this embodiment. The method comprises the following steps:
Specifically, when the web crawler is used to obtain web resources from the World Wide Web, the Uniform Resource Locator (URL) corresponding to the web page to be crawled may be opened with a tool such as a browser. In some optional embodiments, the Chrome browser is taken as an example: the URL corresponding to the web page to be crawled is opened in the Chrome browser, and the web crawler crawls the required web resources from the web page opened at that URL. After the Chrome browser opens the URL, the web crawler may click and scroll webpage elements on the opened page, extract the relevant information of the webpage elements, and thereby determine the webpage elements to be intercepted. The Chrome browser is provided with a screenshot plug-in; after the webpage corresponding to the URL is opened, the screenshot plug-in is triggered to take a screenshot of the whole webpage corresponding to the URL, thereby obtaining the first picture.
Specifically, after performing operations such as page scrolling and webpage element clicking on the web page corresponding to the URL to be crawled, when the web crawler determines that a target webpage element to be intercepted exists in the web page, the position of the target webpage element in the first picture is determined. This position may be any information that describes where the target webpage element lies in the first picture and the area of the first picture it occupies, such as the offset of the top of the target webpage element from the top of the web page, the offset of the left side of the target webpage element from the left side of the web page, and the width and height of the target webpage element. In one specific example, the height of the target web page element may be determined as follows:
The distance by which the bottom of the target web page element is offset from the top of the web page is determined as a first distance X1; the distance by which the top of the target web page element is offset from the top of the web page is determined as a second distance X2; and the height of the target web page element is the difference between the first distance X1 and the second distance X2, that is, X1 - X2. The width of the target web page element can be determined in the same way as its height, which in turn determines the position of the target webpage element in the first picture. It should be noted that the above manner is only an exemplary illustration, and the embodiment of the present application places no limitation on the specific manner of obtaining the height and width of the target web page element.
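The X1/X2 computation above can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: the type and function names (`ElementEdges`, `sizeFromEdges`) and the sample numbers are hypothetical and do not come from the application.

```typescript
// Hypothetical helper; all names here are illustrative, not from the source.
interface ElementEdges {
  top: number;    // X2: offset of the element's top edge from the page top, in pixels
  bottom: number; // X1: offset of the element's bottom edge from the page top, in pixels
  left: number;   // offset of the element's left edge from the page's left side
  right: number;  // offset of the element's right edge from the page's left side
}

interface ElementSize {
  width: number;
  height: number;
}

function sizeFromEdges(edges: ElementEdges): ElementSize {
  const height = edges.bottom - edges.top; // X1 - X2
  const width = edges.right - edges.left;  // the same idea applied horizontally
  if (height < 0 || width < 0) {
    throw new RangeError("edges are inconsistent: negative size");
  }
  return { width, height };
}

const size = sizeFromEdges({ top: 120, bottom: 420, left: 40, right: 680 });
console.log(size); // { width: 640, height: 300 }
```

Together with the top and left offsets, the resulting width and height describe the rectangle that the element occupies in the first picture.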
Specifically, in this embodiment, after the position of the target webpage element in the first picture and the area occupying the first picture are determined, the first picture is clipped according to the position of the target webpage element in the first picture and the area occupying the first picture, so as to extract a second picture corresponding to the target webpage element in the first picture, and an image of the target webpage element is captured.
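As a rough illustration of the clipping step, the sketch below crops a rectangle out of a raw RGBA pixel buffer. It assumes the first picture has already been decoded into such a buffer, which the application does not specify; the names `PixelRect`, `CroppedPicture` and `cropRgba` are hypothetical.

```typescript
// Hypothetical types; the application does not prescribe a pixel format.
interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface CroppedPicture {
  data: Uint8ClampedArray; // RGBA bytes, 4 per pixel, row by row
  width: number;
  height: number;
}

// Round to whole pixels and keep the value inside [lo, hi].
function clampInt(value: number, lo: number, hi: number): number {
  return Math.max(lo, Math.min(Math.round(value), hi));
}

function cropRgba(
  src: Uint8ClampedArray,
  srcWidth: number,
  srcHeight: number,
  rect: PixelRect
): CroppedPicture {
  if (src.length !== srcWidth * srcHeight * 4) {
    throw new RangeError("buffer length does not match the stated dimensions");
  }
  // Clamp the requested rectangle to the bounds of the first picture.
  const left = clampInt(rect.left, 0, srcWidth);
  const top = clampInt(rect.top, 0, srcHeight);
  const right = clampInt(rect.left + rect.width, left, srcWidth);
  const bottom = clampInt(rect.top + rect.height, top, srcHeight);
  const width = right - left;
  const height = bottom - top;

  const data = new Uint8ClampedArray(width * height * 4);
  for (let row = 0; row < height; row++) {
    // Copy one row of the second picture out of the first picture.
    const start = ((top + row) * srcWidth + left) * 4;
    data.set(src.subarray(start, start + width * 4), row * width * 4);
  }
  return { data, width, height };
}
```

Clamping is a design choice of this sketch: an element that extends past the edge of the screenshot yields a smaller picture instead of an error.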
The method for intercepting the webpage element graph provided by the embodiment of the application comprises the following steps: when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage; determining the position of the target webpage element in the first picture; and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element. Therefore, when a web crawler is used for crawling a web page, a screenshot of that web page is obtained, the position of the target webpage element in the first picture is determined, and the picture corresponding to the target webpage element is extracted from the screenshot, so that an image of the target webpage element is intercepted. The image corresponding to the target webpage element shows the page element clearly and intuitively. This solves the problem that, in the related art, the web crawler can only obtain the source code of the website from the World Wide Web and cannot obtain a clear and intuitive image of the content of a webpage element, and it improves the intuitiveness of the information crawled by the web crawler.
Fig. 2 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application.
Based on the foregoing embodiment, referring to fig. 2, a method for intercepting a web page element graph according to another embodiment of the present application includes the following steps:
Specifically, when the web crawler is used to obtain web resources from the World Wide Web, the Uniform Resource Locator (URL) corresponding to the web page to be crawled may be opened with a tool such as a browser. In some optional embodiments, the Chrome browser is taken as an example: the URL corresponding to the web page to be crawled is opened in the Chrome browser, and the web crawler crawls the required web resources from the web page opened at that URL. After the Chrome browser opens the URL, the web crawler may click and scroll webpage elements on the opened page, extract the relevant information of the webpage elements, and thereby determine the webpage elements to be intercepted. In some optional manners, the crawling operation may be performed on the web page according to pre-configured crawling rules. For example, at least one CSS selector may be configured in an XML schema before crawling, where the at least one CSS selector describes the target webpage element to be intercepted and the operations to be performed on it, such as clicking, turning pages and scrolling the page. This ensures that the web crawler crawls information resources on the World Wide Web only according to the configured webpage elements and operations, which saves crawling time and improves the efficiency of obtaining information resources.
Specifically, the Chrome browser is provided with a screenshot plug-in. After the webpage corresponding to the URL is opened, and when the at least one CSS selector determines that the webpage elements of the webpage contain a target webpage element to be intercepted, the screenshot plug-in is triggered to take a screenshot of the whole webpage corresponding to the URL, thereby obtaining the first picture.
Specifically, after performing operations such as page scrolling and webpage element clicking on the web page corresponding to the URL to be crawled, when the web crawler determines that a target webpage element to be intercepted exists in the web page, the position of the target webpage element in the first picture is determined. This position may be any information that describes where the target webpage element lies in the first picture and the area of the first picture it occupies, such as the offset of the top of the target webpage element from the top of the web page, the offset of the left side of the target webpage element from the left side of the web page, and the width and height of the target webpage element.
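One possible shape for such selector rules, and for the check that decides whether a screenshot should be triggered, is sketched below. The application configures the selectors in an XML schema; this sketch swaps in a plain TypeScript object for brevity. The rule fields, the action names and the `#price-table` selector are all hypothetical.

```typescript
// Hypothetical rule format; the source stores its rules in XML instead.
type CrawlAction = "click" | "scroll" | "nextPage" | "capture";

interface SelectorRule {
  selector: string;       // CSS selector describing the target webpage element
  actions: CrawlAction[]; // operations to perform on the matched element
}

// Minimal structural view of a DOM root. In a browser, `document` offers a
// `querySelector` method of this shape.
interface QueryRoot {
  querySelector(selector: string): unknown;
}

// Returns the rules that ask for a capture and whose selector matches
// something on the page. A non-empty result means a screenshot is needed.
function rulesRequiringCapture(root: QueryRoot, rules: SelectorRule[]): SelectorRule[] {
  return rules.filter(
    (rule) =>
      rule.actions.indexOf("capture") !== -1 &&
      root.querySelector(rule.selector) !== null
  );
}

const exampleRules: SelectorRule[] = [
  { selector: "#price-table", actions: ["scroll", "capture"] },
  { selector: "a.next", actions: ["click", "nextPage"] },
];
```

In a browser context, `rulesRequiringCapture(document, exampleRules).length > 0` would be the condition under which the whole-page screenshot is taken.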
In some optional manners, step 202, determining the position of the target webpage element in the first picture includes:
Acquiring the parameter information of the target webpage element.
Specifically, when it is determined that a target webpage element to be intercepted exists in a webpage, parameter information of the target webpage element, such as offsetTop (absolute top position), offsetLeft (absolute left position), outerWidth (width) and outerHeight (height), can be acquired through jQuery (or native JavaScript). Here offsetTop is the distance, in pixels, by which the top of the target web page element is offset from the top of the web page, including any distance that has been scrolled out of view; offsetLeft is the distance, in pixels, by which the left side of the target web page element is offset from the left side of the web page, likewise including any distance that has been scrolled out of view.
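With native JavaScript, an element's `offsetTop` and `offsetLeft` are measured relative to its `offsetParent`, so one common way to obtain a page-absolute position is to sum the offsets along the `offsetParent` chain. The sketch below shows that idea on a structural stand-in for `HTMLElement`; `OffsetNode`, `ElementBox` and `absoluteBox` are hypothetical names, and the sketch ignores the borders of intermediate ancestors. `offsetWidth` and `offsetHeight` are used as rough native counterparts of the jQuery-style outerWidth and outerHeight mentioned above.

```typescript
// Structural stand-in for the HTMLElement properties of the same names.
// Passing a real element from typed browser code would need a cast, because
// the DOM typings declare `offsetParent` as `Element | null`.
interface OffsetNode {
  offsetTop: number;
  offsetLeft: number;
  offsetWidth: number;
  offsetHeight: number;
  offsetParent: OffsetNode | null;
}

interface ElementBox {
  top: number;    // absolute top position, in pixels
  left: number;   // absolute left position, in pixels
  width: number;
  height: number;
}

function absoluteBox(el: OffsetNode): ElementBox {
  let top = 0;
  let left = 0;
  let node: OffsetNode | null = el;
  // Accumulate offsets up the offsetParent chain until the root is reached.
  while (node !== null) {
    top += node.offsetTop;
    left += node.offsetLeft;
    node = node.offsetParent;
  }
  return { top, left, width: el.offsetWidth, height: el.offsetHeight };
}
```

The returned box carries the four parameters that the next step uses to locate the element in the first picture.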
And determining the position of the target webpage element in the first picture according to the parameter information.
Specifically, in the embodiment of the application, the position and the area of the target webpage element in the first picture can be accurately determined by obtaining the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element and the height of the target webpage element, so that the image of the target webpage element can be accurately obtained.
Step 203: creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container.
Specifically, in this embodiment, after the Canvas container is created, the first picture may be inserted into the Canvas container through the drawImage interface of the Canvas.
Step 204: in the Canvas container, cropping the first picture according to the position of the target webpage element in the first picture, retaining a second picture corresponding to the target webpage element in the first picture, and thereby intercepting an image of the target webpage element.
Specifically, in this embodiment, in the Canvas container, the first picture is clipped according to the absolute top positioning of the target web page element, the absolute left positioning of the target web page element, the width of the target web page element, and the height of the target web page element, the second picture corresponding to the target web page element in the first picture is retained, and the image of the target web page element is captured.
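The Canvas step can be sketched with the nine-argument form of `drawImage`, which copies a source rectangle of an image onto a destination rectangle of the canvas. To keep the sketch runnable outside a browser, it is written against small structural interfaces instead of the real DOM types; `CropRegion`, `DrawTarget`, `SizedCanvas` and `cropOntoCanvas` are hypothetical names, and the application does not state which `drawImage` overload it uses.

```typescript
interface CropRegion {
  left: number;   // absolute left position of the target element
  top: number;    // absolute top position of the target element
  width: number;
  height: number;
}

// Subset of CanvasRenderingContext2D needed here, generic over the image type.
interface DrawTarget<I> {
  drawImage(
    image: I,
    sx: number, sy: number, sw: number, sh: number,
    dx: number, dy: number, dw: number, dh: number
  ): void;
}

// Subset of HTMLCanvasElement needed here.
interface SizedCanvas {
  width: number;
  height: number;
}

function cropOntoCanvas<I>(
  canvas: SizedCanvas,
  ctx: DrawTarget<I>,
  firstPicture: I,
  region: CropRegion
): void {
  // Size the canvas to the target element so only the second picture remains.
  canvas.width = region.width;
  canvas.height = region.height;
  // Copy the element's rectangle from the screenshot to the canvas origin.
  ctx.drawImage(
    firstPicture,
    region.left, region.top, region.width, region.height,
    0, 0, region.width, region.height
  );
}

// Browser usage (not executed here), assuming `img` is the loaded screenshot:
//   const canvas = document.createElement("canvas");
//   const ctx = canvas.getContext("2d");
//   if (ctx) cropOntoCanvas(canvas, ctx, img, { left: 25, top: 130, width: 200, height: 50 });
```

Resizing the canvas before drawing means its whole surface is the cropped element, so a later export of the canvas contains nothing else.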
It should be noted that this embodiment has the same or similar technical effects as the other embodiments of the present application, and details are not described in this embodiment.
Fig. 3 is a flowchart of an implementation of a method for intercepting a web page element graph according to another embodiment of the present application.
Based on the foregoing embodiment, referring to fig. 3, a method for intercepting a web page element graph according to another embodiment of the present application includes the following steps:
Specifically, in this embodiment, when a web crawler is used to obtain web resources from the World Wide Web, a corresponding search engine such as a browser may be used to open the Uniform Resource Locator (URL) of the webpage to be crawled. While the URL is being opened, the network requests may be counted so as to determine whether all webpage elements in the webpage corresponding to the opened URL have been completely loaded. In some specific application scenarios, webpage elements may be loaded asynchronously, so that when the browser triggers the onload event, some webpage elements in the webpage may not yet be completely loaded. The loading progress of the webpage elements is therefore detected by counting network requests, which ensures that the first picture is obtained only after the webpage elements have finished loading, that the first picture contains all webpage elements of the webpage, and that no webpage element is omitted.
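One way to picture the request-counting idea is a small counter that is incremented when a network request starts and decremented when it finishes or fails. The class below is a sketch under assumptions: the name `PendingRequestCounter` and its methods are hypothetical, and how the start and finish notifications are obtained from the browser is left open because the text does not specify it.

```typescript
// Tracks in-flight network requests and reports when none are outstanding.
class PendingRequestCounter {
  private pending = 0; // requests started but not yet finished
  private started = 0; // total requests observed so far

  // Call when the browser issues a network request.
  requestStarted(): void {
    this.pending += 1;
    this.started += 1;
  }

  // Call when a request completes or fails.
  requestFinished(): void {
    if (this.pending > 0) {
      this.pending -= 1;
    }
  }

  // Treated as loaded once at least one request was seen and none remain.
  isLoaded(): boolean {
    return this.started > 0 && this.pending === 0;
  }
}
```

A practical monitor would probably also wait for a short quiet period after the count reaches zero, since a finished script may trigger further asynchronous requests; that refinement is an assumption and is omitted here.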
Step 304, creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container.
Step 305, in the Canvas container, cropping the first picture according to the position of the target webpage element in the first picture, retaining the second picture corresponding to the target webpage element in the first picture, and capturing an image of the target webpage element.
Step 306, converting the format of the image of the target webpage element in the Canvas container to obtain character picture data.
Specifically, in this embodiment, the image of the target webpage element is converted into a Base64 character string in the Canvas container, so as to obtain character picture data. The character picture data is convenient for later editing and reuse; for example, in some embodiments the data crawled by a web crawler from the World Wide Web is used for machine learning training, and in other embodiments the crawled data needs to be modified.
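To illustrate what the Base64 character string looks like, the sketch below encodes raw image bytes into Base64 and wraps them in a data URL. In a browser the canvas method `toDataURL()` already returns such a string, so this hand-written encoder is only an illustration of the format; the function names `toBase64` and `toDataUrl` and the default MIME type are assumptions.

```typescript
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes bytes as standard Base64 with '=' padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0; // missing bytes are treated as zero
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    // Each group of three bytes becomes four 6-bit symbols.
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Wraps the Base64 text in a data URL so it can be stored or sent as a string.
function toDataUrl(bytes: Uint8Array, mime: string = "image/png"): string {
  return `data:${mime};base64,${toBase64(bytes)}`;
}
```

Because the result is plain text, it can be embedded in a JSON body or a form field without a separate binary upload, which fits the later step of submitting the data to a background server.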
Specifically, in this embodiment, after the character picture data corresponding to the image of the target webpage element is obtained, the character picture data is compressed and submitted to the background server through a receiving interface of the background server, for use by the background server.
It should be noted that this embodiment has the same or similar technical effects as the other embodiments of the present application, and details are not described in this embodiment.
Fig. 4 is a schematic structural diagram of an intercepting apparatus of a web page element graph according to an embodiment of the present application.
Based on the foregoing embodiment, referring to fig. 4, an apparatus 40 for intercepting a web page element map provided in an embodiment of the present application includes:
the acquiring module 41 is configured to acquire a first picture when it is determined that the webpage elements of a webpage include a target webpage element to be intercepted, where the first picture is a page screenshot of the webpage;
a determining module 42, configured to determine a position of the target webpage element in the first picture;
and the extracting module 43 is configured to extract a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capture an image of the target webpage element.
In an optional manner, the obtaining module 41 is further configured to obtain parameter information of the target webpage element;
and the determining module 42 is configured to determine a position of the target webpage element in the first picture according to the parameter information.
In some optional embodiments, the obtaining module 41 is specifically configured to obtain an absolute top position of the target web page element, an absolute left position of the target web page element, a width of the target web page element, and a height of the target web page element;
the determining module 42 is specifically configured to determine the position of the target webpage element in the first picture according to the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element, and the height of the target webpage element.
In some alternative embodiments, the apparatus 40 further comprises:
the creating module 44 is configured to create a Canvas container after the first picture is acquired, and insert the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting module 43 is specifically configured to, in the Canvas container, clip the first picture according to the position of the target webpage element in the first picture, retain the second picture corresponding to the target webpage element in the first picture, and capture an image of the target webpage element.
In some alternatives, the apparatus 40 further comprises:
the format conversion module 45 is configured to, after the first picture has been cropped according to the position of the target webpage element in the first picture, the second picture corresponding to the target webpage element in the first picture has been retained, and the image of the target webpage element has been captured, convert the format of the image of the target webpage element in the Canvas container to obtain character picture data;
and a compression submitting module 46, configured to compress the character picture data and submit the compressed character picture data.
In some alternatives, the apparatus 40 further comprises:
a configuration module 47, configured to pre-configure at least one CSS selector in the browser, where the CSS selector is used to describe a target web page element to be intercepted and an operation performed on the target web page element to be intercepted;
the obtaining module 41 is configured to obtain the first picture when the at least one CSS selector determines that the web page element of the web page includes a target web page element to be intercepted.
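The role of the pre-configured CSS selectors can be sketched as a list of rules that is checked against the page before a screenshot is taken. Everything named here is hypothetical: the rule format `CaptureRule`, the function `findTargets`, and the `query` callback, which stands in for a DOM lookup such as `document.querySelector`, are assumptions made for illustration only.

```typescript
// Hypothetical rule format: which element to capture and what to do with it.
interface CaptureRule {
  selector: string;  // CSS selector describing the target webpage element
  operation: string; // operation to perform on the matched element
}

interface RuleMatch<E> {
  rule: CaptureRule;
  element: E;
}

// Returns the rules whose selector matches an element on the page.
// `query` is an assumed lookup that returns null when nothing matches.
function findTargets<E>(
  rules: CaptureRule[],
  query: (selector: string) => E | null,
): RuleMatch<E>[] {
  const hits: RuleMatch<E>[] = [];
  for (const rule of rules) {
    const element = query(rule.selector);
    if (element !== null) {
      hits.push({ rule, element });
    }
  }
  return hits;
}
```

Under this sketch, the first picture would be acquired only when `findTargets` returns at least one match, which corresponds to the condition that the webpage contains a target webpage element to be intercepted.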
In some alternatives, the apparatus 40 further comprises:
the monitoring module 48 is configured to monitor a loading progress of a web page element in a web page before the first picture is acquired;
the obtaining module 41 is specifically configured to obtain the first picture when the webpage element is completely loaded.
It should be noted that the device embodiment provided in the present application and the method embodiment provided in the present application have the same or similar effects, and the description of the embodiment is omitted.
The intercepting device 40 of the webpage element graph comprises a processor and a memory. The acquiring module 41, the determining module 42, the extracting module 43, the creating module 44, the format conversion module 45, the compression submitting module 46, the configuration module 47, the monitoring module 48, and the like are stored in the memory as program modules, and the processor executes the program modules stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program module from the memory. One or more kernels may be provided, and the interception method of the webpage element graph provided by any optional embodiment of the application is realized by adjusting the kernel parameters.
The embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed, the computer-executable instructions are used to implement the method for intercepting a web page element graph provided in any optional embodiment of the present application.
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Based on the foregoing embodiments, referring to fig. 5, an embodiment of the present application provides an electronic device 50, which includes a memory 51, a processor 52, and a communication bus 53;
the memory 51 is connected with the processor 52 in a communication way through a communication bus 53;
the memory 51 stores computer-executable instructions, and the processor 52 is configured to execute the computer-executable instructions, so as to implement the method for intercepting the element graph of the web page provided in any optional embodiment of the present application.
It should be noted that the device embodiment provided in the present application and the method embodiment provided in the present application have the same or similar effects, and the description of the embodiment is omitted.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes the following method steps:
S12, when determining that the webpage elements of the webpage contain a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage;
S14, determining the position of the target webpage element in the first picture;
S16, extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing an image of the target webpage element.
In some embodiments, S14, determining the position of the target webpage element in the first picture, includes:
S121, acquiring parameter information of the target webpage element;
S122, determining the position of the target webpage element in the first picture according to the parameter information.
In some optional manners, S121, acquiring parameter information of the target webpage element, includes:
S1211, acquiring the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element, and the height of the target webpage element;
S122, determining the position of the target webpage element in the first picture according to the parameter information, includes:
S1221, in the Canvas container, determining the position of the target webpage element in the first picture according to the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element, and the height of the target webpage element.
In some alternatives, after S12, acquiring the first picture, the method further includes:
S13, creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container;
S16, extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing an image of the target webpage element, includes:
S161, in the Canvas container, cropping the first picture according to the position of the target webpage element in the first picture, retaining the second picture corresponding to the target webpage element in the first picture, and capturing an image of the target webpage element.
In some optional manners, after S161, cropping the first picture according to the position of the target webpage element in the first picture, retaining the second picture corresponding to the target webpage element in the first picture, and capturing the image of the target webpage element, the method further includes:
S18, converting the format of the image of the target webpage element in the Canvas container to obtain character picture data;
S20, compressing the character picture data and submitting the compressed character picture data.
In some optional manners, before S12, acquiring the first picture when it is determined that the webpage elements of the webpage include a target webpage element to be intercepted, the method further includes:
S10, pre-configuring at least one CSS selector in the browser, where the CSS selector is used to describe the target webpage element to be intercepted and the operation performed on the target webpage element to be intercepted;
S12, acquiring the first picture when it is determined that the webpage elements of the webpage include a target webpage element to be intercepted, further includes:
S123, acquiring the first picture when the at least one CSS selector determines that the webpage elements of the webpage include the target webpage element to be intercepted.
In some alternatives, before S12, acquiring the first picture, the method further includes:
S11, monitoring the loading progress of the webpage elements in the webpage;
S12, acquiring the first picture, further includes:
S124, acquiring the first picture when the webpage elements have finished loading.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include forms of computer-readable media such as volatile memory, Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (10)
1. A method for intercepting a webpage element graph is characterized by comprising the following steps:
when determining that a webpage element of a webpage contains a target webpage element to be intercepted, acquiring a first picture, wherein the first picture is a webpage screenshot of the webpage;
determining the position of the target webpage element in the first picture;
and extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and intercepting to obtain an image of the target webpage element.
2. The method of claim 1, wherein the determining the position of the target web page element in the first picture comprises:
acquiring parameter information of the target webpage element;
and determining the position of the target webpage element in the first picture according to the parameter information.
3. The method of claim 2, wherein the obtaining parameter information of the target web page element comprises:
acquiring absolute top positioning of the target webpage element, absolute left positioning of the target webpage element, width of the target webpage element and height of the target webpage element;
the determining the position of the target webpage element in the first picture according to the parameter information includes:
and determining the position of the target webpage element in the first picture according to the absolute top positioning of the target webpage element, the absolute left positioning of the target webpage element, the width of the target webpage element and the height of the target webpage element.
4. The method of claim 1, wherein after the obtaining the first picture, the method further comprises:
creating a Canvas container, and inserting the first picture into the Canvas container through a preset interface of the Canvas container;
the extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing to obtain an image of the target webpage element includes:
in the Canvas container, the first picture is cut out according to the position of the target webpage element in the first picture, a second picture corresponding to the target webpage element in the first picture is reserved, and the image of the target webpage element is obtained through cutting out.
5. The method of claim 4, wherein the cropping the first picture according to the position of the target webpage element in the first picture, retaining a second picture corresponding to the target webpage element in the first picture, and after the cropping the image of the target webpage element, the method further comprises:
in the Canvas container, converting the format of the image of the target webpage element to obtain character picture data;
and compressing the character picture data and submitting the compressed character picture data.
6. The method according to any one of claims 1 to 5, wherein before the obtaining the first picture when the web page element of the web page is determined to contain the target web page element to be intercepted, the method further comprises:
at least one CSS selector is configured in advance in a browser, and the CSS selector is used for describing the target webpage element to be intercepted and the operation executed on the target webpage element to be intercepted;
when determining that the webpage element of the webpage contains the target webpage element to be intercepted, acquiring a first picture, including:
and when the at least one CSS selector determines that the webpage elements of the webpage contain the target webpage elements to be intercepted, acquiring the first picture.
7. The method of any of claims 1-5, wherein prior to obtaining the first picture, the method further comprises:
monitoring the loading progress of webpage elements in a webpage;
the acquiring of the first picture includes:
and when the webpage elements are loaded, acquiring the first picture.
8. An apparatus for intercepting a web page element graph, comprising:
an acquisition module, configured to acquire a first picture when it is determined that the webpage elements of a webpage contain a target webpage element to be intercepted, wherein the first picture is a webpage screenshot of the webpage;
a determining module, configured to determine a position of the target webpage element in the first picture;
and the extraction module is used for extracting a second picture corresponding to the target webpage element in the first picture according to the position of the target webpage element in the first picture, and capturing to obtain an image of the target webpage element.
9. An electronic device comprising a memory, a processor, and a communication bus;
the memory is in communication connection with the processor through the communication bus;
the memory has stored therein computer-executable instructions for execution by the processor for performing the method of any of claims 1-7.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed, perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910929747.8A CN112579947A (en) | 2019-09-29 | 2019-09-29 | Webpage element graph intercepting method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112579947A true CN112579947A (en) | 2021-03-30 |
Family
ID=75110714
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910929747.8A Pending CN112579947A (en) | 2019-09-29 | 2019-09-29 | Webpage element graph intercepting method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112579947A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113885978A (en) * | 2021-09-17 | 2022-01-04 | 北京来也网络科技有限公司 | Element screenshot method and device combining RPA and AI |
CN115657916A (en) * | 2022-12-20 | 2023-01-31 | 北京数智新天信息技术咨询有限公司 | Method and device for acquiring e-commerce data and electronic equipment |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021872A (en) * | 2007-01-17 | 2007-08-22 | 深圳市光芒科技有限公司 | Method for intercepting page content |
FR2980605A1 (en) * | 2011-09-27 | 2013-03-29 | Myriad Group Ag | METHOD FOR RETRIEVING A REPRESENTATION OF A ANNOTATED WEB DOCUMENT, COMPUTER PROGRAM AND ELECTRONIC DEVICE THEREFOR |
CN104504090A (en) * | 2014-12-26 | 2015-04-08 | 北京奇虎科技有限公司 | Method and device for processing image in webpage |
CN104834753A (en) * | 2015-05-28 | 2015-08-12 | 百度在线网络技术(北京)有限公司 | Webpage screenshot generating method and device |
CN106354792A (en) * | 2016-08-24 | 2017-01-25 | 北京小米移动软件有限公司 | Webpage display method and device |
CN106610829A (en) * | 2015-10-26 | 2017-05-03 | 北京国双科技有限公司 | Webpage screenshot method and device |
CN107885848A (en) * | 2017-11-10 | 2018-04-06 | 杭州美创科技有限公司 | Web page screen-cutting method based on web technology |
CN108959605A (en) * | 2018-07-13 | 2018-12-07 | 彩讯科技股份有限公司 | For the screenshot method of webpage, device, computer equipment and storage medium |
CN110020070A (en) * | 2017-09-28 | 2019-07-16 | 北京国双科技有限公司 | Webpage circle selects data processing method, apparatus and system |
CN110020044A (en) * | 2017-09-22 | 2019-07-16 | 北京国双科技有限公司 | A kind of crawling method and device of crawler |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||