CN111428162A - Page screenshot method and device - Google Patents

Publication number
CN111428162A
Authority
CN
China
Prior art keywords
screenshot
page
loaded
elements
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010200265.1A
Other languages
Chinese (zh)
Inventor
韩喆
王春霈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010200265.1A
Publication of CN111428162A
Priority to PCT/CN2020/140556 (WO2021184896A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A page screenshot method and a page screenshot device are provided. The method comprises: parsing a screenshot request initiated by a user to obtain the uniform resource locator (URL) of the page to be captured; loading the page to be captured according to the URL; determining, during loading, whether the page elements related to the screenshot have finished loading; and, in response to the page elements related to the screenshot having finished loading, stopping the loading and taking a screenshot of the loaded part of the page to be captured.

Description

Page screenshot method and device
Technical Field
The present application relates to the technical field of computer applications, and in particular to a page screenshot method and device.
Background
At present, some Internet users use the Internet to carry out illegal activities such as plagiarizing or misappropriating other people's works, spreading rumors, and selling illegal goods, which has a harmful social impact. To establish a violator's liability, electronic evidence often has to be collected from the relevant pages. Taking a page screenshot is a feasible way to collect such evidence: the content displayed on a page involved in the illegal activity is saved as a picture, which can then serve as electronic evidence.
Disclosure of Invention
The application discloses a page screenshot method and a page screenshot device.
According to a first aspect of the embodiments of the present application, a page screenshot method is disclosed, which includes:
in response to a screenshot request initiated by a user, acquiring the uniform resource locator (URL) of the page to be captured;
loading the page to be captured according to the URL, and determining, during loading, whether the page elements related to the screenshot have finished loading, wherein the page elements related to the screenshot are page elements specified by the user;
and, in response to the page elements related to the screenshot having finished loading, stopping the loading of the page to be captured and taking a screenshot of the loaded part of the page to be captured.
According to a second aspect of the embodiments of the present application, a page screenshot method is disclosed, which includes:
in response to a screenshot request initiated by a user, acquiring the uniform resource locator (URL) of the page to be captured;
loading the page to be captured according to the URL;
after the page to be captured has been loaded, preprocessing the page to be captured, wherein the preprocessing comprises deleting screenshot interference elements from the page to be captured;
and taking a screenshot of the preprocessed page to be captured.
According to a third aspect of the embodiments of the present application, a page screenshot device is disclosed, which includes:
a URL acquisition module, configured to acquire, in response to a screenshot request initiated by a user, the uniform resource locator (URL) of the page to be captured;
a page loading module, configured to load the page to be captured according to the URL and to determine, during loading, whether the page elements related to the screenshot have finished loading, wherein the page elements related to the screenshot are page elements specified by the user;
and an execution module, configured to stop the loading of the page to be captured in response to the page elements related to the screenshot having finished loading, and to take a screenshot of the loaded part of the page to be captured.
According to a fourth aspect of the embodiments of the present application, a page screenshot device is disclosed, which includes:
a URL acquisition module, configured to acquire, in response to a screenshot request initiated by a user, the uniform resource locator (URL) of the page to be captured;
a page loading module, configured to load the page to be captured according to the URL;
a page preprocessing module, configured to preprocess the page to be captured after it has been loaded, wherein the preprocessing comprises deleting screenshot interference elements from the page to be captured;
and a screenshot execution module, configured to take a screenshot of the preprocessed page to be captured.
In the above technical solution, whether the page elements related to the screenshot have finished loading is determined while the page to be captured is being loaded, so that fewer useless resources are loaded. On the one hand this reduces the waste of computer resources; on the other hand it reduces the useless information in the screenshot result and increases the proportion of required information in that result.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this specification and together with the description, serve to explain the principles.
FIG. 1 is a flowchart of a page screenshot method shown in this specification;
FIG. 2 is a schematic diagram of determining whether the page elements related to the screenshot have finished loading;
FIG. 3 is an example diagram comparing an original page with the image obtained by the screenshot;
FIG. 4 is a diagram of an exemplary structure of a page screenshot device shown in this specification;
FIG. 5 is a structural diagram of an example electronic device for performing a page screenshot shown in this specification;
FIG. 6 is a flowchart of a page screenshot method shown in this specification;
FIG. 7 is a diagram comparing a page before and after page preprocessing, as described in this specification;
FIG. 8 is a schematic flowchart of determining whether the page elements related to the screenshot have finished loading;
FIG. 9 is a diagram of an exemplary structure of a page screenshot device shown in this specification;
FIG. 10 is a structural diagram of an example electronic device for performing a page screenshot according to this specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in one or more embodiments of the present disclosure. It is to be understood that the described embodiments are only a few, and not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from one or more embodiments of the disclosure without making any creative effort shall fall within the protection scope of the present application.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of systems and methods consistent with certain aspects of the present description, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination".
At present, some Internet users use the Internet to carry out illegal activities such as plagiarizing or misappropriating other people's works, spreading rumors, and selling illegal goods, which has a harmful social impact. To establish a violator's liability, electronic evidence often has to be collected from the relevant pages. Taking a page screenshot is a feasible way to collect such evidence: the content displayed on a page involved in the illegal activity is saved as a picture, which can then serve as electronic evidence.
In practice, the screenshot of a page indicated by a user is usually taken by calling the web page screenshot function of a browser. However, when this approach is applied to a long web page, the resulting screenshot often contains a great deal of information unrelated to the purpose of the screenshot; and when it is applied to a "scroll-to-load" web page (such as a microblog feed or a Tieba forum feed), the page can grow so large that problems such as program crashes occur.
In view of this, the present specification discloses a technical solution that dynamically determines, while the page to be captured is being loaded, whether page loading needs to be stopped, and that takes a screenshot of only the loaded part after page loading has been stopped.
In implementation, while the page to be captured is being loaded, it is determined whether the page elements related to the screenshot have finished loading; in response to those page elements having finished loading, the loading of the page to be captured is stopped; a screenshot can then be taken of only the loaded part of the page to be captured.
In the above technical solution, on the one hand, because page loading is stopped in time, the program crashes that occur when capturing a very long page can be avoided. On the other hand, page loading is stopped only after the page elements related to the screenshot have finished loading, so the final image still contains all the content related to the screenshot while containing less irrelevant content, which improves the user experience and reduces the waste of computer resources.
The present application is described below with reference to specific embodiments and specific application scenarios.
Referring to FIG. 1, FIG. 1 shows a page screenshot method provided in an embodiment of this specification. The method includes the following steps:
S101, parsing a screenshot request initiated by a user to obtain the uniform resource locator (URL) of the page to be captured;
S102, loading the page to be captured according to the URL;
S103, determining, during loading, whether the page elements related to the screenshot have finished loading;
S104, in response to the page elements related to the screenshot having finished loading, stopping the loading and taking a screenshot of the loaded part of the page to be captured.
In this specification, the entity that executes the above method may be chosen according to the specific situation and requirements, and is not limited here; for example, it may be a cloud server that receives the user's screenshot request over a network connection, or the user's personal computer, which receives the user's screenshot request through a communication mechanism between software modules.
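Steps S101 to S104 can be sketched as a single control loop. The following is a minimal illustration in Python, not the implementation described in this specification: the request layout (a `"url"` key) and the three injected callables `load`, `is_complete` and `screenshot` are hypothetical stand-ins for a real page loader, a completion check and a rendering backend.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def parse_request(request: dict) -> str:
    """S101: extract the URL of the page to be captured (assumed 'url' key)."""
    url = request["url"]
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a loadable URL: {url!r}")
    return url


def capture(
    request: dict,
    load: Callable[[str], Iterable[str]],
    is_complete: Callable[[List[str]], bool],
    screenshot: Callable[[List[str]], str],
) -> str:
    """Run S101-S104 with injected (hypothetical) loader and renderer."""
    url = parse_request(request)
    loaded: List[str] = []
    for element in load(url):      # S102: elements arrive incrementally
        loaded.append(element)
        if is_complete(loaded):    # S103: check during loading
            break                  # S104: stop loading early
    return screenshot(loaded)      # S104: capture only the loaded part
```

Because `load` is consumed lazily, breaking out of the loop means the remaining elements are never requested, which mirrors the "stop loading" step.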
In one illustrated embodiment, the method is applied to a distributed server cluster, and the screenshot request may include sub-requests corresponding to a plurality of pages to be captured; the distributed server cluster may complete the screenshot tasks for the plurality of pages to be captured separately, according to a preset distribution algorithm.
In this specification, the uniform resource locator (URL) of the page to be captured may be acquired in response to a screenshot request initiated by a user. This process can be implemented in many ways, and this specification does not specifically limit it. For example, the screenshot request may directly carry a URL field for the page to be captured, in which case the URL is obtained by parsing the request; alternatively, the screenshot request may carry keywords corresponding to the page to be captured, and the URL is obtained indirectly from those keywords.
In one illustrated embodiment, the screenshot request may carry a character string indicating the page to be captured. After the character string is obtained from the request, it may be further analyzed to obtain the URL of the page to be captured. The analysis may be natural-language semantic analysis, or the parsing of specific codes such as short URLs and share codes; those skilled in the art may choose according to the specific situation, and this specification does not specifically limit it.
For example, the screenshot request may carry a character string indicating the official microblog page of "Alipay". After the request is parsed and the character string is acquired, the semantic information of the character string may be extracted by a semantic analysis algorithm, and a preset mapping table between semantic information and URLs may be queried using the semantic information "Alipay" and "microblog", so that the URL of the official "Alipay" microblog page is acquired as the URL of the page to be captured.
In this specification, after the URL of the page to be captured is acquired, the page to be captured may be loaded according to that URL.
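The lookup in this example can be sketched as follows. This is only an illustration: plain keyword matching is swapped in for the semantic analysis algorithm, and the table contents, the `example.com` address and the function name `resolve_url` are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical mapping table from sets of keywords to page URLs.
URL_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"alipay", "microblog"}): "https://example.com/microblog/alipay",
}


def resolve_url(text: str, table: Dict[FrozenSet[str], str] = URL_TABLE) -> Optional[str]:
    """Return the URL indicated by a request string, or None if unknown."""
    stripped = text.strip()
    # Case 1: the string already is a URL.
    if stripped.startswith(("http://", "https://")):
        return stripped
    # Case 2: keyword lookup (a stand-in for real semantic analysis).
    words = set(stripped.lower().split())
    for keywords, url in table.items():
        if keywords <= words:
            return url
    return None
```

A production system would likely replace the keyword test with a proper language model or with a short-URL or share-code resolver, as the text allows.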
In this specification, whether the screenshot-related elements have finished loading may be determined while the page to be captured is being loaded. The mechanism that triggers this determination is not specifically limited in this specification; for example, it may be triggered periodically at a preset time interval, or according to the number of elements loaded in the page, or according to the size of the page file, or by any combination of these. For example, the determination may be triggered every 100 milliseconds, or each time another 20 page elements have been loaded.
In this specification, a screenshot-related element is an element related to the purpose of the screenshot, and may be specified by the user. The user may specify it through the screenshot request, or it may be preset in the system. For example, a screenshot request initiated by a user may include a character string such as "evidence collection for stolen photographs", which specifies that the screenshot-related elements are the photographs in the web page. As another example, the user may preset that, for all screenshots of forum pages, the screenshot-related elements are the posts of forum users and do not include promotional information and the like.
In this specification, whether the page elements related to the screenshot have finished loading may be determined according to various criteria, and this specification does not specifically limit them. For example, the determination may be made directly, from the perspective of the page elements related to the screenshot, or indirectly, from the perspective of page elements unrelated to the screenshot.
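The combined trigger described above (a time interval or an element count) might look like the sketch below. The class name `CheckTrigger` and its parameters are hypothetical; the defaults of 100 milliseconds and 20 elements are taken from the example in the text.

```python
class CheckTrigger:
    """Decide when to re-run the 'related elements loaded?' check."""

    def __init__(self, interval_ms: float = 100.0, element_step: int = 20) -> None:
        self.interval_ms = interval_ms
        self.element_step = element_step
        self._last_time_ms = 0.0   # time of the previous check
        self._last_count = 0       # element count at the previous check

    def should_check(self, now_ms: float, loaded_count: int) -> bool:
        """Return True if either the time or the count threshold is reached."""
        due = (
            now_ms - self._last_time_ms >= self.interval_ms
            or loaded_count - self._last_count >= self.element_step
        )
        if due:
            # Remember when (and at what count) the check last ran.
            self._last_time_ms = now_ms
            self._last_count = loaded_count
        return due
```

The caller supplies the clock value, which keeps the trigger easy to test and independent of any particular timer.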
FIG. 2 is a schematic diagram of an embodiment of determining whether the page elements related to the screenshot have finished loading. In this example, the determination is made by checking whether the loaded elements include the tail (last) element among the page elements related to the screenshot; if they do, the page elements related to the screenshot can be determined to have finished loading. The page elements related to the screenshot may be determined from the user's screenshot request as described above, or according to the user's preset. During the loading of the page to be captured, the page structure sent by the page server is received first, for example in the form of an HTML tree structure, so the tail element among the page elements related to the screenshot can be determined from this abstract structure of the page to be captured.
For example, if a user specifies that the user comments in a certain page are to be captured, the last element of the user comments can be determined from the tree structure of the HTML file; during loading, once that tail element is detected to have finished loading, the page elements related to the screenshot can be considered to have finished loading.
In one illustrated embodiment, the determination may instead be made by checking whether the loaded elements include a preset screenshot-unrelated element whose appearance indicates that the screenshot-related elements have finished loading; if so, the page elements related to the screenshot are considered to have finished loading.
For example, when collecting screenshot evidence from a page that misappropriates a photograph, the APP promotion information at the bottom of the page clearly does not belong to the page elements that need to be captured. Therefore, once the APP promotion information is detected among the loaded content during page loading, the page elements related to the screenshot (in this example, the misappropriated photograph in the page body) can be considered to have finished loading.
In one illustrated embodiment, the screenshot-unrelated elements may be advertisement-related page elements, such as picture advertisements at the bottom of the page, share-inducing links, and recommendations of related articles; those skilled in the art may specify the specific types of screenshot-unrelated elements themselves according to specific requirements.
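The tail-element check can be illustrated on a toy tree. In this sketch the `Node` class, the `kind` labels and both function names are hypothetical; a real implementation would walk the parsed HTML tree instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Node:
    node_id: str
    kind: str                      # e.g. "comment", "header" (assumed labels)
    children: List["Node"] = field(default_factory=list)


def tail_element(root: Node, is_related: Callable[[Node], bool]) -> Optional[Node]:
    """Return the last screenshot-related node in document (pre-order) order."""
    last: Optional[Node] = None
    stack = [root]
    while stack:
        node = stack.pop()
        if is_related(node):
            last = node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))
    return last


def related_loaded(root: Node, is_related: Callable[[Node], bool],
                   loaded_ids: Set[str]) -> bool:
    """True once the tail element of the related elements has been loaded."""
    tail = tail_element(root, is_related)
    return tail is not None and tail.node_id in loaded_ids
```

The check relies on the assumption stated in the text that the page structure is known before the individual elements finish loading.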
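The indirect check can be sketched as a search for the first "sentinel" element. The kind labels and the function name below are hypothetical, and the sketch assumes, as the example above does, that such elements appear only after the main content.

```python
from typing import Iterable, Optional, Set

# Hypothetical labels for screenshot-unrelated elements that mark the end
# of the main content (page-bottom ads, share links, related articles).
UNRELATED_KINDS: Set[str] = {"bottom_ad", "share_link", "related_articles"}


def first_sentinel_index(loaded_kinds: Iterable[str],
                         unrelated: Set[str] = UNRELATED_KINDS) -> Optional[int]:
    """Return the index of the first loaded unrelated element, or None.

    A non-None result is taken to mean that the screenshot-related
    elements before that index have finished loading.
    """
    for index, kind in enumerate(loaded_kinds):
        if kind in unrelated:
            return index
    return None
```

Returning the index rather than a boolean lets the caller also exclude the sentinel itself from the captured region.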
In this specification, in response to a determination that the page elements related to the screenshot have finished loading, the loading of the page to be captured may be stopped and a screenshot may be taken of the loaded part of the page to be captured. The specific screenshot technique may follow the related art and is not limited in this specification.
In one illustrated embodiment, the final screenshot may be obtained by capturing in segments. Specifically, when the size of the loaded part of the page to be captured is determined to exceed a preset size threshold, the loaded part may be divided into a plurality of segments and the positional relationship among the segments recorded; after a screenshot is taken of each segment, the segment screenshots may be stitched, according to the recorded positional relationship, into a screenshot of the loaded part of the page to be captured.
In this specification, before a screenshot is taken of the loaded part of the page to be captured, the loaded part may be preprocessed to obtain a better screenshot. The preprocessing may include deleting screenshot interference elements from the page to be captured, for example floating advertisements, recommendation information, and shortcut buttons that may obscure screenshot-related elements.
In one embodiment, the preprocessing may also include other operations. For example, a collapsed element in the page may be expanded so that the collapsed content is displayed in full and captured; as another example, the display style of a specified element may be changed to improve how it is displayed; as yet another example, screenshot marks may be added to elements in the page to highlight content that needs attention, and so on.
Referring to FIG. 3, FIG. 3 is an example diagram comparing the original page of a page to be captured with the image obtained by the screenshot. In the example of FIG. 3, text images 1 and 2 are the page elements related to the screenshot. It can be seen that, after preprocessing, the floating advertisement blocking the text images is removed, text image 2, which was originally collapsed and hidden, is expanded and displayed, and a screenshot mark is added to text image 1, which needs to be emphasized. Because loading was stopped, the related recommendations at the end of the page and the "back to home page" and "add to favorites" buttons do not appear in the final image.
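The segmentation and stitching bookkeeping can be sketched as below. Images are modelled as plain lists of pixel rows so that the example stays self-contained; the function names and the vertical-only split are assumptions, not details given in this specification.

```python
from typing import Dict, List, Sequence, Tuple


def plan_segments(total_height: int, max_height: int) -> List[Tuple[int, int]]:
    """Split [0, total_height) into (top, bottom) ranges of at most max_height."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    if total_height <= max_height:
        return [(0, total_height)]          # below the threshold: one shot
    return [
        (top, min(top + max_height, total_height))
        for top in range(0, total_height, max_height)
    ]


def stitch(shots: Dict[int, Sequence]) -> List:
    """Join segment screenshots (keyed by their recorded top offset)."""
    rows: List = []
    for top in sorted(shots):               # recorded position decides order
        rows.extend(shots[top])
    return rows
```

Keying each segment by its recorded offset means the segments can be captured in any order, or in parallel, and still be reassembled correctly.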
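The preprocessing operations described above can be sketched on a simplified element list. The `Element` fields, the kind labels and the `highlight` parameter are hypothetical; in a browser these steps would be DOM manipulations rather than list operations.

```python
from dataclasses import dataclass, replace
from typing import AbstractSet, Iterable, List


@dataclass(frozen=True)
class Element:
    name: str
    kind: str                 # e.g. "image", "floating_ad" (assumed labels)
    collapsed: bool = False   # True if the content is folded away
    marked: bool = False      # True if a screenshot mark has been added


# Hypothetical kinds treated as screenshot interference elements.
INTERFERING_KINDS = frozenset({"floating_ad", "recommendation", "shortcut_button"})


def preprocess(elements: Iterable[Element],
               highlight: AbstractSet[str] = frozenset()) -> List[Element]:
    """Delete interference elements, expand collapsed ones, mark highlights."""
    result: List[Element] = []
    for el in elements:
        if el.kind in INTERFERING_KINDS:
            continue                             # delete interference element
        if el.collapsed:
            el = replace(el, collapsed=False)    # expand folded content
        if el.name in highlight:
            el = replace(el, marked=True)        # add a screenshot mark
        result.append(el)
    return result
```

The elements are immutable, so the original page description is left untouched and the preprocessed copy can be compared against it.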
In this specification, the user's screenshot request may also carry other customization information, so that the screenshot function can be tailored further.
In one illustrated embodiment, the user's screenshot request may carry requirement information indicating the preprocessing mode. Correspondingly, when preprocessing is performed, the preprocessing mode may be determined from the requirement information carried in the screenshot request, and the page to be captured is then preprocessed in the determined mode.
For example, the user's screenshot request may carry requirement information indicating that floating advertisements need to be removed and hidden text needs to be expanded. When preprocessing is performed, it can be determined from the requirement information that the preprocessing to be performed includes removing interfering content (floating advertisements) and expanding hidden content (text), and the preprocessing is performed accordingly.
In one illustrated embodiment, a screenshot request initiated by a user may carry a specification identifier indicating a screenshot specification; the screenshot of the loaded part of the page to be captured may then be taken according to the screenshot specification indicated by that identifier.
For example, a screenshot request initiated by a user may carry specification identifiers indicating the picture format, resolution, color specification and other screenshot specifications the user requires for the web page screenshot; at the screenshot stage, the screenshot of the loaded part of the page to be captured is taken according to the screenshot specifications indicated by those identifiers.
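How a specification might be read from the request and validated is sketched below. The request layout (a `"spec"` dictionary with `format`, `width` and `color` keys), the allowed values and the defaults are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical sets of supported values.
ALLOWED_FORMATS = {"png", "jpeg"}
ALLOWED_COLORS = {"rgb", "grayscale"}


@dataclass(frozen=True)
class ScreenshotSpec:
    image_format: str = "png"
    width: int = 1280
    color: str = "rgb"


def spec_from_request(request: dict) -> ScreenshotSpec:
    """Build a validated screenshot specification from a request dict."""
    raw = request.get("spec", {})
    spec = ScreenshotSpec(
        image_format=str(raw.get("format", "png")).lower(),
        width=int(raw.get("width", 1280)),
        color=str(raw.get("color", "rgb")).lower(),
    )
    if spec.image_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {spec.image_format}")
    if spec.color not in ALLOWED_COLORS:
        raise ValueError(f"unsupported color specification: {spec.color}")
    if spec.width <= 0:
        raise ValueError("width must be positive")
    return spec
```

Validating the specification before loading starts lets a malformed request fail early instead of after the page has already been rendered.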
In this specification, a better result can also be obtained by further processing the image obtained by the screenshot.
In one illustrated embodiment, a machine learning model for locating interference information in an image may be used to determine where the interference information lies in the image obtained by the screenshot, and the interference information at that position may then be removed from the image through image processing. The machine learning model may be trained using, as training samples, a number of page screenshots in which the positions of interference elements are annotated.
For example, if the image obtained by capturing a page still contains a certain type of advertisement information that interferes with the screenshot, a machine learning model for locating that type of advertisement information in an image may be invoked to locate it, and an image processing algorithm may then remove it from the image. In this way, interference information can be removed at the image level, yielding a screenshot with less interference information.
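The removal step that follows the model's localization can be sketched as below. The model itself is out of scope here: the bounding boxes are assumed to be given, images are modelled as lists of pixel rows, and filling the region with a constant value is one simple choice of "removal" that this specification does not prescribe.

```python
from typing import Iterable, List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def remove_regions(image: List[List[int]], boxes: Iterable[Box],
                   fill: int = 255) -> List[List[int]]:
    """Return a copy of the image with every boxed region overwritten."""
    cleaned = [row[:] for row in image]      # leave the input untouched
    height = len(cleaned)
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before filling.
        for y in range(max(top, 0), min(bottom, height)):
            row = cleaned[y]
            for x in range(max(left, 0), min(right, len(row))):
                row[x] = fill
    return cleaned
```

For evidence collection it may be preferable to keep the unmodified screenshot alongside the cleaned copy, which is why the function does not edit the image in place.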
The present specification also provides a page screenshot device, please refer to fig. 4, fig. 4 is a structural example diagram of the device; the device includes:
the UR L obtaining module 401, in response to a screenshot request initiated by a user, obtains a uniform resource locator UR L of a page to be screenshot;
the page loading module 402 loads the page to be screenshot according to the UR L, and judges whether the loading of the page elements related to the screenshot is finished in the loading process, wherein the page elements related to the screenshot are the page elements appointed by the user;
the execution module 403, in response to the completion of loading the page element related to the screenshot, stops loading the page to be screenshot, and performs screenshot on the loaded part of the page to be screenshot.
In this specification, the UR L obtaining module 401 may obtain the uniform resource locator UR L of the page to be screenshot based on a screenshot request initiated by a user, and the above process may have various implementation manners, and this specification is not particularly limited, for example, the screenshot request may directly carry the UR L field of the page to be screenshot, and the corresponding UR L may be directly extracted from the screenshot request, or the screenshot request may carry a character string corresponding to the UR L of the page to be screenshot, and then the corresponding UR L is indirectly obtained by means of querying and the like.
In an illustrated embodiment, the screenshot request may carry a character string indicating UR L of the page to be screenshot, and the UR L obtaining module 401 may further obtain UR L indicated by the character string by parsing the character string, for example, if the character string is "abc bar home page", it may determine, according to the character string, that the UR L indicated by the character string is UR L of "abc bar home page", and thus, the UR L of the page to be screenshot is UR L of "abc bar home page", where the manner of determining the corresponding UR L based on the character string may be a manner of keyword query or a manner of semantic analysis, and may specifically be flexibly selected according to actual development requirements, and this specification is not particularly limited.
In this specification, a screenshot-related element is an element related to the purpose of the screenshot, and it may be specified by the user, either through the screenshot request or through a preset in the system. For example, a screenshot request initiated by the user may include a character string such as "evidence collection for theft of photographic works", in which case the screenshot-related elements are the photographic pictures in the web page. As another example, the user may preset that, for screenshots of forum pages, the screenshot-related elements are the posts of forum users and exclude related promotional information and the like.
In this specification, the page loading module 402 may determine whether the page elements related to the screenshot have finished loading according to various criteria, which this specification does not specifically limit. For example, the determination may be made directly, by examining the screenshot-related page elements themselves, or indirectly, by examining page elements unrelated to the screenshot.
In an embodiment shown, the page loading module 402 may determine whether the page elements related to the screenshot have finished loading by determining whether the loaded elements include the tail element of the screenshot-related page elements; if they do, the screenshot-related page elements can be determined to have finished loading. The screenshot-related page elements may be determined from the user's screenshot request as described above, or from a preset screenshot page element type. Because the abstract structure of the page, in the form of an HTML tree structure, is received first when the page to be screenshot is loaded, the tail element of the screenshot-related page elements can be determined from that abstract structure.
In an embodiment shown, the page loading module 402 may determine whether the page elements related to the screenshot have finished loading by determining whether the loaded page elements include a screenshot-independent element that indicates that the screenshot-related page elements have finished loading; if so, the screenshot-related page elements are considered to have finished loading. For example, the screenshot-independent element may be an advertisement at the bottom of the page. Generally, the appearance of such an advertisement means that all screenshot-related page elements in the page have been loaded, so once it has been loaded, it can be concluded that all screenshot-related page elements in the page have finished loading.
In one embodiment, the screenshot-independent elements may be advertisement-related page elements, such as picture advertisements at the bottom of the page, links inducing the user to share, related article recommendations, and so forth.
In this specification, the execution module 403 may, in response to a determination that the page elements related to the screenshot have finished loading, stop loading the page to be screenshot and take a screenshot of the loaded part of the page; the specific screenshot manner may follow the related art and is not specifically limited in this specification.
In an embodiment shown, the execution module 403 may obtain the final screenshot in a segmented manner. Specifically, when the size of the loaded part of the page to be screenshot is determined to be greater than a preset size threshold, the loaded part may be divided into a plurality of fragments and the positional relationship among the fragments recorded; after a screenshot is taken of each fragment, the fragment screenshots may be stitched into a screenshot of the loaded part of the page according to the recorded positional relationship.
In this specification, the apparatus may further include a preprocessing module, which preprocesses the loaded part of the page to be screenshot, so as to obtain a better screenshot effect; specifically, the preprocessing mode may include deleting screenshot interference elements in a page to be screenshot; for example, floating advertisements, recommendation information, shortcut buttons, etc. that may obscure screenshot related elements are deleted.
In one embodiment, the preprocessing may also include other operations. For example, hidden elements in the page may be expanded so that hidden content (e.g., folded multi-layer quoted comments) is fully displayed and captured; as another example, the display style of a specified page element may be changed to enhance its display effect; as yet another example, screenshot marks may be added to elements in the page to highlight content that needs attention, and so on.
In this specification, the screenshot request of the user may also carry other customized information to implement more customized features of the screenshot function.
In an embodiment shown, a screenshot request of a user may carry requirement information indicating a preprocessing mode, and correspondingly, the preprocessing module may determine the preprocessing mode according to the requirement information carried in the screenshot request of the user, and further preprocess the page to be screenshot according to the determined preprocessing mode.
In an embodiment shown, a screenshot request initiated by the user may carry a specification identifier indicating a screenshot specification; accordingly, the execution module 403 may take the screenshot of the loaded part of the page to be screenshot according to the screenshot specification indicated by the specification identifier carried in the request.
In this specification, the apparatus may further include an image processing module, which further processes the image obtained by the screenshot so as to obtain a better screenshot effect.
In an embodiment shown, the image processing module may further process the screenshot image using a machine learning model trained with, as training samples, a number of page screenshots annotated with the positions of interference elements. Specifically, the image may be input into the trained machine learning model to determine the positions of interference elements in the image, and the image may then be processed based on those positions to delete the interference elements.
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
Embodiments of the present specification further provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the foregoing page screenshot method when executing the program.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of a computing device according to an embodiment of the present specification. The computing device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050, wherein the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively coupled to each other within the device via the bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and executes related programs to implement the technical solutions provided in the embodiments of the present specification.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program code is stored in the memory 1020 and called for execution by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component within the device (not shown in the drawings) or may be external to the device to provide the corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Embodiments of the present specification further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the foregoing page screenshot method.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
At present, some internet users use the internet to carry out illegal activities such as plagiarizing and misappropriating other people's works, spreading rumors, and selling illegal goods, which has a harmful social impact. To establish the liability of a violator, electronic evidence often needs to be collected from the relevant pages. Page screenshots are a feasible method of electronic evidence collection: the content displayed on a page related to illegal activity is saved as a picture, which can serve as electronic evidence.
In practical applications, the screenshot of a page indicated by the user is usually completed by calling the web page screenshot function of a browser. However, when complex web pages are processed in this way, the resulting screenshots often include many elements that interfere with the screenshot-related information, so that key information needed as evidence may be inadvertently blocked.
In view of this, the present specification discloses a technical solution for preprocessing a page to be screenshot before the page is subjected to screenshot.
When the method is realized, after the loading of the page to be subjected to screenshot is finished, preprocessing operation including deleting interference elements is carried out on the page to be subjected to screenshot; and then, screenshot can be carried out on the preprocessed page to be subjected to screenshot.
In the technical scheme, because the interference elements in the original page are removed by means of preprocessing, the condition that the interference elements exist in the screenshot result or the screenshot related elements are accidentally shielded is avoided, and the completeness of the screenshot related elements in the page is ensured.
The present application is described below with reference to specific embodiments and specific application scenarios.
Referring to fig. 6, fig. 6 is a flowchart of a page screenshot method provided in an embodiment of the present specification; the method performs the following steps:
S601: in response to a screenshot request initiated by a user, acquiring the uniform resource locator (URL) of a page to be screenshot;
S602: loading the page to be screenshot according to the URL;
S603: after the page to be screenshot is loaded, preprocessing the page to be screenshot, wherein the preprocessing comprises deleting screenshot interference elements in the page to be screenshot;
S604: performing screenshot on the preprocessed page to be screenshot.
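Steps S601 to S604 can be sketched as a small pipeline. The following Python sketch is only illustrative: the `Page` class, the `url` request field, and the injected `load_page`, `is_interference`, and `capture` callables are assumptions introduced here, not names used in this specification, and a real implementation would drive an actual browser engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical stand-in for a loaded page: a URL plus a flat list of element names."""
    url: str
    elements: List[str] = field(default_factory=list)


def take_page_screenshot(
    request: dict,
    load_page: Callable[[str], Page],
    is_interference: Callable[[str], bool],
    capture: Callable[[Page], bytes],
) -> bytes:
    # S601: obtain the URL of the page to be screenshot from the request
    # (assumed here to be carried directly in a "url" field).
    url = request["url"]
    # S602: load the page according to the URL.
    page = load_page(url)
    # S603: preprocess the loaded page by deleting interference elements.
    page.elements = [e for e in page.elements if not is_interference(e)]
    # S604: take the screenshot of the preprocessed page.
    return capture(page)
```

Passing the loader and the capture function as parameters keeps the sketch independent of any particular browser or rendering library.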
In this specification, the entity that executes the above method may be selected according to specific situations and requirements, and this specification does not limit it. For example, the executing entity may be a cloud server that receives the user's screenshot request over a network connection, or the user's personal computer, which receives the user's screenshot request through a communication mechanism between software modules, and the like.
In an embodiment shown, the method is applied to a distributed server cluster, and the screenshot request may include sub-requests corresponding to a plurality of pages to be screenshot; the distributed server cluster can respectively complete the screenshot tasks of the multiple pages to be screenshot according to a preset distribution algorithm.
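The specification leaves the "preset distribution algorithm" open. As one possible illustration, the sketch below assigns sub-requests to servers in round-robin order; round-robin is an assumption chosen for simplicity, and the function and worker names are hypothetical.

```python
from typing import Dict, List


def distribute_round_robin(page_urls: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Assign each sub-request (one page URL) to a worker in round-robin order."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    for index, url in enumerate(page_urls):
        # The i-th page goes to worker i mod N, spreading pages evenly.
        assignment[workers[index % len(workers)]].append(url)
    return assignment
```

Other schemes, such as hashing the URL or choosing the least-loaded server, would fit the same interface.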
In this specification, the uniform resource locator (URL) of the page to be screenshot may be obtained in response to a screenshot request initiated by the user. This process has multiple implementations, and this specification does not specifically limit it. For example, the URL field of the page to be screenshot may be carried directly in the screenshot request, and the URL obtained by parsing the request; alternatively, keywords corresponding to the page to be screenshot may be carried in the screenshot request, and the URL obtained indirectly from the keywords.
In an embodiment shown, the screenshot request may carry a character string indicating the page to be screenshot; after the character string is obtained from the request, it may be further analyzed to obtain the URL of the page to be screenshot. The analysis may be semantic analysis based on natural language, or parsing of specific codes such as short URLs and sharing codes; those skilled in the art may select the method according to the specific situation, and this specification does not specifically limit it;
for example, the screenshot request may carry a character string indicating the official microblog page of "Alipay"; after the request is parsed and the character string obtained, semantic information of the character string may be extracted through a semantic analysis algorithm, and a preset mapping table between semantic information and URLs may be queried using the semantic information "Alipay" and "microblog", so that the URL of the official microblog page of "Alipay" is obtained as the URL of the page to be screenshot.
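The two ways of obtaining the URL described above can be sketched as follows. This is a minimal illustration under stated assumptions: the request field names `url` and `query`, the contents of `SEMANTIC_URL_TABLE`, the example.com address, and the word-splitting `extract_keywords` function (a toy stand-in for a real semantic analysis algorithm) are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical mapping table from a set of semantic keywords to a URL.
SEMANTIC_URL_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"alipay", "microblog"}): "https://example.com/alipay-official-microblog",
}


def extract_keywords(text: str) -> FrozenSet[str]:
    """Toy stand-in for semantic analysis: the set of lowercased words."""
    return frozenset(word.strip('".,').lower() for word in text.split())


def resolve_url(request: dict) -> Optional[str]:
    # Case 1: the request carries the URL field directly.
    if "url" in request:
        return request["url"]
    # Case 2: the request carries a descriptive string; query the mapping
    # table with the keywords extracted from it.
    keywords = extract_keywords(request.get("query", ""))
    for required, url in SEMANTIC_URL_TABLE.items():
        if required <= keywords:  # all required keywords are present
            return url
    return None
```

Returning `None` when nothing matches leaves the handling of unresolved requests to the caller.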
In this specification, after the URL of the page to be screenshot is obtained, the page to be screenshot can be loaded according to the URL.
In this specification, the page to be screenshot can be preprocessed to obtain a better screenshot effect. Specifically, the preprocessing may include deleting screenshot interference elements in the page to be screenshot; for example, floating advertisements, recommendation information, shortcut buttons, and other elements that may obscure screenshot-related elements are deleted.
In one embodiment, the preprocessing may also include other operations. For example, hidden elements in the page may be expanded so that hidden and folded content is fully displayed and captured; as another example, the display style of a specified element may be changed to enhance its display effect; as yet another example, screenshot marks may be added to elements in the page to highlight content that needs attention, and so on.
In an embodiment shown, a screenshot request of a user may carry requirement information indicating a preprocessing mode, and correspondingly, when the preprocessing process is executed, the preprocessing mode may be determined according to the requirement information carried in the screenshot request of the user, and further, the page to be screenshot is preprocessed according to the determined preprocessing mode;
for example, the screenshot request of the user may carry requirement information indicating that floating advertisements need to be removed and a folded text needs to be expanded, and when the preprocessing process is executed, the preprocessing mode that needs to be executed may be determined to include removing interfering content (floating advertisements) and expanding the folded content (text) according to the requirement information, and preprocessing is correspondingly executed.
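Selecting preprocessing operations from the requirement information can be sketched as a lookup from requirement names to functions. In the sketch below, the element representation (dictionaries with `kind` and `collapsed` keys) and the requirement names `remove_floating_ads` and `expand_folded` are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

Element = Dict[str, object]


def remove_floating_ads(elements: List[Element]) -> List[Element]:
    """Delete interference elements of the (assumed) kind 'floating_ad'."""
    return [e for e in elements if e.get("kind") != "floating_ad"]


def expand_folded(elements: List[Element]) -> List[Element]:
    """Return copies of the elements with folded content expanded."""
    return [{**e, "collapsed": False} for e in elements]


# Hypothetical registry mapping requirement names to preprocessing steps.
PREPROCESSORS: Dict[str, Callable[[List[Element]], List[Element]]] = {
    "remove_floating_ads": remove_floating_ads,
    "expand_folded": expand_folded,
}


def preprocess(elements: List[Element], requirements: List[str]) -> List[Element]:
    """Apply, in order, the preprocessing steps named in the request."""
    for name in requirements:
        step = PREPROCESSORS.get(name)
        if step is None:
            raise ValueError(f"unknown preprocessing requirement: {name!r}")
        elements = step(elements)
    return elements
```

Each step returns a new list rather than mutating its input, so the original page model stays available for comparison.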
Referring to fig. 7, fig. 7 shows an example comparison of a page to be screenshot before and after preprocessing. In the example of fig. 7, text images 1 and 2 are the screenshot-related page elements. After preprocessing, the floating advertisement blocking the text images is removed, the originally folded text image 2 is expanded and displayed, and a screenshot mark is added to text image 1, which needs to be highlighted. The related recommendations and the "return to home page" and "add to favorites" buttons at the bottom of the page may also be regarded as interference elements and removed, so they do not appear in the final screenshot.
In this specification, while the page to be screenshot is being loaded, it may further be determined whether the screenshot-related page elements have finished loading, and in response to their having finished loading, the loading of the page to be screenshot may be stopped. The trigger mechanism for this determination is not specifically limited in this specification; for example, the determination may be triggered periodically at a preset time interval, according to the number of loaded elements in the page, according to the size of the page file, or by any combination of these. Stopping the page load in time reduces the loading of elements unrelated to the screenshot while ensuring that no screenshot-related page elements are lost, which increases the proportion of screenshot-related elements in the final screenshot.
In this specification, a screenshot-related element is an element related to the purpose of the screenshot, and it may be specified by the user, either through the screenshot request or through a preset in the system. For example, a screenshot request initiated by the user may include a character string such as "evidence collection for theft of photographic works", which specifies that the screenshot-related elements are the photographic pictures in the web page. As another example, the user may preset that, for all screenshots of forum pages, the screenshot-related elements are the posts of forum users and exclude related promotional information and the like.
In this specification, whether the page elements related to the screenshot have finished loading may be determined according to various criteria, which this specification does not specifically limit. For example, the determination may be made directly, by examining the screenshot-related page elements themselves, or indirectly, by examining page elements unrelated to the screenshot.
Fig. 8 is a schematic diagram illustrating an embodiment of determining whether the page elements related to the screenshot have finished loading. In this example, the determination is made by checking whether the loaded elements include the tail element of the screenshot-related page elements; if they do, the screenshot-related page elements can be determined to have finished loading. The screenshot-related page elements may be determined from the user's screenshot request as described above, or from the user's preset. During loading of the page to be screenshot, the page structure sent by the page server is received first; for example, the page structure may take the form of an HTML tree structure, so the tail element of the screenshot-related page elements can be determined from this abstract structure of the page to be screenshot;
for example, if a user specifies that the user comment content in a certain page is to be captured, the tail element of the user comment content can be determined from the tree structure of the HTML file; during loading, once the tail element is detected to have finished loading, the screenshot-related page elements can be considered to have finished loading.
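The tail-element check can be sketched on a simplified page tree. The sketch below assumes that each node is a dictionary with an `id` and optional `children`, that "tail" means the last relevant node in document order, and that elements finish loading in document order; none of these details is fixed by this specification.

```python
from typing import Callable, Dict, Iterator, Optional, Set

Node = Dict[str, object]


def iter_document_order(node: Node) -> Iterator[Node]:
    """Yield the node and its descendants in depth-first document order."""
    yield node
    for child in node.get("children", []):
        yield from iter_document_order(child)


def find_tail_element(root: Node, is_relevant: Callable[[Node], bool]) -> Optional[str]:
    """Return the id of the last screenshot-related node in document order."""
    tail = None
    for node in iter_document_order(root):
        if is_relevant(node):
            tail = node["id"]
    return tail


def relevant_elements_loaded(
    root: Node, is_relevant: Callable[[Node], bool], loaded_ids: Set[str]
) -> bool:
    """True once the tail element of the relevant elements has been loaded."""
    tail = find_tail_element(root, is_relevant)
    return tail is not None and tail in loaded_ids
```

If the page contains no relevant element at all, the check returns `False`, so loading would simply continue to completion.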
In an embodiment shown, the determination may be made by judging whether the loaded elements include a preset target element that indicates that the screenshot-related elements have likely finished loading; if so, the page elements related to the screenshot are considered to have finished loading;
for example, when collecting screenshot evidence for a page that has misappropriated a photographic picture, the APP promotion information at the bottom of the page clearly does not belong to the page elements that need to be captured; therefore, once the APP promotion information is detected among the loaded content during page loading, the screenshot-related page elements (in this example, the misappropriated photographic picture in the page body) can be considered to have finished loading.
In an embodiment shown, the screenshot-independent elements may be advertisement-related page elements, such as picture advertisements at the bottom of the page, links inducing the user to share, and related article recommendations; those skilled in the art can specify the specific types of screenshot-independent elements on their own according to specific requirements.
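The indirect check can be sketched as consuming a stream of loaded elements until a preset target element appears. In this illustration, elements are represented only by a kind string, and names such as `app_promo` are hypothetical. Dropping the target element itself from the result is a choice of this sketch, since such an element would be screenshot-independent anyway.

```python
from typing import Iterable, List, Set


def load_until_sentinel(element_stream: Iterable[str], sentinel_kinds: Set[str]) -> List[str]:
    """Collect loaded elements and stop as soon as a preset target element appears."""
    loaded: List[str] = []
    for kind in element_stream:
        if kind in sentinel_kinds:
            # The target element signals that the screenshot-related
            # elements before it are loaded, so loading stops here.
            break
        loaded.append(kind)
    return loaded
```

If the target element never appears, the whole stream is consumed and the page is loaded in full.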
In an embodiment shown, the final screenshot can be obtained by a sectional screenshot mode; specifically, under the condition that the size of the preprocessed page to be subjected to screenshot is determined to be larger than a preset size threshold, the page to be subjected to screenshot can be divided into a plurality of fragments, the position relation among the plurality of fragments is recorded, and after the plurality of fragments are respectively subjected to screenshot, screenshots of the plurality of fragments can be spliced into screenshots of the page to be subjected to screenshot according to the recorded position relation.
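The segmented approach can be sketched with vertical fragments. The sketch below assumes that fragments are horizontal bands described by a top offset and a height in pixels, and it represents each fragment screenshot as a list of pixel rows; a real implementation would use an image library for the stitching step.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # (top offset, height) in pixels


def split_into_fragments(page_height: int, max_height: int) -> List[Fragment]:
    """Divide a page into fragments no taller than max_height, recording positions."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    if page_height <= max_height:
        # Below the size threshold: a single screenshot is enough.
        return [(0, page_height)]
    return [
        (top, min(max_height, page_height - top))
        for top in range(0, page_height, max_height)
    ]


def stitch(fragment_rows: List[Tuple[int, List[str]]]) -> List[str]:
    """Join per-fragment pixel rows in the order given by the recorded top offsets."""
    stitched: List[str] = []
    for _top, rows in sorted(fragment_rows, key=lambda item: item[0]):
        stitched.extend(rows)
    return stitched
```

Because each fragment carries its recorded offset, the fragment screenshots can be taken in any order, or in parallel, and still be stitched correctly.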
In this specification, the screenshot request of the user may also carry other customized information to implement more customized features of the screenshot function.
In an embodiment shown, a screenshot request initiated by a user may carry a specification identifier for indicating a screenshot specification; therefore, the screenshot can be performed on the page to be screenshot according to the screenshot specification indicated by the specification identification carried in the screenshot request initiated by the user;
for example, a screenshot request initiated by the user may carry specification identifiers indicating the screenshot specifications the user requires for the web page screenshot, such as picture format, resolution, and color specification; in the screenshot stage, the screenshot of the page to be screenshot is taken according to the screenshot specifications indicated by the specification identifiers.
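One way to interpret a specification identifier is as a key into a table of predefined screenshot specifications. The following sketch assumes this reading; the identifiers `evidence_hd` and `preview`, the request field `spec_id`, and the fields of `ScreenshotSpec` are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScreenshotSpec:
    image_format: str  # e.g. "png" or "jpeg"
    width: int         # output width in pixels
    color_mode: str    # e.g. "rgb" or "grayscale"


# Hypothetical table of screenshot specifications keyed by identifier.
SPEC_TABLE: Dict[str, ScreenshotSpec] = {
    "evidence_hd": ScreenshotSpec("png", 1920, "rgb"),
    "preview": ScreenshotSpec("jpeg", 800, "grayscale"),
}
DEFAULT_SPEC_ID = "preview"


def spec_for_request(request: dict) -> ScreenshotSpec:
    """Look up the screenshot specification named by the request's identifier."""
    spec_id = request.get("spec_id", DEFAULT_SPEC_ID)
    try:
        return SPEC_TABLE[spec_id]
    except KeyError:
        raise ValueError(f"unknown specification identifier: {spec_id!r}") from None
```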
In this specification, a better screenshot effect can be obtained by further processing an image obtained by screenshot.
In an embodiment shown, a machine learning model for determining the position of interference information in an image may be used to locate the interference information in the image obtained by the screenshot, and the interference information at that position may then be removed from the image through image processing. Specifically, the machine learning model may be trained using, as training samples, a number of page screenshots annotated with the positions of interference elements;
for example, if an image obtained by taking a screenshot of a page still contains a certain type of advertisement information that interferes with the screenshot, a machine learning model for determining the position of that type of advertisement information may be invoked to locate it in the image, and it may then be removed from the image by an image processing algorithm. In this way, interference information can be removed at the image level, yielding a screenshot with less interference information.
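The image-level removal step can be sketched independently of the model. The sketch below assumes the model has already returned bounding boxes for the interference information and removes them by overwriting the region with a fill value; the box format, the grayscale image representation, and the blanking strategy are assumptions, since the specification does not fix the image processing algorithm.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom exclusive
Image = List[List[int]]          # grayscale pixel rows


def blank_regions(image: Image, boxes: List[Box], fill: int = 255) -> Image:
    """Return a copy of the image with every box overwritten by the fill value."""
    height = len(image)
    width = len(image[0]) if image else 0
    result = [row[:] for row in image]
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before overwriting pixels.
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                result[y][x] = fill
    return result
```

Working on a copy keeps the original screenshot intact, which may matter when the unprocessed image must also be retained as evidence.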
The present specification also provides a page screenshot device. Referring to fig. 9, fig. 9 is an example structural diagram of the device, which includes:
a URL obtaining module 901, which, in response to a screenshot request initiated by a user, obtains the uniform resource locator (URL) of a page to be screenshot;
a page loading module 902, which loads the page to be screenshot according to the URL;
a page preprocessing module 903, which preprocesses the page to be screenshot after the page is loaded, wherein the preprocessing comprises deleting screenshot interference elements in the page to be screenshot;
and a screenshot executing module 904, which performs screenshot on the preprocessed page to be screenshot.
In this specification, the URL obtaining module 901 may obtain the uniform resource locator (URL) of the page to be screenshot in response to a screenshot request initiated by the user. This process has multiple implementations, and this specification does not specifically limit it. For example, the URL field of the page to be screenshot may be carried directly in the screenshot request, and the URL obtained by parsing the request; alternatively, keywords corresponding to the page to be screenshot may be carried in the screenshot request, and the URL obtained indirectly from the keywords.
In an embodiment shown, the screenshot request may carry a character string indicating the page to be screenshot; after the character string is obtained from the request, it may be further analyzed to obtain the URL of the page to be screenshot. The analysis may be semantic analysis based on natural language, or parsing of specific codes such as short URLs and sharing codes; those skilled in the art may select the method according to the specific situation, and this specification does not specifically limit it;
for example, the screenshot request may carry a character string indicating the official microblog page of "Alipay"; after the request is parsed and the character string obtained, semantic information of the character string may be extracted through a semantic analysis algorithm, and a preset mapping table between semantic information and URLs may be queried using the semantic information "Alipay" and "microblog", so that the URL of the official microblog page of "Alipay" is obtained as the URL of the page to be screenshot.
In this specification, a page preprocessing module 903 in the device preprocesses the page to be captured to obtain a better capture effect; specifically, the preprocessing mode may include deleting screenshot interference elements in a page to be screenshot; for example, floating advertisements, recommendation information, shortcut buttons, etc. that may obscure screenshot related elements are deleted.
In one embodiment, the preprocessing may also include other operations. For example, hidden elements in the page may be expanded so that hidden content (e.g., folded multi-layer quoted comments) is fully displayed and captured; as another example, the display style of a specified page element may be changed to enhance its display effect; as yet another example, screenshot marks may be added to elements in the page to highlight content that needs attention, and so on.
In an embodiment shown, a screenshot request of a user may carry requirement information indicating a preprocessing mode, and correspondingly, the preprocessing module may determine the preprocessing mode according to the requirement information carried in the screenshot request of the user, and further preprocess the page to be screenshot according to the determined preprocessing mode.
In this specification, the apparatus may further include a dynamic loading module, which determines whether the loading of the page element related to the screenshot is completed in the loading process; and stopping loading the page to be screenshot in response to the completion of loading the page elements related to the screenshot.
In this specification, a screenshot related element refers specifically to an element related to a screenshot purpose, which may be specified by a user; specifically, the mode designated by the user can be based on the screenshot request, and can also be preset in the system; for example, a screenshot request initiated by a user may include a character string such as "photograph steal evidence collection" for specifying a screenshot related element as a photograph in a web page; for another example, the user may preset that, for all screenshots of forum pages, all screenshot-related elements may be specified as the speaking content of the forum user, and do not include related popularization information and the like.
In the specification, the dynamic loading module judges whether the loading of the page elements related to the screenshot is finished or not according to various different standards, and the specification does not need to be specifically limited; for example, whether the page element related to the screenshot is loaded or not can be judged from the perspective of the page element related to the screenshot, and whether the page element related to the screenshot is loaded or not can be indirectly judged from the perspective of the page element unrelated to the screenshot.
In an embodiment shown in the foregoing, the dynamic loading module may further determine whether the page element related to the screenshot is completely loaded by judging whether the loaded element includes a tail element in the page element related to the screenshot; if the loaded elements comprise the tail elements in the page elements related to the screenshot, the page elements related to the screenshot can be determined to be completely loaded; the page elements related to screenshot can be determined by the screenshot request of the user as described above, or determined according to the preset screenshot page element type; and because the page to be screenshot first receives the abstract structure of the page in the form of the html tree structure in the loading process, the tail element in the page elements related to the screenshot can be determined by the abstract structure of the page to be screenshot.
In an embodiment shown, the dynamic loading module may further determine whether the page elements related to the screenshot are completely loaded by determining whether the loaded page elements include a screenshot-independent element indicating that the page elements related to the screenshot are completely loaded; if the judgment result is yes, the page elements related to the screenshot are considered to be loaded completely; for example, the screenshot-unrelated element is a page-based advertisement, generally speaking, the occurrence of the page-based advertisement means that all page elements related to the screenshot in the page are completely loaded, and if the page-based advertisement is already loaded, a judgment can be made, that is, all page elements related to the screenshot in the page are completely loaded.
In one embodiment, the screenshot-independent elements may be page elements related to advertisements; such as photo ads at the bottom of the page, share inducement links, related article recommendations, and so forth.
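The indirect check can be sketched as a loop that consumes load events and stops at the first screenshot-independent element. This is a sketch under stated assumptions: the element kinds in `SENTINEL_KINDS` and the function `load_until_sentinel` are hypothetical names chosen for illustration, and load events are modeled as a simple sequence of strings.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical kinds of screenshot-independent elements that usually load last.
SENTINEL_KINDS: Set[str] = {"bottom-ad", "share-link", "related-articles"}


def load_until_sentinel(
    load_events: Iterable[str], sentinels: Set[str] = SENTINEL_KINDS
) -> Tuple[List[str], bool]:
    """Collect loaded elements until a sentinel element shows up.

    Returns the elements loaded before the sentinel and a flag saying
    whether a sentinel was seen (that is, whether loading can stop early).
    """
    loaded: List[str] = []
    for kind in load_events:
        if kind in sentinels:
            return loaded, True  # sentinel reached: stop consuming load events
        loaded.append(kind)
    return loaded, False  # page finished without any sentinel appearing


elements, stopped_early = load_until_sentinel(["title", "body", "bottom-ad", "comments"])
print(elements, stopped_early)
```

In this example the "comments" element is never processed, which mirrors stopping the page load once the advertisement has appeared.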
In this specification, the screenshot executing module 904 may take a screenshot of the preprocessed page to be captured; for the specific screenshot manner, reference may be made to related technologies, which this specification does not limit.
In one illustrated embodiment, the screenshot executing module 904 may obtain the final screenshot by capturing the page in segments. Specifically, when the size of the preprocessed page to be captured is larger than a preset size threshold, the preprocessed page can be divided into a plurality of fragments and the positional relationship among the fragments recorded; after each fragment is captured separately, the fragment screenshots can be stitched, according to the recorded positional relationship, into a screenshot of the whole preprocessed page.
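The segmented approach can be illustrated with a small sketch. It assumes, purely for illustration, that the page is split only along its height, that an image is a list of pixel rows, and that `capture` is a hypothetical callback returning the rows for a vertical range; none of these names come from the specification.

```python
from typing import Callable, List, Tuple

Rows = List[List[int]]


def plan_segments(page_height: int, max_height: int) -> List[Tuple[int, int]]:
    """Split [0, page_height) into (top, bottom) ranges no taller than max_height."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    return [
        (top, min(top + max_height, page_height))
        for top in range(0, page_height, max_height)
    ]


def capture_in_segments(
    page_height: int, max_height: int, capture: Callable[[int, int], Rows]
) -> Rows:
    """Capture the page in one shot if it is small, otherwise fragment by fragment."""
    if page_height <= max_height:
        return capture(0, page_height)
    # Record each fragment's position (its top offset) next to its screenshot.
    shots = [(top, capture(top, bottom)) for top, bottom in plan_segments(page_height, max_height)]
    stitched: Rows = []
    # Stitch the fragments back together in the order given by the recorded positions.
    for _top, rows in sorted(shots, key=lambda item: item[0]):
        stitched.extend(rows)
    return stitched


# A fake 10-row page stands in for a real renderer.
page = [[i] for i in range(10)]
result = capture_in_segments(10, 4, lambda top, bottom: page[top:bottom])
print(result == page)
```

Recording the top offset with each fragment keeps the stitching step independent of the order in which fragments are captured, so the captures could be performed in parallel.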
In this specification, the screenshot request of the user may also carry other customized information to implement more customized features of the screenshot function.
In an embodiment shown, a screenshot request initiated by a user may carry a specification identifier for indicating a screenshot specification; therefore, the screenshot executing module 904 can perform screenshot on the preprocessed page to be screenshot according to the screenshot specification indicated by the specification identifier carried in the screenshot request initiated by the user.
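One way to picture the specification identifier is as a key into a table of preset screenshot specifications. The sketch below is illustrative only: the request field name `spec_id`, the preset names, and the fields of `ScreenshotSpec` are all assumptions, since the specification does not define what a screenshot specification contains.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScreenshotSpec:
    """Hypothetical screenshot specification: viewport width, format, quality."""
    width: int
    image_format: str
    quality: int


# Hypothetical presets keyed by specification identifier.
SPECS: Dict[str, ScreenshotSpec] = {
    "mobile": ScreenshotSpec(width=375, image_format="jpeg", quality=80),
    "desktop": ScreenshotSpec(width=1280, image_format="png", quality=100),
}
DEFAULT_SPEC_ID = "desktop"


def resolve_spec(request: dict) -> ScreenshotSpec:
    """Look up the specification named in the request, falling back to a default."""
    spec_id = request.get("spec_id", DEFAULT_SPEC_ID)
    try:
        return SPECS[spec_id]
    except KeyError:
        raise ValueError(f"unknown screenshot specification: {spec_id!r}") from None


print(resolve_spec({"url": "https://example.com", "spec_id": "mobile"}))
```

Rejecting unknown identifiers explicitly, rather than silently using the default, is a design choice made here so that a mistyped identifier is reported to the caller.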
In this specification, the apparatus may further include an image processing module, which may further process the image obtained from the screenshot so as to obtain a better screenshot effect.
In an illustrated embodiment, the image processing module may further process the screenshot image using a machine learning model trained on page screenshots in which the positions of a plurality of interference elements are marked as training samples. Specifically, the image may be input into the trained machine learning model to determine the positions of interference elements in the image, and the image may then be processed based on those positions to delete the interference elements.
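The post-processing step that follows recognition can be sketched without committing to any particular model. The example below assumes the model's output has already been converted into bounding boxes, and it "deletes" an interference element by overwriting its region with a fill value; both the box format and the function name `erase_regions` are assumptions made for illustration, and the specification does not say how deletion is performed.

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def erase_regions(image: List[List[int]], boxes: Sequence[Box], fill: int = 255) -> List[List[int]]:
    """Return a copy of the image with every box overwritten by the fill value."""
    height = len(image)
    width = len(image[0]) if image else 0
    result = [row[:] for row in image]  # leave the input image untouched
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before painting over it.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                result[y][x] = fill
    return result


# A 3x4 grayscale image and one box standing in for a model prediction.
screenshot = [[0] * 4 for _ in range(3)]
predicted_boxes = [(1, 0, 3, 2)]
for row in erase_regions(screenshot, predicted_boxes):
    print(row)
```

A real implementation might instead crop the region out or inpaint it from the surrounding background; overwriting with a fill value is simply the easiest variant to show.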
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
Embodiments of the present specification further provide a computer device, which at least includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the foregoing page screenshot method when executing the program.
Fig. 10 is a more specific hardware structure diagram of a computing device provided in an embodiment of the present specification, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component within the device (not shown in the drawings) or may be external to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Embodiments of the present specification further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the foregoing page screenshot method.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present specification can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be embodied, essentially or in part, in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present specification.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiment is described relatively briefly because it is substantially similar to the method embodiment, and reference may be made to the corresponding descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative: the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more pieces of software and/or hardware when implementing the embodiments of the present specification. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of an embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing is only a specific implementation of the embodiments of the present specification. It should be noted that those skilled in the art can make a number of improvements and refinements without departing from the principle of the embodiments of the present specification, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of the present specification.

Claims (46)

1. A page screenshot method comprises the following steps:
responding to a screenshot request initiated by a user, and acquiring a uniform resource locator (URL) of a page to be screenshot;
loading the page to be screenshot according to the URL, and judging whether the loading of the page elements related to the screenshot is finished in the loading process, wherein the page elements related to the screenshot are the page elements appointed by the user;
and in response to the completion of the loading of the page elements related to the screenshot, stopping the loading of the page to be screenshot, and performing screenshot on the loaded part of the page to be screenshot.
2. The method of claim 1, the user-initiated screenshot request carrying the URL or a string indicating the URL of the page to be screenshot;
the acquiring the uniform resource locator (URL) of the page to be captured includes:
obtaining the URL carried in the screenshot request initiated by the user, or,
acquiring the character string carried in the screenshot request initiated by the user, and parsing the character string to acquire the URL indicated by the character string.
3. The method of claim 1, wherein the determining whether the page element associated with the screenshot is loaded comprises:
judging whether the loaded page elements contain screenshot-irrelevant elements indicating that the page elements relevant to the screenshot are loaded completely;
if yes, determining that the page elements related to the screenshot are loaded completely.
4. The method of claim 3, the screenshot independent elements comprising page elements related to advertisements.
5. The method of claim 1, wherein the determining whether the page element associated with the screenshot is loaded comprises:
determining a tail element in page elements related to screenshot based on the page structure of the page to be screenshot;
judging whether the loaded element comprises the tail element or not;
and if the loaded element comprises the tail element, determining that the page element related to the screenshot is completely loaded.
6. The method of claim 1, before screenshot the loaded portion of the page to be screenshot, further comprising:
preprocessing the loaded part in the page to be captured; and the preprocessing comprises deleting screenshot interference elements in the page to be screenshot.
7. The method of claim 6, wherein the pre-processing further comprises any one or a combination of pre-processing modes shown as follows:
expanding hidden elements in the page;
changing the display style of the specified page element;
adding screenshot marks to elements in the page.
8. The method of claim 6, wherein the screenshot request of the user carries requirement information indicating a preprocessing mode;
the preprocessing the page to be captured comprises the following steps:
and determining a preprocessing mode according to the requirement information carried in the screenshot request of the user, and preprocessing the page to be screenshot according to the determined preprocessing mode.
9. The method of claim 1, wherein the screenshot of the loaded portion of the page to be screenshot comprises:
determining whether the size of the loaded part in the page to be captured is larger than a preset threshold value;
if yes, dividing the loaded part in the page to be screenshot into a plurality of fragments, and recording the position relation among the fragments;
respectively carrying out screenshot on the plurality of fragments;
and splicing the screenshots of the plurality of fragments into screenshots of the loaded parts in the page to be subjected to screenshot according to the recorded position relation.
10. The method of claim 1, further comprising:
inputting the image of the page to be subjected to screenshot, which is obtained by screenshot, into an identification model so as to identify the position of an interference element in the image obtained by screenshot; the identification model is a machine learning model obtained by training a page screenshot of positions marked with a plurality of interference elements as a training sample;
and deleting the page element positioned at the identified position in the image obtained by screenshot as an interference element.
11. The method of claim 1, wherein the screenshot request initiated by the user carries a specification identifier for indicating a screenshot specification;
the screenshot of the loaded part in the page to be screenshot comprises the following steps:
and performing screenshot on the loaded part in the page to be screenshot according to the screenshot specification indicated by the specification identification carried in the screenshot request initiated by the user.
12. A page screenshot method comprises the following steps:
responding to a screenshot request initiated by a user, and acquiring a uniform resource locator (URL) of a page to be screenshot;
loading the page to be captured according to the URL;
after the page to be subjected to screenshot is loaded, preprocessing the page to be subjected to screenshot; the preprocessing comprises deleting screenshot interference elements in the page to be screenshot;
and performing screenshot on the preprocessed page to be subjected to screenshot.
13. The method of claim 12, the user-initiated screenshot request carrying the URL, or a string indicating the URL of the page to be screenshot;
the acquiring the uniform resource locator (URL) of the page to be captured includes:
obtaining the URL carried in the screenshot request initiated by the user, or,
acquiring the character string carried in the screenshot request initiated by the user, and parsing the character string to acquire the URL indicated by the character string.
14. The method of claim 12, wherein the pre-processing further comprises any one or a combination of pre-processing modes shown as follows:
expanding hidden elements in the page;
changing the display style of the specified page element;
adding screenshot marks to elements in the page.
15. The method of claim 12, wherein the screenshot request of the user carries requirement information indicating a preprocessing mode;
the preprocessing the page to be captured comprises the following steps:
and determining a preprocessing mode according to the requirement information carried in the screenshot request of the user, and preprocessing the page to be screenshot according to the determined preprocessing mode.
16. The method of claim 12, further comprising:
judging whether the page elements related to screenshot are completely loaded or not in the process of loading the page to be screenshot; the page elements related to the screenshot are page elements appointed by a user;
and stopping the loading in response to the completion of the loading of the page elements related to the screenshot.
17. The method of claim 16, wherein the determining whether the page element associated with the screenshot is loaded comprises:
judging whether the loaded page elements contain screenshot-irrelevant elements indicating that the page elements relevant to the screenshot are loaded completely;
if yes, determining that the page elements related to the screenshot are loaded completely.
18. The method of claim 17, the screenshot independent elements comprising page elements related to advertisements.
19. The method of claim 16, wherein the determining whether the page element associated with the screenshot is loaded comprises:
determining a tail element in page elements related to screenshot based on the page structure of the page to be screenshot;
judging whether the loaded element comprises the tail element or not;
and if the loaded element comprises the tail element, determining that the page element related to the screenshot is completely loaded.
20. The method of claim 12, wherein the screenshot of the preprocessed page to be screenshot comprises:
determining whether the size of the preprocessed page to be subjected to screenshot is larger than a preset size threshold value;
if yes, dividing the preprocessed page to be subjected to screenshot into a plurality of fragments, and recording the position relation among the fragments;
respectively carrying out screenshot on the plurality of fragments;
and splicing the screenshots of the plurality of fragments into the screenshots of the preprocessed page to be subjected to screenshot according to the recorded position relation.
21. The method of claim 12, further comprising:
inputting the image of the page to be subjected to screenshot, which is obtained by screenshot, into an identification model so as to identify the position of an interference element in the image obtained by screenshot; the identification model is a machine learning model obtained by training a page screenshot of positions marked with a plurality of interference elements as a training sample;
and deleting the page element positioned at the identified position in the image obtained by screenshot as an interference element.
22. The method of claim 12, wherein the user-initiated screenshot request carries a specification identifier for indicating a screenshot specification;
the screenshot of the preprocessed page to be subjected to screenshot comprises the following steps:
and performing screenshot on the preprocessed page to be subjected to screenshot according to the screenshot specification indicated by the specification identification carried in the screenshot request initiated by the user.
23. A page screen capture apparatus comprising:
the URL acquisition module is used for responding to a screenshot request initiated by a user and acquiring a uniform resource locator (URL) of a page to be screenshot;
the page loading module is used for loading the page to be screenshot according to the URL and judging whether the loading of the page elements related to the screenshot is finished or not in the loading process, wherein the page elements related to the screenshot are the page elements appointed by the user;
and the execution module is used for responding to the completion of loading of the page elements related to the screenshot, stopping the loading of the page to be screenshot, and performing screenshot on the loaded part in the page to be screenshot.
24. The apparatus of claim 23, the user-initiated screenshot request carrying the URL or a string indicating the URL of the page to be screenshot;
the URL acquisition module further:
obtaining the URL carried in the screenshot request initiated by the user, or,
acquiring the character string carried in the screenshot request initiated by the user, and parsing the character string to acquire the URL indicated by the character string.
25. The apparatus of claim 23, the page load module further to:
judging whether the loaded page elements contain screenshot-irrelevant elements indicating that the page elements relevant to the screenshot are loaded completely;
if yes, determining that the page elements related to the screenshot are loaded completely.
26. The apparatus of claim 25, the screenshot independent elements comprising a page element related to an advertisement.
27. The apparatus of claim 23, the page load module further to:
determining a tail element in page elements related to screenshot based on the page structure of the page to be screenshot;
judging whether the loaded element comprises the tail element or not;
and if the loaded element comprises the tail element, determining that the page element related to the screenshot is completely loaded.
28. The apparatus of claim 23, the apparatus further comprising:
the preprocessing module is used for preprocessing the loaded part in the page to be captured; and the preprocessing comprises deleting screenshot interference elements in the page to be screenshot.
29. The apparatus of claim 28, wherein the pre-processing further comprises any one or a combination of pre-processing modes shown as follows:
expanding hidden elements in the page;
changing the display style of the specified page element;
adding screenshot marks to elements in the page.
30. The apparatus of claim 28, wherein the screenshot request of the user carries requirement information indicating a preprocessing mode;
the pre-processing module further:
and determining a preprocessing mode according to the requirement information carried in the screenshot request of the user, and preprocessing the page to be screenshot according to the determined preprocessing mode.
31. The apparatus of claim 23, the execution module further to:
determining whether the size of the loaded part in the page to be captured is larger than a preset threshold value;
if yes, dividing the loaded part in the page to be screenshot into a plurality of fragments, and recording the position relation among the fragments;
respectively carrying out screenshot on the plurality of fragments;
and splicing the screenshots of the plurality of fragments into screenshots of the loaded parts in the page to be subjected to screenshot according to the recorded position relation.
32. The apparatus of claim 23, further comprising an image processing module,
inputting the image of the page to be subjected to screenshot, which is obtained by screenshot, into an identification model so as to identify the position of an interference element in the image obtained by screenshot; the identification model is a machine learning model obtained by training a page screenshot of positions marked with a plurality of interference elements as a training sample;
and deleting the page element positioned at the identified position in the image obtained by screenshot as an interference element.
33. The apparatus of claim 23, wherein the screenshot request initiated by the user carries a specification identifier for indicating a screenshot specification;
the execution module further:
and stopping loading in response to the completion of loading of page elements related to screenshot, and performing screenshot on the loaded part in the page to be screenshot according to the screenshot specification indicated by the specification identification carried in the screenshot request initiated by the user.
34. A page screen capture apparatus comprising:
the URL acquisition module is used for responding to a screenshot request initiated by a user and acquiring a uniform resource locator (URL) of a page to be screenshot;
the page loading module loads the page to be captured according to the URL;
the page preprocessing module is used for preprocessing the page to be captured after the page to be captured is loaded; the preprocessing comprises deleting screenshot interference elements in the page to be screenshot;
and the screenshot executing module is used for carrying out screenshot on the preprocessed page to be subjected to screenshot.
35. The apparatus of claim 34, the user-initiated screenshot request carrying the URL, or a string indicating the URL of the page to be screenshot;
the URL acquisition module further:
obtaining the URL carried in the screenshot request initiated by the user, or,
acquiring the character string carried in the screenshot request initiated by the user, and parsing the character string to acquire the URL indicated by the character string.
36. The apparatus of claim 34, the pre-processing further comprising any one or a combination of pre-processing means shown below:
expanding hidden elements in the page;
changing the display style of the specified page element;
adding screenshot marks to elements in the page.
37. The apparatus of claim 34, wherein the screenshot request of the user carries requirement information indicating a preprocessing mode;
the page pre-processing module further:
and determining a preprocessing mode according to the requirement information carried in the screenshot request of the user, and preprocessing the page to be screenshot according to the determined preprocessing mode.
38. The apparatus of claim 34, further comprising a dynamic loading module,
judging whether the page elements related to screenshot are completely loaded or not in the process of loading the page to be screenshot; the page elements related to the screenshot are page elements appointed by a user;
and stopping the loading in response to the completion of the loading of the page elements related to the screenshot.
39. The apparatus of claim 38, the dynamic loading module further to:
judging whether the loaded page elements contain screenshot-irrelevant elements indicating that the page elements relevant to the screenshot are loaded completely;
if yes, determining that the page elements related to the screenshot are loaded completely.
40. The apparatus of claim 39, the screenshot independent elements comprising a page element related to an advertisement.
41. The apparatus of claim 38, the dynamic loading module further to:
determining a tail element in page elements related to screenshot based on the page structure of the page to be screenshot;
judging whether the loaded element comprises the tail element or not;
and if the loaded element comprises the tail element, determining that the page element related to the screenshot is completely loaded.
42. The apparatus of claim 34, the screenshot execution module further to:
determining whether the size of the preprocessed page to be subjected to screenshot is larger than a preset size threshold value;
if yes, dividing the preprocessed page to be subjected to screenshot into a plurality of fragments, and recording the position relation among the fragments;
respectively carrying out screenshot on the plurality of fragments;
and splicing the screenshots of the plurality of fragments into the screenshots of the preprocessed page to be subjected to screenshot according to the recorded position relation.
43. The apparatus of claim 34, further comprising an image processing module,
inputting the image of the page to be subjected to screenshot, which is obtained by screenshot, into an identification model so as to identify the position of an interference element in the image obtained by screenshot; the identification model is a machine learning model obtained by training a page screenshot of positions marked with a plurality of interference elements as a training sample;
and deleting the page element positioned at the identified position in the image obtained by screenshot as an interference element.
44. The apparatus of claim 34, wherein the user-initiated screenshot request carries a specification identifier indicating a screenshot specification;
the screenshot executing module further:
and performing screenshot on the preprocessed page to be subjected to screenshot according to the screenshot specification indicated by the specification identification carried in the screenshot request initiated by the user.
45. A computer device comprising at least a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 11 when executing the program.
46. A computer device comprising at least a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 12 to 22 when executing the program.
CN202010200265.1A 2020-03-20 2020-03-20 Page screenshot method and device Pending CN111428162A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010200265.1A CN111428162A (en) 2020-03-20 2020-03-20 Page screenshot method and device
PCT/CN2020/140556 WO2021184896A1 (en) 2020-03-20 2020-12-29 Page screenshot method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010200265.1A CN111428162A (en) 2020-03-20 2020-03-20 Page screenshot method and device

Publications (1)

Publication Number Publication Date
CN111428162A true CN111428162A (en) 2020-07-17

Family

ID=71549674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010200265.1A Pending CN111428162A (en) 2020-03-20 2020-03-20 Page screenshot method and device

Country Status (2)

Country Link
CN (1) CN111428162A (en)
WO (1) WO2021184896A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112596833A (en) * 2020-12-21 2021-04-02 北京鸿腾智能科技有限公司 Webpage screenshot generating method, device, equipment and storage medium
WO2021184896A1 (en) * 2020-03-20 2021-09-23 支付宝(杭州)信息技术有限公司 Page screenshot method and device
CN114047985A (en) * 2021-10-21 2022-02-15 盐城金堤科技有限公司 Screenshot method, screenshot device, storage medium and electronic equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691962B (en) * 2022-04-25 2024-04-19 清华大学 Mobile terminal page crawler method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140344658A1 (en) * 2013-05-15 2014-11-20 Microsoft Corporation Enhanced links in curation and collaboration applications
CN106775298A (en) * 2016-11-28 2017-05-31 北京小米移动软件有限公司 The processing method and processing device of sectional drawing
CN109033466A (en) * 2018-08-31 2018-12-18 掌阅科技股份有限公司 Page sharing method calculates equipment and computer storage medium
CN110704784A (en) * 2019-10-10 2020-01-17 深圳前海微众银行股份有限公司 Web page screen capturing method, device, equipment and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020231A (en) * 2017-07-25 2019-07-16 阿里巴巴集团控股有限公司 Webpage capture method and device thereof
CN110020240A (en) * 2017-09-28 2019-07-16 北京国双科技有限公司 A kind of webpage capture method, apparatus, storage medium and processor
CN108595583B (en) * 2018-04-18 2022-12-02 平安科技(深圳)有限公司 Dynamic graph page data crawling method, device, terminal and storage medium
CN110889072B (en) * 2019-11-21 2023-09-26 深圳前海环融联易信息科技服务有限公司 Screenshot method and device for removing webpage advertisements, computer equipment and storage medium
CN111428162A (en) * 2020-03-20 2020-07-17 支付宝(杭州)信息技术有限公司 Page screenshot method and device

Also Published As

Publication number Publication date
WO2021184896A1 (en) 2021-09-23

Similar Documents

Publication Publication Date Title
CN107256109B (en) Information display method and device and terminal
CN111428162A (en) Page screenshot method and device
CN105210051B (en) Estimate the method and system of the visibility of content item
US8955739B2 (en) Barcode scanner on webpage
CN108427731B (en) Page code processing method and device, terminal equipment and medium
US9934206B2 (en) Method and apparatus for extracting web page content
CN106033450B (en) Advertisement blocking method and device and browser
CN112433923A (en) Backtracking file generation method, backtracking method and equipment
CN102929971A (en) Multimedia information playing method and system
CN104899203B (en) Webpage generation method and device and terminal equipment
CN115422334A (en) Information processing method, device, electronic equipment and storage medium
CN112947900B (en) Web application development method and device, server and development terminal
CN112187622A (en) Instant message display method and device and server
CN111131419A (en) Information pushing method and server based on book pages
CN112667934A (en) Dynamic simulation diagram display method and device, electronic equipment and computer readable medium
CN106776634A (en) Network access method, device and terminal device
CN112433778A (en) Mobile equipment page display method and device, electronic equipment and storage medium
CN108509229A (en) Method, terminal device and computer-readable storage medium for cross-domain window control
CN105589870B (en) Method and system for filtering webpage advertisements
CN110909155B (en) Book order generation method, calculation device and computer storage medium
CN113343137A (en) Optimized SEO page generation method and device, electronic equipment and storage medium
CN112950167A (en) Design service matching method, device, equipment and storage medium
CN111338709A (en) Target scene skipping method, device, equipment and storage medium in client
US20210021639A1 (en) Method and electronic device for displaying web page
CN111125605A (en) Page element acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination