WO2021184896A1 - Page screenshot method and device - Google Patents


Info

Publication number
WO2021184896A1
Authority
WO
WIPO (PCT)
Prior art keywords
screenshot
page
loaded
preprocessing
elements
Application number
PCT/CN2020/140556
Other languages
French (fr)
Chinese (zh)
Inventor
韩喆
王春霈
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Publication of WO2021184896A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation

Definitions

  • This application relates to the field of computer application technology, and in particular to a page screenshot method and device.
  • Page screenshots are a practical means of electronic forensics: the content displayed on pages involved in illegal activity can be saved as images and then used as electronic evidence.
  • This application discloses a page screenshot method and device.
  • A first page screenshot method includes: in response to a screenshot request initiated by a user, obtaining the uniform resource locator (URL) of the page to be screenshotted; loading the page to be screenshotted according to the URL and, during loading, judging whether the screenshot-related page elements have been loaded, the screenshot-related page elements being page elements specified by the user; and, in response to the screenshot-related page elements having been loaded, stopping the loading of the page to be screenshotted and taking a screenshot of the loaded part of the page.
  • A second page screenshot method includes: in response to a screenshot request initiated by a user, obtaining the URL of the page to be screenshotted; loading the page to be screenshotted according to the URL; after the page has finished loading, preprocessing it, the preprocessing including deleting screenshot interference elements from the page; and taking a screenshot of the preprocessed page.
  • A first page screenshot device includes: a URL acquisition module, which, in response to a screenshot request initiated by a user, acquires the URL of the page to be screenshotted; a page loading module, which loads the page to be screenshotted according to the URL and, during loading, judges whether the screenshot-related page elements have been loaded, the screenshot-related page elements being page elements specified by the user; and an execution module, which, in response to the screenshot-related page elements having been loaded, stops the loading of the page to be screenshotted and takes a screenshot of its loaded part.
  • A second page screenshot device includes: a URL acquisition module, which, in response to a screenshot request initiated by a user, acquires the URL of the page to be screenshotted; a page loading module, which loads the page to be screenshotted according to the URL; a page preprocessing module, which preprocesses the page after it has finished loading, the preprocessing including deleting screenshot interference elements from the page; and a screenshot execution module, which takes a screenshot of the preprocessed page.
  • Fig. 1 is an example flowchart of a page screenshot method in this specification;
  • Fig. 2 is a schematic diagram of judging whether screenshot-related page elements have been loaded in this specification;
  • Fig. 3 is an example comparison between an original page and the image obtained by the screenshot in this specification;
  • Fig. 4 is a structural example diagram of a page screenshot device in this specification;
  • Fig. 5 is a structural example diagram of an electronic device for taking page screenshots in this specification;
  • Fig. 6 is an example flowchart of another page screenshot method in this specification;
  • Fig. 7 is an example comparison of a page before and after preprocessing in this specification;
  • Fig. 8 is a schematic diagram of a process for judging whether screenshot-related page elements have been loaded in this specification;
  • Fig. 9 is a structural example diagram of a page screenshot device in this specification;
  • Fig. 10 is a structural example diagram of an electronic device for taking page screenshots in this specification.
  • Although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited by these terms; the terms only distinguish information of the same type from one another.
  • For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
  • In related technologies, a screenshot is usually taken by calling the browser's built-in web page screenshot function to capture the page indicated by the user.
  • However, when long web pages are processed, screenshots obtained this way often contain a great deal of information unrelated to the purpose of the screenshot.
  • In addition, the page may be so large that it causes problems such as program crashes.
  • In view of this, this specification discloses a technical solution that, while the page to be screenshotted is loading, dynamically determines whether loading needs to be stopped and, once loading has been stopped, captures only the part that has already been loaded.
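  • Fig. 1 shows a page screenshot method provided by an embodiment of this specification; the method performs steps S101 to S103.
  • S101: Parse the screenshot request initiated by the user to obtain the URL of the page to be screenshotted.
  • S102: Load the page to be screenshotted according to the URL and, during loading, judge whether the screenshot-related page elements have been loaded.
  • S103: In response to the screenshot-related page elements having been loaded, stop loading the page to be screenshotted and take a screenshot of its loaded part.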
  • The entity that executes this method is not limited in this specification; for example, it may be a cloud server that receives users' screenshot requests over a network connection, or a user's personal computer that receives the screenshot request through a communication mechanism between software modules, and so on.
  • In one embodiment, the method is applied to a distributed server cluster. The screenshot request may include sub-requests corresponding to multiple pages to be screenshotted, and the distributed server cluster may match these sub-requests to its nodes according to a preset allocation algorithm.
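The allocation algorithm is left open above. As one hypothetical choice, the Python sketch below hashes each page URL onto a node, so that every sub-request is handled by exactly one node and the same page always lands on the same node. The node names and the use of SHA-256 are illustrative assumptions, not the method described in the patent.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def allocate_sub_requests(urls: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign each page sub-request (identified by its URL) to one cluster node.

    Hypothetical allocation rule: a stable SHA-256 hash of the URL modulo the
    number of nodes, so repeated requests for a page go to the same node.
    """
    if not nodes:
        raise ValueError("the cluster must contain at least one node")
    assignment: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(nodes)
        assignment[nodes[index]].append(url)
    return dict(assignment)
```

A stable hash is used instead of Python's built-in `hash()`, whose value for strings changes between interpreter runs.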
  • After the screenshot request is received, the URL of the page to be screenshotted can be obtained from it.
  • For example, the screenshot request may directly carry a URL field of the page to be screenshotted, in which case the URL is obtained by parsing the request; alternatively, the request may carry a keyword corresponding to the URL, and the URL is then obtained indirectly from that keyword.
  • For example, the screenshot request may carry a character string indicating the page to be screenshotted; after the string is extracted from the request, it can be parsed further to obtain the URL of the page.
  • The parsing may be natural-language semantic analysis, or decoding of specific codes such as short URLs and share codes; those skilled in the art can choose according to the circumstances, and this specification does not specifically limit it.
  • For example, the screenshot request may carry the string "Alipay Weibo", indicating the official Weibo page of Alipay.
  • By using the semantic information "Alipay" and "Weibo" to query a preset mapping table between semantic information and URLs, the URL of Alipay's official Weibo page can be obtained as the URL of the page to be screenshotted.
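The two ways of obtaining the URL described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the request is modeled as a dictionary with hypothetical `url` and `target` keys, the mapping table and its example.com address are placeholders, and simple keyword matching stands in for the semantic analysis the text leaves open.

```python
from typing import Dict, FrozenSet
from urllib.parse import urlparse

# Hypothetical mapping table from keyword sets to page URLs; the example.com
# address is a placeholder, not the real address of any page.
KEYWORD_URL_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"alipay", "weibo"}): "https://weibo.example.com/alipay",
}


def resolve_page_url(request: dict) -> str:
    """Return the URL of the page to be screenshotted.

    Assumes the request carries either a direct "url" field or a free-form
    "target" string such as "Alipay Weibo".
    """
    url = request.get("url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return url
        raise ValueError(f"malformed URL: {url!r}")
    keywords = frozenset(request.get("target", "").lower().split())
    for required, mapped_url in KEYWORD_URL_TABLE.items():
        # Keyword matching stands in for semantic analysis: every keyword
        # of a table entry must appear in the request string.
        if required <= keywords:
            return mapped_url
    raise LookupError("no URL found for the screenshot request")
```

For instance, `resolve_page_url({"target": "Alipay Weibo"})` returns the placeholder address mapped to the keywords `alipay` and `weibo`.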
  • Once the URL is obtained, the page to be screenshotted can be loaded according to it.
  • Loading may use an ordinary browser or a headless browser; this application does not limit this, and those skilled in the art can decide according to specific needs.
  • During loading, it is judged whether the screenshot-related page elements have been loaded. The trigger mechanism for this judgment is not specifically limited in this specification: it may be triggered periodically at a preset time interval, based on the number of page elements loaded, or based on the size of the page file, or by any combination of these. For example, the judgment may be triggered every 100 milliseconds, or each time another 20 page elements have been loaded, and so on.
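A combined trigger using the two example values above (every 100 milliseconds, or every 20 newly loaded elements) might look like the following Python sketch. The class and parameter names are hypothetical, and the caller is assumed to supply the current time in milliseconds and the running count of loaded elements.

```python
from dataclasses import dataclass, field


@dataclass
class JudgmentTrigger:
    """Decide when to re-check whether screenshot-related elements are loaded.

    Fires when either the time interval has elapsed or enough new elements
    have loaded since the last check (two trigger methods combined).
    """
    interval_ms: int = 100
    every_n_elements: int = 20
    _last_check_ms: int = field(default=0, init=False)
    _last_check_count: int = field(default=0, init=False)

    def should_check(self, now_ms: int, loaded_count: int) -> bool:
        due_by_time = now_ms - self._last_check_ms >= self.interval_ms
        due_by_count = loaded_count - self._last_check_count >= self.every_n_elements
        if due_by_time or due_by_count:
            # Reset both baselines so the next check is measured from now.
            self._last_check_ms = now_ms
            self._last_check_count = loaded_count
            return True
        return False
```

Resetting both baselines on every check keeps the two triggers from firing back to back for the same progress.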
  • Screenshot-related elements are elements related to the purpose of the screenshot and can be specified by the user, either through the screenshot request or through presets in the system. For example, a user-initiated screenshot request may include a string such as "photo theft forensics", which specifies the photographs in the web page as the screenshot-related elements; as another example, a user may preset that, for all forum page screenshots, the screenshot-related elements are the posts of forum users, excluding promotional information and the like.
  • Whether the screenshot-related page elements have been loaded can be judged by a variety of criteria, which this specification does not specifically limit: the judgment can be made directly from the screenshot-related page elements themselves, or indirectly from page elements that are unrelated to the screenshot.
  • Fig. 2 is a schematic diagram of one implementation for judging whether the screenshot-related page elements have been loaded.
  • In one implementation, whether the screenshot-related page elements have been loaded can be determined by judging whether the loaded elements include the last of the screenshot-related page elements; if they do, it can be determined that the screenshot-related page elements have been loaded. As described above, the screenshot-related page elements may be determined from the user's screenshot request or from the user's presets. Because, during loading, the summary structure of the page delivered by the page server (for example, the tree structure of the HTML document) is received first, the last of the screenshot-related page elements can be determined from the summary structure of the page to be screenshotted.
  • For example, when the screenshot-related page elements are user comments, the last comment element can be determined from the tree structure of the HTML file; during loading, once that last element is detected as loaded, the screenshot-related page elements can be considered loaded.
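The direct judgment can be sketched with Python's standard `html.parser` module: walk the page skeleton received at the start of loading, record the id of the last element (in document order) that carries a given class, and then check whether that id is among the elements loaded so far. Identifying related elements by a CSS class such as `comment` and tracking loaded elements by id are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import Optional, Set


class LastMatchFinder(HTMLParser):
    """Record the id of the last element, in document order, whose class
    attribute contains `target_class`."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.last_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.target_class in classes and attributes.get("id"):
            self.last_id = attributes["id"]


def last_related_element(skeleton_html: str, target_class: str) -> Optional[str]:
    """Find the last screenshot-related element in the page skeleton."""
    finder = LastMatchFinder(target_class)
    finder.feed(skeleton_html)
    finder.close()
    return finder.last_id


def related_elements_loaded(loaded_ids: Set[str], last_id: Optional[str]) -> bool:
    """The related elements count as loaded once their last element has loaded."""
    return last_id is not None and last_id in loaded_ids
```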
  • In another implementation, it can be judged whether the loaded elements contain a preset target element indicating that the screenshot-related elements have been loaded; if so, the screenshot-related page elements can be considered loaded.
  • For example, if APP promotion information is detected among the loaded content during page loading, the screenshot-related page element (in this example, the stolen photograph in the page body) can be considered loaded.
  • The screenshot-irrelevant elements used as such target elements may be advertising-related page elements, for example image advertisements at the bottom of the page, share-inducing links, related-article recommendations, and so on; those skilled in the art can specify the types of screenshot-irrelevant elements according to specific needs.
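Combining the indirect judgment with the stop-loading step, a Python simulation can consume page elements in the order they arrive, re-check every `check_every` elements, and stop as soon as one of the marker types appears, returning only the loaded part that would be captured. The element stream, the type labels, and the function names are hypothetical; breaking out of the loop stands in for halting the real page load.

```python
from typing import Callable, Iterable, List, Tuple

Element = Tuple[str, str]  # (element id, element type) in load order

# Hypothetical screenshot-irrelevant element types whose appearance marks the
# end of the screenshot-related content.
END_MARKER_TYPES = {"app_promotion", "bottom_ad", "share_link", "related_articles"}


def end_marker_seen(loaded: List[Element]) -> bool:
    """Indirect judgment: a marker element has appeared among the loaded ones."""
    return any(kind in END_MARKER_TYPES for _, kind in loaded)


def load_until_related_loaded(
    stream: Iterable[Element],
    is_ready: Callable[[List[Element]], bool] = end_marker_seen,
    check_every: int = 1,
) -> List[Element]:
    """Consume the element stream and stop early once `is_ready` holds.

    Returns only the part of the page loaded so far, which is what would be
    captured. If the page finishes loading first, everything is returned.
    """
    loaded: List[Element] = []
    for element in stream:
        loaded.append(element)
        if len(loaded) % check_every == 0 and is_ready(loaded):
            break  # stands in for stopping the browser's page load
    return loaded
```

A coarser `check_every` makes each check cheaper in total but can let a few extra elements load past the marker before loading stops.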
  • In one embodiment, the final screenshot can be obtained by segmented capture. Specifically, when the size of the loaded part of the page to be screenshotted exceeds a preset size threshold, the loaded part is divided into several slices and the positional relationship between the slices is recorded; after each slice has been captured, the slice screenshots are stitched into a screenshot of the whole loaded part according to the recorded positional relationship.
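The slicing and stitching steps can be sketched in Python as below, assuming vertical slices whose recorded positional relationship is simply each slice's top offset, and representing each captured pixel row by a string. The names and the row model are illustrative assumptions; a real implementation would capture and paste actual image tiles.

```python
from typing import List, Tuple

Slice = Tuple[int, int]  # (top offset in pixels, height in pixels)


def plan_slices(total_height: int, max_height: int) -> List[Slice]:
    """Split the loaded part into vertical slices no taller than `max_height`,
    recording each slice's offset as its positional relationship."""
    if total_height <= max_height:
        return [(0, total_height)]  # below the threshold: capture in one go
    return [(top, min(max_height, total_height - top))
            for top in range(0, total_height, max_height)]


def stitch(captures: List[Tuple[int, List[str]]], total_height: int) -> List[str]:
    """Reassemble per-slice captures (offset, rows) into one image, placing
    each slice at its recorded offset. Rows stand in for pixel rows."""
    image: List[str] = [""] * total_height
    for top, rows in captures:
        image[top:top + len(rows)] = rows
    return image
```

Because each capture carries its own offset, the slices can be captured in any order (or in parallel) and still stitch correctly.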
  • In one embodiment, the loaded part of the page to be screenshotted may also be preprocessed before the screenshot is taken. The preprocessing may include deleting screenshot interference elements from the page, for example floating advertisements, recommendation information, and shortcut buttons that may obscure the screenshot-related elements.
  • The preprocessing may also include other operations: for example, collapsed elements in the page can be expanded so that the collapsed content is fully displayed and captured; the display style of specified elements can be changed to enhance how they appear; and screenshot markers can be added to elements on the page to highlight content that needs attention, and so on.
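Three of the preprocessing operations above (deleting interference elements, expanding collapsed elements, adding screenshot markers) can be sketched on a toy element tree in Python. The `Node` type and the class names are hypothetical stand-ins for a real DOM; in a browser, the same operations would typically be carried out by script injected into the page.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Minimal stand-in for a DOM element."""
    tag: str
    classes: Set[str] = field(default_factory=set)
    collapsed: bool = False
    children: List["Node"] = field(default_factory=list)


# Hypothetical class names identifying screenshot interference elements.
INTERFERENCE = {"floating-ad", "recommendation", "shortcut-button"}


def preprocess(node: Node, highlight: Set[str]) -> None:
    """In place: delete interference children, expand collapsed elements, and
    add a screenshot marker to elements whose classes intersect `highlight`."""
    node.children = [c for c in node.children if not (c.classes & INTERFERENCE)]
    if node.collapsed:
        node.collapsed = False  # show folded content in full
    if node.classes & highlight:
        node.classes.add("screenshot-mark")
    for child in node.children:
        preprocess(child, highlight)
```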
  • Fig. 3 is an example comparison between the original page to be screenshotted and the image obtained by the screenshot. In Fig. 3, body text 1 and body text 2 are the screenshot-related page elements. After preprocessing, the floating advertisement that obscured the body text is removed, body text 2, which was originally collapsed and hidden, is expanded and displayed, and body text 1, which needs to be highlighted, carries a screenshot marker; meanwhile, because loading was stopped, the related recommendations at the end of the page and the "Back to Home" and "Add to Favorites" buttons do not appear in the final screenshot.
  • The user's screenshot request may also carry other customized information to enable further customization of the screenshot function.
  • For example, the request may carry demand information indicating the preprocessing method.
  • In that case, the preprocessing method can be determined from the demand information carried in the request, and the page to be screenshotted is then preprocessed accordingly.
  • For example, the request may carry demand information indicating that floating advertisements should be removed and hidden text expanded.
  • From this demand information, the preprocessing to perform can be determined, namely removing interference content (the floating advertisements) and expanding hidden content (the text), and the page is preprocessed accordingly.
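Turning demand information into an ordered list of preprocessing operations might look like the Python sketch below. The `preprocess` key and the operation names are hypothetical, since the text does not define a request format; rejecting unknown operations is a design choice that surfaces typos instead of silently producing an unprocessed screenshot.

```python
from typing import List

# Hypothetical vocabulary of preprocessing operations a request may ask for,
# listed in the order they should run.
SUPPORTED_OPERATIONS = ["remove_floating_ads", "expand_hidden_text", "mark_elements"]


def preprocessing_plan(request: dict) -> List[str]:
    """Derive the preprocessing operations from the request's demand information.

    Assumes the demand information is a list under a hypothetical "preprocess"
    key; unknown operations are rejected rather than silently ignored.
    """
    requested = request.get("preprocess", [])
    unknown = set(requested) - set(SUPPORTED_OPERATIONS)
    if unknown:
        raise ValueError(f"unsupported preprocessing: {sorted(unknown)}")
    return [op for op in SUPPORTED_OPERATIONS if op in requested]
```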
  • The screenshot request initiated by the user may also carry a specification identifier indicating the screenshot specification; the loaded part of the page to be screenshotted can then be captured according to the screenshot specification indicated by that identifier.
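The specification identifier can be resolved with a lookup table, as in the following Python sketch. The identifiers and the fields of `ScreenshotSpec` (viewport width, image format, scale) are hypothetical examples of what a screenshot specification might contain; the text does not enumerate them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenshotSpec:
    viewport_width: int  # CSS pixels
    image_format: str    # e.g. "png" or "jpeg"
    scale: float         # device pixel ratio applied when rendering


# Hypothetical specification identifiers; none are defined in the source.
SPECS = {
    "evidence-hd": ScreenshotSpec(1920, "png", 2.0),
    "mobile": ScreenshotSpec(390, "jpeg", 3.0),
}
DEFAULT_SPEC_ID = "evidence-hd"


def spec_for(request: dict) -> ScreenshotSpec:
    """Look up the screenshot specification named by the request's identifier."""
    spec_id = request.get("spec_id", DEFAULT_SPEC_ID)
    try:
        return SPECS[spec_id]
    except KeyError:
        raise ValueError(f"unknown screenshot specification: {spec_id!r}") from None
```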
  • In one embodiment, a machine learning model for locating interference information in images can be used to determine where the interference information lies in the image obtained by the screenshot, after which image processing removes the interference information at that location. Specifically, the machine learning model may be trained using, as training samples, screenshots of a number of pages in which the interference elements have been marked.
  • For example, if the image obtained from a screenshot of a page still contains a certain type of advertising information that interferes with the screenshot, a machine learning model for locating that type of advertising information can be called to locate it in the image, and an image processing algorithm then removes it. In this way, interference information can be removed from the screenshot at the image level, yielding a screenshot with less interference.
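Training and calling the model are beyond what a short example can show, but the removal step can be sketched in Python: given bounding boxes that a hypothetical detector has returned, paint those regions with a background value. Flat filling is only one possible image processing algorithm, chosen here for simplicity; the grayscale list-of-lists image model is also an assumption.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def erase_regions(image: List[List[int]], boxes: List[Box], fill: int = 255) -> List[List[int]]:
    """Return a copy of a grayscale image with each box painted over by `fill`.

    The boxes are assumed to come from a trained detector; out-of-range
    coordinates are clipped to the image bounds.
    """
    out = [row[:] for row in image]
    height = len(out)
    width = len(out[0]) if out else 0
    for left, top, right, bottom in boxes:
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                out[y][x] = fill
    return out
```

Inpainting, which fills the region from its surroundings, would usually look more natural than a flat fill.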
  • Fig. 4 is a structural example diagram of a page screenshot device corresponding to the above method; the device includes a URL acquisition module 401, a page loading module 402, and an execution module 403.
  • The URL acquisition module 401 obtains the URL of the page to be screenshotted in response to a screenshot request initiated by the user.
  • The page loading module 402 loads the page to be screenshotted according to the URL and, during loading, judges whether the screenshot-related page elements have been loaded, the screenshot-related page elements being page elements specified by the user.
  • The execution module 403, in response to the screenshot-related page elements having been loaded, stops the loading of the page to be screenshotted and takes a screenshot of its loaded part.
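The division of work among modules 401, 402, and 403 can be sketched in Python as below: the page loading module yields each loaded element together with its judgment, and the execution module decides to stop and capture. The injected `fetch` callable, the use of element ids, and the text placeholder for the image are hypothetical simplifications.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


class UrlAcquisitionModule:  # module 401
    def acquire(self, request: dict) -> str:
        return request["url"]  # direct URL field assumed for brevity


class PageLoadingModule:  # module 402
    def __init__(self, fetch: Callable[[str], Iterable[str]], last_related_id: str) -> None:
        self.fetch = fetch  # hypothetical loader yielding element ids in load order
        self.last_related_id = last_related_id

    def load(self, url: str) -> Iterator[Tuple[str, bool]]:
        """Yield each loaded element with the judgment "related elements loaded?"."""
        for element_id in self.fetch(url):
            yield element_id, element_id == self.last_related_id


class ExecutionModule:  # module 403
    def run(self, progress: Iterator[Tuple[str, bool]]) -> str:
        loaded: List[str] = []
        for element_id, related_loaded in progress:
            loaded.append(element_id)
            if related_loaded:
                break  # stop loading; the remaining elements are never consumed
        return " | ".join(loaded)  # text placeholder for the captured image


def take_screenshot(request: dict, loader: PageLoadingModule) -> str:
    url = UrlAcquisitionModule().acquire(request)
    return ExecutionModule().run(loader.load(url))
```

Because `load` is a generator, the loading module does no work beyond what the execution module pulls, which mirrors stopping the page load early.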
  • The URL acquisition module 401 can obtain the URL of the page to be screenshotted from the user's screenshot request in several ways, which this specification does not specifically limit: the request may directly carry the URL field of the page, from which the URL is extracted directly, or it may carry a string corresponding to the URL, from which the URL is obtained indirectly, for example by a query.
  • For example, the screenshot request may carry a character string indicating the URL of the page to be screenshotted, and the URL acquisition module 401 may obtain the indicated URL by parsing the string. For instance, if the string is "abc Post Bar Homepage", it can be determined that the indicated URL is that of the "abc Post Bar Homepage", which is therefore the URL of the page to be screenshotted.
  • The corresponding URL can be determined from the string by keyword query or semantic analysis, chosen flexibly according to actual development needs; this specification does not specifically limit it.
  • As in the method, screenshot-related elements are elements related to the purpose of the screenshot and can be specified by the user, either through the screenshot request or through presets in the system. For example, a user-initiated screenshot request may include a string such as "photo theft forensics", in which case the screenshot-related elements are the photographs in the web page; as another example, a user may preset that, for forum page screenshots, the screenshot-related elements are the posts of forum users, excluding promotional information and the like.
  • The page loading module 402 can judge whether the screenshot-related page elements have been loaded by a variety of criteria, which this specification does not specifically limit: directly from the screenshot-related page elements themselves, or indirectly from page elements unrelated to the screenshot.
  • For example, the page loading module 402 can judge whether the loaded elements include the last of the screenshot-related page elements; if they do, the screenshot-related page elements have been loaded. As described above, the screenshot-related page elements may be determined from the user's screenshot request or from preset types of screenshot-related page elements. Because the summary structure of the page, in the form of an HTML tree, is received first during loading, the last of the screenshot-related page elements can be determined from that summary structure.
  • The page loading module 402 can also judge whether the loaded page elements include a screenshot-irrelevant element indicating that the screenshot-related page elements have finished loading; if so, the screenshot-related page elements are considered loaded. For example, the screenshot-irrelevant element may be a bottom-of-page advertisement.
  • If the appearance of the bottom-of-page advertisement means that all screenshot-related page elements on the page have been loaded, then once that advertisement is detected as loaded, it can be determined that all screenshot-related page elements have been loaded.
  • The screenshot-irrelevant element may be an advertising-related page element, such as an image advertisement at the bottom of the page, a share-inducing link, or a related-article recommendation.
  • When the judgment result is that the screenshot-related page elements have been loaded, the execution module 403 can stop loading the page to be screenshotted and capture its loaded part; for the specific capture method, reference may be made to related technologies, which this specification does not specifically limit.
  • The execution module 403 may obtain the final screenshot by segmented capture: when the size of the loaded part exceeds a preset size threshold, it divides the loaded part into several slices and records their positional relationship, and after capturing each slice, it stitches the slice screenshots into a screenshot of the loaded part according to the recorded positional relationship.
  • The device may also include a preprocessing module that preprocesses the loaded part of the page to be screenshotted for a better screenshot result. The preprocessing may include deleting screenshot interference elements from the page, for example floating advertisements, recommendation information, and shortcut buttons that may obscure the screenshot-related elements.
  • The preprocessing may also include other operations: for example, hidden elements in the page may be expanded so that hidden content (such as collapsed multi-level quoted comments) is fully displayed and captured; the display style of a specified page element may be changed to enhance how it appears; and screenshot markers may be added to elements on the page to highlight content that needs attention, and so on.
  • The user's screenshot request may also carry other customized information to enable further customization of the screenshot function.
  • For example, the request may carry demand information indicating the preprocessing method.
  • The preprocessing module may then determine the preprocessing method from this demand information and preprocess the page to be screenshotted accordingly.
  • The screenshot request may also carry a specification identifier indicating the screenshot specification, in which case the execution module 403 captures the loaded part of the page according to the screenshot specification indicated by that identifier.
  • The device may also include an image processing module that further processes the image obtained by the screenshot for a better result.
  • Specifically, the image processing module may use a machine learning model trained on screenshots of a number of pages in which the positions of interference elements have been marked.
  • The image is input into the trained model to determine the positions of interference elements in it, and image processing is then performed at those positions to delete the interference elements.
  • The embodiments of this specification also provide a computer device including at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the aforementioned page screenshot method when executing the program.
  • Fig. 5 is a more specific hardware structure diagram of a computing device provided by an embodiment of this specification.
  • The device may include a processor 510, a memory 520, an input/output interface 530, a communication interface 540, and a bus 550.
  • The processor 510, the memory 520, the input/output interface 530, and the communication interface 540 are communicatively connected to one another within the device through the bus 550.
  • The processor 510 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the related programs.
  • The memory 520 may be implemented as ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and so on.
  • The memory 520 may store an operating system and other application programs.
  • The related program code is stored in the memory 520 and invoked and executed by the processor 510.
  • The input/output interface 530 is used to connect input/output modules for information input and output.
  • The input/output modules may be built into the device as components (not shown in the figure) or connected externally to provide the corresponding functions.
  • Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and so on; output devices may include a display, a speaker, a vibrator, indicator lights, and so on.
  • The communication interface 540 is used to connect a communication module (not shown in the figure) for communication between this device and other devices.
  • The communication module may communicate by wired means (such as USB or network cable) or wireless means (such as mobile network, Wi-Fi, or Bluetooth).
  • The bus 550 includes a path that transfers information between the components of the device (for example, the processor 510, the memory 520, the input/output interface 530, and the communication interface 540).
  • In a specific implementation, the device may also include other components necessary for normal operation.
  • In addition, the device may include only the components necessary to implement the solutions of the embodiments of this specification, rather than all the components shown in the figures.
  • The embodiments of this specification also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned page screenshot method.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology.
  • The information may be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • Page screenshots are a feasible method of electronic forensics. By using this method, the content displayed on pages related to illegal activities can be saved in the form of pictures, which can then be used as electronic evidence.
  • Such a screenshot is usually taken by calling the browser's built-in webpage screenshot function on the page indicated by the user.
  • For complex webpages, however, screenshots obtained this way often contain many elements that interfere with the screenshot-related information, and these elements can accidentally obscure key information that needs to be captured.
  • To address this, this specification discloses a technical solution that preprocesses the page to be captured before the screenshot is taken.
  • Preprocessing operations, including removal of interference elements, are performed on the page to be captured; the screenshot is then taken of the preprocessed page.
  • FIG. 6 shows a page screenshot method provided by an embodiment of this specification; the method executes steps S601 to S604.
  • S601 In response to a screenshot request initiated by the user, obtain the uniform resource locator (URL) of the page to be captured.
  • S602 Load the page to be captured according to the URL.
  • S603 After the page to be captured has finished loading, preprocess it; the preprocessing includes deleting screenshot interference elements from the page.
  • S604 Take a screenshot of the preprocessed page.
  • This specification does not limit the entity that executes the method. For example, it can be a cloud server that receives users' screenshot requests over a network connection, or a user's personal computer that receives screenshot requests through the communication mechanism between software modules.
  • In one embodiment, the above method is applied to a distributed server cluster.
  • In this case, the screenshot request may include sub-requests corresponding to multiple pages to be captured, and the distributed server cluster may match the sub-requests to its servers according to a preset allocation algorithm.
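  • The specification does not name the preset allocation algorithm. A minimal sketch, assuming plain round-robin as that algorithm and treating each sub-request and server node as an opaque identifier (the names below are hypothetical):

```python
from itertools import cycle


def allocate_sub_requests(sub_requests, nodes):
    """Assign each sub-request (one per page to be captured) to a node, round-robin.

    Round-robin is an assumed stand-in for the unspecified "preset allocation algorithm".
    """
    if not nodes:
        raise ValueError("cluster has no nodes")
    assignment = {node: [] for node in nodes}
    # cycle() repeats the node list; zip() stops when the sub-requests run out.
    for node, sub_request in zip(cycle(nodes), sub_requests):
        assignment[node].append(sub_request)
    return assignment
```

Round-robin ignores node load; a weighted or hash-based scheme would slot into the same function signature.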
  • After the screenshot request is received, the uniform resource locator (URL) of the page to be captured can be obtained.
  • The URL of the page to be captured may be carried directly as a field in the screenshot request, in which case it is obtained by parsing the request.
  • Alternatively, the screenshot request may carry a keyword, from which the corresponding URL is obtained indirectly.
  • For example, the screenshot request may carry a character string indicating the page to be captured; after the string is extracted from the request, it can be further parsed to obtain the URL of the page.
  • The parsing can be natural-language semantic analysis, or parsing of specific codes such as short URLs or sharing codes; those skilled in the art can choose according to the specific circumstances, and this specification does not specifically limit it.
  • For example, the screenshot request can carry the string "Alipay Weibo", indicating the official Weibo page of Alipay.
  • By using the semantic information of "Alipay" and "Weibo" to query a preset mapping table between semantic information and URLs, the URL of Alipay's official Weibo page can be obtained as the URL of the page to be captured.
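  • The two ways of obtaining the URL described above (a direct URL field, or a keyword looked up in a preset mapping table) can be sketched as follows. This is illustrative only: the request field names `url` and `query`, the whitespace keyword split (a naive stand-in for semantic analysis), and the placeholder URL are all assumptions.

```python
# Hypothetical preset mapping table from semantic keywords to URLs.
# The URL is a placeholder, not the real address of the page.
KEYWORD_URL_TABLE = {
    frozenset({"alipay", "weibo"}): "https://example.com/alipay-weibo",
}


def resolve_url(request: dict, table=KEYWORD_URL_TABLE) -> str:
    # Direct case: the screenshot request carries the URL field itself.
    url = request.get("url")
    if url:
        return url
    # Indirect case: split the indicating string into keywords (order-insensitive)
    # and query the preset mapping table.
    keywords = frozenset(request.get("query", "").lower().split())
    try:
        return table[keywords]
    except KeyError:
        raise LookupError(f"no URL mapped for {request.get('query')!r}") from None
```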
  • the page to be captured can be loaded according to the URL.
  • The page can be loaded with an ordinary browser or a headless browser.
  • This specification does not limit the choice, which can be made according to specific needs.
  • After loading, the page to be captured can be preprocessed to obtain a better screenshot result. Specifically, the preprocessing can include deleting screenshot interference elements from the page, for example floating advertisements, recommendation information, and shortcut buttons that may obscure screenshot-related elements.
  • The preprocessing may also include other operations. For example, hidden elements in the page can be expanded so that hidden and collapsed content is fully displayed and captured; the display style of a specified element can be changed to enhance how that element is displayed; and screenshot markers can be added to elements on the page to highlight content that needs attention.
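  • The three preprocessing operations just described (deleting interference elements, expanding hidden content, adding screenshot markers) can be sketched on a parsed copy of the page. This minimal sketch assumes the page is well-formed XHTML parsed with Python's `xml.etree.ElementTree`; the class names treated as interference and the red outline used as the marker are hypothetical choices. In practice these operations would more likely run on the live DOM inside the browser before capture.

```python
import xml.etree.ElementTree as ET

# Hypothetical class names that identify screenshot interference elements.
INTERFERENCE_CLASSES = {"float-ad", "recommend", "shortcut"}


def preprocess(root: ET.Element, mark_ids=()) -> ET.Element:
    # 1. Delete interference elements. ElementTree has no parent pointers,
    #    so build a child -> parent map before removing anything.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for elem in list(root.iter()):
        if elem is not root and set(elem.get("class", "").split()) & INTERFERENCE_CLASSES:
            parent_of[elem].remove(elem)

    # 2. Expand hidden content: drop the `hidden` attribute and any display:none rule.
    for elem in root.iter():
        elem.attrib.pop("hidden", None)
        style = elem.get("style")
        if style:
            decls = [d.strip() for d in style.split(";") if d.strip()]
            kept = [d for d in decls if d.replace(" ", "") != "display:none"]
            if kept:
                elem.set("style", "; ".join(kept))
            else:
                del elem.attrib["style"]

    # 3. Add a screenshot marker (here, a red outline) to elements needing attention.
    for elem in root.iter():
        if elem.get("id") in mark_ids:
            style = elem.get("style", "")
            elem.set("style", (style + "; " if style else "") + "outline: 3px solid red")
    return root
```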
  • The user's screenshot request can carry demand information indicating the preprocessing method.
  • The preprocessing method can then be determined according to the demand information carried in the user's screenshot request, and the page to be captured is preprocessed according to the determined method.
  • For example, the user's screenshot request can carry demand information indicating that floating advertisements should be removed and collapsed text should be expanded.
  • The preprocessing to be performed is then determined from the demand information, namely removing interference content (the floating advertisement) and expanding collapsed content (the text), and the page is preprocessed accordingly.
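  • Determining the preprocessing method from demand information amounts to a dispatch from demand keys to operations. A minimal sketch, assuming the demand information arrives as a list of string keys and modelling the page as a flat list of element dicts (both the keys and the page model are hypothetical):

```python
from typing import Callable, Dict, List

Page = List[dict]  # simplified page model: one dict per page element


def remove_floating_ads(page: Page) -> Page:
    return [e for e in page if e["kind"] != "floating_ad"]


def expand_collapsed_text(page: Page) -> Page:
    return [{**e, "collapsed": False} if e["kind"] == "text" else e for e in page]


# Hypothetical demand keys mapped to preprocessing operations.
PREPROCESSORS: Dict[str, Callable[[Page], Page]] = {
    "remove_floating_ads": remove_floating_ads,
    "expand_collapsed_text": expand_collapsed_text,
}


def preprocess_by_demand(page: Page, demands: List[str]) -> Page:
    # Apply the operations in the order the request lists them.
    for demand in demands:
        op = PREPROCESSORS.get(demand)
        if op is None:
            raise ValueError(f"unknown preprocessing demand: {demand}")
        page = op(page)
    return page
```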
  • Figure 7 is a comparison example of a page to be captured before and after preprocessing; in the example of Figure 7, body text 1 and body text 2 are the page elements related to the screenshot.
  • During preprocessing, the floating advertisement over the body text can be removed, the original content of the collapsed body text 2 can be expanded and displayed, and body text 1, which needs to be highlighted, is given a screenshot marker. The related recommendations at the end of the page and the "Back to home page" and "Add to Favorites" buttons are likewise treated as interference elements, removed, and do not appear in the final screenshot.
  • The judgment, made during loading, of whether the screenshot-related page elements have finished loading can be triggered in various ways, which this specification does not specifically limit. For example, it can be triggered periodically at a preset time interval, according to the number of elements loaded in the page, or according to the size of the page file, or by any combination of these triggering methods. Stopping page loading in time reduces the loading of elements irrelevant to the screenshot and increases the proportion of screenshot-related elements in the final screenshot, while ensuring that no screenshot-related page elements are missing.
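  • The combinable triggers described above (time interval, number of loaded elements, page file size) can be sketched as a small stateful object that the loader polls. This is an assumption-laden sketch: the thresholds are arbitrary, the enabled conditions are combined with OR, and the caller is assumed to supply the current time, element count, and byte count.

```python
from typing import Optional


class CheckTrigger:
    """Decides when to run the 'screenshot-related elements loaded?' check.

    Any enabled condition firing triggers a check (conditions are OR-combined).
    """

    def __init__(self, interval_s: Optional[float] = None,
                 every_n_elements: Optional[int] = None,
                 every_n_bytes: Optional[int] = None,
                 start_time: float = 0.0):
        self.interval_s = interval_s
        self.every_n_elements = every_n_elements
        self.every_n_bytes = every_n_bytes
        # State at the last check: (time, loaded element count, loaded bytes).
        self._last = (start_time, 0, 0)

    def should_check(self, now: float, loaded_elements: int, loaded_bytes: int) -> bool:
        last_time, last_count, last_bytes = self._last
        fire = (
            (self.interval_s is not None and now - last_time >= self.interval_s)
            or (self.every_n_elements is not None
                and loaded_elements - last_count >= self.every_n_elements)
            or (self.every_n_bytes is not None
                and loaded_bytes - last_bytes >= self.every_n_bytes)
        )
        if fire:
            self._last = (now, loaded_elements, loaded_bytes)
        return fire
```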
  • Screenshot-related elements are elements related to the purpose of the screenshot and can be specified by the user, either in the screenshot request or through a preset in the system. For example, a user-initiated screenshot request can include the string "photo theft forensics", which specifies the photographs in the webpage as the screenshot-related elements; as another example, the user can preset that, for all screenshots of forum pages, the screenshot-related elements are the posts of forum users, excluding related promotional information.
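  • Resolving which elements count as screenshot-related, from either the request or a system preset, could look like the sketch below. The purpose strings, page types, CSS-like selectors, and the rule that the request takes precedence over the preset are all hypothetical.

```python
from typing import Optional

# Hypothetical purpose strings in a request, mapped to element selectors.
PURPOSE_SELECTORS = {"photo theft forensics": "img.photo"}
# Hypothetical user presets keyed by page type.
PRESET_SELECTORS = {"forum": "div.post-content"}


def related_element_selector(request_text: str, page_type: Optional[str]) -> Optional[str]:
    # Assumed precedence: an explicit purpose in the request overrides the preset.
    text = request_text.lower()
    for purpose, selector in PURPOSE_SELECTORS.items():
        if purpose in text:
            return selector
    return PRESET_SELECTORS.get(page_type) if page_type else None
```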
  • Whether the screenshot-related page elements have finished loading can be judged by a variety of criteria, which this specification does not specifically limit. For example, it can be judged directly from the perspective of the screenshot-related page elements themselves, or indirectly from the perspective of page elements unrelated to the screenshot.
  • FIG. 8 is a schematic diagram of one implementation for judging whether the page elements related to the screenshot have finished loading.
  • In this implementation, it is judged whether the loaded elements include the last element among the screenshot-related page elements. The screenshot-related page elements are determined from the screenshot request or according to the user's preset; and because, during loading, the page to be captured first receives the page's summary structure (for example, an html tree structure) from the page server, the last element among the screenshot-related page elements can be determined from the summary structure of the page to be captured.
  • For example, when the screenshot-related elements are user comments, the last element of the user comment content can be determined from the tree structure of the html file; during loading, once that last element is detected as loaded, the screenshot-related page elements can be considered to have finished loading.
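  • The last-element check described above can be sketched in two parts: find the last screenshot-related element in the summary structure, then test whether it appears among the loaded elements. This sketch assumes the summary structure is available as a well-formed html skeleton in which every element carries an `id`, that related elements are identified by a class name, and that the loader reports the ids of loaded elements (all assumptions, not details from the specification).

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def last_related_id(skeleton: ET.Element, related_class: str) -> Optional[str]:
    """Return the id of the last screenshot-related element in document order."""
    last = None
    for elem in skeleton.iter():  # iter() walks the tree in document order
        if related_class in elem.get("class", "").split():
            last = elem.get("id")
    return last


def related_loaded(loaded_ids: Iterable[str], last_id: Optional[str]) -> bool:
    # Loading of the related elements is considered complete once the last one is loaded.
    return last_id is not None and last_id in set(loaded_ids)
```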
  • In another implementation, it can be judged whether the loaded elements contain a preset target element indicating that the screenshot-related elements have finished loading; if so, the screenshot-related page elements can be considered loaded.
  • For example, if APP promotion information is detected among the loaded content during page loading, the screenshot-related page elements (in this example, the stolen photographs in the body of the page) can be considered to have finished loading.
  • The above screenshot-irrelevant elements may be advertisement-related page elements, such as image advertisements at the bottom of the page, links that induce sharing, and related-article recommendations; those skilled in the art can specify the types of screenshot-irrelevant elements according to specific needs.
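  • The indirect judgment via a target (screenshot-irrelevant) element can be sketched as a scan over the stream of loaded element kinds, stopping at the first sentinel. The element-kind names are hypothetical, and the sketch assumes that such elements render only after all screenshot-related content, which is the premise of this implementation.

```python
from typing import Iterable, Optional

# Hypothetical kinds of screenshot-irrelevant elements used as sentinels.
SENTINEL_KINDS = {"app_promo", "bottom_ad", "related_articles"}


def stop_index(load_events: Iterable[str], sentinels=SENTINEL_KINDS) -> Optional[int]:
    """Return the position of the first sentinel among the loaded elements.

    Elements before that position are the loaded, screenshot-related part;
    None means no sentinel has appeared yet, so loading should continue.
    """
    for i, kind in enumerate(load_events):
        if kind in sentinels:
            return i
    return None
```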
  • The final screenshot can also be obtained by segmented screenshots. Specifically, when the size of the preprocessed page to be captured is determined to exceed a preset size threshold, the page is divided into several fragments and the positional relationship between the fragments is recorded; after a screenshot is taken of each fragment, the fragment screenshots are spliced, according to the recorded positional relationship, into a screenshot of the whole page.
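  • The segmentation and splicing step can be sketched as follows, assuming vertical-only segmentation, the page height as the size compared with the threshold, each fragment's top offset as the recorded positional relationship, and a list of pixel rows as a stand-in for real image data (all simplifications for illustration).

```python
from typing import List, Tuple

Rows = List[str]  # one entry per pixel row, a stand-in for real image data


def split_page(height: int, max_height: int) -> List[Tuple[int, int]]:
    """Split [0, height) into (top, bottom) fragments no taller than max_height."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    return [(top, min(top + max_height, height)) for top in range(0, height, max_height)]


def stitch(fragments: List[Tuple[int, Rows]]) -> Rows:
    """Join fragment screenshots according to their recorded top offsets."""
    result: Rows = []
    for _, rows in sorted(fragments, key=lambda f: f[0]):
        result.extend(rows)
    return result
```

Because the offsets are recorded, fragments can be captured in any order (for example in parallel) and still be reassembled correctly.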
  • The user's screenshot request can also carry other customized information, allowing the screenshot function to be customized further.
  • For example, the screenshot request initiated by the user may carry a specification identifier indicating a screenshot specification; the page to be captured is then captured according to the screenshot specification indicated by that identifier.
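  • The specification does not define what a screenshot specification contains. As a hedged illustration, the sketch below assumes a specification is a viewport width, an image format, and a scale factor, and that the request carries a `spec_id` field; the identifiers, their values, and the default are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenshotSpec:
    viewport_width: int  # CSS pixels
    image_format: str    # e.g. "png" or "jpeg"
    scale: float         # device scale factor


# Hypothetical specification identifiers and their specifications.
SPECS = {
    "desktop-png": ScreenshotSpec(1920, "png", 1.0),
    "mobile-jpeg": ScreenshotSpec(390, "jpeg", 2.0),
}
DEFAULT_SPEC_ID = "desktop-png"


def spec_for(request: dict) -> ScreenshotSpec:
    # Fall back to a default specification when the request carries no identifier.
    spec_id = request.get("spec_id", DEFAULT_SPEC_ID)
    if spec_id not in SPECS:
        raise ValueError(f"unknown screenshot specification: {spec_id}")
    return SPECS[spec_id]
```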
  • In addition, a machine learning model for locating interference information in an image can be used to determine where the interference information lies in the image obtained by the screenshot, and the interference information at that location can then be removed from the image by image processing. Specifically, the machine learning model may be obtained by training on screenshots of a number of pages in which interference elements have been marked, used as training samples.
  • For example, if the image obtained from a screenshot of a page still contains a certain type of advertising information that interferes with the screenshot, a machine learning model for locating this type of advertising information in an image can be called to locate it, and it can then be removed from the image by an image processing algorithm. In this way, interference information is removed at the image level, yielding a screenshot with less interference information.
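  • The image-level removal step can be sketched independently of the model. Here the trained model is abstracted as a `locate` callable returning bounding boxes (its training is outside the sketch), the image is a grayscale pixel grid, and "removal" is simply filling each box with a background value; a real system might use inpainting instead. All of these are assumptions.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, indexed image[y][x]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open


def remove_interference(image: Image, locate: Callable[[Image], List[Box]],
                        fill: int = 255) -> Image:
    """Blank out every region the locator reports as interference."""
    out = [row[:] for row in image]  # copy so the original screenshot is preserved
    height = len(out)
    width = len(out[0]) if out else 0
    for x0, y0, x1, y1 in locate(image):
        # Clamp each box to the image bounds before filling it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                out[y][x] = fill
    return out
```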
  • Figure 9 is a structural example of the device; the device includes a URL acquisition module 901, a page loading module 902, a page preprocessing module 903, and a screenshot execution module 904.
  • the URL obtaining module 901 obtains the uniform resource locator URL of the page to be captured in response to the screenshot request initiated by the user.
  • the page loading module 902 loads the page to be captured according to the URL.
  • the page preprocessing module 903 performs preprocessing on the page to be screenshot after the page to be screenshot is loaded; the preprocessing includes deleting the screenshot interference elements in the page to be screenshot.
  • the screenshot execution module 904 performs screenshots on the pre-processed page to be screenshot.
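  • The four modules of Figure 9 form a linear pipeline. A minimal sketch, assuming each module can be modelled as an injected callable; the class name and method names are hypothetical, and real modules would wrap a browser and an image encoder.

```python
from typing import Callable


class PageScreenshotDevice:
    """Wires modules 901-904 into one pipeline; each module is an injected callable."""

    def __init__(self, get_url: Callable[[dict], str],
                 load_page: Callable[[str], object],
                 preprocess: Callable[[object], object],
                 take_screenshot: Callable[[object], bytes]):
        self.get_url = get_url                  # URL acquisition module 901
        self.load_page = load_page              # page loading module 902
        self.preprocess = preprocess            # page preprocessing module 903
        self.take_screenshot = take_screenshot  # screenshot execution module 904

    def handle(self, request: dict) -> bytes:
        url = self.get_url(request)
        page = self.load_page(url)   # assumed to return once loading has completed
        page = self.preprocess(page)
        return self.take_screenshot(page)
```

Injecting the modules keeps each one replaceable, for example swapping an ordinary browser loader for a headless one.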
  • the URL obtaining module 901 may obtain the uniform resource locator URL of the page to be captured in response to a screenshot request initiated by the user.
  • The URL of the page to be captured may be carried directly as a field in the screenshot request, in which case the URL obtaining module 901 obtains it by parsing the request.
  • Alternatively, the screenshot request may carry a keyword, from which the corresponding URL is obtained indirectly.
  • For example, the screenshot request may carry a character string indicating the page to be captured; after the string is extracted from the request, it can be further parsed to obtain the URL of the page.
  • The parsing can be natural-language semantic analysis, or parsing of specific codes such as short URLs or sharing codes; those skilled in the art can choose according to the specific circumstances, and this specification does not specifically limit it.
  • For example, the screenshot request can carry the string "Alipay Weibo", indicating the official Weibo page of Alipay.
  • By using the semantic information of "Alipay" and "Weibo" to query a preset mapping table between semantic information and URLs, the URL of Alipay's official Weibo page can be obtained as the URL of the page to be captured.
  • The page preprocessing module 903 in the device preprocesses the page to be captured to obtain a better screenshot result. Specifically, the preprocessing may include deleting screenshot interference elements from the page, for example floating advertisements, recommendation information, and shortcut buttons that may obscure screenshot-related elements.
  • The preprocessing may also include other operations. For example, hidden elements in the page may be expanded so that hidden content (for example, folded multi-level quoted comments) is fully displayed and captured; the display style of a specified page element can be changed to enhance how that element is displayed; and screenshot markers can be added to elements on the page to highlight content that needs attention.
  • The user's screenshot request may carry demand information indicating the preprocessing method.
  • The page preprocessing module 903 may determine the preprocessing method according to the demand information carried in the user's screenshot request, and then preprocess the page to be captured according to the determined method.
  • The device may also include a dynamic loading module, which judges during loading whether the page elements related to the screenshot have finished loading and, in response to their having finished loading, stops loading the page to be captured.
  • Screenshot-related elements are elements related to the purpose of the screenshot and can be specified by the user, either in the screenshot request or through a preset in the system. For example, a user-initiated screenshot request can include the string "photo theft forensics", which specifies the photographs in the webpage as the screenshot-related elements; as another example, the user can preset that, for all screenshots of forum pages, the screenshot-related elements are the posts of forum users, excluding related promotional information.
  • The dynamic loading module can judge whether the screenshot-related page elements have finished loading by a variety of criteria, which this specification does not specifically limit. For example, it can judge directly from the perspective of the screenshot-related page elements themselves, or indirectly from the perspective of page elements unrelated to the screenshot.
  • The dynamic loading module may judge whether the screenshot-related page elements have finished loading by determining whether the loaded elements include the last element among the screenshot-related page elements; if they do, the screenshot-related page elements are determined to be loaded. The screenshot-related page elements can be determined from the user's screenshot request, as described above, or according to a preset element type for the page to be captured; and because, during loading, the page to be captured first receives the page's summary structure in the form of an html tree, the last element among the screenshot-related page elements can be determined from that summary structure.
  • The dynamic loading module may also judge whether the screenshot-related page elements have finished loading by determining whether the loaded page elements include a screenshot-irrelevant element indicating that loading of the screenshot-related page elements is complete; if so, the screenshot-related page elements are considered loaded. For example, suppose the screenshot-irrelevant element is a bottom-of-page advertisement.
  • The appearance of the bottom-of-page advertisement means that all screenshot-related page elements on the page have been loaded. Therefore, once the bottom-of-page advertisement is detected as loaded, it can be determined that all screenshot-related page elements on the page have finished loading.
  • the above-mentioned irrelevant element of the screenshot may be a page element related to an advertisement; for example, an image advertisement at the bottom of the page, a sharing inducement link, a recommendation of a related article, and so on.
  • The screenshot execution module 904 can take a screenshot of the preprocessed page; for the specific screenshot technique, refer to related technologies, which this specification does not specifically limit.
  • The screenshot execution module 904 can also obtain the final screenshot by segmented screenshots. Specifically, when the size of the preprocessed page exceeds a preset size threshold, the module can divide the preprocessed page into several fragments and record the positional relationship between them; after taking a screenshot of each fragment, it splices the fragment screenshots, according to the recorded positional relationship, into a screenshot of the preprocessed page.
  • The user's screenshot request can also carry other customized information, allowing the screenshot function to be customized further.
  • For example, the screenshot request initiated by the user may carry a specification identifier indicating a screenshot specification; the screenshot execution module 904 then takes a screenshot of the preprocessed page according to the screenshot specification indicated by that identifier.
  • the device may also include an image processing module, which can further process the image obtained by the screenshot to obtain a better screenshot effect.
  • The image processing module may use a machine learning model, trained on screenshots of several pages in which the locations of interference elements are marked, to further process the images obtained by the screenshots. Specifically, the image can be input into the trained machine learning model to determine the position of the interference element in the image, and image processing is then performed on the image based on that position to delete the interference element.
  • the embodiments of this specification also provide a computer device, which includes at least a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the aforementioned page screenshot method when the program is executed.
  • FIG. 10 shows a more specific hardware structure diagram of a computing device provided by an embodiment of this specification.
  • the device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050.
  • the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 realize the communication connection between each other in the device through the bus 1050.
  • The processor 1010 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs.
  • the memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • the memory 1020 may store an operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented by software or firmware, related program codes are stored in the memory 1020 and called and executed by the processor 1010.
  • the input/output interface 1030 is used to connect an input/output module to realize information input and output.
  • The input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and an output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 1040 is used to connect a communication module (not shown in the figure) to realize the communication interaction between the device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), or through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • the bus 1050 includes a path to transmit information between various components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).
  • Although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may also include only the components necessary to implement the solutions of the embodiments of the present specification, and not necessarily include all the components shown in the figures.
  • the embodiment of the present specification also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the aforementioned page screenshot method is implemented.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • a typical implementation device is a computer.
  • The computer can take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • Because the device embodiments are basically similar to the method embodiments, their description is relatively simple; for related parts, refer to the description of the method embodiments.
  • the device embodiments described above are merely illustrative.
  • the modules described as separate components may or may not be physically separated.
  • When the solutions of the embodiments of this specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A page screenshot method and device. The method comprises: parsing a screenshot request initiated by a user to obtain the uniform resource locator (URL) of a page to be captured; loading the page according to the URL; during loading, determining whether page elements related to the screenshot have finished loading; and in response to the screenshot-related page elements having finished loading, stopping loading and taking a screenshot of the loaded part of the page.

Description

Page screenshot method and device
Technical field
This application relates to the field of computer application technology, and in particular to a page screenshot method and device.
Background
Nowadays, some netizens use the Internet to carry out illegal activities such as plagiarizing and misappropriating other people's works, spreading rumors, and selling prohibited items, which has had a harmful social impact. To establish the liability of offenders, it is often necessary to collect electronic evidence from the relevant pages. Page screenshots are a feasible method of electronic forensics: the content displayed on pages related to illegal activities is saved as pictures, which can then be used as electronic evidence.
Summary
This application discloses a page screenshot method and device.
根据本申请实施例的第一方面,公开了一种页面截图方法,包括:响应于用户发起的截图请求,获取待截图页面的统一资源定位符URL;根据所述URL加载所述待截图页面,并在加载过程中判断截图相关的页面元素是否加载完毕;其中,所述截图相关的页面元素为用户指定的页面元素;响应于截图相关的页面元素加载完毕,停止所述加载所述待截图页面,并对所述待截图页面中已加载完成的部分进行截图。According to the first aspect of the embodiments of the present application, a method for screenshotting a page is disclosed, which includes: in response to a screenshot request initiated by a user, obtaining the uniform resource locator URL of the page to be screenshotted; loading the page to be screenshotted according to the URL, And in the loading process, it is judged whether the page elements related to the screenshot are loaded; wherein the page elements related to the screenshot are the page elements specified by the user; in response to the page elements related to the screenshot being loaded, the loading of the page to be screenshot is stopped. , And take a screenshot of the loaded part of the page to be screenshot.
根据本申请实施例的第二方面,公开了一种页面截图方法,包括:响应于用户发起的截图请求,获取待截图页面的统一资源定位符URL;根据所述URL加载所述待截图页面;当所述待截图页面加载完成后,对所述待截图页面进行预处理;所述预处理包括删除所述待截图页面中的截图干扰元素;对预处理完成的所述待截图页面进行截图。According to a second aspect of the embodiments of the present application, a method for screenshotting a page is disclosed, including: in response to a screenshot request initiated by a user, obtaining a uniform resource locator URL of the page to be screenshotted; and loading the page to be screenshotted according to the URL; When the loading of the page to be screenshot is completed, preprocessing the page to be screenshot; the preprocessing includes deleting the screenshot interference elements in the page to be screenshot; and taking a screenshot of the page to be screenshot after the preprocessing is completed.
根据本申请实施例的第三方面,公开了一种页面截图装置,包括:URL获取模块,响应于用户发起的截图请求,获取待截图页面的统一资源定位符URL;页面加载模块,根据所述URL加载所述待截图页面,并在加载过程中判断截图相关的页面元素是否加载完毕;其中,所述截图相关的页面元素为用户指定的页面元素;执行模块,响应于截图相关的页面元素加载完毕,停止所述加载所述待截图页面,并对所述待截图页面中已加载完成的部分进行截图。According to a third aspect of the embodiments of the present application, a page screenshot device is disclosed, including: a URL acquisition module, which, in response to a screenshot request initiated by a user, acquires the uniform resource locator URL of the page to be screenshot; The URL loads the page to be screenshot, and during the loading process it is judged whether the page elements related to the screenshot have been loaded; wherein the page elements related to the screenshot are page elements specified by the user; the execution module responds to the loading of the page elements related to the screenshot When finished, stop the loading of the page to be screenshot, and take a screenshot of the loaded part of the page to be screenshot.
According to a fourth aspect of the embodiments of this application, a page screenshot apparatus is disclosed, comprising: a URL acquisition module, which, in response to a screenshot request initiated by a user, obtains the URL of a page to be captured; a page loading module, which loads the page to be captured according to the URL; a page preprocessing module, which preprocesses the page after it has finished loading, the preprocessing including deleting screenshot-interfering elements from the page; and a screenshot execution module, which takes a screenshot of the preprocessed page.
In the above technical solutions, whether the screenshot-related page elements have finished loading is determined while the page to be captured is still loading, so fewer useless resources are loaded. This reduces wasted computing resources and also keeps useless information out of the screenshot, increasing the proportion of the screenshot occupied by the required information.
Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this specification and, together with the description, serve to explain its principles.
Fig. 1 is an example flowchart of a page screenshot method according to this specification;
Fig. 2 is a schematic diagram of determining whether screenshot-related page elements have finished loading according to this specification;
Fig. 3 is an example comparison between an original page and the image obtained by taking a screenshot of it according to this specification;
Fig. 4 is an example structural diagram of a page screenshot apparatus according to this specification;
Fig. 5 is an example structural diagram of an electronic device for taking page screenshots according to this specification;
Fig. 6 is an example flowchart of a page screenshot method according to this specification;
Fig. 7 is an example comparison of a page before and after page preprocessing according to this specification;
Fig. 8 is a schematic flowchart of determining whether screenshot-related page elements have finished loading according to this specification;
Fig. 9 is an example structural diagram of a page screenshot apparatus according to this specification;
Fig. 10 is an example structural diagram of an electronic device for taking page screenshots according to this specification.
Detailed Description of the Embodiments
To enable those skilled in the art to better understand the technical solutions in one or more embodiments of this specification, these solutions are described clearly and completely below with reference to the accompanying drawings of those embodiments. The described embodiments are evidently only some, not all, of the possible embodiments. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of this specification without creative effort fall within the protection scope of this application.
Where the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of systems and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various kinds of information, the information should not be limited by these terms. These terms are used only to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
Today, some Internet users use the Internet for illegal activities such as plagiarizing and misappropriating other people's works, spreading rumors, and selling prohibited goods, with harmful social consequences. To establish the liability of offenders, it is often necessary to collect electronic evidence from the relevant web pages. Taking page screenshots is one feasible method of electronic evidence collection: the content displayed on pages related to illegal activities is saved as images, which can then be used as electronic evidence.
In practice, a screenshot of the page indicated by the user is usually taken by invoking the browser's built-in web page screenshot function. However, when this method is applied to long web pages, the resulting screenshot often contains a great deal of information unrelated to the purpose of the screenshot, and when it is applied to infinite-scroll pages (for example, Weibo or Tieba feeds), the page can grow so large that the program crashes.
In view of this, this specification discloses a technical solution that dynamically determines, while the page to be captured is loading, whether page loading should be stopped and, once loading has stopped, captures only the part of the page that has already been loaded.
In implementation, while the page to be captured is loading, it is determined whether the screenshot-related page elements of that page have finished loading. Once they have, loading of the page can be stopped, after which a screenshot can be taken of only the part of the page that has already been loaded.
This solution has two benefits. First, because page loading is halted in time, program crashes when capturing extremely long pages can be avoided. Second, because loading is stopped only after the screenshot-related page elements have loaded, the final screenshot is guaranteed to contain all screenshot-related content while containing less irrelevant content, which improves the user experience and reduces wasted computing resources.
The application is described below through specific embodiments in combination with specific application scenarios.
Referring to Fig. 1, Fig. 1 shows a page screenshot method provided by an embodiment of this specification. The method performs steps S101 to S103.
S101: Parse the screenshot request initiated by the user to obtain the uniform resource locator (URL) of the page to be captured.
S102: Load the page to be captured according to the URL, and during loading determine whether the screenshot-related page elements, which are page elements specified by the user, have finished loading.
S103: In response to the screenshot-related page elements having finished loading, stop loading and take a screenshot of the part of the page to be captured that has already been loaded.
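The control flow of steps S101 to S103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this specification: page loading is simulated by a hypothetical generator of element IDs (fake_loader), the request shape ScreenshotRequest is hypothetical, and "taking a screenshot" is reduced to returning the list of elements loaded so far.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class ScreenshotRequest:
    # Hypothetical request shape: the page URL and the IDs of the
    # user-specified (screenshot-related) elements.
    url: str
    related_ids: List[str]


def fake_loader(url: str) -> Iterator[str]:
    # Stand-in for a browser: yields element IDs in the order they load.
    yield from ["header", "post-1", "post-2", "footer-ad", "recommendations"]


def capture_loaded_part(
    request: ScreenshotRequest,
    load: Callable[[str], Iterable[str]],
) -> List[str]:
    url = request.url                    # S101: URL taken from the request
    pending = set(request.related_ids)   # related elements not yet loaded
    loaded: List[str] = []
    for element_id in load(url):         # S102: incremental loading
        loaded.append(element_id)
        pending.discard(element_id)
        if not pending:                  # S103: all related elements loaded,
            break                        # so stop loading the rest of the page
    return loaded                        # stand-in for "screenshot the loaded part"


req = ScreenshotRequest("https://example.com/post", ["post-1", "post-2"])
print(capture_loaded_part(req, fake_loader))  # ['header', 'post-1', 'post-2']
```

If the stream ends before every related element appears, this sketch simply returns everything that loaded; a real deployment would likely also need a loading timeout.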
In this specification, the entity that performs the above method can be chosen according to the specific circumstances and requirements and is not limited here. For example, it may be a cloud server that receives screenshot requests from users over a network connection, or a user's personal computer that receives screenshot requests through the communication mechanism between software modules.
In one illustrated embodiment, the method is applied to a distributed server cluster, and the screenshot request may include sub-requests corresponding to multiple pages to be captured; the cluster may then distribute and complete the screenshot tasks for these pages according to a preset allocation algorithm.
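The specification does not name the preset allocation algorithm. As one possible choice, assumed here purely for illustration, round-robin assignment of sub-requests to cluster nodes can be sketched as follows; the node names are hypothetical.

```python
from typing import Dict, List


def assign_subrequests(urls: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Round-robin assignment: the i-th sub-request goes to server i mod N."""
    if not servers:
        raise ValueError("at least one server is required")
    plan: Dict[str, List[str]] = {server: [] for server in servers}
    for i, url in enumerate(urls):
        plan[servers[i % len(servers)]].append(url)
    return plan


print(assign_subrequests(["u1", "u2", "u3"], ["node-a", "node-b"]))
# {'node-a': ['u1', 'u3'], 'node-b': ['u2']}
```

Round-robin ignores differences in page size; a load-aware or hash-based scheme could equally serve as the preset algorithm.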
In this specification, the URL of the page to be captured can be obtained in response to the screenshot request initiated by the user. This process can be implemented in several ways, which this specification does not specifically limit. For example, the screenshot request may directly carry a URL field for the page, so that the URL is obtained by parsing the request; alternatively, the request may carry a keyword corresponding to the page, from which the URL can be obtained indirectly.
In one illustrated embodiment, the screenshot request may carry a character string indicating the page to be captured. After the string is extracted from the request, it can be further parsed to obtain the URL of the page. The parsing may be natural-language semantic analysis, or decoding of a specific code such as a short URL or a share code; those skilled in the art may choose according to the circumstances, and this specification does not specifically limit it.
For example, the screenshot request may carry the string "Alipay Weibo", indicating the official Weibo page of Alipay. After parsing the request and extracting the string, a semantic analysis algorithm can extract its semantic information; by looking up the semantic information "Alipay" and "Weibo" in a preset table mapping semantic information to URLs, the URL of Alipay's official Weibo page is obtained and used as the URL of the page to be captured.
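The two ways of obtaining the URL described above can be sketched as follows. This is an illustrative sketch only: the request keys "url" and "query", the table contents, and the example URL are hypothetical, and naive substring matching stands in for real semantic analysis.

```python
from typing import Dict, FrozenSet, Optional
from urllib.parse import urlparse

# Hypothetical preset table mapping sets of keywords to page URLs.
KEYWORD_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"Alipay", "Weibo"}): "https://weibo.example/alipay",
}
# Every keyword the table knows about.
KNOWN_KEYWORDS = set().union(*KEYWORD_TABLE)


def resolve_url(request: Dict[str, str]) -> Optional[str]:
    # Case 1: the request carries the URL directly.
    url = request.get("url", "")
    if urlparse(url).scheme in ("http", "https"):
        return url
    # Case 2: the request carries a descriptive string. Naive keyword
    # matching stands in here for real semantic analysis.
    text = request.get("query", "")
    keywords = frozenset(k for k in KNOWN_KEYWORDS if k in text)
    return KEYWORD_TABLE.get(keywords)


print(resolve_url({"query": "Alipay Weibo"}))  # https://weibo.example/alipay
```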
In this specification, once the URL of the page to be captured has been obtained, the page can be loaded according to that URL. Loading may be performed with an ordinary browser or with a headless browser; this application does not limit the choice, which those skilled in the art can make according to their specific needs.
In this specification, while the page to be captured is loading, it can be determined whether the screenshot-related elements have finished loading. The mechanism that triggers this check is not specifically limited: it may be triggered periodically at a preset time interval, triggered according to the number of elements loaded on the page, triggered according to the size of the page file loaded so far, or any combination of these. For example, the check may be triggered every 100 milliseconds, or every time 20 more page elements have been loaded, and so on.
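A trigger that combines the two example conditions above (every 100 milliseconds, or every 20 newly loaded elements) might look like the following sketch. The class name and the injectable clock are assumptions made for illustration and testability; a byte-count condition could be added in the same way.

```python
import time
from typing import Callable


class CheckTrigger:
    """Decides when to run the 'have the related elements loaded?' check.

    Fires when interval_s seconds have passed since the last check, or when
    at least every_n more elements have loaded since the last check.
    """

    def __init__(self, interval_s: float = 0.1, every_n: int = 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.every_n = every_n
        self.clock = clock
        self._last_time = clock()
        self._last_count = 0

    def should_check(self, loaded_count: int) -> bool:
        now = self.clock()
        due = (now - self._last_time >= self.interval_s
               or loaded_count - self._last_count >= self.every_n)
        if due:
            # Reset both baselines so the next check is measured from here.
            self._last_time = now
            self._last_count = loaded_count
        return due


# Usage with a controllable fake clock.
t = [0.0]
trigger = CheckTrigger(clock=lambda: t[0])
print(trigger.should_check(5))    # False: no time elapsed, only 5 elements
print(trigger.should_check(20))   # True: 20 elements since the last check
```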
In this specification, screenshot-related elements are specifically elements related to the purpose of the screenshot and may be specified by the user, either through the screenshot request or through presets in the system. For example, a user-initiated screenshot request may include a string such as "photograph misappropriation evidence collection", which specifies the photographs in the web page as the screenshot-related elements. As another example, the user may preset that, for screenshots of any forum page, the screenshot-related elements are the posts made by forum users, excluding related promotional information and the like.
In this specification, whether the screenshot-related page elements have finished loading can be determined according to various criteria, which this specification does not specifically limit. For example, the determination may be made directly, from the screenshot-related page elements themselves, or indirectly, from page elements that are not related to the screenshot.
Fig. 2 is a schematic diagram of one implementation for determining whether the screenshot-related page elements have finished loading. In this example, the determination is made by checking whether the loaded elements include the last element among the screenshot-related page elements; if they do, the screenshot-related page elements can be determined to have finished loading. As described above, the screenshot-related page elements may be determined from the user's screenshot request or from the user's presets. Moreover, during loading, the page to be captured first receives its page structure from the page server, for example an HTML tree structure, so the last element among the screenshot-related page elements can be determined from that page structure.
For example, if the user specifies that the user comments on a page are to be captured, the last element of the comment content can be determined from the tree structure of the HTML file; during loading, once this last element is detected as loaded, the screenshot-related page elements can be considered to have finished loading.
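The "last element" check can be sketched with Python's standard html.parser module. The sketch assumes, for illustration only, that the page structure is a well-formed HTML skeleton, that the user-specified content sits inside a container identified by an id attribute (here the hypothetical "comments"), and that loaded elements are reported by their id.

```python
from html.parser import HTMLParser
from typing import Iterable, Optional

# Void elements never have an end tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _LastIdFinder(HTMLParser):
    """Records the id of the last element, in document order, nested
    inside the element whose id is container_id."""

    def __init__(self, container_id: str) -> None:
        super().__init__()
        self.container_id = container_id
        self.depth = 0                      # > 0 while inside the container
        self.last_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        el_id = dict(attrs).get("id")
        if self.depth:
            if el_id:
                self.last_id = el_id
            if tag not in VOID_TAGS:
                self.depth += 1
        elif el_id == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def find_last_element_id(page_structure: str, container_id: str) -> Optional[str]:
    finder = _LastIdFinder(container_id)
    finder.feed(page_structure)
    finder.close()
    return finder.last_id


def related_elements_loaded(loaded_ids: Iterable[str], last_id: Optional[str]) -> bool:
    # The related content counts as loaded once its last element has arrived.
    return last_id is not None and last_id in set(loaded_ids)


skeleton = ('<body><div id="comments"><p id="c1">a</p><img id="c2" src="x.png">'
            '<p id="c3">b</p></div><div id="ad"></div></body>')
last = find_last_element_id(skeleton, "comments")
print(last)                                          # c3
print(related_elements_loaded(["c1", "c2"], last))   # False
```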
In one illustrated embodiment, the determination can be made by checking whether the loaded elements include a preset target element indicating that the screenshot-related elements have probably finished loading; if so, the screenshot-related page elements can be considered to have finished loading.
For example, when taking a screenshot as evidence of a page that misuses a photograph, the app promotion banner at the bottom of the page is clearly not one of the page elements needed as evidence. Therefore, once the app promotion banner appears among the loaded content during page loading, the screenshot-related page elements (in this example, the misappropriated photograph in the body of the page) can be considered to have finished loading.
In one illustrated embodiment, the screenshot-irrelevant element used as the target element may be an advertisement-related page element, for example an image advertisement at the bottom of the page, a link inducing users to share, or related-article recommendations; those skilled in the art can specify the particular kinds of screenshot-irrelevant elements according to their needs.
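The indirect check based on a screenshot-irrelevant target element can be sketched as below. It assumes, hypothetically, that loaded elements are reported as simple dictionaries with a "class" attribute and that the advertisement-related elements are recognizable by the made-up class names in MARKER_CLASSES.

```python
from typing import Dict, Iterable

# Hypothetical class names of screenshot-irrelevant elements that, on the
# pages in question, only appear after the body content.
MARKER_CLASSES = {"app-promo", "footer-ad", "related-articles"}


def marker_seen(loaded: Iterable[Dict[str, str]]) -> bool:
    """Return True once any loaded element carries a marker class."""
    for el in loaded:
        if set(el.get("class", "").split()) & MARKER_CLASSES:
            return True
    return False


loaded = [{"tag": "img", "class": "photo"},
          {"tag": "div", "class": "banner app-promo"}]
print(marker_seen(loaded[:1]), marker_seen(loaded))  # False True
```

This heuristic only holds for page layouts where the marker element reliably follows the body content, which is why the text above says the related elements have "probably" finished loading.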
In this specification, in response to the determination that the screenshot-related page elements have finished loading, loading of the page to be captured can be stopped and a screenshot taken of the part of the page that has already been loaded. For the specific screenshot technique, reference may be made to the related art; this specification imposes no specific limitation.
In one illustrated embodiment, the final screenshot can be obtained by segmented capture. Specifically, when the loaded part of the page to be captured is determined to be larger than a preset size threshold, the loaded part can be divided into several slices and the positional relationship between the slices recorded; after a screenshot is taken of each slice, the slice screenshots are stitched together according to the recorded positional relationship to form the screenshot of the loaded part of the page.
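Segmented capture and stitching can be sketched as follows, under simplifying assumptions: only vertical slicing is considered, the size threshold is also used as the slice height, an image is modeled as a list of pixel rows, and capture is a hypothetical callback that screenshots one region of the page.

```python
from typing import Callable, List, Tuple

Rows = List[str]  # one string per pixel row, standing in for image data


def plan_slices(total_height: int, max_height: int) -> List[Tuple[int, int]]:
    """Split [0, total_height) into (top, height) slices of at most max_height."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    return [(top, min(max_height, total_height - top))
            for top in range(0, total_height, max_height)]


def segmented_capture(total_height: int, max_height: int,
                      capture: Callable[[int, int], Rows]) -> Rows:
    if total_height <= max_height:
        return capture(0, total_height)          # small enough for one shot
    # Record each slice's top offset together with its screenshot.
    shots = [(top, capture(top, h)) for top, h in plan_slices(total_height, max_height)]
    stitched: Rows = []
    for _, rows in sorted(shots, key=lambda s: s[0]):   # stitch by position
        stitched.extend(rows)
    return stitched


page = [f"row{i}" for i in range(25)]
result = segmented_capture(25, 10, lambda top, h: page[top:top + h])
print(result == page)  # True
```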
In this specification, before the loaded part of the page to be captured is screenshotted, it can also be preprocessed to obtain a better screenshot. The preprocessing may include deleting screenshot-interfering elements from the page, for example floating advertisements, recommendations, and shortcut buttons that may cover screenshot-related elements.
In one illustrated embodiment, the preprocessing may also include other operations: for example, expanding collapsed elements on the page so that the collapsed content is fully displayed and captured; changing the display style of specified elements to make them more prominent; or adding screenshot marks to elements on the page to highlight content that needs attention.
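The three preprocessing operations described above (deleting interfering elements, expanding collapsed content, and adding a screenshot mark) can be sketched on a well-formed XHTML fragment using the standard xml.etree.ElementTree module. In a real browser this would more likely be done by running script against the live DOM; the class name "float-ad", the use of the HTML "hidden" attribute for collapsed content, and the red outline as the mark are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Collection


def _has_class(el: ET.Element, name: str) -> bool:
    return name in el.get("class", "").split()


def preprocess(xhtml: str, highlight_ids: Collection[str] = ()) -> str:
    root = ET.fromstring(xhtml)
    # 1. Delete interfering elements (hypothetical class "float-ad").
    for parent in list(root.iter()):
        for child in list(parent):
            if _has_class(child, "float-ad"):
                parent.remove(child)
    # 2. Expand collapsed content by dropping the HTML "hidden" attribute.
    for el in root.iter():
        el.attrib.pop("hidden", None)
    # 3. Add a screenshot mark (a red outline) to elements needing emphasis.
    for el in root.iter():
        if el.get("id") in highlight_ids:
            style = el.get("style", "")
            el.set("style", (style + ";" if style else "") + "outline:3px solid red")
    return ET.tostring(root, encoding="unicode")


page = ('<div><div class="float-ad">AD</div><p id="b1">one</p>'
        '<p id="b2" hidden="hidden">two</p></div>')
print(preprocess(page, highlight_ids={"b1"}))
```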
Referring to Fig. 3, Fig. 3 is an example comparison between the original page of a page to be captured and the image obtained from the screenshot. In the example of Fig. 3, body text-and-image blocks 1 and 2 are the screenshot-related page elements. After preprocessing, the floating advertisement that covered the body content has been removed, body block 2, which was originally collapsed and hidden, has been expanded, and body block 1, which needs emphasis, has been given a screenshot mark. Meanwhile, the related recommendations and the "Back to Home" and "Add to Favorites" buttons at the end of the page do not appear in the final screenshot because loading was stopped before they loaded.
In this specification, the user's screenshot request may also carry other customized information to enable further customization of the screenshot function.
In one illustrated embodiment, the user's screenshot request may carry requirement information indicating the preprocessing mode. Correspondingly, when performing the preprocessing, the preprocessing mode can be determined from the requirement information carried in the request, and the page to be captured is then preprocessed according to the determined mode.
For example, the user's screenshot request may carry requirement information indicating that floating advertisements should be removed and hidden body text expanded. When the preprocessing is performed, it can be determined from this information that the preprocessing to be executed comprises removing interfering content (the floating advertisement) and expanding hidden content (the body text), and the preprocessing is carried out accordingly.
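Selecting preprocessing steps from the requirement information can be sketched as a lookup from requirement codes to step functions. The requirement codes, the step functions, and the simplified page model (a list of element dictionaries) are hypothetical and chosen only to mirror the example above.

```python
from typing import Callable, Dict, List

Page = List[Dict[str, object]]  # each element: {"id": ..., "kind": ..., "hidden": ...}


def remove_floating_ads(page: Page) -> Page:
    return [el for el in page if el["kind"] != "floating_ad"]


def expand_hidden(page: Page) -> Page:
    return [{**el, "hidden": False} for el in page]


# Hypothetical requirement codes carried in the screenshot request.
STEPS: Dict[str, Callable[[Page], Page]] = {
    "remove_floating_ads": remove_floating_ads,
    "expand_hidden": expand_hidden,
}


def preprocess_by_request(page: Page, requirements: List[str]) -> Page:
    for code in requirements:
        step = STEPS.get(code)
        if step is None:
            raise ValueError(f"unknown preprocessing requirement: {code}")
        page = step(page)
    return page


page = [{"id": "ad", "kind": "floating_ad", "hidden": False},
        {"id": "text-2", "kind": "body", "hidden": True}]
print(preprocess_by_request(page, ["remove_floating_ads", "expand_hidden"]))
# [{'id': 'text-2', 'kind': 'body', 'hidden': False}]
```

Rejecting unknown codes, rather than silently skipping them, is a design choice of this sketch; it makes a malformed request visible instead of producing an unexpectedly unprocessed screenshot.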
In one illustrated embodiment, the user-initiated screenshot request may carry a specification identifier indicating the screenshot specification, and the loaded part of the page to be captured can then be captured according to the screenshot specification indicated by that identifier.
For example, specification identifiers indicating the image format, resolution, color specification, and other parameters the user requires for the web page screenshot can be carried in the user-initiated screenshot request; in the screenshot stage, the loaded part of the page to be captured is then captured according to the indicated specification.
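The specification does not define the format of the specification identifier. Assuming, purely for illustration, a string such as "format=jpeg;res=1920x1080;color=gray", parsing it into a screenshot specification with defaults could look like this; the field names, defaults, and allowed values are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScreenshotSpec:
    fmt: str = "png"
    width: int = 1280
    height: int = 720
    color: str = "rgb"


ALLOWED_FORMATS = {"png", "jpeg", "webp"}
ALLOWED_COLORS = {"rgb", "gray"}


def parse_spec(identifier: str) -> ScreenshotSpec:
    """Parse a hypothetical 'format=...;res=WxH;color=...' identifier."""
    spec = ScreenshotSpec()
    for part in filter(None, identifier.split(";")):
        key, _, value = part.partition("=")
        key, value = key.strip(), value.strip().lower()
        if key == "format":
            if value not in ALLOWED_FORMATS:
                raise ValueError(f"unsupported format: {value}")
            spec = replace(spec, fmt=value)
        elif key == "res":
            w, _, h = value.partition("x")
            spec = replace(spec, width=int(w), height=int(h))
        elif key == "color":
            if value not in ALLOWED_COLORS:
                raise ValueError(f"unsupported color: {value}")
            spec = replace(spec, color=value)
        else:
            raise ValueError(f"unknown field: {key}")
    return spec


print(parse_spec("format=jpeg;res=1920x1080;color=gray"))
```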
In this specification, the image obtained from the screenshot can be further processed to obtain a better result.
In one illustrated embodiment, a machine learning model for locating interference information in images can be used to determine where interference information appears in the screenshot image, and that interference information can then be removed from the image by image processing. Specifically, the machine learning model may be trained using, as training samples, a number of page screenshots in which the positions of interference elements have been labeled.
For example, an image obtained from a screenshot of a page may still contain some type of advertising information that interferes with the screenshot. A machine learning model for locating that type of advertising information in images can be invoked to locate it in the image, and an image processing algorithm can then remove it. This method removes interference information from the screenshot at the image level, yielding a screenshot with less interference.
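The post-processing step can be sketched as below, with the trained model replaced by a hypothetical detector callback that returns bounding boxes. The removal shown here simply fills each box with a background value on a grayscale image modeled as nested lists; this is a deliberately simple stand-in, and a practical system might instead use inpainting.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, indexed image[y][x]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def remove_interference(image: Image, detect: Callable[[Image], List[Box]],
                        background: int = 255) -> Image:
    out = [row[:] for row in image]          # leave the input image untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for x, y, w, h in detect(image):
        # Clip each detected box to the image bounds, then fill it.
        for yy in range(max(0, y), min(height, y + h)):
            for xx in range(max(0, x), min(width, x + w)):
                out[yy][xx] = background
    return out


def fake_detector(image: Image) -> List[Box]:
    # Stand-in for the trained model: "finds" an ad in the top-left 2x2 block.
    return [(0, 0, 2, 2)]


img = [[0] * 4 for _ in range(3)]
print(remove_interference(img, fake_detector))
```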
This specification also provides a corresponding page screenshot apparatus. Referring to Fig. 4, Fig. 4 is an example structural diagram of the apparatus, which includes a URL acquisition module 401, a page loading module 402, and an execution module 403.
The URL acquisition module 401 obtains, in response to a screenshot request initiated by the user, the uniform resource locator (URL) of the page to be captured.
The page loading module 402 loads the page to be captured according to the URL and, during loading, determines whether the screenshot-related page elements, which are page elements specified by the user, have finished loading.
The execution module 403, in response to the screenshot-related page elements having finished loading, stops loading the page to be captured and takes a screenshot of the part of the page that has already been loaded.
In this specification, the URL acquisition module 401 can obtain the URL of the page to be captured based on the user-initiated screenshot request. This can be implemented in several ways, which this specification does not specifically limit. For example, the screenshot request may directly carry a URL field for the page, from which the URL can be extracted directly; alternatively, the request may carry a character string corresponding to the page's URL, and the URL is then obtained indirectly, for example by a lookup.
In one illustrated embodiment, the screenshot request may carry a character string indicating the URL of the page to be captured, and the URL acquisition module 401 can parse the string to obtain the URL it indicates. For example, if the string is "abc Tieba home page", it can be determined that the string indicates the URL of the abc Tieba home page, which is therefore the URL of the page to be captured. The URL may be determined from the string by keyword lookup or by semantic analysis, chosen flexibly according to actual development needs; this specification does not specifically limit it.
In this specification, screenshot-related elements are specifically elements related to the purpose of the screenshot and may be specified by the user, either through the screenshot request or through presets in the system. For example, if a user-initiated screenshot request includes a string such as "photograph misappropriation evidence collection", the screenshot-related elements are the photographs in the web page; as another example, the user may preset that, for screenshots of forum pages, the screenshot-related elements are the posts made by forum users, excluding related promotional information and the like.
In this specification, the page loading module 402 may determine whether the screenshot-related page elements have finished loading according to various criteria, which this specification does not specifically limit. For example, the determination may be made directly, from the screenshot-related page elements themselves, or indirectly, from page elements not related to the screenshot.
In one illustrated embodiment, the page loading module 402 can determine whether the screenshot-related page elements have finished loading by checking whether the loaded elements include the last element among the screenshot-related page elements; if they do, the screenshot-related page elements can be determined to have finished loading. As described above, the screenshot-related page elements may be determined from the user's screenshot request or from preset types of screenshot page elements. Moreover, because the page to be captured first receives its page structure, such as an HTML tree structure, during loading, the last element among the screenshot-related page elements can be determined from that structure.
In an illustrated embodiment, the page loading module 402 determines whether the screenshot-related page elements have finished loading in a second way. It checks whether the loaded page elements include a screenshot-irrelevant element whose presence indicates that the screenshot-related page elements have loaded. If so, the screenshot-related page elements are considered loaded. For example, suppose the screenshot-irrelevant element is an advertisement at the bottom of the page. Its appearance generally means that all screenshot-related page elements on the page have loaded. Once that bottom-of-page advertisement has loaded, it can therefore be concluded that all screenshot-related page elements on the page have loaded.
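A minimal sketch of this indirect check, assuming loaded elements are reported as ids. The indicator ids `footer-ad` and `app-promo` are hypothetical examples of elements that render only after the main content.

```python
from typing import AbstractSet, Iterable

# Hypothetical ids of screenshot-irrelevant elements that, on the target site,
# are rendered only after the main content (e.g. a bottom-of-page advertisement).
INDICATOR_IDS: AbstractSet[str] = frozenset({"footer-ad", "app-promo"})


def loaded_by_indicator(
    loaded: Iterable[str], indicators: AbstractSet[str] = INDICATOR_IDS
) -> bool:
    """Infer that the screenshot-related elements are loaded once any indicator has loaded."""
    return any(element_id in indicators for element_id in loaded)
```

This check is only as reliable as the page's render order. It assumes the indicator element is not lazy-loaded or injected ahead of the body content.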
In an illustrated embodiment, the screenshot-irrelevant element may be an advertising-related page element. Examples include an image advertisement at the bottom of the page, a link inviting the reader to share, or a list of recommended related articles.
In this specification, in response to determining that the screenshot-related page elements have finished loading, the execution module 403 may stop loading the page to be captured. It then takes a screenshot of the part of the page that has already loaded. The screenshot itself may be taken using existing techniques, which this specification does not limit.
In an illustrated embodiment, the execution module 403 may obtain the final screenshot by capturing in segments. This applies when the size of the loaded part of the page to be captured exceeds a preset size threshold. The loaded part is divided into several tiles, and the positional relationships among the tiles are recorded. Each tile is then captured separately, and the tile screenshots are stitched together according to the recorded positions to form a screenshot of the loaded part of the page.
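The segment-and-stitch procedure can be sketched as below. The sketch assumes full-width horizontal strips and uses a plain row-major grid of integers in place of real image data. The strip layout and the helper names are illustrative choices, not taken from the source.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in page pixels
Pixels = List[List[int]]          # row-major grid standing in for image data


def needs_tiling(page_height: int, threshold: int) -> bool:
    """Segment only when the loaded part is taller than the preset threshold."""
    return page_height > threshold


def plan_tiles(page_width: int, page_height: int, tile_height: int) -> List[Tile]:
    """Split the page into full-width strips and record each strip's offset."""
    tiles: List[Tile] = []
    y = 0
    while y < page_height:
        height = min(tile_height, page_height - y)
        tiles.append((0, y, page_width, height))
        y += height
    return tiles


def stitch(tiles: List[Tile], captures: List[Pixels], page_width: int, page_height: int) -> Pixels:
    """Paste each captured tile back at its recorded position."""
    canvas: Pixels = [[0] * page_width for _ in range(page_height)]
    for (x, y, width, height), pixels in zip(tiles, captures):
        for row in range(height):
            canvas[y + row][x:x + width] = pixels[row]
    return canvas
```

Recording each tile's `(x, y)` offset at planning time is what lets `stitch` reassemble the tiles without any image matching.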
In this specification, the device may further include a preprocessing module. It preprocesses the loaded part of the page to be captured to obtain a better screenshot. Specifically, the preprocessing may include deleting screenshot-interfering elements from the page to be captured. Examples are floating advertisements, recommendations, and shortcut buttons that may obscure screenshot-related elements.
In an illustrated embodiment, the preprocessing may also include other operations. Hidden elements on the page may be expanded so that hidden content (such as collapsed, multi-level quoted comments) is fully displayed and captured. The display style of specified page elements may be changed to make them stand out. Screenshot markers may also be added to elements on the page to highlight content that deserves attention.
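These preprocessing operations can be sketched with the standard-library `xml.etree.ElementTree`. The sketch assumes a well-formed XHTML fragment. A production system would more likely run script in the browser's live DOM, since real pages are rarely well-formed XML. The `floating-ad` class name and the `data-screenshot-mark` marker attribute are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import AbstractSet


def _without_display_none(style: str) -> str:
    """Drop any `display: none` declaration from an inline style string."""
    declarations = [d.strip() for d in style.split(";") if d.strip()]
    kept = [d for d in declarations if d.replace(" ", "").lower() != "display:none"]
    return "; ".join(kept)


def preprocess(root: ET.Element, interfering_classes: AbstractSet[str],
               highlight_ids: AbstractSet[str]) -> None:
    """Delete interfering elements, expand hidden ones, and mark highlighted ones, in place."""
    # ElementTree nodes have no parent pointer, so build a child -> parent map first.
    parent = {child: node for node in root.iter() for child in node}
    for element in list(root.iter()):
        classes = set(element.get("class", "").split())
        if classes & interfering_classes and element in parent:
            parent[element].remove(element)
    for element in root.iter():
        style = element.get("style")
        if style is not None:
            new_style = _without_display_none(style)  # expand hidden content
            if new_style:
                element.set("style", new_style)
            else:
                del element.attrib["style"]
        if element.get("id") in highlight_ids:
            element.set("data-screenshot-mark", "1")  # hypothetical screenshot marker
```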
In this specification, the user's screenshot request may also carry other customization information to enable further customizable features of the screenshot function.
In an illustrated embodiment, the user's screenshot request may carry demand information indicating the preprocessing to be performed. Correspondingly, the preprocessing module may determine the preprocessing operations from the demand information carried in the user's screenshot request. It then preprocesses the page to be captured accordingly.
In an illustrated embodiment, the screenshot request initiated by the user may carry a specification identifier indicating the screenshot specification. The execution module 403 then captures the loaded part of the page to be captured according to the screenshot specification indicated by that identifier.
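The source does not define the identifier's format. The following sketch assumes a hypothetical `<format>-<width>x<height>-<color mode>` string, such as `png-1280x720-rgb`, and parses it into a specification object that the capture step could consume. The supported formats and color modes listed are also assumptions.

```python
from dataclasses import dataclass

# Hypothetical identifier format: "<format>-<width>x<height>-<color mode>", e.g. "png-1280x720-rgb".
SUPPORTED_FORMATS = frozenset({"png", "jpeg", "webp"})
SUPPORTED_COLOR_MODES = frozenset({"rgb", "grayscale"})


@dataclass(frozen=True)
class ScreenshotSpec:
    image_format: str
    width: int
    height: int
    color_mode: str


def parse_spec(identifier: str) -> ScreenshotSpec:
    """Parse a specification identifier; raise ValueError if it is malformed or unsupported."""
    try:
        image_format, size, color_mode = identifier.lower().split("-")
        width, height = (int(part) for part in size.split("x"))
    except ValueError:
        raise ValueError(f"malformed specification identifier: {identifier!r}") from None
    if image_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {image_format}")
    if color_mode not in SUPPORTED_COLOR_MODES:
        raise ValueError(f"unsupported color mode: {color_mode}")
    if width <= 0 or height <= 0:
        raise ValueError("resolution must be positive")
    return ScreenshotSpec(image_format, width, height, color_mode)
```

Rejecting unknown identifiers up front keeps a malformed request from silently producing a screenshot in a default format.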
In this specification, the device may further include an image processing module that further processes the captured image to obtain a better screenshot.
In an illustrated embodiment, the image processing module may further process the captured image using a machine learning model. The model is trained on page screenshots in which the positions of interfering elements are labeled. The captured image is input into the trained model to determine where the interfering elements are located in the image. Image processing is then applied at those positions to remove the interfering elements.
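The post-processing step can be sketched as follows. The trained model is represented by a hypothetical `detect` callable that returns bounding boxes. Each box is simply filled with a flat background value. The source only says "image processing", so a real system might use inpainting instead, and the box format is an assumption.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def remove_interference(image: Image, detect: Callable[[Image], List[Box]],
                        fill: int = 255) -> Image:
    """Blank out every region the detector reports; the input image is left unchanged."""
    result = [row[:] for row in image]
    height = len(result)
    width = len(result[0]) if result else 0
    for x, y, box_w, box_h in detect(image):
        # Clip each predicted box to the image bounds before filling it.
        for r in range(max(0, y), min(height, y + box_h)):
            for c in range(max(0, x), min(width, x + box_w)):
                result[r][c] = fill
    return result
```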
For details of how the functions and roles of each module in the above device are implemented, refer to the corresponding steps of the above method. They are not repeated here.
An embodiment of this specification further provides a computer device comprising at least a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the aforementioned page screenshot method.
FIG. 5 is a more detailed schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 510, a memory 520, an input/output interface 530, a communication interface 540, and a bus 550. The processor 510, memory 520, input/output interface 530, and communication interface 540 are communicatively connected to one another within the device through the bus 550.
The processor 510 may be implemented as a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits. It executes the relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 520 may be implemented as ROM (read-only memory), RAM (random access memory), a static storage device, a dynamic storage device, or the like. The memory 520 may store an operating system and other application programs. When the technical solutions of the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 520 and invoked and executed by the processor 510.
The input/output interface 530 connects input/output modules for information input and output. An input/output module may be built into the device as a component (not shown in the figure) or connected externally to provide the corresponding function. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors. Output devices may include a display, speaker, vibrator, and indicator lights.
The communication interface 540 connects a communication module (not shown in the figure) so that the device can communicate with other devices. The communication module may communicate over a wired connection (such as USB or a network cable) or wirelessly (such as over a mobile network, Wi-Fi, or Bluetooth).
The bus 550 provides a path for transferring information between the components of the device (for example, the processor 510, memory 520, input/output interface 530, and communication interface 540).
The above device is shown with only the processor 510, memory 520, input/output interface 530, communication interface 540, and bus 550. In a specific implementation, however, it may also include other components needed for normal operation. Conversely, those skilled in the art will understand that the device may include only the components necessary to implement the solutions of the embodiments of this specification, rather than all the components shown in the figure.
An embodiment of this specification further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned page screenshot method.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:

- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), and other optical storage;
- magnetic cassettes, magnetic tape or magnetic disk storage, and other magnetic storage devices;
- any other non-transmission medium that can store information accessible to a computing device.

As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
Today, some Internet users engage in illegal activities online, such as plagiarizing or misappropriating other people's works, spreading rumors, and selling prohibited items, with harmful social consequences. To establish the liability of offenders, it is often necessary to collect electronic evidence from the relevant pages. Taking a page screenshot is one practical method of electronic evidence collection. The content displayed on a page related to the illegal activity is saved as an image, which can then be used as electronic evidence.
In practice, a screenshot of the page indicated by the user is usually taken by invoking the browser's web page screenshot function. On complex web pages, however, the resulting screenshot often contains many elements that interfere with the screenshot-related information. As a result, key information needed as evidence may be accidentally obscured.
In view of this, this specification discloses a technical solution in which the page to be captured is preprocessed before the screenshot is taken.
In implementation, the page to be captured is first allowed to finish loading. Preprocessing operations, including deleting interfering elements, are then performed on it, and the preprocessed page is captured.
Because preprocessing removes the interfering elements from the original page, this solution prevents interfering elements from appearing in the screenshot. It also prevents screenshot-related elements from being accidentally obscured, ensuring that the screenshot-related elements of the page are captured completely.
The present application is described below through specific embodiments in combination with specific application scenarios.
Refer to FIG. 6, which shows a page screenshot method provided by an embodiment of this specification. The method performs steps S601 to S604.
S601: In response to a screenshot request initiated by a user, obtain the uniform resource locator (URL) of the page to be captured.
S602: Load the page to be captured according to the URL.
S603: After the page to be captured has finished loading, preprocess it; the preprocessing includes deleting screenshot-interfering elements from the page.
S604: Take a screenshot of the preprocessed page.
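Steps S601 to S604 can be wired together as in the following sketch. Each step is an injected callable, so the same skeleton works with any URL resolver, browser driver, preprocessing routine, or capture backend. The class and field names are illustrative and not taken from the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass
class ScreenshotPipeline:
    """Wires the four steps together; each step is an injected callable."""
    resolve_url: Callable[[Mapping[str, Any]], str]  # S601: screenshot request -> URL
    load_page: Callable[[str], Any]                  # S602: URL -> loaded page
    preprocess: Callable[[Any], Any]                 # S603: delete interfering elements, etc.
    capture: Callable[[Any], bytes]                  # S604: preprocessed page -> image bytes

    def run(self, request: Mapping[str, Any]) -> bytes:
        url = self.resolve_url(request)
        page = self.load_page(url)
        page = self.preprocess(page)
        return self.capture(page)
```

Keeping the steps injectable also makes each one testable with stubs, without launching a browser.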
In this specification, the entity that executes the above method may be chosen according to the specific circumstances and requirements, and this specification does not limit it. For example, it may be a cloud server that receives the user's screenshot request over a network connection. It may also be the user's personal computer, which receives the screenshot request through a communication mechanism between software modules.
In an illustrated embodiment, the method is applied to a distributed server cluster, and the screenshot request may include sub-requests corresponding to multiple pages to be captured. The distributed server cluster distributes the screenshot tasks for these pages among its servers according to a preset allocation algorithm and completes them.
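The source does not specify the allocation algorithm. One simple possibility is stable hash-based assignment, sketched below with hypothetical server names. Round-robin or load-aware scheduling would be equally valid choices.

```python
import hashlib
from typing import Dict, List, Sequence


def allocate(page_urls: Sequence[str], servers: Sequence[str]) -> Dict[str, List[str]]:
    """Assign each sub-request's page to one server by hashing its URL."""
    if not servers:
        raise ValueError("at least one server is required")
    plan: Dict[str, List[str]] = {server: [] for server in servers}
    for url in page_urls:
        # A stable hash (unlike built-in hash(), which is salted per process)
        # sends the same URL to the same server on every node.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(servers)
        plan[servers[index]].append(url)
    return plan
```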
In this specification, the URL of the page to be captured may be obtained in response to the screenshot request initiated by the user. This can be implemented in several ways, which this specification does not limit. For example, the screenshot request may directly carry a URL field for the page to be captured, in which case the URL is obtained by parsing the request. Alternatively, the request may carry a keyword corresponding to the page to be captured, from which the URL can be obtained indirectly.
In an illustrated embodiment, the screenshot request may carry a string indicating the page to be captured. After the string is extracted from the request, it can be parsed to obtain the URL of the page to be captured. The parsing may be natural-language semantic analysis, or decoding of a specific encoding such as a short URL or a share code. Those skilled in the art can choose according to the circumstances, and this specification does not limit the choice.
For example, the screenshot request may carry the string "Alipay Weibo", indicating the official Weibo page of Alipay. After the request is parsed and the string extracted, a semantic analysis algorithm can extract the string's semantic information. The semantic information "Alipay" and "Weibo" is then used to query a preset mapping table from semantic information to URLs. This yields the URL of Alipay's official Weibo page, which is used as the URL of the page to be captured.
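A minimal sketch of the lookup, assuming a plain keyword vocabulary stands in for real semantic analysis. The vocabulary, the mapping table, and the placeholder URL on `example.com` are all hypothetical. A request that already contains a literal URL is used as-is.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical vocabulary standing in for real semantic analysis:
# surface keyword -> normalized semantic token.
VOCABULARY: Dict[str, str] = {"alipay": "alipay", "weibo": "weibo"}

# Hypothetical preset mapping table: semantic tokens -> page URL (placeholder domain).
URL_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"alipay", "weibo"}): "https://example.com/weibo/alipay",
}


def extract_semantics(text: str) -> FrozenSet[str]:
    """Collect the semantic tokens whose keywords occur in the request text."""
    lowered = text.lower()
    return frozenset(token for keyword, token in VOCABULARY.items() if keyword in lowered)


def resolve_url(request_text: str) -> Optional[str]:
    """Use a literal URL as-is; otherwise look the text's semantics up in the table."""
    if request_text.startswith(("http://", "https://")):
        return request_text
    return URL_TABLE.get(extract_semantics(request_text))
```

Keying the table on a frozen set of tokens makes the lookup insensitive to word order, so "Weibo Alipay" resolves to the same page.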
In this specification, once the URL of the page to be captured is obtained, the page can be loaded from that URL. The page may be loaded with an ordinary browser or with a headless browser. The present application does not limit this, and the choice can be made according to specific requirements.
In this specification, the page to be captured may be preprocessed to obtain a better screenshot. Specifically, the preprocessing may include deleting screenshot-interfering elements from the page. Examples are floating advertisements, recommendations, and shortcut buttons that may obscure screenshot-related elements.
In an illustrated embodiment, the preprocessing may also include other operations. Hidden elements on the page may be expanded so that hidden or collapsed content is fully displayed and captured. The display style of specified elements may be changed to make them stand out. Screenshot markers may also be added to elements on the page to highlight content that deserves attention.
In an illustrated embodiment, the user's screenshot request may carry demand information indicating the preprocessing to be performed. Correspondingly, during preprocessing, the preprocessing operations are determined from the demand information carried in the user's screenshot request. The page to be captured is then preprocessed accordingly.
For example, the user's screenshot request may carry demand information indicating that floating advertisements should be removed and collapsed body text expanded. During preprocessing, this demand information determines the operations to perform: removing interfering content (the floating advertisements) and expanding collapsed content (the body text). These operations are then carried out.
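Mapping demand information to preprocessing operations can be sketched as a dispatch table. The sketch assumes the page is modeled as a flat list of element records. The demand names `remove_floating_ads` and `expand_collapsed` are hypothetical labels for the two operations in this example.

```python
from typing import Callable, Dict, List, Sequence

Element = Dict[str, object]   # hypothetical flat element record
Page = List[Element]


def remove_floating_ads(page: Page) -> Page:
    return [e for e in page if e.get("kind") != "floating_ad"]


def expand_collapsed(page: Page) -> Page:
    return [{**e, "collapsed": False} if e.get("collapsed") else e for e in page]


# Hypothetical demand names carried in the screenshot request.
OPERATIONS: Dict[str, Callable[[Page], Page]] = {
    "remove_floating_ads": remove_floating_ads,
    "expand_collapsed": expand_collapsed,
}


def preprocess_by_demand(page: Page, demands: Sequence[str]) -> Page:
    """Apply the preprocessing operations named in the request, in the order given."""
    for name in demands:
        if name not in OPERATIONS:
            raise ValueError(f"unsupported preprocessing demand: {name}")
        page = OPERATIONS[name](page)
    return page
```

New preprocessing options can then be added by registering another function in `OPERATIONS`, without changing how requests are parsed.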
Refer to FIG. 7, which compares a page to be captured before and after preprocessing. In the example of FIG. 7, text-and-image blocks 1 and 2 of the body are the screenshot-related page elements. After preprocessing:

- the floating advertisement that covered the body is removed;
- the previously collapsed block 2 is expanded;
- block 1, which needs emphasis, is given a screenshot marker;
- the related recommendations at the end of the page and the "Back to Home" and "Add to Favorites" buttons are treated as interfering elements and removed, so they do not appear in the final screenshot.
In this specification, while the page to be captured is loading, it may also be determined whether the screenshot-related elements have finished loading. In response to determining that they have, loading of the page may be stopped. This specification does not limit what triggers this determination. It may be triggered periodically at a preset time interval, by the number of elements loaded on the page, or by the size of the page file, or by any combination of these. Stopping the page load promptly reduces the loading of screenshot-irrelevant elements, while still ensuring that no screenshot-related page elements are missing. This increases the share of the final screenshot occupied by screenshot-related elements.
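A minimal sketch of a combined trigger, assuming the loader reports the running element count and bytes received. The check fires when any one of the three thresholds has been crossed since the last check. The default thresholds are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoadCheckTrigger:
    """Decide when to re-run the 'relevant elements loaded?' check.

    Fires when any of the time, element-count, or file-size thresholds
    has been crossed since the previous check.
    """
    interval_s: float = 0.5            # hypothetical default thresholds
    element_step: int = 50
    byte_step: int = 256 * 1024
    last_time: float = field(default_factory=time.monotonic)
    last_count: int = 0
    last_bytes: int = 0

    def should_check(self, element_count: int, bytes_loaded: int,
                     now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        fire = (
            now - self.last_time >= self.interval_s
            or element_count - self.last_count >= self.element_step
            or bytes_loaded - self.last_bytes >= self.byte_step
        )
        if fire:
            # Reset all three baselines so each trigger measures from this check.
            self.last_time, self.last_count, self.last_bytes = now, element_count, bytes_loaded
        return fire
```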
In this specification, screenshot-related elements are the elements relevant to the purpose of the screenshot. They may be specified by the user, either in the screenshot request or as a preset in the system. For example, a user-initiated screenshot request may include a string such as "photo misappropriation forensics", which specifies the photographs on the web page as the screenshot-related elements. As another example, the user may preset that, for screenshots of any forum page, the screenshot-related elements are the posts written by forum users, excluding promotional content and the like.
In this specification, whether the screenshot-related page elements have finished loading may be determined according to several different criteria, which this specification does not limit. For example, the determination may be made directly, from the screenshot-related page elements themselves, or indirectly, from page elements that are not related to the screenshot.
FIG. 8 is a schematic diagram of one embodiment for determining whether the screenshot-related page elements have finished loading. In this example, the determination is made by checking whether the loaded elements include the last of the screenshot-related page elements. If they do, the screenshot-related page elements are determined to be loaded. As described above, the screenshot-related page elements may be determined from the user's screenshot request or from the user's presets. During loading, the page to be captured first receives its page structure from the page server, for example in the form of an HTML tree. The last of the screenshot-related page elements can therefore be determined from this summary structure of the page.
For example, suppose the user specifies that the user comments on a page are to be captured. The last element of the user comments can then be determined from the tree structure of the HTML file. During loading, once that last element is detected to have loaded, the screenshot-related page elements can be considered loaded.
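Determining that last element from the HTML skeleton can be sketched with the standard-library `html.parser`. The sketch assumes the comments sit inside a container with a known id (the `comments` id here is hypothetical). Only elements carrying an `id` are recorded, since the loaded-element check needs a stable identifier to compare against. Void tags such as `<img>` are excluded from depth tracking because they have no closing tag.

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = frozenset({"area", "base", "br", "col", "embed", "hr", "img", "input",
                       "link", "meta", "source", "track", "wbr"})


class LastElementFinder(HTMLParser):
    """Find the id of the last element, in document order, inside a given container."""

    def __init__(self, container_id: str) -> None:
        super().__init__()
        self.container_id = container_id
        self.depth = 0                      # > 0 while inside the container
        self.last_id: Optional[str] = None

    def _open(self, attrs, self_closing: bool) -> None:
        attributes = dict(attrs)
        if self.depth:
            if attributes.get("id"):
                self.last_id = attributes["id"]
            if not self_closing:
                self.depth += 1
        elif attributes.get("id") == self.container_id and not self_closing:
            self.depth = 1

    def handle_starttag(self, tag, attrs):
        self._open(attrs, tag in VOID_TAGS)

    def handle_startendtag(self, tag, attrs):
        self._open(attrs, True)

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def last_element_in(skeleton_html: str, container_id: str) -> Optional[str]:
    finder = LastElementFinder(container_id)
    finder.feed(skeleton_html)
    finder.close()
    return finder.last_id
```

The returned id can then be watched for during loading. Once it appears among the loaded elements, the comment section can be treated as complete.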
In an illustrated embodiment, the determination may be made by checking whether the loaded elements include a preset target element whose presence indicates that the screenshot-related elements have likely finished loading. If so, the screenshot-related page elements can be considered loaded.
For example, consider capturing, as evidence, a page that uses misappropriated photographs. The app promotion at the bottom of the page is clearly not a page element relevant to the evidence. If the app promotion is detected among the loaded content during loading, the screenshot-related page elements can be considered loaded. In this example, those elements are the misappropriated photographs in the page body.
In an illustrated embodiment, this screenshot-irrelevant target element may be an advertising-related page element. Examples include an image advertisement at the bottom of the page, a link inviting the reader to share, or recommended related articles. Those skilled in the art can specify the particular kinds of screenshot-irrelevant elements according to their requirements.
In an illustrated embodiment, the final screenshot may be obtained by capturing in segments. This applies when the size of the preprocessed page to be captured exceeds a preset size threshold. The page is divided into several tiles, and the positional relationships among the tiles are recorded. Each tile is then captured separately, and the tile screenshots are stitched together according to the recorded positions to form the screenshot of the page.
In this specification, the user's screenshot request may also carry other customization information to enable further customizable features of the screenshot function.
In an illustrated embodiment, the screenshot request initiated by the user may carry a specification identifier indicating the screenshot specification. The page to be captured is then captured according to the screenshot specification indicated by that identifier.
For example, the user-initiated screenshot request can carry specification identifiers for the image format, resolution, color specification, and other screenshot specifications the user requires. In the capture phase, the page is captured according to the specification indicated by the identifier.
In this specification, the captured image may be further processed to obtain a better screenshot.
In an illustrated embodiment, a machine learning model that locates interfering information in an image may be used to determine where the interfering information lies in the captured image. Image processing is then applied to remove the interfering information at those positions. The machine learning model may be trained using, as training samples, page screenshots in which the positions of interfering elements are labeled.
For example, an image obtained by capturing a page may still contain a certain type of advertisement that interferes with the screenshot. A machine learning model trained to locate that type of advertisement can be invoked to find it in the image. An image processing algorithm then removes it from the image. This approach removes interfering information from the screenshot at the image level, yielding a screenshot with less interfering information.
This specification also provides a corresponding page screenshot device. Refer to FIG. 9, which shows an example structure of the device. The device includes a URL obtaining module 901, a page loading module 902, a page preprocessing module 903, and a screenshot execution module 904.
URL获取模块901,响应于用户发起的截图请求,获取待截图页面的统一资源定位符URL。The URL obtaining module 901 obtains the uniform resource locator URL of the page to be captured in response to the screenshot request initiated by the user.
页面加载模块902,根据所述URL,加载所述待截图页面。The page loading module 902 loads the page to be captured according to the URL.
页面预处理模块903,当所述待截图页面加载完成后,对所述待截图页面进行预处理;所述预处理包括删除所述待截图页面中的截图干扰元素。The page preprocessing module 903 performs preprocessing on the page to be screenshot after the page to be screenshot is loaded; the preprocessing includes deleting the screenshot interference elements in the page to be screenshot.
截图执行模块904,对预处理完成的所述待截图页面进行截图。The screenshot execution module 904 performs screenshots on the pre-processed page to be screenshot.
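The data flow between modules 901 to 904 can be sketched as a small pipeline. This is an illustrative wiring only, not the applicant's code: the page is modeled as a list of element dictionaries, `fetch` and `render` are injected stand-ins for a real page loader (such as a headless browser) and a real screenshot backend, and the element kinds in `INTERFERENCE_KINDS` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Element = Dict[str, str]  # hypothetical: {"kind": ..., "text": ...}

# Hypothetical element kinds treated as screenshot interference.
INTERFERENCE_KINDS = {"floating_ad", "recommendation", "shortcut_button"}


@dataclass
class Page:
    url: str
    elements: List[Element] = field(default_factory=list)


class PageScreenshotDevice:
    """Illustrative wiring of modules 901-904."""

    def __init__(self, fetch: Callable[[str], List[Element]],
                 render: Callable[[Page], bytes]) -> None:
        self.fetch = fetch      # stand-in for a real page loader
        self.render = render    # stand-in for a real screenshot backend

    def acquire_url(self, request: Dict[str, str]) -> str:      # module 901
        return request["url"]

    def load_page(self, url: str) -> Page:                       # module 902
        return Page(url=url, elements=self.fetch(url))

    def preprocess(self, page: Page) -> Page:                    # module 903
        # Drop every element whose kind marks it as screenshot interference.
        kept = [e for e in page.elements if e.get("kind") not in INTERFERENCE_KINDS]
        return Page(url=page.url, elements=kept)

    def capture(self, page: Page) -> bytes:                      # module 904
        return self.render(page)

    def handle(self, request: Dict[str, str]) -> bytes:
        page = self.load_page(self.acquire_url(request))
        return self.capture(self.preprocess(page))
```

Injecting `fetch` and `render` keeps each module testable in isolation; for instance, a `render` that joins element texts lets one check that a floating advertisement never reaches the capture step.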
In this specification, the URL acquisition module 901 may obtain the URL of the page to be captured in response to a screenshot request initiated by the user. This process can be implemented in many ways, which this specification does not specifically limit. For example, the screenshot request may directly carry a URL field for the page to be captured, and the URL is obtained by parsing the request; alternatively, the request may carry a keyword corresponding to the page to be captured, from which the URL is obtained indirectly.
In one illustrated embodiment, the screenshot request may carry a character string indicating the page to be captured. After the string is extracted from the request, it can be further parsed to obtain the URL of the page. The parsing may be natural-language semantic analysis, or decoding of specific encodings such as short URLs and sharing codes; those skilled in the art can choose according to the specific circumstances, and this specification does not specifically limit it.
For example, the screenshot request may carry the string "Alipay Weibo", which indicates the official Weibo page of Alipay. After the request is parsed and the string is extracted, a semantic analysis algorithm can extract the semantic information of the string. Querying a preset mapping table between semantic information and URLs with the semantic information "Alipay" and "Weibo" then yields the URL of Alipay's official Weibo page, which is used as the URL of the page to be captured.
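The three ways of obtaining a URL described above (a URL carried directly, a sharing code or short link, and a keyword lookup in a preset mapping table) can be sketched as a single resolver. This is a minimal sketch under stated assumptions: both tables and their placeholder URLs are hypothetical, and whitespace-token matching is a naive stand-in for real semantic analysis (text without spaces, such as Chinese, would need word segmentation first).

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical preset mapping from semantic keywords to URLs (placeholder URL).
KEYWORD_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"alipay", "weibo"}): "https://weibo.example/alipay",
}
# Hypothetical sharing codes / short-link keys resolved from a local table.
SHARE_CODES: Dict[str, str] = {
    "s/Ab12": "https://forum.example/thread/42",
}


def resolve_url(request_text: str) -> Optional[str]:
    """Map the string carried in a screenshot request to a page URL, or None."""
    text = request_text.strip()
    # 1. The request already carries a URL.
    if text.startswith(("http://", "https://")):
        return text
    # 2. A sharing code or short-link key.
    if text in SHARE_CODES:
        return SHARE_CODES[text]
    # 3. Keyword lookup: every keyword of a table entry must appear in the text.
    tokens = set(text.lower().split())
    for keywords, url in KEYWORD_TABLE.items():
        if keywords <= tokens:
            return url
    return None
```

With these tables, `resolve_url("Alipay Weibo")` returns the placeholder Weibo URL, while an unrecognized string returns `None` so the caller can reject the request.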
In this specification, the page preprocessing module 903 of the device preprocesses the page to be captured to obtain a better screenshot result. Specifically, the preprocessing may include deleting screenshot interference elements from the page, for example floating advertisements, recommended content, and shortcut buttons that may obscure screenshot-related elements.
In one illustrated embodiment, the preprocessing may also include other operations. For example, hidden elements in the page may be expanded so that hidden content (such as collapsed, multi-level quoted comments) is fully displayed and captured; the display style of specified page elements may be changed to enhance how they are displayed; or screenshot markers may be added to elements in the page to highlight content that needs attention.
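The four preprocessing operations above can be sketched on a document tree with Python's standard `xml.etree.ElementTree`. This is a sketch under assumptions: the page is taken to be well-formed XHTML (a real browser-based implementation would run equivalent logic against the live DOM instead), interference elements are recognized by hypothetical class names, and the `data-screenshot-mark` attribute is an illustrative marker that a renderer could later highlight.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional, Set

# Hypothetical class names that identify screenshot interference elements.
INTERFERENCE_CLASSES = {"floating-ad", "recommend", "shortcut"}


def _classes(el: ET.Element) -> Set[str]:
    return set(el.get("class", "").split())


def preprocess(root: ET.Element,
               restyle: Optional[Dict[str, str]] = None,
               mark_ids: Iterable[str] = ()) -> ET.Element:
    restyle = restyle or {}
    marks = set(mark_ids)
    # 1. Delete interference elements. Snapshot the iterator first, because
    #    removing children while iterating the live tree can skip elements.
    for parent in list(root.iter()):
        for child in list(parent):
            if _classes(child) & INTERFERENCE_CLASSES:
                parent.remove(child)
    for el in root.iter():
        # 2. Expand hidden elements (e.g. collapsed quoted comments).
        el.attrib.pop("hidden", None)
        # 3. Change the display style of specified elements.
        if el.get("id") in restyle:
            el.set("style", restyle[el.get("id")])
        # 4. Add a screenshot marker to elements that need attention.
        if el.get("id") in marks:
            el.set("data-screenshot-mark", "1")
    return root
```

Deletion runs before the other steps so that no effort is spent restyling or marking elements that are about to be removed.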
In one illustrated embodiment, the user's screenshot request may carry demand information indicating the preprocessing mode. Correspondingly, the preprocessing module may determine the preprocessing mode from the demand information carried in the request, and then preprocess the page to be captured according to the determined mode.
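Selecting preprocessing operations from the demand information can be sketched as a dispatch table. The request field name `preprocess`, the mode names, and the page encoding (each element written as `"<kind>:<content>"`) are all hypothetical; the default of interference removal when no mode is requested is also an assumption, chosen because it is the baseline preprocessing described above.

```python
from typing import Callable, Dict, List, Mapping, Sequence

Page = List[str]  # hypothetical: each element encoded as "<kind>:<content>"


def _remove_interference(page: Page) -> Page:
    return [e for e in page if not e.startswith("ad:")]


def _expand_hidden(page: Page) -> Page:
    return ["text:" + e[len("hidden:"):] if e.startswith("hidden:") else e for e in page]


def _mark_text(page: Page) -> Page:
    return [e + " [marked]" if e.startswith("text:") else e for e in page]


PREPROCESSORS: Dict[str, Callable[[Page], Page]] = {
    "remove_interference": _remove_interference,
    "expand_hidden": _expand_hidden,
    "mark": _mark_text,
}


def preprocess_by_request(page: Page, request: Mapping[str, Sequence[str]]) -> Page:
    # Fall back to interference removal when the request names no mode.
    modes = request.get("preprocess") or ["remove_interference"]
    for mode in modes:
        if mode not in PREPROCESSORS:
            raise ValueError(f"unknown preprocessing mode: {mode!r}")
        page = PREPROCESSORS[mode](page)  # apply modes in the order requested
    return page
```

Rejecting unknown mode names, rather than silently skipping them, makes a malformed request visible to the caller.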
In this specification, the device may further include a dynamic loading module, which determines during loading whether the screenshot-related page elements have finished loading and, in response to their having finished loading, stops loading the page to be captured.
In this specification, screenshot-related elements are the elements relevant to the purpose of the screenshot, and they may be specified by the user, either through the screenshot request or through presets in the system. For example, a user's screenshot request may include a string such as "photo-theft evidence collection", which specifies that the screenshot-related elements are the photographs in the web page; as another example, the user may preset that, for screenshots of any forum page, the screenshot-related elements are the posts written by forum users, excluding promotional content and the like.
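How the screenshot-related element types might be resolved from either source can be sketched as two lookup tables. Everything here is hypothetical: the purpose string, the page category, and the element type names. The rule that a purpose named in the request takes precedence over a preset is an assumption the text does not state.

```python
from typing import Dict, FrozenSet, Mapping, Optional

# Hypothetical: purposes named in a request -> element types relevant to them.
PURPOSE_TYPES: Dict[str, FrozenSet[str]] = {
    "photo-theft evidence": frozenset({"img"}),
}
# Hypothetical per-site-category presets configured in advance by the user.
PRESET_TYPES: Dict[str, FrozenSet[str]] = {
    "forum": frozenset({"post"}),
}


def relevant_types(request: Mapping[str, str],
                   page_category: Optional[str]) -> FrozenSet[str]:
    """Request purpose first, then the preset for the page category, else empty."""
    purpose = request.get("purpose")
    if purpose in PURPOSE_TYPES:
        return PURPOSE_TYPES[purpose]
    return PRESET_TYPES.get(page_category, frozenset())
```

An empty result can be read as "no restriction", in which case the loader simply lets the whole page load.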
In this specification, the dynamic loading module may use a variety of criteria to determine whether the screenshot-related page elements have finished loading, and this specification does not specifically limit them. For example, the determination may be made directly from the screenshot-related page elements themselves, or indirectly from page elements unrelated to the screenshot.
In one illustrated embodiment, the dynamic loading module may determine whether the screenshot-related page elements have finished loading by checking whether the loaded elements include the last element among the screenshot-related page elements; if they do, the screenshot-related page elements are determined to have finished loading. The screenshot-related page elements may be determined from the user's screenshot request, as described above, or from preset screenshot element types. Moreover, because the summary structure of the page (for example, its HTML tree structure) is received first during loading, the last element among the screenshot-related page elements can be determined from this summary structure.
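The last-element criterion can be sketched in two steps: find the last relevant element in the summary structure, then consume load events until that element arrives. The summary structure is modeled as hypothetical `(element_id, element_type)` pairs in document order, and load completion as a stream of element ids. Like the check described above, the sketch treats the arrival of the last relevant element as sufficient, which assumes earlier relevant elements have finished by then. In a real browser, the stop could be issued with, for example, `window.stop()`.

```python
from typing import FrozenSet, Iterable, List, Optional, Sequence, Tuple

# Hypothetical summary structure: (element_id, element_type) pairs in document
# order, read from the page's HTML tree before the content itself arrives.
Skeleton = Sequence[Tuple[str, str]]


def last_relevant_id(skeleton: Skeleton, relevant: FrozenSet[str]) -> Optional[str]:
    last = None
    for element_id, element_type in skeleton:
        if element_type in relevant:
            last = element_id
    return last


def load_until_last_relevant(skeleton: Skeleton, arrivals: Iterable[str],
                             relevant: FrozenSet[str]) -> Tuple[List[str], bool]:
    """Consume ids of elements as they finish loading.

    Returns (loaded ids, stopped_early); stops once the last relevant id is loaded.
    """
    target = last_relevant_id(skeleton, relevant)
    loaded: List[str] = []
    for element_id in arrivals:
        loaded.append(element_id)
        if target is not None and element_id == target:
            return loaded, True   # a real loader would stop the page load here
    return loaded, False
```

For a forum thread whose skeleton ends with two posts, an advertisement, and a footer, loading stops right after the second post, so neither the advertisement nor the footer needs to be fetched.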
In one illustrated embodiment, the dynamic loading module may determine whether the screenshot-related page elements have finished loading by checking whether the loaded page elements include a screenshot-irrelevant element that indicates the screenshot-related elements have finished loading; if so, the screenshot-related page elements are considered loaded. For example, suppose the screenshot-irrelevant element is a bottom-of-page advertisement. Its appearance generally means that all screenshot-related elements on the page have already loaded, so once that advertisement has loaded, it can be concluded that all screenshot-related page elements on the page have loaded.
In one illustrated embodiment, the screenshot-irrelevant element may be an advertisement-related page element, for example an image advertisement at the bottom of the page, a link prompting the user to share, or related-article recommendations.
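The sentinel-based criterion can be sketched as a loop that stops at the first screenshot-irrelevant element known to follow the relevant content. The sentinel type names are hypothetical, and load events are modeled as `(element_id, element_type)` pairs in completion order.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical screenshot-irrelevant element types whose arrival signals that
# the relevant content above them has finished loading.
SENTINEL_TYPES: FrozenSet[str] = frozenset({"bottom-ad", "share-link", "related-articles"})


def load_until_sentinel(arrivals: Iterable[Tuple[str, str]],
                        sentinels: FrozenSet[str] = SENTINEL_TYPES) -> Tuple[List[str], bool]:
    """Return (loaded ids, sentinel_seen); stop at the first sentinel element."""
    loaded: List[str] = []
    for element_id, element_type in arrivals:
        loaded.append(element_id)
        if element_type in sentinels:
            return loaded, True   # relevant content assumed complete; stop loading
    return loaded, False
```

The sentinel itself ends up among the loaded elements; since it is an advertisement-like element, the preprocessing step that deletes screenshot interference elements can remove it before capture.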
In this specification, the screenshot execution module 904 takes a screenshot of the preprocessed page; for the specific capture method, reference may be made to related technologies, and this specification does not specifically limit it.
In one illustrated embodiment, the screenshot execution module 904 may obtain the final screenshot by segmented capture. Specifically, when the size of the preprocessed page exceeds a preset size threshold, the page can be divided into several segments and the positional relationships among them recorded. After each segment is captured separately, the segment screenshots are stitched together, according to the recorded positional relationships, into a screenshot of the preprocessed page.
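Segmented capture can be sketched as: plan tiles with recorded offsets, capture each tile, and paste each result back at its offset. This sketch assumes vertical splitting only (the text does not fix a direction), models the page as a list of pixel rows, and takes `capture(top, height)` as a hypothetical callback that returns the rows of one segment.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Sequence[int]          # one pixel row (placeholder values)
Tile = Tuple[int, int]       # (top offset, height): the recorded position


def plan_tiles(page_height: int, max_tile_height: int) -> List[Tile]:
    if max_tile_height <= 0:
        raise ValueError("max_tile_height must be positive")
    if page_height <= max_tile_height:
        return [(0, page_height)]        # under the threshold: a single capture
    return [(top, min(max_tile_height, page_height - top))
            for top in range(0, page_height, max_tile_height)]


def segmented_capture(page_height: int, max_tile_height: int,
                      capture: Callable[[int, int], List[Row]]) -> List[Row]:
    canvas: List[Optional[Row]] = [None] * page_height
    for top, height in plan_tiles(page_height, max_tile_height):
        rows = capture(top, height)      # capture one segment
        if len(rows) != height:
            raise ValueError(f"tile at {top} returned {len(rows)} rows, expected {height}")
        canvas[top:top + height] = rows  # stitch at the recorded offset
    if any(row is None for row in canvas):
        raise RuntimeError("tiles did not cover the whole page")
    return [row for row in canvas if row is not None]
```

Pasting by recorded offset, rather than appending tiles in arrival order, keeps the stitched result correct even if segments are captured out of order or in parallel.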
In this specification, the user's screenshot request may also carry other customization information to enable further custom features of the screenshot function.
In one illustrated embodiment, the screenshot request initiated by the user may carry a specification identifier indicating the screenshot specification; accordingly, the screenshot execution module 904 may capture the preprocessed page according to the screenshot specification indicated by that identifier.
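Resolving a specification identifier can be sketched as a lookup into a table of predefined specifications. The identifiers, the fields of `ScreenshotSpec`, and the default identifier are hypothetical; the text does not say what a screenshot specification contains.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class ScreenshotSpec:
    viewport_width: int      # CSS pixels
    device_scale: float      # device pixel ratio
    image_format: str        # e.g. "png" or "jpeg"


# Hypothetical identifiers; a real deployment would define its own.
SPECS: Dict[str, ScreenshotSpec] = {
    "mobile": ScreenshotSpec(375, 2.0, "png"),
    "desktop": ScreenshotSpec(1920, 1.0, "png"),
    "archive": ScreenshotSpec(1280, 1.0, "jpeg"),
}


def resolve_spec(request: Mapping[str, str], default: str = "desktop") -> ScreenshotSpec:
    """Return the spec named by the request's "spec" field, or the default spec."""
    spec_id = request.get("spec", default)
    if spec_id not in SPECS:
        raise ValueError(f"unknown screenshot spec: {spec_id!r}")
    return SPECS[spec_id]
```

The resolved values would then be passed to whatever capture backend module 904 uses, for example to set the viewport width before rendering.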
In this specification, the device may further include an image processing module, which further processes the image obtained by the screenshot to obtain a better screenshot result.
In one illustrated embodiment, the image processing module may further process the screenshot image using a machine learning model trained on a number of page screenshots labeled with the positions of interference elements. Specifically, the image can be input into the trained model to determine where interference elements are located in it, and image processing can then be applied at those positions to delete the interference elements.
For details of how the functions and roles of each module of the device are implemented, refer to the implementation of the corresponding steps of the method above; they are not repeated here.
The embodiments of this specification also provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the foregoing page screenshot method when executing the program.
FIG. 10 is a more detailed schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another within the device through the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), one or more integrated circuits, or the like, and executes the relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 1020 may be implemented as ROM (read-only memory), RAM (random access memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and invoked and executed by the processor 1010.
The input/output interface 1030 is used to connect input/output modules for information input and output. An input/output module may be built into the device as a component (not shown in the figure) or connected externally to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module may communicate by wired means (such as USB or a network cable) or by wireless means (such as a mobile network, Wi-Fi, or Bluetooth).
The bus 1050 includes a path that transfers information between the components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).
It should be noted that although only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050 are shown for the device above, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the device may include only the components necessary to implement the solutions of the embodiments of this specification, rather than all the components shown in the figure.
The embodiments of this specification also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the foregoing page screenshot method.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the description of the foregoing implementations, those skilled in the art can clearly understand that the embodiments of this specification may be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this specification.
The systems, devices, modules, or units described in the foregoing embodiments may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described progressively: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the corresponding description of the method embodiment. The device embodiment described above is merely illustrative. The modules described as separate components may or may not be physically separate, and when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and these improvements and refinements shall also fall within the protection scope of the embodiments of this specification.

Claims (46)

  1. A page screenshot method, comprising:
    in response to a screenshot request initiated by a user, obtaining a uniform resource locator (URL) of a page to be captured;
    loading the page to be captured according to the URL, and determining during loading whether screenshot-related page elements have finished loading, wherein the screenshot-related page elements are page elements specified by the user; and
    in response to the screenshot-related page elements having finished loading, stopping the loading of the page to be captured, and taking a screenshot of the loaded portion of the page to be captured.
  2. The method according to claim 1, wherein the screenshot request initiated by the user carries the URL or a character string indicating the URL of the page to be captured;
    obtaining the URL of the page to be captured comprises:
    obtaining the URL carried in the screenshot request initiated by the user; or
    obtaining the character string carried in the screenshot request initiated by the user, and parsing the character string to obtain the URL indicated by the character string.
  3. The method according to claim 1, wherein determining whether the screenshot-related page elements have finished loading comprises:
    determining whether the loaded page elements include a screenshot-irrelevant element indicating that the screenshot-related page elements have finished loading; and
    if so, determining that the screenshot-related page elements have finished loading.
  4. The method according to claim 3, wherein the screenshot-irrelevant element comprises an advertisement-related page element.
  5. The method according to claim 1, wherein determining whether the screenshot-related page elements have finished loading comprises:
    determining, based on the page structure of the page to be captured, the last element among the screenshot-related page elements;
    determining whether the loaded elements include the last element; and
    if the loaded elements include the last element, determining that the screenshot-related page elements have finished loading.
  6. The method according to claim 1, further comprising, before taking the screenshot of the loaded portion of the page to be captured:
    preprocessing the loaded portion of the page to be captured, wherein the preprocessing comprises deleting screenshot interference elements from the page to be captured.
  7. The method according to claim 6, wherein the preprocessing further comprises any one or a combination of the following preprocessing operations:
    expanding hidden elements in the page;
    changing the display style of specified page elements;
    adding screenshot markers to elements in the page.
  8. The method according to claim 6, wherein the user's screenshot request carries demand information indicating a preprocessing mode;
    preprocessing the page to be captured comprises:
    determining a preprocessing mode according to the demand information carried in the user's screenshot request, and preprocessing the page to be captured according to the determined preprocessing mode.
  9. The method according to claim 1, wherein taking the screenshot of the loaded portion of the page to be captured comprises:
    determining whether the size of the loaded portion of the page to be captured is greater than a preset threshold;
    if so, dividing the loaded portion of the page to be captured into several segments, and recording the positional relationships among the segments;
    taking a screenshot of each of the segments; and
    stitching the screenshots of the segments, according to the recorded positional relationships, into a screenshot of the loaded portion of the page to be captured.
  10. The method according to claim 1, further comprising:
    inputting the screenshot image of the page to be captured into a recognition model to identify the positions of interference elements in the screenshot image, wherein the recognition model is a machine learning model trained using, as training samples, a number of page screenshots labeled with the positions of interference elements; and
    deleting, as interference elements, the page elements located at the identified positions in the screenshot image.
  11. The method according to claim 1, wherein the screenshot request initiated by the user carries a specification identifier indicating a screenshot specification;
    taking the screenshot of the loaded portion of the page to be captured comprises:
    taking a screenshot of the loaded portion of the page to be captured according to the screenshot specification indicated by the specification identifier carried in the screenshot request initiated by the user.
  12. A page screenshot method, comprising:
    in response to a screenshot request initiated by a user, obtaining a uniform resource locator (URL) of a page to be captured;
    loading the page to be captured according to the URL;
    after the page to be captured has finished loading, preprocessing the page to be captured, wherein the preprocessing comprises deleting screenshot interference elements from the page to be captured; and
    taking a screenshot of the preprocessed page to be captured.
  13. The method according to claim 12, wherein the screenshot request initiated by the user carries the URL or a character string indicating the URL of the page to be captured;
    obtaining the URL of the page to be captured comprises:
    obtaining the URL carried in the screenshot request initiated by the user; or
    obtaining the character string carried in the screenshot request initiated by the user, and parsing the character string to obtain the URL indicated by the character string.
  14. The method according to claim 12, wherein the preprocessing further comprises any one or a combination of the following preprocessing operations:
    expanding hidden elements in the page;
    changing the display style of specified page elements;
    adding screenshot markers to elements in the page.
  15. The method according to claim 12, wherein the user's screenshot request carries demand information indicating a preprocessing mode;
    preprocessing the page to be captured comprises:
    determining a preprocessing mode according to the demand information carried in the user's screenshot request, and preprocessing the page to be captured according to the determined preprocessing mode.
  16. 根据权利要求12所述的方法,所述方法还包括:The method according to claim 12, the method further comprising:
    在所述待截图页面被加载的过程中,判断截图相关的页面元素是否加载完毕;其中,所述截图相关的页面元素为用户指定的页面元素;In the process that the page to be screenshot is loaded, it is determined whether the page element related to the screenshot is loaded; wherein the page element related to the screenshot is a page element designated by the user;
    响应于截图相关的页面元素加载完毕,停止所述加载。In response to the completion of loading of the page elements related to the screenshot, the loading is stopped.
  17. 根据权利要求16所述的方法,所述判断截图相关的页面元素是否加载完毕,包括:The method according to claim 16, said determining whether the page elements related to the screenshot have been loaded, comprising:
    判断已加载页面元素中是否包含指示所述截图相关的页面元素加载完毕的截图无关元素;Judging whether the loaded page element contains a screenshot irrelevant element indicating that the screenshot-related page element has been loaded;
    如果是,则确定截图相关的页面元素已经加载完毕。If it is, it is determined that the page elements related to the screenshot have been loaded.
  18. 根据权利要求17所述的方法,所述截图无关元素包括与广告相关的页面元素。The method according to claim 17, wherein the screenshot irrelevant elements include page elements related to advertisements.
  19. 根据权利要求16所述的方法,所述判断截图相关的页面元素是否加载完毕,包括:The method according to claim 16, said determining whether the page elements related to the screenshot have been loaded, comprising:
    基于所述待截图页面的页面结构,确定截图相关的页面元素中的末尾元素;Determine the last element of the page elements related to the screenshot based on the page structure of the page to be screenshotted;
    判断已加载元素中是否包括所述末尾元素;Determine whether the last element is included in the loaded element;
    如果已加载元素中包括所述末尾元素,则确定截图相关的页面元素已经加载完毕。If the last element is included in the loaded elements, it is determined that the page elements related to the screenshot have been loaded.
  20. The method according to claim 12, wherein capturing the preprocessed page to be captured comprises:
    determining whether the size of the preprocessed page exceeds a preset size threshold;
    if so, dividing the preprocessed page into a plurality of fragments and recording the positional relationship among the fragments;
    capturing a screenshot of each fragment separately; and
    stitching the fragment screenshots together, according to the recorded positional relationship, into a screenshot of the preprocessed page.
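The split-capture-stitch procedure of claim 20 can be illustrated on a page modelled as a grid of pixel rows. The sketch assumes vertical slicing only. It uses a `capture` function that simply copies rows, standing in for a real renderer. Each fragment's recorded top offset serves as the positional relationship used when stitching.

```python
Image = list  # list of pixel rows; a stand-in for a real bitmap


def split_into_fragments(height: int, max_height: int) -> list:
    """Return (top, bottom) row ranges, each at most max_height rows tall."""
    if height <= max_height:
        return [(0, height)]
    return [(top, min(top + max_height, height)) for top in range(0, height, max_height)]


def capture(page: Image, top: int, bottom: int) -> Image:
    """Stand-in for rendering one fragment: copy rows top..bottom-1."""
    return [row[:] for row in page[top:bottom]]


def stitch(shots: list, height: int, width: int) -> Image:
    """Place each fragment screenshot back at its recorded top offset."""
    canvas = [[0] * width for _ in range(height)]
    for top, shot in shots:
        for i, row in enumerate(shot):
            canvas[top + i] = row[:]
    return canvas


def screenshot_page(page: Image, max_height: int) -> Image:
    height = len(page)
    width = len(page[0]) if page else 0
    ranges = split_into_fragments(height, max_height)
    # Record each fragment's position (its top offset) alongside its screenshot.
    shots = [(top, capture(page, top, bottom)) for top, bottom in ranges]
    return stitch(shots, height, width)
```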
  21. The method according to claim 12, further comprising:
    inputting the captured image of the page into a recognition model to identify the positions of interference elements in the captured image, wherein the recognition model is a machine learning model trained using, as training samples, page screenshots in which the positions of interference elements are labeled; and
    deleting, as interference elements, the page elements located at the identified positions in the captured image.
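The post-capture cleanup in claim 21 depends on a trained detector, which is out of scope here. The sketch below accepts any object with a `predict(image)` method that returns boxes; this is a hypothetical interface. It deletes the detected regions by painting them with a background value. Painting over is one reading of "deleting" an element from an image; cropping or inpainting would be alternatives.

```python
from typing import Protocol

Box = tuple  # (left, top, right, bottom); right and bottom are exclusive


class InterferenceDetector(Protocol):
    """Hypothetical interface for the trained recognition model."""

    def predict(self, image: list) -> list: ...


def remove_interference(image: list, detector: InterferenceDetector, fill: int = 255) -> list:
    """Return a copy of image with every detected box filled with `fill`."""
    out = [row[:] for row in image]
    height = len(out)
    width = len(out[0]) if out else 0
    for left, top, right, bottom in detector.predict(image):
        # Clamp boxes to the image so out-of-range predictions are harmless.
        for y in range(max(0, top), min(bottom, height)):
            for x in range(max(0, left), min(right, width)):
                out[y][x] = fill
    return out
```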
  22. The method according to claim 12, wherein the screenshot request initiated by the user carries a specification identifier indicating a screenshot specification; and
    capturing the preprocessed page to be captured comprises:
    capturing the preprocessed page according to the screenshot specification indicated by the specification identifier carried in the user-initiated screenshot request.
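Claim 22 leaves the content of a "screenshot specification" open. Assume it bundles rendering parameters such as viewport width, output format, and whether to capture the full page. The specification identifier can then be resolved with a lookup table. The identifiers, fields, and default below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenshotSpec:
    """Hypothetical bundle of rendering parameters."""
    viewport_width: int
    image_format: str
    full_page: bool


# Hypothetical specification table; identifiers and values are illustrative only.
SPECS = {
    "mobile": ScreenshotSpec(375, "png", True),
    "desktop": ScreenshotSpec(1280, "png", True),
    "thumbnail": ScreenshotSpec(1280, "jpeg", False),
}
DEFAULT_SPEC_ID = "desktop"


def resolve_spec(request: dict) -> ScreenshotSpec:
    """Map the request's specification identifier to concrete parameters."""
    spec_id = request.get("spec_id", DEFAULT_SPEC_ID)
    try:
        return SPECS[spec_id]
    except KeyError:
        raise ValueError(f"unknown screenshot specification: {spec_id!r}") from None
```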
  23. A page screenshot device, comprising:
    a URL acquisition module, configured to acquire, in response to a screenshot request initiated by a user, the uniform resource locator (URL) of a page to be captured;
    a page loading module, configured to load the page to be captured according to the URL and to determine, during loading, whether the screenshot-related page elements have finished loading, wherein the screenshot-related page elements are page elements designated by the user; and
    an execution module, configured to, in response to the screenshot-related page elements having finished loading, stop loading the page to be captured and capture a screenshot of the loaded portion of the page.
  24. The device according to claim 23, wherein the screenshot request initiated by the user carries the URL or a character string indicating the URL of the page to be captured; and
    the URL acquisition module is further configured to:
    acquire the URL carried in the user-initiated screenshot request; or
    acquire the character string carried in the user-initiated screenshot request and parse the character string to obtain the URL it indicates.
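Claim 24 allows the request to carry either the URL itself or a string that indicates it, without saying what that string looks like. The sketch assumes the string is free text, such as shared text, with an embedded http(s) link. The `url` and `text` request keys are hypothetical.

```python
import re
from urllib.parse import urlparse

# ASCII URL characters only, so a URL followed by CJK text or punctuation
# (e.g. a full-width comma) ends where the ASCII characters end.
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def _is_http_url(value: str) -> bool:
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def get_page_url(request: dict) -> str:
    """Return the page URL from a request carrying either a URL or free text."""
    url = request.get("url", "")
    if _is_http_url(url):
        return url
    match = URL_PATTERN.search(request.get("text", ""))
    if match:
        # Drop trailing sentence punctuation picked up by the pattern.
        return match.group(0).rstrip(".,;:!?)")
    raise ValueError("no URL found in screenshot request")
```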
  25. The device according to claim 23, wherein the page loading module is further configured to:
    determine whether the loaded page elements include a screenshot-irrelevant element indicating that the screenshot-related page elements have finished loading; and
    if so, determine that the screenshot-related page elements have finished loading.
  26. The device according to claim 25, wherein the screenshot-irrelevant elements include advertisement-related page elements.
  27. The device according to claim 23, wherein the page loading module is further configured to:
    determine, based on the page structure of the page to be captured, the last element among the screenshot-related page elements;
    determine whether the loaded elements include the last element; and
    if the loaded elements include the last element, determine that the screenshot-related page elements have finished loading.
  28. The device according to claim 23, further comprising:
    a preprocessing module, configured to preprocess the loaded portion of the page to be captured, wherein the preprocessing includes deleting screenshot interference elements from the page to be captured.
  29. The device according to claim 28, wherein the preprocessing further comprises any one or a combination of the following preprocessing modes:
    expanding hidden elements in the page;
    changing the display style of specified page elements;
    adding screenshot markers to elements in the page.
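The preprocessing options in claims 28 and 29 can be sketched on a simplified element model. In a browser, these steps would typically run as injected JavaScript before capture. Here, each step is a plain function over hypothetical `Element` records, and the marker attribute name `data-screenshot-id` is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, simplified stand-in for a DOM element."""
    name: str
    style: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)


def delete_interference(elements: list, interference: set) -> list:
    """Drop elements whose names are in the interference set."""
    return [e for e in elements if e.name not in interference]


def expand_hidden(elements: list) -> None:
    """Make elements hidden via display:none or visibility:hidden visible."""
    for e in elements:
        if e.style.get("display") == "none":
            e.style["display"] = "block"
        if e.style.get("visibility") == "hidden":
            e.style["visibility"] = "visible"


def restyle(elements: list, overrides: dict) -> None:
    """Apply per-element style overrides, keyed by element name."""
    for e in elements:
        e.style.update(overrides.get(e.name, {}))


def add_markers(elements: list, attr: str = "data-screenshot-id") -> None:
    """Tag each element with a sequential screenshot marker."""
    for index, e in enumerate(elements):
        e.attrs[attr] = str(index)
```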
  30. The device according to claim 28, wherein the user's screenshot request carries demand information indicating a preprocessing mode; and
    the preprocessing module is further configured to:
    determine a preprocessing mode according to the demand information carried in the user's screenshot request, and preprocess the page to be captured according to the determined preprocessing mode.
  31. The device according to claim 23, wherein the execution module is further configured to:
    determine whether the size of the loaded portion of the page to be captured exceeds a preset threshold;
    if so, divide the loaded portion into a plurality of fragments and record the positional relationship among the fragments;
    capture a screenshot of each fragment separately; and
    stitch the fragment screenshots together, according to the recorded positional relationship, into a screenshot of the loaded portion of the page to be captured.
  32. The device according to claim 23, further comprising an image processing module configured to:
    input the captured image of the page into a recognition model to identify the positions of interference elements in the captured image, wherein the recognition model is a machine learning model trained using, as training samples, page screenshots in which the positions of interference elements are labeled; and
    delete, as interference elements, the page elements located at the identified positions in the captured image.
  33. The device according to claim 23, wherein the screenshot request initiated by the user carries a specification identifier indicating a screenshot specification; and
    the execution module is further configured to:
    in response to the screenshot-related page elements having finished loading, stop the loading and capture a screenshot of the loaded portion of the page to be captured according to the screenshot specification indicated by the specification identifier carried in the user-initiated screenshot request.
  34. A page screenshot device, comprising:
    a URL acquisition module, configured to acquire, in response to a screenshot request initiated by a user, the uniform resource locator (URL) of a page to be captured;
    a page loading module, configured to load the page to be captured according to the URL;
    a page preprocessing module, configured to preprocess the page to be captured after it has finished loading, wherein the preprocessing includes deleting screenshot interference elements from the page to be captured; and
    a screenshot execution module, configured to capture a screenshot of the preprocessed page.
  35. The device according to claim 34, wherein the screenshot request initiated by the user carries the URL or a character string indicating the URL of the page to be captured; and
    the URL acquisition module is further configured to:
    acquire the URL carried in the user-initiated screenshot request; or
    acquire the character string carried in the user-initiated screenshot request and parse the character string to obtain the URL it indicates.
  36. The device according to claim 34, wherein the preprocessing further comprises any one or a combination of the following preprocessing modes:
    expanding hidden elements in the page;
    changing the display style of specified page elements;
    adding screenshot markers to elements in the page.
  37. The device according to claim 34, wherein the user's screenshot request carries demand information indicating a preprocessing mode; and
    the page preprocessing module is further configured to:
    determine a preprocessing mode according to the demand information carried in the user's screenshot request, and preprocess the page to be captured according to the determined preprocessing mode.
  38. The device according to claim 34, further comprising a dynamic loading module configured to:
    determine, while the page to be captured is being loaded, whether the screenshot-related page elements have finished loading, wherein the screenshot-related page elements are page elements designated by the user; and
    in response to the screenshot-related page elements having finished loading, stop the loading.
  39. The device according to claim 38, wherein the dynamic loading module is further configured to:
    determine whether the loaded page elements include a screenshot-irrelevant element indicating that the screenshot-related page elements have finished loading; and
    if so, determine that the screenshot-related page elements have finished loading.
  40. The device according to claim 39, wherein the screenshot-irrelevant elements include advertisement-related page elements.
  41. The device according to claim 38, wherein the dynamic loading module is further configured to:
    determine, based on the page structure of the page to be captured, the last element among the screenshot-related page elements;
    determine whether the loaded elements include the last element; and
    if the loaded elements include the last element, determine that the screenshot-related page elements have finished loading.
  42. The device according to claim 34, wherein the screenshot execution module is further configured to:
    determine whether the size of the preprocessed page to be captured exceeds a preset size threshold;
    if so, divide the preprocessed page into a plurality of fragments and record the positional relationship among the fragments;
    capture a screenshot of each fragment separately; and
    stitch the fragment screenshots together, according to the recorded positional relationship, into a screenshot of the preprocessed page.
  43. The device according to claim 34, further comprising an image processing module configured to:
    input the captured image of the page into a recognition model to identify the positions of interference elements in the captured image, wherein the recognition model is a machine learning model trained using, as training samples, page screenshots in which the positions of interference elements are labeled; and
    delete, as interference elements, the page elements located at the identified positions in the captured image.
  44. The device according to claim 34, wherein the screenshot request initiated by the user carries a specification identifier indicating a screenshot specification; and
    the screenshot execution module is further configured to:
    capture the preprocessed page according to the screenshot specification indicated by the specification identifier carried in the user-initiated screenshot request.
  45. A computer device, comprising at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 11.
  46. A computer device, comprising at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 12 to 22.
PCT/CN2020/140556 2020-03-20 2020-12-29 Page screenshot method and device WO2021184896A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010200265.1 2020-03-20
CN202010200265.1A CN111428162A (en) 2020-03-20 2020-03-20 Page screenshot method and device

Publications (1)

Publication Number Publication Date
WO2021184896A1 (en) 2021-09-23

Family

ID=71549674

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140556 WO2021184896A1 (en) 2020-03-20 2020-12-29 Page screenshot method and device

Country Status (2)

Country Link
CN (1) CN111428162A (en)
WO (1) WO2021184896A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428162A (en) * 2020-03-20 2020-07-17 支付宝(杭州)信息技术有限公司 Page screenshot method and device
CN112596833B (en) * 2020-12-21 2024-08-20 三六零数字安全科技集团有限公司 Webpage screenshot generation method, device, equipment and storage medium
CN114047985A (en) * 2021-10-21 2022-02-15 盐城金堤科技有限公司 Screenshot method, screenshot device, storage medium and electronic equipment

Citations (6)

Publication number Priority date Publication date Assignee Title
CN110020240A (en) * 2017-09-28 2019-07-16 北京国双科技有限公司 A kind of webpage capture method, apparatus, storage medium and processor
CN110020231A (en) * 2017-07-25 2019-07-16 阿里巴巴集团控股有限公司 Webpage capture method and device thereof
WO2019200783A1 (en) * 2018-04-18 2019-10-24 平安科技(深圳)有限公司 Method for data crawling in page containing dynamic image or table, device, terminal, and storage medium
CN110704784A (en) * 2019-10-10 2020-01-17 深圳前海微众银行股份有限公司 Web page screen capturing method, device, equipment and computer readable storage medium
CN110889072A (en) * 2019-11-21 2020-03-17 深圳前海环融联易信息科技服务有限公司 Screenshot method and device for removing webpage advertisements, computer equipment and storage medium
CN111428162A (en) * 2020-03-20 2020-07-17 支付宝(杭州)信息技术有限公司 Page screenshot method and device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US11232250B2 (en) * 2013-05-15 2022-01-25 Microsoft Technology Licensing, Llc Enhanced links in curation and collaboration applications
CN106775298A (en) * 2016-11-28 2017-05-31 北京小米移动软件有限公司 The processing method and processing device of sectional drawing
CN109033466B (en) * 2018-08-31 2019-12-03 掌阅科技股份有限公司 Page sharing method calculates equipment and computer storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114691962A (en) * 2022-04-25 2022-07-01 清华大学 Mobile terminal page crawler method and device and electronic equipment
CN114691962B (en) * 2022-04-25 2024-04-19 清华大学 Mobile terminal page crawler method and device and electronic equipment

Also Published As

Publication number Publication date
CN111428162A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
WO2021184896A1 (en) Page screenshot method and device
CN107256109B (en) Information display method and device and terminal
US10097623B2 (en) Method and device for displaying information flows in social network, and server
US8788944B1 (en) Personalized mobile device application presentation using photograph-based capability detection
CA2918840C (en) Presenting fixed format documents in reflowed format
US20100211865A1 (en) Cross-browser page visualization generation
CN108334517A (en) A kind of webpage rendering intent and relevant device
US9934206B2 (en) Method and apparatus for extracting web page content
CN102831148B (en) A kind of recommending data loading method based on browser and device
CN106033450B (en) Advertisement blocking method and device and browser
CN104852883A (en) Method and system for protecting safety of account information
US20160026858A1 (en) Image based search to identify objects in documents
CN104980404B (en) Method and system for protecting account information security
US20180075066A1 (en) Method and apparatus for displaying electronic photo, and mobile device
WO2017016114A1 (en) Background realization method, apparatus, device and storage medium for search result page
JP6505849B2 (en) Generation of element identifier
WO2016018682A1 (en) Processing image to identify object for insertion into document
TW201523421A (en) Determining images of article for extraction
US20130230248A1 (en) Ensuring validity of the bookmark reference in a collaborative bookmarking system
JPWO2015140922A1 (en) Information processing system, information processing method, and information processing program
US11699174B2 (en) Media processing techniques for enhancing content
US11921773B1 (en) System to generate contextual queries
CN111488534B (en) Advertisement detection method and device, electronic equipment and computer readable storage medium
US12106333B2 (en) Media processing techniques for enhancing content
US9047254B1 (en) Detection and validation of expansion types of expandable content items

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
  Ref document number: 20925777; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
122 Ep: PCT application non-entry in European phase
  Ref document number: 20925777; Country of ref document: EP; Kind code of ref document: A1