CN111683098A - Anti-crawler method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN111683098A
Authority
CN
China
Prior art keywords
text
icon
content
display
text content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010527694.XA
Other languages
Chinese (zh)
Other versions
CN111683098B (en)
Inventor
张发恩
戴辉辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Qizhi Chengdu Technology Co ltd
Original Assignee
Innovation Qizhi Chengdu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Qizhi Chengdu Technology Co ltd
Priority to CN202010527694.XA
Publication of CN111683098A
Application granted
Publication of CN111683098B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0435 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides an anti-crawler method, an anti-crawler device, electronic equipment and a storage medium, and belongs to the field of data security. The method comprises the following steps: acquiring text content from a server; converting a specified text in the text content into an icon based on the mapping relation between characters and icons, and obtaining display content consisting of a residual text and the icon, wherein the residual text is the text in the text content other than the specified text; and displaying the display content in a page. In this method, the display end replaces the characters of part of the text content with icons, so that the content displayed on the page is protected from being crawled and anti-crawler processing efficiency is improved.

Description

Anti-crawler method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of data security, in particular to an anti-crawler method, an anti-crawler device, electronic equipment and a storage medium.
Background
An HTML (HyperText Markup Language) document is an important carrier of internet data and may contain sensitive or important information. Some organizations or individuals therefore use web crawlers to quickly extract that information from web pages. A web crawler is a program or script that automatically crawls the World Wide Web according to certain rules.
In the prior art, anti-crawler measures usually have the front end and the back end encrypt and decrypt the data they exchange, that is, sensitive parameter information and results are symmetrically encrypted and decrypted. The drawback of this approach is that plaintext information is still displayed on the page, and once plaintext is displayed on the page, the desired information can be crawled by parsing the page. Methods that change text attributes in the HTML document have therefore appeared, but existing methods of converting text attributes suffer from low processing efficiency.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide an anti-crawler method, an anti-crawler device, an electronic device, and a storage medium, so as to solve the problem of low anti-crawler processing efficiency in the prior art.
The embodiment of the application provides an anti-crawler method, which is applied to a display end, and comprises the following steps: acquiring text content from a server; converting a specified text in the text content into an icon based on the mapping relation between characters and the icon, and obtaining display content consisting of a residual text and the icon, wherein the residual text is other texts in the text content except the specified text; displaying the display content in a page.
In this implementation, the specified text in the text content to be displayed is converted into icons, so that a web crawler cannot accurately crawl the text content and the anti-crawler function of the page is ensured. Meanwhile, the conversion between text and icons is performed by the display end, which avoids large-batch conversion by the server and improves anti-crawler processing efficiency.
Optionally, the obtaining the text content includes: acquiring encryption information containing the text content from the server; and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server side to obtain the text content.
In the implementation mode, the display end and the server end encrypt data such as text content when transmitting the data, and the data security is improved.
Optionally, before the converting the specified text in the text content into the icon based on the mapping relationship between the characters and the icon, the method further includes: acquiring the mapping relation from the server; and storing the mapping relation based on the json format or the list format.
In this implementation, the display end stores the mapping relation as a file in JSON format or list format, so that text is replaced by icons based on local data, which improves anti-crawler processing efficiency.
Optionally, the converting the specified text in the text content into an icon based on the mapping relationship between the characters and the icon includes: determining the attribute name of the cascading style sheet corresponding to the specified text based on the mapping relation; and replacing the designated text with an icon corresponding to the attribute name of the cascading style sheet.
In the implementation mode, the text and the icon are replaced based on the attribute name of the cascading style sheet, text characters are prevented from being used as the icon name, and the efficiency and the safety of the text and icon replacement operation are guaranteed.
The embodiment of the application also provides an anti-crawler method, which is applied to a server side and comprises the following steps: determining text content to be displayed by a display end and a mapping relation between characters corresponding to the text content and icons; and sending the text content and the mapping relation to the display end, wherein the display end is used for converting the specified text in the text content into an icon based on the mapping relation between the characters and the icon, obtaining display content consisting of residual text and the icon, and displaying the display content in a page, wherein the residual text is other texts in the text content except the specified text.
In this implementation, the server sends the text content to be displayed to the display end, and the display end replaces the specified text with the corresponding icons. The mapping relation between characters and icons only needs to be sent once for each display end, and the server does not need to perform large-batch replacement, so the load on the server is reduced and anti-crawler processing efficiency is improved.
Optionally, before the determining the text content that needs to be displayed on the display end and the mapping relationship between the characters corresponding to the text content and the icons, the method further includes: generating an icon corresponding to the designated character by using a vector icon library, wherein the display style of the icon is the same as that of the corresponding character; and adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
In the implementation mode, the icon corresponding to the designated character is generated through the vector icon library, the icon generation operation efficiency is improved, and meanwhile, the security of anti-crawling is improved by adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
Optionally, the sending the text content and the mapping relationship to the display end includes: encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encrypted information; and sending the encrypted information to the display terminal.
In the implementation mode, the display end and the server end encrypt data such as text content when transmitting the data, and the data security is improved.
The embodiment of the application further provides an anti-crawler device, which is applied to a display end, and the device includes: the text acquisition module is used for acquiring text contents from the server; the text replacement module is used for converting the specified text in the text content into the icon based on the mapping relation between the characters and the icon, and obtaining display content consisting of the residual text and the icon, wherein the residual text is other texts in the text content except the specified text; and the display module is used for displaying the display content in a page.
In this implementation, the specified text in the text content to be displayed is converted into icons, so that a web crawler cannot accurately crawl the text content and the anti-crawler function of the page is ensured. Meanwhile, the conversion between text and icons is performed by the display end, which avoids large-batch conversion by the server and improves anti-crawler processing efficiency.
Optionally, the text obtaining module is specifically configured to: acquiring encryption information containing the text content from the server; and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server side to obtain the text content.
In the implementation mode, the display end and the server end encrypt data such as text content when transmitting the data, and the data security is improved.
Optionally, the anti-crawler device further comprises: a mapping relation obtaining module, configured to obtain the mapping relation from the server and to store the mapping relation in JSON format or list format.
In this implementation, the display end stores the mapping relation as a file in JSON format or list format, so that text is replaced by icons based on local data, which improves anti-crawler processing efficiency.
Optionally, the text replacement module is specifically configured to: determining the attribute name of the cascading style sheet corresponding to the specified text based on the mapping relation; and replacing the designated text with an icon corresponding to the attribute name of the cascading style sheet.
In the implementation mode, the text and the icon are replaced based on the attribute name of the cascading style sheet, text characters are prevented from being used as the icon name, and the efficiency and the safety of the text and icon replacement operation are guaranteed.
The embodiment of the application further provides an anti-crawler device, which is applied to a server, the device comprises: the content determining module is used for determining the text content to be displayed by the display end and the mapping relation between the characters corresponding to the text content and the icons; and the sending module is used for sending the text content and the mapping relation to the display end, converting the specified text in the text content into an icon by the display end based on the mapping relation between the characters and the icon, obtaining display content consisting of residual text and the icon, and displaying the display content in a page, wherein the residual text is other texts in the text content except the specified text.
In this implementation, the server sends the text content to be displayed to the display end, and the display end replaces the specified text with the corresponding icons. The mapping relation between characters and icons only needs to be sent once for each display end, and the server does not need to perform large-batch replacement, so the load on the server is reduced and anti-crawler processing efficiency is improved.
Optionally, the anti-crawler apparatus further comprises: the icon generating module is used for generating an icon corresponding to the designated character by utilizing a vector icon library, and the display style of the icon is the same as that of the corresponding character; and adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
In the implementation mode, the icon corresponding to the designated character is generated through the vector icon library, the icon generation operation efficiency is improved, and meanwhile, the security of anti-crawling is improved by adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
Optionally, the sending module is specifically configured to: encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encrypted information; and sending the encrypted information to the display terminal.
In the implementation mode, the display end and the server end encrypt data such as text content when transmitting the data, and the data security is improved.
An embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes steps in any one of the above implementation manners when reading and executing the program instructions.
An embodiment of the present application further provides a storage medium, where computer program instructions are stored in the storage medium, and when the computer program instructions are read and executed by a processor, the steps in any one of the above implementation manners are performed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of an anti-crawler method applied to a display end according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of the steps for converting specified text into icons according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of an anti-crawler method applied to a server according to an embodiment of the present application;
Fig. 4 is a schematic block diagram of an anti-crawler device applied to a display end according to an embodiment of the present application;
Fig. 5 is a schematic block diagram of an anti-crawler device applied to a server according to an embodiment of the present application.
Reference numerals: 30 - anti-crawler device; 31 - text acquisition module; 32 - text replacement module; 33 - display module; 40 - anti-crawler device; 41 - content determination module; 42 - sending module.
Detailed Description
The technical solution in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an anti-crawler method applied to a display end according to an embodiment of the present application. The anti-crawler method includes the following specific steps:
Step S12: acquire the text content from the server.
Optionally, the text content is typically an HTML document, but may also be a document in another format. An example of text content is "Zhang San of the first company published a table document in the second issue".
Further, the text content sent by the server may be encrypted, which prevents an attacker from illegally obtaining the text content during data transmission and improves the security of data transmission in the anti-crawler processing flow. The decryption may specifically include: acquiring encrypted information containing the text content from the server; and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server to obtain the text content.
It should be understood that, in addition to symmetric encryption algorithms such as AES (Advanced Encryption Standard), DES (Data Encryption Standard) and IDEA (International Data Encryption Algorithm), other encryption algorithms such as asymmetric encryption algorithms may also be used for encryption and decryption.
Step S14: convert the specified text in the text content into icons based on the mapping relation between characters and icons to obtain display content consisting of the remaining text and the icons, where the remaining text is the text in the text content other than the specified text.
Before the specified text is replaced by icons based on the mapping relation, the mapping relation that governs the replacement needs to be received from the server. Specifically, receiving the mapping relation may include: acquiring the mapping relation from the server; and saving the mapping relation in JSON format or list format. Once the mapping relation is stored locally on the display end in JSON format or list format, the specified text can be replaced by icons directly based on the local mapping relation, without large-batch replacement work by the server, which reduces the server load and improves the overall efficiency of anti-crawler processing.
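The local storage step can be sketched as follows. This is a minimal illustration, not the patented implementation: the file name `icon_mapping.json`, the function names and the sample entries (which mirror the example used later in the description) are assumptions.

```python
import json
import os
import tempfile

# Hypothetical mapping: specified text -> CSS attribute (class) name of its icon.
mapping = {
    "first company": "diyigongsi",
    "Zhang San": "zhangsan",
    "second issue": "dierqikan",
}

def save_mapping(mapping: dict, path: str) -> None:
    """Persist the mapping received from the server as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, ensure_ascii=False, indent=2)

def load_mapping(path: str) -> dict:
    """Read the locally stored mapping back before replacing text."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Assumed storage location on the display end.
path = os.path.join(tempfile.gettempdir(), "icon_mapping.json")
save_mapping(mapping, path)
restored = load_mapping(path)
```

Because the mapping is read from local storage, later replacements need no further round trip to the server.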
It should be understood that, after the text content is received from the server, the specified text needs to be replaced by icons based on the mapping relation between characters and icons. The specified text and icons to be replaced may differ between text content transmitted by different servers, and the display end may store multiple mapping relations. Therefore, this embodiment may first select the mapping relation corresponding to the server, which specifically includes: determining, from the locally stored mapping relations, the specified mapping relation corresponding to the server, so that the specified text in the text content is converted into icons using that mapping relation.
In other embodiments, besides determining the specified mapping relation from a server identifier such as the serial number of the server, different text content transmitted by the same server may require different mapping relations; the specified mapping relation may therefore also be determined based on the encryption mode of the encrypted information, a text content identifier, and the like.
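One way to picture this selection is a lookup keyed by server identifier and, optionally, by text content identifier. The sketch below is an assumption for illustration only; the identifiers `server-01` and `doc-42`, the store layout and the fallback rule are not taken from the source.

```python
from typing import Dict, Optional, Tuple

Mapping = Dict[str, str]

# Hypothetical local store keyed by (server identifier, text content identifier).
# A content identifier of None marks the server-wide default mapping.
local_mappings: Dict[Tuple[str, Optional[str]], Mapping] = {
    ("server-01", None): {"Zhang San": "zhangsan"},
    ("server-01", "doc-42"): {"second issue": "dierqikan"},
    ("server-02", None): {"first company": "diyigongsi"},
}

def select_mapping(server_id: str, content_id: Optional[str] = None) -> Mapping:
    """Prefer a content-specific mapping, then fall back to the server default."""
    specific = local_mappings.get((server_id, content_id))
    if specific is not None:
        return specific
    default = local_mappings.get((server_id, None))
    if default is None:
        raise KeyError(f"no mapping stored for server {server_id!r}")
    return default
```

For example, `select_mapping("server-01", "doc-42")` returns the content-specific mapping, while an unknown content identifier for the same server falls back to that server's default.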
Optionally, referring to Fig. 2, "converting the specified text in the text content into icons based on the mapping relation between characters and icons" in step S14 may specifically include the following steps:
Step S142: determine the attribute name of the cascading style sheet corresponding to the specified text based on the mapping relation.
Cascading Style Sheets (CSS) is a computer language used to describe the presentation of documents written in HTML or XML (Extensible Markup Language). CSS can not only style a web page statically, but can also format page elements dynamically in coordination with various scripting languages. Specifically, CSS allows pixel-level control over the layout of element positions in a web page, supports almost all fonts and font styles, and can edit the styles of page objects and models. Therefore, in this embodiment, CSS pseudo-processing is used to replace the specified text with icons, and the attribute name of the cascading style sheet can be regarded as the name of the corresponding icon.
Further, when text is replaced by icons, the specified text to be replaced by the corresponding icons is determined based on the mapping relation. The specified text may be preset based on sensitive words, privacy attributes and the like, for example a person's name, a place name or a telephone number.
Take "Zhang San of the first company published a table document in the second issue" as an example. The display end determines the corresponding mapping relation based on the identity of the server that sent the text content and then reads the local mapping relation, in which the specified text to be replaced by icons includes "first company", "Zhang San" and "second issue". The attribute name of the cascading style sheet corresponding to "first company" is "diyigongsi", that corresponding to "Zhang San" is "zhangsan", and that corresponding to "second issue" is "dierqikan". The attribute names of the cascading style sheet correspond one-to-one with the icons. The icon for each piece of specified text is determined from its attribute name, and each icon looks the same as the specified text it replaces; for example, the icon corresponding to "first company" is a picture whose displayed content is "first company".
Step S144: replace the specified text with the icon corresponding to the attribute name of the cascading style sheet.
It should be understood that the icon may be in any image file format, such as SVG or PNG.
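Steps S142 and S144 can be sketched as a single pass over the text. The sketch below assumes that each icon is rendered through an empty `<i>` element whose class is the attribute name prefixed with `icon-`; the element, the prefix and the function name `to_display_content` are illustrative choices, not details given in the source.

```python
import html
import re
from typing import Dict

def to_display_content(text: str, mapping: Dict[str, str]) -> str:
    """Replace each specified text with an empty <i> element whose CSS class
    names the icon; the remaining text is HTML-escaped and kept unchanged."""
    if not mapping:
        return html.escape(text)
    # Longest keys first so that overlapping entries prefer the longer match.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    parts = []
    pos = 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[pos:m.start()]))              # remaining text
        parts.append(f'<i class="icon-{mapping[m.group(0)]}"></i>')  # icon placeholder
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)

# Assumed mapping values are valid CSS identifiers.
mapping = {
    "first company": "diyigongsi",
    "Zhang San": "zhangsan",
    "second issue": "dierqikan",
}
text = "Zhang San of the first company published a table document in the second issue"
display = to_display_content(text, mapping)
```

The resulting markup contains only the remaining text plus icon placeholders; a style sheet on the display end would then associate each class with its image.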
Step S16: and displaying the display content in a page.
In the CSS pseudo-processing technique, part of the text is missing from the HTML text output in the response and is hidden in CSS pseudo-classes. A user sees the normal presentation only after the CSS is rendered, whereas for a crawler program the output text is incomplete and cannot be crawled. The technique is effective against crawlers that send requests programmatically to obtain the HTML text of a resource and extract important text data from it, and also against crawlers that load the document in memory in the manner of a browser and, after loading and rendering complete, read the rendered text through an injected script.
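To illustrate why a page-parsing crawler obtains incomplete text, the following sketch extracts text nodes from display content in which the specified text has already been replaced. The markup (empty `<i>` elements with `icon-` classes) is an assumed representation, and `TextExtractor` is a deliberately simple stand-in for a crawler.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only text nodes, as a simple HTML-parsing crawler would."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

def crawl_text(html_doc: str) -> str:
    parser = TextExtractor()
    parser.feed(html_doc)
    parser.close()
    return "".join(parser.chunks)

# Display content in which the specified text has been replaced by icons.
page = ('<p><i class="icon-zhangsan"></i> of the <i class="icon-diyigongsi"></i> '
        'published a table document in the <i class="icon-dierqikan"></i></p>')
seen_by_crawler = crawl_text(page)
```

The extracted string keeps the remaining text but none of the specified text, since that text exists only as rendered icons.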
To cooperate with the above anti-crawler method applied to the display end, this embodiment further provides an anti-crawler method applied to the server. Referring to Fig. 3, Fig. 3 is a schematic flowchart of an anti-crawler method applied to a server according to an embodiment of the present application. The anti-crawler method applied to the server includes the following specific steps:
Step S22: determine the text content to be displayed by the display end and the mapping relation between the characters corresponding to the text content and icons.
The specified text to be replaced by icons may differ from one text content to another, so either a uniform mapping relation held locally by the server or a mapping relation specific to the text content may be obtained and sent to the display end.
It should be understood that the mapping relation covers the specified text and the icons, so the icons corresponding to the specified text need to be determined first. This embodiment may therefore further include the following steps before step S22:
Step S211: generate an icon corresponding to each specified character by using a vector icon library, where the display style of the icon is the same as that of the corresponding character.
The specified characters may be the characters of specified text preset based on sensitive words, privacy attributes and the like, for example a person's name, a place name or a telephone number.
Step S212: use the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
Specifically, a correspondence is established between characters and the cascading style sheet attribute names of their icons. For characters such as English letters and digits, the ASCII code can be used as the attribute name of the corresponding icon. For Chinese characters, a mapping such as an ASCII-based code or the pinyin of the character can be used as the attribute name of the Chinese character's icon. Each character is mapped to its icon according to the mapping relation, so that visually the page appears to display characters while the content actually displayed on the page is icons.
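The naming rule of step S212 can be sketched as follows. The `icon-` prefix, the tiny `PINYIN` table and the function names are assumptions made for illustration; the source does not specify how pinyin is obtained or how the names are prefixed.

```python
from typing import Dict

# Hypothetical pinyin table; a real system would need a complete lookup source.
PINYIN: Dict[str, str] = {"张": "zhang", "三": "san"}

def css_name_for(char: str, prefix: str = "icon-") -> str:
    """Derive the CSS attribute (class) name for a single character's icon."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    if char.isascii():
        # Letters, digits and other ASCII characters use their decimal code.
        return f"{prefix}{ord(char)}"
    if char in PINYIN:
        # Chinese characters fall back to a pinyin-based name.
        return f"{prefix}{PINYIN[char]}"
    raise KeyError(f"no naming rule for character {char!r}")

def build_mapping(chars: str) -> Dict[str, str]:
    """Map every distinct character to the CSS name of its icon."""
    return {c: css_name_for(c) for c in dict.fromkeys(chars)}

names = build_mapping("A1张三")
```

The prefix is a practical choice in this sketch: a class name that starts with a digit would not be a valid CSS identifier without escaping.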
Step S24: send the text content and the mapping relation to the display end, so that the display end converts the specified text in the text content into icons based on the mapping relation between characters and icons, obtains display content consisting of the remaining text and the icons, and displays the display content in the page, where the remaining text is the text in the text content other than the specified text.
Optionally, the mapping relation and the text content may be encrypted when they are sent. The specific steps may be as follows: encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encrypted information; and sending the encrypted information to the display end.
In this embodiment, the server only needs to determine the text content and send the text content and the mapping relation. This avoids the situation in which one server serving a large number of display ends has to replace specified text with icons in a large amount of text content, which greatly reduces the server load and improves anti-crawler processing efficiency.
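The division of labour on the server side can be sketched as follows, under the assumption that the server tracks which display ends already hold the mapping so that it is sent only once per display end. The `Server` class, the JSON payload layout and the display identifiers are hypothetical, and the optional symmetric encryption step is omitted for brevity.

```python
import json
from typing import Dict, Set

def build_payload(text_content: str, mapping: Dict[str, str], include_mapping: bool) -> str:
    """Serialize what the server sends; no text replacement happens here."""
    payload: dict = {"text": text_content}
    if include_mapping:
        payload["mapping"] = mapping
    return json.dumps(payload, ensure_ascii=False)

class Server:
    """Remembers which display ends have already received the mapping."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping
        self.served: Set[str] = set()

    def respond(self, display_id: str, text_content: str) -> str:
        first_time = display_id not in self.served
        self.served.add(display_id)
        return build_payload(text_content, self.mapping, include_mapping=first_time)

server = Server({"Zhang San": "zhangsan"})
first = json.loads(server.respond("display-1", "Zhang San published a document"))
second = json.loads(server.respond("display-1", "Zhang San published another document"))
```

Only the first response to a given display end carries the mapping; later responses carry just the text content, and the replacement itself is left to the display end.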
In order to cooperate with the above anti-crawler method applied to the display end, the embodiment of the application further provides an anti-crawler device applied to the display end. Referring to fig. 4, fig. 4 is a schematic block diagram of an anti-crawler apparatus applied to a display end according to an embodiment of the present disclosure.
The anti-crawler apparatus 30 includes:
a text obtaining module 31, configured to obtain text content from a server;
the text replacement module 32 is configured to convert the specified text in the text content into an icon based on the mapping relationship between the characters and the icon, and obtain display content composed of the remaining text and the icon, where the remaining text is another text except the specified text in the text content;
and a display module 33, configured to display the display content in the page.
Optionally, the text obtaining module 31 is specifically configured to: acquire encrypted information containing the text content from the server; and decrypt the encrypted information based on a symmetric encryption algorithm matched with the server to obtain the text content.
Optionally, the anti-crawler apparatus 30 further comprises a mapping relationship obtaining module, configured to obtain the mapping relationship from the server and save the mapping relationship in JSON format or list format.
Optionally, the text replacement module 32 is specifically configured to: determine, based on the mapping relationship, the cascading style sheet (CSS) attribute name corresponding to the specified text; and replace the specified text with the icon corresponding to that CSS attribute name.
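The replacement performed by the text replacement module can be sketched as below. This is a minimal sketch under stated assumptions: the mapping is taken to be a JSON object from single characters to CSS class names, the icon is taken to be an empty `<i>` element carrying that class, and the names `load_mapping`, `render_display_content`, and the `icon-` prefix are hypothetical. The application itself does not fix the markup used for the icon.

```python
import json
from html import escape


def load_mapping(raw_json: str) -> dict:
    """Parse the mapping relationship saved in JSON format (character -> CSS class name)."""
    return json.loads(raw_json)


def render_display_content(text_content: str, mapping: dict) -> str:
    """Replace each specified character with an icon element; keep the remaining text as-is."""
    parts = []
    for ch in text_content:
        css_class = mapping.get(ch)
        if css_class is None:
            # Remaining text: not in the mapping, so it stays as ordinary (escaped) text.
            parts.append(escape(ch))
        else:
            # Specified text: the character itself no longer appears in the markup.
            parts.append(f'<i class="{escape(css_class, quote=True)}"></i>')
    return "".join(parts)


if __name__ == "__main__":
    mapping = load_mapping('{"1": "icon-49", "2": "icon-50", "8": "icon-56"}')
    print(render_display_content("Price: 128", mapping))
    # Price: <i class="icon-49"></i><i class="icon-50"></i><i class="icon-56"></i>
```

Here the digits are the specified text and "Price: " is the remaining text; the page's style sheet is then responsible for drawing each icon so that it looks like the character it replaces.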
In order to cooperate with the anti-crawler method applied to the server, the embodiment of the application further provides an anti-crawler device applied to the server. Referring to fig. 5, fig. 5 is a schematic block diagram of an anti-crawler apparatus applied to a server according to an embodiment of the present disclosure.
The anti-crawler apparatus 40 includes:
a content determining module 41, configured to determine text content to be displayed on the display end, and a mapping relationship between characters and icons corresponding to the text content;
and a sending module 42, configured to send the text content and the mapping relationship to the display end, so that the display end converts the specified text in the text content into icons based on the mapping relationship between characters and icons, obtains display content composed of the remaining text and the icons, and displays the display content in the page, where the remaining text is the text in the text content other than the specified text.
Optionally, the anti-crawler apparatus 40 further comprises an icon generating module, configured to: generate an icon corresponding to each specified character by using a vector icon library, where the display style of the icon is the same as that of the corresponding character; and use the ASCII code of the character corresponding to the icon as the attribute name of the icon's cascading style sheet (CSS).
Optionally, the sending module 42 is specifically configured to: encrypt the text content and the mapping relationship based on a symmetric encryption algorithm matched with the display end to obtain encrypted information; and send the encrypted information to the display end.
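The naming scheme used by the icon generating module can be illustrated with a short sketch. It assumes that the "attribute name" is realized as a CSS class name, that a prefix is added because a CSS class name cannot begin with a digit unless escaped, and that each icon is a glyph in an icon font addressed by a code point. The function names, the `icon-` prefix, and the `glyph_codepoints` table are hypothetical and are not taken from the application.

```python
def css_name_for(char: str, prefix: str = "icon-") -> str:
    """Use the character's ASCII code as the CSS name of its icon, e.g. '1' -> 'icon-49'."""
    if len(char) != 1 or ord(char) > 127:
        raise ValueError("expected a single ASCII character")
    return f"{prefix}{ord(char)}"


def build_mapping(specified_chars: str) -> dict:
    """Build the character -> CSS name mapping that is sent to the display end."""
    return {ch: css_name_for(ch) for ch in specified_chars}


def build_stylesheet(mapping: dict, glyph_codepoints: dict) -> str:
    """Emit one CSS rule per icon.

    `glyph_codepoints` maps each character to the code point of its glyph in the
    icon font (an assumption about how the vector icon library exposes icons).
    """
    rules = []
    for ch, name in mapping.items():
        rules.append(f'.{name}::before {{ content: "\\{glyph_codepoints[ch]:x}"; }}')
    return "\n".join(rules)


if __name__ == "__main__":
    mapping = build_mapping("128")
    print(mapping)  # {'1': 'icon-49', '2': 'icon-50', '8': 'icon-56'}
    print(build_stylesheet(mapping, {"1": 0xE001, "2": 0xE002, "8": 0xE008}))
```

Deriving the name from the ASCII code keeps the mapping deterministic, so the server and every display end can agree on it without exchanging a separate name table.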
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores program instructions, and when the processor reads and runs the program instructions, the processor executes the steps of any of the anti-crawler methods provided in this embodiment.
It should be understood that the electronic device may be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or other electronic device having a logical computing function.
The embodiment of the application also provides a readable storage medium in which computer program instructions are stored; when the computer program instructions are read and run by a processor, they execute the steps of the anti-crawler method.
To sum up, the embodiment of the present application provides an anti-crawler method, an anti-crawler device, an electronic device, and a storage medium, where the anti-crawler method applied to a display end includes: acquiring text content from a server; converting a specified text in the text content into an icon based on the mapping relation between characters and the icon, and obtaining display content consisting of a residual text and the icon, wherein the residual text is other texts in the text content except the specified text; displaying the display content in a page.
In this implementation, the specified text in the text content to be displayed is converted into icons, so a web crawler cannot accurately crawl the text content and the page's anti-crawler function is ensured. At the same time, the text-to-icon conversion is performed by the display end, which avoids large-batch conversion by the server and improves anti-crawler processing efficiency.
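The effect on a crawler can be illustrated with a small sketch. It assumes a naive crawler that only collects text nodes from the page markup, and it assumes the icon markup (`<i>` elements with hypothetical `icon-` class names) used for illustration here; neither is specified by the application.

```python
from html.parser import HTMLParser


class TextOnlyCrawler(HTMLParser):
    """A naive crawler that keeps text nodes and ignores tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def crawl_text(page_html: str) -> str:
    """Return the text a text-only crawler would extract from the page."""
    parser = TextOnlyCrawler()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    page = (
        '<p>Price: <i class="icon-49"></i><i class="icon-50"></i>'
        '<i class="icon-56"></i> yuan</p>'
    )
    # The digits were replaced by icons, so only the remaining text is extracted.
    print(repr(crawl_text(page)))  # 'Price:  yuan'
```

A human reader still sees the full price because the icons are drawn to look like the replaced characters, while the extracted text is missing the specified characters.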
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Therefore, the present embodiment further provides a readable storage medium in which computer program instructions are stored; when the computer program instructions are read and executed by a processor, they perform the steps of any of the anti-crawler methods. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.
It is noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An anti-crawler method is applied to a display terminal, and comprises the following steps:
acquiring text content from a server;
converting a specified text in the text content into an icon based on the mapping relation between characters and the icon, and obtaining display content consisting of a residual text and the icon, wherein the residual text is other texts in the text content except the specified text;
displaying the display content in a page.
2. The method of claim 1, wherein the obtaining text content comprises:
acquiring encryption information containing the text content from the server;
and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server side to obtain the text content.
3. The method according to claim 1, wherein before the converting the specified text in the text content into the icon based on the mapping relationship between the characters and the icon, the method further comprises:
acquiring the mapping relation from the server;
and storing the mapping relation based on the json format or the list format.
4. The method of claim 1, wherein converting the specified text in the text content into the icon based on the mapping relationship between the characters and the icon comprises:
determining the attribute name of the cascading style sheet corresponding to the specified text based on the mapping relation;
and replacing the designated text with an icon corresponding to the attribute name of the cascading style sheet.
5. An anti-crawler method is applied to a server side, and comprises the following steps:
determining text content to be displayed by a display end and a mapping relation between characters corresponding to the text content and icons;
and sending the text content and the mapping relation to the display end, wherein the display end is used for converting the specified text in the text content into an icon based on the mapping relation between the characters and the icon, obtaining display content consisting of residual text and the icon, and displaying the display content in a page, wherein the residual text is other texts in the text content except the specified text.
6. The method according to claim 5, wherein before the determining the text content to be displayed on the display side and the mapping relationship between the characters corresponding to the text content and the icons, the method further comprises:
generating an icon corresponding to the designated character by using a vector icon library, wherein the display style of the icon is the same as that of the corresponding character;
and adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
7. The method of claim 5, wherein sending the text content and the mapping relationship to the display comprises:
encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encrypted information;
and sending the encrypted information to the display terminal.
8. An anti-crawler device, applied to a display side, the device comprising:
the text acquisition module is used for acquiring text contents from the server;
the text replacement module is used for converting the specified text in the text content into the icon based on the mapping relation between the characters and the icon, and obtaining display content consisting of the residual text and the icon, wherein the residual text is other texts in the text content except the specified text;
and the display module is used for displaying the display content in a page.
9. An anti-crawler device, applied to a server, the device comprising:
the content determining module is used for determining the text content to be displayed by the display end and the mapping relation between the characters corresponding to the text content and the icons;
and the sending module is used for sending the text content and the mapping relation to the display end, so that the display end converts the specified text in the text content into an icon based on the mapping relation between the characters and the icon, obtains display content consisting of residual text and the icon, and displays the display content in a page, wherein the residual text is other texts in the text content except the specified text.
10. An electronic device comprising a memory and a processor, the memory having stored therein program instructions which, when executed by the processor, perform the steps of the method of any of claims 1-7.
CN202010527694.XA 2020-06-10 2020-06-10 Anti-crawler method and device, electronic equipment and storage medium Active CN111683098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010527694.XA CN111683098B (en) 2020-06-10 2020-06-10 Anti-crawler method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010527694.XA CN111683098B (en) 2020-06-10 2020-06-10 Anti-crawler method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111683098A true CN111683098A (en) 2020-09-18
CN111683098B CN111683098B (en) 2022-12-23

Family

ID=72435350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010527694.XA Active CN111683098B (en) 2020-06-10 2020-06-10 Anti-crawler method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111683098B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341160A (en) * 2016-05-03 2017-11-10 北京京东尚科信息技术有限公司 A kind of method and device for intercepting reptile
CN107766237A (en) * 2017-09-22 2018-03-06 北京锐安科技有限公司 Method of testing, device, server and the storage medium of web crawlers
CN110069688A (en) * 2019-03-16 2019-07-30 平安城市建设科技(深圳)有限公司 Page display method, server, storage medium and the device of anti-crawler
CN110851682A (en) * 2019-10-17 2020-02-28 上海易点时空网络有限公司 Text anti-crawler method, server and display terminal
CN111008348A (en) * 2019-11-28 2020-04-14 盛业信息科技服务(深圳)有限公司 Anti-crawler method, terminal, server and computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112422543A (en) * 2020-11-09 2021-02-26 建信金融科技有限责任公司 Anti-crawler method and device
CN112769787A (en) * 2020-12-29 2021-05-07 深圳一科互联有限公司 Website system network security anti-crawler calculation method and device
CN113987569A (en) * 2021-10-14 2022-01-28 武汉联影医疗科技有限公司 Anti-crawler method and device, computer equipment and storage medium
CN116976280A (en) * 2023-09-22 2023-10-31 北京国科恒通科技股份有限公司 Vector icon-based power grid GIS graphic primitive rendering method and device
CN116976280B (en) * 2023-09-22 2023-12-01 北京国科恒通科技股份有限公司 Vector icon-based power grid GIS graphic primitive rendering method and device

Also Published As

Publication number Publication date
CN111683098B (en) 2022-12-23

Similar Documents

Publication Publication Date Title
CN111683098B (en) Anti-crawler method and device, electronic equipment and storage medium
JP6206866B2 (en) Apparatus and method for holding obfuscated data in server
US6601108B1 (en) Automatic conversion system
US8041127B2 (en) Method and system for obscuring and securing financial data in an online banking application
Chen et al. Coverless information hiding method based on the Chinese mathematical expression
US8042036B1 (en) Generation of a URL containing a beginning and an ending point of a selected mark-up language document portion
US20110258535A1 (en) Integrated document viewer with automatic sharing of reading-related activities across external social networks
US20210149842A1 (en) System and method for display of document comparisons on a remote device
CA2632793A1 (en) Information server and mobile delivery system and method
US8887290B1 (en) Method and system for content protection for a browser based content viewer
CN101167297A (en) Method and apparatus for adding signature information to electronic documents
JP3473676B2 (en) Method, apparatus, and recording medium for controlling hard copy of document described in hypertext
IL129633A (en) Automatic conversion system
WO2018044918A1 (en) Data transmission using dynamically rendered message content prestidigitation
Mir Copyright for web content using invisible text watermarking
JP2020077134A (en) Translation apparatus, control program of translation apparatus, and translation method using translation apparatus
WO2023155712A1 (en) Page generation method and apparatus, page display method and apparatus, and electronic device and storage medium
Schubotz et al. Mathoid: Robust, scalable, fast and accessible math rendering for wikipedia
CN111859853A (en) Webpage text encryption and decryption method based on random font
KR20230131753A (en) Web font service method of font service system
JP2020077356A (en) Translation apparatus, control program of translation apparatus, and translation method using translation apparatus
CN115982503B (en) Website information acquisition method and system based on cloud platform
JP5907130B2 (en) Information processing device
CN102387181A (en) Login method and device
JP2023537541A (en) Privacy-preserving data collection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant