CN111901332A - Webpage content reverse crawling method and system - Google Patents


Publication number
CN111901332A
Authority
CN
China
Prior art keywords
encrypted
font file
data
server
encryption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010729997.XA
Other languages
Chinese (zh)
Inventor
熊文国
匡保春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baichuan Yingfu Technology Co Ltd
Original Assignee
Beijing Baichuan Yingfu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baichuan Yingfu Technology Co Ltd
Priority to CN202010729997.XA
Publication of CN111901332A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Storage Device Security (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a webpage content anti-crawling (reverse-crawling) method and system. The method comprises the following steps: the server receives an encrypted data acquisition request and/or an encrypted font file acquisition request, sends the data to be encrypted to the encryption server, and forwards the encrypted font file acquisition request to the encryption server; the encryption server generates a new font file and a correspondence between numbers and garbled characters, modifies the data to be encrypted sent by the server according to the correspondence to generate encrypted data, and feeds the encrypted data back to the browser through the server; the encryption server acquires the content of the new font file, encrypts it, and feeds the encrypted font file back to the browser through the server; before rendering, the browser places the encrypted data returned by the server at a designated position and places the encrypted font file under a designated tag, so that the encrypted data is automatically converted into real data for display during rendering.

Description

Webpage content reverse crawling method and system
Technical Field
The invention relates to the technical field of computers, and in particular to a webpage content anti-crawling (reverse-crawling) method and system.
Background
Web crawlers (also known as web spiders or web robots, and within the FOAF community more often called web chasers) are programs or scripts that automatically capture web information according to certain rules. Other, less common names include ants, automatic indexers, emulators, and worms.
A web crawler is a program that automatically extracts web pages. It downloads pages from the World Wide Web for a search engine and is an important component of the search engine. A traditional crawler starts from the URLs of one or more initial web pages, obtains the URLs on those pages, and, while capturing pages, continuously extracts new URLs from the current page and puts them into a queue until a stop condition of the system is met. The workflow of a focused crawler is more complex: it filters out links irrelevant to the topic according to a web page analysis algorithm, keeps the useful links, and puts them into the queue of URLs to be captured. It then selects the next URL from the queue according to a search strategy and repeats the process until a stop condition of the system is reached. In addition, all pages captured by the crawler are stored by the system and then analyzed, filtered, and indexed for later query and retrieval.
In the prior art, web crawler activity is frequent, crawlers can easily acquire important data, and data information cannot be effectively protected.
Disclosure of Invention
The invention aims to provide a webpage content anti-crawling method and system to solve the above problems in the prior art.
The invention provides a webpage content anti-crawling method, which comprises the following steps:
the server receives an encrypted data acquisition request and/or an encrypted font file acquisition request sent by the browser, sends the data to be encrypted to the encryption server based on the encrypted data acquisition request, and forwards the encrypted font file acquisition request to the encryption server;
the encryption server responds to the data encryption request, generates a new font file and a correspondence between numbers and garbled characters, modifies the data to be encrypted sent by the server according to the correspondence to generate encrypted data, and feeds the encrypted data back to the browser through the server; the encryption server responds to the encrypted font file acquisition request, acquires the content of the new font file, encrypts that content, and feeds the encrypted font file back to the browser through the server;
before rendering, the browser places the encrypted data returned by the server at a designated position and places the encrypted font file under a designated tag, so that the encrypted data is automatically converted into real data for display during rendering.
The invention also provides a webpage content anti-crawling system, which comprises:
the server, which is used for receiving an encrypted data acquisition request and/or an encrypted font file acquisition request sent by the browser, sending the data to be encrypted to the encryption server based on the encrypted data acquisition request, and forwarding the encrypted font file acquisition request to the encryption server;
the encryption server, which is used for responding to the data encryption request, generating a new font file and a correspondence between numbers and garbled characters, modifying the data to be encrypted sent by the server according to the correspondence to generate encrypted data, and feeding the encrypted data back to the browser through the server; the encryption server also responds to the encrypted font file acquisition request, acquires the content of the new font file, encrypts that content, and feeds the encrypted font file back to the browser through the server;
the browser, which is used for sending an encrypted data acquisition request and/or an encrypted font file acquisition request to the server, receiving the returned encrypted data and encrypted font file, placing the encrypted data returned by the server at a designated position before rendering, and placing the encrypted font file under a designated tag, so that the encrypted data is automatically converted into real data for display during rendering.
With the embodiments of the invention, sensitive information can be given anti-crawling protection so that a crawler obtains invalid data. Important data is thereby protected, and malicious competition from industry peers, data copying, and similar behavior can be avoided.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of a webpage content anti-crawling method according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of a webpage content anti-crawling method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the effect of a webpage content anti-crawling method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a webpage content anti-crawling system according to an embodiment of the present invention.
Detailed Description
The invention relates to a protective measure against the crawling of webpage content and provides a strategy for preventing crawlers from collecting sensitive data. In particular, the main purpose of the embodiments of the present invention is not encryption as such, but a strategy that hinders crawlers from gathering sensitive data. Compared with rendering the text as images, this approach preserves normal typesetting and supports zooming without jagged edges. In addition, it can prevent advanced crawlers from extracting the text with OCR. It should be noted that the embodiments of the present invention only increase the difficulty of crawler collection; they cannot block it completely.
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise. Furthermore, the terms "mounted" and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.
Method embodiment
According to an embodiment of the present invention, a webpage content anti-crawling method is provided. FIG. 1 is a flowchart of the method. As shown in FIG. 1, the method specifically includes:
Step 101: the server receives an encrypted data acquisition request and/or an encrypted font file acquisition request sent by the browser, sends the data to be encrypted to the encryption server based on the encrypted data acquisition request, and forwards the encrypted font file acquisition request to the encryption server. In step 101, sending the data to be encrypted to the encryption server based on the encrypted data acquisition request specifically includes:
the server performs token verification; if verification fails, it directly returns a failure page; if verification succeeds, it extracts the keyword from the encrypted data acquisition request and retrieves the corresponding data to be encrypted from the database according to the keyword.
Step 102: the encryption server responds to the data encryption request, generates a new font file and a correspondence between numbers and garbled characters, modifies the data to be encrypted sent by the server according to the correspondence to generate encrypted data, and feeds the encrypted data back to the browser through the server; the encryption server responds to the encrypted font file acquisition request, acquires the content of the new font file, encrypts that content, and feeds the encrypted font file back to the browser through the server.
Step 102 specifically includes the following:
in response to the data encryption request, the encryption server modifies the font file FontMap.ttf to generate a new font file BaiInfo.ttf, generates a correspondence between numbers and garbled characters, and modifies the data to be encrypted sent by the server according to the correspondence to generate the encrypted data;
in response to the encrypted font file acquisition request, the encryption server acquires the content of the new font file BaiInfo.ttf, base64-encodes it (the "base64 encryption" of the claims), adds the encoded font file to a response header of the encrypted font file acquisition request, and feeds the response header back to the browser through the server.
Step 103: before rendering, the browser places the encrypted data returned by the server at a designated position and places the encrypted font file under a designated tag, so that the encrypted data is automatically converted into real data for display during rendering. Step 103 specifically includes the following processing:
before rendering, the browser generates random data at the designated position and modifies the CSS to hide the random data, places the encrypted data returned by the server at the designated position through a CSS pseudo-element, and places the encrypted font file under a CSS @font-face rule through JavaScript, so that the browser automatically converts the encrypted data into real data through the @font-face font and displays it to the user.
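The substitution performed in step 102 can be illustrated with a minimal Python sketch. It is not the patented implementation: the pool of private-use code points, the function names, and the sample value are all assumptions made for illustration, and the generation of the font file itself (FontMap.ttf to BaiInfo.ttf) is not shown.

```python
import random

# Hypothetical pool of private-use code points that stand in for digits.
# The patent does not specify which garbled characters are used.
GARBLED_POOL = [chr(cp) for cp in range(0xE000, 0xE100)]


def make_correspondence(rng: random.Random) -> dict[str, str]:
    """Pick a distinct garbled character for each digit 0-9."""
    chosen = rng.sample(GARBLED_POOL, 10)
    return {str(d): ch for d, ch in zip(range(10), chosen)}


def encrypt_digits(text: str, mapping: dict[str, str]) -> str:
    """Replace every digit in text with its garbled stand-in."""
    return "".join(mapping.get(ch, ch) for ch in text)


def decrypt_digits(text: str, mapping: dict[str, str]) -> str:
    """Inverse substitution; in the patent this role is played by the font."""
    inverse = {v: k for k, v in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


mapping = make_correspondence(random.Random(42))
page_value = "Price: 1280"
encrypted = encrypt_digits(page_value, mapping)
```

In this sketch the page source would carry `encrypted`, which contains no digits, while a font whose glyph table mirrors `mapping` would draw the original digits on screen.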
The above technical solution of the embodiment of the present invention is described in detail below with reference to fig. 2.
In the embodiment of the invention, the method mainly comprises three parts of processing:
the browser part, the server part (verification and data extraction), and the encryption part. As shown in FIG. 2, the processing specifically includes:
1. The browser sends a POST request carrying parameters to request data. The server first performs token verification; if verification fails, it directly returns a failure page. If verification succeeds, the server extracts the keyword from the request and uses it to retrieve the data from the database.
2. FontMap.ttf is a fixed font file. The POST request triggers a modification of this file that generates a new font file, BaiInfo.ttf, together with a correspondence between numbers and garbled characters. The encryption part modifies the data sent by the server according to this correspondence. For example, if the data sent by the server is 0 and the correspondence is {0: 'zha', 1: '' }, the processed value 'zha' is returned to the server. After processing all the data, the encryption part returns the encrypted data to the server, the server returns it to the browser, and the POST request ends.
3. While the POST request is being sent, a GET request can be sent to the server via Ajax or another method. The content of the BaiInfo.ttf file is read, base64-encoded, and added to a response header of the GET request, and the server then returns the response directly. The GET request is thus responsible for obtaining the content of the encrypted font file.
4. Before rendering, the browser part performs the following processing: it generates random data at the designated position through JavaScript or other means, modifies the CSS to hide the random data, places the garbled characters returned by the server at the designated position through a CSS pseudo-element, and, through JavaScript, places the content of the GET response header under a CSS @font-face rule; the browser then automatically converts the garbled characters into real data through the @font-face font. The result, as shown in FIG. 3, is that the page source contains false data while the user sees the real data.
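Items 3 and 4 above can be sketched end to end in Python. This is only an illustration under stated assumptions: the header name `X-Font-Data`, the font family name, the `.price` selector, and the `font-size: 0` trick for hiding the random data are hypothetical choices, and the patent performs the browser-side part in JavaScript rather than generating CSS on the server.

```python
import base64

# Placeholder bytes; a real deployment would read the generated BaiInfo.ttf.
font_bytes = b"\x00\x01\x00\x00 placeholder font data"

# Item 3: base64-encode the font and attach it to a response header.
# "X-Font-Data" is a hypothetical header name.
response_headers = {"X-Font-Data": base64.b64encode(font_bytes).decode("ascii")}


def build_css(font_b64: str, selector: str, garbled: str) -> str:
    """Build CSS of the kind item 4 injects before rendering."""
    # CSS escape for each garbled character: backslash, hex code point, space.
    escaped = "".join(f"\\{ord(ch):x} " for ch in garbled)
    return (
        "@font-face {\n"
        "  font-family: 'BaiInfo';\n"
        f"  src: url(data:font/ttf;base64,{font_b64}) format('truetype');\n"
        "}\n"
        # Hide the element's own (random) text content.
        f"{selector} {{ font-size: 0; }}\n"
        # Show the garbled characters through a pseudo-element in the custom font.
        f"{selector}::before {{\n"
        f"  content: '{escaped}';\n"
        "  font-family: 'BaiInfo';\n"
        "  font-size: 16px;\n"
        "}\n"
    )


css = build_css(response_headers["X-Font-Data"], ".price", "\ue001\ue002")
```

Because the visible text lives in a pseudo-element's `content` property, it does not appear as a text node in the page's HTML, which is consistent with the effect described for FIG. 3.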
As the above processing shows, the embodiment of the present invention uses font encryption so that the web page data obtained by a crawler is modified while the data displayed by the browser is unchanged. Font encryption alone is easy to break: by downloading and analyzing the font file, a crawler engineer can readily discover the encryption rule. For this reason, the numbers are additionally encrypted by an algorithm that targets the encryption rule, and the result is stored in a dedicated service. The encryption rule changes with the access time according to a set pattern, and the numbers in the web page are randomly selected for encryption by odd and even. As a result, the crawler cannot reliably distinguish real data from converted data, which achieves the anti-crawling protection of the data.
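The patent does not specify how the rule varies with access time or how the odd/even selection works. The following Python sketch shows one possible reading, purely as an assumption: the mapping is derived from a keyed hash of the current time window, and only odd or only even digits are replaced in a given window. `SECRET`, `WINDOW_SECONDS`, `POOL`, and both function names are hypothetical.

```python
import hashlib
import hmac
import random

SECRET = b"demo-secret"   # hypothetical server-side key
WINDOW_SECONDS = 300      # hypothetical rotation interval
POOL = [chr(cp) for cp in range(0xE000, 0xE100)]


def rule_for(access_time: float) -> tuple[dict[str, str], int]:
    """Derive the digit mapping and the odd/even parity for a time window."""
    window = int(access_time // WINDOW_SECONDS)
    digest = hmac.new(SECRET, str(window).encode(), hashlib.sha256).digest()
    rng = random.Random(digest)
    mapping = dict(zip("0123456789", rng.sample(POOL, 10)))
    parity = digest[0] % 2  # 0: replace even digits, 1: replace odd digits
    return mapping, parity


def encrypt(text: str, access_time: float) -> str:
    """Replace only the digits whose parity matches the current rule."""
    mapping, parity = rule_for(access_time)
    return "".join(
        mapping[ch] if ch in mapping and int(ch) % 2 == parity else ch
        for ch in text
    )
```

Under this reading, two requests in the same window share one rule, so the font served for that window can decode the page, while a mapping recovered by a crawler stops being valid once the window rolls over.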
System embodiment
According to an embodiment of the present invention, a webpage content anti-crawling system is provided. FIG. 4 is a schematic diagram of the system. As shown in FIG. 4, the system specifically includes:
the server 40, which is configured to receive an encrypted data acquisition request and/or an encrypted font file acquisition request sent by the browser 44, send the data to be encrypted to the encryption server 42 based on the encrypted data acquisition request, and forward the encrypted font file acquisition request to the encryption server 42. The server 40 is specifically configured to:
perform token verification; if verification fails, directly return a failure page; if verification succeeds, extract the keyword from the encrypted data acquisition request and retrieve the corresponding data to be encrypted from the database according to the keyword.
The encryption server 42, which is configured to respond to the data encryption request, generate a new font file and a correspondence between numbers and garbled characters, modify the data to be encrypted sent by the server according to the correspondence to generate encrypted data, and feed the encrypted data back to the browser 44 through the server 40; the encryption server 42 also responds to the encrypted font file acquisition request, acquires the content of the new font file, encrypts that content, and feeds the encrypted font file back to the browser 44 through the server 40. The encryption server 42 is specifically configured to:
in response to the data encryption request, modify the font file FontMap.ttf to generate a new font file BaiInfo.ttf, generate a correspondence between numbers and garbled characters, and modify the data to be encrypted sent by the server according to the correspondence to generate the encrypted data; and, in response to the encrypted font file acquisition request, acquire the content of the new font file BaiInfo.ttf, base64-encode it, add the encoded font file to a response header of the encrypted font file acquisition request, and feed the response header back to the browser through the server.
The browser 44, which is configured to send an encrypted data acquisition request and/or an encrypted font file acquisition request to the server 40, receive the returned encrypted data and encrypted font file, place the encrypted data returned by the server at a designated position before rendering, and place the encrypted font file under a designated tag, so that the encrypted data is automatically converted into real data for display during rendering. The browser 44 is specifically configured to:
before rendering, generate random data at the designated position and modify the CSS to hide the random data, place the encrypted data returned by the server at the designated position through a CSS pseudo-element, and place the encrypted font file under a CSS @font-face rule through JavaScript, so that the encrypted data is automatically converted into real data through the @font-face font and displayed to the user.
The embodiment of the present invention is a system embodiment corresponding to the above method embodiment, and specific operations of each module may be understood with reference to the description of the method embodiment, which is not described herein again.
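The token verification and keyword lookup attributed to the server 40 could look roughly like the following Python sketch. The token set, the in-memory dictionary standing in for the database, the status codes, and the request shape are all assumptions; the patent does not describe the token format or the database schema.

```python
import hmac

# Hypothetical stand-ins for the token store and the database.
VALID_TOKENS = {"token-abc"}
DATABASE = {"price": "1280", "stock": "37"}
FAILURE_PAGE = {"status": 403, "body": "verification failed"}


def token_is_valid(token: str) -> bool:
    """Compare against each known token in constant time."""
    return any(
        hmac.compare_digest(token.encode(), known.encode())
        for known in VALID_TOKENS
    )


def handle_post(request: dict) -> dict:
    """Verify the token, then fetch the data to be encrypted by keyword."""
    if not token_is_valid(request.get("token", "")):
        return FAILURE_PAGE
    keyword = request.get("keyword", "")
    data = DATABASE.get(keyword)
    if data is None:
        return {"status": 404, "body": "unknown keyword"}
    # In the patent this value is then forwarded to the encryption server.
    return {"status": 200, "data_to_encrypt": data}
```

A request with an unknown token receives the failure page without touching the database, which matches the order of operations in the description.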
In summary, with the technical solution of the embodiments of the invention, sensitive information can be given anti-crawling protection so that a crawler obtains invalid data. Important data is thereby protected, and malicious competition from industry peers, data copying, and similar behavior can be avoided.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A webpage content anti-crawling method, comprising:
a server receives an encrypted data acquisition request and/or an encrypted font file acquisition request sent by a browser, sends data to be encrypted to an encryption server based on the encrypted data acquisition request, and sends the encrypted font file acquisition request to the encryption server;
the encryption server responds to a data encryption request, generates a new font file and a correspondence between numbers and garbled characters, modifies the data to be encrypted transmitted by the server through the correspondence, generates encrypted data and feeds the encrypted data back to the browser through the server; the encryption server responds to the encrypted font file acquisition request, acquires the content of the new font file, encrypts the content of the new font file, and feeds back the encrypted font file to the browser through the server;
the browser places the encrypted data returned by the server at a designated position before rendering, places the encrypted font file under a designated tag, and automatically converts the encrypted data into real data for display during rendering.
2. The method according to claim 1, wherein the server sending the data to be encrypted to the encryption server based on the encrypted data acquisition request specifically comprises:
and the server carries out token verification, if the verification fails, a failure page is directly returned, if the verification succeeds, the key words in the encrypted data acquisition request are extracted, and corresponding data to be encrypted are taken out from a database according to the key words.
3. The method according to claim 1, wherein the encryption server responding to a data encryption request, generating a new font file and a correspondence between numbers and garbled characters, and modifying the data to be encrypted transmitted by the server through the correspondence to generate the encrypted data specifically includes:
the encryption server responds to a data encryption request, modifies the FontMap.ttf font file, generates a new font file BaiInfo.ttf, generates a correspondence between numbers and garbled characters, and modifies the data to be encrypted sent by the server through the correspondence to generate encrypted data.
4. The method according to claim 3, wherein the encrypting server responds to the encrypted font file acquiring request, acquires the content of the new font file, encrypts the content of the new font file, and feeds back the encrypted font file to the browser through the server specifically includes:
and the encryption server responds to the encrypted font file acquisition request, acquires the content of a new font file BaiInfo.ttf, performs base64 encryption on the content, adds the encrypted font file to a response header of the encrypted font file acquisition request, and feeds the response header back to the browser through the server.
5. The method according to claim 1, wherein before rendering, the browser places encrypted data returned by the server in a designated location, places the encrypted font file in a designated tag, and automatically converts the encrypted font file into real data for display during rendering specifically includes:
the browser generates random data at a designated position before rendering, modifies the CSS to hide the random data, places the encrypted data returned by the server at the designated position through a CSS pseudo-element, places the encrypted font file under a CSS @font-face rule through JavaScript, and automatically converts the encrypted data into real data through the @font-face font for display to a user.
6. A webpage content anti-crawling system, comprising:
the server is used for receiving an encrypted data acquisition request and/or an encrypted font file acquisition request sent by a browser, sending data to be encrypted to the encryption server based on the encrypted data acquisition request, and sending the encrypted font file acquisition request to the encryption server;
the encryption server is used for responding to a data encryption request, generating a new font file and a correspondence between numbers and garbled characters, modifying the data to be encrypted transmitted by the server through the correspondence, generating encrypted data and feeding the encrypted data back to the browser through the server; the encryption server responds to the encrypted font file acquisition request, acquires the content of the new font file, encrypts the content of the new font file, and feeds back the encrypted font file to the browser through the server;
the browser is used for sending an encrypted data acquisition request and/or an encrypted font file acquisition request to the server, receiving the fed back encrypted data and the encrypted font file, placing the encrypted data returned by the server at a specified position before rendering, placing the encrypted font file at a specified label, and automatically converting the encrypted font file into real data for displaying during rendering.
7. The system of claim 6, wherein the server is specifically configured to:
and carrying out token verification, if the verification fails, directly returning a failure page, if the verification succeeds, extracting the keywords in the encrypted data acquisition request, and taking out corresponding data to be encrypted from a database according to the keywords.
8. The system of claim 6, wherein the encryption server is specifically configured to:
responding to a data encryption request, modifying the font file FontMap.ttf, generating a new font file BaiInfo.ttf, generating a correspondence between numbers and garbled characters, and modifying the data to be encrypted sent by the server through the correspondence to generate encrypted data.
9. The system of claim 8, wherein the encryption server is specifically configured to:
and responding to the encrypted font file acquisition request, acquiring the content of a new font file BaiInfo.ttf, carrying out base64 encryption on the content, adding the encrypted font file into a response header of the encrypted font file acquisition request, and feeding back the response header to the browser through the server.
10. The system of claim 6, wherein the browser is specifically configured to:
before rendering, generating random data at a designated position, modifying the CSS to hide the random data, placing encrypted data returned by the server at the designated position through a CSS pseudo-element, placing the encrypted font file under a CSS @font-face rule through JavaScript, and automatically converting the encrypted data into real data through the @font-face font for display to a user.
CN202010729997.XA 2020-07-27 2020-07-27 Webpage content reverse crawling method and system Pending CN111901332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010729997.XA CN111901332A (en) 2020-07-27 2020-07-27 Webpage content reverse crawling method and system

Publications (1)

Publication Number Publication Date
CN111901332A true CN111901332A (en) 2020-11-06

Family

ID=73190021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010729997.XA Pending CN111901332A (en) 2020-07-27 2020-07-27 Webpage content reverse crawling method and system

Country Status (1)

Country Link
CN (1) CN111901332A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105812366A (en) * 2016-03-14 2016-07-27 携程计算机技术(上海)有限公司 Server, anti-crawler system and anti-crawler verification method
US20170126631A1 (en) * 2015-11-03 2017-05-04 Box, Inc. Securing shared documents using dynamic natural language steganography
CN107818108A (en) * 2016-09-13 2018-03-20 阿里巴巴集团控股有限公司 A kind of webpage rendering intent, apparatus and system
CN108429757A (en) * 2018-03-26 2018-08-21 成都睿码科技有限责任公司 A kind of the counter of guarding website resource climbs method
CN109241391A (en) * 2018-09-20 2019-01-18 四川长虹电器股份有限公司 A kind of anti-crawler method climbed of solution font
CN109543454A (en) * 2019-01-25 2019-03-29 腾讯科技(深圳)有限公司 A kind of anti-crawler method and relevant device
CN109862031A (en) * 2019-03-13 2019-06-07 娄奥林 A kind of methods of pair of anti-crawler of encryption
CN109977685A (en) * 2019-03-21 2019-07-05 古联(北京)数字传媒科技有限公司 Web page contents encryption method, encryption device and system
CN110414221A (en) * 2019-07-11 2019-11-05 东软集团股份有限公司 Data processing method, device, storage medium and electronic equipment
CN110620657A (en) * 2019-08-23 2019-12-27 上海科技发展有限公司 Webpage word processing method, system and device
CN111008348A (en) * 2019-11-28 2020-04-14 盛业信息科技服务(深圳)有限公司 Anti-crawler method, terminal, server and computer readable storage medium
CN111212033A (en) * 2019-12-16 2020-05-29 北京淇瑀信息科技有限公司 Page display method and device based on combined web crawler defense technology and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20201106