WO2018171572A1 - Malicious URL identification method, computing device, and storage medium - Google Patents

Malicious URL identification method, computing device, and storage medium

Info

Publication number
WO2018171572A1
Authority
WO
WIPO (PCT)
Prior art keywords
webpage
url
content
identified
malicious
Prior art date
Application number
PCT/CN2018/079548
Other languages
English (en)
French (fr)
Inventor
刘健
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018171572A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9554 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL] by using bar codes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Definitions

  • the embodiments of the present application relate to the field of network security, and in particular, to a malicious website identification method, a computing device, and a storage medium.
  • the process for the application to identify a malicious website includes: the background server of the application obtains the webpage content corresponding to a web address and detects whether the webpage content includes a preset keyword; if it does, the background server adds the web address to the malicious URL database.
  • the background server detects whether the web address included in the webpage access request belongs to the malicious web address database, and if so, the background server sends a malicious web address prompt to the application.
  • the embodiment of the present application provides a malicious website identification scheme, which can improve the recognition coverage rate of a malicious website.
  • a method for identifying a malicious URL is provided, which is applied to a computing device. The method includes: obtaining a to-be-identified URL; obtaining a reference URL corresponding to the to-be-identified URL, where the reference URL is in the first webpage content corresponding to the to-be-identified URL; detecting whether the first webpage content contains malicious content, and detecting whether the second webpage content corresponding to the reference URL contains malicious content; and when it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determining that the to-be-identified URL is a malicious URL.
  • a method for identifying a malicious URL is provided, which is applied to a computing device. The method includes: obtaining a to-be-identified URL; when it is detected that the first webpage content corresponding to the to-be-identified URL does not contain malicious content, obtaining a reference URL corresponding to the to-be-identified URL, where the reference URL is in the first webpage content corresponding to the to-be-identified URL; detecting whether the second webpage content corresponding to the reference URL contains malicious content; and when it is detected that the second webpage content contains malicious content, determining that the to-be-identified URL is a malicious URL.
  • a computing device is provided, which includes a processor and a memory. The memory stores computer readable instructions that cause the processor to: obtain a to-be-identified URL; obtain a reference URL corresponding to the to-be-identified URL, where the reference URL is in the first webpage content corresponding to the to-be-identified URL; detect whether the first webpage content contains malicious content, and detect whether the second webpage content corresponding to the reference URL contains malicious content; and when it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determine that the to-be-identified URL is a malicious URL.
  • a computing device is provided, which includes a processor and a memory. The memory stores computer readable instructions that cause the processor to: obtain a to-be-identified URL; when it is detected that the first webpage content corresponding to the to-be-identified URL does not contain malicious content, obtain a reference URL corresponding to the to-be-identified URL, where the reference URL is in the first webpage content corresponding to the to-be-identified URL; detect whether the second webpage content corresponding to the reference URL contains malicious content; and when it is detected that the second webpage content contains malicious content, determine that the to-be-identified URL is a malicious URL.
  • the identifying URL is a malicious URL.
  • even if criminals encapsulate a malicious URL and add it to a normal webpage, the malicious URL identification solution provided by the embodiments of the present application can identify the malicious URL contained in the normal webpage according to the reference relationship between URLs and restrict user access to that webpage, which increases the recognition coverage of malicious URLs and ensures the security of Internet access.
  • FIG. 1A is a schematic diagram showing an implementation environment provided by an embodiment of the present application.
  • FIG. 1B is a schematic diagram showing an implementation environment provided by an embodiment of the present application.
  • FIG. 2A is a schematic diagram of an implementation process of a server identifying a malicious website according to an embodiment of the present application.
  • FIG. 2B is a schematic diagram of an implementation process of identifying a malicious website in the embodiment of the present application.
  • FIG. 3 is a flowchart of a malicious website identification method provided by an embodiment of the present application.
  • FIG. 4A is a flowchart of a malicious website identification method provided by another embodiment of the present application.
  • FIG. 4B is a flowchart of a process of identifying a reference URL involved in the malicious web address identification method shown in FIG. 4A.
  • FIG. 4C is a schematic diagram of an implementation of a process of identifying a reference URL.
  • FIGS. 4D and 4E are schematic diagrams of interfaces provided by an exemplary embodiment.
  • FIG. 5 is a structural block diagram of a malicious website identification apparatus provided by an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a computing device provided by an embodiment of the present application.
  • FIG. 7 illustrates a flow diagram of a malicious web address identification method 700 in accordance with some embodiments of the present application.
  • "Multiple" as referred to herein means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" covers three cases: A exists alone, A and B exist at the same time, and B exists alone.
  • the character "/" generally indicates that the contextual object is an "or" relationship.
  • Encapsulation carrier (package carrier): an entity used to carry data, where the data carried in the carrier cannot be directly identified.
  • the data carried by the package carrier is the web address of a webpage, and the web address is represented by an IP (Internet Protocol) address or a URL (Uniform Resource Locator).
  • a corresponding extraction technology can be used to extract the data carried in the package carrier.
  • for example, when the package carrier is a two-dimensional code or a barcode, the data carried in it can be extracted by a two-dimensional code or barcode recognition technology.
  • Malicious content refers to text content, picture content or video content of a webpage containing preset keywords, wherein the preset keyword has an illegal attribute.
  • the preset keywords are "bet", "casino", "entertainment city" and the like.
  • the webpage content corresponding to the malicious webpage directly or indirectly contains the malicious content.
  • FIG. 1A is a schematic diagram of an implementation environment provided by an embodiment of the present application, where the smart device 110 and the server 120 are included.
  • the smart device 110 is an electronic device (which may also be referred to as a terminal device or a computing device) having an Internet access function.
  • the electronic device is a smartphone, a tablet, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a personal computer, and so on.
  • the smart device 110 runs an application with a malicious web address recognition function, which is a browser application, an instant messaging application, a social application or a rich media application, and the like. For example, when a user uses a browser application for Internet access, the smart device 110 can detect the security of the web address that the user wants to access. In one embodiment, for a malicious web address, smart device 110 displays a warning identifier. In one embodiment, smart device 110 may also restrict user access to malicious web sites. For the secure web address, the smart device 110 normally displays the corresponding web page content.
  • the smart device 110 and the server 120 are connected by a wired network or a wireless network.
  • the server 120 is a server, a server cluster composed of several servers, or a cloud computing center.
  • the server 120 is the backend server of the application (with the malicious web address recognition function) running on the smart device 110. After receiving the webpage access request that the smart device 110 sends through the application, the server 120 identifies the to-be-identified web address included in the request and returns the corresponding recognition result to the smart device 110 for display.
  • a URL database 121 and a URL reference relationship database 122 are built into the server 120.
  • the URL database 121 stores the verified malicious website (or both the verified secure website and the malicious website).
  • the URL reference relation database 122 stores a reference relationship between the URLs.
  • when identifying the to-be-identified URL, the server 120 combines the data in the URL database 121 and the URL reference relation database 122 to detect whether the webpage content corresponding to the to-be-identified URL and the webpage content corresponding to its reference URL contain malicious content. When neither webpage content contains malicious content, the server 120 determines that the to-be-identified URL is a secure URL; otherwise, the server 120 determines that the to-be-identified URL is a malicious URL.
  • the wireless or wired network described above uses standard communication techniques and/or protocols.
  • the network is usually the Internet, but can also be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, a private network, a virtual private network, or any combination of these.
  • data exchanged over a network is represented using techniques and/or formats including Hyper Text Markup Language (HTML), Extensible Markup Language (XML), and the like.
  • in addition, regular encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec) can be used to encrypt all or some of the links.
  • the above described data communication techniques may also be replaced or supplemented using custom and/or dedicated data communication techniques.
  • FIG. 1B shows a schematic diagram of an implementation environment including a smart device 130 and a server 140 in accordance with an embodiment of the present application.
  • the smart device 130 can be a variety of computing devices with Internet access capabilities.
  • Server 140 can be a server, a server cluster consisting of several servers, or a cloud computing center.
  • the smart device 130 can send an access request to the server 140 for the server 140 to return a response message.
  • smart device 130 can send a first web content request to server 140.
  • Server 140 may return the first web page content to smart device 130.
  • the smart device 130 can include a lightweight database.
  • smart device 130 can include a web site database 131 and a web address reference relationship database 132.
  • the URL database 131 and the URL reference relation database 132 are consistent with the contents of the URL database 121 and the URL reference relationship database 122, respectively. In this way, the smart device 130 can perform the malicious web address identification method described below. In order to simplify the description, the malicious website identification method will be described below by taking the execution subject as the server 120 as an example.
  • the malicious website identification methods provided by the various embodiments of the present application are all used in the server 120 shown in FIG. 1A, which will be described below by using an exemplary embodiment.
  • a URL database 210 is constructed in server 200, which stores a verified malicious web address.
  • the server 200 detects whether the to-be-identified URL is stored in the URL database 210.
  • when detecting that the to-be-identified web address is stored, the server 200 determines that it is a malicious web address and feeds the corresponding recognition result back to the smart device through the recognition result feedback interface. When detecting that the to-be-identified web address is not stored, the server 200 uses the webpage content identification function module 220 to detect whether the webpage content corresponding to the to-be-identified web address contains malicious content, and feeds the recognition result back to the smart device through the recognition result feedback interface.
  • when the recognition result indicates that the to-be-identified web address is a secure web address, the smart device displays the webpage content normally; when the recognition result indicates that the to-be-identified web address is a malicious web address, the smart device restricts the user's access.
  • in the server 200, not only the URL database 210 is constructed, but also a URL reference relation database 230 that indicates the reference relationships between URLs.
  • the server 200 detects whether a reference URL corresponding to the to-be-identified URL is stored in the URL reference relation database 230. When detecting that such a reference URL is stored, the server 200 uses the webpage content identification function module 220 to identify the webpage content corresponding to both the to-be-identified URL and the reference URL, and then combines the two recognition results to determine whether the to-be-identified URL is a malicious URL.
  • in this way, the server can identify webpages that point to malicious URLs according to the reference relationship between URLs, thereby avoiding the security risks caused by the smart device displaying such webpages. The following description uses illustrative embodiments.
  • FIG. 3 is a flowchart of a method for identifying a malicious website according to an embodiment of the present application.
  • the method for identifying a malicious website may be performed in a computing device.
  • the computing device can be smart device 130 or server 120.
  • the method shown in FIG. 3 is illustrated by using the server 120 shown in FIG. 1A as an example.
  • the method includes:
  • Step 301 Obtain a website to be identified.
  • the to-be-identified web address is a web address extracted from the webpage access request when the server 120 receives the webpage access request sent by the browser application.
  • the to-be-identified web address is a web address extracted by the server from the obtained instant messaging message.
  • the to-be-identified web address is a web address extracted by the server from social information (such as comment information, sharing information, etc.).
  • Step 302 Obtain a reference URL corresponding to the URL to be identified.
  • the reference URL may be located in the first webpage content corresponding to the URL to be identified.
  • the reference URL can be a link address in the first webpage content.
  • the reference URL is encapsulated into the package carrier and added to the first webpage content corresponding to the URL to be identified.
  • the package carrier is a two-dimensional code or barcode. That is, the reference URL is encapsulated into a two-dimensional code or a barcode (the encapsulated representation is a two-dimensional code or a barcode), and then added to the first webpage content of the to-be-identified webpage.
  • the server 120 detects whether the URL reference relation database includes the reference URL corresponding to the to-be-identified URL and, when it does, obtains the reference URL from the database.
  • for example, the server 120 obtains the reference URL www.aaa1.com corresponding to the to-be-identified URL www.aaa.com from the URL reference relation database, which indicates that www.aaa1.com is encapsulated into a two-dimensional code and added to the webpage corresponding to www.aaa.com (that is, the two-dimensional code is displayed on the webpage corresponding to www.aaa.com).
  • when detecting that the URL reference relation database does not include the reference URL corresponding to the to-be-identified URL, the server 120 acquires the first webpage content corresponding to the to-be-identified URL, extracts the reference URL from the package carrier included in the first webpage content, and stores the to-be-identified URL and the reference URL in the URL reference relation database for subsequent use.
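The lookup-then-extract behaviour of step 302 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary standing in for the URL reference relation database and the `fetch_content` and `extract_from_carrier` callables are hypothetical names introduced here.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-in for the URL reference relation database:
# it maps a to-be-identified URL to the reference URL found in its page.
ReferenceDb = Dict[str, str]


def get_reference_url(
    url: str,
    reference_db: ReferenceDb,
    fetch_content: Callable[[str], bytes],
    extract_from_carrier: Callable[[bytes], Optional[str]],
) -> Optional[str]:
    """Return the reference URL for `url`, consulting the database first."""
    if url in reference_db:
        # A record already exists, so no page fetch is needed.
        return reference_db[url]
    # No record yet: fetch the first webpage content and inspect its carrier.
    content = fetch_content(url)
    reference = extract_from_carrier(content)
    if reference is not None:
        # Store the pair so that later identifications can reuse it.
        reference_db[url] = reference
    return reference
```

With a database that already holds the pair (www.aaa.com, www.aaa1.com), a call for www.aaa.com returns www.aaa1.com without fetching any page.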
  • Step 303 Detect whether the content of the first webpage includes malicious content, and detect whether the content of the second webpage corresponding to the reference webpage contains malicious content.
  • after obtaining the to-be-identified URL and the corresponding reference URL, the server 120 further detects whether the first webpage content corresponding to the to-be-identified URL and the second webpage content corresponding to the reference URL contain malicious content.
  • the server 120 first detects whether the first webpage content contains malicious content. When detecting that the first webpage content contains malicious content, the server 120 directly determines that the to-be-identified URL is a malicious URL and does not need to detect the second webpage content. When detecting that the first webpage content does not contain malicious content, the server 120 may further detect whether the second webpage content contains malicious content.
  • Step 304 When it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determine that the to-be-identified URL is a malicious URL.
  • malicious URLs can be restricted from access.
  • when detecting that the first webpage content does not contain malicious content but the second webpage content does, the server 120 determines that the to-be-identified URL is a malicious URL (the first webpage content includes a package carrier that contains a malicious URL). When neither the first webpage content nor the second webpage content contains malicious content, the server 120 determines that the to-be-identified URL is a secure URL.
  • regardless of whether the to-be-identified URL directly or indirectly contains malicious content, the server 120 restricts the terminal from accessing the to-be-identified URL, or displays the corresponding malicious website prompt information when the terminal accesses it.
  • the server 120 can identify not only a URL that directly contains malicious content, but also a URL that indirectly contains malicious content (that is, one whose webpage includes a package carrier containing a malicious URL), so that the recognition coverage of malicious URLs is significantly improved.
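Steps 301 to 304 above can be summarised in a short sketch. It assumes three hypothetical callables that are not named in the source: `get_reference` (returns the reference URL or `None`), `fetch_content` (returns webpage content as text), and `contains_malicious` (the content check).

```python
from typing import Callable, Optional


def is_malicious_url(
    url: str,
    get_reference: Callable[[str], Optional[str]],
    fetch_content: Callable[[str], str],
    contains_malicious: Callable[[str], bool],
) -> bool:
    """Judge `url` by its own page and, if needed, by its referenced page."""
    first_content = fetch_content(url)
    if contains_malicious(first_content):
        # The URL directly contains malicious content; skip the second check.
        return True
    reference = get_reference(url)
    if reference is None:
        # No reference URL: the first-page result is final.
        return False
    # The URL is malicious if the referenced page contains malicious content.
    return contains_malicious(fetch_content(reference))
```

The first-page check runs before the second one, which mirrors the ordering described for step 303, where the second webpage content is only examined when the first is clean.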
  • in the malicious URL identification method provided by this embodiment, the reference URL corresponding to the to-be-identified URL is obtained together with the to-be-identified URL, and the webpage content corresponding to each of them is checked; when at least one of them contains malicious content, the to-be-identified URL is determined to be a malicious URL.
  • the server can identify a malicious URL contained in a normal webpage according to the reference relationship between URLs and restrict the user from accessing that normal webpage, which increases the recognition coverage of malicious URLs and ensures the security of Internet access.
  • when the URL reference relation database maintained by the server 120 does not include the reference URL corresponding to the to-be-identified URL (for example, the server obtains the to-be-identified URL for the first time), the server needs to further inspect the webpage content corresponding to the to-be-identified URL to determine whether the to-be-identified URL has a corresponding reference URL (that is, whether the to-be-identified URL points to another URL). The following description uses illustrative embodiments.
  • FIG. 4A is a flowchart of a method for identifying a malicious website according to another embodiment of the present application. This embodiment is described by using an example in which the malicious website identification method is applied to the server 120 shown in FIG. 1A. The method includes:
  • Step 401 Obtain a website to be identified.
  • the implementation of this step is similar to the foregoing step 301, and details are not described herein again.
  • a URL database is built into the server, the URL database containing verified malicious URLs.
  • the URL database contains both verified secure URLs and malicious URLs.
  • the data storage structure in the URL database is as shown in Table 1.
  • after obtaining the to-be-identified URL, the server 120 first detects whether it is included in the URL database. When the to-be-identified URL is included in the URL database and its URL type is malicious, the server 120 directly determines that the to-be-identified URL is a malicious URL. When the to-be-identified URL is not included in the URL database, or is included and its URL type is secure, the server 120 performs the following step 402.
  • Step 402 Detect whether the URL referenced database contains the reference URL corresponding to the to-be-identified URL.
  • each reference record in the URL reference relation database is described as (URL A, URL B), where URL B is a reference URL of URL A; that is, URL B is encapsulated into a package carrier and added to the webpage content of URL A.
  • the server 120 uses the URL to be identified as a search term, and detects whether the URL reference database contains the reference URL corresponding to the to-be-identified URL.
  • when the reference URL corresponding to the to-be-identified URL is detected, the following step 403 is performed; when it is not detected (for example, malicious URL identification is being performed on the to-be-identified URL for the first time), the following step 404 is performed.
  • the URLs in the reference record are represented by a hash value.
  • the server searches in the URL reference relation database according to the hash value of the URL to be identified.
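A hash-keyed reference record store might look like the sketch below. The source does not name a hash function, so SHA-256 is an assumption, and the `ReferenceRelationDb` class is a hypothetical stand-in. As a simplification, only the lookup key (URL A) is hashed here, so that the reference URL can be returned in readable form.

```python
import hashlib
from typing import Dict, Optional


def url_hash(url: str) -> str:
    """Hash a URL for use as a lookup key (SHA-256 is an assumed choice)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class ReferenceRelationDb:
    """Hypothetical store of (URL A, URL B) records keyed by hash(URL A)."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def add(self, url_a: str, url_b: str) -> None:
        # Key the record by the hash of URL A; keep URL B as stored value.
        self._records[url_hash(url_a)] = url_b

    def lookup(self, url: str) -> Optional[str]:
        # Search by the hash of the to-be-identified URL.
        return self._records.get(url_hash(url))
```

Fixed-length hash keys keep lookups uniform regardless of URL length; `lookup` returns `None` when no record exists, which corresponds to the branch that leads to step 404.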
  • Step 403 When it is detected that the URL reference relation database includes the reference network address corresponding to the to-be-identified URL, obtain the reference URL corresponding to the to-be-identified URL from the URL reference relation database.
  • the server 120 obtains the reference website www.aaa1.com from the reference record shown in Table 2.
  • Step 404 When it is detected that the URL reference relation database does not include the reference URL corresponding to the to-be-identified URL, obtain the first webpage content corresponding to the to-be-identified URL.
  • when it is detected that the URL reference relation database does not include the reference URL corresponding to the to-be-identified URL, the server needs to determine whether the to-be-identified URL points to another URL. To do so, it obtains the first webpage content corresponding to the to-be-identified URL and makes that determination based on the first webpage content.
  • the server 120 may simulate the browser to access the to-be-identified web address, so as to obtain the first webpage content corresponding to the to-be-identified web address.
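One simple way to approximate "simulating the browser" is to send an HTTP request with a browser-like User-Agent header, as sketched below with Python's standard `urllib.request`. The User-Agent string, the default `http://` scheme, and both function names are assumptions made for illustration. A plain HTTP fetch does not execute scripts or render the page, so a real deployment would likely need a headless browser instead.

```python
import urllib.request

# Assumed browser-like User-Agent string; any realistic value could be used.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"


def build_browser_request(url: str) -> urllib.request.Request:
    """Build a GET request that presents itself as a browser."""
    if "://" not in url:
        # Bare hosts such as www.aaa.com get an assumed http:// scheme.
        url = "http://" + url
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


def fetch_first_webpage_content(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw page bytes for `url` (performs real network I/O)."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```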
  • Step 405 Extract a reference URL from a package carrier included in the first webpage content.
  • the server 120 detects whether the first webpage content includes the package carrier, and extracts the reference URL from the package carrier when detecting that the first webpage content includes the package carrier.
  • the step includes the following steps.
  • Step 405A Perform a screenshot operation on the content of the first webpage to obtain a screenshot of the webpage.
  • after obtaining the first webpage content, the server performs a screenshot operation on it to obtain at least one webpage screenshot, which includes the text content and the image content.
  • the server performs a screenshot operation on the first webpage content corresponding to www.aaa.com to obtain a webpage screenshot 41.
  • Step 405B Identify the package carrier included in the screenshot of the webpage by a predetermined image recognition technology, and the predetermined image recognition technology includes at least one of a two-dimensional code recognition technology and a barcode recognition technology.
  • the server uses the two-dimensional code recognition technology or the barcode recognition technology to identify the package carrier included in the webpage screenshot.
  • the server identifies the two-dimensional code 42 included in the web page screenshot 41 by the two-dimensional code recognition technology.
  • the embodiment of the present application is only schematically illustrated by the two-dimensional code recognition technology and the barcode recognition technology. The server may also use other graphic code recognition technologies for the recognition, and this embodiment does not limit this.
  • Step 405C Determine the reference URL encapsulated in the package carrier according to the recognition result.
  • the server 120 recognizes the two-dimensional code 42 by the two-dimensional code recognition technology and extracts the reference URL www.aaa1.com from it.
  • when no package carrier is recognized, the server 120 determines that the to-be-identified URL has no corresponding reference URL, and takes the recognition result of the to-be-identified URL as the final recognition result.
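Steps 405B and 405C can be sketched with the image decoder left as a pluggable function, since the source does not name a recognition library. `CodeDecoder`, `looks_like_url`, and `extract_reference_url` are hypothetical names, and the URL check is a loose heuristic rather than full validation.

```python
from typing import Callable, Iterable, List, Optional
from urllib.parse import urlparse

# Hypothetical decoder type: takes screenshot bytes and returns the text
# payloads of any two-dimensional codes or barcodes found in the image.
CodeDecoder = Callable[[bytes], List[str]]


def looks_like_url(payload: str) -> bool:
    """Accept payloads that parse as http(s) URLs or bare hosts with a dot."""
    candidate = payload if "://" in payload else "http://" + payload
    parsed = urlparse(candidate)
    return parsed.scheme in ("http", "https") and "." in parsed.netloc


def extract_reference_url(
    screenshots: Iterable[bytes], decode_codes: CodeDecoder
) -> Optional[str]:
    """Return the first URL-like payload decoded from any screenshot."""
    for image in screenshots:
        for payload in decode_codes(image):
            text = payload.strip()
            if looks_like_url(text):
                return text
    # No carrier with a URL payload: the page has no reference URL.
    return None
```

Passing the decoder as a parameter keeps the sketch independent of any particular two-dimensional code or barcode library; a `None` result corresponds to the case where the to-be-identified URL has no reference URL.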
  • Step 406 Store the to-be-identified URL and the reference URL into the URL reference relation database.
  • so that the corresponding reference URL can be obtained directly from the URL reference relation database when the to-be-identified URL is identified again, the server 120 stores the to-be-identified URL in association with the reference URL.
  • Step 407 Detect whether the content of the first webpage contains malicious content.
  • the server obtains the to-be-identified web address and its corresponding reference web address. Further, the server 120 detects whether the webpage content corresponding to each of the to-be-identified webpage and the reference webpage contains malicious content.
  • the method for detecting whether the content of the first webpage includes malicious content in an embodiment, when the webpage database does not include the webpage to be identified, the server detects whether the content of the first webpage includes a preset keyword, and detects When the first webpage content includes a preset keyword, it is determined that the first webpage content contains malicious content. When detecting that the preset content keyword is not included in the first webpage content, the server 120 determines that the first webpage content does not contain malicious content.
  • the preset keyword has an illegal attribute.
  • the server 120 when the URL database includes the to-be-identified URL, and the URL type corresponding to the to-be-identified URL is secure, the server 120 directly determines that the first webpage content does not contain malicious content.
  • Step 408: Detect whether the reference URL is included in the URL database.
  • In one implementation, the URL database stores verified malicious URLs (the URL database occupies a small storage space). When it is detected that the URL database includes the reference URL, the server 120 performs step 409 below; when it is detected that the URL database does not include the reference URL, the server 120 performs steps 410 to 412 below.
  • In another implementation, the URL database stores both verified malicious URLs and verified secure URLs (the URL database occupies a small storage space). When it is detected that the URL database includes the reference URL and the reference URL is a secure URL, the server 120 determines that the reference URL does not contain malicious content.
  • When it is detected that the URL database includes the reference URL and the reference URL is a malicious URL, the server 120 determines that the reference URL contains malicious content.
  • When the URL database does not include the reference URL, the server 120 performs steps 410 to 412 below.
  • Step 409: When it is detected that the URL database includes the reference URL, determine that the second webpage content corresponding to the reference URL contains malicious content.
  • Step 410: When it is detected that the URL database does not include the reference URL, obtain the second webpage content corresponding to the reference URL; the second webpage content is obtained by simulating a browser accessing the reference URL.
  • When the URL database does not include the reference URL, the server 120 determines that the security of the reference URL has not been verified, and simulates a browser accessing the reference URL to obtain the corresponding second webpage content.
  • Step 411: Detect whether the second webpage content includes a preset keyword.
  • In one embodiment, similarly to detecting whether the first webpage content includes a preset keyword in step 407, the server 120 detects whether the obtained second webpage content includes a preset keyword.
  • When it is detected that the second webpage content includes a preset keyword, the server 120 determines that the second webpage content contains malicious content. When it is detected that the second webpage content does not include a preset keyword, the server 120 determines that the second webpage content does not contain malicious content.
  • Step 412: When it is detected that the second webpage content includes a preset keyword, determine that the second webpage content contains malicious content, and add the reference URL to the URL database.
  • So that the server 120 can directly identify that the webpage content corresponding to the reference URL contains malicious content when the reference URL is obtained again, the server 120 adds the reference URL to the URL database when it detects that the second webpage content includes a preset keyword.
  • It should be noted that there is no strict order between step 409 and steps 410 to 412; that is, step 409 and steps 410 to 412 may be performed simultaneously. This embodiment is described schematically with step 409 performed before steps 410 to 412 only as an example.
  • Step 413: When it is detected that at least one of the first webpage content and the second webpage content corresponding to the reference URL contains malicious content, determine that the to-be-identified URL is a malicious URL.
  • Step 414: Return the recognition result to the smart device; the smart device is configured to perform a predetermined operation according to the recognition result.
  • In one embodiment, when the recognition result indicates that the to-be-identified URL is a malicious URL and the to-be-identified URL was extracted by the server from a webpage access request, the server 120 returns a malicious URL reminder to the browser application (installed in the smart device) and restricts access to the to-be-identified URL.
  • Correspondingly, the browser application displays the received malicious URL reminder.
  • Illustratively, as shown in FIG. 4D, when the user enters the URL "www.aaa.com" in the browser application and clicks the access button 44, the server 120 identifies the URL. On identifying it as malicious (because the webpage content of www.aaa.com contains a two-dimensional code for the malicious URL www.aaa1.com), the server returns the malicious URL reminder information 45 to the browser application.
  • The browser application displays the malicious URL reminder information 45 and restricts the user from continuing to access the URL.
  • In one embodiment, when the recognition result indicates that the to-be-identified URL is a malicious URL and the to-be-identified URL was extracted by the server from an instant messaging message, the server 120 sends a malicious URL marking instruction along with the instant messaging message to the message recipient.
  • Correspondingly, the instant messaging application marks the to-be-identified URL as a malicious URL according to the malicious URL marking instruction.
  • Illustratively, as shown in FIG. 4E, when the server extracts the URL www.aaa.com from an instant messaging message sent by "Xiao A" and detects that the webpage content of www.aaa.com contains a two-dimensional code for the malicious URL www.aaa1.com, it sends a malicious URL marking instruction to the recipient's instant messaging application.
  • When the recipient's instant messaging application receives the instant messaging message containing the URL, it displays the malicious URL mark 46 beside the URL and prohibits the smart device from invoking other applications that lack a malicious URL identification function (such as a browser) to access the URL.
  • In one embodiment, when the recognition result indicates that the to-be-identified URL is a malicious URL and the to-be-identified URL was extracted by the server from social information, the server 120 sets a malicious URL identifier for the social information to indicate to the user that the social information contains a malicious URL.
  • For example, when the server 120 detects that a comment includes the URL www.aaa.com and the webpage content of www.aaa.com contains a two-dimensional code for the malicious URL www.aaa1.com, the server 120 sets a malicious URL identifier for the comment or deletes the comment.
  • Because the URL type of a URL and the reference relationships between URLs may change, in one implementation the server 120, to keep the data in the URL database and the URL reference relation database accurate, updates at least one of the reference relationships stored in the URL reference relation database and the URLs stored in the URL database when a predetermined update condition is met.
  • The predetermined update condition includes at least one of receiving an update instruction and reaching a preset time interval.
  • For example, every 72 hours the server checks whether the URL type recorded for each URL in the URL database is accurate and whether each reference relationship in the URL reference relation database still holds. As another example, when the server receives a malicious URL report submitted by a user through the application, it updates the URL type of the corresponding URL in the URL database.
  • In this embodiment, the server updates the data in the URL database and the URL reference relation database through the update mechanism, which keeps the data timely and accurate and further improves the accuracy of malicious URL identification.
  • FIG. 5 is a structural block diagram of a malicious website identification apparatus provided by an embodiment of the present application.
  • The malicious URL identification apparatus may reside in a computing device.
  • The computing device may be, for example, the server 120 of FIG. 1A or the smart device 130 of FIG. 1B.
  • The apparatus includes a first acquisition module 510, a second acquisition module 520, a detection module 530, and a determination module 540.
  • The first acquisition module 510 is configured to implement the function of step 301 or 401 above;
  • The second acquisition module 520 is configured to implement the function of step 302 above;
  • The detection module 530 is configured to implement the function of step 303 above;
  • The determination module 540 is configured to implement the function of step 304 or 413 above.
  • Optionally, the second acquisition module 520 includes a first acquisition unit and a second acquisition unit;
  • The first acquisition unit is configured to implement the functions of steps 404 and 405 above;
  • The second acquisition unit is configured to implement the function of step 403 above.
  • Optionally, the first acquisition unit is further configured to implement the functions of steps 405A to 405C above.
  • Optionally, the detection module 530 includes a first detection unit and a first determination unit;
  • The first detection unit is configured to implement the function of step 408 above;
  • The first determination unit is configured to implement the function of step 409 above.
  • Optionally, the detection module 530 further includes a third acquisition unit, a second detection unit, and a second determination unit;
  • The third acquisition unit is configured to implement the function of step 410 above;
  • The second detection unit is configured to implement the function of step 411 above;
  • The second determination unit is configured to implement the function of step 412 above.
  • Optionally, the apparatus further includes an update module;
  • The update module is configured to update at least one of the reference relationships stored in the URL reference relation database and the URLs stored in the URL database when a predetermined update condition is met, where the predetermined update condition includes at least one of receiving an update instruction and reaching a preset time interval.
  • FIG. 6 is a structural block diagram of a computing device provided by an embodiment of the present application.
  • Computing device 600 includes a central processing unit (CPU) 601, a system memory 604 including random access memory (RAM) 602 and read only memory (ROM) 603, and a system bus 605 that connects system memory 604 and central processing unit 601.
  • The computing device 600 also includes a basic input/output system (I/O system) 606 that helps transfer information between devices within the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 615.
  • The application programs 614 may include, for example, a malicious URL identification apparatus.
  • The malicious URL identification apparatus can implement the malicious URL identification method of this application, thereby improving the identification coverage of malicious URLs.
  • The basic input/output system 606 includes a display 608 for displaying information and an input device 609, such as a mouse or keyboard, for the user to input information.
  • The display 608 and the input device 609 are both connected to the central processing unit 601 through an input/output controller 610 that is connected to the system bus 605.
  • The basic input/output system 606 may also include the input/output controller 610 for receiving and processing input from other devices such as a keyboard, mouse, or electronic stylus.
  • Similarly, the input/output controller 610 also provides output to a display screen, printer, or other type of output device.
  • The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605.
  • The mass storage device 607 and its associated computer-readable medium provide non-volatile storage for the computing device 600. That is, the mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • Without loss of generality, the computer-readable medium may include computer storage media and communication media.
  • Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices.
  • Those skilled in the art will know that computer storage media are not limited to the above. The system memory 604 and the mass storage device 607 may be collectively referred to as memory.
  • According to various embodiments of this application, the computing device 600 may also operate by connecting, through a network such as the Internet, to a remote computer on the network. That is, the computing device 600 may connect to the network 612 through a network interface unit 611 connected to the system bus 605; the network interface unit 611 may also be used to connect to other types of networks or remote computer systems (not shown).
  • The memory further includes one or more programs stored in the memory, the one or more programs including instructions for performing the malicious URL identification method provided by the embodiments of this application.
  • Those of ordinary skill in the art will understand that all or some of the steps of the malicious URL identification method in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
  • FIG. 7 illustrates a flow diagram of a malicious web address identification method 700 in accordance with some embodiments of the present application.
  • Method 700 can be performed, for example, in computing device 600, but is not limited thereto.
  • As shown in FIG. 7, in step S701, a to-be-identified URL is obtained.
  • The implementation of step S701 is consistent with step 301, and details are not repeated here.
  • In step S702, when it is detected that the first webpage content corresponding to the to-be-identified URL does not contain malicious content, the reference URL corresponding to the to-be-identified URL is obtained.
  • The reference URL is located in the first webpage content corresponding to the to-be-identified URL.
  • For example, the reference URL is a link address in the first webpage content.
  • As another example, the reference URL is encapsulated in a package carrier and then added to the first webpage content corresponding to the to-be-identified URL.
  • In one embodiment, step S702 may obtain the first webpage content corresponding to the to-be-identified URL.
  • The first webpage content is obtained by simulating a browser accessing the to-be-identified URL.
  • Step S702 may also extract the reference URL from the package carrier contained in the first webpage content.
  • For example, step S702 may perform a screenshot operation on the first webpage content to obtain a webpage screenshot.
  • Step S702 may also identify the package carrier contained in the webpage screenshot by image recognition technology. On this basis, step S702 may determine the reference URL encapsulated in the package carrier according to the recognition result.
  • In another embodiment, step S702 may obtain the reference URL corresponding to the to-be-identified URL from the URL reference relation database.
  • The URL reference relation database stores the reference relationships between URLs.
  • In step S703, it is detected whether the second webpage content corresponding to the reference URL contains malicious content.
  • In one embodiment, step S703 may detect whether the reference URL is included in the URL database.
  • The URL database stores verified malicious URLs.
  • When it is determined that the reference URL is one of the malicious URLs in the URL database, step S703 may determine that the second webpage content contains malicious content.
  • When step S703 detects that the second webpage content corresponding to the reference URL contains malicious content, the method 700 may perform step S704 to determine that the to-be-identified URL is a malicious URL. For a more specific implementation of the method 700, see the method shown in FIG. 4A; details are not repeated here.
  • In summary, when the first webpage content corresponding to the to-be-identified URL does not contain malicious content, the method 700 may obtain the reference URL in the first webpage content and determine whether the to-be-identified URL is malicious according to whether the second webpage content corresponding to the reference URL contains malicious content, which greatly improves the identification coverage of malicious URLs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

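Embodiments of this application disclose a malicious URL identification method, a computing device, and a storage medium. A malicious URL identification method includes: obtaining a to-be-identified URL; obtaining a reference URL corresponding to the to-be-identified URL, the reference URL being located in first webpage content corresponding to the to-be-identified URL; detecting whether the first webpage content contains malicious content, and detecting whether second webpage content corresponding to the reference URL contains malicious content; and when it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determining that the to-be-identified URL is a malicious URL. In the embodiments of this application, a malicious URL contained in a normal webpage can be identified according to the reference relationships between URLs, and user access to that normal webpage can be restricted, which improves the identification coverage of malicious URLs and helps ensure the security of Internet access.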

Description

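Malicious URL identification method, computing device, and storage medium
This application claims priority to Chinese Patent Application No. 201710171054.8, entitled "Malicious URL identification method and apparatus" and filed with the Chinese Patent Office on March 21, 2017, which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the field of network security, and in particular to a malicious URL identification method, a computing device, and a storage medium.
Background
The Internet makes life more convenient but also brings many security problems. For example, the large number of malicious URLs on the Internet poses a serious security risk to users. To keep users from accessing malicious URLs, more and more applications provide a malicious URL identification function.
In the prior art, an application identifies malicious URLs as follows: the application's backend server obtains the webpage content corresponding to a URL and detects whether the content includes a preset keyword; if it does, the backend server adds the URL to a malicious URL database. Later, on receiving a webpage access request sent by the application, the backend server checks whether the URL contained in the request belongs to the malicious URL database; if it does, the backend server sends a malicious URL prompt to the application.
Summary
Embodiments of this application provide a malicious URL identification solution that can improve the identification coverage of malicious URLs.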
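According to one aspect of the embodiments of this application, a malicious URL identification method applied to a computing device is provided. The method includes: obtaining a to-be-identified URL; obtaining a reference URL corresponding to the to-be-identified URL, the reference URL being located in first webpage content corresponding to the to-be-identified URL; detecting whether the first webpage content contains malicious content, and detecting whether second webpage content corresponding to the reference URL contains malicious content; and when it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determining that the to-be-identified URL is a malicious URL.
According to another aspect of the embodiments of this application, a malicious URL identification method applied to a computing device is provided. The method includes: obtaining a to-be-identified URL; when it is detected that first webpage content corresponding to the to-be-identified URL does not contain malicious content, obtaining a reference URL corresponding to the to-be-identified URL, the reference URL being located in the first webpage content; detecting whether second webpage content corresponding to the reference URL contains malicious content; and when it is detected that the second webpage content contains malicious content, determining that the to-be-identified URL is a malicious URL.
According to another aspect of the embodiments of this application, a computing device is provided, including a processor and a memory. The memory stores computer-readable instructions that can cause the processor to: obtain a to-be-identified URL; obtain a reference URL corresponding to the to-be-identified URL, the reference URL being located in first webpage content corresponding to the to-be-identified URL; detect whether the first webpage content contains malicious content, and detect whether second webpage content corresponding to the reference URL contains malicious content; and when it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determine that the to-be-identified URL is a malicious URL.
According to another aspect of the embodiments of this application, a computing device is provided, including a processor and a memory. The memory stores computer-readable instructions that can cause the processor to: obtain a to-be-identified URL; when it is detected that first webpage content corresponding to the to-be-identified URL does not contain malicious content, obtain a reference URL corresponding to the to-be-identified URL, the reference URL being located in the first webpage content; detect whether second webpage content corresponding to the reference URL contains malicious content; and when it is detected that the second webpage content contains malicious content, determine that the to-be-identified URL is a malicious URL.
According to the solution of the embodiments of this application, the reference URL corresponding to a to-be-identified URL is obtained along with the to-be-identified URL, and the to-be-identified URL is determined to be malicious when the webpage content corresponding to the to-be-identified URL or the reference URL is detected to contain malicious content. With this solution, even if an attacker encapsulates a malicious URL and adds it to a normal webpage, the malicious URL contained in the normal webpage can be identified from the reference relationships between URLs and user access to the normal webpage can be restricted, which improves the identification coverage of malicious URLs and helps ensure the security of Internet access.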
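Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1A is a schematic diagram of an implementation environment provided by an embodiment of this application;
FIG. 1B is a schematic diagram of an implementation environment provided by an embodiment of this application;
FIG. 2A is a schematic diagram of a process in which a server identifies a malicious URL according to an embodiment of this application;
FIG. 2B is a schematic diagram of a process of identifying a malicious URL in an embodiment of this application;
FIG. 3 is a flowchart of a malicious URL identification method provided by an embodiment of this application;
FIG. 4A is a flowchart of a malicious URL identification method provided by another embodiment of this application;
FIG. 4B is a flowchart of the process of identifying a reference URL involved in the malicious URL identification method shown in FIG. 4A;
FIG. 4C is a schematic diagram of the process of identifying a reference URL;
FIGS. 4D and 4E are schematic diagrams of interfaces provided by illustrative embodiments;
FIG. 5 is a structural block diagram of a malicious URL identification apparatus provided by an embodiment of this application;
FIG. 6 is a structural block diagram of a computing device provided by an embodiment of this application;
FIG. 7 is a flowchart of a malicious URL identification method 700 according to some embodiments of this application.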
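Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the drawings.
In this document, "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
For ease of understanding, the terms used in the embodiments of this application are explained below.
Package carrier: an entity used to carry data, where the data carried in the package carrier cannot be read directly. In the embodiments of this application, the data carried by the package carrier is the address of a webpage, expressed as an IP (Internet Protocol) address or a URL (Uniform Resource Locator).
For each type of package carrier, a corresponding extraction technique can extract the data it carries. For example, when the package carrier is a two-dimensional code or a barcode, two-dimensional code or barcode recognition can extract the data carried in it.
Malicious content: text, image, or video content in a webpage that includes a preset keyword, where the preset keyword has an illegal attribute. For example, the preset keyword is “下注” (place a bet), “赌场” (casino), or “娱乐城” (entertainment city). In the embodiments of this application, the webpage content corresponding to a malicious URL directly or indirectly contains malicious content.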
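Refer to FIG. 1A, which is a schematic diagram of an implementation environment provided by an embodiment of this application. The implementation environment includes a smart device 110 and a server 120.
The smart device 110 is an electronic device with Internet access (it may also be called a terminal device or a computing device). The electronic device is a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a personal computer, or the like.
In one implementation, the smart device 110 runs an application with a malicious URL identification function; the application is a browser application, an instant messaging application, a social application, a rich media application, or the like. For example, when a user accesses the Internet through a browser application, the smart device 110 can check the security of the URL the user wants to access. In one embodiment, the smart device 110 displays a warning mark for a malicious URL. In one embodiment, the smart device 110 may also restrict the user from accessing a malicious URL. For a secure URL, the smart device 110 displays the corresponding webpage content normally.
The smart device 110 and the server 120 are connected through a wired or wireless network.
The server 120 is a single server, a server cluster composed of several servers, or a cloud computing center.
In one implementation, the server 120 is the backend server of the application (with the malicious URL identification function) on the smart device 110. After receiving a webpage access request sent by the smart device 110 through the application, the server 120 identifies the to-be-identified URL contained in the request and returns the corresponding recognition result to the smart device 110 for display. In one embodiment, a URL database 121 and a URL reference relation database 122 are built in the server 120. The URL database 121 stores verified malicious URLs (or both verified secure URLs and verified malicious URLs). The URL reference relation database 122 stores the reference relationships between URLs. When identifying a to-be-identified URL, the server 120 uses the data in the URL database 121 and the URL reference relation database 122 to detect whether the webpage content of the to-be-identified URL and of its corresponding reference URL contains malicious content, and determines that the to-be-identified URL is a secure URL when neither contains malicious content. Otherwise, the server 120 determines that the to-be-identified URL is a malicious URL.
In some embodiments, the wireless or wired network uses standard communication technologies and/or protocols. The network is usually the Internet, but may be any network, including but not limited to any combination of a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired, or wireless network, a private network, or a virtual private network. In some embodiments, data exchanged over the network is represented using technologies and/or formats including Hypertext Markup Language (HTML) and Extensible Markup Language (XML). In addition, all or some links may be encrypted using conventional encryption technologies such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), virtual private network (VPN), and Internet Protocol Security (IPsec). In other embodiments, customized and/or dedicated data communication technologies may be used in place of or in addition to the above.
FIG. 1B is a schematic diagram of an implementation environment according to an embodiment of this application; it includes a smart device 130 and a server 140. The smart device 130 may be any of various computing devices with Internet access. The server 140 may be a single server, a server cluster composed of several servers, or a cloud computing center. Here, the smart device 130 may send an access request to the server 140 so that the server 140 returns a response message. For example, the smart device 130 may send a request for first webpage content to the server 140, and the server 140 may return the first webpage content to the smart device 130. The smart device 130 may include lightweight databases, for example a URL database 131 and a URL reference relation database 132, whose contents are consistent with those of the URL database 121 and the URL reference relation database 122, respectively. In this way, the smart device 130 can perform the malicious URL identification method described below. To simplify the description, the method is described below with the server 120 as the executing entity.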
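The malicious URL identification methods provided by the embodiments of this application are all applied to the server 120 shown in FIG. 1A; illustrative embodiments are described below.
As shown in FIG. 2A, in some embodiments, a URL database 210 storing verified malicious URLs is built in the server 200. When the server 200 obtains, through an identification request receiving interface, a to-be-identified URL sent by a smart device, it checks whether the URL database 210 stores the to-be-identified URL. If it does, the server 200 determines that the to-be-identified URL is a malicious URL and returns the corresponding recognition result to the smart device through a recognition result feedback interface. If it does not, the server 200 uses a webpage content identification module 220 to detect whether the webpage content corresponding to the to-be-identified URL contains malicious content, and returns the recognition result to the smart device through the recognition result feedback interface. When the recognition result indicates that the to-be-identified URL is secure, the smart device displays the webpage content normally; when it indicates that the to-be-identified URL is malicious, the smart device restricts user access.
In some embodiments of this application, as shown in FIG. 2B, the server 200 includes not only the URL database 210 but also a URL reference relation database 230 that records the reference relationships between URLs. After obtaining a to-be-identified URL sent by a smart device through the identification request receiving interface, the server 200 checks whether the URL reference relation database 230 stores a reference URL corresponding to the to-be-identified URL. If it does, the server 200 uses the webpage content identification module 220 to identify the webpage content corresponding to the to-be-identified URL and to the reference URL separately, and combines the two results to decide whether the to-be-identified URL is malicious.
With the malicious URL identification method provided by the embodiments of this application, even if a malicious URL is encapsulated and added to a webpage that contains no malicious content, the server can recognize from the reference relationships between webpages that the webpage leads to a malicious URL, thereby avoiding the security risk of the smart device displaying such a webpage. Illustrative embodiments are described below.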
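Refer to FIG. 3, which is a flowchart of a malicious URL identification method provided by an embodiment of this application. The method may be performed in a computing device, which may be the smart device 130 or the server 120. The method shown in FIG. 3 is described below taking the server 120 shown in FIG. 1A as an example. The method includes:
Step 301: Obtain a to-be-identified URL.
In one embodiment, when the server 120 is the backend server of a browser application, the to-be-identified URL is the URL that the server 120 extracts from a webpage access request sent by the browser application.
In one embodiment, when the server 120 is the backend server of an instant messaging application, the to-be-identified URL is the URL that the server extracts from an obtained instant messaging message.
In one embodiment, when the server 120 is the backend server of a social application (such as a microblog or blog), the to-be-identified URL is the URL that the server extracts from social information (such as comments or shared posts).
Step 302: Obtain a reference URL corresponding to the to-be-identified URL. Here, the reference URL may be located in the first webpage content corresponding to the to-be-identified URL. In one embodiment, the reference URL may be a link address in the first webpage content. In another embodiment, the reference URL is encapsulated in a package carrier and then added to the first webpage content. In one embodiment, the package carrier is a two-dimensional code or a barcode; that is, the reference URL is encapsulated as a two-dimensional code or barcode (the form it takes after encapsulation) and then added to the first webpage content of the to-be-identified URL.
In one implementation, after obtaining the to-be-identified URL, the server 120 checks whether the URL reference relation database contains a reference URL corresponding to the to-be-identified URL and, if it does, obtains the reference URL from the URL reference relation database.
For example, the server 120 obtains from the URL reference relation database the reference URL www.aaa1.com corresponding to the to-be-identified URL www.aaa.com, which means that www.aaa1.com was encapsulated in a two-dimensional code and added to the webpage of www.aaa.com (that is, the webpage of www.aaa.com displays the two-dimensional code).
In another implementation, when it is detected that the URL reference relation database does not contain a reference URL corresponding to the to-be-identified URL, the server 120 obtains the first webpage content corresponding to the to-be-identified URL. The server 120 extracts the reference URL from the package carrier contained in the first webpage content and stores the to-be-identified URL and the reference URL in association in the URL reference relation database for later use.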
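Step 303: Detect whether the first webpage content contains malicious content, and detect whether the second webpage content corresponding to the reference URL contains malicious content.
After obtaining the to-be-identified URL and its corresponding reference URL, the server 120 further detects whether the first webpage content corresponding to the to-be-identified URL and the second webpage content corresponding to the reference URL contain malicious content.
In one implementation, the server 120 first detects whether the first webpage content contains malicious content. If it does, the server 120 directly determines that the to-be-identified URL is malicious without further checking the second webpage content. If it does not, the server 120 may go on to detect whether the second webpage content contains malicious content.
Step 304: When it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determine that the to-be-identified URL is a malicious URL. Here, access to a malicious URL may be restricted.
In one embodiment, when it is detected that the first webpage content does not contain malicious content and the second webpage content does, the server 120 determines that the to-be-identified URL is a malicious URL (the first webpage content contains a package carrier that leads to a malicious URL). When neither the first webpage content nor the second webpage content contains malicious content, the server 120 determines that the to-be-identified URL is a secure URL.
In one implementation, regardless of whether the to-be-identified URL contains malicious content directly or indirectly, the server 120 restricts the terminal from accessing the to-be-identified URL, or returns corresponding malicious URL prompt information when the terminal accesses it.
In the embodiments of this application, the server 120 can identify not only URLs that directly contain malicious content but also URLs that indirectly contain malicious content (that is, contain a package carrier leading to a malicious URL), so that the identification coverage of malicious URLs is significantly improved.
In summary, in the malicious URL identification method provided by this embodiment, the reference URL corresponding to a to-be-identified URL is obtained along with the to-be-identified URL, and the to-be-identified URL is determined to be malicious when the webpage content corresponding to the to-be-identified URL or the reference URL is detected to contain malicious content. With this method, even if an attacker encapsulates a malicious URL and adds it to a normal webpage, the server can identify the malicious URL contained in the normal webpage from the reference relationships between URLs and restrict user access to the normal webpage, which improves the identification coverage of malicious URLs and helps ensure the security of Internet access.
In actual implementation, when the URL reference relation database maintained by the server 120 does not contain a reference URL corresponding to the to-be-identified URL (for example, when the server obtains the to-be-identified URL for the first time), the server needs to further examine the webpage content corresponding to the to-be-identified URL to determine whether it has a corresponding reference URL (that is, whether the to-be-identified URL leads to another URL). An illustrative embodiment is described below.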
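Refer to FIG. 4A, which is a flowchart of a malicious URL identification method provided by another embodiment of this application. This embodiment is described taking the method as applied to the server 120 shown in FIG. 1A as an example. The method includes:
Step 401: Obtain a to-be-identified URL. This step is implemented similarly to step 301 above, and details are not repeated here.
In one embodiment, a URL database containing verified malicious URLs is built in the server.
In one implementation, the URL database contains both verified secure URLs and verified malicious URLs. Illustratively, the data storage structure of the URL database is shown in Table 1.
Table 1
No.  URL           URL type
1    www.aaa.com   Secure
2    www.bbb.com   Secure
3    www.aaa1.com  Malicious
4    www.bbb1.com  Malicious
After obtaining the to-be-identified URL, the server 120 first checks whether the URL database contains it. When the URL database contains the to-be-identified URL and its URL type is malicious, the server 120 directly determines that the to-be-identified URL is a malicious URL. When the URL database does not contain the to-be-identified URL, or contains it with the URL type secure, the server 120 performs step 402 below.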
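The pre-check described above can be sketched as follows. This is a minimal illustration only, assuming the URL database is an in-memory dictionary shaped like Table 1; the names `UrlType`, `URL_DATABASE`, and `precheck` are hypothetical and do not come from the application.

```python
from enum import Enum
from typing import Optional


class UrlType(Enum):
    SECURE = "secure"
    MALICIOUS = "malicious"


# Hypothetical in-memory stand-in for the URL database of Table 1.
URL_DATABASE = {
    "www.aaa.com": UrlType.SECURE,
    "www.bbb.com": UrlType.SECURE,
    "www.aaa1.com": UrlType.MALICIOUS,
    "www.bbb1.com": UrlType.MALICIOUS,
}


def precheck(url: str) -> Optional[bool]:
    """Return True when the URL is already recorded as malicious.

    Return None when the URL is unknown or recorded as secure, meaning the
    caller should continue with step 402 (the reference URL lookup).
    """
    if URL_DATABASE.get(url) is UrlType.MALICIOUS:
        return True
    return None
```

Note that a secure entry does not end the check: a page recorded as secure may still embed a package carrier that leads elsewhere, which is why the flow continues to step 402.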
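Step 402: Detect whether the URL reference relation database contains a reference URL corresponding to the to-be-identified URL.
In one implementation, each reference record in the URL reference relation database is written as (URL A, URL B), where URL B is the reference URL of URL A; that is, URL B was encapsulated in a package carrier and added to the webpage content of URL A.
Illustratively, the reference records in the URL reference relation database are shown in Table 2.
Table 2
No.  Reference record
1    (www.aaa.com, www.aaa1.com)
2    (www.bbb.com, www.bbb1.com)
3    (www.ccc.com, www.ccc1.com)
In one embodiment, the server 120 uses the to-be-identified URL as the search term to check whether the URL reference relation database contains a corresponding reference URL. If it does (meaning that malicious URL identification has been performed on this URL before), step 403 below may be performed; if it does not (meaning that malicious URL identification is being performed on this URL for the first time), step 404 below is performed.
In one embodiment, to improve efficiency, the URLs in the reference records are all represented by hash values; correspondingly, the server searches the URL reference relation database using the hash value of the to-be-identified URL.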
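A hash-keyed lookup of this kind might look like the sketch below. It rests on several assumptions: the application does not name a hash function, so SHA-256 is used purely for illustration; the class `ReferenceRelationDB` and the helper `url_key` are hypothetical; and, unlike the text (where both URLs of a record are hashed), this sketch hashes only the lookup key and keeps the reference URL readable so that it can be visited later.

```python
import hashlib
from typing import Dict, Optional


def url_key(url: str) -> str:
    """Hash a URL so that records are stored and searched by digest."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class ReferenceRelationDB:
    """Hypothetical in-memory stand-in for the URL reference relation database."""

    def __init__(self) -> None:
        # Maps hash(URL A) -> URL B for each reference record (URL A, URL B).
        self._records: Dict[str, str] = {}

    def store(self, url: str, reference_url: str) -> None:
        """Step 406: store the two URLs in association."""
        self._records[url_key(url)] = reference_url

    def lookup(self, url: str) -> Optional[str]:
        """Steps 402-403: return the reference URL, or None if absent."""
        return self._records.get(url_key(url))
```

A `None` result corresponds to the branch that continues with step 404, where the first webpage content has to be fetched and inspected.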
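Step 403: When it is detected that the URL reference relation database contains a reference URL corresponding to the to-be-identified URL, obtain that reference URL from the URL reference relation database.
For example, when the obtained to-be-identified URL is www.aaa.com, the server 120 obtains the reference URL www.aaa1.com from the reference records shown in Table 2.
Step 404: When it is detected that the URL reference relation database does not contain a reference URL corresponding to the to-be-identified URL, obtain the first webpage content corresponding to the to-be-identified URL.
When the URL reference relation database does not contain a reference URL corresponding to the to-be-identified URL, the server, in order to determine whether the to-be-identified URL can lead to another URL, needs to obtain the first webpage content corresponding to the to-be-identified URL and determine on that basis whether the to-be-identified URL leads to another URL.
In one implementation, the server 120 may simulate a browser accessing the to-be-identified URL to obtain the corresponding first webpage content.
Step 405: Extract the reference URL from the package carrier contained in the first webpage content.
Further, the server 120 detects whether the first webpage content contains a package carrier and, if it does, extracts the reference URL from the package carrier.
In one implementation, when the reference URL is encapsulated as a two-dimensional code or barcode and added to the first webpage content, this step includes the following steps, as shown in FIG. 4B.
Step 405A: Perform a screenshot operation on the first webpage content to obtain a webpage screenshot.
After obtaining the first webpage content, the server performs a screenshot operation on it to obtain at least one webpage screenshot, which contains both the text content and the image content.
Illustratively, as shown in FIG. 4C, the server performs a screenshot operation on the first webpage content corresponding to www.aaa.com to obtain the webpage screenshot 41.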
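Step 405B: Identify the package carrier contained in the webpage screenshot by a predetermined image recognition technology, where the predetermined image recognition technology includes at least one of two-dimensional code recognition and barcode recognition.
In one implementation, because attackers usually encapsulate a malicious URL in a two-dimensional code or barcode that cannot be read directly, the server, after obtaining the webpage screenshot, uses two-dimensional code recognition or barcode recognition to identify the package carrier contained in the screenshot.
Illustratively, as shown in FIG. 4B, the server identifies the two-dimensional code 42 contained in the webpage screenshot 41 by two-dimensional code recognition.
It should be noted that the embodiments of this application use two-dimensional code recognition and barcode recognition only as illustrative examples. In other implementations, when the reference URL is encapsulated in another kind of graphic code, the server may use the corresponding graphic code recognition technology; this embodiment does not limit this.
Step 405C: Determine the reference URL encapsulated in the package carrier according to the recognition result.
Illustratively, as shown in FIG. 4C, the server 120 recognizes the two-dimensional code 42 by two-dimensional code recognition and extracts the reference URL www.aaa1.com.
It should be noted that when no reference URL can be extracted from the first webpage content through steps 405A to 405C, the server 120 determines that the to-be-identified URL has no corresponding reference URL and takes the recognition result for the to-be-identified URL itself as the final recognition result.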
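Steps 405B and 405C can be sketched as below. The application does not name a decoding library, so the sketch takes the decoder as a caller-supplied function (the `Decoder` type and `extract_reference_url` are hypothetical names) and treats the screenshot as opaque bytes; the URL pattern is a deliberately loose assumption, not a validation rule from the text.

```python
import re
from typing import Callable, Iterable, Optional

# A decoder turns a screenshot into the text payloads of any graphic codes
# it finds; a real two-dimensional code or barcode reader would be plugged in.
Decoder = Callable[[bytes], Iterable[str]]

# Loose pattern: an optional http(s) scheme followed by a dotted host name.
_URL_PATTERN = re.compile(r"^(?:https?://)?[\w-]+(?:\.[\w-]+)+(?:[/?#]\S*)?$")


def extract_reference_url(screenshot: bytes, decode: Decoder) -> Optional[str]:
    """Return the first decoded payload that looks like a URL, else None."""
    for payload in decode(screenshot):       # step 405B: recognize carriers
        candidate = payload.strip()
        if _URL_PATTERN.match(candidate):    # step 405C: keep URL payloads
            return candidate
    # No reference URL found: the verdict on the page itself is final.
    return None
```

Passing the decoder in as a parameter keeps the extraction logic testable without image files and leaves the choice of recognition technology open, in line with the note that other graphic codes may be supported.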
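Step 406: Store the to-be-identified URL and the reference URL in association in the URL reference relation database.
So that the server 120 can obtain the corresponding reference URL directly from the URL reference relation database when the same to-be-identified URL is obtained again, the server 120 stores the to-be-identified URL in association with the reference URL.
Step 407: Detect whether the first webpage content contains malicious content.
Through steps 401 to 406, the server obtains the to-be-identified URL and its corresponding reference URL. The server 120 then detects whether the webpage content corresponding to each of the to-be-identified URL and the reference URL contains malicious content.
For detecting whether the first webpage content contains malicious content: in one implementation, when the URL database does not contain the to-be-identified URL, the server detects whether the first webpage content includes a preset keyword and, if it does, determines that the first webpage content contains malicious content. When no preset keyword is detected in the first webpage content, the server 120 determines that the first webpage content does not contain malicious content. The preset keyword has an illegal attribute.
In another implementation, when the URL database contains the to-be-identified URL and the URL type corresponding to the to-be-identified URL is secure, the server 120 directly determines that the first webpage content does not contain malicious content.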
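The keyword check in step 407 amounts to a substring search over the page text. The sketch below is a minimal version under that assumption; the keyword tuple reuses the three example keywords given in the definition of malicious content, and the function name is hypothetical.

```python
from typing import Iterable

# Example preset keywords taken from the definition of malicious content.
PRESET_KEYWORDS = ("下注", "赌场", "娱乐城")


def contains_malicious_content(page_text: str,
                               keywords: Iterable[str] = PRESET_KEYWORDS) -> bool:
    """Return True if any preset keyword occurs in the page text."""
    return any(keyword in page_text for keyword in keywords)
```

Plain substring matching only covers text; the definition also mentions image and video content, which this sketch does not attempt to handle.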
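Step 408: Detect whether the URL database contains the reference URL.
In one implementation, the URL database stores verified malicious URLs (the URL database occupies a small storage space). When it is detected that the URL database contains the reference URL, the server 120 performs step 409 below; when it is detected that the URL database does not contain the reference URL, the server 120 performs steps 410 to 412 below.
In another implementation, the URL database stores both verified malicious URLs and verified secure URLs (the URL database occupies a small storage space). When the URL database contains the reference URL and the reference URL is a secure URL, the server 120 determines that the reference URL does not contain malicious content. When the URL database contains the reference URL and the reference URL is a malicious URL, the server 120 determines that the reference URL contains malicious content. When the URL database does not contain the reference URL, the server 120 performs steps 410 to 412 below.
Step 409: When it is detected that the URL database contains the reference URL, determine that the second webpage content corresponding to the reference URL contains malicious content.
Step 410: When it is detected that the URL database does not contain the reference URL, obtain the second webpage content corresponding to the reference URL; the second webpage content is obtained by simulating a browser accessing the reference URL.
When the URL database does not contain the reference URL, the server 120 determines that the security of the reference URL has not been verified, and simulates a browser accessing the reference URL to obtain the corresponding second webpage content.
Step 411: Detect whether the second webpage content includes a preset keyword. In one embodiment, similarly to detecting whether the first webpage content includes a preset keyword in step 407, the server 120 detects whether the obtained second webpage content includes a preset keyword.
When it is detected that the second webpage content includes a preset keyword, the server 120 determines that the second webpage content contains malicious content. When it is detected that the second webpage content does not include a preset keyword, the server 120 determines that the second webpage content does not contain malicious content.
Step 412: When it is detected that the second webpage content includes a preset keyword, determine that the second webpage content contains malicious content, and add the reference URL to the URL database.
So that the server 120 can directly identify that the webpage content corresponding to the reference URL contains malicious content when the reference URL is obtained again, the server 120 adds the reference URL to the URL database when it detects that the second webpage content includes a preset keyword.
It should be noted that there is no strict order between step 409 and steps 410 to 412; that is, step 409 and steps 410 to 412 may be performed simultaneously. This embodiment is described schematically with step 409 performed before steps 410 to 412 only as an example.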
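Steps 408 to 412 can be read as a cache-then-inspect routine. The sketch below follows the first implementation (a database of verified malicious URLs only) and runs the steps in sequence; `reference_is_malicious`, `fetch_page`, and `has_preset_keyword` are hypothetical names, with the page fetcher and keyword detector supplied by the caller rather than specified by the application.

```python
from typing import Callable, Set


def reference_is_malicious(
    reference_url: str,
    malicious_urls: Set[str],
    fetch_page: Callable[[str], str],
    has_preset_keyword: Callable[[str], bool],
) -> bool:
    """Decide whether the page behind a reference URL contains malicious content."""
    # Steps 408-409: a verified malicious URL needs no further inspection.
    if reference_url in malicious_urls:
        return True
    # Step 410: the URL is unverified, so obtain its page content.
    second_content = fetch_page(reference_url)
    # Steps 411-412: on a keyword hit, record the URL for later lookups.
    if has_preset_keyword(second_content):
        malicious_urls.add(reference_url)
        return True
    return False
```

Recording the hit in `malicious_urls` is what lets a later request for the same reference URL return at the first branch without fetching the page again.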
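Step 413: When it is detected that at least one of the first webpage content and the second webpage content corresponding to the reference URL contains malicious content, determine that the to-be-identified URL is a malicious URL.
This step is implemented similarly to step 304 above, and details are not repeated here.
Step 414: Return the recognition result to the smart device; the smart device is configured to perform a predetermined operation according to the recognition result.
In one embodiment, when the recognition result indicates that the to-be-identified URL is a malicious URL and the to-be-identified URL was extracted by the server from a webpage access request, the server 120 returns malicious URL reminder information to the browser application (installed in the smart device) and restricts access to the to-be-identified URL.
Correspondingly, the browser application displays the received malicious URL reminder information.
Illustratively, as shown in FIG. 4D, when the user enters the URL "www.aaa.com" in the browser application and clicks the access button 44, the server 120 identifies the URL and, on identifying it as malicious (because the webpage content of www.aaa.com contains a two-dimensional code for the malicious URL www.aaa1.com), returns the malicious URL reminder information 45 to the browser application. The browser application then displays the malicious URL reminder information 45 and restricts the user from continuing to access the URL.
In one embodiment, when the recognition result indicates that the to-be-identified URL is a malicious URL and the to-be-identified URL was extracted by the server from an instant messaging message, the server 120 sends a malicious URL marking instruction along with the instant messaging message to the message recipient.
Correspondingly, the instant messaging application marks the to-be-identified URL as a malicious URL according to the malicious URL marking instruction.
Illustratively, as shown in FIG. 4E, when the server extracts the URL www.aaa.com from an instant messaging message sent by "Xiao A" and detects that the webpage content of www.aaa.com contains a two-dimensional code for the malicious URL www.aaa1.com, it sends a malicious URL marking instruction to the recipient's instant messaging application. When the recipient's instant messaging application receives the instant messaging message containing the URL, it displays the malicious URL mark 46 beside the URL and prohibits the smart device from invoking other applications that lack a malicious URL identification function (such as a browser) to access the URL.
In one embodiment, when the recognition result indicates that the to-be-identified URL is a malicious URL and the to-be-identified URL was extracted by the server from social information, the server 120 sets a malicious URL identifier for the social information to indicate to the user that the social information contains a malicious URL.
For example, when the server 120 detects that a comment includes the URL www.aaa.com and the webpage content of www.aaa.com contains a two-dimensional code for the malicious URL www.aaa1.com, the server 120 sets a malicious URL identifier for the comment or deletes the comment.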
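The three cases of step 414 differ only in where the URL came from, so they can be summarized as a small dispatch table. The sketch below is illustrative: the `Source` enum and the operation strings are hypothetical labels for the operations described above, and returning an empty list for a non-malicious result is an assumption, since the text only details the malicious cases.

```python
from enum import Enum
from typing import List


class Source(Enum):
    WEB_ACCESS_REQUEST = "web_access_request"
    INSTANT_MESSAGE = "instant_message"
    SOCIAL_INFO = "social_info"


def predetermined_operations(source: Source, is_malicious: bool) -> List[str]:
    """Map a recognition result to the operations described for each source."""
    if not is_malicious:
        return []  # assumption: nothing extra is done for a secure URL
    if source is Source.WEB_ACCESS_REQUEST:
        return ["return_malicious_url_reminder", "restrict_access"]
    if source is Source.INSTANT_MESSAGE:
        return ["deliver_message", "send_malicious_url_marking_instruction"]
    # Social information: the text also allows deleting the comment instead.
    return ["set_malicious_url_identifier"]
```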
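Because the URL type of a URL and the reference relationships between URLs may change, in one implementation the server 120, to keep the data in the URL database and the URL reference relation database accurate, updates at least one of the reference relationships stored in the URL reference relation database and the URLs stored in the URL database when a predetermined update condition is met. The predetermined update condition includes at least one of receiving an update instruction and reaching a preset time interval.
For example, every 72 hours the server checks whether the URL type recorded for each URL in the URL database is accurate and whether each reference relationship in the URL reference relation database still holds. As another example, when the server receives a malicious URL report submitted by a user through the application, it updates the URL type of the corresponding URL in the URL database.
In this embodiment, the server updates the data in the URL database and the URL reference relation database through the update mechanism, which keeps the data timely and accurate and further improves the accuracy of malicious URL identification.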
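The predetermined update condition can be expressed as a single predicate. The sketch below uses the 72-hour interval from the example above as its default; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

# The 72-hour interval comes from the example in the text.
UPDATE_INTERVAL = timedelta(hours=72)


def should_update(last_update: datetime, now: datetime,
                  update_instruction_received: bool = False,
                  interval: timedelta = UPDATE_INTERVAL) -> bool:
    """Return True when at least one predetermined update condition holds."""
    return update_instruction_received or (now - last_update) >= interval
```

An update instruction (for example, one triggered by a user's malicious URL report) forces a refresh even when the interval has not yet elapsed.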
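The following are apparatus embodiments of this application. For details not described exhaustively in the apparatus embodiments, refer to the corresponding method embodiments above.
Refer to FIG. 5, which is a structural block diagram of a malicious URL identification apparatus provided by an embodiment of this application. The malicious URL identification apparatus may reside in a computing device. The computing device may be, for example, the server 120 of FIG. 1A or the smart device 130 of FIG. 1B. The apparatus includes a first acquisition module 510, a second acquisition module 520, a detection module 530, and a determination module 540.
The first acquisition module 510 is configured to implement the function of step 301 or 401 above;
The second acquisition module 520 is configured to implement the function of step 302 above;
The detection module 530 is configured to implement the function of step 303 above;
The determination module 540 is configured to implement the function of step 304 or 413 above.
Optionally, the second acquisition module 520 includes a first acquisition unit and a second acquisition unit;
The first acquisition unit is configured to implement the functions of steps 404 and 405 above;
The second acquisition unit is configured to implement the function of step 403 above.
Optionally, the first acquisition unit is further configured to implement the functions of steps 405A to 405C above.
Optionally, the detection module 530 includes a first detection unit and a first determination unit;
The first detection unit is configured to implement the function of step 408 above;
The first determination unit is configured to implement the function of step 409 above.
Optionally, the detection module 530 further includes a third acquisition unit, a second detection unit, and a second determination unit;
The third acquisition unit is configured to implement the function of step 410 above;
The second detection unit is configured to implement the function of step 411 above;
The second determination unit is configured to implement the function of step 412 above.
Optionally, the apparatus further includes an update module;
The update module is configured to update at least one of the reference relationships stored in the URL reference relation database and the URLs stored in the URL database when a predetermined update condition is met, where the predetermined update condition includes at least one of receiving an update instruction and reaching a preset time interval.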
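Refer to FIG. 6, which is a structural block diagram of a computing device provided by an embodiment of this application.
The computing device 600 includes a central processing unit (CPU) 601, a system memory 604 including a random access memory (RAM) 602 and a read-only memory (ROM) 603, and a system bus 605 connecting the system memory 604 and the central processing unit 601. The computing device 600 also includes a basic input/output system (I/O system) 606 that helps transfer information between devices within the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 615. Here, the application programs 614 may include, for example, a malicious URL identification apparatus, which can implement the malicious URL identification method of this application and thereby improve the identification coverage of malicious URLs.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609, such as a mouse or keyboard, for the user to input information. The display 608 and the input device 609 are both connected to the central processing unit 601 through an input/output controller 610 connected to the system bus 605. The basic input/output system 606 may also include the input/output controller 610 for receiving and processing input from other devices such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 610 also provides output to a display screen, printer, or other type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable medium provide non-volatile storage for the computing device 600. That is, the mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices. Those skilled in the art will know that computer storage media are not limited to the above. The system memory 604 and the mass storage device 607 may be collectively referred to as memory.
According to various embodiments of this application, the computing device 600 may also operate by connecting, through a network such as the Internet, to a remote computer on the network. That is, the computing device 600 may connect to the network 612 through a network interface unit 611 connected to the system bus 605; the network interface unit 611 may also be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs stored in the memory, the one or more programs including instructions for performing the malicious URL identification method provided by the embodiments of this application.
Those of ordinary skill in the art will understand that all or some of the steps of the malicious URL identification method in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.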
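FIG. 7 shows a flowchart of a malicious URL identification method 700 according to some embodiments of the present application. The method 700 may be performed, for example, in the computing device 600, but is not limited thereto.
As shown in FIG. 7, in step S701, a URL to be identified is obtained. Step S701 is implemented in the same way as step 301, and details are not repeated here.
In step S702, when it is detected that the first webpage content corresponding to the URL to be identified does not contain malicious content, a referenced URL corresponding to the URL to be identified is obtained. The referenced URL is located in the first webpage content corresponding to the URL to be identified. For example, the referenced URL is a link address in the first webpage content. As another example, the referenced URL is encapsulated in an encapsulation carrier, and the carrier is then added to the first webpage content corresponding to the URL to be identified.
In one embodiment, step S702 may obtain the first webpage content corresponding to the URL to be identified. The first webpage content is obtained by simulating a browser visit to the URL to be identified. Step S702 may further extract the referenced URL from an encapsulation carrier contained in the first webpage content. For example, step S702 may take a screenshot of the first webpage content to obtain a webpage screenshot. Step S702 may then identify, through image recognition, the encapsulation carrier contained in the webpage screenshot. On this basis, step S702 may determine, according to the recognition result, the referenced URL encapsulated in the carrier.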
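The screenshot-based extraction above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: `take_screenshot` and `decode_carriers` are hypothetical callables supplied by the caller (for instance, a headless browser and a QR code or barcode decoder), and the function name `extract_referenced_urls` is not from the application. The sketch keeps only decoded payloads that parse as HTTP or HTTPS URLs.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def extract_referenced_urls(
    page_content: str,
    take_screenshot: Callable[[str], bytes],             # hypothetical: renders content to an image
    decode_carriers: Callable[[bytes], Iterable[str]],   # hypothetical: decodes carriers in the image
) -> List[str]:
    """Return the URLs encapsulated in carriers found in a page screenshot."""
    screenshot = take_screenshot(page_content)
    urls = []
    for payload in decode_carriers(screenshot):
        candidate = payload.strip()
        parsed = urlparse(candidate)
        # Keep only payloads that look like web URLs; other carrier payloads are ignored.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(candidate)
    return urls
```

Passing the screenshot and decoding steps in as parameters keeps the sketch independent of any particular browser automation or image recognition library.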
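In another embodiment, step S702 may obtain the referenced URL corresponding to the URL to be identified from a URL reference relationship database. The URL reference relationship database stores reference relationships between URLs.
In step S703, it is detected whether the second webpage content corresponding to the referenced URL contains malicious content.
In one embodiment, step S703 may detect whether a URL database contains the referenced URL. The URL database stores verified malicious URLs. When it is determined that the referenced URL is one of the malicious URLs in the URL database, step S703 may determine that the second webpage content contains malicious content.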
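A minimal Python sketch of this detection step follows. The database lookup is taken from the paragraph above; the keyword fallback for URLs missing from the database follows claim 6. The function name, the use of an in-memory set as the URL database, and the `fetch_content` callable standing in for the simulated browser visit are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Set


def second_page_is_malicious(
    referenced_url: str,
    malicious_urls: Set[str],              # stand-in for the database of verified malicious URLs
    fetch_content: Callable[[str], str],   # hypothetical: simulated browser visit
    keywords: Iterable[str],               # preset keywords
) -> bool:
    """Decide whether the page behind a referenced URL contains malicious content."""
    # A hit in the database of verified malicious URLs settles the question.
    if referenced_url in malicious_urls:
        return True
    # Fallback: fetch the page and scan it for preset keywords.
    content = fetch_content(referenced_url)
    if any(keyword in content for keyword in keywords):
        malicious_urls.add(referenced_url)  # record the newly identified malicious URL
        return True
    return False
```

Adding a newly identified URL to the set means later lookups for the same referenced URL can skip the page fetch.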
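When step S703 detects that the second webpage content corresponding to the referenced URL contains malicious content, the method 700 may perform step S704 to determine that the URL to be identified is a malicious URL. For a more specific implementation of the method 700, refer to the method shown in FIG. 4A; details are not repeated here.
In summary, when the first webpage content corresponding to the URL to be identified does not contain malicious content, the method 700 may obtain the referenced URL in the first webpage content and determine whether the URL to be identified is a malicious URL according to whether the second webpage content corresponding to the referenced URL contains malicious content, thereby greatly improving the identification coverage of malicious URLs.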
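The overall flow of steps S701 to S704 can be sketched in Python as follows. The three callables are hypothetical placeholders for the detection and lookup steps described above, and the function name `is_malicious_url` is not from the application. Treating the URL as malicious when its own page already contains malicious content follows claim 1 and is an assumption here, since the description of the method 700 only covers the case where the first page is clean.

```python
from typing import Callable, Iterable


def is_malicious_url(
    url: str,
    first_page_is_malicious: Callable[[str], bool],       # hypothetical detector for the URL's own page
    get_referenced_urls: Callable[[str], Iterable[str]],  # hypothetical: carrier extraction or database lookup
    referenced_page_is_malicious: Callable[[str], bool],  # hypothetical detector for referenced pages
) -> bool:
    """Identify a URL as malicious from its own page or from any page it references."""
    # Assumption (per claim 1): malicious first-page content is enough on its own.
    if first_page_is_malicious(url):
        return True
    # S702 to S704: the first page is clean, so check every referenced URL.
    return any(referenced_page_is_malicious(ref) for ref in get_referenced_urls(url))
```

A page that looks clean but embeds a QR code pointing to a malicious page is caught by the second check, which is the coverage gain the summary describes.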
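The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.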

Claims (25)

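  1. A malicious URL identification method, applied to a computing device, the method comprising:
    obtaining a URL to be identified;
    obtaining a referenced URL corresponding to the URL to be identified, the referenced URL being located in first webpage content corresponding to the URL to be identified;
    detecting whether the first webpage content contains malicious content, and detecting whether second webpage content corresponding to the referenced URL contains malicious content; and
    when it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determining that the URL to be identified is a malicious URL.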
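  2. The method according to claim 1, wherein the obtaining a referenced URL corresponding to the URL to be identified comprises:
    obtaining the first webpage content corresponding to the URL to be identified, the first webpage content being obtained by simulating a browser visit to the URL to be identified; and
    extracting the referenced URL from an encapsulation carrier contained in the first webpage content.
  3. The method according to claim 1, wherein the obtaining the referenced URL corresponding to the URL to be identified comprises:
    obtaining the referenced URL corresponding to the URL to be identified from a URL reference relationship database, the URL reference relationship database storing reference relationships between URLs.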
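  4. The method according to claim 2, wherein the encapsulation carrier is a QR code or a barcode;
    the extracting the referenced URL from the encapsulation carrier contained in the first webpage content comprises:
    taking a screenshot of the first webpage content to obtain a webpage screenshot;
    identifying the encapsulation carrier contained in the webpage screenshot by using a predetermined image recognition technology, the predetermined image recognition technology comprising at least one of a QR code recognition technology and a barcode recognition technology; and
    determining, according to a recognition result, the referenced URL encapsulated in the encapsulation carrier.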
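  5. The method according to claim 1, wherein the detecting whether second webpage content corresponding to the referenced URL contains malicious content comprises:
    detecting whether a URL database contains the referenced URL, the URL database storing verified malicious URLs; and
    when it is detected that the URL database contains the referenced URL, determining that the second webpage content corresponding to the referenced URL contains malicious content.
  6. The method according to claim 5, wherein the detecting whether second webpage content corresponding to the referenced URL contains malicious content further comprises:
    when it is detected that the URL database does not contain the referenced URL, obtaining the second webpage content corresponding to the referenced URL, the second webpage content being obtained by simulating a browser visit to the referenced URL;
    detecting whether the second webpage content contains a preset keyword; and
    when it is detected that the second webpage content contains the preset keyword, determining that the second webpage content contains malicious content, and adding the referenced URL to the URL database.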
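  7. The method according to claim 5, further comprising:
    when a predetermined update condition is met, updating the reference relationships stored in the URL reference relationship database and updating the URLs stored in the URL database, the predetermined update condition comprising at least one of receiving an update instruction and reaching a preset time interval.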
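  8. A malicious URL identification method, applied to a computing device, the method comprising:
    obtaining a URL to be identified;
    when it is detected that first webpage content corresponding to the URL to be identified does not contain malicious content, obtaining a referenced URL corresponding to the URL to be identified, the referenced URL being located in the first webpage content corresponding to the URL to be identified;
    detecting whether second webpage content corresponding to the referenced URL contains malicious content; and
    when it is detected that the second webpage content corresponding to the referenced URL contains malicious content, determining that the URL to be identified is a malicious URL.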
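  9. The method according to claim 8, wherein the obtaining a referenced URL corresponding to the URL to be identified comprises:
    obtaining the first webpage content corresponding to the URL to be identified, the first webpage content being obtained by simulating a browser visit to the URL to be identified; and
    extracting the referenced URL from an encapsulation carrier contained in the first webpage content.
  10. The method according to claim 8, wherein the obtaining the referenced URL corresponding to the URL to be identified comprises:
    obtaining the referenced URL corresponding to the URL to be identified from a URL reference relationship database, the URL reference relationship database storing reference relationships between URLs.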
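  11. The method according to claim 9, wherein the extracting the referenced URL from the encapsulation carrier contained in the first webpage content comprises:
    taking a screenshot of the first webpage content to obtain a webpage screenshot;
    identifying the encapsulation carrier contained in the webpage screenshot by using an image recognition technology; and
    determining, according to a recognition result, the referenced URL encapsulated in the encapsulation carrier.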
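  12. The method according to claim 8, wherein the detecting whether second webpage content corresponding to the referenced URL contains malicious content comprises:
    detecting whether a URL database contains the referenced URL, the URL database storing verified malicious URLs; and
    when it is determined that the referenced URL is among the malicious URLs in the URL database, determining that the second webpage content contains malicious content.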
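  13. A computing device, comprising a processor and a memory, the memory storing computer-readable instructions that cause the processor to:
    obtain a URL to be identified;
    obtain a referenced URL corresponding to the URL to be identified, the referenced URL being located in first webpage content corresponding to the URL to be identified;
    detect whether the first webpage content contains malicious content, and detect whether second webpage content corresponding to the referenced URL contains malicious content; and
    when it is detected that at least one of the first webpage content and the second webpage content contains malicious content, determine that the URL to be identified is a malicious URL.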
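  14. The computing device according to claim 13, wherein the processor further executes the computer-readable instructions to:
    obtain the first webpage content corresponding to the URL to be identified, the first webpage content being obtained by simulating a browser visit to the URL to be identified; and
    extract the referenced URL from the encapsulation carrier contained in the first webpage content.
  15. The computing device according to claim 13, wherein the processor further executes the computer-readable instructions to:
    obtain the referenced URL corresponding to the URL to be identified from a URL reference relationship database, the URL reference relationship database storing reference relationships between URLs.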
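  16. The computing device according to claim 14, wherein the encapsulation carrier is a QR code or a barcode, and the processor further executes the computer-readable instructions to:
    take a screenshot of the first webpage content to obtain a webpage screenshot;
    identify the encapsulation carrier contained in the webpage screenshot by using a predetermined image recognition technology, the predetermined image recognition technology comprising at least one of a QR code recognition technology and a barcode recognition technology; and
    determine, according to a recognition result, the referenced URL encapsulated in the encapsulation carrier.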
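  17. The computing device according to claim 13, wherein the processor further executes the computer-readable instructions to:
    detect whether a URL database contains the referenced URL, the URL database storing verified malicious URLs; and
    when it is detected that the URL database contains the referenced URL, determine that the second webpage content corresponding to the referenced URL contains malicious content.
  18. The computing device according to claim 17, wherein the processor further executes the computer-readable instructions to:
    when it is detected that the URL database does not contain the referenced URL, obtain the second webpage content corresponding to the referenced URL, the second webpage content being obtained by simulating a browser visit to the referenced URL;
    detect whether the second webpage content contains a preset keyword; and
    when it is detected that the second webpage content contains the preset keyword, determine that the second webpage content contains malicious content, and add the referenced URL to the URL database.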
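  19. The computing device according to claim 17, wherein the processor further executes the computer-readable instructions to:
    when a predetermined update condition is met, update the reference relationships stored in the URL reference relationship database and update the URLs stored in the URL database, the predetermined update condition comprising at least one of receiving an update instruction and reaching a preset time interval.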
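  20. A computing device, comprising a processor and a memory, the memory storing computer-readable instructions that cause the processor to:
    obtain a URL to be identified;
    when it is detected that first webpage content corresponding to the URL to be identified does not contain malicious content, obtain a referenced URL corresponding to the URL to be identified, the referenced URL being located in the first webpage content corresponding to the URL to be identified;
    detect whether second webpage content corresponding to the referenced URL contains malicious content; and
    when it is detected that the second webpage content corresponding to the referenced URL contains malicious content, determine that the URL to be identified is a malicious URL.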
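  21. The computing device according to claim 20, wherein the processor further executes the computer-readable instructions to:
    obtain the first webpage content corresponding to the URL to be identified, the first webpage content being obtained by simulating a browser visit to the URL to be identified; and
    extract the referenced URL from the encapsulation carrier contained in the first webpage content.
  22. The computing device according to claim 20, wherein the processor further executes the computer-readable instructions to:
    obtain the referenced URL corresponding to the URL to be identified from a URL reference relationship database, the URL reference relationship database storing reference relationships between URLs.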
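  23. The computing device according to claim 21, wherein the processor further executes the computer-readable instructions to:
    take a screenshot of the first webpage content to obtain a webpage screenshot;
    identify the encapsulation carrier contained in the webpage screenshot by using an image recognition technology; and
    determine, according to a recognition result, the referenced URL encapsulated in the encapsulation carrier.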
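  24. The computing device according to claim 20, wherein the processor further executes the computer-readable instructions to:
    detect whether a URL database contains the referenced URL, the URL database storing verified malicious URLs; and
    when it is determined that the referenced URL is among the malicious URLs in the URL database, determine that the second webpage content contains malicious content.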
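  25. A non-volatile storage medium storing one or more programs, the one or more programs comprising instructions that, when executed by a computing device, cause the computing device to perform the method according to any one of claims 1 to 12.
PCT/CN2018/079548 2017-03-21 2018-03-20 Malicious URL identification method, computing device, and storage medium WO2018171572A1 (zh)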

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710171054.8 2017-03-21
CN201710171054.8A CN106992975B (zh) 2017-03-21 2017-03-21 Malicious URL identification method and apparatus

Publications (1)

Publication Number Publication Date
WO2018171572A1 true WO2018171572A1 (zh) 2018-09-27

Family

ID=59411702

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079548 WO2018171572A1 (zh) 2017-03-21 2018-03-20 Malicious URL identification method, computing device, and storage medium

Country Status (2)

Country Link
CN (1) CN106992975B (zh)
WO (1) WO2018171572A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
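CN106992975B (zh) * 2017-03-21 2021-01-12 腾讯科技(深圳)有限公司 Malicious URL identification method and apparatus
CN111274507B (zh) * 2020-01-21 2023-03-10 腾讯科技(深圳)有限公司 Webpage content browsing method, apparatus, device, and storage medium
CN112702331A (zh) * 2020-12-21 2021-04-23 赛尔网络有限公司 Sensitive-word-based malicious link identification method, apparatus, electronic device, and medium
CN113630414A (zh) * 2021-08-09 2021-11-09 中国电信股份有限公司 Identification code verification method, system, gateway device, and storage medium
CN114553486B (zh) * 2022-01-20 2023-07-21 北京百度网讯科技有限公司 Illegal data processing method, apparatus, electronic device, and storage medium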

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810425A (zh) * 2012-11-13 2014-05-21 腾讯科技(深圳)有限公司 Malicious URL detection method and apparatus
WO2014099615A1 (en) * 2012-12-20 2014-06-26 Mcafee Inc. Just-in-time, email embedded url reputation determination
CN104615695A (zh) * 2015-01-23 2015-05-13 腾讯科技(深圳)有限公司 Malicious URL detection method and system
CN105260370A (zh) * 2014-07-17 2016-01-20 中兴通讯股份有限公司 QR code information acquisition method, apparatus, and terminal
CN106992975A (zh) * 2017-03-21 2017-07-28 腾讯科技(深圳)有限公司 Malicious URL identification method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
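CN103023905B (zh) * 2012-12-20 2015-12-02 北京奇虎科技有限公司 Device, method, and system for detecting malicious links
CN103036896B (zh) * 2012-12-20 2015-07-01 北京奇虎科技有限公司 Method and system for detecting malicious links
CN104679798B (zh) * 2013-12-03 2018-04-27 腾讯科技(深圳)有限公司 Webpage detection method and apparatus
CN105391674B (zh) * 2014-09-04 2020-10-16 腾讯科技(深圳)有限公司 Information processing method and system, server, and client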

Also Published As

Publication number Publication date
CN106992975B (zh) 2021-01-12
CN106992975A (zh) 2017-07-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18772520

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18772520

Country of ref document: EP

Kind code of ref document: A1