WO2019127658A1

WO2019127658A1 - Method and system for identifying malicious image on the basis of url paths of similar images

Info

Publication number: WO2019127658A1
Application number: PCT/CN2018/072242
Authority: WO
Inventors: 蔡昭权; 胡松; 胡辉; 蔡映雪; 陈伽; 黄翰; 梁椅辉; 罗伟; 黄思博
Original assignee: 惠州学院
Priority date: 2017-12-30
Filing date: 2018-01-11
Publication date: 2019-07-04
Also published as: CN110020258A

Abstract

A method and system for identifying a malicious image on the basis of URL paths of similar images. The method comprises: when a page element of a webpage is determined to comprise a URL path of an image, acquiring a user IP and/or ID recorded in the page content of the webpage, acquiring, on the basis of the URL path of the image, a domain name that the URL comprises or an IP address to which the URL points, and outputting a first weight factor and a second weight factor on the basis of queries related to the user ID, the user IP, and the domain name; also, searching a third-party image database for all similar images of the image and acquiring the URL paths of the similar images, querying the domain names/IPs that the URLs of all of the similar images comprise, and outputting a third weight factor; and, with the first weight factor, the second weight factor, and the third weight factor combined, identifying whether the image is a malicious image.

Description

Method and system for identifying harmful pictures based on the URL paths of approximate pictures

Technical field

The present disclosure pertains to the field of information security, for example, to a method of identifying a harmful picture and a system therefor.

Background technique

In the information society, information flows everywhere, in forms including but not limited to text, video, audio, and pictures. Compared with video, picture files carry a certain amount of visual information while requiring relatively little storage space and bandwidth. With the spread of the mobile Internet, the network has become full of harmful picture content, such as pictures depicting illegal drugs, pornography, or violence, and pictures that induce viewers to join cults, suicide groups, or criminal groups. Because of their visual immediacy and impact, such pictures do more harm than harmful text or harmful audio. It is therefore necessary to identify these harmful pictures so that they can be filtered or deleted and the harm eliminated.

For identifying harmful pictures on the network, current techniques fall into two broad categories. One is the traditional approach, which relies mainly on various classifiers; the other is deep learning, in particular the application of convolutional neural networks. However, both approaches fall short in recognition efficiency.

With the development of big data and artificial intelligence, how to identify harmful pictures effectively has become a problem to be considered.

Summary of the invention

The present disclosure provides a method for identifying a harmful picture based on the URL paths of approximate pictures, including:

Step a): when it is determined that a page element of a webpage includes a URL path of a picture, identifying an IP address or IP address segment of a user recorded in the page content of the webpage, and/or identifying a user ID recorded in the page content of the webpage; querying a first database for whether the IP address or an IP address in the same network segment exists, and/or querying the first database for whether the ID exists; and outputting a first weighting factor according to the query result for the user's IP address and/or the query result for the ID;

Step b): obtaining, according to the URL path of the picture, a domain name included in the URL and/or an IP address pointed to by the URL; performing a whois query in a second database based on the domain name included in the URL, and/or querying the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and outputting a second weighting factor according to the result of the whois query and/or the query result for the IP address;

Step c): inputting the URL path of the picture into a third-party picture database, searching the third-party picture database for all approximate pictures of the picture, obtaining the URL paths of all the approximate pictures, and obtaining, based on those URL paths, the domain names included in the URLs of all the approximate pictures and/or the IP addresses pointed to by those URLs; and performing a whois query in the second database based on the domain names included in the URLs of all the approximate pictures, and/or querying the second database, based on the IP addresses pointed to by the URLs of all the approximate pictures, for whether those IP addresses or IP addresses in the same network segment exist; and outputting a third weighting factor according to the result of the whois query and/or the query result for the IP addresses;

Step d): integrating the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.

In addition, the present disclosure also discloses a system for identifying harmful pictures based on the URL paths of approximate pictures, including:

a first weighting factor generating module, configured to: when a page element of a webpage includes a URL path of a picture, identify an IP address or IP address segment of a user recorded in the page content of the webpage, and/or identify a user ID recorded in the page content of the webpage; query a first database for whether the IP address or an IP address in the same network segment exists, and/or query the first database for whether the ID exists; and output a first weighting factor according to the query result for the user's IP address and/or the query result for the ID;

a second weighting factor generating module, configured to: obtain, according to the URL path of the picture, a domain name included in the URL and/or an IP address pointed to by the URL; perform a whois query in a second database based on the domain name included in the URL, and/or query the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and output a second weighting factor according to the result of the whois query and/or the query result for the IP address;

a third weighting factor generating module, configured to: input the URL path of the picture into a third-party picture database, search the third-party picture database for all approximate pictures of the picture, obtain the URL paths of all the approximate pictures, and obtain, based on those URL paths, the domain names included in the URLs of all the approximate pictures and/or the IP addresses pointed to by those URLs; and perform a whois query in the second database based on the domain names included in the URLs of all the approximate pictures, and/or query the second database, based on the IP addresses pointed to by the URLs of all the approximate pictures, for whether those IP addresses or IP addresses in the same network segment exist; and output a third weighting factor according to the result of the whois query and/or the query result for the IP addresses;

an identifying module, configured to integrate the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.

With the above method and system, the present disclosure can draw on databases built from big data and provide a scheme that identifies harmful pictures more efficiently without requiring much image processing.

Description of the drawings

Figure 1 is a schematic illustration of the method of one embodiment of the present disclosure;

Figure 2 is a schematic diagram of a system in accordance with an embodiment of the present disclosure.

Detailed description

To help those skilled in the art understand the technical solutions disclosed in the present disclosure, the technical solutions of the various embodiments are described below with reference to the embodiments and the related drawings. The described embodiments are a part of the embodiments of the present disclosure, not all of them. The terms "first", "second", etc., as used in this disclosure, are used to distinguish different objects and are not intended to describe a particular order. Moreover, "including" and "having" and any variations thereof are intended to be inclusive and not exclusive. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, system, product, or device.

References to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. Appearances of the phrase in various places in the specification do not necessarily refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will appreciate that the embodiments described herein can be combined with other embodiments.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for identifying a harmful picture based on the URL paths of approximate pictures according to an embodiment of the present disclosure. As shown, the method includes:

Step S100: when it is determined that a page element of a webpage includes a URL path of a picture, identify the IP address or IP address segment of the user recorded in the page content of the webpage, and/or identify the user ID recorded in the page content of the webpage; query the first database for whether the IP address or an IP address in the same network segment exists, and/or query the first database for whether the ID exists; and output a first weighting factor according to the query result for the user's IP address and/or the query result for the ID;

It can be understood that the first database maintains the known IP addresses or IP address segments of users who have posted harmful pictures, and a list of the IDs of users who have posted harmful pictures.

This is because harmful pictures generally attract loyal ("sticky") users. Some of these users take part in spreading harmful pictures, and most of their IP addresses and IDs are relatively fixed. A considerable number of users even use the same ID on different websites or forums.

For example, when it is recognized that the IP address of the user recorded in the content of the web page is 192.168.10.3:

If the IP address is recorded in the first database, the first weighting factor may be exemplarily 1.0;

If the only IP address recorded in the database is 192.168.10.4, then 192.168.10.3 is moderately suspected of being an alternate or newly changed address of the user who has posted harmful pictures, and the first weighting factor may be, for example, 0.6;

If the IP addresses recorded in the database include both 192.168.10.4 and 192.168.10.5, or even all the IP addresses of the 192.168.10.X network segment, then 192.168.10.3 is highly suspected of being an alternate or newly changed address of the user who has posted harmful pictures, and the first weighting factor may be, for example, 0.9;

If the IP addresses recorded in the database include multiple 192.168.X.X network segments but not the 192.168.10.X network segment, then 192.168.10.3 is cautiously suspected of being an address of a user who has posted harmful pictures, and the first weighting factor may be, for example, 0.4.
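The exemplary IP lookup above can be sketched in Python as follows. This is only an illustration under stated assumptions: the first database is modeled as an in-memory set of IPv4 strings, "same network segment" is taken to mean the same /24 block, the wider 192.168.X.X range is taken to mean the same /16 block, and the function name `ip_query_factor` and the fallback value 0.0 are hypothetical. The values 1.0, 0.9, 0.6, and 0.4 are the exemplary values given above.

```python
import ipaddress


def ip_query_factor(ip: str, known_ips: set[str]) -> float:
    """Return an exemplary IP query factor for `ip`.

    `known_ips` stands in for the IP records of the first database
    (an assumption; the disclosure does not fix a storage format).
    """
    if ip in known_ips:
        return 1.0  # exact match is recorded

    # Assumption: /24 models "192.168.10.X", /16 models "192.168.X.X".
    seg24 = ipaddress.ip_network(f"{ip}/24", strict=False)
    seg16 = ipaddress.ip_network(f"{ip}/16", strict=False)
    known = [ipaddress.ip_address(k) for k in known_ips]

    hits_24 = sum(1 for k in known if k in seg24)
    if hits_24 >= 2:
        return 0.9  # several neighbours in the same segment
    if hits_24 == 1:
        return 0.6  # a single neighbour in the same segment

    # No record in the own /24: count distinct /24 segments in the same /16.
    other_segments = {
        ipaddress.ip_network(f"{k}/24", strict=False)
        for k in known
        if k in seg16
    }
    if len(other_segments) >= 2:
        return 0.4
    return 0.0  # hypothetical fallback when nothing related is recorded
```

For instance, `ip_query_factor("192.168.10.3", {"192.168.10.4"})` follows the second case above. In practice the segment sizes and factor values would be tuned to the data actually held in the first database.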

For another example, in the case where the identified user ID is called "tudou":

If the user ID named "tudou" is recorded in the first database, the first weighting factor may be exemplarily 1.0;

If the IDs recorded in the database include "tudou1", "tudou2", "tudou*", or another approximate ID, then "tudou" is slightly suspected of being an alternate ID of the same user, and the first weighting factor may be, for example, 0.3;

If the database contains neither "tudou" nor any approximate ID, then the first weighting factor may be, for example, 0.
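The ID lookup can be sketched in the same way. The disclosure does not define what counts as an "approximate ID", so the sketch below assumes, purely for illustration, that two IDs are approximate when one is a prefix of the other (which covers "tudou" versus "tudou1"). The function name `id_query_factor` is hypothetical.

```python
def id_query_factor(user_id: str, known_ids: set[str]) -> float:
    """Return an exemplary ID query factor for `user_id`.

    `known_ids` stands in for the user ID list of the first database.
    """
    if user_id in known_ids:
        return 1.0  # exact ID is recorded

    # Assumption: "approximate" means one ID is a prefix of the other.
    for known in known_ids:
        if known.startswith(user_id) or user_id.startswith(known):
            return 0.3  # slightly suspected alternate ID

    return 0.0  # neither the ID nor an approximate ID is recorded
```

A real deployment might instead use an edit-distance or pattern-based notion of similarity; the prefix rule is only the simplest choice consistent with the examples above.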

In particular, the above step also covers a combined query of the user IP and the user ID, that is, examining whether the IP and the ID of the user who published or discussed the picture belong to the IPs (or IP address segments) and/or IDs already present in the first database.

Suppose the user's IP query factor is u, the ID query factor is v, and the first weighting factor is x, where 0≤u≤1, 0≤v≤1, and 0≤x≤1. The first weighting factor can then be determined according to the following formula:

x=d×u+e×v, where d+e=1, d, e represent the weights of the user IP query factor and the ID query factor, respectively.

For example, d=e=1/2;

For example, d and e are not equal, and may be adjusted according to the weight of each query factor and the actual situation of determining the first weighting factor.

It can be understood that the closer x is to 1, the heavier the first weighting factor, and the greater the probability that the related picture was posted by a user recorded in the first database.

The above formula for calculating x is a linear formula, but in practice, a nonlinear formula may also be used.

Further, whether it is a linear formula or a nonlinear formula, it can be considered to determine the relevant formula and its parameters by training or fitting.
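As a minimal sketch of the linear formula x=d×u+e×v with d+e=1, the function below takes the two query factors and the weight d, and derives e as 1−d. The function name `first_weighting_factor` and the input validation are illustrative additions, not part of the disclosure.

```python
def first_weighting_factor(u: float, v: float, d: float = 0.5) -> float:
    """Combine the IP query factor u and the ID query factor v linearly.

    Implements x = d*u + e*v with e = 1 - d, so that d + e = 1.
    """
    for name, value in (("u", u), ("v", v), ("d", d)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    e = 1.0 - d
    return d * u + e * v
```

With the default d=e=1/2, an IP query factor of 0.6 and an ID query factor of 0.3 give x=0.45. Because u and v lie in [0, 1] and the weights sum to 1, x also stays in [0, 1].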

Step S200: obtain, according to the URL path of the picture, the domain name included in the URL and/or the IP address pointed to by the URL; perform a whois query in the second database based on the domain name included in the URL, and/or query the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and output a second weighting factor according to the result of the whois query and/or the query result for the IP address;

It can be understood that the second database maintains a list of known domain names of websites that have posted harmful pictures and/or a list of known IP addresses and IP address segments of such websites. In contrast to the previous step, "posted" here refers to the website, identified by its IP address and/or domain name, rather than to the user.

The whois query examines the association between domain name registrants and harmful pictures. The second database can maintain the following information: the domain name, the information of the domain name registrant that publishes large numbers of erotic, reactionary, or cult pictures on the Internet, and identifiers of the corresponding harmful pictures.

For example, if the domain name is www.a.com:

If the domain name address, the identifier of the corresponding harmful picture, and its whois information are recorded in the second database, the second weighting factor may be exemplarily 1.0;

If the second database does not record the identifier of any harmful picture for the domain name www.a.com, but the registrant of the domain name and the domain names of other websites registered by the same registrant can be queried, and the second database records that those other websites publish large numbers of harmful pictures on the Internet, then the website corresponding to www.a.com is still highly suspected of being a source of harmful pictures even though no harmful picture is recorded for www.a.com itself, and the second weighting factor may be, for example, 0.9;

If the second database does not record the identifier of any harmful picture for the domain name www.a.com, and the registrant of the domain name and the domain names of other websites registered by the same registrant can be queried, but the second database does not record any identifier of harmful pictures published by those other websites, then the second weighting factor may be, for example, 0;

It is easy to understand that if the second database does not record the identifier of any harmful picture for the domain name www.a.com, and no domain names of other websites registered by the same registrant can be found, then the second weighting factor may also be, for example, 0.
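The whois cases above can be sketched as a lookup over local data. The sketch makes several assumptions: the second database is modeled as a set of domain names with recorded harmful pictures (`harmful_domains`) plus a mapping from domain name to registrant (`registrant_of`), no real whois service is contacted, and all names are hypothetical.

```python
def whois_query_factor(
    domain: str,
    harmful_domains: set[str],
    registrant_of: dict[str, str],
) -> float:
    """Return an exemplary whois query factor for `domain`."""
    if domain in harmful_domains:
        return 1.0  # the domain itself has recorded harmful pictures

    registrant = registrant_of.get(domain)
    if registrant is None:
        return 0.0  # no registrant information can be queried

    # Other domains registered by the same registrant.
    siblings = {
        d for d, r in registrant_of.items() if r == registrant and d != domain
    }
    if siblings & harmful_domains:
        return 0.9  # the registrant's other websites publish harmful pictures
    return 0.0  # no sibling domain has a harmful record
```

Keeping the registrant mapping separate from the harmful-domain list means either one could be replaced by a third-party source without changing the scoring logic.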

Exemplarily, the IP address pointed by the URL may be obtained according to the URL path of the picture, and the IP address/IP address segment query is performed to output a second weighting factor.

For example, if the IP address is 192.168.20.3:

If the IP address is recorded in the second database, the second weighting factor may be exemplarily 1.0;

If the only IP address recorded in the second database is 192.168.20.4, then 192.168.20.3 is moderately suspected of being an alternate or newly changed address of the website to which the picture belongs, and the second weighting factor may be, for example, 0.6;

If the IP addresses recorded in the second database include both 192.168.20.4 and 192.168.20.5, or even all the IP addresses of the 192.168.20.X network segment, then 192.168.20.3 is highly suspected of being an alternate or newly changed address of the website to which the picture belongs, and the second weighting factor may be, for example, 0.9;

If the IP addresses recorded in the second database include multiple 192.168.X.X network segments but not the 192.168.20.X network segment, then 192.168.20.3 is cautiously suspected of being an address of a website to which harmful pictures belong, and the second weighting factor may be, for example, 0.4.
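Both lookups in step S200 start from the host part of the picture URL. As a minimal sketch, the host can be extracted with Python's standard `urllib.parse.urlsplit` and classified as a domain name or a literal IP address with the `ipaddress` module. The function name `host_from_url` is hypothetical, and resolving a domain name to the IP address it points to (for example with `socket.gethostbyname`) is deliberately left out because it requires network access.

```python
import ipaddress
from urllib.parse import urlsplit


def host_from_url(url: str) -> tuple[str, bool]:
    """Return (host, is_ip) for the URL path of a picture.

    `is_ip` is True when the host is a literal IP address, so the caller
    can choose between the IP query and the domain name whois query.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host found in URL: {url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host, False  # treat as a domain name
    return host, True
```

For example, `host_from_url("http://www.a.com/img/1.jpg")` yields the domain name www.a.com, which would go to the whois query, while a URL whose host is 192.168.20.3 would go to the IP address query.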

In particular, the above step also covers the case in which the IP list and the domain name list are considered together, that is, the case in which the second weighting factor is determined jointly by the IP query of the picture URL and the domain name whois query.

Suppose the IP query factor of the picture URL is i, the domain name whois query factor is j, and the second weighting factor is y, where 0≤i≤1, 0≤j≤1, and 0≤y≤1. The second weighting factor can then be determined according to the following formula:

y=m×i+n×j, where m+n=1, m and n represent the weights of the IP query factor and the domain name whois query factor, respectively.

For example, m=n=1/2;

For example, m and n are not equal, and may be adjusted according to the weight of each query factor and the actual situation of determining the second weighting factor.

It can be understood that the closer y is to 1, the heavier the second weighting factor is, and the greater the probability that the related picture belongs to a harmful picture.

The above formula for calculating y belongs to the linear formula, but in practical applications, a nonlinear formula may also be used.

Step S300: input the URL path of the picture into a third-party picture database, search the third-party picture database for all approximate pictures of the picture, obtain the URL paths of all the approximate pictures, and obtain, based on those URL paths, the domain names included in the URLs of all the approximate pictures and/or the IP addresses pointed to by those URLs; and perform a whois query in the second database based on the domain names included in the URLs of all the approximate pictures, and/or query the second database, based on the IP addresses pointed to by the URLs of all the approximate pictures, for whether those IP addresses or IP addresses in the same network segment exist; and output a third weighting factor according to the result of the whois query and/or the query result for the IP addresses;

Step S300 performs a search-by-picture query in the third-party picture database and outputs a third weighting factor according to the IP query and/or domain name whois query for the URL paths of the approximate pictures in the search results. The third weighting factor is determined by how the IPs and/or domain names of the URL paths of the approximate pictures fare when queried in the second database, for example by counting how many times the IPs or the whois information of the domain names occur in the second database. It can be understood that when the number of occurrences satisfies the corresponding threshold condition, the third weighting factor may be 1.0, or may be 0.8 or 0.4, depending on the specific threshold condition.

In addition, it should be emphasized that step S300 itself involves little image processing or recognition. The image processing is performed by the third-party picture database, so the present disclosure need not involve much image processing. Take a third-party picture database such as www.tineye.com as an example. Suppose the picture is indeed an erotic picture, many approximate pictures of it are found in a database such as www.tineye.com, and the domain names and/or IPs in the URLs of those approximate pictures are also recorded in the second database. Then, even if no picture recognition is performed on the picture or on its approximate pictures, step S300 can still give a third weighting factor, which may be 1.0 or may be 0.6. Clearly, if the domain names and/or IPs in the URLs of all the retrieved approximate pictures are recorded in the second database, the third weighting factor is likely to be 1.0. In other words, step S300 amounts to scoring the domain names and/or IPs corresponding to the URLs of the approximate pictures to determine whether they have a prior record; if the domain names and/or IPs of a considerable number of approximate pictures have such a record, there is reason to strongly suspect that the picture is a harmful picture.
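The scoring idea in step S300 can be sketched as follows. The sketch assumes that the hosts (domain names or IPs) of the approximate pictures have already been obtained from the third-party picture database, models the second database as a set of recorded hosts, and uses the share of recorded hosts as the threshold condition. The threshold of one half and the function name `third_weighting_factor` are hypothetical; the output values 1.0, 0.8, and 0.4 are the exemplary values mentioned for step S300.

```python
def third_weighting_factor(hosts: list[str], second_db: set[str]) -> float:
    """Score the hosts found in the URLs of the approximate pictures.

    `hosts` holds one domain name or IP per approximate picture;
    `second_db` stands in for the hosts recorded in the second database.
    """
    if not hosts:
        return 0.0  # no approximate pictures were found

    hits = sum(1 for host in hosts if host in second_db)
    ratio = hits / len(hosts)

    # Hypothetical threshold conditions on the share of recorded hosts.
    if hits == len(hosts):
        return 1.0  # every approximate picture comes from a recorded host
    if ratio >= 0.5:
        return 0.8
    if hits > 0:
        return 0.4
    return 0.0
```

Only set lookups are performed here, which reflects the point that the step scores prior records of hosts rather than analysing picture content.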

However, step S300 does not exclude prior-art technical means for identifying harmful content in a picture. That is, step S300 can also be combined with image processing by conventional methods, or with processing by a deep learning model, in order to identify harmful pictures. In addition, the present disclosure does not limit whether the third-party picture database performs the approximate picture search based on content or by other means.

Step S400: integrate the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.

Exemplarily, let the first weighting factor be x, the second weighting factor be y, and the third weighting factor be z, where 0≤x≤1, 0≤y≤1, and 0≤z≤1. The weighting factors can then be integrated according to the following formula to calculate the harmful coefficient W of the picture:

W = a × x + b × y + c × z, where a + b + c = 1, a, b, c respectively represent the weight of each weighting factor.

For example, a=b=c=1/3;

For example, a, b, and c are not equal, and may be adjusted according to each weighting factor and the actual situation of identifying harmful content.

It can be understood that the closer W is to 1, the greater the probability that the related picture belongs to a harmful picture.

The formula for calculating W above is a linear formula, but in practice, a nonlinear formula may also be used.
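A minimal sketch of the linear fusion W = a × x + b × y + c × z is given below. The disclosure states only that a picture is more likely harmful as W approaches 1, so the decision threshold of 0.5 in `is_harmful` is a hypothetical value chosen for illustration, as are the function names.

```python
def harm_coefficient(
    x: float,
    y: float,
    z: float,
    a: float = 1 / 3,
    b: float = 1 / 3,
    c: float = 1 / 3,
) -> float:
    """Compute W = a*x + b*y + c*z, requiring a + b + c = 1."""
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights a, b, c must sum to 1")
    for name, value in (("x", x), ("y", y), ("z", z)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return a * x + b * y + c * z


def is_harmful(w: float, threshold: float = 0.5) -> bool:
    """Flag the picture when W reaches a (hypothetical) threshold."""
    return w >= threshold
```

For example, with a=0.5 and b=c=0.25, the factors x=0.9, y=0.6, and z=0 give W=0.6, which the hypothetical threshold would flag as harmful. The weights and the threshold are the natural parameters to tune or fit against labelled data.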

In summary, in the above embodiments essentially none of the steps involves specific image processing. Instead, they take a different approach that relies mainly on related queries to obtain the relevant weighting factors. Step S400 integrates (also referred to as fuses) the multiple weighting factors to identify harmful pictures. Those skilled in the art are aware that specific image processing and recognition are relatively time-consuming, whereas queries are comparatively fast. The above embodiment therefore provides an efficient method of identifying harmful pictures. Additionally, the above embodiments can further integrate and update the first database, the second database, and other databases in conjunction with big data and/or artificial intelligence.

In another embodiment, the second database is a third party database.

For example, websites that provide whois queries, as well as third-party databases that maintain lists of pornographic websites, violent websites, reactionary websites, or cult websites, or that record lists of the IP addresses of websites hosting harmful pictures.

In another embodiment, for a website (e.g., a forum or web page) identified as hosting a harmful picture, the IP address information of the publisher of the harmful picture recorded on the website is collected and the first database is updated. This is because harmful pictures generally attract loyal ("sticky") users. Some of these users take part in spreading harmful pictures, and most of their IP addresses are relatively fixed. If the website itself records the IP address information of the publisher of the harmful picture, the present disclosure updates the aforementioned first database by collecting that information.

In another embodiment, step S200 further includes:

Further, the security of the domain name is queried in a third-party domain name security list to output a security factor, and the second weighting factor related to the domain name is corrected by the security factor.

For example, virustotal.com is a third-party domain name security screening website. It can be understood that if the third-party information believes that the relevant domain name contains a virus or a Trojan, the second weighting factor should be raised, which is rooted in the fact that the related website is more insecure.
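The disclosure does not specify how the security factor corrects the second weighting factor, only that the factor should be raised when the domain name is considered insecure. One possible correction, shown purely as an assumption, treats the security factor s as a risk score in [0, 1] and moves y toward 1 in proportion to s. The function name `corrected_second_factor` is hypothetical, and no third-party service is queried here; s is supplied by the caller.

```python
def corrected_second_factor(y: float, s: float) -> float:
    """Raise the second weighting factor y according to a risk score s.

    Assumption: s = 0 means no known security risk, s = 1 means the
    third-party list reports a virus or Trojan on the domain name.
    The result stays within [0, 1] and is never lower than y.
    """
    if not (0.0 <= y <= 1.0 and 0.0 <= s <= 1.0):
        raise ValueError("y and s must lie in [0, 1]")
    return y + s * (1.0 - y)
```

With this rule, s=0 leaves y unchanged and s=1 raises it to 1.0. Other monotone corrections, such as adding a fixed increment and capping at 1, would satisfy the same requirement.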

It can be appreciated that this embodiment focuses on correcting the second weighting factor from a network security perspective, to prevent the user from suffering other losses. This is because network security bears on the privacy and property rights of users. If the websites related to harmful pictures also carry network security risks, they will cause users property or privacy damage in addition to the harm of the pictures themselves.

In another embodiment, step S400 further includes: when the picture is identified as harmful, further submitting the picture to the third-party picture database. In this way, it is convenient for the third-party picture database to consider whether to update its data.

In another embodiment, step S300 further includes the following:

Step c1): crawling audio in the webpage;

Step c2): Identify whether harmful content is included in the audio, and if so, correct the third weighting factor.

For this embodiment, if the audio is identified as including pornographic content, violent content, reactionary political speech, inflammatory cult speech, or terror or hate speech, this indicates that the relevant website poses a threat, and the third weighting factor is corrected, for example by increasing it.

As described above, if combined with big data technology, the present disclosure can effectively combine multiple dimensions and multiple modes, and combine IP information, domain name information, image information, and audio information to quickly identify harmful pictures.

Further, the above embodiment may be implemented on the router side or the network provider side to filter related pictures in advance.

Corresponding to the method, and referring to FIG. 2, the present disclosure provides, in another embodiment, a system for identifying harmful pictures based on the URL paths of approximate pictures, including:

an identification module, configured to integrate the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.

Similar to the method embodiments described above, preferably, the second database is a third-party database.

More preferably, the second weighting factor generating module further includes:

a correction unit, configured to: further query the security of the domain name in a third-party domain name security list to output a security factor, and correct the second weighting factor related to the domain name by the security factor.

More preferably, the identification module is further configured to: when the picture is identified as harmful, further submit the picture to the third-party picture database.

More preferably, the third weighting factor generating module further corrects the third weighting factor by:

An audio crawling unit for crawling audio in the webpage;

An audio recognition unit for identifying whether harmful content is included in the audio, and if so, correcting the third weighting factor.

The present disclosure, in another embodiment, discloses a system for identifying harmful pictures, including:

a processor and a memory having stored therein executable instructions, the processor executing the instructions to perform the following operations:

The present disclosure, in another embodiment, also discloses a computer storage medium storing executable instructions for performing a method of identifying a harmful picture as follows:

Step c): input the URL path of the picture into a third-party picture database, search the third-party picture database for all approximate pictures of the picture, obtain the URL paths of all the approximate pictures, and obtain, based on those URL paths, the domain names included in the URLs of all the approximate pictures and/or the IP addresses pointed to by those URLs; and,

The above system may comprise: at least one processor (e.g., a CPU), at least one sensor (e.g., an accelerometer, a gyroscope, a GPS module, or another positioning module), at least one memory, and at least one communication bus, wherein the communication bus is used to implement connection and communication between the components. The device may further include at least one receiver and at least one transmitter, wherein the receiver and the transmitter may be wired transmission ports or wireless devices (for example, including antenna devices) for transmitting signaling or data with other node devices. The memory may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory may optionally be at least one storage device located remotely from the aforementioned processor. A set of program code is stored in the memory, and the processor can call the code stored in the memory via the communication bus to perform the related functions.

An embodiment of the present disclosure further provides a computer storage medium, wherein the computer storage medium can store a program which, when executed, performs some or all of the steps of the method for identifying a harmful picture described in the foregoing method embodiments.

The steps in the method of the embodiment of the present disclosure may be sequentially adjusted, merged, and deleted according to actual needs.

Modules and units in the system of the embodiments of the present disclosure may be combined, divided, and deleted according to actual needs. It should be noted that, for the foregoing method embodiments, for the sake of simple description, they are all expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described action sequence. Because certain steps may be performed in other sequences or concurrently in accordance with the present invention. In the following, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions, modules, and units involved are not necessarily required by the present invention.

In the above embodiments, the descriptions of the various embodiments are different, and the details that are not detailed in a certain embodiment can be referred to the related descriptions of other embodiments.

In the several embodiments provided by the present disclosure, it should be understood that the disclosed system can be implemented in other manners. The embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation, other divisions are possible: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection between the units or components may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, may be located in one place, or may be distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes a number of instructions that cause a computer device (which may be a smartphone, a personal digital assistant, a wearable device, a laptop, or a tablet) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure. The foregoing storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and the like.

The above embodiments are only used to illustrate the technical solutions of the present disclosure and are not intended to be limiting. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims

A method for identifying harmful pictures based on URL paths of approximate pictures, comprising:

Step a): when it is determined that a page element of a webpage includes a URL path of a picture, identifying an IP address or an IP address segment of the user recorded in the page content of the webpage, and/or identifying a user ID recorded in the page content of the webpage; querying a first database for whether the IP address or an IP address in the same network segment exists, and/or querying the first database for whether the ID exists; and outputting a first weighting factor according to the query result for the user's IP address and/or the query result for the ID;

Step b): obtaining, according to the URL path of the picture, a domain name included in the URL and/or an IP address pointed to by the URL; performing a whois query in a second database based on the domain name included in the URL, and/or querying the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and outputting a second weighting factor according to the result of the whois query and/or the query result for the IP address;

Step c): inputting the URL path of the picture into a third-party picture database, searching the third-party picture database for all approximate pictures of the picture, obtaining the URL paths of all the approximate pictures, and obtaining, based on those URL paths, the domain names included in the URLs of the approximate pictures and/or the IP addresses pointed to by the URLs of the approximate pictures; performing whois queries in the second database based on the domain names included in the URLs of all the approximate pictures, and/or querying the second database, based on the IP addresses pointed to by the URLs of all the approximate pictures, for whether those IP addresses or IP addresses in the same network segment exist; and outputting a third weighting factor according to the results of the whois queries and/or the query results for the IP addresses;

Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.
The method of claim 1 wherein said second database is a third party database.
The method of claim 1 wherein step b) further comprises:

querying the security of the domain name in a third-party domain name security list to output a security factor, and correcting the second weighting factor by the security factor.
The method of claim 1 wherein step d) further comprises:

when the picture is identified as harmful, submitting the picture to the third-party picture database.
The method of claim 1 wherein step c) further comprises the following:

Step c1): crawling audio in the webpage;

Step c2): identifying whether the audio includes harmful content, and if so, correcting the third weighting factor.
A system for identifying harmful pictures based on URL paths of approximate pictures, comprising:

a first weighting factor generating module, configured to: when it is determined that a page element of a webpage includes a URL path of a picture, identify an IP address or an IP address segment of the user recorded in the page content of the webpage, and/or identify a user ID recorded in the page content of the webpage; query a first database for whether the IP address or an IP address in the same network segment exists, and/or query the first database for whether the ID exists; and output a first weighting factor according to the query result for the user's IP address and/or the query result for the ID;

a second weighting factor generating module, configured to: obtain, according to the URL path of the picture, a domain name included in the URL and/or an IP address pointed to by the URL; perform a whois query in a second database based on the domain name included in the URL, and/or query the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and output a second weighting factor according to the result of the whois query and/or the query result for the IP address;

a third weighting factor generating module, configured to: input the URL path of the picture into a third-party picture database, search the third-party picture database for all approximate pictures of the picture, obtain the URL paths of all the approximate pictures, and obtain, based on those URL paths, the domain names included in the URLs of the approximate pictures and/or the IP addresses pointed to by the URLs of the approximate pictures; perform whois queries in the second database based on the domain names included in the URLs of all the approximate pictures, and/or query the second database, based on the IP addresses pointed to by the URLs of all the approximate pictures, for whether those IP addresses or IP addresses in the same network segment exist; and output a third weighting factor according to the results of the whois queries and/or the query results for the IP addresses;

an identification module, configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.
The system of claim 6 wherein preferably said second database is a third party database.
The system of claim 6 wherein the second weighting factor generation module further comprises:

a correction unit, configured to query the security of the domain name in a third-party domain name security list to output a security factor, and to correct the second weighting factor by the security factor.
The system of claim 6 wherein said identification module is further configured to: when the picture is identified as harmful, submit the picture to the third-party picture database.
The system of claim 6 wherein the third weighting factor generating module further comprises the following units for correcting the third weighting factor:

An audio crawling unit for crawling audio in the webpage;

An audio recognition unit for identifying whether harmful content is included in the audio, and if so, correcting the third weighting factor.
A system for identifying harmful pictures, comprising:

a processor and a memory having stored therein executable instructions, the processor executing the instructions to perform the following operations:

Step a): when it is determined that a page element of a webpage includes a URL path of a picture, identifying an IP address or an IP address segment of the user recorded in the page content of the webpage, and/or identifying a user ID recorded in the page content of the webpage; querying a first database for whether the IP address or an IP address in the same network segment exists, and/or querying the first database for whether the ID exists; and outputting a first weighting factor according to the query result for the user's IP address and/or the query result for the ID;

Step b): obtaining, according to the URL path of the picture, a domain name included in the URL and/or an IP address pointed to by the URL; performing a whois query in a second database based on the domain name included in the URL, and/or querying the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and outputting a second weighting factor according to the result of the whois query and/or the query result for the IP address;

Step c): inputting the URL path of the picture into a third-party picture database, searching the third-party picture database for all approximate pictures of the picture, obtaining the URL paths of all the approximate pictures, and obtaining, based on those URL paths, the domain names included in the URLs of the approximate pictures and/or the IP addresses pointed to by the URLs of the approximate pictures; performing whois queries in the second database based on the domain names included in the URLs of all the approximate pictures, and/or querying the second database, based on the IP addresses pointed to by the URLs of all the approximate pictures, for whether those IP addresses or IP addresses in the same network segment exist; and outputting a third weighting factor according to the results of the whois queries and/or the query results for the IP addresses;

Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.
A computer storage medium storing executable instructions for performing a method of identifying a harmful picture as follows:

Step a): when it is determined that a page element of a webpage includes a URL path of a picture, identifying an IP address or an IP address segment of the user recorded in the page content of the webpage, and/or identifying a user ID recorded in the page content of the webpage; querying a first database for whether the IP address or an IP address in the same network segment exists, and/or querying the first database for whether the ID exists; and outputting a first weighting factor according to the query result for the user's IP address and/or the query result for the ID;

Step b): obtaining, according to the URL path of the picture, a domain name included in the URL and/or an IP address pointed to by the URL; performing a whois query in a second database based on the domain name included in the URL, and/or querying the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and outputting a second weighting factor according to the result of the whois query and/or the query result for the IP address;

Step c): inputting the URL path of the picture into a third-party picture database, searching the third-party picture database for all approximate pictures of the picture, obtaining the URL paths of all the approximate pictures, and obtaining, based on those URL paths, the domain names included in the URLs of the approximate pictures and/or the IP addresses pointed to by the URLs of the approximate pictures; performing whois queries in the second database based on the domain names included in the URLs of all the approximate pictures, and/or querying the second database, based on the IP addresses pointed to by the URLs of all the approximate pictures, for whether those IP addresses or IP addresses in the same network segment exist; and outputting a third weighting factor according to the results of the whois queries and/or the query results for the IP addresses;

Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the picture is a harmful picture.