CN114363039A - Method, device, equipment and storage medium for identifying fraud websites

Info

Publication number: CN114363039A
Application number: CN202111652275.XA
Authority: CN
Inventors: 马恒恒; 黄晓青; 高华; 尚程; 傅强; 梁彧; 蔡琳; 田野; 王杰; 杨满智; 金红; 陈晓光
Original assignee: Eversec Beijing Technology Co Ltd
Current assignee: Eversec Beijing Technology Co Ltd
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-04-15

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for identifying fraud websites. The method comprises: obtaining an internet access log corresponding to a target user, and screening a target fraud-related event from the internet access log according to a preset fraud-related rule base, the fraud-related rule base comprising a plurality of fraud-related rules; acquiring a target website corresponding to the target fraud-related event, and matching target background information characteristics of the target website with a preset background information characteristic library; and determining a fraud identification result corresponding to the target website according to the matching result. The technical scheme of the embodiment of the invention can effectively identify fraud websites, improve the efficiency of identifying them and reduce the cost of identifying them.

Description

Method, device, equipment and storage medium for identifying fraud websites

Technical Field

The embodiment of the invention relates to the technical field of network security, in particular to a method, a device, equipment and a storage medium for identifying fraud websites.

Background

With the rapid development of the internet industry, the number of cases in which criminals use the network to commit crimes keeps increasing, so improving information and network security technology is very important. In recent years, crimes carried out through fraud websites on the internet have become very common and seriously endanger the lives and property of ordinary people.

In the prior art, after an internet fraud event occurs, the data the victim accessed on the internet is generally analyzed to find the fraud website, or the victim's internet access behavior log, within the time window of the fraud event, and fraud sample data is output from the internet access behavior log. Through characteristic fingerprint analysis, information such as the registration/controlling mailbox and the contact mailbox of a registered domain name can be traced back to other registered domain names and mailboxes. This information is then combined with other identity and location information and with related characteristic information of the website foreground to uncover the background of the fraud website and information about the fraudsters.

However, the prior art mainly analyzes the foreground information of websites (e.g., domain name, Uniform Resource Locator) to identify fraud websites and locate fraudsters. First, because a large amount of data (such as logs and intelligence) must be processed, the fraud website identification process is time-consuming and places high demands on data analysts. Second, using only foreground information to mine the registration subject information related to a fraud domain name or Uniform Resource Locator (URL) is one-sided, which lowers the accuracy of the fraud website identification result.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, a device and a storage medium for identifying a fraud website, which can effectively determine the fraud website, improve the efficiency of identifying the fraud website and reduce the cost of identifying the fraud website.

In a first aspect, an embodiment of the present invention provides a method for identifying a fraud website, the method including:

obtaining an internet access log corresponding to a target user, and screening a target fraud-related event from the internet access log according to a preset fraud-related rule base; the fraud-related rule base comprises a plurality of fraud-related rules;

acquiring a target website corresponding to the target fraud-related event, and matching target background information characteristics of the target website with a preset background information characteristic library;

and determining a fraud identification result corresponding to the target website according to the matching result.

Optionally, the fraud-related rule includes: a domain name, a Uniform Resource Locator (URL) and keywords corresponding to the fraud-related event;

screening a target fraud-related event from the internet access log according to a preset fraud-related rule base includes:

and screening a target fraud-related event from the Internet access log according to the domain name, the URL and the keyword corresponding to the fraud-related event.

Optionally, the background information feature library includes background address features corresponding to the fraud websites and background URL features;

matching the target background information characteristics of the target website with a preset background information characteristic library includes:

and matching the target background address characteristic and the target background URL characteristic of the target website with the background address characteristic and the background URL characteristic corresponding to the fraud website.

Optionally, the background address feature includes: keywords in the background login address;

the background URL features include: keywords in the background URL path.

Optionally, determining a fraud identification result corresponding to the target website according to the matching result includes:

judging whether the matching result exceeds a preset threshold value, and if so, determining that the target website is a fraud website.

Optionally, after determining that the target website is a fraud website, the method further includes:

outputting the website address of the target website to an early warning platform, so that the early warning platform handles the target website and the fraudsters corresponding to the target website.

In a second aspect, an embodiment of the present invention further provides an apparatus for identifying a fraud website, where the apparatus includes:

the event screening module is used for acquiring an internet access log corresponding to a target user and screening a target fraud-related event from the internet access log according to a preset fraud-related rule base; the fraud-related rule base comprises a plurality of fraud-related rules;

the characteristic matching module is used for acquiring a target website corresponding to the target fraud-related event and matching target background information characteristics of the target website with a preset background information characteristic library;

and the result determining module is used for determining a fraud identification result corresponding to the target website according to the matching result.

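Optionally, the fraud-related rule includes: a domain name, a Uniform Resource Locator (URL) and keywords corresponding to the fraud-related event;

the event screening module includes: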

and the fraud-related event screening unit is used for screening a target fraud-related event from the Internet access log according to the domain name, the URL and the keyword corresponding to the fraud-related event.

Optionally, the background information feature library includes: background address features corresponding to the fraud websites, and background URL features;

the feature matching module includes:

and the background feature matching unit is used for matching the target background address feature and the target background URL feature of the target website with the background address feature and the background URL feature corresponding to the fraud website.

Optionally, the background address feature includes: keywords in the background login address; the background URL features include: keywords in the background URL path.

Optionally, the result determining module includes:

the result judging unit is used for judging whether the matching result exceeds a preset threshold value or not;

a fraud website determining unit, configured to determine that the target website is a fraud website when the matching result exceeds a preset threshold;

and the website output unit is used for outputting the website address of the target website to the early warning platform after the target website is determined to be a fraud website, so that the early warning platform handles the target website and the fraudsters corresponding to the target website.

In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for identifying fraud websites according to any of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements a method for identifying a fraud website provided in any embodiment of the present invention.

According to the technical scheme of the embodiment of the invention, the internet access log corresponding to the target user is obtained, the target fraud-related event is screened from the internet access log according to the preset fraud-related rule base, the target website corresponding to the target fraud-related event is obtained, the target background information characteristic of the target website is matched with the preset background information characteristic base, and the fraud identification result corresponding to the target website is determined according to the matching result, so that the fraud website can be effectively determined, the identification efficiency of the fraud website is improved, and the identification cost of the fraud website is reduced.

Drawings

FIG. 1 is a flowchart of a method for identifying fraud websites in the first embodiment of the present invention;

FIG. 2 is a flowchart of a method for identifying fraud websites in the second embodiment of the present invention;

FIG. 3 is a flowchart of a method for identifying fraud websites in the third embodiment of the present invention;

FIG. 4 is a block diagram of an identification apparatus of a fraud website in the fourth embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a computer device in the fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a fraud website identification method provided in the first embodiment of the present invention. This embodiment is applicable to identifying fraud websites. The method can be executed by a fraud website identification device, which can be implemented by software and/or hardware and can generally be integrated in a terminal or a server having a data processing function. The method specifically includes the following steps:

Step 110, acquiring an internet access log corresponding to the target user, and screening the target fraud-related event from the internet access log according to a preset fraud-related rule base.

In this embodiment, the target user may be a victim who has been defrauded by a criminal. The fraud-related rule base may include a plurality of fraud-related rules preset by a third-party research and judgment unit according to a plurality of fraud-related events, and the fraud-related rules may include event characteristics that are common across those fraud-related events.

In this step, the corresponding internet access log may be obtained from the internet log of the target user, and an event satisfying the fraud-related rule in the internet access log is taken as a target fraud-related event.

Step 120, acquiring a target website corresponding to the target fraud-related event, and matching the target background information characteristics of the target website with a preset background information characteristic library.

In this embodiment, the website background (that is, the back-end administration system of the website) is mainly used for managing the information shown in the website foreground, for example publishing, updating and deleting text, pictures, videos and other everyday documents, and also covers statistics and management of member information, order information and visitor information. The website background can also be understood as a system for quickly operating and managing the website's database and files, so that foreground content can be updated and adjusted in time.

In this step, the target background information features may be matched in turn against each background information feature included in the background information feature library. The background information feature library can be constructed by a third-party research and judgment unit from a plurality of fraud websites, and may include background features that are common to those fraud websites.

Step 130, determining a fraud identification result corresponding to the target website according to the matching result.

In one implementation manner of this embodiment, determining a fraud identification result corresponding to the target website according to the matching result includes: judging whether the matching result exceeds a preset threshold value; if so, determining that the target website is a fraud website, and if not, determining that the target website is not a fraud website.

In this embodiment, the website foreground faces the users who visit the website, and is generally used to provide them with publicly released content and pages, such as product information, news, enterprise introductions, enterprise contact information and submitted messages. If fraud websites are identified only according to website foreground information, the identification is one-sided and the accuracy of the identification result is low. Compared with the existing identification method, in this embodiment the fraud website can be quickly and effectively determined by screening the target fraud-related event from the internet access log according to the fraud-related rule base and matching the target background information characteristic of the target website with the preset background information characteristic base.

Secondly, the identification method of the fraud websites provided in the embodiment can be integrated in the computer device, thereby reducing the time consumption of identifying the fraud websites and reducing the requirements on the data analysts.

Example two

This embodiment is a further refinement of the above embodiment; terms that are the same as or correspond to those of the above embodiment are not explained again here. Fig. 2 is a flowchart of a method for identifying a fraud website provided in the second embodiment. In this embodiment, the fraud-related rule includes: a domain name, a Uniform Resource Locator (URL) and keywords corresponding to the fraud-related event; the background information feature library includes: background address features corresponding to fraud websites, and background URL features. The technical solution of this embodiment may be combined with one or more methods in the solutions of the foregoing embodiments. As shown in fig. 2, the method provided in this embodiment may further include:

Step 210, obtaining an internet access log corresponding to the target user.

Step 220, screening the target fraud-related event from the internet access log according to the domain name, the URL and the keyword corresponding to the fraud-related event.

In this embodiment, the third-party research and judgment unit can use the domain names, URLs and keywords that commonly occur in a plurality of fraud-related events as the fraud-related rules.

In this step, an event in the internet access log that contains the domain name, URL and keyword may be taken as a target fraud-related event. The advantage of this arrangement is that target fraud-related events can be effectively screened from the internet access log, thereby improving the accuracy of fraud website identification results.
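The screening in step 220 can be sketched as follows. This is a minimal illustration rather than the implementation described by the embodiment: the `FraudRule` and `LogEntry` record layouts, the function names, and the choice to require the domain name, the URL fragment and the keyword to all match are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence
from urllib.parse import urlparse


@dataclass(frozen=True)
class FraudRule:
    """One fraud-related rule (hypothetical layout): domain name, URL fragment, keyword."""
    domain: str
    url_fragment: str
    keyword: str


@dataclass(frozen=True)
class LogEntry:
    """One record of the target user's internet access log (hypothetical layout)."""
    url: str
    content: str


def matches_rule(entry: LogEntry, rule: FraudRule) -> bool:
    """Return True when the entry contains the rule's domain name, URL fragment and keyword."""
    host = urlparse(entry.url).hostname or ""
    domain_hit = host == rule.domain or host.endswith("." + rule.domain)
    return domain_hit and rule.url_fragment in entry.url and rule.keyword in entry.content


def screen_events(log: Iterable[LogEntry], rules: Sequence[FraudRule]) -> List[LogEntry]:
    """Keep the log entries that satisfy at least one fraud-related rule."""
    return [entry for entry in log if any(matches_rule(entry, rule) for rule in rules)]
```

Each entry returned by `screen_events` plays the role of a target fraud-related event, and its URL identifies the target website passed on to the next step.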

And step 230, matching the target background address characteristics and the target background URL characteristics of the target website with background address characteristics and background URL characteristics corresponding to the fraud website.

In this embodiment, the third party research and judgment unit may construct the background information feature library according to the background address features and the background URL features commonly existing in the plurality of fraud websites.

In this step, the target background address feature may be compared with the background address feature corresponding to the fraud website, the target background URL feature may be compared with the background URL feature corresponding to the fraud website, and the matching result between the target background information feature and the background information feature library may be determined according to the comparison result.

In an implementation manner of this embodiment, the background address features in the background information feature library include: keywords in the background login address; background URL features include: keywords in the background URL path.

In a specific embodiment, the background login address of most fraud websites is "xxx.com/admin" or "xxx.com/admin.php", so that keywords such as "admin", "m.php", "admfor" can be used as the background address feature.

In another specific embodiment, the background URL paths of most fraud websites include keywords such as "number", "management", etc., so that the keywords such as "number", "management", etc. can be used as the background URL features.
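The keyword comparison described above can be sketched as follows. The keyword tuples reuse the examples quoted in this embodiment; expressing the matching result as the share of library keywords found is an assumption of this sketch, since the embodiment does not state how the matching result is quantified, and the function name `match_score` is hypothetical.

```python
from typing import Iterable
from urllib.parse import urlparse

# Example keywords quoted in the description; a real feature library would be curated.
ADDRESS_KEYWORDS = ("admin", "m.php", "admfor")
URL_PATH_KEYWORDS = ("number", "management")


def match_score(login_address: str, background_urls: Iterable[str]) -> float:
    """Return the share of library keywords found in the target's background features.

    Assumption: the matching result is the fraction of keywords that hit.
    """
    address = login_address.lower()
    paths = [urlparse(url).path.lower() for url in background_urls]
    # Background address feature: keywords in the background login address.
    hits = sum(keyword in address for keyword in ADDRESS_KEYWORDS)
    # Background URL feature: keywords in any background URL path.
    hits += sum(any(keyword in path for path in paths) for keyword in URL_PATH_KEYWORDS)
    return hits / (len(ADDRESS_KEYWORDS) + len(URL_PATH_KEYWORDS))
```

The returned value can then be compared with the preset threshold when the fraud identification result is determined.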

In this embodiment, the fraud-related rule base and the background information feature base can be updated according to a plurality of subsequently determined fraud-related events and fraud websites, so as to improve the accuracy of the fraud website identification result.
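One way such a feature library could be built and refreshed is sketched below. The embodiment leaves the construction method to the third-party research and judgment unit, so the tokenization, the `min_share` cut-off and the function name `common_keywords` are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Set


def common_keywords(background_paths: Iterable[str], min_share: float = 0.5) -> Set[str]:
    """Keep path tokens that occur in at least `min_share` of the confirmed fraud websites.

    `background_paths` holds one background URL path per confirmed fraud website.
    """
    paths = list(background_paths)
    if not paths:
        return set()
    counts: Counter = Counter()
    for path in paths:
        # Split on non-alphanumeric characters; a set counts each token once per site.
        tokens = {token for token in re.split(r"[^a-z0-9]+", path.lower()) if token}
        counts.update(tokens)
    return {token for token, n in counts.items() if n / len(paths) >= min_share}
```

Re-running the function on the enlarged set of confirmed fraud websites gives an updated keyword set, which is one simple way to keep the library current.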

Step 240, determining a fraud identification result corresponding to the target website according to the matching result.

According to the technical scheme of the embodiment of the invention, the internet access log corresponding to the target user is obtained, the target fraud-related event is screened from the internet access log according to the domain name, URL and keyword corresponding to the fraud-related event, the target background address characteristic and the target background URL characteristic of the target website are matched with the background address characteristic and the background URL characteristic corresponding to the fraud website, and the fraud identification result corresponding to the target website is determined according to the matching result, so that the fraud website can be effectively determined, the identification efficiency of the fraud website is improved, and the identification cost of the fraud website is reduced.

Example three

This embodiment is a further refinement of the above embodiments; terms that are the same as or correspond to those of the above embodiments are not explained again here. Fig. 3 is a flowchart of a method for identifying a fraud website provided in the third embodiment. The technical solution of the third embodiment may be combined with one or more methods in the solutions of the foregoing embodiments. As shown in fig. 3, the method provided in the third embodiment may further include:

Step 310, acquiring an internet access log corresponding to the target user, and screening the target fraud-related event from the internet access log according to a preset fraud-related rule base.

Step 320, acquiring a target website corresponding to the target fraud-related event, and matching the target background information characteristics of the target website with a preset background information characteristic library.

Step 330, determining whether the matching result exceeds a preset threshold, if yes, executing step 340, and if not, executing step 350.

Step 340, determining that the target website is a fraud website, and outputting the website address of the target website to the early warning platform, so that the early warning platform handles the target website and the fraudsters corresponding to the target website.

In the present embodiment, if the matching result exceeds the preset threshold, it can be considered that background features ubiquitous in a plurality of fraud websites may be included in the target website, and thus the target website can be determined as a fraud website.

After the target website is determined to be a fraud website, the website address of the target website can be output to an early warning platform. Specifically, the early warning platform can use the website address to quickly and effectively discover the fraudsters and the victims' sensitive information stored in the background of the target website, and can intervene against the target website, so that operating a fraud website becomes more difficult and the safety of internet access is improved.

Step 350, determining that the target website is not a fraud website.

According to the technical scheme of this embodiment, the internet access log corresponding to the target user is obtained, the target fraud-related event is screened from the internet access log according to the preset fraud-related rule base, the target website corresponding to the target fraud-related event is obtained, and the target background information characteristic of the target website is matched with the preset background information characteristic base. Whether the matching result exceeds the preset threshold value is then judged; if so, the target website is determined to be a fraud website and its website address is output to the early warning platform, so that the early warning platform handles the target website and the fraudsters corresponding to the target website. In this way the fraud website can be effectively determined, the identification efficiency of the fraud website is improved, and the identification cost of the fraud website is reduced.
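Steps 330 to 350 can be sketched as follows. The threshold value, the function name `identify` and the `notify` callback that stands in for the early warning platform interface are hypothetical; the embodiment does not specify how the platform is called.

```python
from typing import Callable

THRESHOLD = 0.5  # hypothetical preset threshold; the embodiment does not give a value


def identify(
    site_url: str,
    match_result: float,
    notify: Callable[[str], None],
    threshold: float = THRESHOLD,
) -> bool:
    """Return True and report the site when the matching result exceeds the threshold."""
    if match_result > threshold:  # step 330, "exceeds" read as strictly greater
        notify(site_url)          # step 340: output the website address to the platform
        return True
    return False                  # step 350: not a fraud website
```

For example, passing `reported.append` as `notify` collects the addresses of identified fraud websites in a list named `reported`.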

Example four

Fig. 4 is a block diagram of an apparatus for identifying fraud websites according to a fourth embodiment of the present invention, the apparatus including: an event screening module 410, a feature matching module 420, and a result determination module 430.

The event screening module 410 is configured to obtain an internet access log corresponding to a target user, and screen a target fraud-related event from the internet access log according to a preset fraud-related rule base; the fraud-related rule base comprises a plurality of fraud-related rules;

the feature matching module 420 is configured to acquire a target website corresponding to the target fraud-related event, and match a target background information feature of the target website with a preset background information feature library;

a result determining module 430, configured to determine a fraud identification result corresponding to the target website according to the matching result.

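On the basis of the above embodiments, the fraud-related rule includes: a domain name, a Uniform Resource Locator (URL) and keywords corresponding to the fraud-related event. The background information feature library comprises: background address features corresponding to fraud websites, and background URL features. The background address features include: keywords in the background login address; the background URL features include: keywords in the background URL path.

The event screening module 410 includes:

a fraud-related event screening unit, configured to screen a target fraud-related event from the internet access log according to the domain name, the URL and the keywords corresponding to the fraud-related event.

The feature matching module 420 includes:

a background feature matching unit, configured to match the target background address feature and the target background URL feature of the target website with the background address feature and the background URL feature corresponding to the fraud website.

The result determination module 430 includes:

a result judging unit, configured to judge whether the matching result exceeds a preset threshold;

a fraud website determining unit, configured to determine that the target website is a fraud website when the matching result exceeds the preset threshold;

and a website output unit, configured to output the website address of the target website to the early warning platform after the target website is determined to be a fraud website, so that the early warning platform handles the target website and the fraudsters corresponding to the target website.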

The identification device for the fraud websites provided by the embodiment of the invention can execute the identification method for the fraud websites provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
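The division into three modules can be sketched as a small composition of callables. This is only an illustration of the functional split: the class name `FraudSiteIdentifier`, the callable signatures, and the simplification that the feature matching module receives just the website address are assumptions, not part of the apparatus as described.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class FraudSiteIdentifier:
    """Hypothetical wiring of modules 410, 420 and 430."""
    screen_events: Callable[[Iterable[str]], List[str]]  # event screening module 410
    match_features: Callable[[str], float]               # feature matching module 420
    decide: Callable[[float], bool]                      # result determination module 430

    def run(self, access_log: Iterable[str]) -> Dict[str, bool]:
        """Map each screened target website to its fraud identification result."""
        sites = self.screen_events(access_log)
        return {site: self.decide(self.match_features(site)) for site in sites}
```

Because each module is passed in as a callable, any one of them can be replaced without touching the others, which matches the note later in the description that the division into units and modules is only a functional one.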

Example five

Fig. 5 is a schematic structural diagram of a computer apparatus according to a fifth embodiment of the present invention, as shown in fig. 5, the computer apparatus includes a processor 510, a memory 520, an input device 530, and an output device 540; the number of the processors 510 in the computer device may be one or more, and one processor 510 is taken as an example in fig. 5; the processor 510, the memory 520, the input device 530 and the output device 540 in the computer apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 5.

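Memory 520, as a computer-readable storage medium, can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to a method for identifying fraud websites (e.g., event screening module 410, feature matching module 420, and result determining module 430 in an identifying device of a fraud website) in the embodiments of the present invention. The processor 510 executes various functional applications and data processing of the computer device by executing the software programs, instructions and modules stored in the memory 520, namely, implements the above-mentioned method for identifying fraud websites. That is, the program, when executed by the processor, implements: obtaining an internet access log corresponding to a target user, and screening a target fraud-related event from the internet access log according to a preset fraud-related rule base, the fraud-related rule base comprising a plurality of fraud-related rules; acquiring a target website corresponding to the target fraud-related event, and matching target background information characteristics of the target website with a preset background information characteristic library; and determining a fraud identification result corresponding to the target website according to the matching result.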

The memory 520 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 520 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 520 may further include memory located remotely from processor 510, which may be connected to a computer device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus, and may include a keyboard and a mouse, etc. The output device 540 may include a display device such as a display screen.

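Example six

The sixth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method according to any embodiment of the present invention. Of course, the computer program stored on the computer-readable storage medium provided by the embodiment of the present invention can also perform the relevant operations in the method for identifying a fraud website provided by any embodiment of the present invention. That is, the program, when executed by the processor, implements: obtaining an internet access log corresponding to a target user, and screening a target fraud-related event from the internet access log according to a preset fraud-related rule base, the fraud-related rule base comprising a plurality of fraud-related rules; acquiring a target website corresponding to the target fraud-related event, and matching target background information characteristics of the target website with a preset background information characteristic library; and determining a fraud identification result corresponding to the target website according to the matching result.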

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that, in the embodiment of the fraud website identification apparatus, the units and modules included in the fraud website identification apparatus are merely divided according to the functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

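1. A method of identifying a fraud website, said method comprising:

obtaining an internet access log corresponding to a target user, and screening a target fraud-related event from the internet access log according to a preset fraud-related rule base; the fraud-related rule base comprises a plurality of fraud-related rules;

acquiring a target website corresponding to the target fraud-related event, and matching target background information characteristics of the target website with a preset background information characteristic library;

and determining a fraud identification result corresponding to the target website according to the matching result.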

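2. The method as recited in claim 1, wherein said fraud-related rule comprises: a domain name, a Uniform Resource Locator (URL) and keywords corresponding to the fraud-related event;

screening a target fraud-related event from the internet access log according to a preset fraud-related rule base comprises:

screening a target fraud-related event from the internet access log according to the domain name, the URL and the keywords corresponding to the fraud-related event.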

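3. The method of claim 2, wherein the background information feature library comprises: background address features corresponding to the fraud websites, and background URL features;

matching the target background information characteristics of the target website with a preset background information characteristic library comprises:

matching the target background address feature and the target background URL feature of the target website with the background address feature and the background URL feature corresponding to the fraud website.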

4. The method of claim 3, wherein the background address features comprise: keywords in the background login address;

the background URL features include: keywords in the background URL path.

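5. The method as recited in claim 1, wherein determining a fraud identification result corresponding to said target website according to a matching result comprises:

judging whether the matching result exceeds a preset threshold value, and if so, determining that the target website is a fraud website.

6. The method as recited in claim 5, further comprising, after determining that said target website is a fraud website:

outputting the website address of the target website to an early warning platform, so that the early warning platform handles the target website and the fraudsters corresponding to the target website.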

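7. An apparatus for identifying fraud websites, said apparatus comprising:

an event screening module, configured to obtain an internet access log corresponding to a target user, and screen a target fraud-related event from the internet access log according to a preset fraud-related rule base; the fraud-related rule base comprises a plurality of fraud-related rules;

a feature matching module, configured to acquire a target website corresponding to the target fraud-related event, and match target background information characteristics of the target website with a preset background information characteristic library;

and a result determining module, configured to determine a fraud identification result corresponding to the target website according to the matching result.

8. The apparatus as recited in claim 7, wherein said fraud-related rule comprises: a domain name, a Uniform Resource Locator (URL) and keywords corresponding to the fraud-related event;

the event screening module comprises:

a fraud-related event screening unit, configured to screen a target fraud-related event from the internet access log according to the domain name, the URL and the keywords corresponding to the fraud-related event.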

9. A computer device, comprising:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by said one or more processors, cause said one or more processors to implement the method of identifying fraud websites of any of claims 1-6.

10. A computer-readable storage medium, on which a computer program is stored, the program, when being executed by a processor, implementing the method for identifying fraud websites of any one of claims 1-6.