CN116232760B

CN116232760B - Fraud website identification early warning method, device, equipment and storage medium

Info

Publication number: CN116232760B
Application number: CN202310483478.3A
Authority: CN
Inventors: 阮宝江
Original assignee: Nanjing Boshengyu Network Technology Co ltd
Current assignee: Nanjing Boshengyu Network Technology Co ltd
Priority date: 2023-05-04
Filing date: 2023-05-04
Publication date: 2023-07-21
Anticipated expiration: 2043-05-04
Also published as: CN116232760A

Abstract

The application relates to a fraud website identification and early warning method, device, equipment and storage medium. The method comprises: collecting and archiving existing fraud website data and constructing a fraud website database; acquiring real-time internet access data of a user side, capturing access website data characteristics, and preprocessing them to obtain first characteristic data; performing preliminary detection on the first characteristic data based on the fraud website database; in response to detecting that the fraud website database does not contain the first characteristic data, performing secondary processing on the access website data characteristics to obtain second characteristic data; inputting the second characteristic data into a preset fraud website identification model, and judging from the output result whether the website corresponding to the access website data characteristics is a fraud website; and if so, intercepting the target website and sending early warning information to the user side. The method and the device can identify fraud-related websites accurately and efficiently and can warn users at risk of being defrauded in time.

Description

Fraud website identification early warning method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of network security technologies, and in particular, to a fraud website identification and early warning method, device, equipment, and storage medium.

Background

In the prior art, fraud website identification mainly relies on content-based matching: a fraud website is identified either by matching text keywords in webpage content captured from the website corresponding to a web address, or by classifying webpage images captured from that website with a deep learning model. Such approaches rely on a single mode of detection, have low identification efficiency and accuracy, and are prone to missed reports and false reports. Therefore, how to identify fraud websites efficiently and accurately and to issue early warnings in time is a problem to be solved urgently.

Disclosure of Invention

Based on the above, it is necessary to provide a fraud website identification and early warning method, device, equipment and storage medium for the technical problems.

In one aspect, a fraud website identification and early warning method is provided, the method comprising:

collecting and archiving existing fraud website data, and constructing a fraud website database based on the existing fraud website data;

acquiring real-time Internet access data of a user side, capturing access website data characteristics, and preprocessing the access website data characteristics to obtain first characteristic data;

performing preliminary detection on the first characteristic data based on the fraud website database;

in response to detecting that the fraud website database does not contain the first characteristic data, performing secondary processing on the access website data characteristic to obtain second characteristic data;

inputting the second characteristic data into a preset fraud website identification model, and judging whether the website corresponding to the characteristic of the accessed website data is a fraud website according to an output result;

if yes, intercepting the target website and sending early warning information to the user side.

In one embodiment, collecting and archiving the existing fraud website data and constructing a fraud website database based on the existing fraud website data comprises:

acquiring attribute information of the existing fraud website data, wherein the attribute information comprises a server address, fraud information category and the number of text illegal keywords;

classifying and grading the fraud website data based on the attribute information, and marking classification results;

and storing the marked hierarchical classification result in a source database to generate the fraud website database.

In one embodiment, acquiring real-time internet access data of the user side, capturing access website data characteristics, and preprocessing the access website data characteristics to obtain first characteristic data comprises:

identifying text illegal keywords in the access website data characteristics, and counting the number of the text illegal keywords;

acquiring the server address of the access website data characteristic;

and classifying and marking the access website data features based on the number of the text illegal keywords and the server address to obtain the first feature data.

In one embodiment, performing preliminary detection on the first characteristic data based on the fraud website database comprises:

selecting, based on the classification mark, the data in the fraud website database whose classification result carries the same mark, as target data;

comparing the number of illegal text keywords and the server address of the first characteristic data with those of the target data, respectively;

in response to a successful comparison result, intercepting the website corresponding to the first characteristic data, and sending early warning information to the user side;

and in response to an unsuccessful comparison result, judging that the fraud website database does not contain the first characteristic data.

In one embodiment, performing secondary processing on the access website data characteristics to obtain second characteristic data, in response to detecting that the fraud website database does not contain the first characteristic data, comprises:

defining the state set corresponding to the access website data characteristics and the probability that each state constitutes a fraud website, and constructing the risk prediction model corresponding to the access website data characteristics (the symbols and the formula of the model are not reproduced in this text);

wherein the terms of the model represent, in order: the risk prediction value; a proportionality constant; the fitting value; the fitting function; constant coefficients; the degree of domain-name confusion; the total length of the domain name; the number of domain names; the number of non-standard ports; the number of illegal text keywords; the ICP filing (record) result, which takes one value if the website has an ICP filing and another value if it does not; and the preliminary detection similarity;

defining the risk prediction value as second characteristic data.

In one embodiment, the preset fraud website identification model is given by a formula that is not reproduced in this text;

wherein the terms of the model represent, in order: the output value; a correction factor; the feature statistic of the access website data, rounded up to an integer; the text feature statistics function corresponding to the website; and a dynamic parameter.

In one embodiment, judging whether the website corresponding to the access website data characteristics is a fraud website according to the output result comprises:

in response to detecting that the output value is greater than a first preset value and less than or equal to a second preset value, judging that the website corresponding to the access website data characteristics is a fraud website of the pressure drop type;

in response to detecting that the output value is greater than the second preset value, judging that the website corresponding to the access website data characteristics is a fraud website of the long-term high-transmission type;

and in response to detecting that the output value is less than or equal to the first preset value, judging that the website corresponding to the access website data characteristics is a non-fraud website.

In another aspect, there is provided a fraud website identification and pre-warning device, the device comprising:

the database construction module is used for collecting and archiving the existing fraud website data and constructing a fraud website database based on the existing fraud website data;

the first characteristic data acquisition module is used for acquiring real-time internet access data of the user side, capturing access website data characteristics, and preprocessing the access website data characteristics to obtain first characteristic data;

the detection module is used for carrying out preliminary detection on the first characteristic data based on the fraud website database;

the second characteristic data acquisition module is used for carrying out secondary processing on the access website data characteristic to obtain second characteristic data in response to the fact that the fraud website database does not contain the first characteristic data;

the judging module is used for inputting the second characteristic data into a preset fraud website identification model, and judging whether the website corresponding to the characteristic of the access website data is a fraud website according to the output result;

and the early warning information sending module is used for intercepting the target website when the website corresponding to the access website data characteristic is a fraud website, and sending early warning information to the user side.

In yet another aspect, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above fraud website identification and early warning method when executing the computer program.

In yet another aspect, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of the above fraud website identification and early warning method.

In the above fraud website identification and early warning method, device, equipment and storage medium, the method comprises: collecting and archiving existing fraud website data, and constructing a fraud website database based on the existing fraud website data; acquiring real-time internet access data of a user side, capturing access website data characteristics, and preprocessing the access website data characteristics to obtain first characteristic data; performing preliminary detection on the first characteristic data based on the fraud website database; in response to detecting that the fraud website database does not contain the first characteristic data, performing secondary processing on the access website data characteristics to obtain second characteristic data; inputting the second characteristic data into a preset fraud website identification model, and judging from the output result whether the website corresponding to the access website data characteristics is a fraud website; and if so, intercepting the target website and sending early warning information to the user side. The method and the device can thus identify fraud-related websites accurately and efficiently and warn users at risk of being defrauded in time.

Drawings

FIG. 1 is an application environment diagram of a fraud site identification pre-warning method in one embodiment;

FIG. 2 is a flow chart of a fraud site identification and pre-warning method according to an embodiment;

FIG. 3 is a schematic diagram of a non-standard port of a fraud site identification pre-warning method in one embodiment;

FIG. 4 is a website interception schematic diagram of a fraud website identification and pre-warning method in one embodiment;

FIG. 5 is a block diagram of a fraud site identification and pre-warning device according to an embodiment;

FIG. 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

It should be understood that throughout this description, unless the context clearly requires otherwise, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, it is the meaning of "including but not limited to".

It should also be appreciated that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.

It should be noted that the terms "S1", "S2", and the like are used only to label the steps for convenience in describing the method of the present application; they are not intended to limit the order or sequence of the steps or to limit the present application. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be regarded as non-existent and outside the protection scope of the present application.

The fraud website identification and early warning method provided by the application can be applied to an application environment shown in figure 1. The terminal 102 communicates with a data processing platform disposed on the server 104 through a network, where the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices, and the server 104 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.

Example 1: In one embodiment, as shown in FIGS. 2-4, a fraud website identification and early warning method is provided. Taking its application to the terminal in FIG. 1 as an example, the method includes the following steps:

S1: Collect and archive existing fraud website data, and construct a fraud website database based on the existing fraud website data.

It should be noted that the existing fraud website data can be obtained through channels such as internet records and offline records. Attribute information is then extracted from the existing fraud website data, where the attribute information includes the server address, the fraud information category and the number of illegal text keywords.

The fraud website data are classified and graded based on the attribute information, and the classification results are marked. For example, if the server address of a fraud website is in Southeast Asia, the website is assigned to class 1; if its fraud information category is the long-term high-transmission type, it is assigned to class A; and if its number of illegal text keywords falls within a first preset range, it is assigned to class a. The fraud website is then marked 1-A-a.

Storing the existing fraud website data by class and grade in this way allows the first characteristic data to be detected efficiently in the subsequent steps.
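The marking scheme of step S1 can be illustrated with a minimal Python sketch. Only the example mark 1-A-a comes from the text; the code tables, the placeholder category names, the keyword-count ranges, and the names `FraudRecord`, `make_label` and `build_database` are assumptions introduced here for illustration.

```python
from dataclasses import dataclass

# Hypothetical code tables: the source gives only the single example "1-A-a".
REGION_CODES = {"southeast_asia": "1"}
CATEGORY_CODES = {"type_a": "A", "type_b": "B"}  # placeholder category names
# (low, high, grade) with inclusive bounds; the ranges are assumed values.
KEYWORD_RANGES = [(0, 10, "a"), (11, 50, "b")]


@dataclass(frozen=True)
class FraudRecord:
    server_region: str
    category: str
    keyword_count: int


def keyword_grade(count: int) -> str:
    """Map a keyword count to the grade of the preset range it falls in."""
    for low, high, grade in KEYWORD_RANGES:
        if low <= count <= high:
            return grade
    return "z"  # assumed catch-all grade for counts outside every range


def make_label(record: FraudRecord) -> str:
    """Build a mark of the form region-category-grade, e.g. "1-A-a"."""
    region = REGION_CODES.get(record.server_region, "0")  # "0": unknown (assumed)
    category = CATEGORY_CODES.get(record.category, "X")   # "X": unknown (assumed)
    return f"{region}-{category}-{keyword_grade(record.keyword_count)}"


def build_database(records):
    """Group records by mark so that later lookups only scan one class."""
    database = {}
    for record in records:
        database.setdefault(make_label(record), []).append(record)
    return database
```

Grouping by mark is one way to realize the stated benefit of graded storage: a lookup for a visited website only needs to scan the records that share its mark.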

S2: acquiring real-time Internet access data of a user side, capturing access website data characteristics, and preprocessing the access website data characteristics to obtain first characteristic data.

It should be noted that the access website data characteristics include the illegal text keywords and their number, the server address, the degree of domain-name confusion (represented by, for example, the total length of the domain name, the number of domain names and the number of non-standard ports), the ICP filing (record) result (whether or not the website has a filing), and the number of non-standard ports. As an example, as shown in FIG. 3, for the fraud website http://zzz608.com, fraudsters can carry out fraud through non-standard ports such as 8990, 5318, 9900 and 8866, which lowers the cost of the fraud.

Specifically, in the step, identifying the text illegal keywords in the data characteristics of the access website, and counting the number of the text illegal keywords;

acquiring the server address of the access website data characteristic;

and classifying and marking the access website data features based on the number of the illegal keywords of the text and the server address to obtain the first feature data, wherein the marking mode is the same as the step S1 and is not repeated here.
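As a rough illustration of the preprocessing in step S2, the following Python sketch extracts a few of the listed characteristics from a URL and its page text. It assumes that ports 80 and 443 count as standard and that the server region has already been resolved elsewhere; the function names and dictionary keys are hypothetical and not taken from the source.

```python
from urllib.parse import urlsplit

STANDARD_PORTS = {80, 443}  # assumption: the source does not define "standard"


def count_keywords(text, keywords):
    """Count case-insensitive occurrences of each keyword in the page text."""
    lowered = text.lower()
    return sum(lowered.count(keyword.lower()) for keyword in keywords)


def extract_features(url, page_text, keywords, server_region):
    """Collect a subset of the access website data characteristics."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port  # None when the URL carries no explicit port
    return {
        "domain_length": len(host),
        "uses_non_standard_port": port is not None and port not in STANDARD_PORTS,
        "keyword_count": count_keywords(page_text, keywords),
        "server_region": server_region,  # resolved outside this sketch
    }
```

For instance, `extract_features("http://zzz608.com:8990/login", page_text, keywords, region)` reports a domain length of 10 and flags port 8990 as non-standard.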

S3: Perform preliminary detection on the first characteristic data based on the fraud website database.

It should be noted that this step specifically includes:

selecting, based on the classification mark obtained in step S2, the data in the fraud website database whose classification result carries the same mark, as target data;

comparing the number of illegal text keywords and the server address of the first characteristic data with those of the target data, respectively, where the comparison algorithm used in this embodiment is generally a similarity calculation method, which is a common technical means in the field and is not described in detail here;

in response to a successful comparison result, intercepting the website corresponding to the first characteristic data and sending early warning information to the user side. For example, as shown in FIG. 4, when the website corresponding to the first characteristic data is judged to be a fraud website, the website is intercepted and a fraud information reminding page is displayed to remind the user to take precautions.
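The text leaves the similarity calculation open, so the following Python sketch of step S3 swaps in one simple choice: an exact match on the server address plus a ratio-based similarity on the keyword counts. The similarity measure, the threshold of 0.8, and all names and dictionary keys are assumptions made here for illustration only.

```python
def keyword_similarity(a, b):
    """Ratio of the smaller count to the larger one; 1.0 when they are equal."""
    if a == b:
        return 1.0
    return min(a, b) / max(a, b)


def preliminary_detect(first_feature, database, threshold=0.8):
    """Return True when a record with the same mark matches the visited site.

    `database` maps a classification mark to a list of record dicts, and
    `first_feature` is a dict with "label", "server_address" and
    "keyword_count" keys (all hypothetical field names).
    """
    candidates = database.get(first_feature["label"], [])
    for record in candidates:
        same_server = record["server_address"] == first_feature["server_address"]
        similarity = keyword_similarity(
            record["keyword_count"], first_feature["keyword_count"]
        )
        if same_server and similarity >= threshold:
            return True  # comparison succeeded: intercept and warn
    return False  # database does not contain the first characteristic data
```

Restricting the scan to `database.get(label)` mirrors the selection by classification mark, and a `False` result corresponds to the case that triggers the secondary processing of step S4.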

S4: In response to detecting that the fraud website database does not contain the first characteristic data, perform secondary processing on the access website data characteristics to obtain second characteristic data.

It should be noted that, when the existing database does not contain the website, the website needs to be evaluated with a model constructed by deep learning, so as to determine in a second pass whether the new website is a fraud website, specifically:

defining the risk prediction value as second characteristic data.

S5: Input the second characteristic data into a preset fraud website identification model, and judge from the output result whether the website corresponding to the access website data characteristics is a fraud website.

It should be noted that the calculated risk prediction value is used to obtain the output result of the fraud website identification model, and this output result determines whether the website corresponding to the access website data characteristics is a fraud website. Specifically, the preset fraud website identification model includes:

Further, determining whether the website corresponding to the accessed website data feature is a fraud website according to the output result includes:

As shown in Table 1, pressure drop type fraud refers to fraud run by various grey- and black-market industry chains; it has been cracked down on in repeated campaigns in recent years, and most users already know of it and guard against it. Long-term high-transmission type fraud, by contrast, has grown with the rapid development of the network, is generally not easily noticed by users, and accounts for a larger number of fraud cases.

Table 1: a fraud type table.

The first preset value and the second preset value can be set according to actual requirements.
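The threshold judgment on the model output can be written down directly. The Python sketch below assumes that the first preset value is smaller than the second, which the three ranges imply; the function name and the example values 0.3 and 0.7 are hypothetical.

```python
def classify_output(output_value, first_preset, second_preset):
    """Map the model output to one of the three outcomes of the judgment step."""
    if first_preset >= second_preset:
        # Assumption: the ranges only make sense when first < second.
        raise ValueError("first_preset must be smaller than second_preset")
    if output_value <= first_preset:
        return "non-fraud"
    if output_value <= second_preset:
        return "pressure drop type fraud"
    return "long-term high-transmission type fraud"
```

With the illustrative presets 0.3 and 0.7, an output of 0.5 falls in the middle range, while an output exactly equal to a preset value belongs to the lower of the two adjacent ranges, matching the "less than or equal to" wording.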

S6: if yes, intercepting the target website and sending early warning information to the user side.

It should be noted that, as shown in FIG. 4, when the website corresponding to the access website data characteristics is determined to be a fraud website, the website is intercepted and a fraud information reminding page is displayed to remind the user to take precautions. In addition, the website is marked by the method of step S1 and synchronously stored in the fraud website database, which facilitates the detection of subsequent fraud websites and improves identification efficiency;

if the website corresponding to the access website data characteristic is judged to be a non-fraud website, no interception operation is needed.
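The control flow of steps S3 to S6 can be summarized in a short Python sketch. The risk prediction model and the identification model are passed in as callables because their formulas are not reproduced in this text; the function name `handle_visit`, the parameter names and the dictionary keys are all hypothetical.

```python
from typing import Callable


def handle_visit(
    first_feature: dict,
    database: dict,
    in_database: Callable[[dict, dict], bool],
    second_feature_of: Callable[[dict], float],
    model_output_of: Callable[[float], float],
    is_fraud: Callable[[float], bool],
) -> str:
    """Return "intercept" or "allow" for one visited website."""
    # S3: preliminary detection against the fraud website database.
    if in_database(first_feature, database):
        return "intercept"
    # S4: secondary processing yields the risk prediction value.
    second_feature = second_feature_of(first_feature)
    # S5: the preset identification model turns it into an output value.
    output_value = model_output_of(second_feature)
    if is_fraud(output_value):
        # S6: intercept, and store the site under its mark for later lookups.
        database.setdefault(first_feature["label"], []).append(first_feature)
        return "intercept"
    return "allow"
```

Writing the newly identified website back into the database means that a repeat visit is caught at the preliminary detection stage, which is the efficiency gain described for step S6.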

To verify and explain the technical effects of the method, a comparison test is carried out between the traditional technical scheme and the method, and the test results are compared to verify the actual effect of the method.

The traditional technical scheme is as follows: fraud website identification mainly adopts a content-based matching technology, either matching text keywords of webpage content captured from the website corresponding to a web address, or classifying webpage images captured from the website with a deep learning model. This scheme relies on a single mode, has low identification efficiency and accuracy, and is prone to missed reports and false reports. The test verifies that the method has higher identification accuracy and efficiency than the traditional method.

Table 2: comparison table of experimental results.

As can be seen from Table 2, compared with the traditional method, the method has higher identification accuracy and efficiency and a lower missed-report rate, which demonstrates its effectiveness.

In the fraud website identification and early warning method above, websites already recorded in the fraud website database are intercepted at the preliminary detection stage, and websites not yet recorded are evaluated through the second characteristic data and the preset fraud website identification model and, if judged to be fraud websites, are intercepted with early warning information sent to the user side. The method can thus identify fraud-related websites accurately and efficiently and warn users at risk of being defrauded in time.

It should be understood that, although the steps in the flowchart of FIG. 2 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, as they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Example 2: in one embodiment, as shown in FIG. 5, there is provided a fraud website identification pre-warning device, comprising: the system comprises a database construction module, a first characteristic data acquisition module, a detection module, a second characteristic data acquisition module, a judgment module and an early warning information sending module, wherein:

As a preferred implementation manner, in the embodiment of the present invention, the database construction module is specifically configured to:

In a preferred embodiment of the present invention, the first feature data obtaining module is specifically configured to:

acquiring the server address of the access website data characteristic;

As a preferred implementation manner, in the embodiment of the present invention, the detection module is specifically configured to:

In a preferred embodiment of the present invention, the second feature data obtaining module is specifically configured to:

defining the risk prediction value as second characteristic data.

In an embodiment of the present invention, the determining module is specifically configured to:

inputting the second characteristic data into a preset fraud website identification model to obtain an output result, wherein the preset fraud website identification model comprises:

As a preferred implementation manner, in the embodiment of the present invention, the determining module is specifically further configured to:

For the specific limitations of the fraud website identification and early warning device, reference may be made to the limitations of the fraud website identification and early warning method above, which are not repeated here. The modules in the fraud website identification and early warning device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

Example 3: in one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a fraud website identification pre-warning method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

S1: collecting and archiving existing fraud website data, and constructing a fraud website database based on the existing fraud website data;

S2: acquiring real-time Internet access data of a user side, capturing access website data characteristics, and preprocessing the access website data characteristics to obtain first characteristic data;

S3: performing preliminary detection on the first characteristic data based on the fraud website database;

S4: in response to detecting that the fraud website database does not contain the first characteristic data, performing secondary processing on the access website data characteristic to obtain second characteristic data;

S5: inputting the second characteristic data into a preset fraud website identification model, and judging whether the website corresponding to the characteristic of the accessed website data is a fraud website according to an output result;

Example 4: In one embodiment, a computer readable storage medium is provided having a computer program stored thereon which, when executed by a processor, performs the steps of the fraud website identification and early warning method described above.

Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.

The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not therefore to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these fall within the scope of the present application.

Claims

1. The fraud website identification and early warning method is characterized by comprising the following steps of:

if yes, intercepting the target website and sending early warning information to the user side;

wherein, the responding to the detection that the fraud website database does not contain the first characteristic data, the secondary processing of the access website data characteristic, and the obtaining of the second characteristic data comprise:

wherein the terms of the risk prediction model (whose symbols and formula are not reproduced in this text) represent, in order: the risk prediction value; a proportionality constant; the fitting value; the fitting function; constant coefficients; the degree of domain-name confusion; the total length of the domain name; the number of domain names; the number of non-standard ports; the number of illegal text keywords; the ICP filing (record) result, which takes one value if the website has an ICP filing and another value if it does not; and the preliminary detection similarity;

defining the risk prediction value as second characteristic data;

the preset fraud website identification model comprises the following steps:

wherein the terms of the identification model (whose symbols and formula are not reproduced in this text) represent, in order: the output value; a correction factor; the feature statistic of the access website data, rounded up to an integer; the text feature statistics function corresponding to the website; and a dynamic parameter;

the step of judging whether the website corresponding to the access website data characteristic is a fraud website according to the output result comprises the following steps:

2. The fraud website identification pre-warning method of claim 1, wherein said collecting and archiving existing fraud website data, constructing a fraud website database based on said existing fraud website data comprises:

3. The fraud website identification and early warning method according to claim 2, wherein the obtaining real-time internet access data of the user terminal, capturing access website data features, preprocessing the access website data features, and obtaining first feature data includes:

acquiring the server address of the access website data characteristic;

4. The fraud website identification and pre-warning method of claim 3, wherein said preliminary detection of said first characteristic data based on said fraud website database comprises:

responding to a comparison success result, intercepting a website corresponding to the first characteristic data, and sending early warning information to the user side;

5. A fraud website identification pre-warning device, the device comprising:

the early warning information sending module is used for intercepting a target website when the website corresponding to the access website data characteristic is a fraud website, and sending early warning information to a user side;

defining the risk prediction value as second characteristic data;

the preset fraud website identification model comprises the following steps:

6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 4 when the computer program is executed.

7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.