CN113450139A - Flow detection system based on interaction strategy, storage medium and electronic equipment - Google Patents

Flow detection system based on interaction strategy, storage medium and electronic equipment

Info

Publication number
CN113450139A
Authority
CN
China
Prior art keywords
flow
dimensional data
abnormal
data characteristics
prediction result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110625793.6A
Other languages
Chinese (zh)
Inventor
王硕
周星杰
李霞
孙泽懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202110625793.6A
Publication of CN113450139A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a flow detection method, a flow detection system, a storage medium and electronic equipment based on an interaction strategy. The detection method comprises: a preprocessing step of obtaining the returned log information of each touchpoint visited by a user and preprocessing the flow logs in the returned log information; a feature acquisition step of extracting single-dimensional data features from the preprocessed flow logs and receiving multi-dimensional data features; a flow testing step of obtaining a flow prediction result from a trained isolation forest model according to the single-dimensional and multi-dimensional data features; and a comparison step of comparing the flow prediction result with the abnormal flow identified by the initialization rules to obtain the abnormal flow. The invention provides a weakly supervised advertisement anti-fraud method based on an interaction strategy, which can effectively reduce the cost caused by rule changes during abnormal flow detection and improve the interpretability and recognition performance of the abnormal flow model.

Description

Flow detection system based on interaction strategy, storage medium and electronic equipment
Technical Field
The invention belongs to the field of flow detection based on an interaction strategy, and particularly relates to a flow detection method, a system, a storage medium and electronic equipment based on an interaction strategy.
Background
With the development of internet finance in recent years, terminal products of all kinds keep emerging and marketing advertisement services keep growing. Driven by profit, large numbers of fraudsters forge data, maliciously register fake accounts in bulk, operate as organized groups, and so on; their technical means grow ever more sophisticated while their costs keep falling. Advertisement traffic frequently faces bulk attacks from the black market industry that penetrate every stage of the service chain, such as exposure, clicks, false conversions and malicious conversions. These attacks seriously harm the rights and interests of advertisers and pose a great threat to the healthy development of advertisement services. Advertising fraud is characterized by concealed and diluted behavior, a small number of bad samples from group fraud, and high aggregation, all of which challenge traditional methods; deep mining of the complex network relationships behind users has therefore become the key to solving group fraud. To identify these fraudulent users and reduce the resulting losses, anti-fraud practitioners use expert rules and predictive models to intercept fraudulent traffic.
Most existing methods are based on expert rules or on machine learning prediction models. Methods based on expert rules filter traffic with rule templates defined from business experience, so they depend strongly on expert rules and business background. However, the cheating patterns of the black market industry change constantly, and cheating methods and modes differ from one field to another, so the generalization and robustness of rule-based methods are not ideal. Methods based on machine learning prediction models lower the business background required of technicians, since business rules need not be built from strong background knowledge. Their feature engineering, however, is critical to the quality of the final model; technicians usually construct features from technical experience, and the model cannot be adjusted according to the performance of its prediction results. A weakly supervised advertisement anti-fraud algorithm based on an interaction strategy is therefore proposed.
Disclosure of Invention
The embodiments of the application provide a flow detection method, a flow detection system, a storage medium and electronic equipment based on an interaction strategy, which at least solve the problem that existing flow detection methods cannot adjust model performance according to the performance of the model's prediction results.
The invention provides a flow detection method based on an interaction strategy, which comprises the following steps:
a preprocessing step: obtaining the returned log information of each touchpoint visited by a user, and preprocessing the flow logs in the returned log information;
a feature acquisition step: extracting single-dimensional data features from the preprocessed flow logs and receiving multi-dimensional data features;
a flow testing step: obtaining a flow prediction result from the trained isolation forest model according to the single-dimensional data features and the multi-dimensional data features;
a comparison step: comparing the flow prediction result with the abnormal flow identified by the initialization rules to obtain the abnormal flow.
The above flow detection method further includes:
a ranking feedback step: after ranking the single-dimensional data features and the multi-dimensional data features by feature importance, feeding back at least one top-ranked data feature together with the abnormal flow;
a model improvement step: analyzing the at least one top-ranked data feature and the abnormal flow to obtain new multi-dimensional data features, and performing a new round of training and interaction on the isolation forest model according to the new multi-dimensional data features and the single-dimensional data features.
In the above flow detection method, the preprocessing step includes:
parsing feature fields from the flow log, and filling the missing values of the parsed feature fields.
In the above flow detection method, the feature acquisition step includes:
a single-dimensional data feature extraction step: extracting the single-dimensional data features from the preprocessed flow logs;
a multi-dimensional data feature extraction step: extracting the multi-dimensional data features from an expert rule base.
In the above flow detection method, the single-dimensional data feature extraction step includes:
a discrete variable encoding step: encoding discrete variables in the flow log with one-hot encoding;
a continuous variable encoding step: applying interval mapping to continuous variables in the flow log, so that values in different intervals are mapped to different codes.
In the above flow detection method, the comparison step includes:
a prediction step: predicting normal and abnormal flow in a test set with the isolation forest model to obtain a prediction result;
an identification step: comparing the prediction result with the abnormal flow identified by the initialization rules, and identifying the abnormal flow on which they differ.
The invention also provides a flow detection system based on the interaction strategy, which comprises:
a preprocessing module, which obtains the returned log information of each touchpoint visited by a user and preprocesses the flow logs in the returned log information;
a feature acquisition module, which extracts single-dimensional data features from the preprocessed flow logs and receives multi-dimensional data features;
a flow testing module, which obtains a flow prediction result from the trained isolation forest model according to the single-dimensional data features and the multi-dimensional data features;
a comparison module, which compares the flow prediction result with the abnormal flow identified by the initialization rules to obtain the abnormal flow.
The above flow detection system further includes:
a ranking feedback module, which ranks the single-dimensional data features and the multi-dimensional data features by feature importance and then feeds back at least one top-ranked data feature together with the abnormal flow;
a model improvement module, which analyzes the at least one top-ranked data feature and the abnormal flow to obtain new multi-dimensional data features, and performs a new round of training and interaction on the isolation forest model according to the new multi-dimensional data features and the single-dimensional data features.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the flow detection method as described in any one of the above when executing the computer program.
The invention also provides a storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements a flow detection method as described in any one of the above.
The beneficial effects of the invention are as follows:
The invention belongs to the field of prediction and optimization in intelligent marketing technology. It provides a weakly supervised advertisement anti-fraud method based on an interaction strategy, which can effectively reduce the cost caused by rule changes during abnormal flow detection and improve the interpretability and recognition performance of the abnormal flow model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application.
In the drawings:
FIG. 1 is a flow chart of a traffic detection method of the interaction strategy of the present invention;
FIG. 2 is a flow chart illustrating the substeps of step S2 in FIG. 1;
FIG. 3 is a flowchart illustrating the substeps of step S21 in FIG. 2;
FIG. 4 is a flowchart illustrating the substeps of step S4 in FIG. 1;
FIG. 5 is a detailed flow chart of the traffic detection method of the interaction strategy of the present invention;
FIG. 6 is a schematic diagram of the structure of the traffic detection system of the interaction strategy of the present invention;
FIG. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those of ordinary skill in the art will understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments where no conflict arises.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not limit quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association between associated objects and covers three cases; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering of the objects.
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.
Before describing in detail the various embodiments of the present invention, the core inventive concepts of the present invention are summarized and described in detail by the following several embodiments.
Embodiment one:
Referring to FIG. 1, FIG. 1 is a flowchart of a traffic detection method based on an interaction policy. As shown in FIG. 1, the traffic detection method based on the interaction policy of the present invention includes:
preprocessing step S1: obtaining the returned log information of each touchpoint visited by a user, and preprocessing the flow logs in the returned log information;
feature acquisition step S2: extracting single-dimensional data features from the preprocessed flow logs and receiving multi-dimensional data features;
flow testing step S3: obtaining a flow prediction result from the trained isolation forest model according to the single-dimensional data features and the multi-dimensional data features;
comparison step S4: comparing the flow prediction result with the abnormal flow identified by the initialization rules to obtain the abnormal flow;
ranking feedback step S5: after ranking the single-dimensional data features and the multi-dimensional data features by feature importance, feeding back at least one top-ranked data feature together with the abnormal flow;
model improvement step S6: analyzing the at least one top-ranked data feature and the abnormal flow to obtain new multi-dimensional data features, and performing a new round of training and interaction on the isolation forest model according to the new multi-dimensional data features and the single-dimensional data features.
The preprocessing step S1 includes: parsing feature fields from the flow log, and filling the missing values of the parsed feature fields.
Referring to FIG. 2, FIG. 2 is a flowchart of the feature acquisition step S2. As shown in FIG. 2, the feature acquisition step S2 includes:
single-dimensional data feature extraction step S21: extracting the single-dimensional data features from the preprocessed flow logs;
multi-dimensional data feature extraction step S22: extracting the multi-dimensional data features from an expert rule base.
Referring to FIG. 3, FIG. 3 is a flowchart of the single-dimensional data feature extraction step S21. As shown in FIG. 3, the single-dimensional data feature extraction step S21 includes:
discrete variable encoding step S211: encoding discrete variables in the flow log with one-hot encoding;
continuous variable encoding step S212: applying interval mapping to continuous variables in the flow log, so that values in different intervals are mapped to different codes.
Referring to FIG. 4, FIG. 4 is a flowchart of the comparison step S4. As shown in FIG. 4, the comparison step S4 includes:
prediction step S41: predicting normal and abnormal flow in a test set with the isolation forest model to obtain a prediction result;
identification step S42: comparing the prediction result with the abnormal flow identified by the initialization rules, and identifying the abnormal flow on which they differ.
As shown in FIG. 5, the specific steps are as follows:
Preprocessing step: the returned log information of each touchpoint visited by a user is acquired through an advertisement flow monitoring system, and the flow log is preprocessed. Feature fields are parsed from the flow log, and missing values in the parsed fields are filled using a method suited to each field's characteristics. For example, the os field is filled with the mode or with unk, and the number of seconds at which a request is received is filled with the mean.
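The filling strategy described above can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the field name `recv_seconds`, the helper `fill_missing`, the use of `None` to mark a missing value, and the fallback of 0.0 for a numeric field with no observed values are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean

def fill_missing(records, categorical=("os",), numeric=("recv_seconds",), unk="unk"):
    """Return copies of the records with missing (None) fields filled.

    Categorical fields receive the mode of the observed values, or `unk`
    when no value was observed; numeric fields receive the mean.
    """
    fills = {}
    for field in categorical:
        seen = [r[field] for r in records if r.get(field) is not None]
        fills[field] = Counter(seen).most_common(1)[0][0] if seen else unk
    for field in numeric:
        seen = [r[field] for r in records if r.get(field) is not None]
        # Assumption: fall back to 0.0 when nothing was observed.
        fills[field] = mean(seen) if seen else 0.0
    return [
        {**r, **{f: v for f, v in fills.items() if r.get(f) is None}}
        for r in records
    ]

# Hypothetical parsed log records.
logs = [
    {"os": "android", "recv_seconds": 2.0},
    {"os": None, "recv_seconds": 4.0},
    {"os": "android", "recv_seconds": None},
    {"os": "ios", "recv_seconds": 6.0},
]
cleaned = fill_missing(logs)
```

The function returns new dictionaries rather than modifying the input, so the raw parsed log stays available for later rounds of interaction.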
Feature engineering construction step: the construction of the feature engineering plays a very important role in the final effect of the model. An interaction strategy is adopted here to enrich the data features step by step. The step consists of two parts: first, extraction of single-dimensional data features after log parsing; second, extraction of the multi-dimensional data features returned from expert knowledge.
Extraction of single-dimensional data features after parsing: this step covers the encoding of discrete and continuous variables. Discrete variables, such as the os, md and region features, are one-hot encoded. Continuous variables, such as the number of seconds at which a request is received, use interval mapping, which maps values in different intervals to different codes.
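The two encodings can be illustrated with a short sketch. The vocabulary, the interval boundaries and the field name `recv_seconds` are hypothetical values chosen for the example; the source does not specify them.

```python
from bisect import bisect_right

def one_hot(value, vocabulary):
    """Encode a discrete value as a 0/1 vector over a fixed vocabulary."""
    return [1 if value == v else 0 for v in vocabulary]

def interval_code(value, boundaries):
    """Map a continuous value to the index of the interval it falls in.

    `boundaries` must be sorted in ascending order; a value equal to a
    boundary is placed in the interval to its right.
    """
    return bisect_right(boundaries, value)

OS_VOCAB = ["android", "ios", "unk"]   # hypothetical vocabulary
SECONDS_BOUNDS = [1.0, 5.0, 30.0]      # hypothetical interval edges

def encode(record):
    """Concatenate the one-hot os vector with the interval code."""
    return one_hot(record["os"], OS_VOCAB) + [
        interval_code(record["recv_seconds"], SECONDS_BOUNDS)
    ]

vector = encode({"os": "ios", "recv_seconds": 4.0})
```

With three boundaries the continuous field takes one of four codes (0 to 3), so nearby values share a code and the model sees a coarse, discretized version of the field.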
Extraction of the multi-dimensional data features returned from expert knowledge: initially, experts summarize expert knowledge from the expert rule base and provide the corresponding feature methods to enrich the inherent features. After the model completes its initial training, the experts refine the relevant rules from the top-k feature importance ranking returned by the model, summarize them into features using expert knowledge, and return these features to the model to enrich its feature engineering.
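One possible reading of a multi-dimensional feature is a rule that combines several log fields into a single value. The sketch below takes that reading as an assumption; the rule names, field names and thresholds are invented for illustration and do not come from the source.

```python
# Each rule combines several log fields into one 0/1 feature.
# The rules, field names and thresholds are hypothetical.
RULE_BASE = {
    "fast_click_on_unknown_os": lambda r: r["os"] == "unk" and r["recv_seconds"] < 1.0,
    "many_clicks_per_ip": lambda r: r["clicks_from_ip"] > 100,
}

def rule_features(record, rule_base=RULE_BASE):
    """Evaluate every rule in the rule base and return the results as features."""
    return {name: int(rule(record)) for name, rule in rule_base.items()}

def add_rule(rule_base, name, rule):
    """Return a new rule base extended with a rule fed back by an expert."""
    return {**rule_base, name: rule}

features = rule_features({"os": "unk", "recv_seconds": 0.4, "clicks_from_ip": 3})

# A later interaction round might contribute one more rule.
extended = add_rule(RULE_BASE, "slow_known_os", lambda r: r["os"] != "unk" and r["recv_seconds"] > 30.0)
```

Keeping the rule base as a mapping from names to predicates means that each feature column carries the name of the rule that produced it, which helps when features are later ranked and shown to experts.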
Model training and testing: in the actual flow returned by the system, normal and abnormal flow differ greatly in magnitude. Abnormal flow accounts for a small proportion of all samples, its feature values differ markedly from those of normal flow, and it is sparsely distributed and far from the high-density groups. The isolation forest algorithm is therefore used as the base model to identify abnormal flow. First, n records are randomly selected from the training data as a subsample and placed in the root node of an isolation tree. Next, a dimension is chosen at random and a cut point p is randomly generated between the minimum and maximum values of that dimension in the current node's data. The cut point defines a hyperplane that divides the current node's data space into two subspaces: points whose value in the chosen dimension is smaller than p are placed on the left branch of the current node, and points whose value is greater than or equal to p are placed on the right branch. These operations are repeated on the left and right branches, continually constructing new nodes, until a node holds only one record or the tree reaches a preset height. Finally, the results of all isolation trees are integrated: because the cutting process is random, the cutting is repeated from the beginning many times and the results are averaged until they converge.
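The tree construction described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the applicant's implementation: the normalizing term `c(n)` and the score `2 ** (-mean_depth / c(n))` follow the standard isolation forest formulation rather than the source, the split dimension is drawn only from dimensions that still have spread, and all parameter values and sample data are arbitrary.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, height, max_height, rng):
    """Split points on a random dimension at a random cut point, recursively."""
    if len(points) <= 1 or height >= max_height:
        return {"size": len(points)}
    spans = [(min(p[d] for p in points), max(p[d] for p in points))
             for d in range(len(points[0]))]
    dims = [d for d, (lo, hi) in enumerate(spans) if lo < hi]
    if not dims:                       # all remaining points are identical
        return {"size": len(points)}
    dim = rng.choice(dims)
    cut = rng.uniform(*spans[dim])     # cut point between min and max
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, height + 1, max_height, rng),
            "right": build_tree(right, height + 1, max_height, rng)}

def path_length(node, x, height=0):
    """Depth at which x lands, plus c(size) for the unsplit remainder of the leaf."""
    if "size" in node:
        return height + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
    return path_length(child, x, height + 1)

def fit_forest(data, n_trees=100, subsample=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    if len(data) < 2:
        raise ValueError("need at least two records")
    rng = random.Random(seed)
    n = min(subsample, len(data))
    max_height = math.ceil(math.log2(n))
    trees = [build_tree(rng.sample(data, n), 0, max_height, rng) for _ in range(n_trees)]
    return trees, n

def anomaly_score(forest, x):
    """Average the path length over all trees; shorter paths give higher scores."""
    trees, n = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(n))

# Synthetic data: a dense cluster plus one distant point.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (10.0, 10.0)
forest = fit_forest(normal + [outlier])
score_outlier = anomaly_score(forest, outlier)
score_center = anomaly_score(forest, (0.0, 0.0))
```

Because a sparse, distant point is separated from the rest after only a few random cuts, its average path is short and its score is higher than that of a point inside the dense cluster. In practice a library implementation would normally be used instead of a hand-written forest.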
Result difference comparison and feature importance ranking: the trained model is used to predict normal and abnormal flow in the test set, and its predictions are compared with the abnormal flow identified by the initialization rules to identify the abnormal flow on which they differ. The features used in model prediction are ranked by importance, and the top 5 features and the differing flow are fed back to the experts.
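The comparison and the top-5 selection reduce to set operations and a sort, as the sketch below shows. The source does not state how feature importance is computed for the model, so the importance scores are simply taken as an input here; the traffic IDs, feature names and scores are made up for illustration.

```python
def compare_flags(model_flagged, rule_flagged):
    """Split flagged traffic IDs by which detector caught them."""
    model_flagged, rule_flagged = set(model_flagged), set(rule_flagged)
    return {
        "both": model_flagged & rule_flagged,
        "model_only": model_flagged - rule_flagged,
        "rule_only": rule_flagged - model_flagged,
    }

def top_k(importance, k=5):
    """Return the k feature names with the highest importance scores."""
    return sorted(importance, key=importance.get, reverse=True)[:k]

# Hypothetical detector outputs and importance scores.
diff = compare_flags(model_flagged={"r1", "r2", "r5"}, rule_flagged={"r2", "r3"})
importance = {"os_unk": 0.40, "recv_bucket": 0.30, "region_x": 0.10,
              "md_a": 0.08, "md_b": 0.07, "os_ios": 0.05}
feedback = {"features": top_k(importance), "traffic": diff["model_only"]}
```

Here `model_only` holds the traffic that the model flags but the initialization rules miss, which is the part an expert would need to examine to derive new rules.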
Interaction strategy execution step: using the top 5 features and the differing flow returned by the result difference comparison and feature importance ranking, the experts apply expert knowledge to analyze the rule features they contain, add those rule features to the expert rule base, and use the newly identified rules to enrich the model's feature engineering. A new round of model training and interaction is then carried out, until the model converges or expert knowledge can no longer make a judgment, that is, until the identified abnormal flow cannot be described by rules.
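The overall loop can be outlined as below. This is a schematic sketch, not the applicant's procedure: the three callbacks, the `max_rounds` cap, and the choice to treat "no remaining differences" as convergence are assumptions, and the toy stand-ins at the end exist only to make the example executable.

```python
def interactive_loop(train_and_detect, rule_detect, ask_expert, rule_base, max_rounds=10):
    """Alternate between model training and expert feedback.

    train_and_detect(rule_base) -> (flagged_ids, top_features)
    rule_detect(rule_base)      -> ids flagged by the rules alone
    ask_expert(top_features, diff_ids) -> dict of new rules, empty when the
        expert cannot describe the remaining differences with rules
    """
    rule_base = dict(rule_base)
    for round_no in range(1, max_rounds + 1):
        flagged, top_features = train_and_detect(rule_base)
        diff = set(flagged) - set(rule_detect(rule_base))
        new_rules = ask_expert(top_features, diff) if diff else {}
        if not new_rules:              # converged, or expert cannot judge
            return rule_base, round_no
        rule_base.update(new_rules)
    return rule_base, max_rounds

# Toy stand-ins: each "rule" is just the traffic ID it explains.
def train_and_detect(rules):
    return {"t1", "t2", "t3"}, ["os_unk", "recv_bucket"]

def rule_detect(rules):
    return set(rules.values())

def ask_expert(top_features, diff):
    explainable = sorted(diff - {"t3"})   # the expert cannot explain t3
    return {f"rule_for_{explainable[0]}": explainable[0]} if explainable else {}

final_rules, rounds = interactive_loop(
    train_and_detect, rule_detect, ask_expert, {"seed_rule": "t1"}
)
```

In this toy run the expert adds a rule for `t2` in the first round and has nothing to add in the second, so the loop stops with `t3` left as abnormal flow that no rule describes.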
Embodiment two:
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a traffic detection system based on an interaction policy according to the present invention.
As shown in FIG. 6, the traffic detection system based on an interaction policy according to the present invention includes:
a preprocessing module, which obtains the returned log information of each touchpoint visited by a user and preprocesses the flow logs in the returned log information;
a feature acquisition module, which extracts single-dimensional data features from the preprocessed flow logs and receives multi-dimensional data features;
a flow testing module, which obtains a flow prediction result from the trained isolation forest model according to the single-dimensional data features and the multi-dimensional data features;
a comparison module, which compares the flow prediction result with the abnormal flow identified by the initialization rules to obtain the abnormal flow;
a ranking feedback module, which ranks the single-dimensional data features and the multi-dimensional data features by feature importance and then feeds back at least one top-ranked data feature together with the abnormal flow;
a model improvement module, which analyzes the at least one top-ranked data feature and the abnormal flow to obtain new multi-dimensional data features, and performs a new round of training and interaction on the isolation forest model according to the new multi-dimensional data features and the single-dimensional data features.
Embodiment three:
Referring to FIG. 7, this embodiment discloses a specific implementation of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 82 may include mass storage for data or instructions. By way of example, and not limitation, memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is a non-volatile memory. In particular embodiments, memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended Data Out Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 reads and executes the computer program instructions stored in the memory 82 to implement any one of the above-described embodiments of the traffic detection method based on the interaction policy.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 7, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between modules, devices, units and/or equipment in the embodiment of the present application. The communication port 83 may also be implemented with other components such as: the data communication is carried out among external equipment, image/data acquisition equipment, a database, external storage, an image/data processing workstation and the like.
The bus 80 includes hardware, software, or both to couple the components of the electronic device to one another. Bus 80 includes, but is not limited to, at least one of the following: data Bus (Data Bus), Address Bus (Address Bus), Control Bus (Control Bus), Expansion Bus (Expansion Bus), and Local Bus (Local Bus). By way of example, and not limitation, Bus 80 may include an Accelerated Graphics Port (AGP) or other Graphics Bus, an Enhanced Industry Standard Architecture (EISA) Bus, a Front-Side Bus (Front Side Bus), an FSB (FSB), a Hyper Transport (HT) Interconnect, an ISA (ISA) Bus, an Infini Band Interconnect, a Low Pin Count (LPC) Bus, a memory Bus, a microchannel Architecture (MCA) Bus, a PCI (Peripheral Component Interconnect) Bus, a PCI-Express (PCI-X) Bus, a Serial Advanced Technology Attachment (SATA) Bus, a Video Electronics Bus (audio Electronics Association), abbreviated VLB) bus or other suitable bus or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device may implement the methods described in conjunction with fig. 1-4 based on traffic detection of an interaction policy.
In addition, in combination with the interaction-strategy-based traffic detection method of the foregoing embodiments, embodiments of the present application may provide a computer-readable storage medium to implement the method. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement the interaction-strategy-based traffic detection method of any of the above embodiments.
The technical features of the embodiments described above may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
In summary, the invention provides a weakly supervised advertising anti-fraud method based on an interaction strategy, which can effectively reduce the cost incurred by rule changes during abnormal traffic detection and improve the interpretability and recognition performance of the abnormal traffic model.
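The detection and comparison flow summarized above can be illustrated with a minimal sketch. It is not the applicant's implementation: it is a simplified pure-Python isolation forest (every tree is built on the full data set, with no subsampling) written only for illustration, and a production system would more likely rely on an existing library. The feature meanings, the record values, the `top_k` cut-off and the rule-flagged set `{3}` are all hypothetical. Reading the comparison step as a set difference between model-flagged and rule-flagged traffic is likewise an assumption.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Isolate rows with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only features that still vary inside this node can be split.
    dims = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    if not dims:
        return {"size": len(rows)}
    dim = rng.choice(dims)
    values = [r[dim] for r in rows]
    split = rng.uniform(min(values), max(values))
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row lands, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["dim"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, seed=0):
    """Isolation-forest score per row; values near 1 indicate anomalies."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(rows)))
    trees = [build_tree(rows, 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(len(rows))
    return [2 ** (-sum(path_length(r, t) for t in trees) / n_trees / norm) for r in rows]


def compare_with_rules(scores, rule_flagged, top_k):
    """Split the model's top_k anomalies by agreement with the rule set."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    model_flagged = set(ranked[:top_k])
    return {
        "confirmed": model_flagged & rule_flagged,   # flagged by both
        "model_only": model_flagged - rule_flagged,  # new candidates for review
        "rule_only": rule_flagged - model_flagged,   # rules the model disagrees with
    }


# Hypothetical features per record: [clicks per minute, requests per IP].
gen = random.Random(1)
traffic = [[gen.uniform(1, 5), gen.uniform(1, 10)] for _ in range(30)]
traffic.append([400.0, 900.0])  # index 30: an obviously bot-like record
result = compare_with_rules(anomaly_scores(traffic), rule_flagged={3}, top_k=1)
print(result)
```

The `model_only` set is the part of interest for the interaction strategy: it holds traffic that the existing rules did not catch and that could be handed to an expert for review.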
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present invention shall be subject to the appended claims.
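As a further illustration of how the interaction strategy could support interpretability, the sketch below ranks features by how strongly flagged traffic deviates from the remaining traffic and packages the top-ranked features together with the flagged records for expert review. The scoring rule (absolute mean gap divided by the standard deviation of the unflagged traffic) is an assumption chosen for simplicity; the source does not state how feature importance is computed. The feature names and values are hypothetical.

```python
import statistics


def rank_features(rows, names, flagged):
    """Rank feature names by how far flagged rows deviate from the rest."""
    normal = [r for i, r in enumerate(rows) if i not in flagged]
    abnormal = [r for i, r in enumerate(rows) if i in flagged]
    ranking = []
    for j, name in enumerate(names):
        base = [r[j] for r in normal]
        spread = statistics.pstdev(base) or 1.0  # constant feature: avoid dividing by zero
        gap = abs(statistics.mean([r[j] for r in abnormal]) - statistics.mean(base))
        ranking.append((gap / spread, name))
    return [name for _, name in sorted(ranking, reverse=True)]


def build_feedback(rows, names, flagged, top_k=1):
    """Bundle the top_k features and the flagged records for an expert."""
    return {
        "top_features": rank_features(rows, names, flagged)[:top_k],
        "abnormal_records": [rows[i] for i in sorted(flagged)],
    }


# Hypothetical feature matrix; the last record was flagged as abnormal.
names = ["clicks_per_minute", "dwell_seconds", "is_mobile"]
rows = [[2, 30, 1], [3, 32, 1], [2, 31, 1], [4, 29, 1], [300, 30, 1]]
feedback = build_feedback(rows, names, flagged={4}, top_k=1)
print(feedback)
```

An expert receiving this feedback could turn the highlighted feature into a new combined (multi-dimensional) feature, which would then be fed into the next training round.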

Claims (10)

1. A traffic detection method based on an interaction strategy, characterized by comprising the following steps:
a preprocessing step: obtaining returned log information for each user touchpoint visit, and preprocessing the traffic logs in the returned log information;
a feature obtaining step: extracting single-dimensional data features from the preprocessed traffic logs and receiving multi-dimensional data features;
a traffic testing step: obtaining a traffic prediction result from the trained isolation forest model according to the single-dimensional data features and the multi-dimensional data features;
a comparison step: comparing the traffic prediction result with the abnormal traffic identified by the initialization rules to obtain the abnormal traffic.
2. The traffic detection method of claim 1, further comprising:
a ranking feedback step: after ranking the single-dimensional data features and the multi-dimensional data features by feature importance, feeding back at least one top-ranked data feature together with the abnormal traffic;
a model improvement step: analyzing the at least one top-ranked data feature and the abnormal traffic to obtain new multi-dimensional data features, and performing a new round of training and interaction on the isolation forest model according to the new multi-dimensional data features and the single-dimensional data features.
3. The traffic detection method of claim 1, wherein the preprocessing step comprises:
parsing feature fields from the traffic logs, and filling in missing values of the parsed feature fields.
4. The traffic detection method of claim 1, wherein the feature obtaining step comprises:
a single-dimensional data feature extraction step: extracting the single-dimensional data features from the preprocessed traffic logs;
a multi-dimensional data feature extraction step: extracting the multi-dimensional data features from an expert rule base.
5. The traffic detection method of claim 4, wherein the single-dimensional data feature extraction step comprises:
a discrete variable encoding step: encoding discrete variables in the traffic logs using one-hot encoding;
a continuous variable encoding step: applying interval mapping to continuous variables in the traffic logs, so that values falling in different intervals are mapped to different codes.
6. The traffic detection method of claim 1, wherein the comparison step comprises:
a prediction step: predicting normal and abnormal traffic in a test set with the isolation forest model to obtain a prediction result;
an identification step: comparing the prediction result with the abnormal traffic identified by the initialization rules, and identifying the abnormal traffic on which they differ.
7. A traffic detection system based on an interaction strategy, comprising:
a preprocessing module, which obtains returned log information for each user touchpoint visit and preprocesses the traffic logs in the returned log information;
a feature obtaining module, which extracts single-dimensional data features from the preprocessed traffic logs and receives multi-dimensional data features;
a traffic testing module, which obtains a traffic prediction result from the trained isolation forest model according to the single-dimensional data features and the multi-dimensional data features;
a comparison module, which compares the traffic prediction result with the abnormal traffic identified by the initialization rules to obtain the abnormal traffic.
8. The traffic detection system of claim 7, further comprising:
a ranking feedback module, which ranks the single-dimensional data features and the multi-dimensional data features by feature importance and then feeds back at least one top-ranked data feature together with the abnormal traffic;
a model improvement module, which analyzes the at least one top-ranked data feature and the abnormal traffic to obtain new multi-dimensional data features, and performs a new round of training and interaction on the isolation forest model according to the new multi-dimensional data features and the single-dimensional data features.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the traffic detection method of any one of claims 1 to 4 when executing the computer program.
10. A storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the traffic detection method of any one of claims 1 to 4.
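The preprocessing and encoding recited in claims 3 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the field names (`os`, `dwell_seconds`), the default values, the category list and the interval edges are all hypothetical, and the interval mapping is realized here with a simple binary search over sorted edges.

```python
import bisect

# Hypothetical configuration; none of these values come from the source.
DEFAULTS = {"os": "unknown", "dwell_seconds": 0.0}
OS_CATEGORIES = ["android", "ios", "unknown"]
DWELL_EDGES = [1.0, 10.0, 60.0]  # intervals: <1, [1,10), [10,60), >=60


def fill_missing(record, defaults):
    """Return the known fields of record, replacing missing or empty values."""
    return {
        key: record.get(key) if record.get(key) not in (None, "") else default
        for key, default in defaults.items()
    }


def one_hot(value, categories):
    """One-hot encode a discrete value against a fixed category list."""
    return [1 if value == category else 0 for category in categories]


def interval_code(value, edges):
    """Map a continuous value to the index of the interval it falls in."""
    return bisect.bisect_right(edges, value)


def encode(record):
    """Turn one parsed traffic-log record into a numeric feature vector."""
    filled = fill_missing(record, DEFAULTS)
    return one_hot(filled["os"], OS_CATEGORIES) + [
        interval_code(filled["dwell_seconds"], DWELL_EDGES)
    ]


print(encode({"os": "android", "dwell_seconds": None}))
print(encode({"os": "", "dwell_seconds": 12.5}))
```

Keeping the category list and interval edges fixed means that every record is encoded into a vector of the same length, which a tree-based model such as an isolation forest can consume directly.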
CN202110625793.6A 2021-06-04 2021-06-04 Flow detection system based on interaction strategy, storage medium and electronic equipment Pending CN113450139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625793.6A CN113450139A (en) 2021-06-04 2021-06-04 Flow detection system based on interaction strategy, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110625793.6A CN113450139A (en) 2021-06-04 2021-06-04 Flow detection system based on interaction strategy, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113450139A true CN113450139A (en) 2021-09-28

Family

ID=77810789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625793.6A Pending CN113450139A (en) 2021-06-04 2021-06-04 Flow detection system based on interaction strategy, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113450139A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111200530A (en) * 2019-12-31 2020-05-26 畅捷通信息技术股份有限公司 Method and device for performing root cause analysis based on KPI (Key performance indicator)
CN112311803A (en) * 2020-11-06 2021-02-02 杭州安恒信息技术股份有限公司 Rule base updating method and device, electronic equipment and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524297A (en) * 2023-04-28 2023-08-01 迈杰转化医学研究(苏州)有限公司 Weak supervision learning training method based on expert feedback
CN116524297B (en) * 2023-04-28 2024-02-13 迈杰转化医学研究(苏州)有限公司 Weak supervision learning training method based on expert feedback

Similar Documents

Publication Publication Date Title
Chu et al. Detecting social spam campaigns on twitter
CN106599686A (en) Malware clustering method based on TLSH character representation
CN109413028A (en) SQL injection detection method based on convolutional neural networks algorithm
CN105956472A (en) Method and system for identifying whether webpage includes malicious content or not
CN106708952B (en) A kind of Webpage clustering method and device
CN108319672B (en) Mobile terminal bad information filtering method and system based on cloud computing
CN111488623A (en) Webpage tampering detection method and related device
CN111371778B (en) Attack group identification method, device, computing equipment and medium
JP2014502753A (en) Web page information detection method and system
CN112070120A (en) Threat information processing method, device, electronic device and storage medium
CN112733146B (en) Penetration testing method, device and equipment based on machine learning and storage medium
CN108229170B (en) Software analysis method and apparatus using big data and neural network
CN111260220B (en) Group control equipment identification method and device, electronic equipment and storage medium
CN112487422B (en) Malicious document detection method and device, electronic equipment and storage medium
WO2020082763A1 (en) Decision trees-based method and apparatus for detecting phishing website, and computer device
CN111723371A (en) Method for constructing detection model of malicious file and method for detecting malicious file
CN111740957A (en) Automatic XSS attack detection method based on FP-tree optimization
CN109756467B (en) Phishing website identification method and device
CN114692593B (en) Network information safety monitoring and early warning method
CN114817808A (en) Illegal website identification method, device, electronic device and storage medium
CN113450139A (en) Flow detection system based on interaction strategy, storage medium and electronic equipment
Alkhatib et al. Mining the dark web: A novel approach for placing a dark website under investigation
CN108717637B (en) Automatic mining method and system for E-commerce safety related entities
CN115314268B (en) Malicious encryption traffic detection method and system based on traffic fingerprint and behavior
Sushma et al. Deep learning for phishing website detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination