CN113450139A

CN113450139A - Flow detection system based on interaction strategy, storage medium and electronic equipment

Info

Publication number: CN113450139A
Application number: CN202110625793.6A
Authority: CN
Inventors: 王硕; 周星杰; 李霞; 孙泽懿
Original assignee: Shanghai Minglue Artificial Intelligence Group Co Ltd
Current assignee: Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2021-09-28

Abstract

The application discloses a traffic detection method and system based on an interaction strategy, a storage medium and an electronic device. The detection method comprises: a preprocessing step of obtaining the log information returned for each user visit to a touchpoint and preprocessing the traffic logs in that information; a feature acquisition step of extracting single-dimensional data features from the preprocessed traffic logs and receiving multi-dimensional data features; a traffic testing step of obtaining a traffic prediction result from the single-dimensional and multi-dimensional data features using a trained isolation forest model; and a comparison step of comparing the traffic prediction result with the abnormal traffic identified by the initial rules to obtain the abnormal traffic. The invention provides a weakly supervised advertising anti-fraud method based on an interaction strategy, which can reduce the cost incurred by rule changes during abnormal traffic detection and improve the interpretability and recognition performance of the abnormal traffic model.

Description

Flow detection system based on interaction strategy, storage medium and electronic equipment

Technical Field

The invention belongs to the field of traffic detection based on interaction strategies, and in particular relates to a traffic detection method, system, storage medium and electronic device based on an interaction strategy.

Background

With the development of internet finance in recent years, new end-user products keep emerging and marketing and advertising services continue to grow. Driven by profit, large numbers of fraudsters forge data, maliciously register fake accounts in bulk, and operate as organized gangs; their techniques are becoming ever more sophisticated while their costs keep falling. Advertising traffic is frequently hit by batch attacks from the black-market industry that penetrate every link of the business chain, such as impressions, clicks, and fake or malicious conversions. These attacks seriously harm advertisers' interests and pose a major threat to the healthy development of advertising services. Ad fraud is characterized by concealed and diluted behavior, and by fraud-gang samples that are few in number but highly clustered. This poses many challenges for traditional methods, and mining the complex network of relationships behind users has become key to combating gang fraud. To identify these fraudulent users and reduce losses of all kinds, anti-fraud teams use expert rules and predictive models to intercept fraudulent traffic.

Most existing methods rely on expert rules or machine learning prediction models. Expert-rule methods define rule templates from business experience and expert knowledge and use them to filter traffic, so they depend heavily on the expert rules and the business context. Black-market fraud tactics change constantly, and traffic fraud takes different forms in different fields, so the generalization and robustness of expert-rule methods are unsatisfactory. Machine-learning methods lower the business-context requirements on practitioners, who no longer need deep domain knowledge to write business rules. However, the quality of the final model depends heavily on feature engineering, practitioners usually build features from technical experience, and the features cannot be adjusted according to how the model's predictions actually perform. To address this, a weakly supervised advertising anti-fraud algorithm based on an interaction strategy is provided.

Disclosure of Invention

The embodiments of the present application provide a traffic detection method, system, storage medium and electronic device based on an interaction strategy, which at least solve the problem that existing traffic detection methods cannot adjust model performance according to how the model's predictions perform.

The invention provides a traffic detection method based on an interaction strategy, comprising the following steps:

a preprocessing step: obtaining the log information returned for each user visit to a touchpoint, and preprocessing the traffic logs in that information;

a feature acquisition step: extracting single-dimensional data features from the preprocessed traffic logs and receiving multi-dimensional data features;

a traffic testing step: obtaining a traffic prediction result from the single-dimensional and multi-dimensional data features using a trained isolation forest model;

a comparison step: comparing the traffic prediction result with the abnormal traffic identified by the initial rules to obtain the abnormal traffic.
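The steps above do not state how the isolation forest turns its trees into a prediction. For reference, the original isolation forest paper scores a record x by its average path length over all trees. The patent itself gives no formula, so the following is the textbook definition and not necessarily the applicant's exact scoring. Here h(x) is the number of edges from the root to the leaf that isolates x, the expectation is taken over the trees, ψ is the subsample size per tree, and H is the harmonic number:

```latex
s(x,\psi) = 2^{-\frac{\mathbb{E}[h(x)]}{c(\psi)}}, \qquad
c(\psi) = 2H(\psi-1) - \frac{2(\psi-1)}{\psi}, \qquad
H(i) \approx \ln i + 0.5772156649
```

When the average path length equals c(ψ) the score is exactly 0.5. Scores close to 1 mark records that are isolated after very few splits (candidate abnormal traffic), and scores well below 0.5 mark ordinary records.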

The above traffic detection method further comprises:

a ranking feedback step: after ranking the single-dimensional and multi-dimensional data features by importance, feeding back at least one of the most important data features together with the abnormal traffic;

a model improvement step: analyzing the at least one most important data feature and the abnormal traffic to obtain new multi-dimensional data features, and performing a new round of training and interaction on the isolation forest model with the new multi-dimensional data features and the single-dimensional data features.

In the traffic detection method, the preprocessing step comprises:

parsing feature fields from the traffic log and filling in missing values of the parsed feature fields.
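The patent does not specify the log format. The following minimal sketch assumes that each traffic log line is a URL query string of key=value pairs. The field names `os`, `md`, `region` and `req_sec` are hypothetical stand-ins for the fields mentioned later in the description, with `req_sec` standing for the request-receipt seconds. Fields that are absent, empty or unparseable are returned as `None` so that a separate missing-value step can fill them.

```python
from typing import Dict, Optional, Union
from urllib.parse import parse_qsl

# Hypothetical fields of interest; req_sec stands for the request-receipt seconds.
FIELDS = ("os", "md", "region", "req_sec")
NUMERIC = {"req_sec"}

Value = Optional[Union[str, float]]


def parse_log_line(line: str) -> Dict[str, Value]:
    """Parse one 'k=v&k=v' traffic log line into a record restricted to FIELDS.

    Absent, empty or unparseable values become None for the later
    missing-value filling step.
    """
    pairs = dict(parse_qsl(line.strip(), keep_blank_values=True))
    record: Dict[str, Value] = {}
    for name in FIELDS:
        raw = pairs.get(name) or None  # treat "" the same as an absent key
        value: Value = raw
        if name in NUMERIC and raw is not None:
            try:
                value = float(raw)
            except ValueError:
                value = None  # malformed numbers are treated as missing
        record[name] = value
    return record
```

For example, `parse_log_line("os=android&md=&region=sh&req_sec=3.5")` yields `{'os': 'android', 'md': None, 'region': 'sh', 'req_sec': 3.5}`.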

In the traffic detection method, the feature acquisition step comprises:

a single-dimensional data feature extraction step: extracting the single-dimensional data features from the preprocessed traffic logs;

a multi-dimensional data feature extraction step: extracting the multi-dimensional data features from an expert rule base.
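One way to read "extracting multi-dimensional data features from an expert rule base" is that each rule combines several raw fields into a single signal. The sketch below assumes that reading. Every rule is a hypothetical named predicate over a parsed, missing-value-filled record, each rule yields one 0/1 feature, and a record counts as flagged by the initial rules when any rule fires. The rule names and thresholds are illustrative only and do not come from the patent.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical expert rule base: each rule combines several fields of a
# parsed, missing-value-filled record into one boolean signal.
RULE_BASE: Dict[str, Rule] = {
    "unk_os_fast_request": lambda r: r["os"] == "unk" and r["req_sec"] < 1.0,
    "unk_model_and_region": lambda r: r["md"] == "unk" and r["region"] == "unk",
}


def rule_features(record: Record, rules: Dict[str, Rule] = RULE_BASE) -> List[int]:
    """Return one 0/1 multi-dimensional feature per rule, in rule-base order."""
    return [int(bool(rule(record))) for rule in rules.values()]


def flagged_by_rules(record: Record, rules: Dict[str, Rule] = RULE_BASE) -> bool:
    """A record counts as rule-identified abnormal traffic if any rule fires."""
    return any(rule_features(record, rules))
```

Because the rule base is a plain mapping, rules that experts add in later interaction rounds become new feature columns without any change to the encoding code.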

In the traffic detection method, the single-dimensional data feature extraction step comprises:

a discrete variable encoding step: encoding discrete variables in the traffic log with one-hot encoding;

a continuous variable encoding step: applying interval mapping to continuous variables in the traffic log, mapping values in different intervals to different codes.

In the traffic detection method, the comparison step comprises:

a prediction step: predicting normal and abnormal traffic in a test set with the isolation forest model to obtain a prediction result;

an identification step: comparing the prediction result with the abnormal traffic identified by the initial rules to identify the differing abnormal traffic.
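The identification step amounts to set arithmetic over record identifiers. Here is a minimal sketch. It assumes that each traffic record carries a unique ID, shown here as a hypothetical `str` key, and that both the model and the initial rules output the set of IDs they consider abnormal.

```python
from typing import Dict, Iterable, Set


def compare_flags(model_ids: Iterable[str], rule_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Split record IDs by which detector flagged them as abnormal.

    "model_only" and "rule_only" together form the differing abnormal
    traffic that is handed to experts for review.
    """
    model, rule = set(model_ids), set(rule_ids)
    return {
        "both": model & rule,
        "model_only": model - rule,  # anomalies the rules missed
        "rule_only": rule - model,   # rule hits the model did not confirm
    }
```

For example, if the model flags `a`, `b` and `c` and the rules flag `b` and `d`, then `b` is confirmed by both detectors, and `a`, `c` and `d` make up the differing traffic.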

The invention further provides a traffic detection system based on an interaction strategy, comprising:

a preprocessing module, which obtains the log information returned for each user visit to a touchpoint and preprocesses the traffic logs in that information;

a feature acquisition module, which extracts single-dimensional data features from the preprocessed traffic logs and receives multi-dimensional data features;

a traffic testing module, which obtains a traffic prediction result from the single-dimensional and multi-dimensional data features using a trained isolation forest model;

a comparison module, which compares the traffic prediction result with the abnormal traffic identified by the initial rules to obtain the abnormal traffic.

The above traffic detection system further comprises:

a ranking feedback module, which ranks the single-dimensional and multi-dimensional data features by importance and then feeds back at least one of the most important data features together with the abnormal traffic;

a model improvement module, which analyzes the at least one most important data feature and the abnormal traffic to obtain new multi-dimensional data features, and performs a new round of training and interaction on the isolation forest model with the new multi-dimensional data features and the single-dimensional data features.

The invention further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any of the traffic detection methods described above when executing the computer program.

The invention further provides a storage medium storing a computer program which, when executed by a processor, implements any of the traffic detection methods described above.

The beneficial effects of the invention are as follows:

The invention belongs to prediction and optimization in intelligent marketing technology. It provides a weakly supervised advertising anti-fraud method based on an interaction strategy, which can reduce the cost incurred by rule changes during abnormal traffic detection and improve the interpretability and recognition performance of the abnormal traffic model.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application.

In the drawings:

FIG. 1 is a flowchart of the traffic detection method based on an interaction strategy according to the present invention;

FIG. 2 is a flowchart of the sub-steps of step S2 in FIG. 1;

FIG. 3 is a flowchart of the sub-steps of step S21 in FIG. 2;

FIG. 4 is a flowchart of the sub-steps of step S4 in FIG. 1;

FIG. 5 is a detailed flowchart of the traffic detection method based on an interaction strategy according to the present invention;

FIG. 6 is a schematic structural diagram of the traffic detection system based on an interaction strategy according to the present invention;

FIG. 7 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.

It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by a person of ordinary skill in the art to which this application belongs. Words such as "a," "an," and "the" in this application do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or modules (units) is not limited to the listed steps or units, but may include steps or units that are not listed or that are inherent to such a process, method, product, or device. Terms such as "connected" and "coupled" in this application are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. The term "plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular order.

The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.

Before the embodiments of the present invention are described in detail, the core inventive concept is summarized. It is then described in detail through the following embodiments.

Embodiment 1:

Referring to FIG. 1, FIG. 1 is a flowchart of the traffic detection method based on an interaction strategy. As shown in FIG. 1, the traffic detection method based on an interaction strategy of the present invention comprises:

preprocessing step S1: obtaining the log information returned for each user visit to a touchpoint, and preprocessing the traffic logs in that information;

feature acquisition step S2: extracting single-dimensional data features from the preprocessed traffic logs and receiving multi-dimensional data features;

traffic testing step S3: obtaining a traffic prediction result from the single-dimensional and multi-dimensional data features using a trained isolation forest model;

comparison step S4: comparing the traffic prediction result with the abnormal traffic identified by the initial rules to obtain the abnormal traffic;

ranking feedback step S5: after ranking the single-dimensional and multi-dimensional data features by importance, feeding back at least one of the most important data features together with the abnormal traffic;

model improvement step S6: analyzing the at least one most important data feature and the abnormal traffic to obtain new multi-dimensional data features, and performing a new round of training and interaction on the isolation forest model with the new multi-dimensional data features and the single-dimensional data features.

The preprocessing step S1 comprises parsing feature fields from the traffic log and filling in missing values of the parsed feature fields.

Referring to FIG. 2, FIG. 2 is a flowchart of the feature acquisition step S2. As shown in FIG. 2, the feature acquisition step S2 comprises:

single-dimensional data feature extraction step S21: extracting the single-dimensional data features from the preprocessed traffic logs;

multi-dimensional data feature extraction step S22: extracting the multi-dimensional data features from an expert rule base.

Referring to FIG. 3, FIG. 3 is a flowchart of the single-dimensional data feature extraction step S21. As shown in FIG. 3, the single-dimensional data feature extraction step S21 comprises:

discrete variable encoding step S211: encoding discrete variables in the traffic log with one-hot encoding;

continuous variable encoding step S212: applying interval mapping to continuous variables in the traffic log, mapping values in different intervals to different codes.

Referring to FIG. 4, FIG. 4 is a flowchart of the comparison step S4. As shown in FIG. 4, the comparison step S4 comprises:

prediction step S41: predicting normal and abnormal traffic in a test set with the isolation forest model to obtain a prediction result;

identification step S42: comparing the prediction result with the abnormal traffic identified by the initial rules to identify the differing abnormal traffic.

As shown in FIG. 5, the specific steps are as follows:

Preprocessing step: the log information returned for each user visit to a touchpoint is obtained through an advertising traffic monitoring system, and the traffic logs are preprocessed. Feature fields are parsed from the traffic log and their missing values are filled, with the filling strategy chosen according to the nature of each field. For example, the os field is filled with its mode or with "unk", and the request-receipt seconds field is filled with its mean.
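The per-field filling strategy can be sketched as follows. The description names only mode or "unk" filling for os and mean filling for the request-receipt seconds. The switch between mode and "unk" and the 0.0 fallback for an entirely empty numeric column are assumptions of this sketch, and the field names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List

Record = Dict[str, object]


def fill_missing(records: List[Record], categorical: List[str],
                 numeric: List[str], use_mode: bool = True) -> List[Record]:
    """Return copies of records with None values filled per field.

    Categorical fields (such as os) get the column mode, or the placeholder
    "unk" when use_mode is False or the column has no observed values.
    Numeric fields (such as the request-receipt seconds) get the column mean,
    falling back to 0.0 when the column has no observed values.
    """
    fills: Dict[str, object] = {}
    for name in categorical:
        seen = [r[name] for r in records if r.get(name) is not None]
        fills[name] = Counter(seen).most_common(1)[0][0] if (use_mode and seen) else "unk"
    for name in numeric:
        seen = [r[name] for r in records if r.get(name) is not None]
        fills[name] = mean(seen) if seen else 0.0
    # Only overwrite fields that are missing; observed values are kept as-is.
    return [{**r, **{k: v for k, v in fills.items() if r.get(k) is None}} for r in records]
```

The statistics are computed on the batch that is passed in. In practice they would be computed on training data and then reused for test traffic.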

Feature engineering step: feature engineering plays a very important role in the final performance of the model. This method uses an interaction strategy to enrich the data features step by step. Feature engineering consists of two parts: first, extracting single-dimensional data features after log parsing; second, extracting the multi-dimensional data features returned from expert knowledge.

Single-dimensional feature extraction: the parsed single-dimensional data are encoded as discrete and continuous variables. Discrete variables, such as os, md and region, are one-hot encoded. Continuous variables, such as the request-receipt seconds, use interval mapping, which maps values in different intervals to different codes.
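Here is a minimal sketch of both encodings using only the standard library. The interval edges in the example are hypothetical, because the patent does not say how intervals are chosen; fixed business thresholds and quantiles are both common options. Mapping a category that was not seen during fitting to an all-zero vector is also an assumption.

```python
from bisect import bisect_right
from typing import List, Sequence


def fit_vocab(values: Sequence[str]) -> List[str]:
    """Sorted vocabulary of a discrete field such as os, md or region."""
    return sorted(set(values))


def one_hot(value: str, vocab: List[str]) -> List[int]:
    """One-hot encode value; a value unseen during fitting maps to all zeros."""
    return [int(value == v) for v in vocab]


def interval_code(x: float, edges: Sequence[float]) -> int:
    """Map a continuous value to the index of its interval.

    edges must be sorted ascending. With edges [1, 5, 30]:
    x < 1 -> 0, 1 <= x < 5 -> 1, 5 <= x < 30 -> 2, x >= 30 -> 3.
    """
    return bisect_right(edges, x)
```

For example, `one_hot("ios", fit_vocab(["android", "ios"]))` gives `[0, 1]`, and `interval_code(3.2, [1, 5, 30])` gives `1`.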

Extraction of multi-dimensional features returned from expert knowledge: initially, experts distill expert knowledge from the expert rule base and propose related feature constructions to enrich the inherent features. After the model completes its initial training, experts refine the relevant rules from the top-k feature importance ranking returned by the model, summarize these rules into features using expert knowledge, and feed the features back to enrich the model's feature engineering.

Model training and testing step: in the actual traffic returned by the system, normal traffic outnumbers abnormal traffic by orders of magnitude, so abnormal traffic makes up only a small share of all samples. Abnormal traffic also differs markedly from normal traffic in its feature values, and it is sparsely distributed, far from the high-density groups. The isolation forest algorithm is therefore used as the base model for identifying abnormal traffic. First, n records are randomly sampled from the training data as a subsample and placed in the root node of an isolation tree. Next, a dimension is chosen at random, and a split point p is generated at random between the minimum and maximum values of that dimension in the current node's data. The split point defines a hyperplane that divides the current node's data space into two subspaces: points whose value in that dimension is less than p go to the left branch, and points whose value is greater than p go to the right branch. This is repeated recursively on the left and right branches, creating new nodes until a node holds only one record or the tree reaches a preset height. Finally, the results of all isolation trees are combined: because the splitting is random, the process is repeated many times from scratch and the results of the individual runs are averaged until they converge.
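The isolation forest procedure described above can be sketched in plain Python. This is a minimal illustration and not the applicant's implementation. It follows the original isolation forest algorithm: a random subsample per tree, a random dimension, a uniform cut point between the node's minimum and maximum, a height limit of ⌈log2 ψ⌉, and the score 2^(-E[h]/c(ψ)). The class and function names, the defaults of 100 trees and 256-row subsamples, and the 0.6 decision threshold are assumptions. The description speaks of repeating the random cuts and averaging until convergence; the sketch instead averages path lengths over a fixed number of trees.

```python
import math
import random
from typing import List, Optional, Sequence

Row = Sequence[float]


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


class Node:
    def __init__(self, size: int, dim: int = -1, split: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.size, self.dim, self.split = size, dim, split
        self.left, self.right = left, right


def build_tree(data: List[Row], depth: int, max_depth: int, rng: random.Random) -> Node:
    """Grow one isolation tree by random axis-parallel cuts."""
    if len(data) <= 1 or depth >= max_depth:
        return Node(len(data))
    # Only dimensions with a non-zero range in this node can be cut.
    spans = [(d, min(r[d] for r in data), max(r[d] for r in data))
             for d in range(len(data[0]))]
    spans = [s for s in spans if s[1] < s[2]]
    if not spans:
        return Node(len(data))  # all remaining rows are identical
    dim, lo, hi = rng.choice(spans)
    p = rng.uniform(lo, hi)  # cut point between the min and max of this dimension
    left = [r for r in data if r[dim] < p]
    right = [r for r in data if r[dim] >= p]
    return Node(len(data), dim, p,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Row, node: Node) -> float:
    """Edges from the root to x's leaf, plus c(size) for rows left unsplit there."""
    depth = 0
    while node.left is not None:
        node = node.left if x[node.dim] < node.split else node.right
        depth += 1
    return depth + c_factor(node.size)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, data: List[Row]) -> "IsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two rows")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Row) -> float:
        """Anomaly score in (0, 1); higher means easier to isolate."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c_factor(self.psi))

    def predict(self, data: List[Row], threshold: float = 0.6) -> List[int]:
        """1 = abnormal, 0 = normal; the threshold is an assumed cut-off."""
        return [int(self.score(x) > threshold) for x in data]
```

In production, a library implementation such as scikit-learn's `IsolationForest` would be the more likely choice. The from-scratch version only serves to make each step of the description visible.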

Result comparison and feature importance ranking step: the trained model predicts normal and abnormal traffic in the test set, and its predictions are compared with the abnormal traffic identified by the initial rules to find the traffic on which they differ. The features used by the model for prediction are ranked by importance, and the top-5 features together with the differing traffic are fed back to the experts.
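The description does not say how feature importance is computed, and an isolation forest has no built-in importance measure. The sketch below substitutes one simple heuristic. Each feature is ranked by the gap between its mean over rows the model flags as abnormal and its mean over rows flagged as normal, scaled by the feature's overall standard deviation, and the top k are returned (k = 5 matches the description). The function and parameter names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def rank_features(rows: Sequence[Sequence[float]], preds: Sequence[int],
                  names: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Rank features by |mean(abnormal) - mean(normal)| / overall std.

    This is a stand-in importance measure, not one named in the patent.
    Returns [] when the predictions contain only one class.
    """
    abnormal = [r for r, p in zip(rows, preds) if p == 1]
    normal = [r for r, p in zip(rows, preds) if p == 0]
    if not abnormal or not normal:
        return []
    scored = []
    for j, name in enumerate(names):
        sd = pstdev([r[j] for r in rows]) or 1.0  # guard constant columns
        gap = abs(mean(r[j] for r in abnormal) - mean(r[j] for r in normal)) / sd
        scored.append((name, gap))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]
```

A model-aware alternative would count how often each feature is used in the splits along the short isolation paths of flagged rows. That option needs access to the trees, whereas the heuristic above needs only the predictions.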

Interaction strategy step: using the top-5 features and the differing traffic returned by the result comparison and feature importance ranking step, experts apply their knowledge to analyze the rule characteristics contained in them, add these rules to the expert rule base, and use the newly identified rules to enrich the model's feature engineering. A new round of model training and interaction then follows. This repeats until the model converges or the experts can no longer make a judgment, that is, until the identified abnormal traffic can no longer be described by rules.
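The interaction loop can be sketched as a driver that alternates model training with expert review. Both callbacks are hypothetical. `train_and_diff` stands for retraining the isolation forest with the current rule-derived features and returning the top features plus the differing traffic. `expert_review` stands for the human step that turns those results into new rules. Two further assumptions: "converged" is taken to mean that the set of differing traffic is empty or unchanged, and the number of rounds is capped.

```python
from typing import Callable, List, Optional, Set, Tuple

TrainFn = Callable[[List[str]], Tuple[List[str], Set[str]]]
ReviewFn = Callable[[List[str], Set[str]], List[str]]


def interactive_training(train_and_diff: TrainFn, expert_review: ReviewFn,
                         max_rounds: int = 10) -> Tuple[List[str], List[int]]:
    """Alternate training and expert review until no further progress is made.

    Returns the accumulated rule list and the size of the differing
    traffic observed in each round.
    """
    rules: List[str] = []
    history: List[int] = []
    previous: Optional[Set[str]] = None
    for _ in range(max_rounds):
        top_features, diff = train_and_diff(rules)
        history.append(len(diff))
        if not diff or diff == previous:
            break  # converged: nothing new left to explain
        new_rules = [r for r in expert_review(top_features, diff) if r not in rules]
        if not new_rules:
            break  # experts can no longer describe the remaining traffic by rules
        rules.extend(new_rules)
        previous = diff
    return rules, history
```

The two stopping conditions in the loop correspond to the two termination cases in the description: the model converges, or the remaining abnormal traffic cannot be expressed as rules.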

Embodiment 2:

Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the traffic detection system based on an interaction strategy according to the present invention.

FIG. 6 shows the traffic detection system based on an interaction strategy according to the present invention, which comprises:

Embodiment 3:

Referring to FIG. 7, this embodiment discloses an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.

Specifically, the processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.

The memory 82 may include mass storage for data or instructions. By way of example and not limitation, the memory 82 may include a hard disk drive (HDD), a floppy disk drive, a solid-state drive (SSD), flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 82 may include removable or non-removable (fixed) media, and may be internal or external to the data processing apparatus. In a particular embodiment, the memory 82 is non-volatile memory. In particular embodiments, the memory 82 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these. The RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data output DRAM (EDODRAM), synchronous DRAM (SDRAM), and the like.

The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.

The processor 81 reads and executes the computer program instructions stored in the memory 82 to implement any one of the above-described embodiments of the traffic detection method based on the interaction policy.

In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 7, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.

The communication interface 83 is used to implement communication between the modules, apparatuses, units and/or devices in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.

The bus 80 includes hardware, software, or both, and couples the components of the electronic device to one another. The bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, the bus 80 may include one or more buses. Although specific buses are described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated.

The electronic device can implement the traffic detection method based on an interaction strategy described with reference to FIG. 1 to FIG. 4.

In addition, in combination with the traffic detection method based on an interaction strategy in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium for implementing the method. Computer program instructions are stored on the computer-readable storage medium; when executed by a processor, they implement any of the traffic detection methods based on an interaction strategy in the above embodiments.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.

In summary, the invention provides a weakly supervised advertising anti-fraud method based on an interaction strategy, which can reduce the cost incurred by rule changes during abnormal traffic detection and improve the interpretability and recognition performance of the abnormal traffic model.

The above embodiments express only several implementations of the present application. They are described specifically and in detail but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and all of these fall within its scope of protection. Therefore, the scope of protection of the present invention shall be defined by the appended claims.

Claims

1. A traffic detection method based on an interaction strategy, characterized by comprising the following steps:

2. The traffic detection method of claim 1, further comprising:

3. The traffic detection method of claim 1, wherein the preprocessing step comprises:

4. The traffic detection method of claim 1, wherein the feature acquisition step comprises:

5. The traffic detection method of claim 4, wherein the single-dimensional data feature extraction step comprises:

6. The traffic detection method of claim 1, wherein the comparison step comprises:

7. A traffic detection system based on an interaction strategy, comprising:

8. The traffic detection system of claim 7, further comprising:

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the detection method of any one of claims 1 to 4 when executing the computer program.

10. A storage medium storing a computer program which, when executed by a processor, implements the detection method according to any one of claims 1 to 4.