CN113553499A

CN113553499A - Cheating detection method and system based on marketing fission and electronic equipment

Info

Publication number: CN113553499A
Application number: CN202110694939.2A
Authority: CN
Inventors: 高鹏; 康维鹏; 袁兰; 吴飞; 周伟华; 高峰; 潘晶
Original assignee: Hangzhou Mjoys Big Data Technology Co ltd
Current assignee: Hangzhou Mjoys Big Data Technology Co ltd
Priority date: 2021-06-22
Filing date: 2021-06-22
Publication date: 2021-10-26

Abstract

The application relates to a user cheating detection method based on marketing fission, wherein the method comprises the following steps: acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data; respectively extracting the characteristics of the product data and the user behavior data, extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain a plurality of interval characteristics, and obtaining time sequence combination characteristics according to the time sequence combination interval characteristics; clustering the time-sequence combination features according to the similarity of the features to generate a plurality of clusters, and determining the maximum difference features of the clusters according to the information gain of the features in the clusters; and based on the class cluster, constructing a rule decision tree according to the maximum difference characteristics, determining nodes of the cheating sample data corresponding to the rule decision tree, and judging all objects in the nodes as the cheating data. Through the application, the problem that artificial cheating cannot be detected in the related technology is solved, and the accuracy of detecting the artificial cheating is improved.

Description

Cheating detection method and system based on marketing fission and electronic equipment

Technical Field

The application relates to the field of big data, in particular to a cheating detection method and system based on marketing fission and an electronic device.

Background

With the development of digital marketing, cheating behaviors such as fake traffic, manual order brushing (placing fake orders), and "wool pulling" (abusing promotional bonuses) are attracting increasing attention. Cheating generally falls into two categories: machine cheating and artificial (human) cheating. Machine cheating typically generates mobile traffic through large numbers of test devices or emulators, or fabricates fake traffic by means of crawlers, volume-inflating scripts, and the like; such cheating is cheap to carry out and accounts for a large share of the total.

Artificial cheating creates fake traffic by hiring users, or by using incentives and inducements to get users to inflate advertisement views, clicks, and user registrations, or to answer advertisers' phone calls. Because artificial cheating is highly concealed, the ability to detect it is generally low.

At present, no effective solution has been proposed for the low capability of detecting artificial cheating in the related art.

Disclosure of Invention

The embodiments of the application provide a cheating detection method and system based on marketing fission and an electronic device, so as to at least solve the problem in the related art that the capability of detecting artificial cheating is low.

In a first aspect, an embodiment of the present application provides a cheating detection method based on marketing fission, the method including:

acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data;

respectively extracting the characteristics of the product data and the user behavior data to respectively obtain user characteristics and product characteristics;

extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;

clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;

and constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.

In some embodiments, the clustering according to the similarity of the time-series combination features to generate a plurality of class clusters includes:

initializing a plurality of clustering centers, and acquiring Euclidean distances from the time sequence combination characteristics to the clustering centers;

and sequentially comparing the distances from the time sequence combination features to the clustering centers, and distributing the time sequence combination features to the clustering centers with the nearest Euclidean distance one by one for clustering to generate a plurality of clusters, wherein the time sequence combination features belong to one cluster and only belong to one cluster.

In some embodiments, the constructing a rule decision tree according to the maximum dissimilarity characteristics based on the class clusters comprises:

dividing the cluster into a plurality of sub-nodes by taking the maximum difference characteristics as a judgment condition;

calculating the maximum difference characteristics of the child nodes, and taking the maximum difference characteristics of the child nodes as a judgment condition to split the child nodes until the sample points in the child nodes are within a preset threshold range, or the similarity of the objects in the child nodes reaches a preset range;

and constructing a decision tree based on the cluster, the child nodes and the judgment condition.

In some embodiments, the performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features respectively includes:

counting click access conditions of IP access pages and URLs of each product according to a preset period for the product data, and calculating average access time and access times in the preset period;

and for the user behavior data, counting the click access conditions of each user through the page and the URL according to a preset period, and calculating the average access time and the access times in the preset period.

In some embodiments, after combining the interval features in a Skip-n manner to obtain a plurality of time-series combination features, the method further includes:

vectorizing and representing the time sequence combination features, wherein the vectorizing and representing comprises the following steps:

for numerical data, after determining the segmentation grade according to the median and average scheme, carrying out numerical segmentation one-hot representation;

for character type data, word vector form representation is performed.

In some embodiments, the obtaining product data and user behavior data related to marketing behaviors, and the marking cheating sample data therein includes:

product data and user behavior data related to marketing behaviors are obtained, and automatic sampling and labeling are carried out based on preset simple rules;

and judging and determining the cheating sample data according to a manual inspection signal for the product data and the user behavior data after the automatic sampling and marking.

In some embodiments, the cluster center is determined by calculating the mean of the feature objects in the cluster in each dimension.

In a second aspect, embodiments of the present application provide a marketing fission-based user cheating detection system, the system comprising a data acquisition module, a characterization module, a clustering module, and a decision module, wherein:

the data acquisition module is used for acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data in the product data and the user behavior data;

the characterization module is used for respectively performing feature extraction on the product data and the user behavior data to respectively obtain user features and product features;

the characterization module is further used for extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;

the clustering module is used for clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;

the decision module is used for constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.

In a third aspect, embodiments of the present application provide a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a marketing fission-based user cheating detection method as described in the first aspect above when executing the computer program.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a marketing fission-based user cheating detection method as described in the first aspect above.

Compared with the related art, the marketing fission-based user cheating detection method provided by the embodiments of the application acquires marketing behavior data, samples and labels the cheating behavior data in it, extracts features from the marketing behavior data, and combines the features in time order to obtain time sequence combination features. Clustering is then performed on the time sequence combination features to obtain a plurality of class clusters, the maximum difference features are obtained, and the class clusters are split into a plurality of nodes according to the maximum difference features. Finally, the nodes into which the sampled and labeled cheating sample data fall are determined, and the other data in those nodes is judged to be cheating behavior data as well. This solves the problem in the related art that the capability of detecting artificial cheating is low, enables online artificial cheating to be discovered automatically, and improves the accuracy of detecting artificial cheating.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic illustration of an application environment of a marketing fission-based user cheating detection method according to an embodiment of the present application;

FIG. 2 is a flow diagram of a method of marketing fission-based user cheating detection according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a rule decision tree according to an embodiment of the present application;

FIG. 4 is a block diagram of a structure of a marketing fission-based user cheating detection system according to an embodiment of the present application;

FIG. 5 is an internal structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.

It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.

The marketing fission-based user cheating detection method provided by the application can be applied to the application environment shown in FIG. 1. FIG. 1 is a schematic diagram of an application environment of the marketing fission-based user cheating detection method according to an embodiment of the application. As shown in FIG. 1, a terminal 10 and a server 11 communicate through a network. The user sends access traffic to the server 11 through the terminal 10, and the server 11 may determine, through an internal cheating detection algorithm, whether the access traffic sent by the user includes artificial cheating, where artificial cheating includes, but is not limited to, inflating advertisement views, inflating clicks, registering fake users, and the like. It should be noted that the terminal 10 may be any of various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, and the like, and the server 11 may be an independent server or a server cluster formed by a plurality of servers.

The application provides a marketing fission-based user cheating detection method, fig. 2 is a flowchart of a marketing fission-based user cheating detection method according to an embodiment of the application, and as shown in fig. 2, the flowchart includes the following steps:

step S201, acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data. The product data includes, but is not limited to, the product name, the target audience labels of the promotion (including age group, gender, and education level), and the promotion region. The user behavior data includes, but is not limited to, user ID, user IP address, operation time, activity URL, page click information, page stay information, order quantity information, and the like. Further, the cheating sample data in the product data and the user behavior data is sampled and labeled by combining simple rules with manual review. It should be noted that, in this embodiment, the tracking points of the marketing behavior log are determined first, and the user behavior data is then obtained through those tracking points;

step S202, performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features. Specifically, on the user side, the click access conditions are counted according to a preset period, for example, which pages and URLs each user clicks to access. On the product side, the click access conditions are likewise counted according to the preset period, for example, the click access conditions of each IP on the pages and URLs of the product. Finally, the average access time and the number of accesses within the period are calculated;

step S203, extracting the user features and the product features under a preset time window and a preset recording window to obtain interval features, and recombining the interval features in time order in a Skip-n manner to obtain time sequence combination features. It should be noted that, because the features obtained in step S202 are arranged in time order, they need to be framed by a time window (the preset time window in this embodiment) before being combined. Further, because a large amount of behavior data may fall within a single time window, the data is framed again by a preset recording window to obtain the interval features. Optionally, the time window may be 5 minutes and the recording window may be 10 records. Finally, the interval features are recombined in a Skip-n manner. It should also be noted that a single time sequence feature is rarely enough to identify cheating, so this embodiment combines interval features in a Skip-n manner in order to discover behaviors in the cheating process that are strongly associated across time. The Skip-n manner can skip irrelevant features in the middle of the sequence and generate a new time sequence combination feature. For example, the time sequence combination feature of online-chat fraud may consist of three sequential steps: "1. adding a friend", "2. building an emotional bond over half a month", and "3. borrowing money";

step S204, clustering is performed according to the similarity of the time sequence combination features to generate a plurality of class clusters, and the maximum difference feature of each class cluster is determined according to the information gain of the features in the class cluster. Optionally, a k-means clustering algorithm is used, wherein after clustering is finished each time sequence combination feature belongs to exactly one class cluster, namely the nearest one. It should be noted that, in this embodiment, similarity is measured by the distance from a feature object to the center of a class cluster: the smaller the distance, the higher the similarity between the feature object and the class cluster. In addition, the information gain is the difference between the empirical entropy H(D) of the set D and the empirical conditional entropy H(D | A) of D given the feature A, and is calculated by the following Equation 1:

Equation 1: $g(D, A) = H(D) - H(D \mid A)$

The empirical entropy H(D) is calculated by Equation 2 and the empirical conditional entropy H(D | A) by Equation 3, where D denotes the training data set, A denotes a feature in the training data set, n denotes the number of values x_1, ..., x_n that feature A takes, H(D | A = x_i) denotes the conditional entropy of D when feature A is fixed at x_i, and p_i denotes the probability that feature A takes the value x_i. In Equation 2, p_k denotes the proportion of samples in D that belong to class k;

Equation 2: $H(D) = -\sum_{k} p_k \log p_k$

Equation 3: $H(D \mid A) = \sum_{i=1}^{n} p_i \, H(D \mid A = x_i)$

step S205, based on the class clusters, a rule decision tree is constructed according to the maximum difference features, the nodes of the rule decision tree corresponding to the cheating sample data are determined, and all objects in those nodes are determined to be cheating data. In this step, a class cluster is an impure set before splitting and needs to be split into relatively pure subclasses. The maximum difference feature is the feature that brings the largest information gain to the class cluster, and the information gain indicates how much the uncertainty of information is reduced under a given condition. The maximum difference feature is therefore selected as the splitting condition, and the class cluster is split step by step into a plurality of nodes, yielding purer subclasses. Finally, the specific nodes in the rule decision tree into which the cheating sample data marked in step S201 fall are obtained, and the other objects in those nodes can be determined to be cheating data as well.
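As an illustration of the framing and recombination in step S203, the following minimal Python sketch splits a time-ordered event stream into interval features and then forms Skip-n combinations. The function names `frame_events` and `skip_n_combinations`, the `(timestamp, feature)` layout of an event, and the reading of Skip-n as "ordered combinations in which at most n intermediate items are skipped between neighbours" are assumptions made for this example; the application does not define them in this detail.

```python
from itertools import combinations


def frame_events(events, time_window=300, record_window=10):
    """Split (timestamp_seconds, feature) events into interval features.

    A new interval starts when the time window (default 5 minutes) has
    elapsed or the current interval already holds `record_window` records.
    """
    intervals, current, start = [], [], None
    for ts, feat in sorted(events, key=lambda e: e[0]):
        if current and (ts - start >= time_window or len(current) >= record_window):
            intervals.append(current)
            current = []
        if not current:
            start = ts  # first record of a new interval fixes its start time
        current.append(feat)
    if current:
        intervals.append(current)
    return intervals


def skip_n_combinations(items, k=3, n=1):
    """Ordered k-tuples of items, skipping at most n items between neighbours."""
    result = []
    for idx in combinations(range(len(items)), k):
        if all(b - a - 1 <= n for a, b in zip(idx, idx[1:])):
            result.append(tuple(items[i] for i in idx))
    return result


steps = ["add_friend", "chat", "like", "borrow"]
print(skip_n_combinations(steps, k=3, n=1))
```

With n = 0 only adjacent items are combined; a larger n lets unrelated intermediate behavior be skipped, in the spirit of the "adding a friend, building an emotional bond, borrowing money" example.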

Through steps S201 to S205, this embodiment acquires marketing behavior data, extracts features, and combines them in time order; it then clusters the time sequence combination features with a clustering algorithm to generate a plurality of class clusters, splits the class clusters into a plurality of nodes according to the maximum difference features, and constructs a rule decision tree from the nodes and the decision relationships between them. Finally, the specific nodes in the rule decision tree into which the marked cheating sample data fall are determined, and the other data in those nodes is judged to be cheating data. This solves the problem in the related art that artificial cheating cannot be accurately identified because it is highly concealed: artificial cheating is detected on the basis of a clustering algorithm, and the accuracy of detecting it is improved.
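The information gain used in step S204 to pick the maximum difference feature can be sketched as follows. This is a minimal example, assuming discrete feature values, a base-2 logarithm, and a per-sample class label; the application does not state the logarithm base or which class variable is used, and the function names `entropy` and `information_gain` are chosen for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(D) of a list of class labels (base 2, an assumption)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D | A) for one discrete feature A."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # H(D | A): entropy of each subset D_i, weighted by p_i = |D_i| / |D|
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = ["a", "a", "b", "b"]
print(information_gain([0, 0, 1, 1], labels))  # feature separates the classes
print(information_gain([0, 1, 0, 1], labels))  # feature carries no information
```

The feature with the largest value of `information_gain` would play the role of the maximum difference feature.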

In some embodiments, clustering according to the similarity of the time-series combination features, and generating a plurality of clusters includes:

initializing a plurality of cluster centers, and obtaining the Euclidean distance from each time sequence combination feature to each cluster center. A cluster center is determined by calculating the mean of the feature objects in the class cluster in each dimension, as in Equation 4, and the Euclidean distance is calculated by Equation 5, where dis(X_i, C_j) denotes the distance between sample X_i and cluster center C_j, m denotes the dimension of each vector, X_it denotes the value of sample X_i in the t-th dimension, C_jt denotes the value of cluster center C_j in the t-th dimension, and S_j denotes the set of samples currently assigned to the j-th class cluster;

Equation 4: $C_{jt} = \frac{1}{|S_j|} \sum_{X_i \in S_j} X_{it}$

Equation 5: $dis(X_i, C_j) = \sqrt{\sum_{t=1}^{m} (X_{it} - C_{jt})^2}$

and sequentially comparing the distances from the time sequence combination characteristics to the clustering centers, and distributing the time sequence combination characteristics to the clustering centers with the shortest Euclidean distance one by one for clustering to generate a plurality of clusters, wherein after the clustering is finished, the time sequence combination characteristics belong to one cluster and only belong to one cluster.
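The assignment and center-update procedure described above can be sketched in a few lines of plain Python. This is a minimal k-means example under assumptions: the initial centers are passed in by the caller, the number of iterations is fixed, and the names `euclidean` and `kmeans` are chosen for this example rather than taken from the application.

```python
import math


def euclidean(x, c):
    """Euclidean distance between a sample x and a cluster center c."""
    return math.sqrt(sum((xt - ct) ** 2 for xt, ct in zip(x, c)))


def kmeans(points, centers, iterations=10):
    """Assign each point to its nearest center, then recompute the centers."""
    centers = [list(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        # Each point belongs to exactly one cluster: the nearest center.
        assignment = [
            min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points
        ]
        # New center = per-dimension mean of the points in the cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centers


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
assignment, centers = kmeans(points, centers=[(0, 0), (10, 10)])
print(assignment, centers)
```

A production system would more likely rely on a tested library implementation; the sketch only makes the two alternating steps explicit.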

In some embodiments, constructing the rule decision tree according to the maximum difference features based on the class clusters comprises the following. FIG. 3 is a schematic diagram of a rule decision tree according to an embodiment of the present application. As shown in FIG. 3, first, the root node of a class cluster is split into a plurality of child nodes by using the maximum difference feature as the judgment condition. Second, the maximum difference feature of each child node is calculated, and splitting continues with that feature as the judgment condition until the child nodes become leaf nodes, that is, until the number of sample points in a node is within a preset threshold range or the similarity of the objects in the node reaches a preset range. Optionally, the preset range may be 90%. Finally, the decision tree is constructed from the cluster root node, the child nodes, the leaf nodes, and the judgment conditions between nodes. Because this embodiment splits the class cluster with the maximum difference feature as the judgment condition, the cluster is split into purer subclasses, which improves splitting efficiency and accuracy.
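The recursive splitting and the final judgment can be illustrated with the sketch below. It rests on several assumptions that the application leaves open: each object is a dict with hypothetical keys `id`, `features` (discrete values), and `label` (the class variable against which information gain is measured); the "similarity" stopping criterion is read as the share of the majority label in a node; and `max_size` and `purity` stand in for the preset threshold and the 90% range.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, feature):
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r["features"][feature], []).append(r["label"])
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([r["label"] for r in rows]) - cond


def split(rows, features, max_size=2, purity=0.9):
    """Return the leaf nodes (lists of rows) reached by max-gain splitting."""
    top_share = Counter(r["label"] for r in rows).most_common(1)[0][1] / len(rows)
    if len(rows) <= max_size or top_share >= purity or not features:
        return [rows]
    best = max(features, key=lambda f: info_gain(rows, f))
    if info_gain(rows, best) <= 0:
        return [rows]  # no feature reduces uncertainty any further
    children = {}
    for r in rows:
        children.setdefault(r["features"][best], []).append(r)
    remaining = [f for f in features if f != best]
    leaves = []
    for child in children.values():
        leaves.extend(split(child, remaining, max_size, purity))
    return leaves


def flag_cheating(leaves, seed_ids):
    """Every object sharing a leaf with a marked cheating sample is flagged."""
    flagged = set()
    for leaf in leaves:
        ids = {r["id"] for r in leaf}
        if ids & seed_ids:
            flagged |= ids
    return flagged


rows = [
    {"id": 1, "features": {"device": "emu", "hour": "night"}, "label": "x"},
    {"id": 2, "features": {"device": "emu", "hour": "night"}, "label": "x"},
    {"id": 3, "features": {"device": "emu", "hour": "day"}, "label": "x"},
    {"id": 4, "features": {"device": "phone", "hour": "day"}, "label": "y"},
    {"id": 5, "features": {"device": "phone", "hour": "night"}, "label": "y"},
    {"id": 6, "features": {"device": "phone", "hour": "day"}, "label": "y"},
]
leaves = split(rows, ["device", "hour"])
print(flag_cheating(leaves, seed_ids={2}))
```

In this toy data the `device` feature has the largest gain, so the cluster splits into two pure leaves, and marking object 2 as a cheating sample flags objects 1 and 3 as well.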

In some embodiments, the performing feature extraction on the product data and the user behavior data respectively, and the obtaining the user feature and the product feature respectively includes: counting click access conditions of IP access pages and URLs of products according to a preset period for product data, and calculating average access time and access times in the preset period; and for the user behavior data, counting the click access conditions of each user through the page and the URL according to a preset period, and calculating the average access time and the access times in the preset period. Wherein the preset period may be minutes, hours, days, etc.
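For the user side, the per-period statistics described above could be computed as in the following sketch. The record layout `(user_id, timestamp_seconds, url, dwell_seconds)`, the use of dwell time as the "access time", and the function name `user_period_features` are assumptions made for illustration; the product side would group by product and IP in the same way.

```python
from collections import defaultdict


def user_period_features(records, period_seconds=3600):
    """Aggregate click records per (user, period) bucket."""
    buckets = defaultdict(list)
    for user_id, ts, url, dwell in records:
        buckets[(user_id, ts // period_seconds)].append((url, dwell))
    features = {}
    for key, visits in buckets.items():
        features[key] = {
            "visit_count": len(visits),                              # number of accesses
            "avg_dwell": sum(d for _, d in visits) / len(visits),   # average access time
            "distinct_urls": len({u for u, _ in visits}),           # pages/URLs clicked
        }
    return features


records = [
    ("u1", 10, "/a", 4.0),
    ("u1", 20, "/b", 6.0),
    ("u1", 3700, "/a", 2.0),
]
print(user_period_features(records))
```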

In some embodiments, after the interval features are combined in a Skip-n manner to obtain a plurality of time sequence combination features, the time sequence combination features also need to be vectorized. For numerical data, the segmentation levels are determined according to a median and average scheme, and the values are then represented as segment-wise one-hot vectors. For character data, a word vector representation is used. It should be noted that one-hot representation and word vector representation are conventional means known to those skilled in the art and are not described in detail in this embodiment.
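One possible reading of the "median and average scheme" for numerical data is sketched below: the median and the mean of a feature serve as cut points, and each value is one-hot encoded by the segment it falls into. This interpretation, and the function name `segment_one_hot`, are assumptions; the application does not spell out how the segmentation levels are derived. Word vectors for character data are not covered here.

```python
import statistics


def segment_one_hot(values):
    """One-hot encode values by segments cut at the median and the mean."""
    cuts = sorted({statistics.median(values), statistics.mean(values)})
    vectors = []
    for v in values:
        level = sum(v > c for c in cuts)  # index of the segment containing v
        vec = [0] * (len(cuts) + 1)
        vec[level] = 1
        vectors.append(vec)
    return cuts, vectors


cuts, vectors = segment_one_hot([1, 2, 3, 10])
print(cuts, vectors)
```

When the median and the mean coincide, the sketch falls back to a single cut point and two segments.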

In some embodiments, obtaining product data and user behavior data related to marketing behaviors and marking the cheating sample data therein comprises: after the product data and user behavior data related to marketing behaviors are acquired, automatic sampling and labeling are performed based on preset simple rules, for example, preliminary sampling and labeling according to the access frequency of a single user per minute, the cumulative access frequency within a single hour, and the like. Further, because automatic sampling and labeling is uncertain, the automatically sampled and labeled product data and user behavior data are then judged according to manual review information, and the cheating sample data is output. Combining automatic labeling with manual review reduces repetitive manual work, which improves the efficiency of labeling cheating sample data, and also improves labeling accuracy.
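The two-stage labeling can be illustrated as follows. The thresholds `per_minute_limit` and `per_hour_limit`, the `(user_id, timestamp_seconds)` log layout, and the function names are hypothetical values chosen for this sketch; the application only says that simple frequency rules are applied first and a manual review signal decides afterwards.

```python
from collections import Counter


def auto_label(access_log, per_minute_limit=30, per_hour_limit=300):
    """Stage 1: rule-based pre-labeling by access frequency."""
    per_minute = Counter((u, ts // 60) for u, ts in access_log)
    per_hour = Counter((u, ts // 3600) for u, ts in access_log)
    suspects = {u for (u, _), n in per_minute.items() if n > per_minute_limit}
    suspects |= {u for (u, _), n in per_hour.items() if n > per_hour_limit}
    return suspects


def confirm(suspects, review_decisions):
    """Stage 2: keep only suspects that a human reviewer confirmed."""
    return {u for u in suspects if review_decisions.get(u) is True}


log = [("bot", t) for t in range(40)] + [("alice", 0), ("alice", 500)]
suspects = auto_label(log)
print(confirm(suspects, {"bot": True}))
```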

It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.

The present embodiment further provides a marketing fission-based user cheating detection system, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

FIG. 4 is a block diagram of a structure of a marketing fission-based user cheating detection system according to an embodiment of the present application. As shown in FIG. 4, the system comprises a data acquisition module 41, a characterization module 42, a clustering module 43, and a decision module 44, wherein:

the data acquisition module 41 is configured to acquire product data and user behavior data related to marketing behaviors and mark cheating sample data therein;

the characterization module 42 is configured to perform feature extraction on the product data and the user behavior data respectively to obtain a user feature and a product feature respectively;

the characterization module 42 is further configured to extract user features and product features under a preset time window and a preset recording window to obtain interval features, and recombine the interval features according to a time sequence in a Skip-n manner to obtain time sequence combination features;

the clustering module 43 is configured to perform clustering according to the similarity of the time sequence combination features to generate a plurality of clusters, and determine maximum difference features according to information gains of the features in the clusters;

the decision module 44 is configured to construct a rule decision tree according to the maximum difference features based on the class clusters, determine nodes of the rule decision tree corresponding to the cheating sample data, and determine all objects in the nodes as the cheating data.

The above modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented by hardware, the modules may be located in the same processor, or may be distributed across different processors in any combination.
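Taking the decision module 44 as an example of a module realized as a program module, the following is a minimal sketch of its final step only: every object that falls in the same rule decision tree node as a marked cheating sample is judged to be cheating data. How the tree is built is outside the sketch; the mapping `node_of`, the function name, and the identifiers `u1` to `u4` and `n1` to `n3` are hypothetical.

```python
from collections import defaultdict


def flag_cheating_nodes(node_of, marked_cheaters):
    """Flag every object that shares a tree node with a marked cheating sample.

    node_of: dict mapping object id -> id of the rule decision tree node
        the object falls in (assumed to be produced by an earlier step).
    marked_cheaters: set of object ids marked as cheating sample data.
    """
    # Group objects by the node they fall in.
    members = defaultdict(set)
    for obj, node in node_of.items():
        members[node].add(obj)

    # Nodes that contain at least one marked cheating sample.
    cheating_nodes = {node_of[obj] for obj in marked_cheaters if obj in node_of}

    # All objects in those nodes are judged to be cheating data.
    flagged = set()
    for node in cheating_nodes:
        flagged |= members[node]
    return flagged


# Hypothetical example: u1 is a marked sample and shares node n1 with u2.
node_of = {"u1": "n1", "u2": "n1", "u3": "n2", "u4": "n3"}
flagged = flag_cheating_nodes(node_of, {"u1"})
```

Here `u2` is flagged although it was never marked, because it lands in the same node as the marked sample `u1`; `u3` and `u4` are left untouched.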

In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a marketing fission-based cheating detection method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.

In one embodiment, an electronic device is provided, which may be a server. FIG. 5 is a schematic diagram of the internal structure of the electronic device according to an embodiment of the present application. As shown in FIG. 5, the electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor provides computing and control capabilities; the network interface is used for communicating with an external terminal through a network connection; the internal memory provides an environment for running the operating system and the computer program; the computer program, when executed by the processor, implements a marketing fission-based cheating detection method; and the database is used for storing data.

Those skilled in the art will appreciate that the structure shown in FIG. 5 is a block diagram of only the portion of the structure relevant to the present application, and does not limit the electronic device to which the present application is applied; a particular electronic device may include more or fewer components than those shown in the drawing, may combine certain components, or may have a different arrangement of components.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The above-mentioned embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A marketing fission based cheating detection method, the method comprising:

2. The method of claim 1, wherein clustering based on the similarity of the time-series combination features to generate a plurality of clusters comprises:

3. The method of claim 1, wherein constructing a rule decision tree based on the cluster of classes according to the maximum dissimilarity features comprises:

4. The method of claim 1, wherein the performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features respectively comprises:

5. The method of claim 1, wherein after combining the interval features in a Skip-n manner to obtain a plurality of time-series combined features, the method further comprises:

for character-type data, representing the data in word vector form.

6. The method of claim 1, wherein obtaining product data and user behavior data related to marketing activities, and wherein tagging cheating sample data therein comprises:

7. The method according to claim 1, wherein the cluster center is determined by calculating the mean of the feature objects in the cluster in each dimension.

8. A marketing fission-based user cheating detection system, the system comprising: a data acquisition module, a characterization module, a clustering module and a decision module, wherein the data acquisition module is used for acquiring data;

9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the marketing fission-based user cheating detection method of any one of claims 1-7.

10. A computer-readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the marketing fission-based user cheating detection method according to any one of claims 1 to 7.