CN113553499A - Cheating detection method and system based on marketing fission and electronic equipment - Google Patents
- Publication number
- CN113553499A (application CN202110694939.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- cheating
- features
- user
- product
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Abstract
The application relates to a user cheating detection method based on marketing fission. The method comprises the following steps: acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data; performing feature extraction on the product data and the user behavior data respectively, extracting the user features and the product features under a preset time window and a preset recording window to obtain a plurality of interval features, and combining the interval features in time order to obtain time sequence combination features; clustering the time sequence combination features according to feature similarity to generate a plurality of class clusters, and determining the maximum difference features of the class clusters according to the information gain of the features in the class clusters; and, based on the class clusters, constructing a rule decision tree according to the maximum difference features, determining the nodes of the rule decision tree in which the cheating sample data fall, and judging all objects in those nodes to be cheating data. The application addresses the problem in the related art that artificial cheating cannot be detected, and improves the accuracy of detecting artificial cheating.
Description
Technical Field
The application relates to the field of big data, in particular to a cheating detection method and system based on marketing fission and an electronic device.
Background
With the development of digital marketing, cheating behaviors such as fake traffic, manual order brushing (placing fake orders), and "wool pulling" (abusing promotions to collect rewards) are attracting more and more attention. Cheating behaviors generally fall into two categories: machine cheating and artificial cheating. Machine cheating generally involves sending mobile traffic through a large number of test devices or simulators, or producing fake traffic through crawler technology, volume-brushing scripts, and the like. This kind of cheating is cheap and accounts for a large share of cheating behavior.
Artificial cheating is the creation of fake traffic by hiring users, or by using incentives and inducements to have users inflate advertisement views, clicks, and user registrations, or answer advertisers' phone calls. Because artificial cheating is highly concealed, the ability to detect it is generally low.
At present, the related art offers no effective solution to the problem of low capability of detecting artificial cheating.
Disclosure of Invention
The embodiments of the application provide a cheating detection method and system based on marketing fission, and an electronic device, which at least solve the problem in the related art that the capability of detecting artificial cheating is low.
In a first aspect, an embodiment of the present application provides a cheating detection method based on marketing fission, the method including:
acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data;
respectively extracting the characteristics of the product data and the user behavior data to respectively obtain user characteristics and product characteristics;
extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;
clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;
and constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.
In some embodiments, the clustering according to the similarity of the time-series combination features to generate a plurality of class clusters includes:
initializing a plurality of clustering centers, and acquiring Euclidean distances from the time sequence combination characteristics to the clustering centers;
and sequentially comparing the distances from the time sequence combination features to the clustering centers, and distributing the time sequence combination features to the clustering centers with the nearest Euclidean distance one by one for clustering to generate a plurality of clusters, wherein the time sequence combination features belong to one cluster and only belong to one cluster.
In some embodiments, the constructing a rule decision tree according to the maximum dissimilarity characteristics based on the class clusters comprises:
dividing the cluster into a plurality of sub-nodes by taking the maximum difference characteristics as a judgment condition;
calculating the maximum difference characteristics of the child nodes, and taking the maximum difference characteristics of the child nodes as a judgment condition to split the child nodes until the sample points in the child nodes are within a preset threshold range, or the similarity of the objects in the child nodes reaches a preset range;
and constructing a decision tree based on the cluster, the child nodes and the judgment condition.
In some embodiments, the performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features respectively includes:
for the product data, counting, per preset period, the clicks and visits made by each IP on the pages and URLs of each product, and calculating the average access time and the number of accesses in the preset period;
and for the user behavior data, counting, per preset period, the pages and URLs that each user clicked to access, and calculating the average access time and the number of accesses in the preset period.
In some embodiments, after combining the interval features in a Skip-n manner to obtain a plurality of time-series combination features, the method further includes:
vectorizing and representing the time sequence combination features, wherein the vectorizing and representing comprises the following steps:
for numerical data, determining the segmentation levels according to the median and the mean, and then representing each value as a one-hot vector over the numerical segments;
for character type data, representing the data in word vector form.
In some embodiments, the obtaining product data and user behavior data related to marketing behaviors, and the marking cheating sample data therein includes:
product data and user behavior data related to marketing behaviors are obtained, and automatic sampling and labeling are carried out based on preset simple rules;
and judging and determining the cheating sample data according to a manual inspection signal for the product data and the user behavior data after the automatic sampling and marking.
In some embodiments, the cluster center is determined by calculating the mean of the feature objects in the cluster in each dimension.
In a second aspect, embodiments of the present application provide a marketing fission-based user cheating detection system, the system comprising a data acquisition module, a characterization module, a clustering module and a decision module, wherein:
the data acquisition module is used for acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data in the product data and the user behavior data;
the characterization module is used for respectively performing feature extraction on the product data and the user behavior data to respectively obtain user features and product features;
the characterization module is further used for extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;
the clustering module is used for clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;
the decision module is used for constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.
In a third aspect, embodiments of the present application provide a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a marketing fission-based user cheating detection method as described in the first aspect above when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a marketing fission-based user cheating detection method as described in the first aspect above.
Compared with the related art, the marketing fission-based user cheating detection method provided by the embodiments of the application acquires marketing behavior data, samples and labels the cheating behavior data, extracts features from the marketing behavior data, and combines the features in time order to obtain time sequence combination features. Clustering is then performed on the time sequence combination features to obtain a plurality of class clusters, the maximum difference features are obtained, and the class clusters are split into a plurality of nodes according to the maximum difference features. Finally, the nodes in which the sampled and labeled cheating sample data fall are determined, and the other data in those nodes are judged to be cheating behavior data. This solves the problem in the related art that the capability of detecting artificial cheating is low, enables online artificial cheating to be discovered automatically, and improves the accuracy of detecting artificial cheating.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic illustration of an application environment of a marketing fission-based user cheating detection method according to an embodiment of the present application;
FIG. 2 is a flow diagram of a method of marketing fission-based user cheating detection according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a rule decision tree according to an embodiment of the present application;
FIG. 4 is a block diagram of a structure of a marketing fission-based user cheating detection system according to an embodiment of the present application;
FIG. 5 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The marketing fission-based user cheating detection method provided by the application can be applied to the application environment shown in FIG. 1. FIG. 1 is a schematic diagram of an application environment of the marketing fission-based user cheating detection method according to an embodiment of the application. As shown in FIG. 1, a terminal 10 and a server 11 communicate through a network. The user sends access traffic to the server 11 through the terminal 10, and the server 11 may determine, through an internal cheating detection algorithm, whether the access traffic sent by the user includes artificial cheating behavior, where artificial cheating behavior includes, but is not limited to, advertisement brushing, click brushing, false user registration, and the like. It should be noted that the terminal 10 may be any of various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, and the like, and the server 11 may be an independent server or a server cluster formed by a plurality of servers.
The application provides a marketing fission-based user cheating detection method. FIG. 2 is a flowchart of the marketing fission-based user cheating detection method according to an embodiment of the application. As shown in FIG. 2, the flow includes the following steps:
step S201, acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data. The product data includes, but is not limited to, the product name, the target audience labels for the promotion (including age group, gender and education level) and the promotion region; the user behavior data includes, but is not limited to, user ID, user IP address, operation time, activity URL, page click information, page stay information, order quantity information, and the like. The cheating sample data in the product data and the user behavior data are then sampled and labeled by combining simple rules with manual review. It should be noted that, in this embodiment, the marketing behavior log tracking points (the points at which events are logged) are determined first, and the user behavior data is then obtained through those tracking points;
step S202, performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features. Specifically, on the user side, click access is counted per preset period, for example which pages and URLs each user clicked to access; on the product side, click access is likewise counted per preset period, for example the clicks and visits made by each IP on the product's pages and URLs. Finally, the average access time and the number of accesses within the period are calculated;
step S203, extracting the user features and the product features under a preset time window and a preset recording window to obtain interval features, and recombining the interval features in time order in a Skip-n manner to obtain time sequence combination features. It should be noted that, because the features obtained in step S202 are arranged in time order, they need to be framed by a time window (the preset time window in this embodiment) before they are combined. Because a large amount of behavior data may fall within a single time window, the data is further framed by a preset recording window to obtain the interval features. Optionally, the time window may be 5 minutes and the recording window may be 10 records. Finally, the interval features are recombined in a Skip-n manner. Because a single time sequence feature is rarely enough to identify cheating behavior, this embodiment combines the interval features in a Skip-n manner, which can reveal strongly associated earlier and later behaviors in the cheating process: the Skip-n manner skips irrelevant features in the middle of the sequence and generates a new time sequence combination feature from the remaining ones. For example, the time sequence combination feature of online chat fraud may consist of three sequential steps: "1. adding a friend", "2. building an emotional connection over half a month" and "3. borrowing money";
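The framing and Skip-n recombination described in step S203 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names `frame_events` and `skip_n_combinations`, the `(timestamp, action)` event layout, and the reading of Skip-n as "ordered combinations that skip at most n items between consecutive picks" are all hypothetical.

```python
from itertools import combinations


def frame_events(events, time_window=300, record_window=10):
    """Split (timestamp_seconds, action) events into interval features.

    Assumption: a new interval starts when the time window has elapsed
    or when the current interval already holds `record_window` records.
    """
    intervals = []
    current = []
    window_start = None
    for ts, action in sorted(events):
        if (window_start is None
                or ts - window_start >= time_window
                or len(current) >= record_window):
            if current:
                intervals.append(current)
            current = []
            window_start = ts
        current.append(action)
    if current:
        intervals.append(current)
    return intervals


def skip_n_combinations(items, length=3, max_skip=1):
    """Ordered combinations of `length` items from a time-ordered list,
    skipping at most `max_skip` items between consecutive picks."""
    result = []
    for idx in combinations(range(len(items)), length):
        gaps_ok = all(b - a - 1 <= max_skip for a, b in zip(idx, idx[1:]))
        if gaps_ok:
            result.append(tuple(items[i] for i in idx))
    return result


if __name__ == "__main__":
    steps = ["add_friend", "browse", "chat", "borrow_money"]
    for combo in skip_n_combinations(steps, length=3, max_skip=1):
        print(combo)
```

With `max_skip=1`, the combination that jumps over the unrelated `browse` step is kept alongside the contiguous ones, which is the behavior the step attributes to Skip-n.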
step S204, clustering according to the similarity of the time sequence combination features to generate a plurality of class clusters, and determining the maximum difference features of the class clusters according to the information gain of the features in the class clusters. Optionally, a k-means clustering algorithm is used for clustering; after clustering is finished, each time sequence combination feature belongs to exactly one class cluster, namely the nearest one. It should be noted that, in this embodiment, similarity is reflected by the distance from a feature object to the center of a class cluster: the smaller the distance, the higher the similarity between the feature object and the class cluster. In addition, the information gain is the difference between the empirical entropy $H(D)$ of the set $D$ and the empirical conditional entropy $H(D \mid A)$ of $D$ given the feature $A$, and is calculated by formula 1:
Formula 1: $g(D, A) = H(D) - H(D \mid A)$
The empirical entropy $H(D)$ is calculated by formula 2, and the empirical conditional entropy $H(D \mid A)$ is calculated by formula 3, where $D$ denotes the training data set, $A$ denotes a feature in the training data set, $H(D \mid A)$ denotes the empirical conditional entropy of the data set $D$ given the feature $A$, $n$ denotes the number of values $x_i$ taken by the feature $A$ in the data set $D$, $H(D \mid A = x_i)$ denotes the conditional entropy when the feature $A$ is fixed at $x_i$, and $p_i$ denotes the probability that the feature $A$ takes the value $x_i$;
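Formulas 2 and 3 do not appear in this text. Under the usual definitions of empirical entropy and empirical conditional entropy, which agree with the symbols described above, they can be written as shown here. The class subsets $D_k$, the class count $K$ and the base-2 logarithm are assumptions introduced for illustration and are not taken from the source.

```latex
% Formula 2 (assumed standard form): empirical entropy of D over K classes D_k
H(D) = -\sum_{k=1}^{K} \frac{|D_k|}{|D|} \log_2 \frac{|D_k|}{|D|}

% Formula 3 (assumed standard form): empirical conditional entropy of D given A
H(D \mid A) = \sum_{i=1}^{n} p_i \, H(D \mid A = x_i)

% Formula 1, restated with the same symbols
g(D, A) = H(D) - H(D \mid A)
```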
step S205, based on the class clusters, constructing a rule decision tree according to the maximum difference features, determining the nodes of the rule decision tree in which the cheating sample data fall, and judging all objects in those nodes to be cheating data. Before splitting, a class cluster is an impure set that needs to be split into relatively pure subclasses. The maximum difference feature is the feature that brings the largest information gain to the class cluster, and the information gain indicates how much the uncertainty of the information is reduced under a given condition. The maximum difference feature is therefore selected as the splitting condition, and the class cluster is split step by step into a plurality of nodes, yielding purer subclasses. Finally, the specific nodes of the rule decision tree in which the cheating sample data marked in step S201 fall are obtained, and the other objects in those nodes are determined to be cheating data as well.
Through steps S201 to S205, this embodiment acquires marketing behavior data, extracts features and combines them in time order, clusters the time sequence combination features with a clustering algorithm to generate a plurality of class clusters, splits the class clusters into a plurality of nodes according to the maximum difference features, and constructs a rule decision tree from the nodes and the decision relationships between them. Finally, the specific nodes of the rule decision tree in which the marked cheating sample data fall are determined, and the other data in those nodes are judged to be cheating data. This solves the problem in the related art that artificial cheating behavior cannot be accurately identified because it is highly concealed; artificial cheating behavior is detected based on the clustering algorithm, and the accuracy of detecting it is improved.
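Selecting the maximum difference feature by information gain can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout of each row, and the choice of a categorical label per object as the target of the entropy are assumptions, since the text does not state which label the entropy is computed over.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Empirical entropy (base 2) of a list of categorical labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """g(D, A) = H(D) - H(D | A) for one categorical feature."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def max_difference_feature(rows, labels, features):
    """Feature with the largest information gain (hypothetical helper)."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


if __name__ == "__main__":
    rows = [
        {"night": 1, "mobile": 0},
        {"night": 1, "mobile": 1},
        {"night": 0, "mobile": 0},
        {"night": 0, "mobile": 1},
    ]
    labels = ["a", "a", "b", "b"]
    print(max_difference_feature(rows, labels, ["night", "mobile"]))
```

In this toy data the `night` feature separates the two labels completely, so it carries all of the information gain, while `mobile` carries none.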
In some embodiments, clustering according to the similarity of the time-series combination features, and generating a plurality of clusters includes:
initializing a plurality of clustering centers, and acquiring the Euclidean distances from the time sequence combination features to the clustering centers. A cluster center is determined by calculating the mean, in each dimension, of the feature objects in the class cluster (formula 4), and the Euclidean distance is calculated by formula 5, $\mathrm{dis}(X_i, C_j) = \sqrt{\sum_{t=1}^{m} (X_{it} - C_{jt})^2}$, where $\mathrm{dis}(X_i, C_j)$ denotes the distance between the sample $X_i$ and the cluster center $C_j$, $m$ denotes the dimension of each vector, $X_{it}$ denotes the value of the sample $X_i$ in the $t$-th dimension, and $C_{jt}$ denotes the value of the cluster center $C_j$ in the $t$-th dimension;
and sequentially comparing the distances from the time sequence combination characteristics to the clustering centers, and distributing the time sequence combination characteristics to the clustering centers with the shortest Euclidean distance one by one for clustering to generate a plurality of clusters, wherein after the clustering is finished, the time sequence combination characteristics belong to one cluster and only belong to one cluster.
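The assignment and center update steps can be sketched with a small k-means loop. This is a minimal sketch, assuming the feature vectors are tuples of numbers; initializing the centers by sampling from the data, the fixed iteration count and the `seed` parameter are assumptions, since the text only says that the centers are initialized.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20, seed=0):
    """Assign each point to its nearest center, then move each center to
    the per-dimension mean of its members, for a fixed number of rounds."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins exactly one cluster, the nearest.
        assignment = [
            min(range(k), key=lambda j: dist(p, centers[j])) for p in points
        ]
        # Update step: per-dimension mean of the members of each cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans(pts, k=2)
    print(labels, centers)
```

A production system would more likely rely on an established clustering library; the loop is written out here only to make the two steps visible.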
In some embodiments, constructing the rule decision tree according to the maximum difference features based on the class clusters proceeds as follows. FIG. 3 is a schematic diagram of a rule decision tree according to an embodiment of the present application. As shown in FIG. 3, first, the root node of a class cluster is split into a plurality of child nodes with the maximum difference feature as the judgment condition. Second, the maximum difference feature of each child node is calculated, and splitting continues with that feature as the judgment condition until the child nodes are split into leaf nodes, in which the number of sample points is within a preset threshold range or the similarity of the objects reaches a preset range. Optionally, the preset range may be 90%. Finally, a decision tree is constructed based on the class cluster root node, the child nodes, the leaf nodes and the judgment conditions between the nodes. By splitting the class cluster with the maximum difference feature as the judgment condition, this embodiment splits the class cluster into purer subclasses and improves the efficiency and accuracy of splitting.
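The recursive splitting and the final judgment can be sketched as follows. This is a sketch under several assumptions rather than the method of the application: each object is a dictionary with hypothetical keys `id`, `features` and `label`; the "similarity of the objects" in a node is read as the share of its most common label; and `min_size` and `purity` stand in for the preset threshold and the optional 90% range.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(objects, feature):
    groups = {}
    for o in objects:
        groups.setdefault(o["features"][feature], []).append(o["label"])
    cond = sum(len(g) / len(objects) * entropy(g) for g in groups.values())
    return entropy([o["label"] for o in objects]) - cond


def build_tree(objects, features, min_size=2, purity=0.9):
    """Split on the feature with the largest information gain until a node
    is small enough, pure enough, or cannot be split any further."""
    labels = [o["label"] for o in objects]
    top_share = Counter(labels).most_common(1)[0][1] / len(objects)
    if len(objects) <= min_size or top_share >= purity or not features:
        return {"leaf": True, "objects": objects}
    best = max(features, key=lambda f: info_gain(objects, f))
    branches = {}
    for o in objects:
        branches.setdefault(o["features"][best], []).append(o)
    if len(branches) == 1:  # the best feature does not separate anything
        return {"leaf": True, "objects": objects}
    rest = [f for f in features if f != best]
    children = {v: build_tree(objs, rest, min_size, purity)
                for v, objs in branches.items()}
    return {"leaf": False, "feature": best, "children": children}


def flag_cheating(tree, seed_ids):
    """IDs of all objects in leaves that contain a marked cheating sample."""
    if tree["leaf"]:
        ids = {o["id"] for o in tree["objects"]}
        return ids if ids & seed_ids else set()
    flagged = set()
    for child in tree["children"].values():
        flagged |= flag_cheating(child, seed_ids)
    return flagged
```

For example, if object 1 is a marked cheating sample and lands in a leaf together with objects 2 and 6, `flag_cheating(tree, {1})` returns all three IDs, which mirrors the rule that every object in such a node is judged to be cheating data.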
In some embodiments, performing feature extraction on the product data and the user behavior data respectively to obtain the user features and the product features comprises: for the product data, counting, according to a preset period, the click accesses by IP to the pages and URLs of each product, and calculating the average access time and the number of accesses in the preset period; and for the user behavior data, counting, according to a preset period, each user's click accesses to pages and URLs, and calculating the average access time and the number of accesses in the preset period. The preset period may be minutes, hours, days, etc.
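The per-period statistics can be illustrated with a short sketch. It assumes each access record is a tuple of a key (a product or a user), a timestamp in seconds, and an access duration in seconds, and it reads "average access time" as the mean duration; the record layout and the name `period_features` are hypothetical.

```python
from collections import defaultdict


def period_features(records, period_seconds):
    """Count accesses and average their duration per key and per period.

    records: iterable of (key, timestamp_seconds, duration_seconds).
    Returns {(key, period_index): {"access_count": ..., "avg_access_time": ...}}.
    """
    buckets = defaultdict(list)
    for key, timestamp, duration in records:
        buckets[(key, int(timestamp // period_seconds))].append(duration)
    return {
        bucket: {
            "access_count": len(durations),
            "avg_access_time": sum(durations) / len(durations),
        }
        for bucket, durations in buckets.items()
    }
```

Passing `period_seconds=60`, `3600` or `86400` corresponds to the minute, hour and day periods mentioned above.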
In some embodiments, after the interval features are combined in a Skip-n manner to obtain a plurality of time sequence combination features, the time sequence combination features also need to be vectorized. For numerical data, segmentation levels are determined according to a median and average scheme, and a segmented one-hot representation of the values is then produced. Character data is represented as word vectors. It should be noted that one-hot representation and word vector representation are conventional means known to those skilled in the art and are not described in detail in this embodiment.
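The numerical part of the vectorization might look like the sketch below. The text does not define the "median and average scheme", so this example assumes that the median and the mean of the observed values serve as the two cut points, giving up to three segments; the names `segment_edges` and `one_hot_segment` are hypothetical.

```python
import statistics


def segment_edges(values):
    """Cut points taken from the median and the mean (assumed scheme).

    If the two coincide, only one cut point (two segments) remains.
    """
    return sorted({statistics.median(values), statistics.mean(values)})


def one_hot_segment(value, edges):
    """One-hot vector marking the segment that the value falls into."""
    index = sum(value > edge for edge in edges)  # number of cut points exceeded
    vector = [0] * (len(edges) + 1)
    vector[index] = 1
    return vector
```

With the values `[1, 2, 3, 10]`, the cut points are 2.5 (median) and 4 (mean), so 1 falls in the first segment, 3 in the second and 10 in the third.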
In some embodiments, acquiring product data and user behavior data related to marketing behaviors and marking the cheating sample data therein comprises: after the product data and user behavior data related to marketing activities of the marketing network are acquired, automatic sampling and labeling are performed based on preset simple rules; for example, preliminary sampling and labeling are performed according to the access frequency of a single user per minute, the accumulated access frequency within a single hour, and the like. Furthermore, because automatic sampling and labeling is uncertain, the automatically sampled and labeled product data and user behavior data are judged according to manual review information, and the cheating sample data are output. Combining automatic labeling with manual review reduces repetitive work for personnel, which improves both the efficiency and the accuracy of labeling cheating sample data.
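The two-stage labeling can be sketched as a rule-based pre-filter followed by a manual confirmation step. The thresholds, the event layout `(user_id, timestamp_seconds)` and the names `auto_flag` and `confirm` are assumptions made for illustration only.

```python
from collections import Counter


def auto_flag(events, per_minute_limit, per_hour_limit):
    """Flag users whose access count exceeds a per-minute or per-hour limit.

    events: iterable of (user_id, timestamp_seconds).
    """
    events = list(events)  # allow generators, since the data is scanned twice
    per_minute = Counter((user, int(ts // 60)) for user, ts in events)
    per_hour = Counter((user, int(ts // 3600)) for user, ts in events)
    flagged = {user for (user, _), n in per_minute.items() if n > per_minute_limit}
    flagged |= {user for (user, _), n in per_hour.items() if n > per_hour_limit}
    return flagged


def confirm(flagged, review):
    """Keep only the candidates that a manual review marked as cheating.

    review: dict mapping user_id to True (cheating) or False (not cheating);
    candidates without a review result are not confirmed.
    """
    return {user for user in flagged if review.get(user, False)}
```

Only the users returned by `confirm` would be treated as cheating sample data; the automatic rules merely narrow down what reviewers need to look at.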
It should be noted that the steps illustrated in the above-described flows or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases the steps illustrated or described may be performed in an order different from the one shown here.
The present embodiment further provides a marketing fission-based user cheating detection system, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the system described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
FIG. 4 is a block diagram of the structure of a marketing fission-based user cheating detection system according to an embodiment of the present application. As shown in FIG. 4, the system comprises a data acquisition module 41, a characterization module 42, a clustering module 43, and a decision module 44, wherein:
the data acquisition module 41 is configured to acquire product data and user behavior data related to marketing behaviors and mark cheating sample data therein;
the characterization module 42 is configured to perform feature extraction on the product data and the user behavior data respectively to obtain a user feature and a product feature respectively;
the characterization module 42 is further configured to extract user features and product features under a preset time window and a preset recording window to obtain interval features, and recombine the interval features according to a time sequence in a Skip-n manner to obtain time sequence combination features;
the clustering module 43 is configured to perform clustering according to the similarity of the time sequence combination features to generate a plurality of clusters, and determine maximum difference features according to information gains of the features in the clusters;
the decision module 44 is configured to construct a rule decision tree according to the maximum difference features based on the class clusters, determine nodes of the rule decision tree corresponding to the cheating sample data, and determine all objects in the nodes as the cheating data.
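The final judgment made by the decision module can be sketched as a label propagation over leaf nodes. The sketch assumes that the tree has already been built and that a mapping from each object to the identifier of its leaf node is available; `leaf_of`, `cheating_samples` and `propagate_cheating` are hypothetical names.

```python
def propagate_cheating(leaf_of, cheating_samples):
    """Mark every object that shares a leaf node with a marked cheating sample.

    leaf_of: dict mapping object id to the id of the leaf node it falls into.
    cheating_samples: ids of the objects marked as cheating sample data.
    """
    cheating_leaves = {leaf_of[s] for s in cheating_samples if s in leaf_of}
    return {obj for obj, leaf in leaf_of.items() if leaf in cheating_leaves}
```

For instance, if `u1` and `u2` fall into the same leaf and only `u1` was marked, both are judged to be cheating data, while objects in other leaves are left untouched.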
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a marketing fission-based cheating detection method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.
In one embodiment, an electronic device is provided, which may be a server. Fig. 5 is a schematic diagram of the internal structure of the electronic device according to an embodiment of the present application. As shown in Fig. 5, the electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor provides computing and control capabilities; the network interface communicates with an external terminal through a network connection; the internal memory provides an environment for running the operating system and the computer program; the computer program is executed by the processor to implement a marketing fission-based cheating detection method; and the database stores data.
Those skilled in the art will appreciate that the configuration shown in fig. 5 is a block diagram of only the portion of the configuration relevant to the present application and does not limit the electronic device to which the present application is applied; a particular electronic device may include more or fewer components than those shown in the drawing, may combine certain components, or may have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A marketing fission based cheating detection method, the method comprising:
acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data;
performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features;
extracting the user features and the product features under a preset time window and a preset recording window to obtain interval features, and recombining the interval features according to a time sequence in a Skip-n manner to obtain time sequence combination features;
clustering according to the similarity of the time sequence combination features to generate a plurality of clusters, and determining the maximum difference features according to the information gain of the features in the clusters;
and constructing a rule decision tree according to the maximum difference features based on the class clusters, determining the nodes of the rule decision tree corresponding to the cheating sample data, and judging all objects in the nodes to be cheating data.
2. The method of claim 1, wherein clustering based on the similarity of the time-series combination features to generate a plurality of clusters comprises:
initializing a plurality of clustering centers, and acquiring Euclidean distances from the time sequence combination features to the clustering centers;
and sequentially comparing the distances from the time sequence combination features to the clustering centers, and distributing the time sequence combination features to the clustering centers with the nearest Euclidean distance one by one for clustering to generate a plurality of clusters, wherein the time sequence combination features belong to one cluster and only belong to one cluster.
3. The method of claim 1, wherein constructing a rule decision tree according to the maximum difference features based on the class clusters comprises:
splitting the cluster into a plurality of child nodes by taking the maximum difference features as a judgment condition;
calculating the maximum difference features of the child nodes, and taking the maximum difference features of the child nodes as a judgment condition to split the child nodes until the sample points in the child nodes are within a preset threshold range, or the similarity of the objects in the child nodes reaches a preset range;
and constructing a decision tree based on the cluster, the child nodes and the judgment condition.
4. The method of claim 1, wherein the performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features respectively comprises:
counting click access conditions of IP access pages and URLs of each product according to a preset period for the product data, and calculating average access time and access times in the preset period;
and for the user behavior data, counting the click access conditions of each user through the page and the URL according to a preset period, and calculating the average access time and the access times in the preset period.
5. The method of claim 1, wherein after combining the interval features in a Skip-n manner to obtain a plurality of time-series combined features, the method further comprises:
vectorizing and representing the time sequence combination features, wherein the vectorizing and representing comprises the following steps:
for numerical data, after determining the segmentation grade according to the median and average scheme, carrying out numerical segmentation one-hot representation;
for character type data, word vector form representation is performed.
6. The method of claim 1, wherein acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data therein comprises:
product data and user behavior data related to marketing behaviors are obtained, and automatic sampling and labeling are carried out based on preset simple rules;
and judging and determining the cheating sample data according to a manual inspection signal for the product data and the user behavior data after the automatic sampling and marking.
7. The method according to claim 1, wherein the cluster center is determined by calculating the mean of the feature objects in the cluster in each dimension.
8. A marketing fission-based user cheating detection system, the system comprising a data acquisition module, a characterization module, a clustering module and a decision module, wherein:
the data acquisition module is used for acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data in the product data and the user behavior data;
the characterization module is used for performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features;
the characterization module is further used for extracting the user features and the product features under a preset time window and a preset recording window to obtain interval features, and recombining the interval features according to a time sequence in a Skip-n manner to obtain time sequence combination features;
the clustering module is used for clustering according to the similarity of the time sequence combination features to generate a plurality of clusters, and determining the maximum difference features according to the information gain of the features in the clusters;
the decision module is used for constructing a rule decision tree according to the maximum difference features based on the class clusters, determining the nodes of the rule decision tree corresponding to the cheating sample data, and judging all objects in the nodes to be cheating data.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the marketing fission-based cheating detection method of any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the marketing fission-based cheating detection method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110694939.2A CN113553499A (en) | 2021-06-22 | 2021-06-22 | Cheating detection method and system based on marketing fission and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113553499A (en) | 2021-10-26 |
Family
ID=78102283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110694939.2A Pending CN113553499A (en) | 2021-06-22 | 2021-06-22 | Cheating detection method and system based on marketing fission and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113553499A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116128534A (en) * | 2023-04-13 | 2023-05-16 | 上海二三四五网络科技有限公司 | User fission cheating identification method and device based on comprehensive similarity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||