CN113553499A - Cheating detection method and system based on marketing fission and electronic equipment - Google Patents

Publication number
CN113553499A
Authority
CN
China
Prior art keywords
data
cheating
features
user
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110694939.2A
Other languages
Chinese (zh)
Inventor
高鹏
康维鹏
袁兰
吴飞
周伟华
高峰
潘晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Mjoys Big Data Technology Co ltd
Original Assignee
Hangzhou Mjoys Big Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Mjoys Big Data Technology Co ltd
Priority to CN202110694939.2A
Publication of CN113553499A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The application relates to a user cheating detection method based on marketing fission. The method comprises the following steps: acquiring product data and user behavior data related to marketing behaviors, and marking the cheating sample data in them; performing feature extraction on the product data and the user behavior data respectively, extracting the user features and the product features under a preset time window and a preset recording window to obtain a plurality of interval features, and combining the interval features in time order to obtain time-series combination features; clustering the time-series combination features according to feature similarity to generate a plurality of clusters, and determining the maximum difference features of the clusters according to the information gain of the features in the clusters; and, based on the clusters, constructing a rule decision tree according to the maximum difference features, determining the nodes of the rule decision tree into which the cheating sample data falls, and judging all objects in those nodes to be cheating data. The application solves the problem in the related art that manual (human-operated) cheating cannot be detected, and improves the accuracy of detecting manual cheating.

Description

Cheating detection method and system based on marketing fission and electronic equipment
Technical Field
The application relates to the field of big data, in particular to a cheating detection method and system based on marketing fission and an electronic device.
Background
With the development of digital marketing, cheating behaviors such as fake traffic, manual order brushing (placing fake orders) and "wool pulling" (abusing coupons and promotional rewards) are attracting more and more attention. Cheating generally falls into two categories: machine cheating and manual cheating. Machine cheating typically sends mobile traffic through large numbers of test devices or emulators, or manufactures fake traffic with crawlers, volume-inflating scripts and the like; it is cheap and accounts for a large share of cheating.
Manual cheating creates fake traffic by hiring users, or by offering them incentives and inducements, to inflate advertisement views, clicks and user registrations and to obtain advertisers' phone numbers. Because manual cheating is highly concealed, the ability to detect it is generally low.
At present, the related art offers no effective solution to the problem of low capability of detecting manual cheating.
Disclosure of Invention
The embodiments of the application provide a cheating detection method and system based on marketing fission, and an electronic device, so as to at least solve the problem in the related art that the ability to detect manual cheating is low.
In a first aspect, an embodiment of the present application provides a cheating detection method based on marketing fission, the method including:
acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data;
respectively extracting the characteristics of the product data and the user behavior data to respectively obtain user characteristics and product characteristics;
extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;
clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;
and constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.
In some embodiments, the clustering according to the similarity of the time-series combination features to generate a plurality of class clusters includes:
initializing a plurality of clustering centers, and acquiring Euclidean distances from the time sequence combination characteristics to the clustering centers;
and comparing in turn the distances from each time-series combination feature to the cluster centers, and assigning each feature to the cluster center with the shortest Euclidean distance to generate a plurality of clusters, wherein each time-series combination feature belongs to exactly one cluster.
In some embodiments, the constructing a rule decision tree according to the maximum dissimilarity characteristics based on the class clusters comprises:
dividing the cluster into a plurality of sub-nodes by taking the maximum difference characteristics as a judgment condition;
calculating the maximum difference characteristics of the child nodes, and taking the maximum difference characteristics of the child nodes as a judgment condition to split the child nodes until the sample points in the child nodes are within a preset threshold range, or the similarity of the objects in the child nodes reaches a preset range;
and constructing a decision tree based on the cluster, the child nodes and the judgment condition.
In some embodiments, the performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features respectively includes:
for the product data, counting per preset period the click-access activity on each product's pages and URLs by visiting IP, and calculating the average access time and the number of accesses within the preset period;
and for the user behavior data, counting per preset period the pages and URLs each user clicks to access, and calculating the average access time and the number of accesses within the preset period.
In some embodiments, after combining the interval features in a Skip-n manner to obtain a plurality of time-series combination features, the method further includes:
vectorizing and representing the time sequence combination features, wherein the vectorizing and representing comprises the following steps:
for numerical data, determining segmentation levels from the median and the mean, and then giving the segmented value a one-hot representation;
for character type data, word vector form representation is performed.
In some embodiments, the obtaining product data and user behavior data related to marketing behaviors, and the marking cheating sample data therein includes:
product data and user behavior data related to marketing behaviors are obtained, and automatic sampling and labeling are carried out based on preset simple rules;
and judging and determining the cheating sample data according to a manual inspection signal for the product data and the user behavior data after the automatic sampling and marking.
In some embodiments, the cluster center is determined by calculating the mean of the feature objects in the cluster in each dimension.
In a second aspect, embodiments of the present application provide a marketing fission-based user cheating detection system, the system comprising a data acquisition module, a characterization module, a clustering module and a decision module;
the data acquisition module is used for acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data in the product data and the user behavior data;
the characterization module is used for respectively performing feature extraction on the product data and the user behavior data to respectively obtain user features and product features;
the characterization module is further used for extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;
the clustering module is used for clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;
the decision module is used for constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.
In a third aspect, embodiments of the present application provide a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a marketing fission-based user cheating detection method as described in the first aspect above when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a marketing fission-based user cheating detection method as described in the first aspect above.
Compared with the related art, the marketing fission-based user cheating detection method provided by the embodiments of the application acquires marketing behavior data, samples and labels the cheating behavior data in it, extracts features from the marketing behavior data, and combines the features in time order to obtain time-series combination features. Clustering is then performed on the time-series combination features to obtain a plurality of clusters, the maximum difference features are obtained, and the clusters are split into a plurality of nodes according to the maximum difference features. Finally, the nodes into which the sampled and labeled cheating sample data falls are determined, and the other data in those nodes is judged to be cheating behavior data. This solves the problem in the related art that the ability to detect manual cheating is low, enables online manual cheating to be discovered automatically, and improves the accuracy of detecting manual cheating.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic illustration of an application environment of a marketing fission-based user cheating detection method according to an embodiment of the present application;
FIG. 2 is a flow diagram of a method of marketing fission-based user cheating detection according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a rule decision tree according to an embodiment of the present application;
FIG. 4 is a block diagram of a structure of a marketing fission-based user cheating detection system according to an embodiment of the present application;
FIG. 5 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering of the objects.
The marketing fission-based user cheating detection method provided by the application can be applied in the application environment shown in FIG. 1. FIG. 1 is a schematic diagram of the application environment of the marketing fission-based user cheating detection method according to an embodiment of the application. As shown in FIG. 1, a terminal 10 and a server 11 communicate through a network. The user sends access traffic to the server 11 through the terminal 10, and the server 11 can determine, through an internal cheating detection algorithm, whether the access traffic sent by the user contains manual cheating behavior, which includes, but is not limited to, inflating advertisement views, inflating clicks and registering fake users. The terminal 10 may be any of various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices and the like, and the server 11 may be an independent server or a server cluster formed by a plurality of servers.
The application provides a marketing fission-based user cheating detection method, fig. 2 is a flowchart of a marketing fission-based user cheating detection method according to an embodiment of the application, and as shown in fig. 2, the flowchart includes the following steps:
Step S201: acquire product data and user behavior data related to marketing behaviors, and mark the cheating sample data in them. The product data includes, but is not limited to, the product name, the target audience label of the promotion (including age bracket, gender and education level) and the promotion region. The user behavior data includes, but is not limited to, user ID, user IP address, operation time, activity URL, page click information, page dwell information and order quantity information. The cheating sample data in the product data and the user behavior data is sampled and labeled by combining simple rules with manual review. It should be noted that, in this embodiment, the log tracking points (instrumentation points) for marketing behaviors are determined first, and the user behavior data is then collected through those tracking points.
Step S202: perform feature extraction on the product data and the user behavior data respectively, to obtain product features and user features. Specifically, on the user side, click-access activity is counted per preset period, for example which pages and URLs each user clicked to access; on the product side, click-access activity is likewise counted per preset period, for example the clicks on each of the product's pages and URLs by visiting IP. Finally, the average access time and the number of accesses within the period are calculated.
Step S203: extract the user features and the product features under a preset time window and a preset recording window to obtain interval features, and recombine the interval features in time order in a Skip-n manner to obtain time-series combination features. Because the features obtained in step S202 are arranged in time order, they must first be framed by a time window (the preset time window of this embodiment). Because a single time window may still contain a large amount of behavior data, the data is framed further by a preset recording window, which yields the interval features. Optionally, the time window may be 5 minutes and the recording window 10 records. Finally, the interval features are recombined in a Skip-n manner. A single time-ordered feature is rarely sufficient to identify cheating, so this embodiment combines interval features with Skip-n in order to expose behaviors that are strongly associated with one another over the course of a cheating process. Skip-n can skip irrelevant features in the middle of the sequence and generate a new time-series combination feature from the remaining ones. For example, the time-series combination feature of an online chat scam may consist of three successive stages: "1. adding a friend", "2. building an emotional bond over half a month" and "3. borrowing money".
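The framing and Skip-n recombination of step S203 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `frame_events` and `skip_n_combinations`, the rule that a frame closes when either window is exhausted, and the reading of Skip-n as "at most n skipped items between two consecutive picks" (in the style of skip-grams) are all assumptions introduced here.

```python
from itertools import combinations


def frame_events(events, time_window=300, record_window=10):
    """Split (timestamp_seconds, feature) pairs into time-ordered frames.

    Assumed rule: a frame is closed when the next event lies outside the
    time window opened by the frame's first event, or when the frame
    already holds `record_window` events.
    """
    frames, current, start = [], [], None
    for ts, feature in sorted(events, key=lambda e: e[0]):
        if current and (ts - start >= time_window or len(current) >= record_window):
            frames.append(current)
            current = []
        if not current:
            start = ts  # the first event of a frame opens its time window
        current.append(feature)
    if current:
        frames.append(current)
    return frames


def skip_n_combinations(sequence, length=3, max_skip=1):
    """Ordered combinations of `length` items from `sequence`, allowing at
    most `max_skip` skipped items between two consecutive picks."""
    result = []
    for idx in combinations(range(len(sequence)), length):
        if all(b - a - 1 <= max_skip for a, b in zip(idx, idx[1:])):
            result.append(tuple(sequence[i] for i in idx))
    return result


if __name__ == "__main__":
    steps = ["add_friend", "browse", "build_trust", "borrow_money"]
    for combo in skip_n_combinations(steps, length=3, max_skip=1):
        print(combo)
```

With `max_skip=1` the irrelevant `browse` step can be skipped, so the combination `("add_friend", "build_trust", "borrow_money")` appears among the generated features; with `max_skip=0` only contiguous triples are produced.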
Step S204: cluster the time-series combination features according to their similarity to generate a plurality of clusters, and determine the maximum difference feature of each cluster according to the information gain of the features in the cluster. Optionally, the k-means clustering algorithm is used; when clustering finishes, each time-series combination feature belongs to exactly one cluster, namely the nearest one. In this embodiment, similarity is expressed by the distance from a feature object to the cluster center: the smaller the distance, the higher the similarity between the feature object and the cluster. The information gain g(D, A) is the difference between the empirical entropy H(D) of the set D and the empirical conditional entropy H(D | A) of D given the feature A, and is calculated by Equation 1:
Equation 1: $g(D, A) = H(D) - H(D \mid A)$
The empirical entropy H(D) is calculated by Equation 2 and the empirical conditional entropy H(D | A) by Equation 3, where D denotes the training data set, A denotes a feature of the training data set, n denotes the number of terms in the respective sum, and $H(D \mid A = x_i)$ denotes the conditional entropy of D when feature A is fixed at $x_i$. In Equation 2 the sum runs over the classes of D and $p_i$ is the proportion of samples in the i-th class; in Equation 3 the sum runs over the values $x_i$ of feature A and $p_i$ is the probability that A takes the value $x_i$.
Equation 2: $H(D) = -\sum_{i=1}^{n} p_i \log p_i$
Equation 3: $H(D \mid A) = \sum_{i=1}^{n} p_i \, H(D \mid A = x_i)$
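As a worked illustration of the information gain used to pick the maximum difference feature, the sketch below computes entropy, conditional entropy and their difference for discrete features. It assumes base-2 logarithms, discrete feature values, and a per-sample target label (for example a cluster ID or a cheating mark); the function names are hypothetical and the source does not specify these details.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(D) of a list of labels (base 2 assumed)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D | A) for one discrete feature A."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # H(D | A): entropy of each group weighted by the probability of its value
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def max_difference_feature(rows, labels):
    """Name of the feature with the largest information gain.

    `rows` is a list of dicts mapping feature name to a discrete value.
    """
    return max(rows[0], key=lambda name: information_gain([r[name] for r in rows], labels))


if __name__ == "__main__":
    rows = [
        {"night_burst": "yes", "region": "A"},
        {"night_burst": "yes", "region": "B"},
        {"night_burst": "no", "region": "A"},
        {"night_burst": "no", "region": "B"},
    ]
    labels = [1, 1, 0, 0]
    print(max_difference_feature(rows, labels))
```

In this toy data `night_burst` separates the labels perfectly while `region` carries no information, so `night_burst` is selected.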
Step S205: based on the clusters, construct a rule decision tree according to the maximum difference features, determine the nodes of the rule decision tree into which the cheating sample data falls, and judge all objects in those nodes to be cheating data. Before splitting, a cluster is a relatively impure set that needs to be split into purer subclasses. The maximum difference feature is the feature that brings the largest information gain to the cluster, and information gain measures how much the uncertainty of information is reduced under a given condition. The maximum difference feature is therefore chosen as the splitting condition, and the cluster is split step by step into a plurality of nodes, which yields purer subclasses. Finally, the specific nodes of the rule decision tree into which the cheating sample data labeled in step S201 falls are identified, and the other objects in those nodes are judged to be cheating data as well.
Through steps S201 to S205, this embodiment acquires marketing behavior data, extracts features and combines them in time order, clusters the time-series combination features with a clustering algorithm to generate a plurality of clusters, splits the clusters into a plurality of nodes according to the maximum difference features, and constructs a rule decision tree from the nodes and the judgment relationships between them. Finally, the specific nodes of the rule decision tree into which the labeled cheating samples fall are determined, and the other data in those nodes is judged to be cheating data. This solves the problem in the related art that manual cheating cannot be accurately identified because it is highly concealed: manual cheating is detected on the basis of a clustering algorithm, and the accuracy of detecting it is improved.
In some embodiments, clustering according to the similarity of the time-series combination features, and generating a plurality of clusters includes:
initializing a plurality of cluster centers, and obtaining the Euclidean distance from each time-series combination feature to each cluster center. A cluster center is determined by taking the mean of the feature objects in the cluster in each dimension, as given by Equation 4, and the Euclidean distance is calculated by Equation 5, where $\mathrm{dis}(X_i, C_j)$ denotes the distance between sample $X_i$ and cluster center $C_j$, m denotes the dimension of each vector, $X_{it}$ denotes the value of sample $X_i$ in the t-th dimension, $C_{jt}$ denotes the value of cluster center $C_j$ in the t-th dimension, and $S_j$ denotes the set of samples assigned to the j-th cluster;
Equation 4: $C_{jt} = \frac{1}{|S_j|} \sum_{X_i \in S_j} X_{it}$
Equation 5: $\mathrm{dis}(X_i, C_j) = \sqrt{\sum_{t=1}^{m} (X_{it} - C_{jt})^2}$
and comparing in turn the distances from each time-series combination feature to the cluster centers, and assigning each feature to the cluster center with the shortest Euclidean distance, thereby generating a plurality of clusters; after clustering finishes, each time-series combination feature belongs to exactly one cluster.
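The clustering procedure described above (Euclidean distance to each center, assignment to the nearest center, centers recomputed as per-dimension means) can be sketched in plain Python. This is a minimal k-means loop for illustration; the function names, the choice to keep the old center of an empty cluster, and the iteration cap are assumptions rather than details from the source.

```python
import math


def euclidean(x, c):
    """Euclidean distance between sample x and center c."""
    return math.sqrt(sum((xt - ct) ** 2 for xt, ct in zip(x, c)))


def assign(samples, centers):
    """Index of the nearest center for every sample (exactly one each)."""
    return [min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))
            for x in samples]


def update_centers(samples, assignment, centers):
    """Recompute each center as the per-dimension mean of its members."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [x for x, a in zip(samples, assignment) if a == j]
        if not members:
            new_centers.append(list(old))  # assumption: keep an empty cluster's center
            continue
        new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
    return new_centers


def kmeans(samples, centers, max_iter=100):
    """Alternate assignment and center update until assignments stop changing."""
    centers = [list(c) for c in centers]
    assignment = assign(samples, centers)
    for _ in range(max_iter):
        centers = update_centers(samples, assignment, centers)
        new_assignment = assign(samples, centers)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centers


if __name__ == "__main__":
    samples = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(kmeans(samples, centers=[[0, 0], [10, 10]]))
```

Each sample receives a single cluster index, which matches the requirement that a time-series combination feature belongs to exactly one cluster.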
In some embodiments, constructing the rule decision tree according to the maximum difference features based on the clusters includes the following. FIG. 3 is a schematic diagram of a rule decision tree according to an embodiment of the present application. As shown in FIG. 3, first, the cluster root node is split into a plurality of child nodes using the maximum difference feature as the judgment condition. Second, the maximum difference feature of each child node is calculated and used as the judgment condition for further splitting, until the child nodes have been split into leaf nodes, in which the number of sample points is within a preset threshold range or the similarity of the objects reaches a preset range. Optionally, the preset range may be 90%. Finally, the decision tree is constructed from the cluster root node, the child nodes, the leaf nodes and the judgment conditions between the nodes. Because this embodiment splits the cluster using the maximum difference features as judgment conditions, the cluster can be split into purer subclasses, which improves the efficiency and accuracy of splitting.
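The recursive splitting and the final judgment step can be sketched as below. The source does not say which target the information gain is measured against or how object similarity is computed, so this sketch makes explicit assumptions: the target is the seed mark from step S201 (1 for a labeled cheating sample, 0 otherwise), "similarity" is approximated by the share of the majority label in a node, and `min_size` stands in for the preset sample-count threshold. All names are hypothetical.

```python
import math
from collections import Counter


def _entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def _gain(rows, labels, name):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[name], []).append(label)
    cond = sum(len(g) / len(labels) * _entropy(g) for g in groups.values())
    return _entropy(labels) - cond


def build_tree(rows, labels, ids, min_size=3, purity=0.9):
    """Split on the highest-gain feature until a node is small or pure enough."""
    majority_share = Counter(labels).most_common(1)[0][1] / len(labels)
    # only features that still vary inside the node can split it
    features = [f for f in rows[0] if len({r[f] for r in rows}) > 1]
    if len(rows) <= min_size or majority_share >= purity or not features:
        return {"leaf": True, "ids": ids, "labels": labels}
    best = max(features, key=lambda f: _gain(rows, labels, f))
    children = {}
    for value in sorted({r[best] for r in rows}):
        keep = [i for i, r in enumerate(rows) if r[best] == value]
        children[value] = build_tree(
            [rows[i] for i in keep], [labels[i] for i in keep],
            [ids[i] for i in keep], min_size, purity)
    return {"leaf": False, "feature": best, "children": children}


def flag_cheaters(tree):
    """IDs of all objects in leaves that contain at least one labeled cheat."""
    if tree["leaf"]:
        return set(tree["ids"]) if any(tree["labels"]) else set()
    flagged = set()
    for child in tree["children"].values():
        flagged |= flag_cheaters(child)
    return flagged
```

Given one labeled cheating sample among eight objects, the tree isolates the small node that shares the cheater's distinguishing feature value, and every object in that node is flagged, which mirrors the judgment described in step S205.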
In some embodiments, performing feature extraction on the product data and the user behavior data respectively to obtain the product features and the user features includes: for the product data, counting per preset period the click-access activity on each product's pages and URLs by visiting IP, and calculating the average access time and the number of accesses within the preset period; and for the user behavior data, counting per preset period the pages and URLs each user clicks to access, and calculating the average access time and the number of accesses within the preset period. The preset period may be a minute, an hour, a day, and so on.
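The per-period counting can be illustrated with a small aggregation over log records. This sketch assumes each record is a dict with hypothetical fields `user_id`, `product`, `timestamp` (seconds) and `stay_seconds`, and it reads "average access time" as the mean page stay duration; neither the field names nor that reading is confirmed by the source.

```python
from collections import defaultdict


def period_features(records, key, period_seconds=3600):
    """Access count and mean stay time per (key value, period index) bucket.

    `key` selects the side to aggregate on, e.g. "user_id" for user
    features or "product" for product features.
    """
    buckets = defaultdict(list)
    for rec in records:
        period = rec["timestamp"] // period_seconds
        buckets[(rec[key], period)].append(rec["stay_seconds"])
    return {
        bucket: {"access_count": len(stays), "avg_access_time": sum(stays) / len(stays)}
        for bucket, stays in buckets.items()
    }


if __name__ == "__main__":
    logs = [
        {"user_id": "u1", "product": "p1", "timestamp": 10, "stay_seconds": 4},
        {"user_id": "u1", "product": "p1", "timestamp": 20, "stay_seconds": 6},
        {"user_id": "u2", "product": "p1", "timestamp": 3700, "stay_seconds": 2},
    ]
    print(period_features(logs, "user_id"))
    print(period_features(logs, "product"))
```

Calling the same function with a different `key` gives the user-side and product-side statistics, and `period_seconds` plays the role of the preset period.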
In some embodiments, after the interval features are combined in a Skip-n manner to obtain a plurality of time-series combination features, the time-series combination features also need to be vectorized. For numerical data, segmentation levels are determined from the median and the mean, and the segmented value is then given a one-hot representation. Character data is represented in word-vector form. One-hot representation and word-vector representation are conventional means known to those skilled in the art and are not described in detail in this embodiment.
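For the numerical case, one possible reading of the median-and-mean scheme is to use the two statistics as cut points, which gives up to three segments, and to one-hot encode the segment index. The sketch below follows that assumption; the name `make_bucketizer` is hypothetical, and the word-vector representation of character data is not shown because it depends on a trained embedding.

```python
import statistics


def make_bucketizer(values):
    """Return (cut points, encoder) built from the median and mean of `values`.

    Assumption: the median and mean serve as segment boundaries; if they
    coincide, only two segments remain.
    """
    cuts = sorted({statistics.median(values), statistics.mean(values)})

    def one_hot(x):
        level = sum(x > c for c in cuts)  # index of the segment x falls in
        vec = [0] * (len(cuts) + 1)
        vec[level] = 1
        return vec

    return cuts, one_hot


if __name__ == "__main__":
    access_counts = [1, 2, 3, 4, 100]
    cuts, encode = make_bucketizer(access_counts)
    print(cuts, encode(2), encode(10), encode(100))
```

Using both statistics keeps a skewed feature such as access count from collapsing into one segment: values between the median and the mean get their own segment, and extreme values above the mean get another.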
In some embodiments, acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data therein comprises: after the product data and user behavior data related to marketing behaviors in the marketing network are acquired, performing automatic sampling and labeling based on preset simple rules, for example preliminary sampling and labeling according to a single user's access frequency per minute, the access count accumulated within a single hour, and the like. Furthermore, because automatic sampling and labeling is uncertain, the automatically sampled and labeled product data and user behavior data are judged according to manual review information, and the cheating sample data is output. Combining automatic labeling with manual review reduces repetitive manual work, which improves the efficiency of labeling cheating sample data, and also improves the accuracy of the labels.
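The two-stage labeling described above can be sketched as follows. The thresholds, the record layout, and the function names `auto_label` and `confirm_by_review` are hypothetical and chosen only for the example; the application does not state concrete rule values.

```python
def auto_label(stats, per_minute_limit=30, per_hour_limit=500):
    """Flag users whose access frequency exceeds simple preset limits.

    stats: {user_id: {"max_per_minute": int, "max_per_hour": int}}.
    The limits are illustrative assumptions, not values from the application.
    """
    return {
        user
        for user, s in stats.items()
        if s["max_per_minute"] > per_minute_limit or s["max_per_hour"] > per_hour_limit
    }


def confirm_by_review(candidates, review):
    """Keep only the candidates that a manual reviewer confirmed as cheating.

    review: {user_id: bool}; candidates without a review decision are dropped.
    """
    return {user for user in candidates if review.get(user, False)}
```

Only the confirmed set would be used as cheating sample data in the later clustering and decision tree steps.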
It should be noted that the steps illustrated in the above-described flows or in the flow diagrams of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases the steps illustrated or described may be performed in an order different from the one shown here.
The present embodiment further provides a marketing fission-based user cheating detection system, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
FIG. 4 is a structural block diagram of a marketing fission-based user cheating detection system according to an embodiment of the present application. As shown in FIG. 4, the system comprises a data acquisition module 41, a characterization module 42, a clustering module 43, and a decision module 44, wherein:
the data acquisition module 41 is configured to acquire product data and user behavior data related to marketing behaviors and mark cheating sample data therein;
the characterization module 42 is configured to perform feature extraction on the product data and the user behavior data respectively to obtain a user feature and a product feature respectively;
the characterization module 42 is further configured to extract user features and product features under a preset time window and a preset recording window to obtain interval features, and recombine the interval features according to a time sequence in a Skip-n manner to obtain time sequence combination features;
the clustering module 43 is configured to perform clustering according to the similarity of the time sequence combination features to generate a plurality of clusters, and determine maximum difference features according to information gains of the features in the clusters;
the decision module 44 is configured to construct a rule decision tree according to the maximum difference features based on the class clusters, determine nodes of the rule decision tree corresponding to the cheating sample data, and determine all objects in the nodes as the cheating data.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
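For a software implementation, the data flow between the four modules can be pictured as a simple chain of callables. The sketch below is illustrative only: the class name `CheatingDetectionSystem`, its field names, and the choice of plain callables are assumptions, and each stage would be replaced by the actual acquisition, characterization, clustering, and decision logic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CheatingDetectionSystem:
    """Hypothetical wiring of modules 41 to 44 as interchangeable callables."""

    acquire: Callable[[], Any]          # data acquisition module 41
    characterize: Callable[[Any], Any]  # characterization module 42
    cluster: Callable[[Any], Any]       # clustering module 43
    decide: Callable[[Any], Any]        # decision module 44

    def run(self) -> Any:
        # Each module consumes the output of the previous one.
        data = self.acquire()
        features = self.characterize(data)
        clusters = self.cluster(features)
        return self.decide(clusters)
```

Keeping the modules as separate callables matches the statement that they may run in the same processor or be distributed across different processors.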
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a marketing fission-based cheating detection method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, an electronic device is provided, which may be a server. FIG. 5 is a schematic diagram of the internal structure of the electronic device according to an embodiment of the present application. As shown in FIG. 5, the electronic device comprises a processor, a network interface, an internal memory, and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program, and a database. The processor provides computing and control capabilities; the network interface communicates with an external terminal through a network connection; the internal memory provides an environment for running the operating system and the computer program; the computer program, when executed by the processor, implements a marketing fission-based cheating detection method; and the database stores data.
Those skilled in the art will appreciate that the configuration shown in FIG. 5 is a block diagram of only the part of the configuration that is relevant to the present application and does not limit the electronic device to which the present application is applied; a particular electronic device may include more or fewer components than those shown in the drawings, may combine certain components, or may have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above-mentioned embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A marketing fission based cheating detection method, the method comprising:
acquiring product data and user behavior data related to marketing behaviors, and marking cheating sample data in the product data and the user behavior data;
respectively extracting the characteristics of the product data and the user behavior data to respectively obtain user characteristics and product characteristics;
extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;
clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;
and constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.
2. The method of claim 1, wherein clustering based on the similarity of the time-series combination features to generate a plurality of clusters comprises:
initializing a plurality of clustering centers, and acquiring Euclidean distances from the time sequence combination characteristics to the clustering centers;
and sequentially comparing the distances from the time sequence combination features to the clustering centers, and distributing the time sequence combination features to the clustering centers with the nearest Euclidean distance one by one for clustering to generate a plurality of clusters, wherein the time sequence combination features belong to one cluster and only belong to one cluster.
3. The method of claim 1, wherein constructing a rule decision tree according to the maximum difference characteristics based on the class cluster comprises:
dividing the cluster into a plurality of sub-nodes by taking the maximum difference characteristics as a judgment condition;
calculating the maximum difference characteristics of the child nodes, and taking the maximum difference characteristics of the child nodes as a judgment condition to split the child nodes until the sample points in the child nodes are within a preset threshold range, or the similarity of the objects in the child nodes reaches a preset range;
and constructing a decision tree based on the cluster, the child nodes and the judgment condition.
4. The method of claim 1, wherein the performing feature extraction on the product data and the user behavior data respectively to obtain user features and product features respectively comprises:
counting click access conditions of IP access pages and URLs of each product according to a preset period for the product data, and calculating average access time and access times in the preset period;
and for the user behavior data, counting the click access conditions of each user through the page and the URL according to a preset period, and calculating the average access time and the access times in the preset period.
5. The method of claim 1, wherein after combining the interval features in a Skip-n manner to obtain a plurality of time-series combined features, the method further comprises:
vectorizing and representing the time sequence combination features, wherein the vectorizing and representing comprises the following steps:
for numerical data, after determining the segmentation grade according to the median and average scheme, carrying out numerical segmentation one-hot representation;
for character type data, word vector form representation is performed.
6. The method of claim 1, wherein acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data therein comprises:
product data and user behavior data related to marketing behaviors are obtained, and automatic sampling and labeling are carried out based on preset simple rules;
and judging and determining the cheating sample data according to a manual inspection signal for the product data and the user behavior data after the automatic sampling and marking.
7. The method according to claim 1, wherein the cluster center is determined by calculating the mean of the feature objects in the cluster in each dimension.
8. A marketing fission-based user cheating detection system, the system comprising a data acquisition module, a characterization module, a clustering module, and a decision module, wherein:
the data acquisition module is used for acquiring product data and user behavior data related to marketing behaviors and marking cheating sample data in the product data and the user behavior data;
the characterization module is used for respectively performing feature extraction on the product data and the user behavior data to respectively obtain user features and product features;
the characterization module is further used for extracting the user characteristics and the product characteristics under a preset time window and a preset recording window to obtain interval characteristics, and recombining the interval characteristics according to a time sequence in a Skip-n mode to obtain time sequence combination characteristics;
the clustering module is used for clustering according to the similarity of the time sequence combination characteristics to generate a plurality of clusters, and determining the maximum difference characteristics according to the information gain of the characteristics in the clusters;
the decision module is used for constructing a rule decision tree according to the maximum difference characteristics based on the class cluster, determining nodes of the cheating sample data in the rule decision tree, and judging all objects in the nodes as cheating data.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the marketing fission-based cheating detection method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the marketing fission-based cheating detection method of any one of claims 1 to 7.
CN202110694939.2A 2021-06-22 2021-06-22 Cheating detection method and system based on marketing fission and electronic equipment Pending CN113553499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110694939.2A CN113553499A (en) 2021-06-22 2021-06-22 Cheating detection method and system based on marketing fission and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110694939.2A CN113553499A (en) 2021-06-22 2021-06-22 Cheating detection method and system based on marketing fission and electronic equipment

Publications (1)

Publication Number Publication Date
CN113553499A true CN113553499A (en) 2021-10-26

Family

ID=78102283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110694939.2A Pending CN113553499A (en) 2021-06-22 2021-06-22 Cheating detection method and system based on marketing fission and electronic equipment

Country Status (1)

Country Link
CN (1) CN113553499A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116128534A (en) * 2023-04-13 2023-05-16 上海二三四五网络科技有限公司 User fission cheating identification method and device based on comprehensive similarity

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101187959A (en) * 2006-11-17 2008-05-28 中兴通讯股份有限公司 Game cheat detection method based on decision tree
CN106022826A (en) * 2016-05-18 2016-10-12 武汉斗鱼网络科技有限公司 Cheating user recognition method and system in webcast platform
CN106326497A (en) * 2016-10-10 2017-01-11 合网络技术(北京)有限公司 Cheating video user identification method and device
CN107158707A (en) * 2017-04-27 2017-09-15 浙江大学 A kind of method for detecting abnormality and device played for MMORPGs
CN107437124A (en) * 2017-07-20 2017-12-05 大连大学 A kind of operator based on big data analysis complains and trouble correlation analytic method
CN107578277A (en) * 2017-08-24 2018-01-12 国网浙江省电力公司电力科学研究院 Rental housing client's localization method for power marketing
WO2019136929A1 (en) * 2018-01-13 2019-07-18 惠州学院 Data clustering method and device based on k neighborhood similarity as well as storage medium
CN110348895A (en) * 2019-06-29 2019-10-18 北京淇瑀信息科技有限公司 A kind of personalized recommendation method based on user tag, device and electronic equipment
CN110458236A (en) * 2019-08-14 2019-11-15 有米科技股份有限公司 A kind of Advertising Copy style recognition methods and system
CN110570217A (en) * 2019-09-10 2019-12-13 北京百度网讯科技有限公司 cheating detection method and device
KR20200049262A (en) * 2018-10-31 2020-05-08 (주)포세듀 System for providing online blinded employment examination and a method thereof
CN111538983A (en) * 2020-07-03 2020-08-14 杭州摸象大数据科技有限公司 User password generation method and device, computer equipment and storage medium
CN111598049A (en) * 2020-05-29 2020-08-28 中国工商银行股份有限公司 Cheating recognition method and apparatus, electronic device, and medium
CN111612178A (en) * 2020-05-19 2020-09-01 腾讯科技(深圳)有限公司 Model diagnosis method and related equipment
CN111753081A (en) * 2019-03-28 2020-10-09 百度(美国)有限责任公司 Text classification system and method based on deep SKIP-GRAM network
CN112131199A (en) * 2020-09-25 2020-12-25 杭州安恒信息技术股份有限公司 Log processing method, device, equipment and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
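巩建光: "Research on precision marketing of value-added services in the telecom industry based on data mining methods", Manufacturing Automation (制造业自动化), no. 02 *
张宇翔; 孙菀; 杨家海; 周达磊; 孟祥飞; 肖春景: "Importance analysis of feature selection in Sina Weibo anti-spam", Journal on Communications (通信学报), no. 08 *
张玢玢; 李兵; 李岳欣: "Research on the reposting mechanism of enterprise microblogs based on feature selection", Journal of Intelligence (情报杂志), no. 12 *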

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination