CN116976975A - Data identification method, device, equipment, medium and product - Google Patents

Publication number
CN116976975A
Authority
CN
China
Prior art keywords
state
objects
digital resource
probability distribution
trigger
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310315191.XA
Other languages
Chinese (zh)
Inventor
张李均焕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Abstract

The embodiments of the application disclose a data identification method, apparatus, device, medium, and product, relating to cloud technology. The method comprises: acquiring trigger durations respectively consumed by a plurality of first objects in triggering a first digital resource to change from a first state to a second state; clustering identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource to obtain a plurality of trigger duration groups; generating an evaluation probability distribution based on the number of trigger durations contained in each trigger duration group, the evaluation probability distribution being used to evaluate the state changes of the first digital resource triggered by the plurality of first objects; and acquiring a reference probability distribution, and identifying a trigger result of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution. The embodiments of the application can improve the accuracy of identifying whether state changes of a digital resource triggered by objects are abnormal.

Description

Data identification method, device, equipment, medium and product
Technical Field
The present application relates to the field of cloud technologies, and in particular, to a data identification method, apparatus, device, medium, and product.
Background
After a digital resource is pushed by a traffic master, for example through an application platform or a website, users may trigger the digital resource, for example by clicking to view it or by closing its detail page. Each such trigger increases the popularity of the digital resource, and the traffic master obtains corresponding revenue from the provider of the digital resource according to the popularity of the digital resource it pushed.
In some cases, however, the digital resource is not triggered by real users but by a malicious party that uses malicious scripts to generate triggers in batches. This makes the popularity of the digital resource abnormally high and harms the interests of its provider.
At present, whether a trigger of a digital resource is abnormal is generally judged from the click rate or click quantity of the digital resource, but the results obtained in this way are often inaccurate. How to accurately judge whether the triggering of a digital resource is abnormal is therefore a problem to be solved.
Disclosure of Invention
The embodiments of the application provide a data identification method, apparatus, device, medium, and product, which can improve the accuracy of identifying whether state changes of a digital resource triggered by objects are abnormal.
In a first aspect, the present application provides a data identification method, including:
acquiring trigger durations respectively consumed by a plurality of first objects in triggering a first digital resource to change from a first state to a second state;
clustering identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource to obtain a plurality of trigger duration groups;
generating an evaluation probability distribution based on the number of trigger durations contained in each trigger duration group, the evaluation probability distribution being used to evaluate the state changes of the first digital resource triggered by the plurality of first objects;
acquiring a reference probability distribution, and identifying a trigger result of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution;
wherein the reference probability distribution serves as the reference for evaluating the state changes of the first digital resource triggered by the plurality of first objects, and the trigger result indicates whether those state changes are normal or abnormal.
In a second aspect, the present application provides a data recognition apparatus comprising:
a data acquisition unit, configured to acquire trigger durations respectively consumed by a plurality of first objects in triggering a first digital resource to change from a first state to a second state;
a data clustering unit, configured to cluster identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource to obtain a plurality of trigger duration groups;
a probability acquisition unit, configured to generate an evaluation probability distribution based on the number of trigger durations contained in each trigger duration group, the evaluation probability distribution being used to evaluate the state changes of the first digital resource triggered by the plurality of first objects;
a data identification unit, configured to acquire a reference probability distribution and to identify a trigger result of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution;
wherein the reference probability distribution serves as the reference for evaluating the state changes of the first digital resource triggered by the plurality of first objects, and the trigger result indicates whether those state changes are normal or abnormal.
In a third aspect, the present application provides a computer device comprising a processor, a memory, wherein the memory is for storing a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the data identification method described above.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the above-described data identification method.
In a fifth aspect, the present application provides a computer program product or computer program comprising computer instructions which, when executed by a processor, implement the data identification method described above.
In the embodiments of the application, the trigger durations respectively consumed by a plurality of first objects in triggering a first digital resource to change from a first state to a second state are acquired; identical trigger durations among them are clustered to obtain a plurality of trigger duration groups; an evaluation probability distribution is generated based on the number of trigger durations contained in each trigger duration group; and a reference probability distribution is acquired, and a trigger result of the plurality of first objects for the first digital resource is identified based on the difference between the evaluation probability distribution and the reference probability distribution. Because the evaluation probability distribution reflects the trigger durations actually consumed when the first objects trigger the first digital resource, comparing it with the reference probability distribution yields the difference between the two, from which it can be determined whether the state changes of the first digital resource triggered by the first objects are abnormal. Comparing probability distributions generated from trigger durations identifies anomalies more accurately than judging by the click rate or click quantity of the digital resource, so the accuracy of identifying whether state changes of a digital resource triggered by objects are abnormal can be improved.
In addition, once the trigger durations respectively consumed by the plurality of first objects in triggering the first digital resource to change from the first state to the second state are acquired, the corresponding evaluation probability distribution can be generated automatically, and whether the triggered state changes are abnormal can be identified automatically from the difference between the evaluation probability distribution and the reference probability distribution. This saves cost and improves data identification efficiency.
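The flow summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names, the use of total variation distance as the "difference" between the two distributions, the 0.3 threshold, and the sample data are all hypothetical choices made for the example.

```python
from collections import Counter

def evaluation_distribution(durations):
    """Group identical trigger durations and normalise the group sizes."""
    groups = Counter(durations)          # trigger duration -> size of its group
    total = sum(groups.values())
    return {d: n / total for d, n in groups.items()}

def total_variation(p, q):
    """Assumed difference measure: half the L1 distance between p and q."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def identify(durations, reference, threshold=0.3):
    """Label the triggers 'abnormal' when the difference exceeds the assumed threshold."""
    diff = total_variation(evaluation_distribution(durations), reference)
    return ("abnormal" if diff > threshold else "normal"), diff

# Hypothetical data: trigger durations in whole seconds.
reference = {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.2}
dispersed = [1, 2, 2, 3, 3, 3, 4, 4, 2, 1]   # matches the reference shape
scripted = [2, 2, 2, 2, 2, 2, 2, 2, 2, 3]    # concentrated on one duration

print(identify(dispersed, reference))
print(identify(scripted, reference))
```

In this sketch the dispersed sample is labeled normal and the concentrated sample abnormal; any other distance between distributions could be substituted for `total_variation` without changing the structure.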
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network architecture of a data identification system according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a data identification method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a trigger duration distribution between opening and closing of a landing page to which a digital resource belongs according to an embodiment of the present application;
Fig. 4 is a flowchart of another data identification method according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the composition structure of a complete sub-graph according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a community partitioning according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a composition structure of a community according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a composition structure of a plurality of communities according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a composition structure of a data recognition device according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a composition structure of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. It is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on that are applied under the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites, and portal websites, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, each item may in the future have its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong backend system support, which can only be achieved through cloud computing. The solution provided by the embodiments of the application belongs to cloud computing within the field of cloud technology.
Cloud computing, in the narrow sense, refers to the delivery and usage model of IT infrastructure, meaning that required resources are obtained over a network in an on-demand, easily scalable manner; in the broad sense, it refers to the delivery and usage model of services, meaning that required services are obtained over a network in an on-demand, easily scalable manner. Such services may be IT, software, or internet related, or other services. Cloud computing is a product of the development and fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing. With the development of the internet, real-time data streams, and the diversification of connected devices, and driven by the demands of search services, social networks, mobile commerce, open collaboration, and the like, cloud computing has developed rapidly. Unlike earlier parallel and distributed computing, the emergence of cloud computing is expected to drive, conceptually, a revolutionary change in the whole internet model and in enterprise management models. For example, the difference between the evaluation probability distribution and the reference probability distribution, the differences between the evaluation probability distributions corresponding to the pending objects, and the like may be calculated by cloud computing.
The technical solution of the application is suitable for judging whether the state changes of a first digital resource triggered by a plurality of first objects are normal, and thereby determining whether the second object that pushed the first digital resource is normal. The first object may, for example, be a user, that is, a user who clicks on a digital resource pushed by the second object. The first object may also be a malicious party that imitates users' triggers of the digital resource in batches by means of malicious scripts, for example by using an automation script or application to control multiple devices. The second object may be, for example, a traffic master, that is, a carrier that provides traffic; the second object may include, but is not limited to, media, a website, an application, or an application platform. For example, on an application platform such as a digital resource platform of a social application, a traffic master may be a public number that has attracted a certain amount of attention.
By acquiring the trigger durations respectively consumed by the trigger operations of a plurality of first objects that cause the first digital resource to change state, an evaluation probability distribution is obtained. Comparing the evaluation probability distribution with the reference probability distribution determines whether those trigger durations are abnormal, which makes it possible to detect an abnormal second object, to identify whether the data is abnormal, and to improve the accuracy of anomaly identification. The technical solution of the application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent traffic, and assisted driving.
It should be noted that the embodiments of the application involve data related to first objects (for example, the trigger duration consumed by a first object in triggering the first digital resource to change from the first state to the second state, and the identifier of the first object). When the embodiments of the application are applied to a specific product or technology, the permission or consent of the first object must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. The first object may be, for example, a user of a terminal device or a computer device.
Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture of a data identification system according to an embodiment of the present application. As shown in fig. 1, a computer device may exchange data with terminal devices; there may be one terminal device or at least two, and one computer device or at least two. When there are a plurality of computer devices, for example computer device 102a and computer device 102b, they may perform the data identification operation independently or cooperatively.
Taking a single computer device as an example, the computer device 102a may acquire the trigger durations respectively consumed by the plurality of first objects in triggering the first digital resource to change from the first state to the second state, and cluster identical trigger durations among them to obtain a plurality of trigger duration groups. Further, the computer device 102a may generate an evaluation probability distribution based on the number of trigger durations contained in each trigger duration group, and identify the trigger result of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution. Optionally, the computer device 102a may also send the trigger result to the terminal device 101 so that the trigger result is presented on the terminal device 101.
Taking a plurality of computer devices, for example two, as an example, the computer device 102a may acquire the trigger durations respectively consumed by the plurality of first objects in triggering the first digital resource to change from the first state to the second state and store them in the database 103 associated with the computer device 102a. The computer device 102b may then read the trigger durations from the database 103, generate the evaluation probability distribution, and identify the trigger result of the plurality of first objects for the first digital resource. Having a plurality of computer devices perform the data identification operation cooperatively reduces the load on each device.
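The division of work between the two computer devices can be illustrated with a small Python sketch. It is only an assumption-laden stand-in: an in-memory SQLite database plays the role of database 103, and the table name, column names, function names, and sample rows are all hypothetical rather than taken from the application.

```python
import sqlite3
from collections import Counter

def store_durations(conn, records):
    """Role of computer device 102a: persist (object_id, duration) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trigger_durations "
        "(object_id TEXT, duration_s INTEGER)"
    )
    conn.executemany("INSERT INTO trigger_durations VALUES (?, ?)", records)
    conn.commit()

def load_distribution(conn):
    """Role of computer device 102b: read durations and build the distribution."""
    rows = conn.execute("SELECT duration_s FROM trigger_durations").fetchall()
    groups = Counter(duration for (duration,) in rows)
    total = sum(groups.values())
    if total == 0:
        return {}
    return {d: n / total for d, n in groups.items()}

conn = sqlite3.connect(":memory:")   # hypothetical stand-in for database 103
store_durations(conn, [("u1", 2), ("u2", 2), ("u3", 5), ("u4", 2)])
dist = load_distribution(conn)
print(dist)
```

In a real deployment the two functions would run on different machines against a shared database; here they share one connection only so that the sketch is self-contained.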
It should be understood that the computer devices mentioned in the embodiments of the present application include, but are not limited to, terminal devices and servers. In other words, the computer device may be a server, a terminal device, or a system formed by a server and a terminal device. The terminal device may be an electronic device, including, but not limited to, a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palmtop computer, a vehicle-mounted device, an intelligent voice interaction device, an augmented reality/virtual reality (AR/VR) device, a head-mounted display, a wearable device, a smart speaker, a smart home appliance, an aircraft, a digital camera, a camera, or another mobile internet device (MID) with network access capability. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, vehicle-road collaboration, content delivery networks (CDN), and big data and artificial intelligence platforms.
Further, referring to fig. 2, fig. 2 is a flow chart of a data identification method according to an embodiment of the present application; as shown in fig. 2, the data identification method can be applied to a computer device, and the data identification method includes, but is not limited to, the following steps:
S101, acquiring the trigger durations respectively consumed by a plurality of first objects in triggering the first digital resource to change from the first state to the second state.
In the embodiments of the application, the second object can push a digital resource, and any first object can trigger the digital resource pushed by the second object, for example by clicking to view the digital resource or by closing its detail page. After a first object triggers the digital resource pushed by the second object, the popularity of the digital resource increases accordingly, and the second object can obtain corresponding revenue from the provider of the digital resource according to the popularity of the pushed digital resource. A provider of digital resources may be an advertiser, such as a distributor or a service provider of the digital resource. Optionally, the first object may purchase the product corresponding to the digital resource, which also increases the popularity of the digital resource and brings revenue to the distributor or service provider. Since the digital resource is pushed by the second object, the second object may share in the revenue of the digital resource. For the same amount of exposure, the higher the click rate of the digital resource, the higher the revenue the second object receives. The first digital resource may include, but is not limited to, an advertising resource, a news resource, or another media data resource.
For example, an applet developer or a public number may apply to open a digital resource slot and, as a second object, push digital resources. When a digital resource is exposed to or clicked by a first object in a public number article or an applet, the second object, that is, the applet developer or the public number author, receives a share of the revenue. For digital resources that are charged per click (such as CPC digital resources), the more clicks a digital resource receives, the more revenue the second object obtains, so some second objects may cheat to manufacture invalid exposures or clicks and thereby increase their revenue. However, the effect of digital resources delivered through cheating is very poor and harms the interests of the distributor or service provider. Cheating seriously degrades the delivery effect, may lead the distributor or service provider to suspend or abandon delivery, and may even create public relations risks.
A CPC digital resource is a digital resource charged by cost per click (CPC). In CPC mode, the distributor or service provider of the digital resource pays for the act of a first object clicking on the digital resource rather than for its exposure. CPC thus spares the distributor or service provider the risk of exposure without clicks, and is a popular way of charging for digital resources. The clicks that the distributor or service provider wishes to pay for are valid clicks by real users, not cheating clicks. In the exposure, click, and effect links of a digital resource, a second object may, for some malicious purpose, produce exposures, clicks, and effects by cheating; for example, a malicious party may use malicious scripts to expose or click on the digital resource in batches. Such malicious behavior, which does not reflect the real intent of users, is called digital resource cheating. One example is using an automation script or application on one or more devices to control multiple terminal devices so that abnormal users click on the digital resource.
The technical solution of the application identifies one or more of the exposure, click, and page display effect links of a digital resource to judge whether these links are normal. For example, based on the observations that normal users and cheating users operate digital resources with inconsistent timing distributions, and that cheating objects are similar to one another and aggregate, the solution can examine the timing distribution with which a plurality of first objects operate the first digital resource pushed by a second object, determine whether those operations are abnormal, and thereby determine whether the second object is cheating. Cheating generally controls multiple terminal devices so that abnormal users click on the digital resource, whereas the clicks of normal users are not controlled by a unified script. For normal users, the distribution of the time difference from exposure of the digital resource to the click is therefore relatively dispersed; it also differs between digital resources pushed by different second objects. The distribution of the time difference from clicking the digital resource to opening its landing page is likewise relatively dispersed for normal users. The cheating behavior of abnormal users cannot completely imitate the operation behavior of normal users, so the probability distribution of its time differences differs from that of normal users.
Moreover, since abnormal users generally cheat with the same type of cheating script, their time differences show similar patterns or abnormal distributions such as obvious aggregation. It should be understood that the time difference mentioned in the embodiments of the application refers to the trigger duration consumed in triggering the first digital resource to change from the first state to the second state.
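The contrast between dispersed and aggregated time differences can be made concrete with a short Python sketch. The two statistics used here, the share of the largest group of identical durations and the Shannon entropy of the grouped durations, are illustrative assumptions rather than measures named by the application, and the sample durations are hypothetical.

```python
import math
from collections import Counter

def concentration(durations):
    """Return (largest group share, Shannon entropy in bits) of the durations."""
    groups = Counter(durations)
    total = sum(groups.values())
    probs = [n / total for n in groups.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return max(probs), entropy

normal_like = [3, 7, 12, 5, 9, 21, 4, 15]   # hypothetical, dispersed time differences
script_like = [2, 2, 2, 2, 2, 2, 3, 2]      # hypothetical, aggregated time differences

print(concentration(normal_like))
print(concentration(script_like))
```

A dispersed sample gives a small largest-group share and a high entropy, while a script-like sample gives a large share and a low entropy; where to draw the line between the two is left open here.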
As shown in fig. 3, fig. 3 is a schematic diagram, provided in an embodiment of the present application, of the distribution of the trigger duration from opening to closing of the landing page to which a digital resource belongs (hereinafter referred to as the "landing page stay duration distribution"). The solid lines in 3a, 3b, and 3c of fig. 3 respectively indicate the landing page stay duration distributions of three different digital resources pushed by a second object, and the dashed lines in fig. 3 indicate the reference distribution of the trigger duration from opening to closing of the landing page to which a digital resource pushed by the second object belongs (hereinafter referred to as the "landing page reference stay duration distribution"). The abscissa represents the stay duration of the landing page to which the digital resource belongs, that is, the trigger duration from opening to closing of the landing page, and the ordinate represents the PV proportion, that is, the ratio between the number of times a given stay duration occurs and the total number of times the landing page is closed. It can be seen that the landing page stay duration distributions of the digital resources pushed by the second object in 3a, 3b, and 3c are clearly inconsistent with the landing page reference stay duration distribution.
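The ordinate described above can be written as a formula. The symbols are introduced here for illustration only: n(t) denotes the number of landing page closes whose stay duration equals t, and N denotes the total number of landing page closes.

```latex
\mathrm{PV\ proportion}(t) = \frac{n(t)}{N},
\qquad N = \sum_{t'} n(t'),
\qquad \sum_{t} \mathrm{PV\ proportion}(t) = 1
```

As a hypothetical numerical example, if a landing page is closed 100 times in total and 20 of those closes occur after a stay of 3 seconds, the PV proportion at 3 seconds is 20/100 = 0.2.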
The landing page stay duration distributions of different second objects may differ. Manually comparing the landing page stay duration distribution of each second object with the landing page reference stay duration distribution is time-consuming and laborious, does not allow the second objects to be automatically classified and aggregated, and makes abnormal second objects difficult to find.
Therefore, in order to accurately determine whether the operation behavior of the first object on the digital resource is cheating, and hence whether the second object has cheating behavior, so as to reduce the loss of the user or service provider of the digital resource, the trigger durations consumed by a plurality of first objects to respectively trigger the first digital resource from the first state and change it to the second state can be detected. From these it can be determined whether the trigger durations are abnormal, and thereby whether the second object is abnormal, for example, whether cheating behavior exists. If the second object is abnormal, relevant processing such as prompting, warning, or click invalidation can be performed, thereby reducing the loss of the distributor or service provider of the digital resource.
In the embodiment of the application, the first digital resource can be, for example, an advertisement resource. The second object can push the advertisement resource, and the first object can view the advertisement resource through the second object, click on it, and further purchase the product corresponding to it. The first object may trigger the first digital resource and change it from one state to another. For example, the first digital resource may have four states: an exposure state, a selected state, a state in which the landing page to which it belongs is opened, and a state in which the landing page to which it belongs is closed after being opened. That is, when the first digital resource is exposed, it is in the exposure state; when it is detected that the first object clicks on the first digital resource, the first digital resource is in the selected state; when the first digital resource has been loaded in the second object, it is in the state in which the landing page to which it belongs is opened, and the first object can then purchase the product corresponding to the first digital resource. When it is detected that the first object triggers a closing instruction for the first digital resource, the first digital resource is in the state in which the landing page to which it belongs is closed after being opened.
In one embodiment, there are N first states and N second states, where N is a positive integer, and one trigger duration is generated when a first object triggers the first digital resource from one first state and changes it to one second state. The N first states include at least one of: the first digital resource is in the exposure state, the first digital resource is in the selected state, or the first digital resource is in the state in which the landing page to which it belongs is opened. The N second states include at least one of: the first digital resource is in the selected state, the first digital resource is in the state in which the landing page to which it belongs is opened, or the first digital resource is in the state in which the landing page to which it belongs is closed after being opened. If a first state is that the first digital resource is in the exposure state, the corresponding second state is that the first digital resource is in the selected state; if a first state is that the first digital resource is in the selected state, the corresponding second state is that the first digital resource is in the state in which the landing page to which it belongs is opened; if a first state is that the first digital resource is in the state in which the landing page to which it belongs is opened, the corresponding second state is that the first digital resource is in the state in which the landing page to which it belongs is closed after being opened.
That is, triggering the first digital resource from a first state and changing it to a second state covers three types of state change: the exposure state changes to the selected state, the selected state changes to the state in which the landing page is opened, and the state in which the landing page is opened changes to the state in which the landing page is closed after being opened.
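The pairing of first and second states described above can be written down as a small lookup table. The following is only an illustrative sketch: the enum `ResourceState`, the mapping `STATE_CHANGES`, and the helper `second_state_for` are hypothetical names that do not appear in the application.

```python
from enum import Enum


class ResourceState(Enum):
    """The four states of a first digital resource (names are illustrative)."""
    EXPOSED = "exposure state"
    SELECTED = "selected state"
    LANDING_PAGE_OPENED = "landing page opened"
    LANDING_PAGE_CLOSED = "landing page closed after being opened"


# Each first state corresponds to exactly one second state,
# which yields the three state changes described in the text.
STATE_CHANGES = {
    ResourceState.EXPOSED: ResourceState.SELECTED,
    ResourceState.SELECTED: ResourceState.LANDING_PAGE_OPENED,
    ResourceState.LANDING_PAGE_OPENED: ResourceState.LANDING_PAGE_CLOSED,
}


def second_state_for(first_state: ResourceState) -> ResourceState:
    """Return the second state paired with the given first state."""
    return STATE_CHANGES[first_state]
```

The closed state never appears as a first state, so looking it up raises `KeyError`, which matches the text: no state change starts from a closed landing page.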
In the embodiment of the application, the time at which the first digital resource is exposed, the time at which the first object clicks the first digital resource, the time at which the landing page to which the first digital resource belongs is opened, and the time at which that landing page is closed after being opened can be respectively obtained. The trigger duration consumed by the first digital resource in changing from the exposure state to the selected state is then determined based on the time difference between the time at which the first object clicks the first digital resource and the time at which the first digital resource is exposed. Optionally, if the first object clicks on the first digital resource multiple times within a short time, this trigger duration may be determined based on the time difference between the time of the first click and the time at which the first digital resource is exposed. Further, the trigger duration consumed in changing from the selected state to the state in which the landing page is opened may be determined based on the time difference between the time at which the landing page to which the first digital resource belongs is opened and the time at which the first object clicks the first digital resource. Further, the trigger duration consumed in changing from the state in which the landing page is opened to the state in which the landing page is closed after being opened may be determined based on the time difference between the time at which the landing page is closed after being opened and the time at which the landing page is opened.
In an alternative implementation manner, when the number of the computer devices is multiple, one computer device may access the client, and the client may be installed on the terminal device, then the client on the terminal device may report the trigger behavior data of the first object for the first digital resource, and then the computer device may acquire the trigger behavior data and further store the trigger behavior data in the database. The other computer device may be configured to perform real-time calculation on the trigger behavior data, and the computer device for real-time calculation may be connected to the computer device of the access client, so as to acquire the trigger behavior data from the database to perform calculation, and identify whether the trigger results of the plurality of first objects for the first digital resources are abnormal.
The reporting mode may include, but is not limited to, real-time reporting, periodic reporting, or reporting when a data request sent by the computer device is received. The trigger behavior data may include, but is not limited to, an identification of each first object, a time the first object triggered the first advertising resource, a scene the first object triggered the first advertising resource, an identification of the second object, and so forth. The time at which the first object triggers the first digital resource may include a time at which the first advertising resource is exposed (i.e., a time at which the first digital resource begins to be displayed in the display interface), a time at which the first object clicks on the first digital resource, a time at which the landing page to which the first digital resource belongs is opened, a time at which the landing page to which the first digital resource belongs is closed after being opened, and so on. It is to be understood that the time referred to herein may refer to a point in time. The identification of the first object may include, but is not limited to, an identification number (Identity, ID) of the first object or an internet protocol address (Internet Protocol, IP) of the terminal device used by the first object. The first object triggering the scene of the first advertisement resource may include triggering the first advertisement resource through a preset operation step or triggering the first advertisement resource through a shortcut operation. The triggering of the first advertisement resource by the preset operation step may, for example, refer to triggering a sliding operation by entering the second object, where the triggering is performed when the first advertisement resource is displayed in the display interface corresponding to the sliding operation. 
The triggering of the first advertisement resource by the shortcut operation may, for example, refer to directly entering a display interface corresponding to the first advertisement resource by the shortcut, and triggering the first advertisement resource in the display interface. The identification of the second object may include, but is not limited to, an ID of the second object.
Illustratively, the second object includes a public number or applet, and when the second object is a public number, the digital resource may be pushed at any location in the public number article, where any location may include, but is not limited to, a start location of the article, a middle location of the article, an end location of the article, or a location after the article ends, etc. If the digital resource is pushed at the end position of the public number article, the first object can expose the digital resource when browsing the public number article, so that the digital resource can be clicked. When the second object is an applet, the digital resource may be exposed before the program function in the applet is started, or exposed during the start, or exposed after the start is finished, etc., which is not limited by the embodiment of the present application.
Optionally, the identification of the second object, the time t1 at which the first digital resource pushed by the second object is exposed, the time t2 at which the first object clicks the first digital resource, the time t3 at which the landing page to which the first digital resource belongs is opened, the time t4 at which that landing page is closed, and so on may be acquired from a database storing the trigger behavior data of the plurality of first objects for the first digital resource. It can then be determined which first objects triggered the first digital resource pushed by the second object and which state changes they triggered, so that the trigger durations consumed by the plurality of first objects to trigger the first digital resource from the first state and change it to the second state can be obtained.
For example, based on the time difference between the time t2 at which the first object clicks the first digital resource and the time t1 at which the first digital resource is exposed, the trigger duration consumed in changing from the exposure state to the selected state can be determined as t2-t1. Based on the time difference between the time t3 at which the landing page to which the first digital resource belongs is opened and the time t2 at which the first object clicks the first digital resource, the trigger duration consumed in changing from the selected state to the state in which the landing page is opened can be determined as t3-t2. Based on the time difference between the time t4 at which the landing page to which the first digital resource belongs is closed and the time t3 at which it is opened, the trigger duration consumed in changing from the state in which the landing page is opened to the state in which it is closed after being opened can be determined as t4-t3.
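As a minimal sketch of the three differences t2-t1, t3-t2, and t4-t3, the following assumes that timestamps are stored as seconds in a record per trigger. The class `TriggerRecord`, its field names, and the function `trigger_durations` are hypothetical. The keys `exp2clk`, `clk2open`, and `open2close` reuse the distribution names that appear later in the description. Skipping a state change whose timestamps are missing or out of order is an assumption made here, not something the application prescribes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TriggerRecord:
    """One first object's trigger behavior for one first digital resource."""
    first_object_id: str
    second_object_id: str
    t1: Optional[float] = None  # time the resource is exposed
    t2: Optional[float] = None  # time the first object clicks the resource
    t3: Optional[float] = None  # time the landing page is opened
    t4: Optional[float] = None  # time the landing page is closed


def trigger_durations(record: TriggerRecord) -> Dict[str, float]:
    """Compute t2-t1, t3-t2 and t4-t3 where both timestamps are available."""
    pairs = {
        "exp2clk": (record.t1, record.t2),
        "clk2open": (record.t2, record.t3),
        "open2close": (record.t3, record.t4),
    }
    durations = {}
    for name, (start, end) in pairs.items():
        # Assumption: a state change with a missing or out-of-order
        # timestamp is skipped rather than guessed.
        if start is not None and end is not None and end >= start:
            durations[name] = end - start
    return durations
```

For a record with t1=100, t2=101, t3=103 and no close time, this yields one duration of 1 second for `exp2clk` and one of 2 seconds for `clk2open`, with no entry for `open2close`.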
It may be understood that in the embodiment of the present application, the second object may push a plurality of first digital resources. The first object may perform a triggering operation on one or more of them, and may also perform a triggering operation on the same first digital resource multiple times. The trigger duration consumed each time the first object triggers the first digital resource from the first state to the second state may differ, and each trigger duration is determined according to the actual situation. For example, the first object may close the first digital resource immediately after clicking it, or may close the landing page to which the first digital resource belongs once that landing page has opened after the click; likewise, the first object may close the landing page immediately after it opens, or close it after browsing for a period of time. That is, for the same type of state change of the first digital resource, the trigger durations of the individual operations of the first object may not be equal.
S102, clustering the same trigger duration in the trigger durations of the plurality of first objects for the first digital resources to obtain a plurality of trigger duration groups.
In the embodiment of the present application, the trigger duration of the first object for the first digital resource refers to the trigger duration consumed when the first object triggers the first digital resource from the first state and changes it to the second state. Because some of the acquired trigger durations of the plurality of first objects for the first digital resource may be identical, the identical trigger durations may be clustered to obtain a plurality of trigger duration groups. The trigger durations within each trigger duration group are the same, and the trigger durations in any two trigger duration groups are different. Clustering may refer to merging identical trigger durations into the same trigger duration group.
In an optional implementation manner, a plurality of initial trigger duration groups may be preset, each initial trigger duration group is associated with one trigger duration, and when the same trigger duration in the trigger durations of the plurality of first objects for the first digital resources is obtained, the trigger duration associated with each initial trigger duration group may be directly searched, so that the trigger durations of the plurality of first objects for the first digital resources are added to the corresponding initial trigger duration groups, and a plurality of trigger duration groups are obtained. Or, the trigger time lengths of the plurality of first objects for the first digital resource can be divided according to the time length ranges, and the trigger time lengths belonging to the same time length range are added to the same trigger time length group, so that a plurality of trigger time length groups are obtained. The embodiment of the application can also perform clustering processing on the plurality of trigger time lengths in other modes to obtain a plurality of trigger time length groups, and the embodiment of the application is not limited to the above.
Taking the change from the exposure state to the selected state as an example, the trigger durations of the plurality of first objects for the first digital resource under this state change can be acquired, and identical trigger durations can be merged. For example, if the trigger durations of a plurality of such state changes are all 1 second, these trigger durations may be merged to obtain a trigger duration group whose trigger duration is 1 second. Likewise, if the trigger durations of a plurality of such state changes are all 2 seconds, they may be merged to obtain a trigger duration group whose trigger duration is 2 seconds. In this way, each trigger duration is assigned to its corresponding trigger duration group.
For example, the trigger durations may be divided at a granularity of 1 second with a maximum trigger duration of 120 seconds. The trigger durations of 1 second are then merged, the trigger durations of 2 seconds are merged, the trigger durations of 3 seconds are merged, and so on until the trigger durations of 120 seconds are merged, so as to obtain 120 trigger duration groups. Further, once the plurality of trigger duration groups corresponding to the change from the exposure state to the selected state have been obtained, the number of trigger durations in each of these groups can be counted. For example, the counts for this state change can be expressed as: (second object, exposure state changed to selected state, trigger time period 0 seconds, 50 times), (second object, exposure state changed to selected state, trigger time period 1 seconds, 30 times), (second object, exposure state changed to selected state, trigger time period 2 seconds, 20 times), … …, (second object, exposure state changed to selected state, trigger time period 120 seconds, 10 times).
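The grouping and counting step can be sketched with a counter keyed by duration. The function name `group_durations` and its parameters are hypothetical. Two choices below are assumptions rather than part of the described method: fractional durations are floored to the granularity, and durations above the maximum are placed in the last group.

```python
from collections import Counter
from typing import Iterable


def group_durations(durations: Iterable[float],
                    granularity: int = 1,
                    max_duration: int = 120) -> Counter:
    """Merge equal trigger durations into groups and count each group.

    Keys are durations in seconds (multiples of `granularity`);
    values are the number of trigger durations in that group.
    """
    counts = Counter()
    for d in durations:
        if d < 0:
            continue  # assumption: negative durations are invalid and dropped
        # Assumption: floor to the granularity, clamp at max_duration.
        bucket = min(int(d // granularity) * granularity, max_duration)
        counts[bucket] += 1
    return counts
```

Calling `group_durations([0.2, 0.9, 1.0, 2.5, 300])` produces two durations in the 0-second group and one each in the 1-second, 2-second, and 120-second groups.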
It is to be appreciated that the number of trigger durations in the set of trigger durations can be utilized to reflect the probability of occurrence of the trigger durations. The larger the number of trigger durations in the trigger duration group, the larger the probability of occurrence of such trigger durations. The probability distribution for representing the occurrence probability of each trigger duration can then be determined based on the number of trigger durations in the plurality of trigger duration groups, and then whether the trigger results of the plurality of first objects for the first digital resource are abnormal can be determined based on the probability distribution.
S103, based on the number of the trigger time durations contained in each trigger time duration group, an evaluation probability distribution is generated.
In the embodiment of the present application, the state change triggered by the first object on the first digital resource is of three types: the exposure state changes to the selected state, the selected state changes to the state in which the landing page is opened, and the state in which the landing page is opened changes to the state in which the landing page is closed after being opened. For convenience of description, these are referred to as the first state change, the second state change, and the third state change, respectively. For any one of the three state changes, the trigger durations of the plurality of first objects under that state change can be obtained and clustered to obtain a plurality of trigger duration groups under that state change, and an evaluation probability distribution is generated based on the number of trigger durations contained in each of those groups. The evaluation probability distribution is used for evaluating the state changes triggered by the plurality of first objects on the first digital resource; that is, each state change may correspond to one evaluation probability distribution, and each evaluation probability distribution is used for evaluating its corresponding state change.
Alternatively, the evaluation probability distribution may be generated based on a ratio between the number of trigger durations contained in each trigger duration group under such a state change and the total number of trigger durations of the plurality of first objects for the first digital resource under such a state change. For example, the ratio between the number of trigger durations included in each trigger duration group in the first state change and the total number of trigger durations of the plurality of first objects for the first digital resource in the first state change may be obtained, so as to generate an evaluation probability distribution corresponding to the first state change. Or, the ratio between the number of the trigger durations contained in each trigger duration group under the second state change and the total number of the trigger durations of the plurality of first objects aiming at the first digital resource under the second state change can be obtained, so as to generate an evaluation probability distribution corresponding to the second state change. Or, the ratio between the number of the trigger durations included in each trigger duration group under the third state change and the total number of the trigger durations of the plurality of first objects for the first digital resource under the third state change may be obtained, so as to generate an evaluation probability distribution corresponding to the third state change.
Take the generation of the evaluation probability distribution corresponding to the first state change as an example. Suppose the total number of trigger durations of the plurality of first objects for the first digital resource under the first state change is pv1=200, and the number of trigger durations contained in each trigger duration group under the first state change is: (second object, exposure state changed to selected state, trigger time period 0 seconds, 50 times), (second object, exposure state changed to selected state, trigger time period 1 seconds, 30 times), (second object, exposure state changed to selected state, trigger time period 2 seconds, 20 times), … …, (second object, exposure state changed to selected state, trigger time period 120 seconds, 10 times). The generated evaluation probability distribution may then be exp2clk (i) = (0.25,0.15,0.1,0,0,0, …, 0.05). That is, the evaluation probability distribution corresponding to the first state change (the exposure state changing to the selected state) is generated based on the ratio between the number of trigger durations contained in each trigger duration group under the first state change and the total number of trigger durations of the plurality of first objects for the first digital resource under the first state change.
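The ratio described above can be sketched as follows. The function `evaluation_distribution` and its parameters are hypothetical names. Only the four groups that the text lists explicitly are supplied, and the total pv1=200 is passed in separately; the elided groups are left out, so the values produced here do not sum to 1.

```python
from typing import List, Mapping, Optional


def evaluation_distribution(group_counts: Mapping[int, int],
                            total: Optional[int] = None,
                            max_duration: int = 120) -> List[float]:
    """Return count/total for every duration from 0 to max_duration seconds.

    `total` is the total number of trigger durations under the state change
    (pv1 in the text); if omitted, the sum of the supplied counts is used.
    """
    if total is None:
        total = sum(group_counts.values())
    if total <= 0:
        raise ValueError("total number of trigger durations must be positive")
    return [group_counts.get(i, 0) / total for i in range(max_duration + 1)]


# Groups stated in the text for the first state change; others are elided there.
listed_counts = {0: 50, 1: 30, 2: 20, 120: 10}
exp2clk = evaluation_distribution(listed_counts, total=200)
print(exp2clk[0], exp2clk[1], exp2clk[2], exp2clk[120])
```

The four printed entries correspond to the values 0.25, 0.15, 0.1, and 0.05 given in the text for the 0-second, 1-second, 2-second, and 120-second groups.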
It can be understood that after the first object clicks on the first digital resource, that is, after the first digital resource is in the selected state, there is a delay while the material of the first digital resource is loaded before the landing page to which the first digital resource belongs is opened. In addition, because data may be lost at each state change, for example, the client may report data late, miss reports, or report data falsely, the number of times the first object clicks the first digital resource is larger than the number of times the landing page is opened, and the number of times the landing page is opened is larger than the number of times it is closed. For example, if the first digital resource is clicked 200 times, its landing page is opened 180 times, and its landing page is closed 160 times, then the total number of trigger durations of the plurality of first objects for the first digital resource is pv1=200 for the change from the exposure state to the selected state, pv2=180 for the change from the selected state to the state in which the landing page is opened, and pv3=160 for the change from the state in which the landing page is opened to the state in which it is closed after being opened.
It will be appreciated that the evaluation probability distribution corresponding to the second state change and the evaluation probability distribution corresponding to the third state change may be generated with reference to the above-described process of generating the evaluation probability distribution corresponding to the first state change. For example, the evaluation probability distribution corresponding to the second state change may be clk2open (i) = (0.35,0.02,0.15,0,0,0, …, 0.01), where the total number of trigger durations corresponding to the second state change may be pv2=180. The evaluation probability distribution corresponding to the third state change may be open2close (i) = (0.15,0.05,0.10,0,0,0, …, 0.1), where the total number of trigger durations corresponding to the third state change may be pv3=160.
S104, acquiring a reference probability distribution, and identifying triggering results of a plurality of first objects aiming at the first digital resources based on the difference between the evaluation probability distribution and the reference probability distribution.
In the embodiment of the application, the reference probability distribution can be generated in advance and stored in a probability distribution database, so that it can be obtained from the probability distribution database. Alternatively, the reference probability distribution may be generated after the evaluation probability distribution is generated, which is not limited by the embodiment of the present application. The reference probability distribution may include, but is not limited to, a probability distribution corresponding to the above-mentioned composite index. The reference probability distribution may be a preset probability distribution, for example, a default probability distribution, or a probability distribution calculated from the second digital resource pushed by the second object, which is not limited in the embodiment of the present application. The three state changes corresponding to the first digital resource may share the same reference probability distribution, or each state change may use its own reference probability distribution, i.e. three reference probability distributions corresponding to the three state changes. The reference probability distribution serves as the baseline for evaluating the state changes triggered by the plurality of first objects on the first digital resource, and the trigger results of the plurality of first objects for the first digital resource can be determined by comparing the difference between the reference probability distribution and the evaluation probability distribution. The trigger result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is normal or abnormal. 
If the state change triggered by the plurality of first objects on the first digital resource is normal, it can be regarded as the triggering behavior of normal users on the first digital resource, that is, the second object has no cheating behavior. If the state change triggered by the plurality of first objects on the first digital resource is abnormal, it can be regarded as the triggering behavior of abnormal users on the first digital resource, that is, the second object has cheating behavior.
In one embodiment, the first digital resource may be pushed by the second object during a first period of time, and the second digital resource may refer to a digital resource pushed by the second object during a second period of time, the second period of time being a historical period of time prior to the first period of time, that is, the digital resource pushed by the second object during the historical period of time may be obtained, thereby generating the reference probability distribution based on the operation of the plurality of first objects on the digital resource. Specifically, a triggering duration consumed by the plurality of first objects to trigger the second digital resource from the first state and change the second digital resource to the second state respectively can be obtained; clustering the same trigger duration in the trigger durations of the plurality of first objects for the second digital resources to obtain a plurality of reference trigger duration groups; a reference probability distribution is generated based on the number of trigger durations contained by each reference trigger duration group.
The second digital resource and the first digital resource may be the same or different, and there may be a plurality of second digital resources. The second period may refer to the day before the first period, the week before it, the month before it, and so on. The second digital resource may also include three state changes, namely, the exposure state changing to the selected state (first state change), the selected state changing to the state in which the landing page is opened (second state change), and the state in which the landing page is opened changing to the state in which the landing page is closed after being opened (third state change). In the embodiment of the application, the plurality of reference trigger duration groups can be acquired in the same way as the plurality of trigger duration groups, and the reference probability distribution can be generated in the same way as the evaluation probability distribution, which is not described in detail again.
Illustratively, the reference probability distribution corresponding to the change of the exposure state to the selected state may be expressed as: exp2clk_dp (i) = (0.2,0.1,0.1,0,0,0, …,0.1, …, 0), the reference probability distribution corresponding to the state where the selected state changes to the state where the belonging landing page is opened can be expressed as: clk2open_dp (i) = (0.2,0.1,0.1,0,0,0, …,0.1, …, 0), the reference probability distribution corresponding to the state of the belonging landing page being opened changing to the state of the belonging landing page being closed after being opened can be expressed as: open2close_dp (i) = (0.2,0.1,0.1,0,0,0, …,0.1, …, 0).
It is understood that a first state and a corresponding second state have an evaluation probability distribution and a reference probability distribution corresponding to the evaluation probability distribution, and N first states and N second states have N evaluation probability distributions and reference probability distributions corresponding to the N evaluation probability distributions, respectively.
That is, since the second digital resource includes three state changes, a reference probability distribution can be generated for each of them, namely the reference probability distribution corresponding to the first state change, the reference probability distribution corresponding to the second state change, and the reference probability distribution corresponding to the third state change. When the trigger results of the plurality of first objects for the first digital resource are identified based on the difference between the evaluation probability distribution and the reference probability distribution, the trigger results may be identified based on the difference between the evaluation probability distribution corresponding to the first state change and the reference probability distribution corresponding to the first state change; based on the difference between the evaluation probability distribution corresponding to the second state change and the reference probability distribution corresponding to the second state change; or based on the difference between the evaluation probability distribution corresponding to the third state change and the reference probability distribution corresponding to the third state change.
In one embodiment, identifying the trigger result for the first digital resource for the plurality of first objects may include: obtaining the similarity between each evaluation probability distribution and the corresponding reference probability distribution; summing N similarities between the N estimated probability distributions and the corresponding N reference probability distributions to obtain comprehensive similarities between the N estimated probability distributions and the N reference probability distributions; if the comprehensive similarity is smaller than or equal to a first similarity threshold, determining that the triggering result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is abnormal; if the integrated similarity is greater than or equal to a second similarity threshold, determining that the triggering result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is normal; wherein the first similarity threshold is less than the second similarity threshold.
Wherein the similarity between the estimated probability distribution and the reference probability distribution may be used to reflect the difference between the probability distributions, the greater the similarity between the estimated probability distribution and the reference probability distribution, the smaller the difference between the estimated probability distribution and the reference probability distribution. The smaller the similarity between the estimated probability distribution and the reference probability distribution, the larger the difference between the estimated probability distribution and the reference probability distribution. Alternatively, the similarity calculation method may include, but is not limited to, KL divergence (Kullback-Leibler divergence), JS divergence (Jensen-Shannon divergence), cross Entropy (Cross Entropy), wasserstein distance, and the like.
Taking the KL divergence as an example of the similarity calculation method, obtaining the similarity between the evaluation probability distribution and the reference probability distribution means calculating the KL distance between the two distributions. The KL divergence is calculated as shown in formula (1-1):
$$\mathrm{KL}(Q\,\|\,P) = H_P(Q) - H(Q) = \sum_{x} Q(x)\,\log\frac{Q(x)}{P(x)} \tag{1-1}$$
wherein $H_P(Q) - H(Q)$ represents the KL distance between the evaluation probability distribution and the reference probability distribution, $H_P(Q)$ being the cross entropy of the reference probability distribution relative to the evaluation probability distribution and $H(Q)$ being the entropy of the reference probability distribution; $P(x)$ represents a probability value in the evaluation probability distribution, $Q(x)$ represents a probability value in the reference probability distribution, $x$ ranges over the values of the probability distribution, and $\mathrm{KL}(Q\,\|\,P)$ represents the KL distance between the evaluation probability distribution and the reference probability distribution.
For example, if the evaluation probability distribution corresponding to the third state change is P = (0.6, 0.4) and the reference probability distribution corresponding to the third state change is Q = (0.7, 0.3), substituting them into formula (1-1) gives the KL distance KL3(i) = 0.7×log(0.7/0.6) + 0.3×log(0.3/0.4). The KL distance may be used to measure the distance between two probability distributions: the larger the KL distance, the greater the difference between the two probability distributions. If the two probability distributions are identical, the KL distance between them is 0.
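The calculation in this example may be sketched as follows. The sketch assumes the natural logarithm (the text does not state the base) and clamps evaluation probabilities to a small value `eps` to avoid division by zero; both choices, as well as the function name `kl_distance`, are assumptions made for illustration.

```python
import math

def kl_distance(evaluation, reference, eps=1e-12):
    """KL(Q || P) with Q = reference and P = evaluation, as in the worked example."""
    if len(evaluation) != len(reference):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p, q in zip(evaluation, reference):
        if q == 0.0:
            continue                       # a zero reference probability contributes nothing
        total += q * math.log(q / max(p, eps))   # eps clamp is an assumption of this sketch
    return total

kl3 = kl_distance(evaluation=[0.6, 0.4], reference=[0.7, 0.3])
print(round(kl3, 4))                              # 0.0216
print(kl_distance([0.5, 0.5], [0.5, 0.5]))        # 0.0 for identical distributions
```

With a different logarithm base the value is scaled by a constant factor, so any thresholds would need to be chosen for the same base.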
N similarities between the N evaluation probability distributions and the N reference probability distributions corresponding to the N evaluation probability distributions, namely, the similarities between the evaluation probability distribution corresponding to the first state change and the reference probability distribution corresponding to the first state change, the similarities between the evaluation probability distribution corresponding to the second state change and the reference probability distribution corresponding to the second state change, and the similarities between the evaluation probability distribution corresponding to the third state change and the reference probability distribution corresponding to the third state change can be calculated respectively through the formula.
In one embodiment, if the state change triggered by the plurality of first objects for the first digital resource only includes any one of the three state changes, a difference between the evaluation probability distribution corresponding to the state change and the reference probability distribution corresponding to the state change is obtained, and a trigger result of the plurality of first objects for the first digital resource may be identified. That is, the trigger results of the plurality of first objects for the first digital resource can be identified from the difference between the evaluation probability distribution corresponding to any one of the three state changes and the reference probability distribution corresponding to such a state change. For example, if the similarity between the estimated probability distribution corresponding to any one of the three state changes and the reference probability distribution corresponding to such state change is less than or equal to the first similarity threshold, determining that the trigger result is used to indicate that the state change triggered by the plurality of first objects on the first digital resource is abnormal. If the similarity between the estimated probability distribution corresponding to any one of the three state changes and the reference probability distribution corresponding to the state change is greater than a second similarity threshold, determining that the trigger result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is normal.
Wherein the first similarity threshold is less than the second similarity threshold. The similarity being less than or equal to the first similarity threshold indicates that there is a large difference between the two probability distributions, i.e. a large difference from the reference probability distribution, and indicates that the state change of the plurality of first objects triggered for the first digital resource is abnormal. The similarity being greater than the second similarity threshold indicates that the two probability distributions are substantially the same, i.e., the same or similar to the reference probability distribution, and that the state change triggered by the plurality of first objects for the first digital resource is normal.
For example, if the state change triggered by the plurality of first objects for the first digital resource is the change from the exposure state to the selected state, the similarity between the evaluation probability distribution corresponding to that state change and the reference probability distribution corresponding to that state change may be obtained. If the similarity is less than or equal to the first similarity threshold, the trigger result is determined to indicate that the state change triggered by the plurality of first objects on the first digital resource is abnormal. If the similarity is greater than the second similarity threshold, the trigger result is determined to indicate that the state change triggered by the plurality of first objects on the first digital resource is normal.
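The two-threshold comparison may be sketched as follows. The sketch assumes that the score passed in is a similarity for which a larger value means the two distributions are more alike; if a KL distance is used, it would first have to be converted into such a score, and how that is done is not specified here. A score equal to the second threshold is treated as normal, and a score strictly between the thresholds is reported as undecided; the function name, the labels, and the threshold values are hypothetical.

```python
def classify_trigger_result(similarity, first_threshold, second_threshold):
    """Map a similarity score to 'abnormal', 'normal', or 'undecided'."""
    if first_threshold >= second_threshold:
        raise ValueError("the first threshold must be less than the second threshold")
    if similarity <= first_threshold:
        return "abnormal"      # large difference from the reference distribution
    if similarity >= second_threshold:
        return "normal"        # same as or close to the reference distribution
    return "undecided"         # between the two thresholds; needs further judgment

# Hypothetical thresholds for illustration only.
for score in (0.2, 0.5, 0.9):
    print(score, classify_trigger_result(score, first_threshold=0.3, second_threshold=0.7))
```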
In another embodiment, if the state changes triggered by the plurality of first objects for the first digital resource include at least two of three state changes, a difference between the estimated probability distribution corresponding to the at least two state changes and the reference probability distribution corresponding to the at least two state changes is obtained, and a trigger result of the plurality of first objects for the first digital resource is identified. A state change corresponds to an estimated probability distribution, and a state change corresponds to a reference probability distribution.
Specifically, at least two similarities between the at least two estimated probability distributions and the corresponding at least two reference probability distributions may be summed to obtain a comprehensive similarity between the at least two estimated probability distributions and the at least two reference probability distributions; if the comprehensive similarity is smaller than or equal to a first similarity threshold, determining that the triggering result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is abnormal; if the integrated similarity is greater than or equal to the second similarity threshold, determining that the trigger result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is normal.
The comprehensive similarity may be used to indicate the similarity between the at least two evaluation probability distributions and the corresponding at least two reference probability distributions. When the plurality of first objects trigger at least two state changes for the first digital resource, the comprehensive similarity obtained by summing the similarities respectively corresponding to the at least two state changes can reflect, from at least two dimensions, whether the state changes triggered by the plurality of first objects on the first digital resource are normal.
In one embodiment, if the state changes triggered by the plurality of first objects for the first digital resource include at least two of the three state changes, the similarities may be weighted by the total number of trigger durations of the plurality of first objects for the first digital resource under each of the at least two state changes, so as to determine the comprehensive similarity between the at least two evaluation probability distributions and the at least two reference probability distributions. Specifically, evaluation weights respectively corresponding to the N similarities may be obtained, where the evaluation weight of any one of the N similarities refers to the total number of times that the first digital resource is triggered by the plurality of first objects from the corresponding first state and changed to the corresponding second state; the N similarities are weighted based on their evaluation weights to obtain N weighted similarities; and the N weighted similarities are added to obtain the comprehensive similarity.
The total number of the triggering time durations of the first objects for the first digital resource corresponding to the state that the exposure state changes to the selected state, the total number of the triggering time durations of the first objects for the first digital resource corresponding to the state that the selected state changes to the opened state of the landing page, and the total number of the triggering time durations of the first objects for the first digital resource corresponding to the state that the landing page is opened and the closed state of the landing page are all different. Therefore, the total number of the triggering time periods of the plurality of first objects corresponding to each state change aiming at the first digital resource is used for carrying out weighted summation on the similarity between the evaluation probability distribution corresponding to each state change and the reference probability distribution, the similarity can be adjusted by combining the total number corresponding to each state change, and finally the obtained comprehensive similarity can more accurately reflect the triggering result of the plurality of first objects aiming at the first digital resource, so that the abnormal recognition accuracy is improved.
Further optionally, after the N weighted similarities are added, the comprehensive similarity may be obtained as the ratio of that sum to the sum of the evaluation weights corresponding to the N similarities. For example, if N is equal to 3 and the N weighted similarities are S1, S2, and S3, their sum is S1+S2+S3; if the sum of the evaluation weights corresponding to the N similarities is pv1+pv2+pv3, the comprehensive similarity may be (S1+S2+S3)/(pv1+pv2+pv3).
Taking as an example the case in which the state changes triggered by the plurality of first objects for the first digital resource include all three state changes, N is equal to 3. The KL distance between the evaluation probability distribution and the reference probability distribution corresponding to the first state change is KL1(i), and the total number of trigger durations of the plurality of first objects for the first digital resource under the first state change is pv1; the KL distance corresponding to the second state change is KL2(i), and the total number of trigger durations under the second state change is pv2; the KL distance corresponding to the third state change is KL3(i), and the total number of trigger durations under the third state change is pv3. The comprehensive similarity can then be calculated according to formula (1-2):
$$\mathrm{KL}_{all}(i) = \frac{\mathrm{KL1}(i)\cdot pv1 + \mathrm{KL2}(i)\cdot pv2 + \mathrm{KL3}(i)\cdot pv3}{pv1 + pv2 + pv3} \tag{1-2}$$
wherein $\mathrm{KL}_{all}(i)$ represents the comprehensive similarity, KL1(i) represents the KL distance between the evaluation probability distribution corresponding to the first state change and the reference probability distribution corresponding to the first state change, KL2(i) represents the KL distance between the evaluation probability distribution corresponding to the second state change and the reference probability distribution corresponding to the second state change, KL3(i) represents the KL distance between the evaluation probability distribution corresponding to the third state change and the reference probability distribution corresponding to the third state change, and i represents the second object. pv1 represents the total number of trigger durations of the plurality of first objects for the first digital resource under the first state change, for example, the total number of times the digital resource is clicked. pv2 represents the total number of trigger durations of the plurality of first objects for the first digital resource under the second state change, for example, the total number of times the landing page is opened. pv3 represents the total number of trigger durations of the plurality of first objects for the first digital resource under the third state change, for example, the total number of times the landing page is closed after being opened.
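The weighted combination of formula (1-2) may be sketched as follows. The function name `comprehensive_similarity` and the sample KL distances and counts are hypothetical and serve only to show the arithmetic.

```python
def comprehensive_similarity(kl_values, weights):
    """Weighted average of per-state-change KL distances, following formula (1-2)."""
    if not kl_values or len(kl_values) != len(weights):
        raise ValueError("one weight is required for each KL distance")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the weights must sum to a positive number")
    return sum(k * w for k, w in zip(kl_values, weights)) / total_weight

# Hypothetical values: KL1, KL2, KL3 and the trigger counts pv1, pv2, pv3.
kl_all = comprehensive_similarity([0.02, 0.10, 0.30], [1000, 400, 100])
print(round(kl_all, 4))   # 0.06
```

Here the first state change dominates the result because it has the largest number of trigger durations, which is the intended effect of the weighting.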
That is, the trigger results of the plurality of first objects for the first digital resource can be identified from the difference between the evaluation probability distribution corresponding to any one of the three state changes and the reference probability distribution corresponding to such a state change. Or according to the difference between the evaluation probability distribution corresponding to at least two state changes in the three state changes and the reference probability distribution corresponding to the state changes, the triggering results of the plurality of first objects for the first digital resource are identified, which is not limited by the embodiment of the present application. The method has the advantages that through weighted summation of the similarity between the evaluation probability distribution corresponding to the multiple state changes and the reference probability distribution, the obtained comprehensive similarity can reflect whether the state changes triggered by the multiple first objects on the first digital resource are normal or not from three dimensions, namely, the three dimensions of the triggering time consumed from the exposure state change to the selected state, the triggering time consumed from the selected state change to the state of the opened landing page, and the triggering time consumed from the state of the opened landing page to the state of the closed landing page after the opened landing page are judged, and the accuracy of anomaly identification can be improved.
It can be understood that the embodiment of the application is described with respect to one second object. If a plurality of second objects exist, the state changes triggered by the plurality of first objects on the first digital resource pushed by each second object can be evaluated separately in the manner described above, so as to determine the trigger result for the first digital resource pushed by each second object and further determine whether each second object is abnormal, that is, whether each second object is a cheating object, thereby improving the accuracy of anomaly identification.
In one embodiment, the first digital resource may be an advertisement resource. If the trigger result indicates that the state change triggered by the plurality of first objects on the advertisement resource is abnormal, abnormal trigger prompt information for the advertisement resource is output; the abnormal trigger prompt information is used to prompt that the advertisement heat generated after the advertisement resource is triggered to change state is abnormal. The advertisement heat being abnormal may mean that the corresponding trigger of the first objects on the advertisement resource is abnormal, for example, that clicking to view the advertisement resource is abnormal, or that closing the detail page of the advertisement resource is abnormal. That is, advertisement heat anomalies may include, but are not limited to, abnormal clicks on the advertisement resource by the plurality of first objects, abnormal exposure of the advertisement resource, abnormal viewing of the advertisement resource, and abnormal closing of the detail page of the advertisement resource.
Optionally, an abnormal triggering prompt message may be sent to the management terminal, and the management terminal may further determine a state change triggered by the advertisement resource by the plurality of first objects, so as to determine whether the second object is abnormal. By outputting the abnormal triggering prompt information, the advertisement heat generated after the advertisement resource is triggered to change in state can be prompted to be abnormal, and then the advertisement heat can be screened or processed later, so that the loss of a user or a server of the digital resource is reduced.
Optionally, it may be further determined which first objects of the plurality of first objects are abnormal with respect to the advertisement resource triggered state change, and whether the second object is a cheating object is further determined according to a ratio between the number of abnormal first objects and the total number of the plurality of first objects. For example, if the ratio between the number of abnormal first objects and the total number of the plurality of first objects is greater than the ratio threshold, then the second object is determined to be a cheating object. If the ratio between the number of abnormal first objects and the total number of the plurality of first objects is less than or equal to the ratio threshold, determining that the second object is not a cheating object. Further, if it is determined that the second object is not a cheating object, the triggering results of the plurality of first objects on the first digital resources may also be adjusted. For example, the trigger result indicates that the state change of the plurality of first objects triggered by the first digital resource is abnormal, and the adjusted trigger result may indicate that the state change of the plurality of first objects triggered by the first digital resource is normal.
Further optionally, if the ratio between the number of abnormal first objects and the total number of the plurality of first objects is greater than the ratio threshold, then after the second object is determined to be a cheating object, the state changes triggered on the advertisement resource by the abnormal first objects may be deleted, for example by deleting the trigger behavior data of the abnormal first objects, so that the publisher or the server of the digital resource does not need to pay for the deleted trigger operations, which can reduce the loss.
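The ratio check and the subsequent deletion of abnormal trigger behavior data may be sketched as follows. The record layout (a dictionary with a `first_object_id` key), the function names, and the threshold value are hypothetical and are used only for illustration.

```python
def is_cheating_object(abnormal_count, total_count, ratio_threshold):
    """Return True when the share of abnormal first objects exceeds the ratio threshold."""
    if total_count <= 0 or not 0 <= abnormal_count <= total_count:
        raise ValueError("counts must satisfy 0 <= abnormal_count <= total_count, total_count > 0")
    return abnormal_count / total_count > ratio_threshold

def remove_abnormal_triggers(records, abnormal_ids):
    """Drop trigger behavior records that belong to abnormal first objects."""
    return [r for r in records if r["first_object_id"] not in abnormal_ids]

# Hypothetical trigger behavior data for one second object.
records = [
    {"first_object_id": "u1", "duration": 0},
    {"first_object_id": "u2", "duration": 3},
    {"first_object_id": "u3", "duration": 0},
]
abnormal_ids = {"u1", "u3"}

if is_cheating_object(len(abnormal_ids), total_count=3, ratio_threshold=0.5):
    records = remove_abnormal_triggers(records, abnormal_ids)
print(records)   # only the record of u2 remains
```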
In the embodiment of the application, the trigger durations consumed by the plurality of first objects to trigger the first digital resource from the first state and change it to the second state are acquired; identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource are clustered to obtain a plurality of trigger duration groups; an evaluation probability distribution is generated based on the number of trigger durations contained in each trigger duration group; and a reference probability distribution is acquired, and the trigger results of the plurality of first objects for the first digital resource are identified based on the difference between the evaluation probability distribution and the reference probability distribution. Since the evaluation probability distribution represents the trigger durations actually consumed by the first objects in triggering the first digital resource, comparing it with the reference probability distribution reveals the difference between the two, from which it can be determined whether the state change triggered by the first objects on the first digital resource is abnormal. Comparing the probability distributions generated from the two sets of trigger durations yields higher anomaly recognition accuracy than a judgment based only on quantities such as the click quantity of the digital resource, so the accuracy of identifying whether the state change triggered by an object on a digital resource is abnormal can be improved.
And when the triggering time length consumed by the plurality of first objects for triggering the first digital resources from the first state and changing to the second state respectively is acquired, corresponding evaluation probability distribution can be automatically generated, whether state change triggered by the digital resources is abnormal or not can be automatically identified based on the difference between the evaluation probability distribution and the reference probability distribution, cost can be saved, and data identification efficiency is improved.
Optionally, referring to fig. 4, fig. 4 is a flow chart of another data identification method according to an embodiment of the present application. The data identification method can be applied to computer equipment; as shown in fig. 4, the data identification method includes, but is not limited to, the following steps:
S201, acquiring the triggering time consumed by the plurality of first objects to trigger the first digital resource from the first state and change the first digital resource to the second state respectively.
S202, clustering is carried out on the same trigger duration in the trigger durations of the plurality of first objects aiming at the first digital resources, so that a plurality of trigger duration groups are obtained.
S203, based on the number of the trigger time durations contained in each trigger time duration group, an evaluation probability distribution is generated.
In the embodiment of the present application, the specific implementation manner of step S201 to step S203 may refer to the implementation manner of step S101 to step S103, and will not be described herein.
S204, acquiring a reference probability distribution, and taking the second object as a pending object if the similarity between the evaluation probability distribution and the reference probability distribution is larger than a first similarity threshold and smaller than a second similarity threshold.
In the embodiment of the present application, the method for obtaining the reference probability distribution may refer to the method for obtaining the reference probability distribution in step S104, which is not described herein.
Wherein, there can be a plurality of pending objects, one pending object has a corresponding evaluation probability distribution, one pending object has a first digital resource pushed; the first digital resource is a digital resource pushed by the second object.
In one embodiment, if the number of the second objects is multiple, each second object may push the first digital resource, and the first digital resource pushed by each second object may be the same or different, for each second object in the second objects, a trigger duration consumed by the multiple first objects to trigger the first digital resource pushed by each second object from the first state and change to the second state may be obtained; clustering processing is carried out on the trigger time length corresponding to each second object respectively to obtain a plurality of trigger time length groups corresponding to each second object, and further evaluation probability distribution corresponding to each second object is generated; based on the similarity between the evaluation probability distribution corresponding to each second object and the reference probability distribution, the triggering results of the first digital resources pushed by the first objects for each second object are identified.
If the similarity corresponding to every second object is smaller than or equal to the first similarity threshold, it is determined that the state changes triggered by the plurality of first objects on the first digital resource pushed by each second object are abnormal. If the similarity corresponding to every second object is larger than the second similarity threshold, it is determined that the state changes triggered by the plurality of first objects on the first digital resource pushed by each second object are normal. If the similarity corresponding to some of the second objects is smaller than or equal to the first similarity threshold, it is determined that the state changes triggered by the plurality of first objects on the first digital resources pushed by those second objects are abnormal. If the similarity corresponding to some of the second objects is larger than the second similarity threshold, it is determined that the state changes triggered by the plurality of first objects on the first digital resources pushed by those second objects are normal. If the similarity corresponding to some of the second objects is larger than the first similarity threshold and smaller than the second similarity threshold, those second objects can be examined further to determine whether the state changes triggered by the plurality of first objects on the first digital resources they pushed are normal, and those second objects can be taken as pending objects.
In the embodiment of the application, if the similarity between the evaluation probability distribution and the reference probability distribution is greater than the first similarity threshold and smaller than the second similarity threshold, the difference between the two probability distributions is not pronounced, and it is difficult to determine whether the state change triggered by the plurality of first objects on the first digital resource pushed by the second object is abnormal. The second object can therefore be regarded as a suspected cheating object. In this way, a plurality of suspected cheating second objects can be selected from the plurality of second objects and all of them determined to be pending objects, and the pending objects can then be examined further to determine whether each pending object is a cheating object.
S205, clustering the plurality of pending objects based on the differences among the evaluation probability distributions respectively corresponding to the pending objects, to obtain K initial pending object groups.
In the embodiment of the application, clustering the plurality of pending objects may mean that two pending objects whose difference is smaller than a difference threshold are placed in the same initial pending object group, so that K initial pending object groups may be obtained. K is a positive integer, and the similarity between the evaluation probability distributions corresponding to every two pending objects in one initial pending object group is larger than or equal to a third similarity threshold.
Alternatively, the differences among the evaluation probability distributions respectively corresponding to the pending objects may be determined with reference to the aforementioned similarity calculation formula. For example, the similarity calculation formula may be used to calculate the similarity between every two pending objects, and clustering the plurality of pending objects may refer to dividing pending objects whose similarity is greater than or equal to the third similarity threshold into the same initial pending object group, so as to obtain K initial pending object groups. By clustering the plurality of pending objects, pending objects with similar evaluation probability distributions are divided into the same initial pending object group, so that the evaluation probability distributions corresponding to the pending objects within one initial pending object group are similar.
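One possible way to form groups in which every two members satisfy the third similarity threshold is to enumerate maximal cliques of a "similar enough" graph, sketched below with the Bron-Kerbosch algorithm. The choice of this algorithm is an assumption of the sketch and is not prescribed by the text; the pairwise scores, object names, and threshold are hypothetical, and the similarity function is assumed to be symmetric.

```python
from itertools import combinations

def initial_pending_groups(objects, similarity, third_threshold):
    """Return groups in which every pair has similarity >= third_threshold."""
    neighbours = {o: set() for o in objects}
    for a, b in combinations(objects, 2):
        if similarity(a, b) >= third_threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups = []

    def expand(current, candidates, excluded):
        # Bron-Kerbosch: 'current' is a clique; report it once it cannot grow further.
        if not candidates and not excluded:
            groups.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbours[v], excluded & neighbours[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(objects), set())
    return sorted(groups)

# Hypothetical pairwise similarities between four pending objects.
scores = {frozenset("AB"): 0.9, frozenset("AC"): 0.85,
          frozenset("BC"): 0.8, frozenset("CD"): 0.9}

def pair_similarity(a, b):
    return scores.get(frozenset((a, b)), 0.0)

groups = initial_pending_groups(["A", "B", "C", "D"], pair_similarity, third_threshold=0.8)
print(groups)   # [['A', 'B', 'C'], ['C', 'D']]
```

In this sketch one pending object may belong to several initial groups (here C), and an object with no sufficiently similar partner forms a group of its own.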
S206, combining the K initial pending object groups to obtain one or more target pending object groups.
In the embodiment of the application, because a plurality of initial pending object groups are obtained, they can be combined to obtain one or more target pending object groups. The merging process combines multiple initial pending object groups into a larger target pending object group in which the features of the pending objects are highly similar. A target pending object group may be, for example, a community. Any target pending object group comprises one or more combined initial pending object groups, and the number of pending objects shared among the initial pending object groups contained in any target pending object group is greater than or equal to a first number threshold. The first number threshold may be the total number of pending objects in the smaller of the initial pending object groups minus 1, or the first number threshold may be set according to requirements, which is not limited in the embodiment of the present application.
Optionally, the method for merging the K initial pending object groups to obtain one or more target pending object groups may include: and calculating weight parameters among the K initial pending object groups, combining the initial pending object groups with the weight parameters larger than the weight threshold into one target pending object group, and thus obtaining one or more target pending object groups. The weight parameters among the K initial undetermined object groups can be calculated according to the formula (1-3):
weight=1/(KL+0.001) (1-3)
wherein weight represents a weight parameter between the K initial pending object groups, and KL represents a similarity distance between evaluation probability distributions corresponding to the K initial pending object groups.
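Formula (1-3) can be illustrated with a short sketch. The function names, the dictionary representation of a distribution and the `eps` guard against a zero probability are assumptions made for illustration; the application does not state whether a symmetric variant of the KL divergence is used.

```python
from math import log

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions stored as dicts; eps avoids log(0)."""
    return sum(pv * log(pv / max(q.get(k, 0.0), eps))
               for k, pv in p.items() if pv > 0)

def weight(kl):
    """Formula (1-3): the 0.001 term keeps the weight finite when KL is 0."""
    return 1.0 / (kl + 0.001)

p = {1: 0.5, 2: 0.5}
q = {1: 0.9, 2: 0.1}
w_same = weight(kl_divergence(p, p))   # identical distributions, largest weight
w_diff = weight(kl_divergence(p, q))   # larger KL distance, smaller weight
```

The weight decreases as the KL distance grows, so comparing the weight against a weight threshold is equivalent to requiring that the KL distance stay below a corresponding bound.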
S207, determining trigger results of the first digital resources of the first objects for each pending object based on one or more target pending object groups.
In the embodiment of the application, because one or more target pending object groups are determined, the quantity parameter and the clustering parameter of each target pending object group can be acquired, and the trigger results of the plurality of first objects for the first digital resource of each pending object are determined based on the quantity parameter and the clustering parameter. The quantity parameter reflects the number of pending objects in the target pending object group, and the clustering parameter reflects the relevance between the pending objects: the greater the clustering parameter, the stronger the relevance between the pending objects, and the smaller the clustering parameter, the weaker the relevance between the pending objects. If the quantity parameter of the target pending object group is greater than or equal to the second number threshold and the clustering parameter is greater than or equal to the coefficient threshold, the state changes triggered by the plurality of first objects on the first digital resources pushed by the pending objects in the target pending object group are abnormal. If the quantity parameter of the target pending object group is smaller than the second number threshold or the clustering parameter is smaller than the coefficient threshold, the state changes triggered by the plurality of first objects on the first digital resources pushed by the pending objects in the target pending object group are normal.
In one embodiment, the number parameter of the target pending object group may include a first number of pending objects included in the target pending object group, the cluster parameter of the target pending object group may include a comprehensive cluster evaluation coefficient of the target pending object group, a network map corresponding to the target pending object group may be constructed, and a trigger result of the plurality of first objects for the first digital resources of each pending object may be determined by combining the network map. Specifically, a first number of pending objects contained in any target pending object group is obtained; constructing a network map based on any target pending object group; acquiring a second number of connected edges between neighbor nodes of any network node in the network graph, acquiring a third number of pairwise combination between the neighbor nodes of any network node, and determining a ratio between the second number and the third number as a clustering evaluation coefficient corresponding to any network node; acquiring the average value between the clustering evaluation coefficients corresponding to each network node in the network graph as the comprehensive clustering evaluation coefficient for any target undetermined object group; if the first number is greater than or equal to a second number threshold and the comprehensive clustering evaluation coefficient is greater than or equal to a coefficient threshold, determining that state changes triggered by the plurality of first objects on the first digital resources pushed by the undetermined objects in any target undetermined object group are abnormal; if the first number is smaller than the second number threshold or the comprehensive clustering evaluation coefficient is smaller than the coefficient threshold, determining that the state changes triggered by the first digital resources pushed by the plurality of first objects to the undetermined objects in any target undetermined object group are normal.
One undetermined object in any target undetermined object group is provided with a network node in the network graph, undetermined objects with similarity between corresponding evaluation probability distribution being greater than or equal to the third similarity threshold value are similar, network nodes of similar undetermined objects in the network graph are provided with connecting edges, and network nodes with connecting edges in the network graph are mutually adjacent nodes. The comprehensive clustering evaluation coefficient can be used for reflecting the relevance between the undetermined objects in any target undetermined object group, and the larger the comprehensive clustering evaluation coefficient is, the stronger the relevance between undetermined objects in any target undetermined object group is, and the more the number of connecting edges exist between nodes is.
In a specific implementation, the network graph for any target pending object group can be constructed based on the weight parameters between the pending objects in that group: if the weight parameter between two pending objects in the group is greater than the weight threshold, a connecting edge is added between the network nodes corresponding to the two pending objects in the initial network graph, so that a connecting edge exists between those two network nodes in the resulting network graph. Therefore, whether a connecting edge exists between the network nodes of any two pending objects in the target pending object group can be determined from the network graph.
In one possible case, if there is a connection edge between any two nodes in the network graph G, and the number of network nodes in the graph G is z, where z is a positive integer, the network graph G may be referred to as a complete subgraph, i.e., the initial pending object group may refer to a complete subgraph, i.e., there is a connection edge between any two nodes in the initial pending object group. If z is 5, the network graph G may be shown in fig. 5, where fig. 5 is a schematic diagram of a composition structure of a complete sub-graph provided by an embodiment of the present application, and the network graph in fig. 5 includes 5 network nodes, where a connecting edge exists between any two nodes.
Further, a plurality of initial pending object groups may be combined to obtain a target pending object group. If each initial pending object group is a complete subgraph, a target pending object group can be a community, and adjacent complete subgraphs can be combined into one community if the conditions are met. For example, in another possible case, two complete subgraphs with z network nodes each are adjacent if they have z-1 network nodes in common. Fig. 6 is a schematic diagram of dividing communities: it contains two complete subgraphs with 3 network nodes each, that is, z is 3, and the two complete subgraphs have 2 network nodes in common, so they are adjacent. The largest set formed by complete subgraphs that are adjacent to one another can be called a community; in fig. 6, the two complete subgraphs are divided into one community, namely community 1. If a plurality of complete subgraphs exist, the adjacent complete subgraphs can be combined to finally obtain one or more communities. Fig. 7 is a schematic diagram of a composition structure of communities provided in an embodiment of the present application; fig. 7 may include 4 communities, namely community 1, community 2, community 3, and community 4, where a community may be a target pending object group. In this way, a plurality of initial pending object groups can be combined into one or more target pending object groups, that is, a plurality of complete subgraphs are combined into one or more communities, and whether each community is a cheating group can then be judged.
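The merging of adjacent complete subgraphs into communities can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `merge_adjacent_cliques` and the union-find bookkeeping are hypothetical, and two complete subgraphs of size z are treated as adjacent when they share at least z-1 network nodes.

```python
from itertools import combinations

def merge_adjacent_cliques(cliques, z):
    """Merge complete subgraphs of size z that share z-1 nodes into communities."""
    cliques = [frozenset(c) for c in cliques if len(c) == z]
    parent = list(range(len(cliques)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= z - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())

# Two triangles sharing the nodes b and c, plus one separate triangle.
cliques = [{"a", "b", "c"}, {"b", "c", "d"}, {"e", "f", "g"}]
communities = merge_adjacent_cliques(cliques, z=3)
```

With z equal to 3, the first two triangles share 2 network nodes and are merged into one community of four nodes, while the third triangle forms a community of its own.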
Wherein one of the target set of pending objects corresponds to one of the network nodes in the network map. The number of the connecting edges between the network nodes of the undetermined objects and the number of the neighbor nodes of the network nodes of each undetermined object can be obtained from the network graph by constructing the network graph corresponding to one or more target undetermined object groups.
Alternatively, the manner of obtaining the cluster evaluation coefficients may be as shown in the formula (1-4):
e_v=e_1/e_2 (1-4)
wherein v represents any network node, e_v represents the cluster evaluation coefficient corresponding to that network node, e_1 represents the number of connecting edges that exist between neighboring nodes of that network node, and e_2 represents the number of pairwise combinations between neighboring nodes of that network node.
Fig. 8 is a schematic diagram of a composition structure of a plurality of communities according to an embodiment of the present application. As shown in 8a of fig. 8, the number of connected edges between the neighboring nodes of the network node v is 6, and the number of pairwise combinations between the neighboring nodes of the network node v is 6, so the cluster evaluation coefficient corresponding to the network node v is 6/6=1. As shown in 8b of fig. 8, the number of connected edges between the neighboring nodes of the network node v is 3, and the number of pairwise combinations between the neighboring nodes of the network node v is 6, so the cluster evaluation coefficient corresponding to the network node v is 3/6=0.5. As shown in 8c of fig. 8, the number of connected edges between the neighboring nodes of the network node v is 0, and the number of pairwise combinations between the neighboring nodes of the network node v is 6, so the cluster evaluation coefficient corresponding to the network node v is 0/6=0.
When the comprehensive cluster evaluation coefficients of the target pending object group formed by 8b in fig. 8 are obtained, the cluster evaluation coefficients of 0.5, y1, y2, y3 and y4 of the network node v, network node 1, network node 2, network node 3 and network node 4 may be respectively obtained, and then the comprehensive cluster evaluation coefficients of the target pending object group are (0.5+y1+y2+y3+y4)/5.
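The per-node cluster evaluation coefficient and its average can be sketched as follows. The graph below is a hypothetical one chosen so that node v has four neighbors with three connecting edges among them; it is not a reproduction of fig. 8. Returning 0 for a node with fewer than two neighbors is also an assumption, since the application does not address that case.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Ratio of edges among a node's neighbors to all neighbor pairs."""
    neighbors = adjacency[node]
    pairs = len(neighbors) * (len(neighbors) - 1) // 2
    if pairs == 0:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return linked / pairs

def comprehensive_coefficient(adjacency):
    """Mean of the per-node coefficients over all nodes in the graph."""
    return sum(clustering_coefficient(adjacency, n) for n in adjacency) / len(adjacency)

# Hypothetical graph: v is linked to nodes 1 to 4; edges 1-2, 2-3 and 3-4 exist.
adjacency = {
    "v": {"1", "2", "3", "4"},
    "1": {"v", "2"},
    "2": {"v", "1", "3"},
    "3": {"v", "2", "4"},
    "4": {"v", "3"},
}
coefficient_v = clustering_coefficient(adjacency, "v")   # 3 / 6
overall = comprehensive_coefficient(adjacency)
```

For this hypothetical graph the coefficient of v is 3/6, the coefficients of nodes 1 to 4 are 1, 2/3, 2/3 and 1, and the comprehensive value is their mean together with that of v.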
If the first number is greater than or equal to the second number threshold and the comprehensive clustering evaluation coefficient is greater than or equal to the coefficient threshold, both the number of pending objects in the target pending object group and the connectivity between them meet the requirements of a cheating group; the second objects in the target pending object group are then determined to be a cheating group, and the state changes triggered by the first objects on the first digital resources pushed by the pending objects in that target pending object group are determined to be abnormal. If the first number is smaller than the second number threshold or the comprehensive clustering evaluation coefficient is smaller than the coefficient threshold, the number of pending objects in the target pending object group or the connectivity between them does not meet the requirements of a cheating group; the second objects in the target pending object group are then determined not to be a cheating group, and the state changes triggered by the first objects on the first digital resources pushed by the pending objects in that target pending object group are determined to be normal.
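The decision rule can be written compactly as a sketch. The function name and the threshold values used in the example are hypothetical; the application does not give concrete values for the second number threshold or the coefficient threshold.

```python
def is_cheating_group(first_number, coefficient, number_threshold, coefficient_threshold):
    """Abnormal only when both the group size and its connectivity reach their thresholds."""
    return first_number >= number_threshold and coefficient >= coefficient_threshold

# Hypothetical thresholds: at least 10 pending objects and a coefficient of at least 0.6.
large_dense = is_cheating_group(25, 0.8, number_threshold=10, coefficient_threshold=0.6)
large_sparse = is_cheating_group(25, 0.3, number_threshold=10, coefficient_threshold=0.6)
small_dense = is_cheating_group(4, 0.9, number_threshold=10, coefficient_threshold=0.6)
```

Only the first group is flagged; failing either condition is enough for the state changes to be treated as normal.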
By the data identification method in the embodiment of the application, more than 700 groups (target pending object groups) were found during experiments. Whether the number of pending objects and the comprehensive clustering evaluation coefficient of each group meet the requirements can then be analyzed; if so, online warnings can further be issued, for example by determining the second objects in each cheating group and reminding or warning each of them, thereby reducing the loss of a user or a service provider of digital resources.
In the embodiment of the application, the evaluation probability distribution corresponding to each second object is generated by extracting the trigger durations of the state changes triggered by a plurality of first objects on the first digital resource pushed by that second object; the second objects inconsistent with the standard probability distribution (the reference probability distribution, such as the probability distribution of the comprehensive index of the evidence) are found by a KL divergence measurement method, and a graph network is then constructed according to the degree of similarity among different second objects, so that obviously aggregated machine cheating groups can be found. Whether the second objects are cheating is judged by comprehensively analyzing both the differences between the evaluation probability distributions of the plurality of second objects and the standard probability distribution and the similarity of the evaluation probability distributions among the plurality of second objects, which can improve the accuracy and coverage of cheating detection. In addition, by this method, the evaluation probability distribution corresponding to each second object can be constructed automatically and compared with the standard probability distribution, and the second objects can be compared in pairs, so that abnormal cheating groups can be mined. Unsupervised anomaly recognition is thereby achieved, and cheating behaviors can be recognized without malicious samples. Compared with decompiling the code of malicious users from the bottom layer, the method does not need to obtain malicious user clients, which saves labor cost.
In the embodiment of the application, the trigger durations consumed by the plurality of first objects to respectively trigger the first digital resource from the first state and change it to the second state are acquired; identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource are clustered to obtain a plurality of trigger duration groups; an evaluation probability distribution is generated based on the number of trigger durations contained in each trigger duration group; a reference probability distribution is acquired, and trigger results of the plurality of first objects for the first digital resource are identified based on the difference between the evaluation probability distribution and the reference probability distribution. Since the evaluation probability distribution represents the trigger durations actually consumed when the first objects trigger the first digital resource, comparing it with the reference probability distribution yields the difference between the two, from which it can be determined whether the state changes triggered by the first objects on the first digital resource are abnormal. Comparing the probability distributions generated from the two sets of trigger durations identifies anomalies more accurately than judging by the click quantity of the digital resource, so the accuracy of identifying whether the state changes triggered on the digital resource are abnormal can be improved. In addition, once the trigger durations consumed by the plurality of first objects to respectively trigger the first digital resource from the first state and change it to the second state are acquired, the corresponding evaluation probability distribution can be generated automatically, and whether the state changes triggered on the digital resource are abnormal can be identified automatically based on the difference between the evaluation probability distribution and the reference probability distribution, which saves cost and improves data identification efficiency.
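Generating an evaluation probability distribution from trigger durations can be sketched as follows. The sketch assumes, for illustration only, that trigger durations are recorded as whole seconds so that identical durations fall into the same trigger duration group; the function name is hypothetical.

```python
from collections import Counter

def evaluation_distribution(durations):
    """Group identical trigger durations and normalise the group sizes to probabilities."""
    counts = Counter(durations)          # one trigger duration group per distinct value
    total = sum(counts.values())
    return {duration: count / total for duration, count in counts.items()}

# Trigger durations (in seconds) of four first objects for one digital resource.
distribution = evaluation_distribution([1, 1, 2, 3])
```

Here the duration of 1 second accounts for half of the probability mass and the durations of 2 and 3 seconds for a quarter each. A reference probability distribution can be built in the same way from historical trigger durations.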
The method of the embodiment of the application is described above, and the device of the embodiment of the application is described below.
Referring to fig. 9, fig. 9 is a schematic diagram of a composition structure of a data identification device according to an embodiment of the present application, where the data identification device may be deployed on a computer device; the data identification device can be used for executing corresponding steps in the data identification method provided by the embodiment of the application. The data identification device 90 includes:
a data acquisition unit 901, configured to acquire a trigger duration consumed by a plurality of first objects to trigger and change a first digital resource from a first state to a second state, respectively;
the data clustering unit 902 is configured to perform clustering processing on the plurality of first objects for the same trigger duration in the trigger durations of the first digital resource, so as to obtain a plurality of trigger duration groups;
a probability obtaining unit 903, configured to generate an evaluation probability distribution based on the number of trigger durations included in each trigger duration group; the evaluation probability distribution is used for evaluating state changes triggered by the first digital resources by the plurality of first objects;
a data identifying unit 904, configured to obtain a reference probability distribution, and identify trigger results of the plurality of first objects for the first digital resource based on a difference between the estimated probability distribution and the reference probability distribution;
Wherein the reference probability distribution is a reference probability distribution for evaluating a state change triggered by the first digital resource by the plurality of first objects, and the trigger result is used for indicating whether the state change triggered by the first digital resource by the plurality of first objects is normal or abnormal.
Optionally, the first digital resource is pushed by the second object for a first period of time; the data identification unit 904 is specifically configured to:
acquiring the triggering time consumed by a plurality of first objects to trigger and change the second digital resources from the first state to the second state respectively; the second digital resource is a digital resource pushed by the second object for a second period of time, the second period of time being a historical period of time prior to the first period of time;
clustering the plurality of first objects aiming at the same trigger duration in the trigger durations of the second digital resources to obtain a plurality of reference trigger duration groups;
the reference probability distribution is generated based on the number of trigger durations contained by each reference trigger duration group.
Optionally, there are N first states and N second states, where N is a positive integer, and a trigger duration is generated each time a first object triggers the first digital resource from one first state and changes it to the corresponding second state;
Wherein the N first states include at least one of: the first digital resource is in an exposure state, the first digital resource is in a selected state, or the first digital resource is in a state that the landing page of the first digital resource is opened;
the N second states include at least one of: the first digital resource is in a selected state, the first digital resource is in a state that the landing page to which it belongs is opened, or the first digital resource is in a state that the landing page to which it belongs is closed after being opened;
if one first state is that the first digital resource is in an exposure state, a second state corresponding to the first state is that the first digital resource is in a selected state;
if one first state is that the first digital resource is in a selected state, a second state corresponding to the first state is that the first digital resource is in a state that the landing page to which the first digital resource belongs is opened;
if one first state is that the first digital resource is in a state that the landing page to which it belongs is opened, the second state corresponding to the first state is that the first digital resource is in a state that the landing page to which it belongs is closed after being opened.
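The correspondence between first states and second states listed above can be sketched as a lookup table. The enumeration names, the `NEXT_STATE` mapping and the use of timestamps in seconds are hypothetical choices made for illustration.

```python
from enum import Enum

class State(Enum):
    EXPOSED = "exposed"
    SELECTED = "selected"
    LANDING_OPENED = "landing_page_opened"
    LANDING_CLOSED = "landing_page_closed"

# Each first state maps to its corresponding second state.
NEXT_STATE = {
    State.EXPOSED: State.SELECTED,
    State.SELECTED: State.LANDING_OPENED,
    State.LANDING_OPENED: State.LANDING_CLOSED,
}

def trigger_duration(first_state, first_time, second_state, second_time):
    """Duration of one state change; rejects pairs that are not a valid transition."""
    if NEXT_STATE.get(first_state) is not second_state:
        raise ValueError(f"{first_state} does not change to {second_state}")
    return second_time - first_time

# A resource exposed at t = 10.0 s and selected at t = 12.5 s.
duration = trigger_duration(State.EXPOSED, 10.0, State.SELECTED, 12.5)
```

With three such transitions, N is 3, and each transition would have its own evaluation probability distribution and reference probability distribution.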
Optionally, a first state and a corresponding second state have an evaluation probability distribution and a reference probability distribution corresponding to the evaluation probability distribution, and N first states and N second states have N evaluation probability distributions and reference probability distributions corresponding to the N evaluation probability distributions respectively; the data identification unit 904 is specifically configured to:
Obtaining the similarity between each evaluation probability distribution and the corresponding reference probability distribution;
adding the N similarities between the N estimated probability distributions and the N reference probability distributions to obtain the comprehensive similarity between the N estimated probability distributions and the N reference probability distributions;
if the integrated similarity is less than or equal to a first similarity threshold, determining that the trigger result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is abnormal;
if the integrated similarity is greater than or equal to a second similarity threshold, determining that the trigger result is used for indicating that the state change triggered by the plurality of first objects on the first digital resource is normal;
wherein the first similarity threshold is less than the second similarity threshold.
Optionally, the data identifying unit 904 is specifically configured to:
acquiring evaluation weights corresponding to the N similarities respectively; the evaluation weight of any one of the N similarities refers to the total number of times that the first object triggers the first digital resource from the N first states and changes to the corresponding N second states;
weighting the N similarity based on the evaluation weights of the N similarity to obtain N weighted similarity;
adding the N weighted similarities to obtain the comprehensive similarity.
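The weighted combination and the threshold comparison can be sketched together. The function names, the label strings and the numeric values are hypothetical; the weights stand for the trigger counts described above, and scores strictly between the two thresholds are labelled as pending, matching the pending-object case of the method.

```python
def comprehensive_similarity(similarities, weights=None):
    """Sum of the N similarities, optionally weighted by their evaluation weights."""
    if weights is None:
        return sum(similarities)
    return sum(s * w for s, w in zip(similarities, weights))

def classify(score, first_threshold, second_threshold):
    """Three-way decision; first_threshold is assumed smaller than second_threshold."""
    if score <= first_threshold:
        return "abnormal"
    if score >= second_threshold:
        return "normal"
    return "pending"

# Two state transitions with similarities 0.5 and 0.25, triggered 2 and 4 times.
score = comprehensive_similarity([0.5, 0.25], weights=[2, 4])   # 0.5*2 + 0.25*4
label = classify(score, first_threshold=1.0, second_threshold=3.0)
```

In this example the weighted score is 2.0, which lies between the two thresholds, so the second object would be kept as a pending object for the group analysis.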
Optionally, the first digital resource is a digital resource pushed by a second object; the data identification unit 904 is specifically configured to:
if the similarity between the estimated probability distribution and the reference probability distribution is greater than a first similarity threshold and less than a second similarity threshold, taking the second object as a pending object; the number of the undetermined objects is multiple, one undetermined object is provided with one corresponding evaluation probability distribution, and one undetermined object is provided with one pushed first digital resource;
based on the difference between the evaluation probability distributions corresponding to each undetermined object, clustering the undetermined objects to obtain K initial undetermined object groups; k is a positive integer, and the similarity between the evaluation probability distributions corresponding to every two objects to be determined in an initial object group is larger than or equal to a third similarity threshold;
combining the K initial pending object groups to obtain one or more target pending object groups; any target pending object group comprises one or more combined initial pending object groups, and the number of the same pending objects contained among the one or more initial pending object groups contained in any target pending object group is greater than or equal to a first number threshold;
Based on the one or more target pending object groups, determining a trigger result of the first digital resource for the plurality of first objects for each of the pending objects, respectively.
Optionally, the data identifying unit 904 is specifically configured to:
acquiring a first number of pending objects contained in any target pending object group;
constructing a network map based on the any target pending object group; one undetermined object in any target undetermined object group is provided with a network node in the network graph, undetermined objects with similarity between corresponding evaluation probability distribution being greater than or equal to the third similarity threshold value are similar, the network nodes of the similar undetermined objects in the network graph are provided with connecting edges, and the network nodes with connecting edges in the network graph are mutually adjacent nodes;
acquiring a second number of connected edges between neighbor nodes of any network node in the network graph, acquiring a third number of pairwise combination between the neighbor nodes of any network node, and determining a ratio between the second number and the third number as a clustering evaluation coefficient corresponding to any network node;
acquiring the average value between the clustering evaluation coefficients corresponding to each network node in the network graph as the comprehensive clustering evaluation coefficient for any target undetermined object group;
If the first number is greater than or equal to a second number threshold and the comprehensive clustering evaluation coefficient is greater than or equal to a coefficient threshold, determining that state changes triggered by the plurality of first objects on the first digital resources pushed by the undetermined objects in any target undetermined object group are abnormal;
if the first number is smaller than the second number threshold or the comprehensive clustering evaluation coefficient is smaller than the coefficient threshold, determining that the state changes triggered by the first digital resources pushed by the plurality of first objects to the undetermined objects in any target undetermined object group are normal.
Optionally, the first digital resource is an advertising resource; the data identification device 90 further comprises a data output unit 905 for:
if the triggering result is used for indicating that the state change triggered by the plurality of first objects on the advertisement resource is abnormal, outputting abnormal triggering prompt information aiming at the advertisement resource;
the abnormal triggering prompt information is used for prompting that the advertisement heat generated after the advertisement resource is triggered to change state is abnormal.
It should be noted that, in the embodiment corresponding to fig. 9, the content not mentioned may be referred to the description of the method embodiment, and will not be repeated here.
In the embodiment of the application, the trigger durations consumed by the plurality of first objects to respectively trigger the first digital resource from the first state and change it to the second state are acquired; identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource are clustered to obtain a plurality of trigger duration groups; an evaluation probability distribution is generated based on the number of trigger durations contained in each trigger duration group; a reference probability distribution is acquired, and trigger results of the plurality of first objects for the first digital resource are identified based on the difference between the evaluation probability distribution and the reference probability distribution. Since the evaluation probability distribution represents the trigger durations actually consumed when the first objects trigger the first digital resource, comparing it with the reference probability distribution yields the difference between the two, from which it can be determined whether the state changes triggered by the first objects on the first digital resource are abnormal. Comparing the probability distributions generated from the two sets of trigger durations identifies anomalies more accurately than judging by the click quantity of the digital resource, so the accuracy of identifying whether the state changes triggered on the digital resource are abnormal can be improved. In addition, once the trigger durations consumed by the plurality of first objects to respectively trigger the first digital resource from the first state and change it to the second state are acquired, the corresponding evaluation probability distribution can be generated automatically, and whether the state changes triggered on the digital resource are abnormal can be identified automatically based on the difference between the evaluation probability distribution and the reference probability distribution, which saves cost and improves data identification efficiency.
Referring to fig. 10, fig. 10 is a schematic diagram of a composition structure of a computer device according to an embodiment of the present application. As shown in fig. 10, the above-mentioned computer device may include: a processor 1001 and a memory 1002. Optionally, the computer device may further include a network interface or a power module. Data may be exchanged between the processor 1001 and the memory 1002.
The processor 1001 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The network interface may include input devices, such as a control panel, microphone, receiver, etc., and/or output devices, such as a display screen, transmitter, etc., which are not shown.
The memory 1002 may include read only memory and random access memory, and provides program instructions and data to the processor 1001. A portion of memory 1002 may also include non-volatile random access memory. Wherein the processor 1001 is configured to execute, when calling the program instructions:
acquiring the trigger durations respectively consumed by a plurality of first objects to trigger a first digital resource from a first state and change it to a second state;
clustering identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource, to obtain a plurality of trigger duration groups;
generating an evaluation probability distribution based on the number of trigger durations contained in each trigger duration group, the evaluation probability distribution being used for evaluating the state changes triggered by the plurality of first objects on the first digital resource;
acquiring a reference probability distribution, and identifying trigger results of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution;
wherein the reference probability distribution serves as the reference for evaluating the state changes triggered by the plurality of first objects on the first digital resource, and the trigger result indicates whether those state changes are normal or abnormal.
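The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the application: the function names `duration_distribution` and `overlap_similarity` are hypothetical, trigger durations are assumed to be recorded in whole seconds so that identical values can be grouped, and histogram intersection is an assumed similarity measure, since this passage does not fix one.

```python
from collections import Counter


def duration_distribution(durations):
    """Group identical trigger durations and normalise group sizes to probabilities."""
    if not durations:
        raise ValueError("at least one trigger duration is required")
    counts = Counter(durations)          # one group per distinct duration
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()}


def overlap_similarity(p, q):
    """Histogram intersection in [0, 1]; 1.0 means identical distributions.

    Assumption: the source does not name a similarity measure here.
    """
    keys = set(p) | set(q)
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)


# Hypothetical trigger durations in seconds.
evaluation = duration_distribution([1, 1, 2, 3, 3, 3, 5, 5])
reference = duration_distribution([1, 2, 2, 3, 3, 4, 5, 5])
similarity = overlap_similarity(evaluation, reference)
```

A low similarity would then point towards an abnormal trigger result and a high one towards a normal result; where the cut-offs lie is left to the thresholds described later in the text.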
Optionally, the first digital resource is pushed by a second object in a first time period; the processor 1001 is specifically configured to perform:
acquiring the trigger durations respectively consumed by the plurality of first objects to trigger a second digital resource from the first state and change it to the second state, the second digital resource being a digital resource pushed by the second object in a second time period, and the second time period being a historical time period before the first time period;
clustering identical trigger durations among the trigger durations of the plurality of first objects for the second digital resource, to obtain a plurality of reference trigger duration groups;
generating the reference probability distribution based on the number of trigger durations contained in each reference trigger duration group.
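As a hedged illustration of how the reference distribution could be derived from the historical time period, the sketch below splits one second object's records at the start of the first time period. The record layout `(push_time, trigger_duration)`, the function names, and the sample values are assumptions introduced for this example only.

```python
from collections import Counter


def distribution(durations):
    """Normalise the size of each group of identical durations to a probability."""
    if not durations:
        raise ValueError("at least one trigger duration is required")
    counts = Counter(durations)
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()}


def evaluation_and_reference(records, first_period_start):
    """Split (push_time, trigger_duration) pairs into the two time periods.

    Records before first_period_start feed the reference distribution
    (second time period); the rest feed the evaluation distribution.
    """
    historical = [d for t, d in records if t < first_period_start]
    current = [d for t, d in records if t >= first_period_start]
    return distribution(current), distribution(historical)


# Hypothetical records for one second object.
records = [(1, 2), (2, 2), (3, 4), (4, 6), (11, 2), (12, 2), (13, 2), (14, 4)]
evaluation, reference = evaluation_and_reference(records, first_period_start=10)
```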
Optionally, there are N first states and N second states, N being a positive integer; one trigger duration is generated each time a first object triggers the first digital resource from one first state and changes it to one second state;
wherein the N first states include at least one of: a state in which the first digital resource is exposed, a state in which the first digital resource is selected, or a state in which the landing page to which the first digital resource belongs is opened;
the N second states include at least one of: a state in which the first digital resource is selected, a state in which the landing page to which the first digital resource belongs is opened, or a state in which the landing page is closed after being opened;
if a first state is the state in which the first digital resource is exposed, the second state corresponding to the first state is the state in which the first digital resource is selected;
if a first state is the state in which the first digital resource is selected, the second state corresponding to the first state is the state in which the landing page to which the first digital resource belongs is opened;
if a first state is the state in which the landing page to which the first digital resource belongs is opened, the second state corresponding to the first state is the state in which the landing page is closed after being opened.
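The pairing of first and second states described above can be written down as a small lookup, from which one trigger duration per state pair is computed. In this sketch the state labels (`exposed`, `selected`, `landing_opened`, `landing_closed`), the timestamp dictionary, and the function name are hypothetical; the source only describes the states in prose.

```python
# Each tuple is (first state, corresponding second state), following the text above.
TRANSITIONS = [
    ("exposed", "selected"),
    ("selected", "landing_opened"),
    ("landing_opened", "landing_closed"),
]


def trigger_durations(event_times):
    """Return one duration per state pair observed for a single first object.

    event_times maps a state label to the timestamp (in seconds) at which the
    first digital resource entered that state. Pairs with a missing endpoint
    are skipped.
    """
    result = {}
    for first, second in TRANSITIONS:
        if first in event_times and second in event_times:
            result[(first, second)] = event_times[second] - event_times[first]
    return result


# Hypothetical object that clicked and opened the landing page but has not closed it.
durations = trigger_durations({"exposed": 100, "selected": 103, "landing_opened": 104})
```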
Optionally, one first state and its corresponding second state have one evaluation probability distribution and one reference probability distribution corresponding to that evaluation probability distribution, so that the N first states and the N second states have N evaluation probability distributions and N reference probability distributions respectively corresponding to the N evaluation probability distributions; the processor 1001 is specifically configured to perform:
obtaining the similarity between each evaluation probability distribution and the corresponding reference probability distribution;
adding the N similarities between the N evaluation probability distributions and the corresponding N reference probability distributions, to obtain the comprehensive similarity between the N evaluation probability distributions and the N reference probability distributions;
if the comprehensive similarity is less than or equal to a first similarity threshold, determining that the trigger result indicates that the state changes triggered by the plurality of first objects on the first digital resource are abnormal;
if the comprehensive similarity is greater than or equal to a second similarity threshold, determining that the trigger result indicates that the state changes triggered by the plurality of first objects on the first digital resource are normal;
wherein the first similarity threshold is less than the second similarity threshold.
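The two-threshold decision can be sketched as follows. The function name, the threshold values, and the returned labels are assumptions for illustration; the label `pending` stands for the in-between case that the text handles separately through pending objects.

```python
def classify(similarities, first_threshold, second_threshold):
    """Sum the N per-state-pair similarities and apply the two thresholds."""
    if not first_threshold < second_threshold:
        raise ValueError("the first threshold must be less than the second")
    comprehensive = sum(similarities)
    if comprehensive <= first_threshold:
        return "abnormal"
    if comprehensive >= second_threshold:
        return "normal"
    return "pending"  # strictly between the two thresholds


# Hypothetical similarities for N = 3 state pairs and hypothetical thresholds.
low = classify([0.2, 0.3, 0.1], first_threshold=1.0, second_threshold=2.4)
high = classify([0.9, 0.8, 0.9], first_threshold=1.0, second_threshold=2.4)
middle = classify([0.5, 0.6, 0.7], first_threshold=1.0, second_threshold=2.4)
```

Because the comprehensive similarity is a plain sum, its range grows with N, so any thresholds chosen in practice would presumably need to be scaled to the number of state pairs.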
Optionally, the processor 1001 is specifically configured to perform:
acquiring an evaluation weight corresponding to each of the N similarities, where the evaluation weight of any one of the N similarities refers to the total number of times the plurality of first objects trigger the first digital resource from the N first states and change it to the corresponding N second states;
weighting the N similarities based on their evaluation weights, to obtain N weighted similarities;
adding the N weighted similarities to obtain the comprehensive similarity.
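A minimal sketch of the weighted variant follows. It reads the evaluation weight of each similarity as the number of triggers observed for the corresponding state pair and normalises the weights so that they sum to 1; both the per-pair reading and the normalisation are assumptions of this example, as is the function name.

```python
def weighted_similarity(similarities, trigger_counts):
    """Weight each similarity by its share of the observed triggers and add them up.

    Assumption: weights are normalised by the total trigger count so the
    result stays in the same range as a single similarity.
    """
    if len(similarities) != len(trigger_counts):
        raise ValueError("one trigger count is required per similarity")
    total = sum(trigger_counts)
    if total == 0:
        raise ValueError("at least one trigger is required")
    return sum(s * c / total for s, c in zip(similarities, trigger_counts))


# Hypothetical values: the first state pair was triggered three times as often.
comprehensive = weighted_similarity([0.5, 1.0], [3, 1])
```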
Optionally, the first digital resource is a digital resource pushed by a second object; the processor 1001 is specifically configured to perform:
if the similarity between the evaluation probability distribution and the reference probability distribution is greater than a first similarity threshold and less than a second similarity threshold, taking the second object as a pending object, where there are a plurality of pending objects, each pending object has one corresponding evaluation probability distribution, and each pending object has one pushed first digital resource;
clustering the plurality of pending objects based on the differences between the evaluation probability distributions respectively corresponding to the pending objects, to obtain K initial pending object groups, K being a positive integer, where the similarity between the evaluation probability distributions corresponding to every two pending objects in one initial pending object group is greater than or equal to a third similarity threshold;
merging the K initial pending object groups to obtain one or more target pending object groups, where any target pending object group comprises one or more merged initial pending object groups, and the number of identical pending objects shared among the one or more initial pending object groups contained in any target pending object group is greater than or equal to a first number threshold;
determining, based on the one or more target pending object groups, the trigger results of the plurality of first objects for the first digital resource of each pending object.
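The merging step can be illustrated with a short sketch that repeatedly joins any two groups sharing enough pending objects. The initial groups are taken as given here, and the function name, the pairwise merge rule, and the sample identifiers are assumptions; the source does not spell out the merge procedure.

```python
def merge_groups(initial_groups, first_number_threshold):
    """Merge initial pending object groups that share enough pending objects.

    Two groups are merged when the number of pending objects they have in
    common is at least first_number_threshold; merging repeats until no
    pair qualifies.
    """
    merged = [set(group) for group in initial_groups]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if len(merged[i] & merged[j]) >= first_number_threshold:
                    merged[i] |= merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return merged


# Hypothetical pending objects (second objects) labelled a to f.
targets = merge_groups([{"a", "b", "c"}, {"b", "c", "d"}, {"e", "f"}], 2)
```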
Optionally, the processor 1001 is specifically configured to perform:
acquiring a first number, the first number being the number of pending objects contained in any target pending object group;
constructing a network graph based on the target pending object group, where each pending object in the target pending object group has one network node in the network graph, pending objects whose corresponding evaluation probability distributions have a similarity greater than or equal to the third similarity threshold are similar pending objects, the network nodes of similar pending objects are connected by an edge in the network graph, and network nodes connected by an edge are neighbor nodes of each other;
acquiring a second number, the second number being the number of edges between the neighbor nodes of any network node in the network graph, acquiring a third number, the third number being the number of pairwise combinations of the neighbor nodes of that network node, and determining the ratio of the second number to the third number as the clustering evaluation coefficient corresponding to that network node;
acquiring the average of the clustering evaluation coefficients corresponding to the network nodes in the network graph as the comprehensive clustering evaluation coefficient for the target pending object group;
if the first number is greater than or equal to a second number threshold and the comprehensive clustering evaluation coefficient is greater than or equal to a coefficient threshold, determining that the state changes triggered by the plurality of first objects on the first digital resources pushed by the pending objects in the target pending object group are abnormal;
if the first number is less than the second number threshold or the comprehensive clustering evaluation coefficient is less than the coefficient threshold, determining that the state changes triggered by the plurality of first objects on the first digital resources pushed by the pending objects in the target pending object group are normal.
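The clustering evaluation coefficient described above matches the usual local clustering coefficient of a graph node, and the sketch below computes it directly from an adjacency dictionary. The function names, the sample graph, and the convention that a node with fewer than two neighbors scores 0 are assumptions of this example.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Ratio of edges among a node's neighbors to all pairs of those neighbors."""
    pairs = list(combinations(sorted(adjacency[node]), 2))  # third number
    if not pairs:
        return 0.0  # assumption: fewer than two neighbors gives 0
    linked = sum(1 for u, v in pairs if v in adjacency[u])  # second number
    return linked / len(pairs)


def average_clustering(adjacency):
    """Comprehensive coefficient: mean of the per-node coefficients."""
    values = [clustering_coefficient(adjacency, n) for n in adjacency]
    return sum(values) / len(values)


def group_is_abnormal(adjacency, second_number_threshold, coefficient_threshold):
    """Abnormal only if the group is both large enough and tightly connected."""
    return (len(adjacency) >= second_number_threshold
            and average_clustering(adjacency) >= coefficient_threshold)


# Hypothetical target group: a triangle a-b-c plus d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
coefficient = average_clustering(graph)
```

A dense, large group of pending objects with near-identical duration distributions is what the two conditions jointly detect; either a small group or a loosely connected one falls back to the normal result.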
Optionally, the first digital resource is an advertisement resource; the processor 1001 is further configured to perform:
if the trigger result indicates that the state changes triggered by the plurality of first objects on the advertisement resource are abnormal, outputting abnormal-trigger prompt information for the advertisement resource;
the abnormal-trigger prompt information is used for prompting that the advertisement popularity generated after the advertisement resource is triggered to change state is abnormal.
In this embodiment of the application, the trigger durations respectively consumed by a plurality of first objects to trigger a first digital resource from a first state and change it to a second state are acquired; identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource are clustered to obtain a plurality of trigger duration groups; an evaluation probability distribution is generated based on the number of trigger durations contained in each trigger duration group; and a reference probability distribution is acquired, and the trigger results of the plurality of first objects for the first digital resource are identified based on the difference between the evaluation probability distribution and the reference probability distribution. Because the evaluation probability distribution represents the trigger durations actually consumed when the plurality of first objects trigger the first digital resource, comparing it with the reference probability distribution reveals the difference between the two, from which it can be determined whether the state changes triggered by the plurality of first objects on the first digital resource are abnormal. Comparing the two probability distributions generated from trigger durations identifies anomalies more accurately than judging by the click quantity of the digital resource, so the accuracy of identifying whether the state changes triggered by objects on a digital resource are abnormal can be improved.
In addition, once the trigger durations respectively consumed by the plurality of first objects to trigger the first digital resource from the first state and change it to the second state are acquired, the corresponding evaluation probability distribution can be generated automatically, and whether the state changes triggered on the digital resource are abnormal can be identified automatically based on the difference between the evaluation probability distribution and the reference probability distribution, which saves cost and improves data identification efficiency.
Optionally, when executed by the processor, the program instructions may further implement other steps of the method in the foregoing embodiments, which are not described again herein.
The embodiments of the present application also provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a computer, cause the computer to perform a method as in the previous embodiments, the computer being part of a computer device as mentioned above. As an example, the program instructions may be executed on one computer device or on multiple computer devices located at one site, or alternatively, on multiple computer devices distributed across multiple sites and interconnected by a communication network, which may constitute a blockchain network.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions which, when executed by a processor, implement some or all of the steps of the above-described method. For example, the computer instructions are stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps performed in the embodiments of the methods described above.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, may include processes of the embodiments of the methods as described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (12)

1. A data identification method, the method comprising:
acquiring the trigger durations respectively consumed by a plurality of first objects to trigger a first digital resource from a first state and change it to a second state;
clustering identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource, to obtain a plurality of trigger duration groups;
generating an evaluation probability distribution based on the number of trigger durations contained in each trigger duration group, the evaluation probability distribution being used for evaluating the state changes triggered by the plurality of first objects on the first digital resource;
acquiring a reference probability distribution, and identifying trigger results of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution;
wherein the reference probability distribution serves as the reference for evaluating the state changes triggered by the plurality of first objects on the first digital resource, and the trigger result indicates whether those state changes are normal or abnormal.
2. The method of claim 1, wherein the first digital resource is pushed by a second object in a first time period; the acquiring a reference probability distribution comprises:
acquiring the trigger durations respectively consumed by the plurality of first objects to trigger a second digital resource from the first state and change it to the second state, the second digital resource being a digital resource pushed by the second object in a second time period, and the second time period being a historical time period before the first time period;
clustering identical trigger durations among the trigger durations of the plurality of first objects for the second digital resource, to obtain a plurality of reference trigger duration groups;
generating the reference probability distribution based on the number of trigger durations contained in each reference trigger duration group.
3. The method of claim 1, wherein there are N first states and N second states, N being a positive integer, and one trigger duration is generated each time a first object triggers the first digital resource from one first state and changes it to one second state;
wherein the N first states include at least one of: a state in which the first digital resource is exposed, a state in which the first digital resource is selected, or a state in which the landing page to which the first digital resource belongs is opened;
the N second states include at least one of: a state in which the first digital resource is selected, a state in which the landing page to which the first digital resource belongs is opened, or a state in which the landing page is closed after being opened;
if a first state is the state in which the first digital resource is exposed, the second state corresponding to the first state is the state in which the first digital resource is selected;
if a first state is the state in which the first digital resource is selected, the second state corresponding to the first state is the state in which the landing page to which the first digital resource belongs is opened;
if a first state is the state in which the landing page to which the first digital resource belongs is opened, the second state corresponding to the first state is the state in which the landing page is closed after being opened.
4. The method of claim 3, wherein one first state and its corresponding second state have one evaluation probability distribution and one reference probability distribution corresponding to that evaluation probability distribution, and the N first states and the N second states have N evaluation probability distributions and N reference probability distributions respectively corresponding to the N evaluation probability distributions;
the identifying trigger results of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution comprises:
obtaining the similarity between each evaluation probability distribution and the corresponding reference probability distribution;
adding the N similarities between the N evaluation probability distributions and the corresponding N reference probability distributions, to obtain the comprehensive similarity between the N evaluation probability distributions and the N reference probability distributions;
if the comprehensive similarity is less than or equal to a first similarity threshold, determining that the trigger result indicates that the state changes triggered by the plurality of first objects on the first digital resource are abnormal;
if the comprehensive similarity is greater than or equal to a second similarity threshold, determining that the trigger result indicates that the state changes triggered by the plurality of first objects on the first digital resource are normal;
wherein the first similarity threshold is less than the second similarity threshold.
5. The method of claim 4, wherein the adding the N similarities between the N evaluation probability distributions and the corresponding N reference probability distributions to obtain the comprehensive similarity between the N evaluation probability distributions and the N reference probability distributions comprises:
acquiring an evaluation weight corresponding to each of the N similarities, where the evaluation weight of any one of the N similarities refers to the total number of times the plurality of first objects trigger the first digital resource from the N first states and change it to the corresponding N second states;
weighting the N similarities based on their evaluation weights, to obtain N weighted similarities;
adding the N weighted similarities to obtain the comprehensive similarity.
6. The method of claim 1, wherein the first digital resource is a digital resource pushed by a second object; the identifying trigger results of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution comprises:
if the similarity between the evaluation probability distribution and the reference probability distribution is greater than a first similarity threshold and less than a second similarity threshold, taking the second object as a pending object, where there are a plurality of pending objects, each pending object has one corresponding evaluation probability distribution, and each pending object has one pushed first digital resource;
clustering the plurality of pending objects based on the differences between the evaluation probability distributions respectively corresponding to the pending objects, to obtain K initial pending object groups, K being a positive integer, where the similarity between the evaluation probability distributions corresponding to every two pending objects in one initial pending object group is greater than or equal to a third similarity threshold;
merging the K initial pending object groups to obtain one or more target pending object groups, where any target pending object group comprises one or more merged initial pending object groups, and the number of identical pending objects shared among the one or more initial pending object groups contained in any target pending object group is greater than or equal to a first number threshold;
determining, based on the one or more target pending object groups, the trigger results of the plurality of first objects for the first digital resource of each pending object.
7. The method of claim 6, wherein the determining, based on the one or more target pending object groups, the trigger results of the plurality of first objects for the first digital resource of each pending object comprises:
acquiring a first number, the first number being the number of pending objects contained in any target pending object group;
constructing a network graph based on the target pending object group, where each pending object in the target pending object group has one network node in the network graph, pending objects whose corresponding evaluation probability distributions have a similarity greater than or equal to the third similarity threshold are similar pending objects, the network nodes of similar pending objects are connected by an edge in the network graph, and network nodes connected by an edge are neighbor nodes of each other;
acquiring a second number, the second number being the number of edges between the neighbor nodes of any network node in the network graph, acquiring a third number, the third number being the number of pairwise combinations of the neighbor nodes of that network node, and determining the ratio of the second number to the third number as the clustering evaluation coefficient corresponding to that network node;
acquiring the average of the clustering evaluation coefficients corresponding to the network nodes in the network graph as the comprehensive clustering evaluation coefficient for the target pending object group;
if the first number is greater than or equal to a second number threshold and the comprehensive clustering evaluation coefficient is greater than or equal to a coefficient threshold, determining that the state changes triggered by the plurality of first objects on the first digital resources pushed by the pending objects in the target pending object group are abnormal;
if the first number is less than the second number threshold or the comprehensive clustering evaluation coefficient is less than the coefficient threshold, determining that the state changes triggered by the plurality of first objects on the first digital resources pushed by the pending objects in the target pending object group are normal.
8. The method of claim 1, wherein the first digital resource is an advertisement resource; the method further comprises:
if the trigger result indicates that the state changes triggered by the plurality of first objects on the advertisement resource are abnormal, outputting abnormal-trigger prompt information for the advertisement resource;
the abnormal-trigger prompt information is used for prompting that the advertisement popularity generated after the advertisement resource is triggered to change state is abnormal.
9. A data identification device, the device comprising:
a data acquisition unit, configured to acquire the trigger durations respectively consumed by a plurality of first objects to trigger a first digital resource from a first state and change it to a second state;
a data clustering unit, configured to cluster identical trigger durations among the trigger durations of the plurality of first objects for the first digital resource, to obtain a plurality of trigger duration groups;
a probability acquisition unit, configured to generate an evaluation probability distribution based on the number of trigger durations contained in each trigger duration group, the evaluation probability distribution being used for evaluating the state changes triggered by the plurality of first objects on the first digital resource;
a data identification unit, configured to acquire a reference probability distribution, and identify trigger results of the plurality of first objects for the first digital resource based on the difference between the evaluation probability distribution and the reference probability distribution;
wherein the reference probability distribution serves as the reference for evaluating the state changes triggered by the plurality of first objects on the first digital resource, and the trigger result indicates whether those state changes are normal or abnormal.
10. A computer device, comprising a processor and a memory, wherein the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the method of any of claims 1-8.
11. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having a processor to perform the method of any of claims 1-8.
12. A computer program product, characterized in that the computer program product comprises computer instructions which, when executed by a processor, implement the method according to any of claims 1-8.
CN202310315191.XA 2023-03-22 2023-03-22 Data identification method, device, equipment, medium and product Pending CN116976975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310315191.XA CN116976975A (en) 2023-03-22 2023-03-22 Data identification method, device, equipment, medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310315191.XA CN116976975A (en) 2023-03-22 2023-03-22 Data identification method, device, equipment, medium and product

Publications (1)

Publication Number Publication Date
CN116976975A true CN116976975A (en) 2023-10-31

Family

ID=88475528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310315191.XA Pending CN116976975A (en) 2023-03-22 2023-03-22 Data identification method, device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN116976975A (en)

Legal Events

Date Code Title Description
PB01 Publication