CN117557314A

CN117557314A - List generation method, device, equipment and storage medium

Info

Publication number: CN117557314A
Application number: CN202311658209.2A
Authority: CN
Inventors: 魏亚东; 朱宇戈; 刘博�; 张建荣
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-12-06
Filing date: 2023-12-06
Publication date: 2024-02-13

Abstract

The present disclosure provides a list generation method, apparatus, device, and storage medium, which may be applied to the fields of computer technology, artificial intelligence technology, financial technology, or other related fields. The method comprises: preprocessing collected operation information of a plurality of objects to obtain index information corresponding to each object; performing feature extraction on the index information corresponding to each object to obtain a historical behavior sequence corresponding to each object; performing interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time decay mechanism to obtain an interest value corresponding to each object; determining index information of an object whose interest value satisfies a preset threshold condition as first anomaly information having a cheating probability; determining, by performing cluster analysis on the index information corresponding to each object, second anomaly information having a cheating probability from that index information; and obtaining a target list according to the first anomaly information and the second anomaly information.

Description

List generation method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technology, artificial intelligence technology, financial technology, or other related fields, and more particularly, to a list generation method, apparatus, device, medium, and program product.

Background

As the traffic entry points of major platforms have come to dominate, the DSP (Demand Side Platform) model connects audiences with advertisers and shifts advertisement buying from purchasing media to purchasing audiences. DSP delivery, which aims to maximize advertising returns, has become an important advertising model. However, cheating has appeared in DSP delivery: false, ineffective exposures can be generated through organized fraud chains or other cheating means, causing financial losses to advertisers.

Therefore, how to detect cheating behavior quickly and accurately so as to improve the accuracy of advertisement delivery is a technical problem to be solved in the related art.

Disclosure of Invention

In view of the foregoing, the present disclosure provides a list generation method, apparatus, device, medium, and program product.

According to a first aspect of the present disclosure, there is provided a list generation method including:

preprocessing the collected operation information of a plurality of objects to obtain index information corresponding to each object;

performing feature extraction on the index information corresponding to each object to obtain a historical behavior sequence corresponding to each object;

performing interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time decay mechanism to obtain an interest value corresponding to each object;

determining index information of an object whose interest value satisfies a preset threshold condition as first anomaly information having a cheating probability;

determining, by performing cluster analysis on the index information corresponding to each object, second anomaly information having a cheating probability from the index information corresponding to each object;

and obtaining a target list according to the first anomaly information and the second anomaly information.

According to an embodiment of the present disclosure, preprocessing the collected operation information of a plurality of objects to obtain index information corresponding to each of the objects includes:

collecting the operation information of the plurality of objects through page tracking points (event-tracking code embedded in pages);

desensitizing the operation information of the plurality of objects to obtain desensitized information;

and performing missing-value processing on the desensitized information to obtain the index information corresponding to each object.

According to an embodiment of the present disclosure, performing interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time decay mechanism to obtain an interest value corresponding to each object includes:

extracting, according to dependency relationships in the historical behavior sequence, a plurality of interest vectors corresponding to the object from the historical behavior sequence;

for each interest vector, adjusting the weight of the interest vector according to the importance of the current behavior of the object to obtain an adjusted interest vector, wherein the current behavior corresponds to the interest vector; and

activating each adjusted interest vector to obtain an interest value corresponding to the object.
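As a minimal sketch of the activation step, assuming (this is not stated in the source) that the adjusted interest vectors are sum-pooled and passed through a logistic (sigmoid) activation; `score_weights` and `bias` are hypothetical learned parameters introduced only for illustration:

```python
import math
from typing import List, Sequence


def stable_sigmoid(z: float) -> float:
    """Logistic activation written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def interest_value(
    adjusted_vectors: Sequence[Sequence[float]],
    score_weights: Sequence[float],  # hypothetical learned parameters
    bias: float = 0.0,               # hypothetical learned parameter
) -> float:
    """Sum-pool the adjusted interest vectors and map the result into (0, 1)."""
    dim = len(score_weights)
    pooled: List[float] = [sum(v[j] for v in adjusted_vectors) for j in range(dim)]
    z = sum(w * p for w, p in zip(score_weights, pooled)) + bias
    return stable_sigmoid(z)
```

Squashing the score into (0, 1) keeps interest values on a common scale, so that a single fixed threshold can be applied across objects.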

According to an embodiment of the present disclosure, the extracting, from the historical behavior sequence, a plurality of interest vectors corresponding to the object according to the dependency relationship in the historical behavior sequence includes:

for each feature vector in the historical behavior sequence, determining a time decay factor corresponding to the feature vector according to the difference between the current time and the occurrence time of the historical behavior corresponding to the feature vector, wherein the current time is the occurrence time of the current behavior;

obtaining a new historical behavior sequence according to each feature vector and its corresponding time decay factor; and

extracting the plurality of interest vectors corresponding to the object from the new historical behavior sequence according to the dependency relationships in the new historical behavior sequence.
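The time decay factor can be sketched as follows, assuming an exponential decay of the form exp(-λ·Δt) with a hypothetical rate `decay_rate` (the source does not specify the functional form). Each feature vector is scaled by its factor to form the new historical behavior sequence:

```python
import math
from typing import List, Sequence, Tuple


def time_decay_factor(t_now: float, t_event: float, decay_rate: float = 0.1) -> float:
    """Hypothetical exponential decay: exp(-decay_rate * (t_now - t_event))."""
    delta = max(0.0, t_now - t_event)  # guard against clock skew giving negative gaps
    return math.exp(-decay_rate * delta)


def decay_sequence(
    features: Sequence[Sequence[float]],
    timestamps: Sequence[float],
    t_now: float,
    decay_rate: float = 0.1,
) -> Tuple[List[List[float]], List[float]]:
    """Scale each behavior's feature vector by its time decay factor."""
    factors = [time_decay_factor(t_now, t, decay_rate) for t in timestamps]
    new_seq = [[a * x for x in vec] for a, vec in zip(factors, features)]
    return new_seq, factors
```

With `decay_rate=0.1` and time measured in hours, a behavior that occurred 10 hours before the current behavior keeps about 37% (e^-1) of its original magnitude.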

According to an embodiment of the present disclosure, for each interest vector, adjusting the weight of the interest vector according to the importance of the current behavior of the object to obtain an adjusted interest vector includes:

adjusting the weight of the interest vector according to both the importance of the current behavior and the time decay factor to obtain the adjusted interest vector.
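One possible way to combine importance and time decay is sketched below. It assumes (not stated in the source) that importance is measured as the dot product between each interest vector and an embedding of the current behavior, that the softmax-style scores are multiplied by the time decay factors, and that the result is renormalized to sum to 1:

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def adjust_interest_vectors(
    interests: Sequence[Sequence[float]],
    current_behavior: Sequence[float],  # hypothetical embedding of the current behavior
    decay_factors: Sequence[float],
) -> Tuple[List[List[float]], List[float]]:
    """Weight each interest vector by (importance x time decay), normalized to sum to 1."""
    scores = [dot(h, current_behavior) for h in interests]
    m = max(scores)  # subtract the max before exp() for numerical stability
    raw = [math.exp(s - m) * a for s, a in zip(scores, decay_factors)]
    total = sum(raw)
    weights = [r / total for r in raw]
    adjusted = [[w * x for x in h] for w, h in zip(weights, interests)]
    return adjusted, weights
```

The decay factor can override relevance: an interest that matches the current behavior well but is old may end up with a smaller weight than a less similar but recent one.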

According to an embodiment of the present disclosure, determining, by performing cluster analysis on the index information corresponding to each object, second anomaly information having a cheating probability from that index information includes:

determining feature information from the index information corresponding to each object;

standardizing the feature information to obtain standardized feature information;

performing cluster analysis on the standardized feature information by using a clustering algorithm to obtain a clustering result;

determining an aggregation point according to the clustering result; and

determining the index information of the objects corresponding to the aggregation point as the second anomaly information.
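A sketch of the clustering branch follows. It assumes z-score standardization, K-means with k = 2 (the detailed description later names K-means), and a hypothetical reading of "aggregation point" as a cluster whose members sit unusually close together, as near-identical automated behavior would. `min_size` and `max_spread` are illustrative parameters not given in the source, and the K-means is written in plain Python only to keep the example self-contained:

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each column; constant columns get std 1 to avoid division by zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster becomes empty.
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def aggregation_members(rows, k: int = 2, min_size: int = 3, max_spread: float = 0.25) -> List[int]:
    """Indices of objects in unusually tight clusters (hypothetical 'aggregation point' rule)."""
    points = standardize(rows)
    labels, centroids = kmeans(points, k)
    flagged: List[int] = []
    for c, center in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len(idx) < min_size:
            continue
        spread = sum(math.sqrt(dist2(points[i], center)) for i in idx) / len(idx)
        if spread <= max_spread:
            flagged.extend(idx)
    return sorted(flagged)
```

For example, with rows of (clicks per minute, dwell seconds), four identical rows `[30, 1]` mimicking automated clicks form a tight cluster and are flagged, while four varied human-like rows such as `[2, 40]` and `[4, 70]` are not.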

According to an embodiment of the present disclosure, obtaining a target list according to the first anomaly information and the second anomaly information includes:

determining the objects that appear in both the first anomaly information and the second anomaly information; and

obtaining the target list according to the index information corresponding to these repeated objects.
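Reading "repeated objects" as the intersection of the two anomaly sets, the combination step reduces to a keyed lookup. The sketch assumes each anomaly set is a mapping from a hypothetical `object_id` to that object's index information:

```python
from typing import Any, Dict, List


def build_target_list(
    first_anomalies: Dict[str, Any],   # object_id -> index information
    second_anomalies: Dict[str, Any],  # object_id -> index information
) -> List[Dict[str, Any]]:
    """Keep only objects flagged by both the interest-value and the clustering branches."""
    return [
        {"object_id": oid, "index_info": info}
        for oid, info in first_anomalies.items()
        if oid in second_anomalies
    ]
```

Requiring agreement between the two independent signals trades some recall for fewer false positives.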

According to an embodiment of the present disclosure, each entry in the target list represents the index information of an object, and the weight of each entry decays over time.
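The decay rule for list entries is not specified; a minimal sketch assuming half-life (exponential) decay, with a hypothetical `half_life` and a hypothetical cut-off below which entries are dropped from the list:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ListEntry:
    object_id: str
    added_at: float          # time the entry was added, e.g. in days
    base_weight: float = 1.0

    def weight(self, now: float, half_life: float = 30.0) -> float:
        """Exponential decay: the weight halves every `half_life` time units."""
        age = max(0.0, now - self.added_at)
        return self.base_weight * 0.5 ** (age / half_life)


def active_entries(entries: List[ListEntry], now: float, min_weight: float = 0.1) -> List[ListEntry]:
    """Drop entries whose decayed weight has fallen below a hypothetical cut-off."""
    return [e for e in entries if e.weight(now) >= min_weight]
```

Decaying weights let an object that stops misbehaving gradually leave the list instead of being flagged permanently.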

A second aspect of the present disclosure provides a list generation apparatus, including a first obtaining module, an extraction module, a second obtaining module, a first determining module, a second determining module, and a third obtaining module. The first obtaining module is configured to preprocess the collected operation information of a plurality of objects to obtain index information corresponding to each object. The extraction module is configured to perform feature extraction on the index information corresponding to each object to obtain a historical behavior sequence corresponding to each object. The second obtaining module is configured to perform interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time decay mechanism to obtain an interest value corresponding to each object. The first determining module is configured to determine index information of an object whose interest value satisfies a preset threshold condition as first anomaly information having a cheating probability. The second determining module is configured to determine, by performing cluster analysis on the index information corresponding to each object, second anomaly information having a cheating probability from that index information. The third obtaining module is configured to obtain a target list according to the first anomaly information and the second anomaly information.

A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.

A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method.

A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.

According to the list generation method, apparatus, device, medium, and program product provided by the present disclosure, the operation information of a plurality of objects is preprocessed to obtain index information, and feature extraction is performed on the index information to obtain a historical behavior sequence for each object. Interest extraction and interest evolution are then performed on each historical behavior sequence by using a time decay mechanism, so that interest evolution is modeled while accounting for how interest may change over time, and the resulting interest value better reflects the object's current interest state. By checking whether each object's interest value satisfies a preset threshold condition, abnormal interest evolution is detected and first anomaly information having a cheating probability is determined; cluster analysis on the index information of each object yields second anomaly information having a cheating probability. Combining the first and second anomaly information allows objects with a cheating probability to be identified more accurately and a target list to be obtained. As a result, cheating behavior can be identified, the effectiveness of anti-click-fraud measures is improved, and advertising costs are reduced.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of a list generation method according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow chart of a list generation method according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a model structure diagram for deriving interest values according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a system diagram for implementing a list generation method according to an embodiment of the disclosure;

FIG. 5 schematically shows a structural block diagram of a list generating apparatus according to an embodiment of the present disclosure; and

FIG. 6 schematically illustrates a block diagram of an electronic device adapted to implement a list generation method according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where an expression like "at least one of A, B, and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).

In the technical solution of the present disclosure, the user information involved (including but not limited to personal information, user image information, and user device information such as location information) and data (including but not limited to data for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of such data all comply with the relevant laws, regulations, and standards of the relevant countries and regions; necessary security measures are taken, public order and good customs are not violated, and corresponding operation entries are provided for users to grant or refuse authorization.

In the course of arriving at the present disclosure, it was found that cheating behavior is generally detected with rule-based approaches, for example by checking access time, abnormal behavior, regional anomalies, and IP anomalies, but such approaches are prone to misjudgment. A DSP provides advertisers with a cross-media, cross-platform, and cross-terminal advertisement delivery platform, achieving audience-based precise delivery and continuous real-time monitoring through data integration and analysis. Therefore, how to detect cheating behavior quickly and accurately so as to improve the accuracy of advertisement delivery is a technical problem to be solved in the related art.

An embodiment of the present disclosure provides a list generation method, comprising: preprocessing collected operation information of a plurality of objects to obtain index information corresponding to each object; performing feature extraction on the index information corresponding to each object to obtain a historical behavior sequence corresponding to each object; performing interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time decay mechanism to obtain an interest value corresponding to each object; determining index information of an object whose interest value satisfies a preset threshold condition as first anomaly information having a cheating probability; determining, by performing cluster analysis on the index information corresponding to each object, second anomaly information having a cheating probability from that index information; and obtaining a target list according to the first anomaly information and the second anomaly information.

Fig. 1 schematically illustrates an application scenario diagram of a list generation method according to an embodiment of the present disclosure.

As shown in fig. 1, an application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages, etc. Various communication client applications, such as a shopping class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.

The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

For example, the server 105 may preprocess the collected operation information of the plurality of objects to obtain index information corresponding to each object, and perform feature extraction on that index information to obtain a historical behavior sequence corresponding to each object. Using a time decay mechanism, the server 105 then performs interest extraction and interest evolution on each historical behavior sequence to obtain an interest value for each object, and determines the index information of any object whose interest value satisfies a preset threshold condition as first anomaly information having a cheating probability. Next, the server 105 performs cluster analysis on the index information corresponding to each object to determine second anomaly information having a cheating probability, and finally obtains a target list according to the first anomaly information and the second anomaly information.

It should be noted that the list generating method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the list generating apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The list generation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the list generating apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Based on the scenario described in FIG. 1, the list generation method of the embodiments of the present disclosure will be described in detail below with reference to FIG. 2 to FIG. 4.

Fig. 2 schematically illustrates a flowchart of a list generation method according to an embodiment of the present disclosure.

As shown in fig. 2, the method 200 includes operations S210 to S260.

In operation S210, the collected operation information of the plurality of objects is preprocessed to obtain index information corresponding to each object.

According to the embodiment of the disclosure, the operation information of the plurality of objects may be collected through tracking points embedded in the advertisement page. It should be noted that user permission is obtained before the operation information is collected, in compliance with the relevant laws and regulations. The operation information may include the object's operation habits, device information, media information, and context information. The index information covers the same information types (operation habits, device information, media information, and context information); the difference is that the index information is obtained by preprocessing the operation information. According to an embodiment of the present disclosure, the contents of the operation information and the index information may be as shown in Table 1 below.

TABLE 1

It should be noted that the operation habits, device information, media information, and context information in the embodiments of the present disclosure are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of the related data all comply with the relevant laws, regulations, and standards of the relevant countries and regions, and necessary security measures are taken without violating public order and good customs.

In operation S220, feature extraction is performed on the index information corresponding to each object, and a historical behavior sequence corresponding to each object is obtained.

According to the embodiment of the disclosure, feature extraction is performed on the index information of each object to obtain its historical behavior sequence, so that feature vectors of the object's behaviors can be formed from its operation habits, device information, media information, context information, and the like.

The historical behavior sequence may include feature vectors of the object's different behaviors: each historical behavior of the object may be represented by one feature vector, which may contain information related to that behavior, such as the object's ID (identifier), a score, and a timestamp.

In operation S230, the historical behavior sequence corresponding to each object is subjected to interest extraction and interest evolution by using a time decay mechanism, so as to obtain an interest value corresponding to each object.

According to the embodiment of the disclosure, the DIEN (Deep Interest Evolution Network) algorithm may be applied to process the historical behavior sequence of each object, which can effectively capture the object's interest evolution process. Introducing a time decay mechanism allows changes in the object's behavior across different time periods to be taken into account, making the resulting interest value more accurate.

In operation S240, index information of an object whose interest value satisfies a preset threshold condition is determined as first abnormality information having a cheating probability.

According to the embodiment of the disclosure, abnormal interest evolution of an object can be detected by judging whether its interest value satisfies the preset threshold condition, thereby determining the first anomaly information having a cheating probability. The preset threshold condition may be that the interest value is lower than a set threshold; the first anomaly information may then include the index information of each object whose interest value is lower than that threshold, and the cheating probability indicates that the object may be engaged in cheating behavior.

For example, if the preset threshold condition is that the interest value is lower than 0.6, an object whose interest value is lower than 0.6 is considered to show abnormal interest evolution and may be engaged in click cheating, so its index information may be identified as first anomaly information.
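The threshold check in this example can be sketched as a simple filter. It assumes interest values and index information are keyed by a hypothetical object identifier, and reads "lower than" as a strict inequality, so a value of exactly 0.6 is not flagged:

```python
from typing import Any, Dict


def first_anomaly_info(
    interest_values: Dict[str, float],  # object_id -> interest value
    index_info: Dict[str, Any],         # object_id -> index information
    threshold: float = 0.6,
) -> Dict[str, Any]:
    """Return index information of objects whose interest value is strictly below the threshold."""
    return {oid: index_info[oid] for oid, value in interest_values.items() if value < threshold}
```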

In operation S250, second abnormality information having a probability of cheating is determined from the index information corresponding to each object by performing cluster analysis on the index information corresponding to each object.

According to the embodiment of the disclosure, the index information of each object may also be subjected to cluster analysis by a K-means clustering algorithm to determine second anomaly information having a cheating probability from the index information corresponding to each object, that is, to determine the index information of the object having a potential click cheating behavior.

In operation S260, a target list is obtained according to the first abnormality information and the second abnormality information.

According to the embodiment of the disclosure, according to the first abnormal information and the second abnormal information, a target list recording possible cheating behaviors can be obtained.

According to the embodiment of the disclosure, the operation information of the plurality of objects is preprocessed into index information, and feature extraction on the index information yields a historical behavior sequence for each object. Interest extraction and interest evolution are then performed on each sequence by using a time decay mechanism, so that interest evolution is modeled while accounting for changes in interest over time, and the resulting interest value better reflects the object's current interest state. Judging whether each interest value satisfies the preset threshold condition identifies abnormal interest evolution and thus the first anomaly information having a cheating probability, while cluster analysis on the index information yields the second anomaly information having a cheating probability. Combining the two allows objects engaged in cheating to be judged more accurately and the target list to be obtained, so that cheating behavior can be identified accurately, the accuracy of advertisement clicks is improved, and advertisers' advertising losses are reduced.

According to an embodiment of the present disclosure, preprocessing the collected operation information of a plurality of objects to obtain index information corresponding to each object includes: collecting the operation information of the plurality of objects through page tracking points; desensitizing the operation information of the plurality of objects to obtain desensitized information; and performing missing-value processing on the desensitized information to obtain the index information corresponding to each object.

According to the embodiment of the disclosure, the operation information of the plurality of objects may be collected through page instrumentation: code or tools are embedded in a page or application to record the object's operation behaviors there, such as clicking, browsing, purchasing, and registering.

According to the embodiment of the disclosure, sensitive information in the operation information of the plurality of objects may be desensitized, so that it is not used directly in an untrusted environment.

For example, desensitization may be performed with federated algorithms. Highly identifying information may be anonymized, for example by hashing the object identifier with a hash function to protect the object's identity. In addition, differential privacy noise may be added to computation results to prevent the original data from being recovered from the updated data.
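The anonymization and noise steps can be sketched with the standard library. Two choices here are assumptions rather than the source's method: the identifier is hashed with HMAC-SHA256 (a keyed hash, swapped in because plain hashes of low-entropy IDs can be reversed by enumeration), and the differential-privacy noise uses the Laplace mechanism with hypothetical `sensitivity` and `epsilon` values. The federated part is not shown.

```python
import hashlib
import hmac
import random


def pseudonymize(object_id: str, secret_key: bytes) -> str:
    """Keyed SHA-256 hash (HMAC) of the object identifier; same input gives the same token."""
    return hmac.new(secret_key, object_id.encode("utf-8"), hashlib.sha256).hexdigest()


def add_laplace_noise(value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism: add noise with scale sensitivity / epsilon.

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace(0, scale) distributed.
    """
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise
```

Because the token is deterministic for a given key, records of the same object can still be joined after pseudonymization, while the key itself must be kept out of the untrusted environment.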

According to the embodiment of the disclosure, network faults, physical machine anomalies, temporary system shutdowns, and similar events can leave missing values in the collected operation information. Missing values make the data incomplete and can affect subsequent feature extraction and modeling, so they need to be filled in to reduce errors as much as possible.

According to embodiments of the present disclosure, a missing value may be filled with the mode of the historical contemporaneous records; that is, when a missing index has records for the same period in history, the most frequent of those historical values is used.
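A minimal sketch of mode imputation, assuming "contemporaneous" means the same period key (for example, the same hour of day, labeled here with a hypothetical `period` value) and that history is stored per (index name, period). If no contemporaneous history exists, the gap is left as-is:

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Optional, Tuple

History = Dict[Tuple[str, Hashable], List[Any]]  # (index name, period) -> past values


def fill_missing_with_mode(
    record: Dict[str, Optional[Any]],
    period: Hashable,
    history: History,
) -> Dict[str, Any]:
    """Replace None values with the most frequent historical value for the same period."""
    filled = dict(record)  # do not mutate the caller's record
    for name, value in record.items():
        if value is None:
            past = history.get((name, period), [])
            if past:  # leave the gap if there is no contemporaneous history
                filled[name] = Counter(past).most_common(1)[0][0]
    return filled
```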

According to the embodiment of the disclosure, desensitizing the collected operation information protects sensitive information and the object's identity, and filling missing values in the desensitized information reduces errors.

According to an embodiment of the present disclosure, performing interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time decay mechanism, to obtain the interest value corresponding to each object, includes: extracting a plurality of interest vectors corresponding to the object from the historical behavior sequence according to the dependency relationships in the historical behavior sequence; for each interest vector, adjusting the weight of the interest vector according to the importance of the object's current behavior to obtain an adjusted interest vector, where the current behavior corresponds to the interest vector; and activating each adjusted interest vector to obtain the interest value corresponding to the object.

According to embodiments of the present disclosure, the historical behavior sequence of an object may be encoded with a GRU (Gated Recurrent Unit) to extract multiple interest vectors of the object. The GRU can capture long-term dependencies in the historical behavior sequence while mitigating the vanishing gradient problem.
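The following NumPy sketch illustrates how a GRU can turn a historical behavior sequence into one interest vector per step. It assumes each behavior has already been embedded as a fixed-length feature vector; the class `GRUCell`, the function `extract_interests`, and the random, untrained weights are illustrative only.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell with random (untrained) weights, for illustration only."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: update gate z, reset gate r, candidate state h
        self.params = {
            gate: (rng.normal(0.0, 0.1, (hidden_dim, input_dim)),
                   rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
                   np.zeros(hidden_dim))
            for gate in ("z", "r", "h")
        }

    def step(self, x: np.ndarray, h: np.ndarray) -> np.ndarray:
        Wz, Uz, bz = self.params["z"]
        Wr, Ur, br = self.params["r"]
        Wh, Uh, bh = self.params["h"]
        z = sigmoid(Wz @ x + Uz @ h + bz)              # how much of the state to update
        r = sigmoid(Wr @ x + Ur @ h + br)              # how much past state feeds the candidate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
        return (1.0 - z) * h + z * h_tilde


def extract_interests(behaviors: np.ndarray, cell: GRUCell) -> np.ndarray:
    """Encode a (T, input_dim) behavior sequence; each hidden state is one interest vector."""
    h = np.zeros(cell.hidden_dim)
    states = []
    for x in behaviors:
        h = cell.step(x, h)
        states.append(h)
    return np.stack(states)  # shape (T, hidden_dim)
```

In a trained model the weights would be learned; the gated, convex update of the hidden state is what lets information from early behaviors survive across long sequences.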

According to embodiments of the present disclosure, an AUGRU (GRU with an attentional update gate) may be used to evolve the object's interest vectors. For each interest vector, the AUGRU can dynamically adjust the weights of the historical interest vectors according to the importance of the current behavior corresponding to that interest vector, thereby evolving the interests and reflecting the object's current interest state more accurately. The importance of the current behavior represents its relative importance among the object's behaviors; for example, the object's recent browsing, clicking, purchasing and similar behaviors may reflect its current interest better than earlier behaviors. Importance may be determined from a variety of factors, such as timestamp, behavior type and behavior frequency.
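A sketch of an AUGRU-style evolution step, following the attentional update gate used in DIEN: the GRU update gate is multiplied by an attention score that measures how relevant each interest vector is to the current behavior. The bilinear scoring `softmax(i_t^T W_a e)`, the omission of bias terms, and all names (`attention_scores`, `augru`, `W_a`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_scores(interests: np.ndarray, current: np.ndarray, W_a: np.ndarray) -> np.ndarray:
    """Importance of each interest vector w.r.t. the current behavior: softmax_t(i_t^T W_a e)."""
    return softmax(interests @ W_a @ current)  # shape (T,)


def augru(interests: np.ndarray, scores: np.ndarray, params) -> np.ndarray:
    """AUGRU: the GRU update gate at each step is scaled by that step's attention score."""
    Wz, Uz, Wr, Ur, Wh, Uh = params          # biases omitted for brevity
    h = np.zeros(Uz.shape[0])
    for x, a in zip(interests, scores):
        z = a * sigmoid(Wz @ x + Uz @ h)     # attentional update gate
        r = sigmoid(Wr @ x + Ur @ h)
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
        h = (1.0 - z) * h + z * h_tilde
    return h  # final evolved interest state
```

When a step's attention score is close to zero, the update gate is effectively closed and the previous interest state passes through unchanged, which is how less relevant history is down-weighted.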

Here, an interest vector can characterize the object's degree of preference for different interests or items; the weight of an interest vector can represent how much a given historical interest contributes to the current behavior; and the historical interests characterize the object's interest history over a period of time.

According to an embodiment of the present disclosure, the adjusted interest vectors may be activated using DICE (dynamic input control unit) to account for the sparsity and nonlinearity of the object's interests. DICE adaptively adjusts the parameters of the activation function according to the input distribution, activating the interests to obtain the interest value of the object.

According to embodiments of the present disclosure, the input distribution may be the distribution of the adjusted interest vectors of each object. By modeling this distribution, DICE can automatically learn and adjust the parameters of the activation function to better fit the interest characteristics of different objects.
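A sketch of a data-adaptive activation, assuming DICE follows the Dice activation described for DIN/DIEN: the rectification point of each dimension is centered on the batch mean and scaled by the batch variance, so it moves with the input distribution. The fixed `alpha` here stands in for a parameter that would normally be learned, and batch statistics would typically be replaced by moving averages at inference time.

```python
import numpy as np


def dice(s: np.ndarray, alpha: float = 0.25, eps: float = 1e-8) -> np.ndarray:
    """Dice-style activation over a batch of adjusted interest vectors.

    s: (batch, dim). For each dimension, p(s) = sigmoid((s - mean) / sqrt(var + eps)),
    and the output is p(s) * s + (1 - p(s)) * alpha * s.
    """
    mean = s.mean(axis=0, keepdims=True)
    var = s.var(axis=0, keepdims=True)
    p = 1.0 / (1.0 + np.exp(-(s - mean) / np.sqrt(var + eps)))
    return p * s + (1.0 - p) * alpha * s
```

Values well above the batch mean pass through almost unchanged, while values below it are scaled toward `alpha * s`; with `alpha = 1` the function reduces to the identity.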

Wherein the interest value may be used to represent the current interest of the object.

According to the embodiment of the disclosure, activating the adjusted interest vectors better captures dynamic changes in the object's interests and allows further nonlinear processing in the interest activation stage.

According to the embodiment of the disclosure, long-term dependencies in the historical behavior sequence are captured to extract a plurality of interest vectors of the object; the weights of the interest vectors are adjusted based on the importance of the object's current behavior, so that interest evolution reflects the object's current interest state more accurately; and the adjusted interest vectors are then activated to obtain the object's interest value from the captured interest characteristics.

According to an embodiment of the present disclosure, extracting a plurality of interest vectors corresponding to an object from a historical behavior sequence according to a dependency relationship in the historical behavior sequence includes: for each feature vector in the historical behavior sequence, determining a time attenuation factor corresponding to the feature vector according to the difference between the current time and the occurrence time of the historical behavior; obtaining a new historical behavior sequence according to the feature vector and the time attenuation factor corresponding to the feature vector; and extracting a plurality of interest vectors corresponding to the object from the new historical behavior sequence according to the dependency relationship in the new historical behavior sequence.

Here, the historical behavior corresponds to the feature vector, and the current time may be the occurrence time of the current behavior.

According to embodiments of the present disclosure, because an object's interests may change over time rather than depending solely on historical behavior, a time decay mechanism may be introduced. Each feature vector in the historical behavior sequence can be multiplied by its time decay factor to obtain a new historical behavior sequence, and a plurality of interest vectors of the object are then extracted from the new historical behavior sequence based on its dependency relationships.

According to embodiments of the present disclosure, the time decay factor may be determined from the difference between the current time and the occurrence time of the historical behavior, and may follow an exponential decay, a linear decay, or the like.
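The two decay forms mentioned above can be sketched as follows. The half-life and horizon values, the time unit (days), and the function names are hypothetical choices; the disclosure only states that the factor depends on the time difference and may decay exponentially or linearly.

```python
import numpy as np


def decay_factor(delta_t, mode: str = "exponential",
                 half_life: float = 7.0, horizon: float = 30.0) -> np.ndarray:
    """Time decay factor for elapsed times `delta_t` (assumed to be in days).

    exponential: 0.5 ** (delta_t / half_life), halving every `half_life` days
    linear:      1 - delta_t / horizon, clipped to [0, 1]
    """
    delta_t = np.asarray(delta_t, dtype=float)
    if mode == "exponential":
        return np.power(0.5, delta_t / half_life)
    if mode == "linear":
        return np.clip(1.0 - delta_t / horizon, 0.0, 1.0)
    raise ValueError(f"unknown decay mode: {mode}")


def apply_time_decay(features: np.ndarray, event_times, current_time: float,
                     **kwargs) -> np.ndarray:
    """Multiply each row of a (T, d) feature matrix by its decay factor (new behavior sequence)."""
    delta_t = current_time - np.asarray(event_times, dtype=float)
    return features * decay_factor(delta_t, **kwargs)[:, None]
```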

According to an embodiment of the present disclosure, applying time decay factors to the feature vectors in the historical behavior sequence allows the interest vectors of the object to be extracted while accounting for how the object's interests in different historical behaviors may vary over time.

According to an embodiment of the present disclosure, adjusting, for each interest vector, the weight of the interest vector according to the importance of the object's current behavior to obtain an adjusted interest vector includes: adjusting the weight of the interest vector according to the importance of the current behavior and the time decay factor to obtain the adjusted interest vector.

According to embodiments of the present disclosure, the weight of the interest vector may be adjusted according to the importance of the current behavior. Because the object's interests may change over time, the weight of each interest vector, already adjusted according to the importance of the current behavior, may further be multiplied by a time decay factor. This time decay factor is likewise determined by the difference between the current time and the occurrence time of the historical behavior.
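A sketch of the combined adjustment, assuming the importance-based weights are obtained by a softmax over raw importance scores and the decay is exponential with a hypothetical half-life. The function `adjust_interest_weights` and its parameters are illustrative, not taken from the disclosure.

```python
import numpy as np


def adjust_interest_weights(interests: np.ndarray, importance, delta_t,
                            half_life: float = 7.0):
    """Weight each interest vector by (importance weight x time decay factor).

    interests:  (T, d) interest vectors
    importance: (T,) raw importance scores of the corresponding behaviors
    delta_t:    (T,) current time minus each behavior's occurrence time (days, assumed)
    Returns the adjusted (T, d) interest vectors and the (T,) final weights.
    """
    importance = np.asarray(importance, dtype=float)
    weights = np.exp(importance - importance.max())
    weights /= weights.sum()                                   # importance-based weights
    decay = np.power(0.5, np.asarray(delta_t, dtype=float) / half_life)
    adjusted = weights * decay                                 # attenuate older behaviors
    return interests * adjusted[:, None], adjusted
```

With equal importance scores, a behavior one half-life old ends up with half the weight of the most recent one.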

According to the embodiment of the disclosure, introducing the time decay factor into the weights of the interest vectors reflects how the object's interests change over time, so that the object's current interest state is captured more faithfully.

Fig. 3 schematically illustrates a model structure diagram for deriving interest values according to an embodiment of the present disclosure.

As shown in FIG. 3, the module 300 for deriving an interest value may include an interest extraction layer 310, an interest evolution layer 320, and an interest activation layer 330. According to an embodiment of the present disclosure, the above-described operation S230 may be performed by the interest extraction layer 310, the interest evolution layer 320, and the interest activation layer 330.

According to embodiments of the present disclosure, the interest extraction layer (interest extractor layer) 310 may be configured to extract a plurality of interest vectors of an object from the historical behavior sequence. The interest evolution layer (interest evolving layer) 320 may be used to model and capture changes in the object's interests. Its main purpose is to adjust the weights of the object's interest vectors according to the importance of the object's current behavior, so that the object's current interest state is reflected more accurately and personalized recommendations can be generated more accurately.

Model parameters or mechanisms may be involved in adjusting the weights of the interest vectors, so that the weights can be updated flexibly when the object's interests change.

According to embodiments of the present disclosure, the output of the interest evolution layer 320 may include the weighted interest vector, which may be passed to the next layer, the interest activation layer (interest activation layer) 330.

According to an embodiment of the present disclosure, the interest activation layer 330 activates the interest vectors of the object using the dynamic input control unit (DICE). DICE focuses on the sparsity and nonlinearity of the object's interests and adjusts the parameters of the activation function so that the model better fits the distribution of the object's interests, improving its expressive power.

DICE is a mechanism for dynamically adjusting the parameters of the activation function: by modeling the distribution of the input data, it adapts these parameters, which makes the model more flexible and better suited to the distributions of different interests.

According to embodiments of the present disclosure, the output of the interest activation layer 330 may include the value obtained by activating the object's interest vectors, that is, the interest value of the object. Activating the object's interest vectors generally means applying activation functions whose parameters are adjusted dynamically, so that the model better accommodates the nonlinearity and sparsity of the object's interests, improving its expressive power and capturing complex features of those interests.

According to embodiments of the present disclosure, the interest value finally output by the interest activation layer 330 may be used for recommendation or other tasks that aim to capture the dynamic changes and evolution of the object's interests. A preset threshold condition may also be applied to the output of the interest activation layer 330 to detect abnormal situations or cheating behavior.

According to embodiments of the present disclosure, an auxiliary loss function may be added at the output of the interest evolution layer 320, computed from the object's interest value and the object's click behavior. The interest value may be taken from the output of the interest activation layer 330, or obtained by passing the output of the interest evolution layer 320 through a fully connected layer and a sigmoid activation function; the click behavior can be obtained from the object's historical click records; and the auxiliary loss function may take the form of a cross-entropy loss, a squared-error loss, or the like.
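A sketch of the cross-entropy form of the auxiliary loss, assuming the interest value is produced from the interest evolution layer output by a fully connected layer followed by a sigmoid, as described above. The parameters `w` and `b` and the function name are hypothetical; a squared-error variant would replace the final line with the mean of `(p - clicks) ** 2`.

```python
import numpy as np


def auxiliary_loss(evolved_states: np.ndarray, clicks, w: np.ndarray, b: float,
                   eps: float = 1e-12) -> float:
    """Binary cross-entropy between predicted interest values and observed clicks.

    evolved_states: (N, d) outputs of the interest evolution layer
    clicks:         (N,) 0/1 labels from the objects' historical click records
    w, b:           fully connected layer mapping d -> 1 (hypothetical parameters)
    """
    logits = evolved_states @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))   # interest value in (0, 1)
    p = np.clip(p, eps, 1.0 - eps)      # avoid log(0)
    clicks = np.asarray(clicks, dtype=float)
    return float(-np.mean(clicks * np.log(p) + (1.0 - clicks) * np.log(1.0 - p)))
```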

According to embodiments of the present disclosure, the auxiliary loss function provides additional supervision signals during training and helps the model learn a better representation of the object's interests. Using the object's interest value as a supervision signal forces the model to fit the object's actual interests more closely, improving the training effect of the model.

According to an embodiment of the present disclosure, determining the second abnormality information having a cheating probability from the index information corresponding to each object by performing cluster analysis on that index information includes: determining feature information from the index information corresponding to each object; standardizing the feature information to obtain standardized feature information; performing cluster analysis on the standardized feature information with a clustering algorithm to obtain a clustering result; determining an aggregation point according to the clustering result; and determining the index information of the objects corresponding to the aggregation point as the second abnormality information.

According to the embodiment of the disclosure, the K-means algorithm can be applied over the IP and interest dimensions; if an aggregation point appears within an IP segment, this may indicate cheating through IP proxies.

According to an embodiment of the present disclosure, the feature information used in the K-means algorithm may be determined from the object's index information and may include the IP, access period, object interest, and the like. The feature information is then standardized so that all features are on the same scale.
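A sketch of feature preparation, assuming IPv4 addresses are encoded by their /24 network prefix so that addresses in the same IP segment share one value, and that standardization means z-scoring each column. Both the encoding and the function names are hypothetical; other encodings (for example, one-hot IP segments) are equally possible.

```python
import ipaddress

import numpy as np


def ip_segment(ip: str) -> float:
    """Encode an IPv4 address by its /24 network prefix (hypothetical encoding choice)."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return float(int(net.network_address) >> 8)  # drop the host byte


def standardize(features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Z-score each column, (x - mean) / std, so all features share one scale."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / np.maximum(std, eps)  # constant columns become 0
```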

According to the embodiment of the disclosure, the standardized characteristic information can be input into a K-means algorithm, and clustering analysis is performed on the standardized characteristic information to obtain a clustering result. In the process of cluster analysis, a proper K value, namely the number of clusters, can be selected according to service requirements.
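A minimal K-means (Lloyd's algorithm) sketch in NumPy for the clustering step. The number of clusters `k` is supplied by the caller, in line with choosing it according to business requirements; the random initialization from distinct samples and the convergence check are illustrative choices, not the disclosed configuration.

```python
import numpy as np


def _assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center (squared Euclidean distance) for each sample."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Cluster standardized features X (n, d) into k clusters; return (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its members; keep empty clusters in place
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers
```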

According to the embodiment of the present disclosure, the aggregation point is determined by analyzing the clustering result, so that the index information of the object corresponding to the aggregation point can be determined as the second abnormality information.
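The disclosure does not fix a rule for deciding which clusters are aggregation points. The sketch below assumes a hypothetical rule: a cluster counts as an aggregation point when it has at least `min_size` members whose mean distance to the centroid is at most `max_spread`. The index information of objects in those clusters is then returned as second abnormality information; all names and thresholds are illustrative.

```python
from typing import Dict, List, Sequence

import numpy as np


def aggregation_clusters(X: np.ndarray, labels: np.ndarray, centers: np.ndarray,
                         min_size: int = 3, max_spread: float = 0.5) -> List[int]:
    """Ids of clusters that are both large and tight (hypothetical aggregation-point rule)."""
    found = []
    for j, center in enumerate(centers):
        members = X[labels == j]
        if len(members) < min_size:
            continue
        spread = np.linalg.norm(members - center, axis=1).mean()
        if spread <= max_spread:
            found.append(j)
    return found


def second_abnormality_info(object_ids: Sequence[str], index_info: Dict[str, dict],
                            labels: np.ndarray, clusters: List[int]) -> Dict[str, dict]:
    """Index information of every object assigned to an aggregation-point cluster."""
    return {oid: index_info[oid] for oid, lab in zip(object_ids, labels) if lab in clusters}
```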

For example, if the clustering result contains aggregation points of uniform behavior within a single city, this may indicate cheating through a pool of IP proxies.

According to embodiments of the present disclosure, in the analysis of the clustering result, an aggregation point may refer to a set of samples in a cluster produced by the clustering algorithm that lie close to one another in some feature space. For example, an IP and interest aggregation point may mean that a set of IP addresses and their corresponding interest features form a tight cluster in that space.

According to embodiments of the present disclosure, the aggregation points are analyzed mainly to determine whether they have business significance. For example, an aggregation point of uniform behavior within a city may mean that a group of objects in that city share similar interests and behaviors. If such an aggregation point shows unusual behavior, it may indicate a problem, such as click cheating through IP proxies.

According to the embodiment of the disclosure, the feature information may be subjected to cluster analysis by a cluster algorithm to determine the aggregation point, so that the second abnormal information may be determined according to the index information of the object corresponding to the aggregation point to accurately identify the cheating behavior.

According to an embodiment of the present disclosure, obtaining a target list according to first anomaly information and second anomaly information includes: determining repeated objects in the first abnormal information and the second abnormal information; and obtaining a target list according to the index information corresponding to the repeated object.

According to the embodiment of the disclosure, the index information of the objects repeated in both the first abnormal information and the second abnormal information is added to the target list; that is, the information in the target list mainly comes from the intersection of the outputs of DIEN and K-means. The target list may include the index information of the objects identified as cheating, and serves as the list for maintaining and recording cheating behavior.
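A sketch of assembling the target list from the two detectors. The preset threshold condition is modeled as a caller-supplied predicate because the disclosure does not specify it; the function names and the record layout are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set


def first_abnormal_ids(interest_values: Dict[str, float],
                       condition: Callable[[float], bool]) -> Set[str]:
    """Objects whose interest value meets the preset threshold condition (caller-supplied)."""
    return {oid for oid, value in interest_values.items() if condition(value)}


def build_target_list(first_ids: Iterable[str], second_ids: Iterable[str],
                      index_info: Dict[str, dict]) -> List[dict]:
    """Keep only the repeated objects, i.e. those flagged by both detectors."""
    repeated = sorted(set(first_ids) & set(second_ids))
    return [{"object_id": oid, **index_info[oid]} for oid in repeated]
```

For example, `first_abnormal_ids(values, lambda v: v > 0.8)` would apply a hypothetical upper threshold of 0.8 to the interest values.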

A repeated object is one for which the K-means cluster analysis has found an IP-segment and interest aggregation point and for which the DIEN algorithm has also detected abnormal interest evolution.

According to the embodiment of the disclosure, the target list is obtained from both the first abnormal information produced by the DIEN algorithm and the second abnormal information produced by the K-means algorithm, so that objects engaged in cheating can be identified with greater confidence, improving the accuracy and robustness of cheating detection.

According to an embodiment of the present disclosure, the weight of each piece of information in the target list decays over time, each piece of information characterizing the index information of the object.

According to the embodiment of the disclosure, a time decay mechanism can be applied so that the information in the target list gradually decays over time to keep the list accurate; that is, the target list gradually reduces the weight of each piece of information over time, so that it always reflects the current cheating situation. In general, it is the weights of the information in the target list that decay: decay here refers to the process by which identified cheating information is gradually discounted over a period of time, so that the influence of cheating information in the target list diminishes.

According to the embodiment of the disclosure, a time attenuation mechanism is used for information in the target list so as to maintain the accuracy of the target list and prevent outdated cheating information from affecting decision making.
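A sketch of a target list whose entry weights decay over time, assuming exponential decay with a hypothetical half-life and a hypothetical pruning threshold below which an entry is treated as outdated. Re-adding an object resets its timestamp, which is one possible way to keep repeatedly flagged objects at full weight; the class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TargetEntry:
    object_id: str
    index_info: dict
    added_at: float              # time the entry was (re)added, in seconds
    initial_weight: float = 1.0


class DecayingTargetList:
    """Target list whose entry weights halve every `half_life_s` seconds (hypothetical policy)."""

    def __init__(self, half_life_s: float, min_weight: float = 0.05):
        self.half_life_s = half_life_s
        self.min_weight = min_weight
        self.entries: Dict[str, TargetEntry] = {}

    def add(self, object_id: str, index_info: dict, now: float) -> None:
        # Adding an object again resets its timestamp, restoring full weight.
        self.entries[object_id] = TargetEntry(object_id, index_info, now)

    def weight(self, object_id: str, now: float) -> float:
        e = self.entries[object_id]
        return e.initial_weight * 0.5 ** ((now - e.added_at) / self.half_life_s)

    def prune(self, now: float) -> List[str]:
        """Remove entries whose decayed weight dropped below `min_weight`; return their ids."""
        stale = [oid for oid in self.entries if self.weight(oid, now) < self.min_weight]
        for oid in stale:
            del self.entries[oid]
        return stale
```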

Fig. 4 schematically illustrates a system diagram for implementing a list generation method according to an embodiment of the present disclosure.

As shown in fig. 4, the system 400 for implementing a list generation method implements a list generation method by means of a big data platform 420.

According to embodiments of the present disclosure, the big data platform 420 may collect operational information 410 for a plurality of objects. The data preprocessing 430 may be performed by the big data platform 420, and the cheating detection 440 may be performed on the index information after the data preprocessing to obtain the target list 450, where the data preprocessing 430 may include a desensitization processing 431 and a missing value processing 432; the cheat detection 440 may include a DIEN algorithm 441 and a K-means algorithm 442.

Based on the list generation method, the disclosure also provides a list generation device. The device will be described in detail below in connection with fig. 5.

Fig. 5 schematically shows a block diagram of a configuration of a list generating apparatus according to an embodiment of the present disclosure.

As shown in fig. 5, the list generating apparatus 500 of this embodiment includes a first obtaining module 510, an extracting module 520, a second obtaining module 530, a first determining module 540, a second determining module 550, and a third obtaining module 560.

The first obtaining module 510 is configured to pre-process the collected operation information of the plurality of objects to obtain index information corresponding to each object. In an embodiment, the first obtaining module 510 may be configured to perform the operation S210 described above, which is not described herein.

The extraction module 520 is configured to perform feature extraction on the index information corresponding to each object, so as to obtain a historical behavior sequence corresponding to each object. In an embodiment, the extracting module 520 may be configured to perform the operation S220 described above, which is not described herein.

The second obtaining module 530 is configured to extract and evolve interests from the historical behavior sequence corresponding to each object by using a time decay mechanism, so as to obtain an interest value corresponding to each object. In an embodiment, the second obtaining module 530 may be used to perform the operation S230 described above, which is not described herein.

The first determining module 540 is configured to determine index information of an object whose interest value meets a preset threshold condition as first anomaly information having a cheating probability. In an embodiment, the first determining module 540 may be used to perform the operation S240 described above, which is not described herein.

The second determining module 550 is configured to determine second anomaly information having a cheating probability from the index information corresponding to each object by performing cluster analysis on the index information corresponding to each object. In an embodiment, the second determining module 550 may be configured to perform the operation S250 described above, which is not described herein.

The third obtaining module 560 is configured to obtain a target list according to the first anomaly information and the second anomaly information. In an embodiment, the third obtaining module 560 may be configured to perform the operation S260 described above, which is not described herein.

According to an embodiment of the present disclosure, the first obtaining module 510 includes an acquisition unit, a first obtaining unit, and a second obtaining unit.

The acquisition unit is used for collecting the operation information of a plurality of objects through page instrumentation (event tracking points).

The first obtaining unit is used for performing desensitization processing on the operation information of the plurality of objects to obtain desensitization information.

The second obtaining unit is used for performing missing value processing on the desensitization information to obtain the index information corresponding to each object.

According to an embodiment of the present disclosure, the second obtaining module 530 includes an extracting unit, a third obtaining unit, and a fourth obtaining unit.

The extraction unit is used for extracting a plurality of interest vectors corresponding to the object from the historical behavior sequence according to the dependency relationship in the historical behavior sequence.

The third obtaining unit is configured to adjust, for each interest vector, a weight of the interest vector according to an importance degree of a current behavior of the object, and obtain an adjusted interest vector, where the current behavior corresponds to the interest vector.

The fourth obtaining unit is used for activating each adjusted interest vector to obtain an interest value corresponding to the object.

According to an embodiment of the present disclosure, the extraction unit comprises a determination subunit, a first obtaining subunit, and an extraction subunit.

The determining subunit is used for determining, for each feature vector in the historical behavior sequence, a time attenuation factor corresponding to the feature vector according to the difference between the current time and the occurrence time of the historical behavior, wherein the historical behavior corresponds to the feature vector, and the current time represents the occurrence time of the current behavior.

The first obtaining subunit is configured to obtain a new historical behavior sequence according to the feature vector and a time attenuation factor corresponding to the feature vector.

The extraction subunit is used for extracting a plurality of interest vectors corresponding to the object from the new historical behavior sequence according to the dependency relationship in the new historical behavior sequence.

According to an embodiment of the present disclosure, the third obtaining unit comprises a second obtaining subunit.

The second obtaining subunit is used for adjusting the weight of the interest vector according to the importance degree of the current behavior and the time attenuation factor to obtain the adjusted interest vector.

According to an embodiment of the present disclosure, the second determining module 550 includes a first determining unit, a fifth obtaining unit, a sixth obtaining unit, a second determining unit, and a third determining unit.

The first determining unit is configured to determine feature information from the index information corresponding to each object.

The fifth obtaining unit is configured to standardize the feature information to obtain standardized feature information.

The sixth obtaining unit is configured to perform cluster analysis on the standardized feature information by using a clustering algorithm to obtain a clustering result.

The second determining unit is used for determining the aggregation point according to the clustering result.

The third determining unit is configured to determine index information of the object corresponding to the aggregation point as second abnormality information.

According to an embodiment of the present disclosure, the third obtaining module 560 includes a fourth determining unit and a seventh obtaining unit.

The fourth determining unit is configured to determine the repeated objects in the first abnormality information and the second abnormality information.

The seventh obtaining unit is configured to obtain a target list according to the index information corresponding to the repeated objects.

According to an embodiment of the present disclosure, any of the first obtaining module 510, the extracting module 520, the second obtaining module 530, the first determining module 540, the second determining module 550, and the third obtaining module 560 may be combined and implemented in one module, or any of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the first obtaining module 510, the extracting module 520, the second obtaining module 530, the first determining module 540, the second determining module 550, and the third obtaining module 560 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of software, hardware, and firmware or a suitable combination of them. Alternatively, at least one of the first obtaining module 510, the extracting module 520, the second obtaining module 530, the first determining module 540, the second determining module 550, and the third obtaining module 560 may be at least partially implemented as a computer program module, which performs the corresponding functions when executed.

As shown in fig. 6, an electronic device 600 according to an embodiment of the present disclosure includes a processor 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The processor 601 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 601 may also include on-board memory for caching purposes. The processor 601 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.

In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are stored. The processor 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. The processor 601 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 602 and/or the RAM 603. Note that the program may be stored in one or more memories other than the ROM 602 and the RAM 603. The processor 601 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 600 may also include an input/output (I/O) interface 605, the input/output (I/O) interface 605 also being connected to the bus 604. The electronic device 600 may also include one or more of the following components connected to an input/output (I/O) interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to an input/output (I/O) interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.

The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 602 and/or RAM 603 and/or one or more memories other than ROM 602 and RAM 603 described above.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 601. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may be transmitted and distributed as signals over a network medium, and downloaded and installed via the communication section 609, and/or installed from the removable medium 611. The program code contained in the computer program may be transmitted over any appropriate network medium, including but not limited to wireless media, wired media, or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 601. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or integrated without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.

The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

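1. A list generation method, comprising:

preprocessing collected operation information of a plurality of objects to obtain index information corresponding to each object;

performing feature extraction on the index information corresponding to each object to obtain a historical behavior sequence corresponding to each object;

performing interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time attenuation mechanism to obtain an interest value corresponding to each object;

determining index information of an object whose interest value meets a preset threshold condition as first abnormal information having a cheating probability;

determining second abnormal information having a cheating probability from the index information corresponding to each object by performing cluster analysis on the index information corresponding to each object; and

obtaining a target list according to the first abnormal information and the second abnormal information.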
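As a rough, non-authoritative illustration of how the threshold test and the final merge of the method might fit together, the following Python sketch identifies each object by a plain string ID instead of its full index information. The function names `first_abnormal` and `build_target_list`, the `>=` comparison used as the "preset threshold condition", and the union-based merge are all hypothetical assumptions; the disclosure does not fix them here.

```python
from typing import Dict, Iterable, List, Set


def first_abnormal(interest_values: Dict[str, float], threshold: float) -> Set[str]:
    """Hypothetical threshold condition: flag objects whose interest value >= threshold."""
    return {obj for obj, value in interest_values.items() if value >= threshold}


def build_target_list(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Assumed merge rule: deduplicated union of both anomaly sets, sorted by ID."""
    return sorted(set(first) | set(second))


if __name__ == "__main__":
    interest = {"u1": 0.92, "u2": 0.15, "u3": 0.71}
    flagged = first_abnormal(interest, threshold=0.7)  # contains u1 and u3
    clustered = {"u3", "u4"}  # e.g. objects located at aggregation points
    print(build_target_list(flagged, clustered))  # ['u1', 'u3', 'u4']
```

A union is only one plausible reading; a real system might instead weight or intersect the two sources.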

2. The method of claim 1, wherein preprocessing the collected operation information of the plurality of objects to obtain the index information corresponding to each object comprises:

3. The method of claim 1, wherein performing interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using the time attenuation mechanism to obtain the interest value corresponding to each object comprises:

4. The method of claim 3, wherein the extracting a plurality of interest vectors corresponding to the object from the historical behavior sequence according to the dependency relationship in the historical behavior sequence comprises:

for each feature vector in the historical behavior sequence, determining a time attenuation factor corresponding to the feature vector according to the difference between a current time and the occurrence time of the historical behavior corresponding to the feature vector, wherein the current time is the occurrence time of the current behavior;

obtaining a new historical behavior sequence according to each feature vector and the time attenuation factor corresponding to the feature vector; and
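Claim 4 leaves the exact form of the time attenuation factor open. The sketch below assumes an exponential decay exp(-decay_rate * Δt), where Δt is the current time minus the occurrence time of the historical behavior, and assumes that the new historical behavior sequence is formed by scaling each feature vector by its factor. The decay form, the `decay_rate` parameter, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def time_attenuation_factor(current_time: float, behavior_time: float,
                            decay_rate: float = 0.1) -> float:
    """Assumed exponential decay over the time elapsed since the historical behavior."""
    delta = max(0.0, current_time - behavior_time)  # guard against future timestamps
    return math.exp(-decay_rate * delta)


def reweight_sequence(features: Sequence[Sequence[float]], times: Sequence[float],
                      current_time: float, decay_rate: float = 0.1) -> List[List[float]]:
    """Build a new sequence by scaling each feature vector by its attenuation factor."""
    new_sequence = []
    for vector, behavior_time in zip(features, times):
        factor = time_attenuation_factor(current_time, behavior_time, decay_rate)
        new_sequence.append([factor * x for x in vector])
    return new_sequence


if __name__ == "__main__":
    seq = reweight_sequence([[1.0, 2.0], [1.0, 2.0]], times=[10.0, 0.0], current_time=10.0)
    print(seq[0])  # [1.0, 2.0]: a behavior at the current time keeps full weight
```

Older behaviors therefore contribute less to the later interest extraction, which is the intuition behind the time attenuation mechanism.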

5. The method of claim 4, wherein adjusting, for each of the interest vectors, the weight of the interest vector according to the importance of the current behavior of the object comprises:

6. The method of claim 1, wherein determining the second abnormal information having the cheating probability from the index information corresponding to each object by performing cluster analysis on the index information corresponding to each object comprises:

standardizing the characteristic information to obtain standardized characteristic information;

performing cluster analysis on the standardized characteristic information by using a clustering algorithm to obtain a clustering result;

determining an aggregation point according to the clustering result; and

determining index information of the object corresponding to the aggregation point as the second abnormal information.
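The clustering steps of claim 6 can be sketched as follows, using only the Python standard library. The claim names no specific clustering algorithm, so this sketch swaps in simple leader clustering (a point joins the first cluster whose leader lies within `radius`) and treats any cluster with at least `min_size` members as an "aggregation point", on the hypothetical premise that many accounts with near-identical standardized indices suggest coordinated cheating. The z-score standardization, `radius`, `min_size`, and all function names are assumptions; DBSCAN or k-means would be equally plausible choices.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column; a column with zero spread maps to 0.0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]


def leader_clusters(points: Sequence[Sequence[float]], radius: float) -> List[List[int]]:
    """Leader clustering: join the first cluster whose leader is within radius."""
    clusters: List[List[int]] = []
    leaders: List[Sequence[float]] = []
    for i, point in enumerate(points):
        for k, leader in enumerate(leaders):
            if math.dist(point, leader) <= radius:
                clusters[k].append(i)
                break
        else:  # no nearby leader: this point starts a new cluster
            leaders.append(point)
            clusters.append([i])
    return clusters


def second_abnormal(index_info: Dict[str, Sequence[float]],
                    radius: float = 0.5, min_size: int = 3) -> List[str]:
    """Return IDs of objects that fall into dense clusters (assumed aggregation points)."""
    ids = list(index_info)
    points = standardize([index_info[i] for i in ids])
    flagged: List[str] = []
    for members in leader_clusters(points, radius):
        if len(members) >= min_size:
            flagged.extend(ids[m] for m in members)
    return sorted(flagged)
```

For example, if objects "a", "b", and "c" share identical index vectors while "d" and "e" differ widely, only "a", "b", and "c" are returned under the default parameters.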

7. The method of claim 1, wherein obtaining the target list according to the first abnormal information and the second abnormal information comprises:

8. The method of claim 1, wherein a weight of each piece of information in the target list decays over time, each piece of information representing index information of an object.
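Claim 8 states only that list weights decay over time. One possible reading is sketched below, assuming a half-life model in which an entry's weight halves every `half_life` time units and entries whose weight falls below a cut-off `floor` are no longer treated as active. The `ListEntry` type, the half-life form, and the cut-off are hypothetical illustrations, not the disclosed method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ListEntry:
    object_id: str
    added_at: float            # time the entry entered the target list (e.g. in days)
    initial_weight: float = 1.0


def current_weight(entry: ListEntry, now: float, half_life: float = 30.0) -> float:
    """Assumed half-life decay: the weight halves every `half_life` time units."""
    age = max(0.0, now - entry.added_at)
    return entry.initial_weight * 0.5 ** (age / half_life)


def active_entries(entries: List[ListEntry], now: float,
                   half_life: float = 30.0, floor: float = 0.1) -> List[str]:
    """Keep IDs whose decayed weight is still at or above the hypothetical cut-off."""
    return [e.object_id for e in entries if current_weight(e, now, half_life) >= floor]


if __name__ == "__main__":
    entries = [ListEntry("a", added_at=120.0), ListEntry("b", added_at=0.0)]
    print(active_entries(entries, now=150.0))  # ['a']: "b" has decayed to 0.03125
```

Decaying weights let older suspicions fade unless they are re-confirmed by fresh anomaly detections.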

9. A list generation apparatus, comprising:

a first obtaining module, configured to preprocess collected operation information of a plurality of objects to obtain index information corresponding to each object;

an extraction module, configured to perform feature extraction on the index information corresponding to each object to obtain a historical behavior sequence corresponding to each object;

a second obtaining module, configured to perform interest extraction and interest evolution on the historical behavior sequence corresponding to each object by using a time attenuation mechanism to obtain an interest value corresponding to each object;

a first determining module, configured to determine index information of an object whose interest value meets a preset threshold condition as first abnormal information having a cheating probability;

a second determining module, configured to determine second abnormal information having a cheating probability from the index information corresponding to each object by performing cluster analysis on the index information corresponding to each object; and

a third obtaining module, configured to obtain a target list according to the first abnormal information and the second abnormal information.

10. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.

11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.

12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.