CN115700787A

CN115700787A - Abnormal object identification method and device, electronic equipment and storage medium

Info

Publication number: CN115700787A
Application number: CN202110795543.7A
Authority: CN
Inventors: 孔令凯; 李晟; 李关乐; 高艳铭; 白义; 冯烨
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Chengdu ICT Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Chengdu ICT Co Ltd
Priority date: 2021-07-14
Filing date: 2021-07-14
Publication date: 2023-02-07

Abstract

The present application discloses a method and device for identifying an abnormal object, an electronic device, and a storage medium. The method includes: obtaining first information and second information of at least one target object, where the first information is operation behavior information and the second information is activity participation information; obtaining sample data based on the first information and the second information of the at least one target object; training a recognition model to be trained with the sample data to obtain a recognition model; and determining an abnormal object among the at least one target object based on the first information and the second information of the at least one target object and the recognition model. The application uses user behavior feature information to fully mine users' behavior characteristics, identifies abnormal users among website users based on data such as users' operation behavior information and activity participation information on the website, and addresses problems that arise across the business scenarios of many kinds of websites, such as cash-out evasion, CP self-consumption, and marketing activity auditing.

Description

A method, device, electronic device, and storage medium for identifying abnormal objects

Technical field

The embodiments of the present application relate to the field of computer technology, and in particular to a method and device for identifying an abnormal object, an electronic device, and a storage medium.

Background

With the continuous development of personal computer (PC) and mobile services and the continuous growth of the user base, the need to sort, clean, and filter abnormal website data, in tasks ranging from partner promotion monitoring to activity auditing to operation analysis, is becoming increasingly urgent. In some cases a website may contain a large number of abnormal users, such as channel arbitrage users, consumer product (CP, Consumer Products) self-consumption users, and marketing-activity participants. Such users negatively affect the website's business statistics, business revenue, and so on. An effective method for identifying these abnormal users is therefore needed.

Summary of the invention

To solve the above technical problems, embodiments of the present application provide a method and device for identifying an abnormal object, an electronic device, and a storage medium.

An embodiment of the present application provides a method for identifying an abnormal object, the method including:

obtaining first information and second information of at least one target object, where the first information is operation behavior information and the second information is activity participation information;

obtaining sample data based on the first information and the second information of the at least one target object;

training a recognition model to be trained with the sample data to obtain a recognition model;

determining an abnormal object among the at least one target object based on the first information and the second information of the at least one target object and the recognition model.

In an optional implementation of the present application, the sample data includes positive sample data and negative sample data;

obtaining the sample data based on the first information and the second information of the at least one target object includes:

constructing an input information table based on the first information and the second information of the at least one target object;

matching, based on the input information table, a plurality of pieces of abnormal feature information as the positive sample data in the sample data;

matching, based on the input information table, a plurality of pieces of normal feature information as the negative sample data in the sample data.

In an optional implementation of the present application, determining the abnormal object among the at least one target object based on the first information and the second information of the at least one target object and the recognition model includes:

inputting the input information table into the recognition model, and using the recognition model to determine the abnormal object among the at least one target object.

In an optional implementation of the present application, the sample data includes a plurality of feature indicators, and training the recognition model to be trained with the sample data includes:

determining the correlation between the feature indicators in the plurality of feature indicators, and determining at least one important indicator among the plurality of feature indicators based on the correlation;

training the recognition model to be trained with the sample data that corresponds to the at least one important indicator.

In an optional implementation of the present application, the recognition model to be trained includes a gradient boosting decision tree (GBDT) model, and training the recognition model to be trained with the sample data to obtain the recognition model includes:

dividing the sample data according to operation behavior features and activity participation features to obtain an operation behavior feature set and an activity participation feature set;

establishing a first GBDT model corresponding to the operation behavior features and a second GBDT model corresponding to the activity participation features;

traversing the first GBDT model with the operation behavior feature set to obtain first features output by the leaf nodes of the first GBDT model;

traversing the second GBDT model with the activity participation feature set to obtain second features output by the leaf nodes of the second GBDT model.

In an optional implementation of the present application, the recognition model to be trained further includes a logistic regression (LR) model, and training the recognition model to be trained with the sample data to obtain the recognition model further includes:

training the LR model with the first features and the second features to obtain a trained LR model, where the output of the LR model includes the identification information of each training object included in the sample data and the type identifier of that training object.
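As an illustration only, the LR stage could look like the sketch below. It assumes the first and second features are 0/1 leaf indicators concatenated into one vector per training object; the data, the identifiers `u1` to `u4`, the hyperparameters, and the function names `train_lr` and `predict` are hypothetical and are not taken from the application.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # gradient of the log loss with respect to the logit
            b -= lr * g
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w, b


def predict(model, ids, rows):
    """Return (identifier, type label) pairs, mirroring the LR output described above."""
    w, b = model
    out = []
    for uid, x in zip(ids, rows):
        p = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
        out.append((uid, 1 if p >= 0.5 else 0))
    return out


# Hypothetical leaf indicators: columns 0-1 from the first GBDT, columns 2-3 from the second.
ids = ["u1", "u2", "u3", "u4"]
X = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]
y = [0, 0, 1, 1]  # 1 marks an abnormal training object

model = train_lr(X, y)
result = predict(model, ids, X)
```

In this toy data the label depends only on which leaf of the first tree a sample reaches, so the fitted model separates the two classes exactly.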

In an optional implementation of the present application, training the recognition model to be trained with the sample data to obtain the recognition model includes:

training the recognition model to be trained with K-fold cross-validation based on the sample data to obtain K target recognition models with different parameters;

selecting, from the K target recognition models with different parameters, the target recognition model with the largest harmonic mean as the recognition model.
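A minimal sketch of this selection step follows. It assumes the "harmonic mean" refers to the F1 score (the harmonic mean of precision and recall), which the text does not state explicitly. The one-feature threshold "model", the interleaved fold layout, and all function names are hypothetical stand-ins for the real recognition model.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k):
    """Yield (train, validation) index lists; fold i validates on i, i+k, i+2k, ..."""
    for i in range(k):
        val = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, val


def train_threshold(xs, ys):
    """Toy 'model': the threshold on a single score that maximizes training F1."""
    return max(sorted(set(xs)), key=lambda t: f1_score(ys, [int(x >= t) for x in xs]))


def select_model(xs, ys, k):
    """Train one model per fold and keep the one with the highest validation F1."""
    best = None
    for train, val in k_fold_indices(len(xs), k):
        t = train_threshold([xs[j] for j in train], [ys[j] for j in train])
        score = f1_score([ys[j] for j in val], [int(xs[j] >= t) for j in val])
        if best is None or score > best[1]:
            best = (t, score)
    return best


xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
threshold, best_f1 = select_model(xs, ys, k=3)
```

Each fold yields a model with its own fitted parameter (here, a threshold), and the fold whose model scores the highest F1 on its held-out part is kept.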

In an optional implementation of the present application, after the recognition model to be trained is trained with the sample data to obtain the recognition model, the method further includes:

testing the recognition model with test sample data to determine whether the metrics of the recognition model satisfy a preset condition; and/or determining, based on third information of each object among the at least one target object, whether the recognition result of the recognition model is correct;

if the metrics of the recognition model do not satisfy the preset condition, and/or the recognition result of the recognition model is incorrect, continuing to optimize the recognition model to obtain an optimized recognition model;

determining the abnormal object among the at least one target object based on the first information and the second information of the at least one target object and the recognition model then includes:

determining the abnormal object among the at least one target object based on the first information and the second information of the at least one target object and the optimized recognition model.

An embodiment of the present application further provides a device for identifying an abnormal object, the device including:

a first obtaining unit, configured to obtain first information and second information of at least one target object, where the first information is operation behavior information and the second information is activity participation information;

a second obtaining unit, configured to obtain sample data based on the first information and the second information of the at least one target object;

a training unit, configured to train a recognition model to be trained with the sample data to obtain a recognition model;

a determining unit, configured to determine an abnormal object among the at least one target object based on the first information and the second information of the at least one target object and the recognition model.

In an optional implementation of the present application, the sample data includes positive sample data and negative sample data; the second obtaining unit is specifically configured to: construct an input information table based on the first information and the second information of the at least one target object; match, based on the input information table, a plurality of pieces of abnormal feature information as the positive sample data in the sample data; and match, based on the input information table, a plurality of pieces of normal feature information as the negative sample data in the sample data.

In an optional implementation of the present application, the determining unit is specifically configured to: input the input information table into the recognition model, and use the recognition model to determine the abnormal object among the at least one target object.

In an optional implementation of the present application, the sample data includes a plurality of feature indicators, and the training unit is specifically configured to: determine the correlation between the feature indicators in the plurality of feature indicators, and determine at least one important indicator among the plurality of feature indicators based on the correlation; and train the recognition model to be trained with the sample data that corresponds to the at least one important indicator.

In an optional implementation of the present application, the recognition model to be trained includes a gradient boosting decision tree (GBDT) model, and the training unit is specifically configured to: divide the sample data according to operation behavior features and activity participation features to obtain an operation behavior feature set and an activity participation feature set; establish a first GBDT model corresponding to the operation behavior features and a second GBDT model corresponding to the activity participation features; traverse the first GBDT model with the operation behavior feature set to obtain first features output by the leaf nodes of the first GBDT model; and traverse the second GBDT model with the activity participation feature set to obtain second features output by the leaf nodes of the second GBDT model.

In an optional implementation of the present application, the recognition model to be trained further includes a logistic regression (LR) model, and the training unit is further specifically configured to: train the LR model with the first features and the second features to obtain a trained LR model, where the output of the LR model includes the identification information of each training object included in the sample data and the type identifier of that training object.

In an optional implementation of the present application, the training unit is specifically configured to: train the recognition model to be trained with K-fold cross-validation based on the sample data to obtain K target recognition models with different parameters; and select, from the K target recognition models with different parameters, the target recognition model with the largest harmonic mean as the recognition model.

In an optional implementation of the present application, after the training unit trains the recognition model to be trained with the sample data and obtains the recognition model, the device further includes:

An embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores computer-executable instructions, and the processor, when running the computer-executable instructions in the memory, implements the method for identifying an abnormal object described in the above embodiments.

An embodiment of the present application further provides a computer storage medium storing executable instructions that, when executed by a processor, implement the method for identifying an abnormal object described in the above embodiments.

In the technical solution of the embodiments of the present application, first information and second information of at least one target object are obtained, where the first information is operation behavior information and the second information is activity participation information; sample data is obtained based on the first information and the second information of the at least one target object; a recognition model to be trained is trained with the sample data to obtain a recognition model; and an abnormal object among the at least one target object is determined based on the first information and the second information of the at least one target object and the recognition model. The technical solution can use user behavior feature information to fully mine users' behavior characteristics, can identify abnormal users among website users based on data such as users' operation behavior information and activity participation information on the website, and addresses problems such as cash-out evasion, CP self-consumption, and marketing activity auditing that arise in the many business scenarios of all kinds of websites.

Brief description of the drawings

FIG. 1 is a schematic flowchart of the method for identifying an abnormal object provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the two GBDT trees provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the process of identifying an abnormal object provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of the device for identifying an abnormal object provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.

Detailed description

To give a more thorough understanding of the features and technical content of the embodiments of the present application, their implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the embodiments of the present application.

Generally speaking, abnormal users fall into three main categories:

Channel arbitrage users. In channel cooperation and promotion, some channels may, in order to meet settlement targets, develop large numbers of zombie users and invalid users. This increases the access load on portals and apps and consumes substantial settlement expenditure, so an effective method is needed to support the identification of channel arbitrage users.

CP self-consumption users. During content promotion, some CPs may consume their own content in bulk in order to increase their revenue share or to meet settlement assessment targets. Using an anomaly model to detect CP self-consumption helps ensure that content is promoted in a healthy way.

Marketing-activity participants. An abnormal-user identification method is needed to audit the validity of prize-winning users of live marketing activities. This protects marketing resources and safeguards the rights and interests of active users, and thereby improves how the activity is perceived and how well it works.

The technical solution of the embodiments of the present application can merge the basic attributes and behavior information of the client side, the CP side, and the users participating in activities. After this information is processed and statistics are derived from it, the result is used as the input of the recognition model. The recognition model is used to identify the three kinds of abnormal users listed above and is tuned, and a set of abnormal objects is output.

FIG. 1 is a schematic flowchart of the method for identifying an abnormal object provided by an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:

Step 101: Obtain first information and second information of at least one target object, where the first information is operation behavior information and the second information is activity participation information.

In the embodiments of the present application, the target object is an Internet user. User behavior information is extracted in advance from multiple channels and mainly includes the user's operation behavior information and activity participation information. The operation behavior information may be the user's registration information, comment information, like information, and so on; the activity participation information may be information about the user's participation in marketing activities, such as purchase information and forwarding information. The operation behavior information and activity participation information may also be other information; the embodiments of the present application do not specifically limit the content of either kind.

Specifically, when extracting user behavior information, the registration number in the user information table can be used as the unique identifier to extract each user's operation behavior information and activity participation information for T consecutive days, and the extracted data serves as the raw data set of user behavior information.
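The extraction can be pictured with the following minimal sketch. The record layout, the sample values, and the function name `extract_window` are assumptions made for illustration; the application does not specify a storage format.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw log records: (registration number, day, kind, event).
RAW_LOGS = [
    ("13800000001", date(2021, 7, 1), "operation", "register"),
    ("13800000001", date(2021, 7, 2), "operation", "comment"),
    ("13800000001", date(2021, 7, 2), "activity", "purchase"),
    ("13800000002", date(2021, 7, 3), "activity", "forward"),
    ("13800000002", date(2021, 6, 1), "operation", "like"),  # outside the window
]


def extract_window(logs, start, t_days):
    """Group the records of T consecutive days by registration number."""
    end = start + timedelta(days=t_days)  # exclusive upper bound
    dataset = defaultdict(lambda: {"operation": [], "activity": []})
    for msisdn, day, kind, event in logs:
        if start <= day < end:
            dataset[msisdn][kind].append((day, event))
    return dict(dataset)


raw_dataset = extract_window(RAW_LOGS, date(2021, 7, 1), t_days=7)
```

Keying every record by the registration number keeps the two kinds of information for one user together, which is what the later table-building step relies on.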

Step 102: Obtain sample data based on the first information and the second information of the at least one target object.

Step 102 can be implemented as follows:

In the embodiments of the present application, after the raw data set of user behavior information has been obtained by extracting user behavior information from multiple channels, the raw data set can be aggregated and derived by statistical means to form the information table required as input to the recognition model to be trained. At the same time, a number of pieces of user feature information already confirmed as abnormal are matched and labeled as positive samples for training the recognition model to be trained, and the remaining user feature information is used as negative samples for training the recognition model to be trained.
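A minimal sketch of this step, under stated assumptions: the derived statistics are simple per-user event counts, and the confirmed-abnormal list is given as a set of registration numbers. The column names, the sample data, and `build_input_table` are hypothetical.

```python
from collections import Counter

# Hypothetical raw data set keyed by registration number.
raw_dataset = {
    "u1": {"operation": ["register", "comment", "comment"], "activity": ["purchase"]},
    "u2": {"operation": ["register"], "activity": ["forward", "forward", "forward"]},
    "u3": {"operation": ["like"], "activity": []},
}
confirmed_abnormal = {"u2"}  # assumed list of users already confirmed as abnormal


def build_input_table(dataset, abnormal_ids):
    """Aggregate raw events into one row of counts per user and attach the label."""
    rows = []
    for msisdn, info in sorted(dataset.items()):
        ops = Counter(info["operation"])
        acts = Counter(info["activity"])
        rows.append({
            "msisdn": msisdn,
            "flag": 1 if msisdn in abnormal_ids else 0,  # 1 = positive (abnormal) sample
            "n_comment": ops["comment"],
            "n_like": ops["like"],
            "n_purchase": acts["purchase"],
            "n_forward": acts["forward"],
        })
    return rows


table = build_input_table(raw_dataset, confirmed_abnormal)
positives = [row for row in table if row["flag"] == 1]
negatives = [row for row in table if row["flag"] == 0]
```

Treating every unmatched user as a negative sample follows the text; in practice some unconfirmed abnormal users would end up among the negatives, which is a known limitation of this labeling scheme.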

Step 103: Train the recognition model to be trained with the sample data to obtain the recognition model.

In an optional implementation of the present application, the sample data includes a plurality of feature indicators, and step 103 can be implemented as follows:

Specifically, after the positive and negative samples have been extracted, the validity of the sample data needs to be checked and processed. In an optional implementation of the present application, the Pearson coefficient can be used to compute the correlation between indicators, and important indicators are extracted according to that correlation to reduce data redundancy.

The Pearson correlation coefficient reflects the linear correlation between two variables and takes values in [-1, 1], where 1 means the two variables are perfectly positively correlated, 0 means the two variables have no linear relationship at all, and -1 means the two variables are perfectly negatively correlated, that is, one variable rises while the other falls. The closer the correlation coefficient of two variables is to 0, the weaker their correlation. The correlation is computed as follows:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^{2}}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^{2}}}$$

where X and Y denote two paired continuous variables, and $\bar{X}$ and $\bar{Y}$ are their means over the n paired observations.

The correlation is judged as follows:

In general, |r| > 0.95 indicates that the two variables are significantly correlated; |r| > 0.8 indicates high correlation; 0.5 ≤ |r| < 0.8 indicates moderate correlation; 0.3 ≤ |r| < 0.5 indicates low correlation; and |r| < 0.3 indicates an extremely weak relationship, in which case the two variables can be regarded as uncorrelated.

By computing the Pearson correlation coefficients between indicator variables, indicators that are highly correlated with other variables and relatively unimportant can be removed, which reduces the indicator redundancy of the model.
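The pruning described above can be sketched as follows. The text does not say how "relatively unimportant" is measured, so the `importance` scores, the greedy keep-by-importance rule, and the 0.8 cut-off (borrowed from the "high correlation" band) are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / (sx * sy)


def drop_redundant(columns, importance, threshold=0.8):
    """Keep indicators in order of importance, skipping any that is highly
    correlated with an indicator that has already been kept."""
    kept = []
    for name in sorted(columns, key=lambda c: importance[c], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "n_login": [1, 2, 3, 4, 5],
    "n_click": [2, 4, 6, 8, 10],   # perfectly correlated with n_login
    "n_purchase": [5, 1, 4, 2, 3],
}
importance = {"n_login": 0.9, "n_click": 0.4, "n_purchase": 0.6}  # assumed scores
selected = drop_redundant(columns, importance)
```

Here `n_click` is dropped because it duplicates `n_login`, while `n_purchase` (|r| = 0.3 against `n_login`) is kept.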

In an optional implementation of the present application, the recognition model to be trained includes a gradient boosting decision tree (GBDT) model, and step 103 can be implemented as follows:

In the embodiments of the present application, the input of the recognition model involves both operation behavior information and activity participation information, so the feature dimension for training is very high. A logistic regression (LR, Logistic Regression) algorithm can be used, but because the learning capacity of an LR model is limited, a large amount of feature engineering is required to extract effective features and feature combinations and so improve the model's nonlinear learning ability. The gradient boosting decision tree (GBDT, Gradient Boosting Decision Tree) is an iterative decision tree algorithm that belongs to the boosting family of ensemble learning and has advantages such as high classification accuracy and good generalization. It is a nonlinear model based on the boosting idea in ensemble learning: each iteration builds a new decision tree in the gradient direction that reduces the residual, so the number of decision trees generated equals the number of iterations. GBDT can therefore discover many discriminative features and feature combinations, which greatly reduces the time and labor cost of feature engineering. For this reason, the embodiments of the present application adopt a fusion of GBDT and LR as the recognition model.

The basic idea of GBDT is as follows: based on the forward stagewise algorithm, each iteration aims to reduce the residual left by the previous iteration. To eliminate the residual, a new model can be built in the gradient direction along which the residual decreases. In gradient boosting, then, the goal of each new model is to reduce the residual of the preceding models along the gradient direction, which differs considerably from traditional boosting algorithms that reweight correctly and incorrectly classified samples. As a result, GBDT can reach high accuracy with relatively little parameter tuning time. In addition, GBDT uses a robust loss function and is highly robust to abnormal data.
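The idea of fitting each new tree to the negative gradient can be shown with a deliberately small sketch. It assumes log loss on labels in {0, 1}, depth-1 trees (stumps) on a single feature, and leaf values set to the mean residual rather than found by a line search; none of these choices is stated in the text, and the names `fit_stump` and `boost` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Pick the threshold whose two-sided means best fit the residuals
    (least squares). Assumes xs holds at least two distinct values."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=20, lr=0.5):
    """Each round fits a stump to the negative gradient of the log loss, y - p."""
    stumps = []

    def score(x):
        return sum(lr * s(x) for s in stumps)

    for _ in range(rounds):
        residuals = [y - sigmoid(score(x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return score


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
F = boost(xs, ys)
preds = [1 if sigmoid(F(x)) > 0.5 else 0 for x in xs]
```

One stump is added per round, so the number of trees equals the number of iterations, as the text notes.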

The input information of the model in the embodiments of the present application mainly contains user operation behavior information and user activity participation information, but the features corresponding to the user activity participation information are very sparse. Trees therefore need to be built separately for the feature indicators of the two kinds of information, to avoid skewed feature weights.

In the embodiments of the present application, the recognition model to be trained is trained as follows:

Preprocess the sample data x_i = (msisdn, flag, A1, A2, …, An, B1, B2, …, Bm), where msisdn is the registration number, flag is the abnormal-number identifier, that is, the value to be predicted, A1 to An are the user's operation behavior features, and B1 to Bm are the user's activity participation features. Divide the sample data x_i according to operation behavior features and activity participation features; the output is as follows:

Operation behavior feature set: x_iA, that is, (msisdn, flag, A1, A2, …, An)

Activity participation feature set: x_iB, that is, (msisdn, flag, B1, B2, …, Bm)

Dividing the sample data in this way yields two training sets, T1 and T2. As shown in FIG. 2, the embodiments of the present application can train separately on the operation behavior features x_iA in T1 and the activity participation features x_iB in T2, and build the corresponding GBDT trees.
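The division into the two training sets can be sketched as below, assuming each sample is a mapping whose feature names start with `A` or `B` as in the tuple notation above. The sample values and the helper `split_by_prefix` are hypothetical.

```python
# Hypothetical preprocessed samples: msisdn, flag, A-features, B-features.
samples = [
    {"msisdn": "u1", "flag": 0, "A1": 3, "A2": 1, "B1": 0, "B2": 1},
    {"msisdn": "u2", "flag": 1, "A1": 0, "A2": 9, "B1": 7, "B2": 0},
]


def split_by_prefix(rows, prefix):
    """Keep the identifier, the label, and the features whose name starts with prefix."""
    keep = ("msisdn", "flag")
    return [
        {k: v for k, v in row.items() if k in keep or k.startswith(prefix)}
        for row in rows
    ]


T1 = split_by_prefix(samples, "A")  # operation behavior feature set x_iA
T2 = split_by_prefix(samples, "B")  # activity participation feature set x_iB
```

Both sets keep `msisdn` and `flag`, so each tree group is trained against the same labels and the outputs can be joined back per user.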

上述两种特征的GBDT树构建完成后，通过将样本数据中的操作行为特征集x_iA和活动参与特征集x_iB分别遍历对应的GBDT树，输出的每个叶子节点即为一个LR的特征。分别用0，1代表样本是否落入该叶子节点，构建出LR模型的输入x_input：(msisdn1，flag，C1，C2，…，Ck)，其中k为GBDT叶子节点的数目。After the construction of the GBDT tree of the above two features is completed, the operation behavior feature set x _iA and the activity participation feature set x _iB in the sample data are respectively traversed through the corresponding GBDT tree, and each leaf node output is a feature of an LR. Use 0 and 1 to represent whether the sample falls into the leaf node, and construct the input x _input of the LR model: (msisdn1, flag, C1, C2, ..., Ck), where k is the number of GBDT leaf nodes.
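The 0/1 leaf encoding can be sketched as follows. This is an illustration under assumptions: the helper `encode_leaves`, the leaf indices, and the leaf counts per tree are hypothetical, and in practice the index of the leaf reached in each tree would come from the trained GBDT trees.

```python
def encode_leaves(leaf_indices, leaves_per_tree):
    """One-hot encode the leaf reached in each tree and concatenate the results."""
    encoded = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        one_hot = [0] * n_leaves
        one_hot[leaf] = 1          # 1 marks the leaf the sample falls into
        encoded.extend(one_hot)
    return encoded


# Hypothetical case: two trees over x_iA (3 and 2 leaves), one tree over x_iB (2 leaves).
c_operation = encode_leaves([2, 0], [3, 2])
c_participation = encode_leaves([1], [2])
x_input = ("msisdn1", 1, *c_operation, *c_participation)  # k = 7 leaf features
```

Concatenating the encodings from both groups of trees gives one LR input row per registration number.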

Taking the construction of the GBDT tree for user operation behavior features as an example, the specific steps are as follows:

a) Input: the training sample set {(x_i, y_i)}, i = 1, …, N; the loss function L(y, F(x)); and the number of iterations M.

Here x_i = (A_i1, A_i2, …, A_in) is the user's operation behavior feature set; y_i ∈ {0, 1}, where 0 denotes a negative sample and 1 a positive sample; F(x) is the predicted value of model F; and N is the number of samples.

The loss function of the GBDT tree for user operation behavior features takes the following form:

b) Initialize the weak learner.

c) For the m-th iteration:

1) Compute the negative gradient of the loss function as the estimate of r_im:

2) Update the training set to {(x_i, r_im)}, i = 1, …, N, which is used to train the model h_m(x) to fit r_im.

3) Estimate the leaf-node region values by line search, that is, optimize the following function:

Here R_jm is leaf-node region j, J_m is the number of leaves, and b_jm is the output value of the leaf node.

4) Update the model:

F_m(x) = F_{m-1}(x) + γ_m h_m(x)    (7)

5) Output the strong learner F_M(x) obtained after M iterations.
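Steps a) to 5) can be sketched in plain Python. Several formulas are not reproduced in this text, so the sketch rests on assumptions that are not confirmed by the application: a binomial log-loss, depth-1 trees (stumps) as h_m, a single Newton step in place of the line search for the leaf values b_jm, and a fixed shrinkage factor `learning_rate` in place of γ_m. All function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y, F):
    """Mean binomial log-loss (an assumed choice of loss function)."""
    return -sum(yi * math.log(sigmoid(f)) + (1 - yi) * math.log(1 - sigmoid(f))
                for yi, f in zip(y, F)) / len(y)


def fit_stump(X, r):
    """Find the single split (feature, threshold) that best fits residuals r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            sse = 0.0
            for idx in (left, right):
                mean = sum(r[i] for i in idx) / len(idx)
                sse += sum((r[i] - mean) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    return best


def train_gbdt(X, y, M=5, learning_rate=0.5):
    """Assumes y contains both classes, so the initial log-odds is finite."""
    p0 = sum(y) / len(y)
    F0 = math.log(p0 / (1 - p0))                 # b) initial constant learner
    F = [F0] * len(y)
    trees, losses = [], [log_loss(y, F)]
    for _ in range(M):                           # c) m-th iteration
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]    # 1) negative gradient of the log-loss
        stump = fit_stump(X, r)                  # 2) fit h_m to {(x_i, r_im)}
        if stump is None:
            break
        _, f, t, left, right = stump
        values = []
        for idx in (left, right):                # 3) leaf value b_jm via one Newton step
            num = sum(r[i] for i in idx)
            den = sum(p[i] * (1 - p[i]) for i in idx)
            values.append(num / den)
        for idx, b in zip((left, right), values):
            for i in idx:
                F[i] += learning_rate * b        # 4) F_m = F_{m-1} + step * h_m
        trees.append((f, t, values[0], values[1]))
        losses.append(log_loss(y, F))
    return {"F0": F0, "lr": learning_rate, "trees": trees, "losses": losses}  # 5)


def predict(model, x):
    score = model["F0"] + sum(model["lr"] * (bl if x[f] <= t else br)
                              for f, t, bl, br in model["trees"])
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy operation-behavior features; the values are made up for illustration.
X = [[1, 0], [2, 1], [8, 0], [9, 1]]
y = [0, 0, 1, 1]
model = train_gbdt(X, y)
```

On this toy data the first feature separates the classes, so every round picks the same split and the training loss falls from round to round.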

In an optional implementation of the present application, the recognition model to be trained further includes a logistic regression (LR) model, and step 103 above may be implemented as follows:

Specifically, the features obtained from the two GBDT trees are used as the input of the LR model, and the LR model is built to predict whether the registration number in each sample record belongs to an abnormal user. Example output: msisdn1:flag, where flag = 0 denotes a non-abnormal number and flag = 1 denotes an abnormal number.

In this embodiment, the LR model is built as follows:

a) Input: the training sample set {(x_i, y_i)}, i = 1, …, N; the loss function; the step size (learning rate) α; the maximum number of iterations max_iter; and the error tolerance tol.

Here x_i = (C_i1, C_i2, …, C_in) is the fused feature set output by the GBDT trees; y_i ∈ {0, 1}, where 0 denotes a negative sample and 1 a positive sample; and N is the number of samples.

The loss function takes the form of a log-likelihood loss:

where

b) Initialize the parameters θ: (θ₀, θ₁, θ₂, …, θ_k), which may be set to an all-ones vector.

c) For the j-th iteration, check whether the error is smaller than tol. If it is, stop training; otherwise perform the following operation:

Update

d) Output the final parameters θ of the LR model.
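Steps a) to d) can be sketched with batch gradient descent in plain Python. The loss and update formulas are not reproduced in this text, so the sketch assumes the usual negative mean log-likelihood with a sigmoid hypothesis, and it interprets the "error" compared against tol as the change in loss between consecutive iterations. Both are assumptions, as are the function names and the toy inputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """Sigmoid of theta_0 + theta_1 * C_1 + ... + theta_k * C_k."""
    return sigmoid(theta[0] + sum(t * c for t, c in zip(theta[1:], x)))


def loss(theta, X, y):
    """Negative mean log-likelihood (assumed form of the log-likelihood loss)."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / len(y)


def train_lr(X, y, alpha=0.5, max_iter=1000, tol=1e-6):
    k = len(X[0])
    theta = [1.0] * (k + 1)                      # b) all-ones initialization
    prev = loss(theta, X, y)
    for _ in range(max_iter):                    # c) j-th iteration
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = hypothesis(theta, xi) - yi
            grad[0] += err
            for j, c in enumerate(xi, start=1):
                grad[j] += err * c
        theta = [t - alpha * g / len(y) for t, g in zip(theta, grad)]
        cur = loss(theta, X, y)
        if abs(prev - cur) < tol:                # assumed stopping rule: loss change < tol
            break
        prev = cur
    return theta                                 # d) final parameters


def predict_flag(theta, x):
    return 1 if hypothesis(theta, x) >= 0.5 else 0


# Toy one-hot leaf features (C1, C2); the values are made up for illustration.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [0, 0, 1, 1]
theta = train_lr(X, y)
```

The returned flag follows the convention in the text: 0 for a non-abnormal number and 1 for an abnormal number.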

In an optional implementation of the present application, step 103 above may be implemented as follows:

Specifically, the number of GBDT trees, the attribute dimension, and the tree depth must be tuned manually. The model can be trained with K-fold cross-validation (for example, ten-fold cross-validation), and the GBDT parameters that give the best classification result are selected as the optimization result; the larger the F1 score, the better the model's recognition performance. Here F1 = 2 × Precision × Recall / (Precision + Recall), where Precision is the precision and Recall is the recall.
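The parameter selection described above can be sketched as follows. This is one possible reading, not the application's stated procedure: each candidate parameter set is scored by its mean F1 over the K validation folds, and the best-scoring candidate is kept. The names `select_params`, `fit`, and `predict`, the fold assignment rule, and the toy threshold model are all hypothetical.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx); fold j holds every k-th sample starting at j."""
    for j in range(k):
        val = [i for i in range(n_samples) if i % k == j]
        train = [i for i in range(n_samples) if i % k != j]
        yield train, val


def select_params(candidates, X, y, k, fit, predict):
    """Return the candidate with the highest mean validation F1 and that score."""
    best_params, best_f1 = None, -1.0
    for params in candidates:
        scores = []
        for train, val in k_fold_indices(len(y), k):
            model = fit(params, [X[i] for i in train], [y[i] for i in train])
            preds = [predict(model, X[i]) for i in val]
            scores.append(f1_score([y[i] for i in val], preds))
        mean_f1 = sum(scores) / len(scores)
        if mean_f1 > best_f1:
            best_params, best_f1 = params, mean_f1
    return best_params, best_f1


# Toy stand-in for a model: the "parameter" is a threshold on a single feature.
X = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
best = select_params([2, 4], X, y, k=3,
                     fit=lambda params, X_train, y_train: params,
                     predict=lambda model, x: 1 if x >= model else 0)
```

In a real run, `fit` would train the GBDT and LR models with the candidate tree count and depth, and `predict` would return the predicted flag.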

In an optional implementation of the present application, after step 103 above is performed, the resulting recognition model may be further optimized as follows:

In this embodiment, test samples are used to test the optimized algorithm rules. On the one hand, the identified abnormal users can be matched against known positive and negative samples to compute indicators such as the model's misidentification rate, precision, and recall. On the other hand, the user's own attributes and behavior characteristics are used to verify whether the model's predictions are reasonable. The two sets of indicators are combined to confirm the optimal algorithm rules for the recognition model.

Step 104: Determine an abnormal object among the at least one target object based on the first information and second information of the at least one target object and the recognition model.

In an optional implementation of the present application, step 104 above may be implemented as follows:

Specifically, in this embodiment, once the trained recognition model has been obtained, the input information table built from the operation behavior information and activity participation information of multiple users is fed into the trained model, which then identifies whether each of those users is an abnormal user.

In an optional implementation of the present application, when the recognition model is an optimized recognition model, an abnormal object among the at least one target object can be determined based on the first information and second information of the at least one target object and the optimized recognition model.

Specifically, in this embodiment, after the recognition model is obtained, it can be further optimized, and the optimized model can be used to predict whether each of multiple users is an abnormal user, which improves prediction accuracy.

The technical solution of this embodiment can fuse the basic attributes and behavior information of the client side, the CP side, and the users participating in activities. This information is processed and statistically derived, then used as the model input; the recognition model identifies abnormal users, the algorithm is tuned, and a set of abnormal objects is output.

Fig. 3 is a schematic diagram of the training process of the recognition model to be trained according to an embodiment of the present application. As shown in Fig. 3, the training process includes the following steps:

Step 301: Data collection.

Determine the data collection channels and extract the behavior information of multiple users from those channels.

Step 302: Obtain basic information, operation behavior information, and activity participation information.

Extract each user's basic information, operation behavior information, and activity participation information from the collected behavior information.

Step 303: Build the feature indicator information table required by the model (that is, the input information table).

After the users' basic information, operation behavior information, and activity participation information have been extracted, the raw user behavior data set is aggregated and derived by statistical means to form the information table required as input to the recognition model to be trained.

Step 304: GBDT feature extraction.

After the feature indicator information table required by the model has been built, the sample data for the model input is constructed from that table and then divided, giving one sample data set for training the GBDT tree for operation behavior information and another for training the GBDT tree for activity participation information.

The two sample data sets are used to train the two GBDT trees respectively, giving the leaf nodes output by each tree.

Step 305: LR model training.

The features output by the leaf nodes of the two GBDT trees are used as the input of the LR model, and the LR model is trained.

Step 306: Model output.

The LR model outputs a prediction of whether the registration number in each sample record belongs to an abnormal user. The output has the form msisdn1:flag, where flag = 0 denotes a non-abnormal number and flag = 1 denotes an abnormal number.

Step 307: Judge whether the model is reasonable.

Based on the output of the LR model, determine whether the model's predictions of abnormal users are reasonable. Whether a prediction is correct is judged mainly through step 308 and step 309.

Step 308: Verify the accuracy of abnormal user identification by working back from the users' basic attributes and behavior information.

The model trainer determines from a user's basic attributes, behavior information, and the like whether the user is an abnormal user, and compares that determination with the model's prediction to decide whether the prediction is correct.

Step 309: Algorithm optimization.

If the accuracy of the model's predictions is judged to be low, steps 304 to 307 are executed repeatedly to optimize the model.

Step 310: Determine the algorithm rules.

Once the algorithm model has been optimized so that its accuracy satisfies the condition, the finally optimized algorithm model is taken as the final recognition model, which can subsequently be used to identify abnormal users.
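The loop over steps 304 to 310 can be sketched as below. This is a simplified illustration under assumptions: the function `optimize_until_ok`, the callable `train_and_score` (standing in for one pass through steps 304 to 307), the candidate settings, and the threshold value are hypothetical, and the application does not specify how candidate settings are generated.

```python
def optimize_until_ok(candidate_params, train_and_score, threshold):
    """Repeat steps 304-307 per candidate until the score meets the threshold."""
    history = []
    for params in candidate_params:
        score = train_and_score(params)          # one pass of steps 304-307
        history.append((params, score))
        if score >= threshold:                   # condition met: step 310
            return params, score, history
    # No candidate met the threshold: fall back to the best one seen.
    # Assumes candidate_params is non-empty.
    best_params, best_score = max(history, key=lambda item: item[1])
    return best_params, best_score, history


# Hypothetical scores for three candidate settings, e.g. different tree depths.
scores = {1: 0.60, 2: 0.85, 3: 0.90}
params, score, history = optimize_until_ok([1, 2, 3], scores.get, threshold=0.8)
```

The loop stops at the first setting that satisfies the condition, which mirrors the text: optimization continues only while the accuracy is judged too low.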

An embodiment of the present application further provides an apparatus for identifying an abnormal object. Fig. 4 is a schematic structural diagram of the apparatus 400 for identifying an abnormal object according to an embodiment of the present application. As shown in Fig. 4, the apparatus 400 includes:

a first obtaining unit 401, configured to obtain first information and second information of at least one target object, where the first information is operation behavior information and the second information is activity participation information;

a second obtaining unit 402, configured to obtain sample data based on the first information and second information of the at least one target object;

a training unit 403, configured to train a recognition model to be trained using the sample data to obtain a recognition model; and

a determining unit 404, configured to determine an abnormal object among the at least one target object based on the first information and second information of the at least one target object and the recognition model.

In an optional implementation of the present application, the sample data includes positive sample data and negative sample data, and the second obtaining unit 402 is specifically configured to: construct an input information table based on the first information and second information of the at least one target object; match, based on the input information table, multiple pieces of abnormal feature information as the positive sample data; and match, based on the input information table, multiple pieces of normal feature information as the negative sample data.

In an optional implementation of the present application, the determining unit 404 is specifically configured to: input the input information table into the recognition model, and use the recognition model to determine the abnormal object among the at least one target object.

In an optional implementation of the present application, the sample data includes multiple feature indicators, and the training unit 403 is specifically configured to: determine the correlation between the feature indicators; determine at least one important indicator among the feature indicators based on the correlation; and train the recognition model to be trained using the sample data that corresponds to the at least one important indicator.

In an optional implementation of the present application, the recognition model to be trained includes a gradient boosting decision tree (GBDT) model, and the training unit 403 is specifically configured to: divide the sample data by operation behavior features and activity participation features to obtain an operation behavior feature set and an activity participation feature set; build a first GBDT model corresponding to the operation behavior features and a second GBDT model corresponding to the activity participation features; traverse the first GBDT model with the operation behavior feature set to obtain first features output by the leaf nodes of the first GBDT model; and traverse the second GBDT model with the activity participation feature set to obtain second features output by the leaf nodes of the second GBDT model.

In an optional implementation of the present application, the recognition model to be trained further includes a logistic regression (LR) model, and the training unit 403 is further specifically configured to: train the LR model using the first features and the second features to obtain a trained LR model, where the output of the LR model includes the identification information of each training object included in the sample data and the type label of that training object.

In an optional implementation of the present application, the training unit 403 is specifically configured to: train the recognition model to be trained on the sample data using K-fold cross-validation to obtain K target recognition models with different parameters; and select, from the K target recognition models, the one with the largest harmonic mean as the recognition model.

In an optional implementation of the present application, for use after the training unit 403 has trained the recognition model to be trained using the sample data and obtained the recognition model, the apparatus further includes:

an optimization unit 405, configured to: test the recognition model using test sample data and determine whether the indicators of the recognition model satisfy a preset condition; and/or determine, based on third information of each object among the at least one target object, whether the recognition result of the recognition model is correct; and, if the indicators of the recognition model do not satisfy the preset condition and/or the recognition result of the recognition model is incorrect, continue to optimize the recognition model to obtain an optimized recognition model.

The determining unit 404 is further specifically configured to: determine the abnormal object among the at least one target object based on the first information and second information of the at least one target object and the optimized recognition model.

Those skilled in the art should understand that the functions implemented by each unit of the apparatus 400 shown in Fig. 4 can be understood with reference to the foregoing description of the method for identifying an abnormal object. The functions of each unit of the apparatus 400 shown in Fig. 4 may be implemented by a program running on a processor, or by specific logic circuits.

An embodiment of the present application further provides an electronic device. Fig. 5 is a schematic diagram of the hardware structure of the electronic device according to an embodiment of the present application. As shown in Fig. 5, the electronic device includes a communication component 503 for data transmission, at least one processor 501, and a memory 502 for storing a computer program that can run on the processor 501. The components of the electronic device are coupled together by a bus system 504. It can be understood that the bus system 504 is used to implement connection and communication between these components. In addition to a data bus, the bus system 504 includes a power bus, a control bus, and a status signal bus. For clarity, however, all of these buses are labeled as the bus system 504 in Fig. 5.

When executing the computer program, the processor 501 performs at least the steps of the method shown in Fig. 1.

It can be understood that the memory 502 may be a volatile memory or a non-volatile memory, and may also include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory 502 described in the embodiments of the present application is intended to include, without being limited to, these and any other suitable types of memory.

The method disclosed in the foregoing embodiments of the present application may be applied to, or implemented by, the processor 501. The processor 501 may be an integrated circuit chip with signal processing capability. During implementation, each step of the method may be completed by an integrated logic circuit of hardware in the processor 501 or by instructions in the form of software. The processor 501 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 501 may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, and the storage medium is located in the memory 502; the processor 501 reads the information in the memory 502 and completes the steps of the foregoing method in combination with its hardware.

In an exemplary embodiment, the electronic device may be implemented by one or more application specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components, for performing the aforementioned method.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, performs at least the steps of the method shown in Fig. 1. The computer-readable storage medium may specifically be a memory, and the memory may be the memory 502 shown in Fig. 5.

The technical solutions described in the embodiments of the present application may be combined arbitrarily, provided that they do not conflict.

In the several embodiments provided in the present application, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may all be integrated into one second processing unit, each unit may serve separately as one unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The above is only a specific implementation of the present application, and the protection scope of the present application is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application.

Claims

1. A method for identifying an abnormal object, characterized in that the method comprises:

obtaining first information and second information of at least one target object, wherein the first information is operation behavior information and the second information is activity participation information;

obtaining sample data based on the first information and second information of the at least one target object;

training a recognition model to be trained using the sample data to obtain a recognition model; and

determining an abnormal object among the at least one target object based on the first information and second information of the at least one target object and the recognition model.

2. The method according to claim 1, wherein the sample data includes positive sample data and negative sample data;

and the obtaining sample data based on the first information and second information of the at least one target object comprises:

constructing an input information table based on the first information and second information of the at least one target object;

matching, based on the input information table, multiple pieces of abnormal feature information as the positive sample data in the sample data; and

matching, based on the input information table, multiple pieces of normal feature information as the negative sample data in the sample data.

3. The method according to claim 2, wherein the determining an abnormal object among the at least one target object based on the first information and second information of the at least one target object and the recognition model comprises:

inputting the input information table into the recognition model, and using the recognition model to determine the abnormal object among the at least one target object.

4. The method according to any one of claims 1 to 3, wherein the sample data includes a plurality of feature indicators, and using the sample data to train the recognition model to be trained includes:

Determining the correlation between each of the feature indexes in the plurality of feature indexes, and determining at least one important index in the plurality of feature indexes based on the correlation;

The identification model to be trained is trained by using the sample data corresponding to the at least one important indicator in the sample data.

5. The method according to any one of claims 1 to 3, wherein the recognition model to be trained comprises a gradient boosting decision tree (GBDT) model, and the using of the sample data to train the recognition model to be trained to obtain the recognition model comprises:

dividing the sample data according to operation behavior features and activity participation features to obtain an operation behavior feature set and an activity participation feature set;

establishing a first GBDT model corresponding to the operation behavior features and a second GBDT model corresponding to the activity participation features, respectively;

traversing the first GBDT model by using the operation behavior feature set to obtain first features output by the leaf nodes of the first GBDT model; and

traversing the second GBDT model by using the activity participation feature set to obtain second features output by the leaf nodes of the second GBDT model.

6. The method according to claim 5, wherein the recognition model to be trained further comprises a logistic regression (LR) model, and the using of the sample data to train the recognition model to be trained to obtain the recognition model further comprises:

training the LR model by using the first features and the second features to obtain a trained LR model, wherein the output of the LR model includes the identification information of each training object included in the sample data and the type identifier of the training object.
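The two-GBDT-plus-LR arrangement of claims 5 and 6 can be sketched with scikit-learn as below. This is a minimal sketch under assumptions: the data are synthetic, the hyperparameters are arbitrary, and the leaf-node outputs are taken to be one-hot encoded leaf indices, which is the common GBDT+LR construction but is not spelled out in the claims.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (assumption): 3 operation behavior features,
# 2 activity participation features, binary label 1 = abnormal.
rng = np.random.default_rng(0)
n = 200
X_op = rng.normal(size=(n, 3))
X_act = rng.normal(size=(n, 2))
y = ((X_op[:, 0] + X_act[:, 0]) > 0).astype(int)
object_ids = [f"obj_{i}" for i in range(n)]

# One GBDT per feature set.
gbdt_op = GradientBoostingClassifier(n_estimators=10, max_depth=3, random_state=0)
gbdt_act = GradientBoostingClassifier(n_estimators=10, max_depth=3, random_state=0)
gbdt_op.fit(X_op, y)
gbdt_act.fit(X_act, y)

# apply() returns, for each sample, the index of the leaf reached in each tree.
# For binary classification the trailing axis has size 1, so flatten it.
leaves_op = gbdt_op.apply(X_op).reshape(n, -1)
leaves_act = gbdt_act.apply(X_act).reshape(n, -1)

# One-hot encode the leaf indices of both models and concatenate them.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_features = encoder.fit_transform(np.hstack([leaves_op, leaves_act]))

# The LR model is trained on the combined leaf features.
lr = LogisticRegression(max_iter=1000)
lr.fit(leaf_features, y)

# Output: object identifier paired with the predicted type identifier.
type_ids = lr.predict(leaf_features).tolist()
results = list(zip(object_ids, type_ids))
```

Keeping a separate GBDT per feature set means each tree ensemble only learns feature crossings within one kind of information, while the LR layer weighs the two groups of leaf features against each other.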

7. The method according to any one of claims 1 to 3, wherein said using said sample data to train a recognition model to be trained to obtain a recognition model comprises:

training, based on the sample data, the recognition model to be trained by using a K-fold cross-validation method to obtain K target recognition models with different parameters; and

selecting, from the K target recognition models with different parameters, the target recognition model with the largest harmonic mean value as the recognition model.
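The K-fold selection in claim 7 can be illustrated with the sketch below. It assumes the harmonic mean is that of precision and recall (the F1 score), which the claim itself does not state. To stay self-contained it also swaps in a toy single-threshold classifier for the recognition model, and the scores and labels are made up.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def train_threshold(xs, ys):
    """Toy stand-in model: midpoint between the class means of one score."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, xs):
    return [1 if x > threshold else 0 for x in xs]

def k_fold_select(xs, ys, k):
    """Train one model per fold, score it on the held-out fold,
    and return (best (f1, threshold), all candidates)."""
    candidates = []
    for fold in range(k):
        val_idx = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in val_idx]
        train_y = [y for i, y in enumerate(ys) if i not in val_idx]
        val_x = [x for i, x in enumerate(xs) if i in val_idx]
        val_y = [y for i, y in enumerate(ys) if i in val_idx]
        threshold = train_threshold(train_x, train_y)
        score = f1_score(val_y, predict(threshold, val_x))
        candidates.append((score, threshold))
    best = max(candidates, key=lambda c: c[0])
    return best, candidates

# Hypothetical anomaly scores and labels (1 = abnormal).
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.95, 0.15, 0.85, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best, candidates = k_fold_select(xs, ys, k=3)
```

Each fold yields a model with its own fitted parameter (here, the threshold), and the one with the highest held-out score is kept. With a real recognition model, `train_threshold` and `predict` would be replaced by that model's fit and predict steps.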

8. The method according to any one of claims 1 to 3, wherein after the recognition model to be trained is trained using the sample data to obtain the recognition model, the method further comprises:

testing the recognition model by using test sample data to determine whether the indicators of the recognition model meet preset conditions; and/or determining, based on third information of each object in the at least one target object, whether the recognition result of the recognition model is correct; and

if the indicators of the recognition model do not meet the preset conditions, and/or the recognition result of the recognition model is incorrect, continuing to optimize the recognition model to obtain an optimized recognition model;

wherein the determining of the abnormal object in the at least one target object based on the first information and the second information of the at least one target object and the recognition model comprises:

determining the abnormal object in the at least one target object based on the first information and the second information of the at least one target object and the optimized recognition model.

9. A device for identifying abnormal objects, characterized in that the device comprises:

a first obtaining unit, configured to obtain first information and second information of at least one target object, wherein the first information is operation behavior information and the second information is activity participation information;

a second obtaining unit, configured to obtain sample data based on the first information and the second information of the at least one target object;

a training unit, configured to use the sample data to train a recognition model to be trained to obtain a recognition model; and

a determining unit, configured to determine an abnormal object in the at least one target object based on the first information and the second information of the at least one target object and the recognition model.

10. An electronic device, characterized in that the electronic device comprises a memory and a processor, wherein the memory stores computer-executable instructions, and when the processor runs the computer-executable instructions stored in the memory, the method according to any one of claims 1 to 8 is implemented.

11. A computer storage medium, wherein executable instructions are stored on the storage medium, and when the executable instructions are executed by a processor, the method according to any one of claims 1 to 8 is implemented.