CN116796264A - Object classification method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116796264A
Authority
CN
China
Prior art keywords
target
training
training sample
feature
decision tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210232245.1A
Other languages
Chinese (zh)
Inventor
毕超波
黄嘉成
彭艺
刘明亮
郑磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210232245.1A
Publication of CN116796264A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30: Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31: User authentication

Abstract

The present application relates to an object classification method, apparatus, computer device, storage medium, and computer program product, and can be applied to vehicle-mounted scenarios. The method comprises: acquiring a plurality of training sample subsets obtained through random sampling; constructing a corresponding initial decision tree based on each training sample subset, wherein the decision tree nodes of the initial decision tree are determined based on feature categories randomly selected from all feature categories corresponding to the training samples; inputting the training samples in each training sample subset into the corresponding initial decision tree to obtain initial prediction labels; and adjusting the corresponding initial decision tree based on the training labels and initial prediction labels corresponding to the training samples in the same training sample subset until a first convergence condition is met, to obtain target decision trees. An object classification model generated from the target decision trees obtains a prediction label for a target object from the prediction results of the individual target decision trees, thereby improving object classification accuracy.

Description

Object classification method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to an object classification method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of computer technology, more and more application programs have emerged on the network. In order to maintain a healthy network environment and protect the physical and mental health of users (objects), different types of objects in the same application program can be given different operation authorities.
In the conventional technology, different types of objects are generally identified by empirical rules. For example, for the same application program, different types of objects tend to operate it at different times, so an object operating the application program at a specific time is identified as the object type corresponding to that time. However, this rule-based object classification method is highly subjective and has low classification accuracy.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an object classification method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve classification accuracy.
The application provides an object classification method. The method comprises the following steps:
acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set, the training sample subset comprises training samples and training labels corresponding to the training samples, the training samples are obtained based on operation data of a training object in a target application, and the training labels are used for determining the operation authority of the training object in the target application;
constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of each initial decision tree are determined based on feature categories randomly selected from the feature categories corresponding to the training features contained in the training samples;
inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples;
based on training labels and initial prediction labels corresponding to all training samples in the same training sample subset, adjusting corresponding initial decision trees until a first convergence condition is met, and obtaining target decision trees corresponding to all training sample subsets respectively;
generating an object classification model based on each target decision tree; the object classification model is used for inputting target features corresponding to target objects into each target decision tree, and obtaining target prediction labels corresponding to the target objects based on prediction results of each target decision tree.
The application also provides an object classification device. The device comprises:
the training set acquisition module is used for acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set, the training sample subset comprises training samples and training labels corresponding to the training samples, the training samples are obtained based on operation data of a training object in a target application, and the training labels are used for determining the operation authority of the training object in the target application;
the initial decision tree construction module is used for constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of each initial decision tree are determined based on feature categories randomly selected from the feature categories corresponding to the training features contained in the training samples;
the decision tree prediction module is used for inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples;
the target decision tree generation module is used for adjusting the corresponding initial decision tree based on the training label and the initial prediction label corresponding to each training sample in the same training sample subset until the first convergence condition is met, so as to obtain target decision trees corresponding to each training sample subset respectively;
the model generation module is used for generating an object classification model based on each target decision tree; the object classification model is used for inputting target features corresponding to target objects into each target decision tree, and obtaining target prediction labels corresponding to the target objects based on prediction results of each target decision tree.
The application provides an object classification method. The method comprises the following steps:
acquiring target characteristics corresponding to a target object, wherein the target characteristics are obtained based on operation data of the target object in a target application;
inputting the target characteristics into an object classification model to obtain target prediction labels corresponding to the target objects; the target prediction label is used for determining the operation authority of the target object in the target application, and the target prediction label is obtained based on the prediction result of each target decision tree in the object classification model;
the training process of the object classification model comprises the following steps:
acquiring a plurality of training sample subsets, each obtained by randomly sampling the same training sample set; constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets, the decision tree nodes of each initial decision tree being determined based on feature categories randomly selected from the feature categories corresponding to the training features contained in the training samples; inputting the training samples in each training sample subset into the corresponding initial decision tree to obtain initial prediction labels corresponding to the training samples; adjusting the corresponding initial decision tree based on the training labels and initial prediction labels corresponding to the training samples in the same training sample subset until a first convergence condition is met, to obtain target decision trees respectively corresponding to the training sample subsets; and generating an object classification model based on each of the target decision trees.
The application also provides an object classification device. The device comprises:
the data acquisition module is used for acquiring target characteristics corresponding to a target object, wherein the target characteristics are obtained based on operation data of the target object in a target application;
the label prediction module is used for inputting the target characteristics into an object classification model to obtain target prediction labels corresponding to the target objects; the target prediction label is used for determining the operation authority of the target object in the target application, and the target prediction label is obtained based on the prediction result of each target decision tree in the object classification model;
the training process of the object classification model comprises the following steps:
acquiring a plurality of training sample subsets, each obtained by randomly sampling the same training sample set; constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets, the decision tree nodes of each initial decision tree being determined based on feature categories randomly selected from the feature categories corresponding to the training features contained in the training samples; inputting the training samples in each training sample subset into the corresponding initial decision tree to obtain initial prediction labels corresponding to the training samples; adjusting the corresponding initial decision tree based on the training labels and initial prediction labels corresponding to the training samples in the same training sample subset until a first convergence condition is met, to obtain target decision trees respectively corresponding to the training sample subsets; and generating an object classification model based on each of the target decision trees.
A computer device comprising a memory storing a computer program and a processor, the processor implementing the steps of the object classification methods described above when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the object classification methods described above.
A computer program product comprising a computer program which, when executed by a processor, implements the steps of the object classification methods described above.
According to the object classification method, apparatus, computer device, storage medium, and computer program product, a plurality of training sample subsets are acquired, each obtained by randomly sampling the same training sample set. Each training sample subset comprises training samples and corresponding training labels; the training samples are obtained based on operation data of training objects in a target application, and the training labels are used for determining the operation authority of the training objects in the target application. Decision trees are constructed based on the training sample subsets to obtain initial decision trees corresponding to the respective subsets, where the decision tree nodes of each initial decision tree are determined based on feature categories randomly selected from the feature categories corresponding to the training features contained in the training samples. The training samples in each training sample subset are input into the corresponding initial decision tree to obtain initial prediction labels, and the corresponding initial decision tree is adjusted based on the training labels and initial prediction labels corresponding to the training samples in the same subset until a first convergence condition is met, yielding target decision trees corresponding to the respective training sample subsets. An object classification model is then generated based on the target decision trees. Subsequently, the target features corresponding to a target object can be input into the object classification model, which outputs a prediction label for the target object based on the prediction results of the target decision trees.
Therefore, an object classification model is obtained by training on the training sample set, and objects are classified based on the object classification model, which can improve both the accuracy and the efficiency of object classification. Moreover, the object classification model comprises a plurality of target decision trees: different training sample subsets are obtained by randomly sampling the same training sample set, and different target decision trees are obtained by training on these different subsets with random feature selection. The object classification model can therefore output more accurate prediction labels based on the prediction results of the different target decision trees, further improving the classification accuracy of objects.
Drawings
FIG. 1 is a diagram of an application environment for an object classification method in one embodiment;
FIG. 2 is a flow diagram of a method of classifying objects in one embodiment;
FIG. 3 is a flow diagram of training a decision tree in one embodiment;
FIG. 4 is a flow chart of an object classification method according to another embodiment;
FIG. 5 is a flow diagram of identifying a game player in one embodiment;
FIG. 6 is a schematic diagram of training and applying an integrated model in one embodiment;
FIG. 7 is a schematic diagram of model training in one embodiment;
FIG. 8 is a schematic representation of player characteristics in one embodiment;
FIG. 9 is a block diagram of an object classification apparatus in one embodiment;
FIG. 10 is a block diagram of an object classification apparatus according to another embodiment;
FIG. 11 is a block diagram of an object classification apparatus according to yet another embodiment;
FIG. 12 is an internal block diagram of a computer device in one embodiment;
FIG. 13 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The object classification method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on the cloud or other servers.
The terminal 102 and the server 104 may cooperate to perform the object classification method provided in embodiments of the present application. For example, the server acquires a plurality of training sample subsets from the terminal, each training sample subset is obtained by randomly sampling the same training sample set, the training sample subsets comprise training samples and training labels corresponding to the training samples, the training samples are obtained based on operation data of a training object in a target application, and the training labels are used for determining operation authority of the training object in the target application. The server builds a decision tree based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets, and decision tree nodes of the initial decision tree are determined based on randomly selected feature categories from feature categories corresponding to training features contained in the training samples. The server inputs training samples in the training sample subsets into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples, adjusts the corresponding initial decision trees based on the training labels and the initial prediction labels corresponding to the training samples in the same training sample subset until a first convergence condition is met, obtains target decision trees corresponding to the training sample subsets respectively, and generates an object classification model based on the target decision trees. The server transmits the object classification model to the terminal. 
The terminal acquires target features corresponding to the target objects, the target features are input into each target decision tree in the object classification model, the object classification model obtains a prediction label corresponding to the target objects based on the prediction results of each target decision tree, and the object classification model outputs the target prediction labels. The terminal can determine the operation authority of the target object in the target application based on the target prediction tag, so as to control the operation of the target object in the target application. The target feature is derived based on operational data of the target object in the target application.
The terminal 102 and the server 104 may also be used separately to perform the object classification method provided in the embodiment of the present application. For example, the terminal trains to obtain an object classification model based on training data, and the terminal obtains a target prediction label corresponding to the target object based on the object classification model.
The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smartphone, tablet computer, internet of things device, or portable wearable device. The internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The terminal is provided with the target application. The server 104 may be implemented as a stand-alone server, a server cluster composed of a plurality of servers, or a cloud server.
The embodiment of the application can be applied to various scenes such as cloud technology, artificial intelligence, intelligent traffic, auxiliary driving, games and the like.
In one embodiment, as shown in fig. 2, an object classification method is provided. The method is described as applied to a computer device, which may be the terminal 102 or the server 104 in fig. 1. Referring to fig. 2, the object classification method includes the following steps:
Step S202, acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set, the training sample subset comprises training samples and training labels corresponding to the training samples, the training samples are obtained based on operation data of a training object in a target application, and the training labels are used for determining operation authority of the training object in the target application.
The target application is an application program in which operation authorities need to be distinguished for different types of objects. The target application may be a game application, a video application, an e-commerce application, or the like. For example, the target application may be a game application in which the operation authorities of adult players and minor players differ, and the operation duration of minor players in the game application is limited. The target application may also be a video application, in which the operation authorities of adults and minors differ, so that the video viewing duration, the video viewing types, and the like of minors in the video application may be limited.
The training samples are derived based on operational data of the training object in the target application. A training sample may be composed of a plurality of operational characteristics corresponding to a training object. The operation characteristics are obtained by extracting the characteristics of the operation data. The operational characteristics are used to reflect the use of the training object for the target application. Each training sample has a corresponding training label. The training label is used for determining the operation authority of the training object in the target application. For example, if the training label is a negative label, it indicates that the operation authority of the training object in the target application is not limited, and if the training label is a positive label, it indicates that the operation authority of the training object in the target application is limited.
The training sample set includes a plurality of training samples. The same training sample set is randomly sampled to obtain a plurality of training sample subsets; different training sample subsets may contain the same training samples as well as different training samples. For example, the training sample set may be randomly sampled with replacement, so that the same training sample may appear in different training sample subsets.
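The sampling-with-replacement step described above can be sketched as follows; this is a minimal illustration, and the function name and data shapes are assumptions rather than part of the patent:

```python
import random

def bootstrap_subsets(samples, labels, n_subsets, subset_size=None):
    """Draw several training sample subsets from the same training
    sample set by sampling with replacement, so the same training
    sample may appear in several subsets."""
    size = subset_size or len(samples)
    subsets = []
    for _ in range(n_subsets):
        idx = [random.randrange(len(samples)) for _ in range(size)]
        subsets.append(([samples[i] for i in idx], [labels[i] for i in idx]))
    return subsets
```

Because the draw is with replacement, each subset has the same size as the original set by default, but individual samples may be repeated within a subset or shared across subsets.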
Specifically, the computer device may obtain a training sample set locally, or from other terminals or servers, randomly sample the training sample set to obtain a plurality of training sample subsets, and further perform model training based on each training sample subset to obtain an object classification model composed of a plurality of target decision trees. Subsequently, the computer equipment can classify any object based on the object classification model, so that the operation of the object in the target application is controlled, and the purposes of maintaining the network environment and protecting the object are achieved.
Step S204, constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of each initial decision tree are determined based on feature categories randomly selected from the feature categories corresponding to the training features contained in the training samples.
A decision tree is a tree structure in which each internal node represents a test on an attribute (feature), each branch represents a test output, and each leaf node represents a class.
Specifically, the computer device may construct a decision tree based on each training sample subset, resulting in an initial decision tree for each training sample subset. When generating a decision tree node, the computer device may randomly select a plurality of feature categories, from the feature categories corresponding to the training features contained in the training samples, as candidate feature categories, and determine a target feature category from the candidate feature categories to generate the decision tree node. It can be understood that, since the decision tree nodes are determined based on randomly selected feature categories, and since the training sample subsets themselves differ, the initial decision trees corresponding to the training sample subsets differ, and so do the target decision trees obtained by training each initial decision tree on its own training sample subset. Nevertheless, each target decision tree has a certain prediction capability, so the object classification model generated from the target decision trees can achieve high accuracy and good generalization performance.
In one embodiment, one feature class may be randomly selected from the candidate feature classes as the target feature class. Feature importance levels corresponding to the candidate feature classes can be calculated based on the training sample subset, and the target feature class is determined from the candidate feature classes based on the feature importance levels. The feature importance is used to characterize the degree of influence of a feature on the prediction result, and can also be considered as the classification capability of the feature. For example, the training sample subset may be divided into a first sample subset and a second sample subset, target splitting points corresponding to each candidate feature class respectively are determined based on the first sample subset, splitting accuracy of each target splitting point is calculated based on the second sample subset, classification accuracy is used as feature importance, and the candidate feature class with the largest feature importance is selected as the target feature class. It can be understood that the feature space can be divided into two areas based on any one target splitting point, for example, the candidate feature class is the operation duration, the target splitting point is whether the operation duration is greater than 10 hours, the training sample with the operation duration greater than 10 hours is divided into an area A, the training sample with the operation duration less than or equal to 10 hours is divided into an area B, the distribution conditions of the training samples corresponding to the two areas obtained by dividing the target splitting point are obviously different, positive training samples in one area are more, negative training samples in one area are more, the area with more positive training samples can be considered to correspond to the positive label, and the area with more negative training samples can be considered to correspond to the negative label. 
The training samples in the second sample subset are divided based on the target splitting points, the division label corresponding to each training sample is determined according to the area into which it falls, and the splitting accuracy is calculated based on the real labels and the division labels corresponding to the training samples in the second sample subset. When determining a target splitting point, the splitting point that produces the largest difference between the training sample distributions of the two divided areas can be used as the target splitting point. Of course, the computer device may also calculate feature importance based on other custom formulas or algorithms.
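The splitting-accuracy computation described above can be sketched as follows; the function name and the majority-label scoring rule are illustrative assumptions:

```python
def split_accuracy(samples, labels, feature, threshold):
    """Score one target splitting point on a held-out sample subset:
    each of the two regions is assigned the majority label observed
    there, and the score is the share of samples that rule gets right."""
    left = [l for s, l in zip(samples, labels) if s[feature] <= threshold]
    right = [l for s, l in zip(samples, labels) if s[feature] > threshold]
    correct = 0
    for side in (left, right):
        if side:
            majority = max(set(side), key=side.count)
            correct += sum(1 for l in side if l == majority)
    return correct / len(samples)
```

A split that cleanly separates positive from negative samples scores 1.0, and the score drops as the two regions become more mixed, matching the use of splitting accuracy as feature importance.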
In one embodiment, a plurality of feature classes are randomly selected from the feature classes to serve as candidate feature classes, optimal splitting features are determined from the candidate feature classes based on feature importance, optimal splitting points corresponding to the optimal splitting features are further determined, a feature space is divided into two areas based on the optimal splitting points corresponding to the optimal splitting features, each training sample in the training sample subset is distributed into the two areas according to the value of the optimal splitting features, the division process is repeated for each area until division stopping conditions are met, and a plurality of areas are obtained. Generating decision tree nodes based on optimal splitting points corresponding to optimal splitting features adopted in each space division, and connecting the decision tree nodes according to the division sequence to obtain an initial decision tree. In each division, a plurality of feature classes may be randomly selected from the remaining feature classes (feature classes that are not the optimal split features) as candidate feature classes, or a plurality of feature classes may be randomly selected from all feature classes as candidate feature classes.
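The per-node random feature selection described above can be sketched as follows; the function names, the use of every observed feature value as a candidate threshold, and the majority-label scoring are assumptions of this sketch:

```python
import random

def node_split(samples, labels, n_candidates=2):
    """Generate one decision-tree node: randomly pick candidate feature
    indices from all feature categories, then keep the (feature,
    threshold) pair whose two regions are purest."""
    def score(f, t):
        # Majority-label accuracy of splitting on feature f at threshold t.
        correct = 0
        for side in ([l for s, l in zip(samples, labels) if s[f] <= t],
                     [l for s, l in zip(samples, labels) if s[f] > t]):
            if side:
                majority = max(set(side), key=side.count)
                correct += sum(1 for l in side if l == majority)
        return correct / len(samples)

    n_features = len(samples[0])
    candidates = random.sample(range(n_features), min(n_candidates, n_features))
    # Evaluate every observed value of each candidate feature as a threshold.
    return max(((score(f, t), f, t)
                for f in candidates
                for t in {s[f] for s in samples}),
               key=lambda x: x[0])  # (score, feature index, threshold)
```

Applying this function recursively to the two regions it produces, until a stopping condition is met, yields the initial decision tree described above.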
Step S206, inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples.
Step S208, based on the training labels and the initial prediction labels corresponding to the training samples in the same training sample subset, the corresponding initial decision tree is adjusted until the first convergence condition is met, and the target decision tree corresponding to each training sample subset is obtained.
Specifically, the computer device may use the training sample as input data of the initial decision tree, use the training label as an expected output of the initial decision tree, adjust decision tree parameters of the initial decision tree through multiple iterative training, and use the decision tree obtained by final training as a target decision tree.
After generating the initial decision trees, the computer device may input the training samples in each training sample subset into the corresponding initial decision tree to obtain initial prediction labels for the training samples in each subset. Given input data, an initial decision tree outputs an initial prediction label based on the decision tree nodes and node parameters traversed by that data. The computer device performs a back-propagation update based on the difference between the training labels and the initial prediction labels corresponding to the training samples in the same training sample subset, adjusting the decision tree parameters of the corresponding initial decision tree to obtain a new initial decision tree. The training samples in the subset are then input into the new initial decision tree to obtain new initial prediction labels, and the decision tree parameters are adjusted again based on the difference between the new initial prediction labels and the training labels. Iterative training proceeds in this way, each adjustment aiming to make the difference between the initial prediction labels and the training labels smaller and smaller, until the first convergence condition is met, yielding the target decision tree corresponding to the training sample subset. Each training sample subset independently trains its corresponding initial decision tree, so a target decision tree is finally obtained for each training sample subset.
The first convergence condition may be at least one of the difference between the initial prediction result and the training label being smaller than a preset difference, the number of iterations reaching a preset number, and the like. Adjusting the decision tree parameters may mean adjusting the morphology of the decision tree, for example adjusting its splitting points, or it may mean adjusting the node parameters of the decision tree nodes.
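The iterative adjustment and first convergence condition described above can be sketched as a simple training loop. In this hedged illustration the adjusted decision tree parameter is the tree depth and the convergence condition is a training-accuracy threshold combined with an iteration cap; the dataset, the thresholds and the use of scikit-learn are assumptions for demonstration, not the exact procedure of this embodiment.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative sketch: adjust a decision tree parameter (here, the depth) until
# a first convergence condition is met (training accuracy >= 0.9, or a preset
# iteration cap of 10). All names and thresholds are assumptions.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

tree, depth = None, 1
for iteration in range(10):                      # preset iteration cap
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    train_acc = tree.score(X, y)                 # difference vs. training labels
    if train_acc >= 0.9:                         # first convergence condition
        break
    depth += 1                                   # adjust decision tree parameters

print(f"converged at depth={depth}, train_acc={train_acc:.2f}")
```

The resulting tree plays the role of the target decision tree for one training sample subset; repeating this per subset yields the forest used below.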
It can be understood that, because the training sample subsets and the forms of the initial decision trees differ, no two target decision trees obtained by training are identical; nevertheless, any two target decision trees can each output a prediction result for the same object.
Step S210, generating an object classification model based on each target decision tree; the object classification model is used for inputting the target features corresponding to a target object into each target decision tree, and obtaining the target prediction label corresponding to the target object based on the prediction results of the target decision trees.
The target object refers to an object that is to be classified and whose operation authority is to be determined. The target feature is derived from the operation data of the target object in the target application.
Specifically, after training to obtain the target decision trees, the computer device may compose them into an object classification model. When the model is applied later, the computer device may acquire the target features corresponding to a target object and input them into the object classification model; the target features are passed to each target decision tree, each target decision tree outputs a prediction result after its own data processing, and the object classification model finally outputs the target prediction label corresponding to the target object based on the prediction results of the target decision trees. For example, the object classification model may calculate the average value of the prediction results and obtain the target prediction label based on the average value.
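The averaging step can be illustrated with a minimal sketch. The per-tree scores and the 0.5 decision threshold below are assumed values for demonstration only:

```python
import numpy as np

# Hedged sketch of the ensemble step: each target decision tree outputs a
# prediction result (here, a probability of the target label), and the object
# classification model averages them and thresholds the mean to obtain the
# target prediction label. The scores and the threshold are illustrative.
tree_predictions = np.array([0.8, 0.6, 0.9, 0.4, 0.7])  # one result per tree

mean_score = tree_predictions.mean()
target_label = "target" if mean_score >= 0.5 else "non-target"
print(mean_score, target_label)  # mean is 0.68 -> "target"
```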
In one embodiment, the computer device may control the operation of the target object in the target application directly based on the target predictive tag, e.g., if the target predictive tag is a target tag, the operation of the target object in the target application is restricted. The computer device can also comprehensively determine the operation authority of the target object in the target application based on the target prediction tag and other data, so as to control the operation of the target object in the target application.
In the object classification method, a plurality of training sample subsets are obtained by randomly sampling the same training sample set. Each training sample subset contains training samples and the training labels corresponding to the training samples; the training samples are obtained based on the operation data of training objects in the target application, and the training labels are used for determining the operation authority of the training objects in the target application. Decision trees are constructed based on the training sample subsets to obtain the initial decision trees respectively corresponding to the training sample subsets, where the decision tree nodes of an initial decision tree are determined based on feature categories randomly selected from the feature categories of the training features contained in the training samples. The training samples in each training sample subset are input into the corresponding initial decision tree to obtain the initial prediction labels corresponding to the training samples, and the corresponding initial decision tree is adjusted based on the training labels and initial prediction labels of the training samples in the same subset until the first convergence condition is met, yielding the target decision trees respectively corresponding to the training sample subsets; an object classification model is then generated based on the target decision trees. Subsequently, the target features corresponding to a target object can be input into the object classification model, which outputs the prediction label corresponding to the target object based on the prediction results of the target decision trees. Because the object classification model is trained on the training sample set and objects are classified by the model, the accuracy and efficiency of object classification can be improved.
Moreover, the object classification model comprises a plurality of target decision trees. Different training sample subsets are obtained by randomly sampling the same training sample set, and different target decision trees are obtained by training on these subsets with random feature selection. The object classification model can therefore output a more accurate prediction label based on the prediction results of the different target decision trees, further improving the classification accuracy for objects.
In one embodiment, the training samples include positive training samples and negative training samples, and the operation authority corresponding to a positive training sample is smaller than the operation authority corresponding to a negative training sample. The training object corresponding to a negative training sample includes at least one of a current-time active object, a target time-state object, a target-platform active object and a target-time registration object. The current-time active object refers to an object whose activity in the target application during the current time period is greater than a first preset threshold; the target time-state object refers to an object whose time state is the target state; the target-platform active object refers to an object whose activity on the target running platform of the target application is greater than a second preset threshold; and the target-time registration object refers to an object whose account registration time in the target application is earlier than the target time.
The training samples comprise positive training samples and negative training samples, training labels corresponding to the positive training samples are positive labels, and training labels corresponding to the negative training samples are negative labels. The operation authority corresponding to the positive training sample is smaller than the operation authority corresponding to the negative training sample, namely, the operation authority of the object with the positive label in the target application is smaller than the operation authority of the object with the negative label in the target application. For example, the allowable operation duration of the object with the positive label in the target application is smaller than the allowable operation duration of the object with the negative label in the target application.
The current-time active object refers to an object whose activity in the target application during the current time period is greater than the first preset threshold. The current time period refers to the data statistics period currently used, and can be set according to actual needs. For example, the current time period may be the last month: each object whose activity in the target application over the last month is greater than the first preset threshold is obtained as a candidate object, and the candidate objects are randomly sampled to obtain the current-time active objects. The liveness may be calculated according to a custom formula or algorithm; for example, the liveness is calculated based on the operation duration, and the longer the operation duration, the higher the liveness. The first preset threshold can be set according to actual needs. It can be understood that the current-time active object represents an object that is relatively active in the target application during the current time period, and may be considered an object that is less restricted in the target application, that is, an object having higher operation authority.
The target time-state object refers to an object whose time state is the target state. The time state may be determined according to the age of the object; for example, if the age of the object is greater than a preset age, the corresponding time state is the target state. The time state may also be determined based on a guardian flag; for example, if the object has the guardian flag, the corresponding time state is the target state. An object with the guardian flag can manage and control the operation authority of a specific object; for example, in a game application, a parent with a growth-guardian flag can check the game time and consumption records of a child, and control the child's game time and game types. It can be understood that an object whose time state is the target state can allocate its own time more freely and reasonably, and can have higher operation authority in the target application. Of course, the time state may also be determined comprehensively according to both the age of the object and the guardian flag; for example, if the age of the object is greater than the preset age and the object has the guardian flag, the corresponding time state is the target state.
The target-platform active object refers to an object whose activity on the target running platform of the target application is greater than the second preset threshold. The target running platform may refer to a client, for example, a computer. The second preset threshold may be set according to actual needs.
The target-time registration object refers to an object whose account registration time in the target application is earlier than the target time. It can be understood that the earlier the account registration time, the longer the object has used the target application, and such an object should have higher operation authority.
In one embodiment, the performance of the model trained based on the positive training sample and the various negative training samples was verified by experiments, respectively, with the following results:
1. Positive samples: real-name authenticated minors; negative samples: real-name authenticated adults
TABLE 1
The AUC value refers to the area enclosed by the ROC curve and the X axis; the closer the AUC value is to 1, the better the model. Accuracy refers to the number of correctly classified samples divided by the total number of samples. Recall refers to the proportion of all positive samples that are predicted correctly. The F1 value is the harmonic mean of precision and recall. The false positive rate refers to the proportion of actual negative samples that are predicted to be positive; for example, in a game application, the false positive rate may be the probability that an actual adult player is judged to be a minor player. The simulated-delivery minor proportion refers to the proportion of minor objects among the real-name authenticated objects.
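As a hedged illustration, the metrics defined above can be computed with scikit-learn on toy labels and scores (the data below is assumed and unrelated to the experimental results in the tables):

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, accuracy_score, recall_score,
                             f1_score, confusion_matrix)

# Toy data: 1 = positive (e.g. minor), 0 = negative (e.g. adult).
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05])
y_pred = (y_score >= 0.5).astype(int)

auc = roc_auc_score(y_true, y_score)               # area under the ROC curve
acc = accuracy_score(y_true, y_pred)               # correct / total
rec = recall_score(y_true, y_pred)                 # correct among positives
f1 = f1_score(y_true, y_pred)                      # harmonic mean of P and R
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)                               # false positive rate
print(auc, acc, rec, f1, fpr)
```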
2. Positive samples: real-name authenticated minors; negative samples: real-name authenticated adults and adult parents marked by the growth-guardian service
TABLE 2

AUC value: 0.921
Accuracy: 0.640
Recall: 0.472
F1 value: 0.487
False positive rate: 6.15%
Simulated-delivery minor proportion: 5.32%
Here, the training effect of Table 2 was better than that of Table 1: the identified-minor proportion increased by 129.3%, and the false positive rate was reduced by 33.8%. Adult parents who are real-name authenticated as adults and marked by the growth-guardian service may be considered target time-state objects.
3. Positive samples: real-name authenticated minors; negative samples: real-name authenticated adults + objects active in the last month
TABLE 3

AUC value: 0.914
Accuracy: 0.663
Recall: 0.394
F1 value: 0.513
False positive rate: 5.92%
Simulated-delivery minor proportion: 4.70%
Here, the training effect of Table 3 was better than that of Table 1: the identified-minor proportion increased by 102.6%, and the false positive rate was reduced by 36.3%. The objects active in the last month may be regarded as current-time active objects.
4. Positive samples: real-name authenticated minors; negative samples: real-name authenticated adults + client-side (PC) active objects + objects with earlier registration years
TABLE 4

AUC value: 0.932
Accuracy: 0.741
Recall: 0.628
F1 value: 0.620
False positive rate: 4.71%
Simulated-delivery minor proportion: 10.67%
Here, the training effect of Table 4 was better than that of Table 1: the identified-minor proportion increased by 359.9%, and the false positive rate was reduced by 49.3%. The client-side (PC) active object may be considered a target-platform active object, and the object with an earlier registration year may be considered a target-time registration object.
In the above embodiment, the training objects corresponding to the negative training samples in the training sample set may include at least one of a current-time active object, a target time-state object, a target-platform active object and a target-time registration object. Using the operation features of these training objects as negative training samples helps ensure the training effect of the model and improves its accuracy.
In one embodiment, the training features include at least one of object attribute features, operation interaction features, operation duration features, device login features, registration time features, running platform features, and target association features between the training object and its target associated objects, where a target associated object is an associated object having the target operation authority among the associated objects of the training object.
Wherein the object attribute features are used to characterize attribute information of the object. They may include at least one item of attribute information among object age, object gender, the terminal identifier of the terminal running the target application, the terminal model, the terminal's network, the terminal's operating region, the terminal's operating system, and the like. The object attribute features may specifically include attribute information collected before real-name authentication of the object. The terminal operating region refers to the geographical area where the terminal is located, for example, first-tier, second-tier or third-tier cities.
The operation interaction feature is obtained based on the interactive operations between an object and other objects in the target application, and is used for characterizing the operation frequency and operation proficiency of the object in the target application. For example, if the target application is a game application, the interactive operation may specifically be a match (adversarial) operation, and the operation interaction features may include the win rate, the draw rate, the loss rate, the proportion of matches in each mode, the average daily number of matches on weekdays (total number of weekday matches / number of weekdays on which matches occurred), the average daily number of matches on weekends and holidays (total number of weekend and holiday matches / number of weekend and holiday days on which matches occurred), the proportion of matches in each hour of weekdays, the proportion of matches in each hour of weekends and holidays, and the like.
The operation duration feature is used to characterize the operation duration of the object in the target application. It may include at least one item of attribute information among operation duration, average duration, duration proportion, and the like. For example, the operation duration features may include data such as the weekday average duration, the weekend and holiday average duration, the duration of each weekday time period, the duration of each weekend and holiday time period, and the daily morning/noon/afternoon/evening/late-night duration proportions.
The device login feature is obtained based on the terminals and accounts the object uses to log in to the target application, and is used for characterizing the login frequency and login specificity of the terminal and the account. The device login feature may include at least one of: the number of devices the account has historically logged in from, the number of natural persons historically bound to the account, the number of accounts that have historically logged in from the devices used during the target time period, the number of accounts bound to the natural persons historically bound to the account, whether a suspicious device was logged in to during the target time period, and the like. The target time period may be set according to actual needs; for example, the target time period is the summer holiday. A suspicious device refers to a device on which a target-type object has historically logged in to the target application, for example, a device on which a minor has historically logged in to a game.
The registration time feature is used to characterize the registration time and registration duration of the object in the target application. The registration time feature may include data of registration duration, registration year, etc.
The running platform feature is used to characterize the activity of the object when operating the target application on different running platforms. The running platforms may specifically include a client (e.g., a computer end) and a mobile end (e.g., a mobile phone end). The running platform features may include data such as the number of days the client is active within a preset time period, the duration the client is active within the preset time period, the number of days the mobile end is active within the preset time period, and the duration the mobile end is active within the preset time period. The preset time period may be set according to actual needs; for example, the preset time period is the last month.
An associated object of an object refers to a friend of the object in the target application. The target associated object refers to an associated object with target operation authority in each associated object of the training object, and can also be considered as a target type friend of the training object in the target application, for example, the target associated object can be an underage friend of the training object in the target application. The target association features between the training object and the target association object are used to characterize the importance of the target association object to the training object. The target association feature may include data such as the number of target associated objects, the target associated object scale, and the target associated object affinity sum. The target associated object affinity is generated based on communication information between the training object and the target associated object, the communication information including at least one of session information, item gifting information, invitation information, and the like.
In one embodiment, the operation interaction feature, the operation duration feature, and the device login feature may be operation features corresponding to a target time period. For example, the operation interaction features include the win rate, draw rate and loss rate for July to August.
Specifically, the training features may include feature information of at least one dimension, and may specifically include at least one of an object attribute feature, an operation interaction feature, an operation duration feature, a device login feature, a registration time feature, an operation platform feature, and a target association feature. It will be appreciated that the training features of each dimension may also include feature information of at least one category, for example, the target-associated features may include feature information of a total of two categories, namely a target-associated object number and a target-associated object ratio.
In one embodiment, the computer device may obtain an application log reported by the target application, and extract each feature information of the same object from the application log to obtain the training feature or the target feature.
In the above embodiments, the training features include various data, which helps to ensure the training effect of the model.
In one embodiment, the training sample acquisition process includes the steps of:
extracting characteristics of operation data of a current training object in a target application to obtain a plurality of initial operation characteristics; determining a plurality of target operating characteristics from the respective initial operating characteristics; performing feature intersection on each target operation feature to obtain an intersection operation feature; and obtaining a training sample corresponding to the current training object based on the initial operation characteristic and the cross operation characteristic.
Wherein feature crossing refers to forming a composite feature by combining individual features. Feature crossing helps to represent nonlinear relationships.
Specifically, when acquiring training data for the model, the computer device may perform feature extraction on the operation data of the training object in the target application to obtain a plurality of initial operation features, and use the initial operation features as training features, thereby obtaining a training sample. To further improve the training effect of the model, the computer device may also select some of the initial operation features as target operation features, perform feature crossing on the target operation features to obtain cross operation features, and use both the initial operation features and the cross operation features as training features to obtain the training sample. When feature crossing is performed, the target operation features can be crossed pairwise, so that at least one cross operation feature is obtained.
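The pairwise crossing described above can be sketched as follows; the feature names, the values and the choice of multiplication as the combination rule are illustrative assumptions:

```python
from itertools import combinations

# Hedged sketch: cross each pair of selected target operation features by
# multiplying their values. Feature names and values are illustrative only.
target_features = {"operation_duration": 9.0,
                   "target_associated_ratio": 0.25,
                   "daily_matches": 4.0}

cross_features = {f"{a}_x_{b}": va * vb
                  for (a, va), (b, vb) in combinations(target_features.items(), 2)}
print(cross_features)
```

Crossing three target features pairwise yields three cross operation features, which would then be appended to the initial operation features of the training sample.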
The choice of the target operation features can be determined according to actual needs. For example, the computer device may randomly select several operation features from the initial operation features as target operation features. It may also take operation features belonging to different dimensions from the initial operation features as target operation features; crossing operation features of different dimensions increases the amount of information expressed by the cross operation features. The computer device may also take, from the initial operation features, the operation features with the largest differences between different types of objects as target operation features; the resulting cross operation features help the model distinguish between different types of objects.
In one embodiment, the computer device may multiply the characteristic values of the different target operating characteristics to obtain the cross operating characteristic. The computer device may also divide the range of feature values of the target operation feature to obtain a plurality of feature value intervals, combine the feature value intervals of the target operation feature of different categories to obtain a plurality of combined intervals, obtain an initial feature based on each combined interval, and update the initial feature based on the feature value interval to which the feature value of the target operation feature of different categories belongs to obtain the cross operation feature.
For example, the operation duration may be divided into 3 feature value intervals, denoted A, B and C, and the target associated object proportion may be divided into 2 feature value intervals, denoted 1 and 2. Combining the feature value intervals yields 6 merged intervals, namely A&1, B&1, C&1, A&2, B&2 and C&2. An initial feature (0,0,0,0,0,0) is generated from the 6 merged intervals. If the operation duration corresponding to a certain object falls into feature value interval A and its target associated object proportion falls into feature value interval 2, the cross operation feature obtained by updating the initial feature based on the two is (0,0,0,1,0,0).
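A minimal sketch of this interval-combination crossing, assuming illustrative bin boundaries and the merged-interval order A&1, B&1, C&1, A&2, B&2, C&2 from the example:

```python
import numpy as np

# Hedged sketch of the interval-combination cross. The bin edges are
# illustrative assumptions; only the merged-interval ordering follows the text.
duration_bins = [3.0, 6.0]          # -> intervals A (<3), B (3-6), C (>=6)
ratio_bins = [0.5]                  # -> intervals 1 (<0.5), 2 (>=0.5)

def cross_one_hot(duration, ratio):
    i = int(np.digitize(duration, duration_bins))   # 0..2  (A..C)
    j = int(np.digitize(ratio, ratio_bins))         # 0..1  (interval 1..2)
    vec = np.zeros(6, dtype=int)
    vec[j * 3 + i] = 1              # position among the 6 merged intervals
    return vec

# Duration in interval A, ratio in interval 2 -> fourth slot set, as in the text.
print(cross_one_hot(2.0, 0.7))      # [0 0 0 1 0 0]
```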
In the above embodiment, the training samples are composed of the original operation features and the cross features obtained through feature crossing; model training based on such data helps to improve the accuracy of the model.
The model effect of model training with and without feature crossing was tested by experiment, and the experimental results are shown in table 5. Referring to table 5, it can be seen that the feature cross helps to ensure the training effect of the model and improve the model prediction accuracy.
TABLE 5
In one embodiment, constructing a decision tree based on the training sample subsets to obtain initial decision trees corresponding to the training sample subsets, respectively, includes:
randomly determining a plurality of candidate feature categories from the current feature categories; in the current training sample subset, calculating, based on the feature binning set corresponding to each candidate feature category, the first splitting coefficient corresponding to each feature bin, so as to obtain a plurality of first splitting coefficients respectively corresponding to each candidate feature category; determining a second splitting coefficient based on the first splitting coefficients corresponding to the same candidate feature category, obtaining the second splitting coefficients respectively corresponding to the candidate feature categories, and determining a target feature category from the candidate feature categories based on the second splitting coefficients; generating a decision tree node based on the target feature category; updating the current feature categories based on the target feature category, and returning to the step of randomly determining a plurality of candidate feature categories from the current feature categories until a preset condition is met, so as to obtain a plurality of decision tree nodes; and generating an initial decision tree based on the decision tree nodes.
Specifically, for the current training sample subset, the computer device may take each feature category as a current feature category, randomly determine several of the current feature categories as candidate feature categories, and calculate, based on the feature binning set corresponding to each candidate feature category, the first splitting coefficient corresponding to each feature bin, obtaining a plurality of first splitting coefficients respectively corresponding to each candidate feature category. It can be understood that the feature binning set corresponding to one candidate feature category includes a plurality of feature bins, and each feature bin has a corresponding first splitting coefficient. The computer device may calculate the first splitting coefficient based on a custom formula or algorithm. The first splitting coefficient may be used to determine the second splitting coefficient corresponding to a feature category, and may also be used to determine the optimal splitting point corresponding to a feature category. The computer device determines a second splitting coefficient based on the first splitting coefficients corresponding to the same candidate feature category, obtaining the second splitting coefficient corresponding to each candidate feature category. The second splitting coefficient is used to determine the optimal splitting feature. For example, if a smaller first splitting coefficient indicates that the corresponding splitting manner distinguishes different objects better, the first splitting coefficients corresponding to the same candidate feature category may be ranked from small to large; the computer device may take the minimum value as the second splitting coefficient, or take the average of several of the smallest first splitting coefficients as the second splitting coefficient.
The computer device may then determine the target feature category from the candidate feature categories based on the second splitting coefficients. For example, if a smaller second splitting coefficient indicates that the corresponding splitting manner distinguishes different objects better and the corresponding feature category has higher feature importance, the candidate feature category with the smallest second splitting coefficient may be taken as the target feature category. The target feature category may be considered the best splitting feature currently found. Further, the computer device generates a decision tree node based on the target feature category, obtaining the first decision tree node. After determining the optimal splitting feature, the computer device can generate a decision tree node based on any splitting point corresponding to the optimal splitting feature; to improve the convergence speed of the model, it can further determine the optimal splitting point corresponding to the optimal splitting feature and generate the decision tree node based on that point. For example, a target feature bin may be determined, based on the first splitting coefficients, from the feature binning set corresponding to the optimal splitting feature, and the optimal splitting point may be determined based on the target feature bin.
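As a hedged sketch of this selection procedure, the Gini impurity of a candidate split can stand in for the first splitting coefficient (the embodiment leaves the exact formula as a custom choice); the minimum per feature category serves as its second splitting coefficient, and the category with the smallest second coefficient is chosen as the target feature category. The data and bin boundaries are illustrative assumptions.

```python
import numpy as np

# Gini impurity of a label set: smaller means purer.
def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# "First splitting coefficient" of one candidate split point: the
# sample-weighted Gini impurity of the two resulting branches.
def split_coefficient(feature, labels, threshold):
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
features = {"duration": np.concatenate([rng.uniform(0, 5, 10),
                                        rng.uniform(5, 10, 10)]),
            "noise": rng.uniform(0, 1, 20)}

second = {}
for name, values in features.items():
    thresholds = np.quantile(values, [0.25, 0.5, 0.75])    # bin boundaries
    firsts = [split_coefficient(values, y, t) for t in thresholds]
    second[name] = min(firsts)                             # second coefficient

best = min(second, key=second.get)                         # target feature
print(best, second)
```

Here "duration" separates the two classes perfectly at its median boundary, so its second splitting coefficient is 0 and it is chosen over the uninformative "noise" feature.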
When generating the first decision tree node, the computer device can bin the training features corresponding to the same candidate feature category into the feature binning set corresponding to that category using equal-frequency binning, equal-width binning, chi-square binning, custom binning, or other methods. The binning process refers to grouping unordered features and distributing them into a number of ordered bins. Each feature bin obtained by equal-frequency binning contains the same number of features. Each feature bin obtained by equal-width binning covers a feature value range of the same length. Chi-square binning is based on the chi-square test. Custom binning refers to binning with user-defined bin boundaries. The training features of each feature category need to be binned separately. For example, if the training features comprise the registration duration and the number of target associated objects, the registration durations corresponding to the training objects in the same training sample subset are binned to obtain the feature binning set corresponding to the registration duration, and the numbers of target associated objects corresponding to those training objects are binned to obtain the feature binning set corresponding to the number of target associated objects. A feature binning set includes a plurality of feature bins.
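Equal-frequency and equal-width binning can be sketched as follows; the feature values and the number of bins are illustrative assumptions:

```python
import numpy as np

# Illustrative registration durations (in days) for one feature category.
values = np.array([1, 2, 3, 4, 10, 20, 30, 40, 100])

# Equal-frequency binning: each bin holds the same number of features.
eq_freq_edges = np.quantile(values, [0, 1/3, 2/3, 1])

# Equal-width binning: each bin spans a feature value range of the same length.
eq_width_edges = np.linspace(values.min(), values.max(), 4)

freq_bins = np.digitize(values, eq_freq_edges[1:-1])   # 3 bins of 3 values each
width_bins = np.digitize(values, eq_width_edges[1:-1])
print(freq_bins, width_bins)
```

Note how the outlier value 100 leaves most samples in the first equal-width bin, whereas equal-frequency binning keeps the bins balanced; this is why the binning method is chosen per feature category.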
Further, the computer device may divide the feature space into two regions based on the first decision tree node, with different regions corresponding to different branches of the node. For example, if a decision tree node tests whether the operation duration is greater than 9 hours, one branch covers durations longer than 9 hours and the other covers durations of 9 hours or less. The computer device may select any branch and any region in which to split again and generate a second decision tree node. When determining the second decision tree node, the computer device may update the current feature classes based on the target feature class: the feature classes other than the target feature class (i.e., the remaining feature classes) become the current feature classes; a plurality of new candidate feature classes are randomly drawn from them; the first splitting coefficients of each candidate feature class are recalculated; the second splitting coefficient of each candidate feature class is determined from its first splitting coefficients; the target feature class is determined from the candidate feature classes based on the second splitting coefficients; and a decision tree node is generated from that target feature class, yielding the second decision tree node. When the first splitting coefficients are recalculated, each training object is assigned to a region of the feature space (i.e., to a decision tree branch) according to the value of its training feature, and the feature bin sets of the remaining feature classes are updated according to this division.
For example, suppose the first decision tree node tests whether the operation duration is longer than 9 hours. Before the update, feature bin A of the registration duration contains the data of 10 training objects. After the update, for the first region (first branch), bin A contains the 6 training objects whose operation duration exceeds 9 hours; for the second region (second branch), bin A contains the 4 training objects whose operation duration is 9 hours or less. Alternatively, binning may be performed anew according to the division result.
The computer device repeats the above steps: it keeps randomly selecting feature classes from the remaining feature classes, determines a new optimal splitting feature from the randomly selected classes to generate a new decision tree node, and so on until a preset condition is satisfied, yielding a plurality of decision tree nodes. The decision tree nodes are then connected in order of generation and according to their parent-child relationships, which produces the initial decision tree. For example, the decision tree node that splits off a region A and the decision tree node that further refines region A have a parent-child relationship.
It will be appreciated that if no new decision tree node is generated under a certain branch based on the splitting coefficients, a leaf node may be attached directly under that branch. The node parameters of each decision tree node may be randomly initialized, or may be generated from the corresponding second splitting coefficient. The initial decision tree can be regarded as an initialized decision tree whose shape and node parameters will still be adjusted over multiple training iterations.
The preset condition may be at least one of: the depth of the decision tree being greater than a preset depth, or the bin range of the feature bin corresponding to a decision tree node being greater than a preset bin range.
In the above embodiment, the optimal splitting feature can be selected quickly from the training operation features based on the first and second splitting coefficients, so that decision tree nodes are generated quickly and the initial decision tree is obtained. Moreover, because the candidate feature classes are selected randomly, the initial decision trees of the different training sample subsets differ, and so do the target decision trees obtained by subsequent training; nevertheless, every target decision tree can make predictions on the data of the same object.
In one embodiment, the binning process may be implemented as follows:
For the current training sample subset, initial binning is performed on the training features of the current candidate feature class, yielding a plurality of candidate bins that are taken as the current bins. Based on the training labels of each current bin, bin merging coefficients are computed for adjacent current bins. Adjacent current bins are merged based on these coefficients, yielding a plurality of merged bins. The merged bins are then taken as the current bins, and the process returns to the step of computing bin merging coefficients for adjacent current bins, repeating until an end condition is satisfied and a plurality of feature bins for the current candidate feature class is obtained. Finally, the feature bin set of the current candidate feature class is formed from these feature bins.
Specifically, binning may start with an initial binning pass followed by repeated bin merging, gradually refining the binning result until the feature bin set is obtained. For any feature class, the computer device may perform initial binning on the training features of the current candidate feature class to obtain a plurality of candidate bins. For example, the training features of the same feature class are sorted by feature value in ascending order, several split points are chosen at random, and the value range of the training features is divided at these split points into a plurality of feature value intervals. Each interval corresponds to one candidate bin, and the training feature of each training object is placed into the candidate bin whose interval contains its feature value.
For any feature class, the computer device takes the candidate bins as the current bins and computes bin merging coefficients for adjacent current bins based on the training labels of those bins. A bin merging coefficient characterizes how similar the data distributions of adjacent feature bins are; for example, the label proportion of each kind of training label in a feature bin can be computed from the label counts in that bin, and the bin merging coefficient can be derived from the differences between the proportions of the same training label in adjacent bins. The computer device then merges adjacent current bins based on the coefficients: at least one group of adjacent feature bins with similar data distributions is merged while the other bins stay unchanged, yielding a plurality of merged bins. For example, if a smaller merging coefficient means more similar distributions, then in one round the pair of adjacent bins with the smallest coefficient may be merged into one merged bin while every other bin is carried over as a merged bin of its own. The computer device may repeat these steps with the merged bins as the current bins, computing new merging coefficients and performing another round of merging, until a second preset condition is satisfied and the feature bins of the current candidate feature class are obtained. Finally, these feature bins form the feature bin set of the current candidate feature class.
The second preset condition may be at least one of: the number of feature bins being less than or equal to a preset bin count, every bin merging coefficient being greater than a preset coefficient, and the like. For example, if the preset bin count is 10 and a round of merging reduces the number of bins from 11 to 10, merging may stop. Likewise, if after some round every merging coefficient computed from the label counts of the latest merged bins exceeds the preset coefficient, merging may stop.
In one embodiment, the bin merging coefficient may be calculated by the following (chi-square) formula:

χ² = Σ_i Σ_j (A_ij − E_ij)² / E_ij,  where E_ij = N_i × C_j

wherein, for a certain feature class, A_ij denotes the number of labels of the j-th training label in the i-th bin; E_ij denotes the expected count of A_ij; N denotes the total number of training labels; N_i denotes the number of labels in the i-th bin; and C_j = (Σ_i A_ij) / N denotes the proportion of the j-th training label among all training labels.
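A minimal sketch of this statistic for two adjacent bins, assuming the standard ChiMerge form of the chi-square test (the function name and the dict-of-counts layout are illustrative):

```python
def chi2_merge_coefficient(bin_a, bin_b):
    """Chi-square bin merging coefficient for two adjacent bins.

    bin_a / bin_b map each training label to its count in that bin,
    e.g. {"positive": 3, "negative": 7}.
    """
    labels = set(bin_a) | set(bin_b)
    n_i = [sum(bin_a.values()), sum(bin_b.values())]   # labels per bin (N_i)
    n = n_i[0] + n_i[1]                                # total labels (N)
    chi2 = 0.0
    for j in labels:
        c_j = (bin_a.get(j, 0) + bin_b.get(j, 0)) / n  # label proportion C_j
        for i, counts in enumerate((bin_a, bin_b)):
            e_ij = n_i[i] * c_j                        # expected count E_ij
            a_ij = counts.get(j, 0)                    # observed count A_ij
            if e_ij > 0:
                chi2 += (a_ij - e_ij) ** 2 / e_ij
    return chi2

# Identical label distributions give a coefficient of 0, i.e. the
# strongest candidates for merging.
print(chi2_merge_coefficient({"pos": 5, "neg": 5}, {"pos": 5, "neg": 5}))
```

A pair of bins with completely disjoint labels yields a large coefficient, so under the merging rule above it would be kept separate.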
In the above embodiment, bin merging consolidates similarly distributed data into one feature bin, so that different feature bins in the resulting feature bin set show a certain degree of difference; such a feature bin set helps improve the convergence speed of the model during training.
In one embodiment, when the feature bin set obtained by the above binning process is applied to model training and the trained model is validated on data, the resulting minor authentication proportion is 7.3% and the false positive rate is 5.9%. The minor authentication proportion is the ratio of the number of authenticated minors to the number of objects undergoing real-name authentication.
In one embodiment, the feature binning result for durations may be as shown in Table 6, where the result was obtained by cluster analysis of the operation durations of a large number of objects. Applying this binning result to model training and validating the trained model on data gives a minor authentication proportion of 6.4% and a false positive rate of 5.1%.
TABLE 6
In one embodiment, the feature binning result for durations may be as shown in Table 7, which uses equal-width bins so that the model can learn by itself the optimal split points for generating decision tree nodes. Applying this binning result to model training and validating the trained model on data gives a minor authentication proportion of 7.1% and a false positive rate of 4.7%.
TABLE 7
In one embodiment, in the current training sample subset, calculating the first splitting coefficients of the feature bins based on the feature bin sets of the candidate feature classes, to obtain a plurality of first splitting coefficients for each candidate feature class, includes:
determining a current feature bin from the feature bins of the current feature bin set, and dividing the feature bins of the set into first-class bins and second-class bins based on the current feature bin; obtaining the label proportions of the first-class and second-class bins from their respective total label counts; obtaining the label distribution coefficients of the first-class and second-class bins from the counts of each training label and the total label count within the same class of bins; and obtaining, from the label proportions and label distribution coefficients of the first-class and second-class bins, the first splitting coefficient of the candidate feature class for the current feature bin.
Specifically, when computing a first splitting coefficient, a feature bin of a feature class can be used as a candidate split point; the training features of that class are divided into two groups of data by the candidate split point, the label proportion and label distribution coefficient of each group are computed, and the results for the two groups are fused into the first splitting coefficient of that candidate split point.
The computer device may divide the feature bins of the current feature bin set into first-class bins and second-class bins based on the current feature bin. For example, the feature bins of the same candidate feature class are sorted by feature value in ascending order; the bins preceding the reference (current) feature bin form the first-class bins, while the reference bin and the bins after it form the second-class bins. Then, for each class of bins, the computer device counts the training labels of the features in that class to obtain the total label counts of the first-class and second-class bins, and computes their label proportions from these totals. For example, if the first-class bins contain 5 labels in total and the second-class bins contain 10, the label proportion of the first-class bins is 5/15 and that of the second-class bins is 10/15.
Furthermore, for each class of bins, the computer device counts the labels of each training label in that class and computes the label distribution coefficient from those counts and the class's total label count, obtaining the label distribution coefficients of the first-class and second-class bins. Finally, the first splitting coefficient of the reference feature bin is computed from the label proportions and label distribution coefficients of the two classes. For example, the label proportion and label distribution coefficient of each class are fused into an initial fusion result for that class, and the two initial fusion results are then fused into the first splitting coefficient.
The next feature bin in the current feature bin set is then taken as the current feature bin and the above steps are repeated to compute its first splitting coefficient, and so on, until the first splitting coefficient of every feature bin in the current feature bin set has been computed.
In one embodiment, the first splitting coefficient may be calculated by the following formulas:

Gini(D, A) = (|D1|/|D|) × Gini(D1) + (|D2|/|D|) × Gini(D2)
Gini(D_i) = 1 − Σ_k (p_k)²

wherein Gini(D, A) denotes the first splitting coefficient corresponding to feature bin A; D denotes the feature bin set of a feature class, which bin A divides into the first-class bins D1 and the second-class bins D2; |D1|/|D| and |D2|/|D| denote the label proportions of the first-class and second-class bins; Gini(D1) and Gini(D2) denote their label distribution coefficients; and p_k denotes the proportion of the k-th training label in D_i. For example, if the first-class bins contain 100 features of which 40 carry the first label, then the total label count of the first-class bins is 100, the count of the first label is 40, and the proportion of the first label is 40/100 = 0.4.
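A minimal Python sketch of this computation (the function names and the lists-of-labels data layout are assumptions, not the patent's implementation):

```python
from collections import Counter

def gini(labels):
    """Label distribution coefficient: Gini(D_i) = 1 - sum_k p_k^2."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def first_splitting_coefficient(d1_labels, d2_labels):
    """Gini(D, A): Gini of the two classes weighted by label proportion."""
    n = len(d1_labels) + len(d2_labels)
    return (len(d1_labels) / n) * gini(d1_labels) + \
           (len(d2_labels) / n) * gini(d2_labels)

# A candidate split point that separates the labels perfectly has
# coefficient 0, i.e. maximal distinguishing power.
print(first_splitting_coefficient(["pos"] * 4, ["neg"] * 6))  # 0.0
```

Because a smaller coefficient means a purer split, the construction procedure above keeps the bin whose coefficient is smallest.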
In the above embodiment, the first splitting coefficient can be computed quickly from the label proportions and label distribution coefficients of the first-class and second-class bins, and it characterizes how well the corresponding split point distinguishes different objects and different labels.
In one embodiment, determining the second splitting coefficient from the first splitting coefficients of the same candidate feature class, to obtain the second splitting coefficient of each candidate feature class, and determining the target feature class from the candidate feature classes based on the second splitting coefficients, includes:
taking the smallest of the first splitting coefficients of the same candidate feature class as its second splitting coefficient; and taking the candidate feature class with the smallest second splitting coefficient as the target feature class.
Specifically, the smaller the first splitting coefficient, the better the corresponding split distinguishes different objects; the computer device therefore takes the smallest first splitting coefficient of each candidate feature class as that class's second splitting coefficient. Similarly, the smaller the second splitting coefficient, the higher the feature importance, so the computer device takes the candidate feature class with the smallest second splitting coefficient as the target feature class, i.e., the optimal splitting feature.
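This min-of-min selection can be sketched as follows (the dict-of-lists input shape and the feature-class names are illustrative assumptions):

```python
def select_target_feature(first_coeffs):
    """first_coeffs maps each candidate feature class to the list of
    first splitting coefficients of its feature bins."""
    # Second splitting coefficient: the smallest first coefficient per class.
    second = {cls: min(vals) for cls, vals in first_coeffs.items()}
    # Target feature class: the class with the smallest second coefficient.
    return min(second, key=second.get)

coeffs = {"registration_duration": [0.42, 0.31, 0.38],
          "operation_duration": [0.27, 0.45],
          "associated_objects": [0.33]}
print(select_target_feature(coeffs))  # operation_duration
```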
In one embodiment, generating a decision tree node based on the target feature class includes:
taking, from the first splitting coefficients of the target feature class, the feature bin corresponding to the smallest coefficient as the target feature bin; and generating a decision tree node based on the target feature bin.
Specifically, the computer device may determine an optimal split point from the plurality of split points of the target feature class and generate a decision tree node from it. From the first splitting coefficients of the target feature class, the computer device takes the feature bin corresponding to the smallest coefficient as the target feature bin, determines the optimal split point from that bin, and generates the decision tree node. The optimal split point may be any value in the feature value range of the target feature bin, for example its median.
In the above embodiment, the feature bin with the smallest first splitting coefficient of the target feature class serves as the target feature bin, and the decision tree node is generated from it. Generating nodes from the optimal split point of the optimal splitting feature in this way speeds up model convergence.
In one embodiment, the training samples include positive training samples and negative training samples, where the operation authority corresponding to the positive training samples is lower than that corresponding to the negative training samples. As shown in fig. 3, adjusting the corresponding initial decision trees based on the training labels and initial prediction labels of the training samples in the same training sample subset, until the first convergence condition is satisfied, to obtain the target decision tree of each training sample subset, includes:
Step S302: adjusting the initial decision tree of the current training sample subset based on the training labels and initial prediction labels of its training samples until the first convergence condition is satisfied, obtaining an intermediate decision tree for the current training sample subset.
Specifically, the computer device may also optimize the training samples to further improve the training effect of the model. The initial decision tree is first trained on the original positive and negative training samples to obtain an intermediate decision tree; then the negative training samples that the intermediate decision tree predicts as positive are reassigned to the positive training samples, and training of the intermediate decision tree continues, finally yielding the target decision tree. Adding positive samples in this way mitigates the problem of sample imbalance.
Accordingly, the computer device can train the initial decision tree on the training samples of the current training sample subset, adjusting it during training based on the training labels and initial prediction labels of those samples until the first convergence condition is satisfied, which yields the intermediate decision tree of the current training sample subset.
Step S304: inputting each training sample of the current training sample subset into the intermediate decision tree to obtain the intermediate prediction label of each training sample.
Step S306: updating to positive the training labels of the negative training samples whose intermediate prediction labels are positive.
Step S308: adjusting the intermediate decision tree based on the intermediate prediction labels and the updated training labels of the training samples until the second convergence condition is satisfied, obtaining the target decision tree of the current training sample subset.
Specifically, the computer device inputs each training sample of the current training sample subset into the intermediate decision tree to obtain its intermediate prediction label, updates to positive the training labels of the negative training samples whose intermediate prediction labels are positive, and trains the intermediate decision tree on the new positive and negative training samples to obtain the target decision tree. During this training, the decision tree parameters of the intermediate decision tree are updated based on the difference between each training sample's intermediate prediction label and its updated training label, and training iterates until the second convergence condition is satisfied, yielding the target decision tree of the current training sample subset.
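The relabeling in step S306 can be sketched as follows; the (features, label) tuple layout, the label strings, and the toy threshold model are assumptions for illustration only:

```python
def expand_positive_samples(samples, predict):
    """Reassign to the positive class the negative training samples that
    the intermediate decision tree predicts as positive.

    samples: list of (features, training_label) pairs.
    predict: callable mapping features to a predicted label.
    """
    updated = []
    for features, label in samples:
        if label == "negative" and predict(features) == "positive":
            label = "positive"  # negative sample predicted positive: flip it
        updated.append((features, label))
    return updated

# Toy stand-in for the intermediate model: flag operation durations > 9 h.
predict = lambda f: "positive" if f["op_hours"] > 9 else "negative"
samples = [({"op_hours": 12}, "negative"),   # flipped to positive
           ({"op_hours": 3}, "negative"),    # stays negative
           ({"op_hours": 11}, "positive")]   # already positive
print(expand_positive_samples(samples, predict))
```

Training then continues on the updated sample set, which is how the scheme grows the positive class and counteracts sample imbalance.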
The second convergence condition may be at least one of: the difference between the intermediate prediction result and the updated training label being smaller than a preset difference, the number of iterations reaching a preset count, and the like.
In one embodiment, the performance of the model trained with this positive-sample optimization scheme was verified experimentally, with the following results:
TABLE 8
AUC value: 0.869
Accuracy: 0.690
Recall: 0.348
F1 value: 0.461
False positive rate: 8.71%
Proportion of minors in simulated delivery: 3.16%
Here the positive samples include real-name-authenticated minors and real-name-authenticated adults predicted by the intermediate decision tree to be minors, and the negative samples include real-name-authenticated adults. Compared with Table 1, the training effect in Table 8 is better: the minor authentication proportion increases by 34.8% and the false positive rate decreases by 6.5%. A real-name-authenticated adult predicted by the intermediate decision tree to be a minor may be regarded as a minor operating in the target application with an adult's registered account.
In the above embodiment, the initial decision tree is trained on the original positive and negative training samples to obtain the intermediate decision tree, the negative training samples predicted as positive by the intermediate decision tree are reassigned to the positive training samples, and training of the intermediate decision tree continues to obtain the target decision tree. A target decision tree trained in this way can identify suspicious objects that operate in the target application using an account registered by a normal object.
In one embodiment, as shown in fig. 4, an object classification method is provided. The method is described as applied to a computer device, which may be the terminal 102 or the server 104 in fig. 1. Referring to fig. 4, the object classification method includes the following steps:
Step S402: obtaining the target features corresponding to a target object, the target features being derived from the operation data of the target object in the target application.
Step S404: inputting the target features into the object classification model to obtain the target prediction label of the target object; the target prediction label is used to determine the operation authority of the target object in the target application and is derived from the prediction results of the target decision trees in the object classification model.
The training process of the object classification model includes: acquiring a plurality of training sample subsets, each obtained by randomly sampling the same training sample set; constructing a decision tree from each training sample subset to obtain the initial decision tree of each subset, where the decision tree nodes of an initial decision tree are determined from feature classes randomly selected among the feature classes of the training features contained in the training samples; inputting the training samples of each subset into the corresponding initial decision tree to obtain the initial prediction label of each training sample; adjusting each initial decision tree based on the training labels and initial prediction labels of the training samples in the same subset until the first convergence condition is satisfied, obtaining the target decision tree of each subset; and generating the object classification model from the target decision trees.
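The random sampling of subsets and the random selection of candidate feature classes described above can be sketched as follows; the subset sizes, seeds, and feature-class names are illustrative assumptions:

```python
import random

def make_training_subsets(samples, n_subsets, seed=0):
    """Bootstrap: each subset samples the same training set with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in range(len(samples))]
            for _ in range(n_subsets)]

def pick_candidate_classes(feature_classes, n_candidates, seed=0):
    """Randomly select candidate feature classes for one decision tree node."""
    rng = random.Random(seed)
    return rng.sample(feature_classes, n_candidates)

samples = list(range(100))  # stand-ins for (features, label) pairs
subsets = make_training_subsets(samples, n_subsets=3)
classes = ["registration_duration", "operation_duration",
           "associated_objects", "operation_frequency"]
print(len(subsets), len(subsets[0]))       # 3 subsets, 100 samples each
print(pick_candidate_classes(classes, 2))  # two random candidate classes
```

Because each subset and each node's candidate classes are drawn independently at random, the trees grown from them differ, which is the source of the ensemble's diversity.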
It will be appreciated that the specific training process of the object classification model is as described in the foregoing embodiments and is not repeated here. Like the training features, the target features are derived from the operation data of the target object in the target application.
Specifically, the computer device may classify the target object based on the trained object classification model to determine the operation authority of the target object in the target application. The computer device obtains the target features of the target object and inputs them into the object classification model; the target features are fed into every target decision tree, each target decision tree outputs its own prediction result through its decision processing, and the object classification model finally outputs the target prediction label of the target object based on the prediction results of all target decision trees.
In this object classification method, the target features of the target object are input into an object classification model composed of a plurality of target decision trees, and the model outputs the target prediction label based on the prediction results of the target decision trees. Classifying objects with such a model improves both the accuracy and the efficiency of classification. Because the model contains multiple target decision trees, different training sample subsets can be obtained by randomly sampling the same training sample set, and with random feature selection the different subsets yield different target decision trees; the model can therefore output a more accurate prediction label by combining the prediction results of these different trees, further improving classification accuracy.
In one embodiment, inputting the target features into the object classification model to obtain the target prediction label of the target object includes:
inputting the target features into each target decision tree to obtain the prediction result of each target decision tree; normalizing each prediction result to obtain the normalization results; and statistically analyzing the normalization results to obtain the target prediction label.
Normalization maps the prediction results into a preset range so that the prediction results of different types of objects can be distinguished.
Specifically, when the object classification model is applied, the computer device inputs the target features of the target object into the model; the features are fed into each target decision tree, each tree outputs its prediction result through its decision processing, and each result is then normalized, for example with a softmax function, to obtain the normalization results. Finally, the normalization results are statistically analyzed to obtain the target prediction label: for example, the average of the normalization results may be computed and the target prediction label derived from it, or the prediction label of each normalization result may be determined and the most frequent label type chosen as the target prediction label.
In one embodiment, the prediction results are normalized by the following formula (a softmax function):

p_i = exp(f_i(x)) / Σ_{k=1}^{K} exp(f_k(x))

wherein p_i represents the normalization result corresponding to the i-th target decision tree and may also be referred to as the prediction probability; exp represents the exponential function; f_i(x) represents the prediction result of the i-th target decision tree; f_k(x) represents the prediction result of the k-th target decision tree; and K represents the number of target decision trees.
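As a minimal sketch of this step (function names are illustrative and not from the patent), the softmax normalization and the vote-based statistical analysis can be written as:

```python
import math

def normalize_scores(scores):
    """Softmax over the K tree outputs: p_i = exp(f_i(x)) / sum_k exp(f_k(x))."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def majority_vote(labels):
    """Pick the label type predicted by the largest number of trees."""
    return max(set(labels), key=labels.count)
```

For example, `normalize_scores([0.2, 1.5, -0.3])` returns probabilities summing to 1, and `majority_vote([1, 1, 0])` returns the majority label 1.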
In the above embodiment, the target features are respectively input into each target decision tree to obtain the prediction results respectively corresponding to each target decision tree, and normalization processing is respectively performed on each prediction result to obtain each normalization result; and carrying out statistical analysis on each normalization result to obtain an accurate target prediction label.
In one embodiment, the method further comprises:
when the target prediction tag is a target tag, generating an object authentication request, and sending the object authentication request to a target terminal corresponding to the target object; acquiring authentication information returned by the target terminal according to the object authentication request, and determining an authentication result of the target object based on the authentication information; and when the authentication result is that the authentication fails, limiting the operation of the target object in the target application.
The target tag is used to indicate that the model predicts that an object belongs to the target type of object; for example, the target tag may be a positive tag. The operation authority of target type objects in the target application is limited. The object authentication request is used to verify the identity of the target object, so as to further determine whether the target object is a target type object.
Specifically, in order to improve the classification accuracy of the target object, the computer device may comprehensively determine the operation authority of the target object in the target application based on both the target prediction tag and the authentication information. If it is determined based on the object classification model that the target prediction tag corresponding to the target object is not the target tag, the target object can be considered unlikely to belong to the target type of object, and the operation of the target object in the target application need not be limited, with no further processing required. If it is determined based on the object classification model that the target prediction tag is the target tag, the target object can be considered likely to belong to the target type of object; to further determine whether this is so, the computer device may generate an object authentication request and send it to the target terminal corresponding to the target object. The target terminal may collect personal information of the target object as authentication information according to the object authentication request, for example at least one of face information, certificate information and communication information, and feed the authentication information back to the computer device. The computer device may determine the identity information of the target object based on the authentication information to obtain the authentication result of the target object, for example by matching the authentication information against the registration information.
If the authentication information indicates that the target object is not a target type object, the authentication result is determined to be passed; if the authentication information indicates that the target object is a target type object, the authentication result is determined to be failed. Therefore, when the authentication fails, the target object is determined to be a target type object and its operation in the target application needs to be limited, for example by limiting the operation duration of the target object in the target application, limiting the virtual resource transfer share of the target object in the target application (for example, spending or recharging of game coins), and so on.
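The two-stage gate described above (model prediction first, identity authentication second) can be sketched as follows; `authenticate` is a hypothetical callable, not named in the patent, returning True when real-name authentication passes:

```python
def decide_restriction(predicted_is_target, authenticate):
    """Two-stage gate: the model prediction screens first, and identity
    authentication confirms. `authenticate` is a hypothetical callable
    returning True when real-name authentication passes."""
    if not predicted_is_target:
        return "no_restriction"  # low likelihood of target type: no further processing
    if authenticate():
        return "no_restriction"  # authentication passed: not a target type object
    return "restricted"          # authentication failed: limit operations in the app
```

This mirrors the resource saving noted below: the (more expensive) authentication step runs only for objects the model flags.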
In one embodiment, the authentication information may be face information, and real-name authentication is performed on the target object based on the face information to determine whether the target object is a minor. If the face information matches the real-name identity information in the account registration information, the authentication result is determined to be passed. If the face information does not match the real-name identity information in the account registration information, the authentication result is determined to be failed; the target object can then be considered a minor who is using an adult's account in the target application, and the minor's operation in the target application needs to be limited.
In one embodiment, the prediction label may be represented by a probability. If the predicted probability is greater than a preset probability, the prediction label is determined to be the target tag. The preset probability can be set according to actual needs, for example to 0.5.
In the above embodiment, whether the target object is the target type object is primarily determined based on the object classification model, if the primary determination result indicates that the target object is not the target type object, the subsequent data processing is not needed, so that resources are effectively saved, and if the primary determination result indicates that the target object is the target type object, whether the target object is the target type object is further determined according to the authentication information of the target object. If the authentication result shows that the target object is the target type object again, the target object is finally determined to be the target type object, and further the operation of the target object in the target application is limited. The classification accuracy of the object can be further improved through the dual operations of model prediction and information authentication.
In a specific embodiment, the object classification method of the present application may be applied to gaming applications. Under the nationally issued minor-protection policy, the operation of minors in gaming applications is limited; for example, minors may play for only one hour per day, from 20:00 to 21:00, on Fridays, Saturdays, Sundays and legal holidays, and may not play at any other time. Referring to fig. 5, in the object classification method of the present application, whether a game player playing in a specific period of time (e.g., a holiday) is a suspected minor is identified by a high-risk model (i.e., the object classification model). If the identification result indicates that the game player is not a suspected minor, the player's game play is not restricted. If the identification result indicates that the game player is a suspected minor, a face pop-up is shown to the suspected minor during a non-game period, and real-name authentication is performed on the information acquired through the face pop-up. If the real-name authentication passes, the game player is considered an adult and game play is not limited; if it fails, the game player is considered a minor and game play is limited.
Referring to fig. 6, an integrated model for finding suspected underage players is trained based on positive and negative samples and player characteristics corresponding to each sample, and the resulting trained integrated model may be referred to as a high-risk model. And inputting the player characteristics of the player to be identified into a high-risk model, wherein the high-risk model predicts the probability that the player to be identified belongs to the suspicious underage player, and if the predicted probability is greater than or equal to 0.5, determining that the player to be identified is the suspicious underage player.
Training data of the model: the player characteristics are derived from game report logs, and specifically include basic attribute data (i.e., player portrait attributes), game-play data, duration data, suspicious device data, underage friend data, and registration and game class data. The basic attribute data includes the age, sex, terminal, device model, network, regional classification (first-tier, second-tier, third-tier city, etc.) and operating system before real-name authentication, etc. The game-play data comprises the average amount of play on working days, the average amount of play on weekends and holidays, the amount of play within working-day time periods, the amount of play within weekend and holiday time periods, and the daily morning/noon/afternoon/evening/night/late-night proportions, etc. The duration data comprises the average play duration on working days, the average play duration on weekends and holidays, the play duration within working-day time periods, the play duration within weekend and holiday time periods, and the daily morning/noon/afternoon/evening/night/late-night duration proportions, etc. The suspicious device data includes the number of devices historically logged in, the number of natural persons historically logged in, the number of accounts historically logged in on the logged-in devices, the number of accounts historically tied to the natural persons, whether a suspicious device (a device with an underage login) has been logged in to, and so on. The underage friend data includes the number of underage friends, the proportion of underage friends, the sum of intimacy with underage friends, etc. The registration and game class data comprises the account age, the earliest registration year, the number of active days and active duration in PC (client) games, and the number of active days and active duration in mobile games, etc.
Referring to FIG. 7, features may be mined from various dimensions from which player features for model training are determined.
Training process of the model: referring to fig. 8, during training, a training sample set is acquired and randomly sampled to generate a plurality of training sample subsets D1, D2, …, Di. Classification training is performed based on each training sample subset to obtain the target decision tree corresponding to each subset; the target decision trees may also be called classifiers C1, C2, …, Ci. The target decision trees together constitute the object classification model, which may also be referred to as a strong classifier. When applied, the target feature corresponding to the target object is input into the object classification model, the object classification model votes on the prediction results of all the target decision trees, and the prediction result with the most votes is used as the target prediction label.
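The sampling and voting steps above can be sketched as follows. This is a sketch, not the patent's implementation: the classifiers are represented as plain callables, standing in for the decision trees trained on each subset.

```python
import random

def bootstrap_subsets(samples, n_subsets, seed=34):
    """Randomly sample (with replacement) the same training set to build D1..Di."""
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in samples] for _ in range(n_subsets)]

def forest_predict(classifiers, x):
    """Vote the predictions of the classifiers C1..Ci; the majority label wins."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)
```

Each subset has the same size as the original set but, because sampling is with replacement, a different composition, which is what makes the trained trees differ.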
Expression of the high-risk model (majority vote over the trees):

H(x) = argmax_Y Σ_{m=1}^{M} I(f_m(x) = Y)

wherein x represents the feature variable input into the model, i.e. a training sample; f_m(x) represents the m-th target decision tree, i.e. the model function calculated for x; M represents the number of decision trees; and I(·) is the indicator function, which counts the trees voting for label Y.
Various important parameters adopted in model training are as follows. Random number seed (the starting point must be specified when the algorithm performs multiple iterations): 34. Number of decision trees: 100. Purity measure (indicating the likelihood that a randomly selected sample would be correctly split in the model): Gini coefficient. Maximum tree depth (the distance between a leaf node and the root node; the maximum tree depth is the critical point at which decision tree iteration stops — the decision tree stops splitting when its depth reaches the maximum tree depth): 8. Feature maximum bin count: 32. Validation set scale (when constructing the model, the data set is divided into a training set and a validation set; the training set data is used to construct the model, and the validation set data is used to check the accuracy of the model): 20%.
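For illustration, these parameters map onto a scikit-learn random forest as below. The patent names no library, so this mapping is an assumption; note that the feature maximum bin count (32) has no direct `RandomForestClassifier` equivalent (it corresponds to histogram-based learners, e.g. LightGBM's `max_bin`).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical scikit-learn equivalent of the patent's training parameters.
clf = RandomForestClassifier(
    n_estimators=100,   # number of decision trees: 100
    criterion="gini",   # purity measure: Gini coefficient
    max_depth=8,        # maximum tree depth: 8 (stop splitting at this depth)
    random_state=34,    # random number seed: 34
)

# Validation set scale 20%: split before fitting, e.g.
# X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
```

`clf.fit(X_train, y_train)` would then train the forest, and the 20% validation split checks its accuracy.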
Model training samples: a positive sample may be at least one of a real-name-authenticated minor and a real-name-authenticated adult predicted to be a minor by the intermediate decision tree; a negative sample may be at least one of a real-name-authenticated adult, an adult marked as a parent on the growth-guardian platform, a player active in the last month, a player active in PC (client) games, and a player with an earlier registration year.
The validation effect of the high-risk model on the validation set is shown in Table 9.

TABLE 9
AUC value: 0.932
Accuracy: 0.741
Recall: 0.628
F1 value: 0.620
False positive rate: 4.71%
Proportion of minors in simulated delivery: 10.67%
Further, the effects of the high-risk model after it was deployed online are shown in Table 10.
Table 10
Referring to Table 10, among the objects targeted by the high-risk model from December 1 to December 7, the daily proportion of face pop-ups resulting in underage authentication was 7.6%, which is 6.61 times that of the overall player base, and the pop-up interception rate was 1.14 times that of the overall player base. Here, pop-up interception rate = (number of pop-up objects that did not pass the face check / number of pop-up objects) × 100%.
In the above embodiment, features such as basic attribute data, game-play data, duration data, suspicious device data, underage friend data, and registration and game class data are combined, modeling is performed using an integrated (ensemble) model algorithm, and the model training parameters and training samples are optimized, so that the integrated model is obtained through training. The integrated model can accurately and efficiently identify which game players are underage, so that underage players can be limited from playing games and underage addiction prevented.
It will be appreciated that the object classification method of the present application may also be applied in other applications, such as video applications, e-commerce applications, etc.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in execution order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; the order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps, sub-steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an object classification device for realizing the above related object classification method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the object classification device or object classification devices provided below may be referred to the limitation of the object classification method hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 9, there is provided an object classification apparatus 900 comprising: a training set acquisition module 902, an initial decision tree construction module 904, a decision tree prediction module 906, a target decision tree generation module 908, and a model generation module 910, wherein:
a training set acquisition module 902, configured to acquire a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set, the training sample subset comprises training samples and training labels corresponding to the training samples, the training samples are obtained based on operation data of a training object in a target application, and the training labels are used for determining operation authority of the training object in the target application.
An initial decision tree construction module 904, configured to construct a decision tree based on the training sample subsets, to obtain initial decision trees corresponding to the training sample subsets respectively; the decision tree nodes of the initial decision tree are determined from feature categories corresponding to each training feature contained in the training sample based on randomly selected feature categories.
The decision tree prediction module 906 is configured to input training samples in the training sample subset into corresponding initial decision trees, and obtain initial prediction labels corresponding to the training samples.
The target decision tree generating module 908 is configured to adjust the corresponding initial decision tree based on the training label and the initial prediction label corresponding to each training sample in the same training sample subset until the first convergence condition is satisfied, thereby obtaining target decision trees corresponding to each training sample subset.
A model generation module 910 for generating an object classification model based on each of the target decision trees; the object classification model is used for inputting the target characteristics corresponding to the target objects into each target decision tree, and obtaining target prediction labels corresponding to the target objects based on the prediction results of each target decision tree.
According to the object classification device, the object classification model is obtained through training based on the training sample set, and the object is classified based on the object classification model, so that the accuracy and the efficiency of object classification can be improved. And the object classification model comprises a plurality of target decision trees, different training sample subsets can be obtained by randomly sampling the same training sample set, different target decision trees can be obtained by training based on the different training sample subsets through random feature selection, and the object classification model can output more accurate prediction labels based on the prediction results of the different target decision trees, so that the classification accuracy of the objects is further improved.
In one embodiment, the training samples include positive training samples and negative training samples, and the operation authority corresponding to a positive training sample is smaller than the operation authority corresponding to a negative training sample. The training object corresponding to a negative training sample includes at least one of a current-time active object, a target-time-state object, a target-platform active object and a target-time registration object. The current-time active object refers to an object whose activity in the current time period in the target application is greater than a first preset threshold; the target-time-state object refers to an object whose time state is the target state; the target-platform active object refers to an object whose activity on the target operating platform of the target application is greater than a second preset threshold; and the target-time registration object refers to an object whose account registration time in the target application is earlier than the target time.
In one embodiment, the training features include at least one of object attribute features, operation interaction features, operation duration features, device login features, registration time features, running platform features, and target association features between the training object and target association objects, where the target association objects are association objects having target operation rights in respective association objects of the training object.
In one embodiment, the training set obtaining module is further configured to perform feature extraction on operation data of the current training object in the target application, to obtain a plurality of initial operation features; determining a plurality of target operating characteristics from the respective initial operating characteristics; performing feature intersection on each target operation feature to obtain an intersection operation feature; and obtaining a training sample corresponding to the current training object based on the initial operation characteristic and the cross operation characteristic.
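The feature-crossing step can be sketched as below. The patent does not specify the crossing operation, so the pairwise product used here is one common, illustrative choice; the function names are hypothetical.

```python
from itertools import combinations

def cross_features(target_features):
    """Pairwise feature crossing over the selected target operation features.
    The product is an illustrative crossing operation, not from the patent."""
    crossed = {}
    for (name_a, a), (name_b, b) in combinations(sorted(target_features.items()), 2):
        crossed[f"{name_a}_x_{name_b}"] = a * b
    return crossed

def build_sample(initial_features, crossed_features):
    """Training sample = initial operation features plus cross operation features."""
    return {**initial_features, **crossed_features}
```

Crossed features let a linear split in a single tree node capture interactions, such as long play duration combined with a high proportion of underage friends.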
In one embodiment, the initial decision tree construction module is further configured to randomly determine a plurality of candidate feature categories from each of the current feature categories; in the current training sample subset, calculating first splitting coefficients corresponding to feature boxes based on feature box sets corresponding to candidate feature categories, and obtaining a plurality of first splitting coefficients respectively corresponding to each candidate feature category; determining a second splitting coefficient based on each first splitting coefficient corresponding to the same candidate feature class, obtaining second splitting coefficients corresponding to each candidate feature class, and determining a target feature class from each candidate feature class based on the second splitting coefficients; generating decision tree nodes based on the target feature class; updating the current feature category based on the target feature category, and returning to execute the step of randomly determining a plurality of candidate feature categories from each current feature category until a preset condition is met, so as to obtain a plurality of decision tree nodes; an initial decision tree is generated based on the individual decision tree nodes.
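One plausible reading of the first splitting coefficient, assuming the Gini coefficient named in the training parameters, is the weighted impurity of splitting the feature bins into the first and second class of bins: the "label proportion" weights each side's "label distribution coefficient" (its Gini impurity). A sketch:

```python
def gini(labels):
    """Gini impurity of a label set (the 'label distribution coefficient')."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def first_split_coefficient(left_labels, right_labels):
    """Weighted Gini of a candidate split of the feature bins into two classes:
    each side's impurity weighted by its share of the labels."""
    total = len(left_labels) + len(right_labels)
    return (len(left_labels) / total * gini(left_labels)
            + len(right_labels) / total * gini(right_labels))
```

The candidate feature category (and feature bin) with the smallest such coefficient is then preferred, matching the minimum-value selection described below.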
In one embodiment, the initial decision tree construction module is further configured to determine a current feature sub-box from the feature sub-boxes corresponding to the current feature sub-box set, and divide the feature sub-boxes corresponding to the current feature sub-box set into a first class sub-box and a second class sub-box based on the current feature sub-box; obtaining the label proportion corresponding to the first type sub-box and the second type sub-box based on the total number of labels corresponding to the first type sub-box and the second type sub-box respectively; based on the number of the labels and the total number of the labels corresponding to the training labels in the same class of sub-boxes, obtaining the label distribution coefficients respectively corresponding to the first class sub-boxes and the second class sub-boxes; and obtaining a first splitting coefficient corresponding to the candidate feature class corresponding to the current feature sub-box based on the label proportion and the label distribution coefficient corresponding to the first class sub-box and the second class sub-box.
In one embodiment, the initial decision tree construction module is further configured to obtain, from each first split coefficient corresponding to the same candidate feature class, a first split coefficient with a minimum value as a second split coefficient; and acquiring a candidate feature class corresponding to the second splitting coefficient with the smallest value as a target feature class.
In one embodiment, the initial decision tree construction module is further configured to obtain, from each first split coefficient corresponding to the target feature class, a feature bin corresponding to the first split coefficient with the smallest value as the target feature bin; decision tree nodes are generated based on target feature binning.
In one embodiment, the training samples include a positive training sample and a negative training sample, the positive training sample corresponding to an operation authority that is less than an operation authority corresponding to the negative training sample. The model generation module is further used for adjusting an initial decision tree corresponding to the current training sample subset based on the training labels and the initial prediction labels corresponding to the training samples of the current training sample subset until a first convergence condition is met, and obtaining an intermediate decision tree corresponding to the current training sample subset; inputting each training sample in the current training sample subset into an intermediate decision tree to obtain an intermediate prediction label corresponding to each training sample; updating the training label corresponding to the negative training sample with the middle predictive label being the positive label to be the positive label; and adjusting the intermediate decision tree based on the intermediate prediction labels corresponding to the training samples and the updated training labels until the second convergence condition is met, so as to obtain a target decision tree corresponding to the current training sample subset.
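The label-update step between the intermediate and target decision trees can be sketched as follows; `intermediate_predict` is a hypothetical stand-in for the converged intermediate decision tree:

```python
def relabel_negatives(samples, intermediate_predict, positive=1, negative=0):
    """After the intermediate decision tree converges, flip the training label of
    any negative training sample that the intermediate tree predicts as positive;
    training then continues against the updated labels."""
    updated = []
    for features, label in samples:
        if label == negative and intermediate_predict(features) == positive:
            label = positive
        updated.append((features, label))
    return updated
```

This lets the second training stage treat negatives that behave like positives (e.g. adults whose accounts are suspected of underage use) as positives.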
In one embodiment, as shown in fig. 10, there is provided an object classification apparatus 1000 comprising: a data acquisition module 1002 and a tag prediction module 1004, wherein:
the data obtaining module 1002 is configured to obtain a target feature corresponding to the target object, where the target feature is obtained based on operation data of the target object in the target application.
The tag prediction module 1004 is configured to input the target feature into the object classification model to obtain a target prediction tag corresponding to the target object; the target prediction label is used for determining the operation authority of the target object in the target application, and the target prediction label is obtained based on the prediction results of all target decision trees in the object classification model.
The training process of the object classification model comprises the following steps:
acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set; constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of the initial decision tree are determined based on randomly selected feature categories from feature categories corresponding to each training feature contained in the training sample; inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples; based on training labels and initial prediction labels corresponding to all training samples in the same training sample subset, adjusting corresponding initial decision trees until a first convergence condition is met, and obtaining target decision trees corresponding to all training sample subsets respectively; an object classification model is generated based on each of the target decision trees.
According to the object classification device, the object is classified based on the object classification model, so that the accuracy and the efficiency of object classification can be improved. And the object classification model comprises a plurality of target decision trees, different training sample subsets can be obtained by randomly sampling the same training sample set, different target decision trees can be obtained by training based on the different training sample subsets through random feature selection, and the object classification model can output more accurate prediction labels based on the prediction results of the different target decision trees, so that the classification accuracy of the objects is further improved.
In one embodiment, the tag prediction module is further configured to input the target features into each target decision tree, so as to obtain prediction results corresponding to each target decision tree; respectively carrying out normalization processing on each prediction result to obtain each normalization result; and carrying out statistical analysis on each normalization result to obtain a target prediction label.
In one embodiment, as shown in fig. 11, the object classification apparatus 1000 further includes:
the object authentication model 1006 is configured to generate an object authentication request when the target prediction tag is a target tag, and send the object authentication request to a target terminal corresponding to the target object; acquiring authentication information returned by the target terminal according to the object authentication request, and determining an authentication result of the target object based on the authentication information; and when the authentication result is that the authentication fails, limiting the operation of the target object in the target application.
The respective modules in the above-described object classification apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 12. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing training samples, object classification models and other data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an object classification method.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 13. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an object classification method. The display unit of the computer equipment is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device, wherein the display screen can be a liquid crystal display screen or an electronic ink display screen, the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on a shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by persons skilled in the art that the structures shown in fig. 12 and fig. 13 are merely block diagrams of portions of the structures associated with aspects of the present application and do not limit the computer devices to which aspects of the present application may be applied; a particular computer device may include more or fewer components than those shown, combine some of the components, or have a different arrangement of components.
In one embodiment, there is also provided a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the method embodiments described above.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above-described method embodiments.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-volatile computer-readable storage medium; when executed, the computer program may perform the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The foregoing embodiments illustrate only a few implementations of the present application and are described in relative detail, but they are not to be construed as limiting the scope of the patent application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be determined by the appended claims.

Claims (17)

1. An object classification method, the method comprising:
acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set, the training sample subset comprises training samples and training labels corresponding to the training samples, the training samples are obtained based on operation data of a training object in a target application, and the training labels are used for determining the operation authority of the training object in the target application;
constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of the initial decision tree are determined from feature categories corresponding to all training features contained in the training sample based on randomly selected feature categories;
inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples;
based on training labels and initial prediction labels corresponding to all training samples in the same training sample subset, adjusting corresponding initial decision trees until a first convergence condition is met, and obtaining target decision trees corresponding to all training sample subsets respectively;
generating an object classification model based on each target decision tree; the object classification model is used for inputting target features corresponding to target objects into each target decision tree, and obtaining target prediction labels corresponding to the target objects based on prediction results of each target decision tree.
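Outside the claim language, the ensemble scheme recited in claim 1 — several training sample subsets drawn by randomly sampling (with replacement) the same training set, one decision tree per subset, and a final label derived from all trees' predictions — can be sketched as follows. This is a minimal illustration, not the patented implementation; the helper names, the use of arbitrary callables as stand-in "trees", and the majority-vote rule are assumptions.

```python
import random
from collections import Counter

def bootstrap_subsets(samples, n_subsets, subset_size, seed=0):
    """Draw several training sample subsets by randomly sampling the
    same training sample set with replacement (bootstrap sampling)."""
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in range(subset_size)]
            for _ in range(n_subsets)]

def ensemble_predict(trees, features):
    """Feed the target features to every target decision tree and derive
    the target prediction label from the trees' results (majority vote)."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]
```

Each subset would then be used to build and adjust one tree until the convergence condition is met; the trained trees together form the object classification model.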
2. The method of claim 1, wherein the training samples comprise a positive training sample and a negative training sample, and the operation authority corresponding to the positive training sample is smaller than the operation authority corresponding to the negative training sample; the training object corresponding to the negative training sample comprises at least one of a current-time active object, a target time-state object, a target-platform active object, and a target-time registration object, wherein the current-time active object refers to an object whose activity degree in the target application is greater than a first preset threshold in a current time period; the target time-state object refers to an object whose time state is a target state; the target-platform active object refers to an object whose activity degree in the target application on a target operation platform is greater than a second preset threshold; and the target-time registration object refers to an object whose account registration time in the target application is earlier than the target time.
3. The method of claim 1, wherein the training features comprise at least one of object attribute features, operational interaction features, operational duration features, device login features, registration time features, run platform features, target association features between the training object and target associated objects, the target associated objects being associated objects having target operational rights among respective associated objects of the training object.
4. The method according to claim 1, wherein the training sample acquisition process comprises the steps of:
extracting characteristics of operation data of the current training object in the target application to obtain a plurality of initial operation characteristics;
determining a plurality of target operating characteristics from the respective initial operating characteristics;
performing feature intersection on each target operation feature to obtain an intersection operation feature;
and obtaining a training sample corresponding to the current training object based on the initial operation characteristic and the cross operation characteristic.
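The feature-crossing step of claim 4 can be illustrated with a toy sketch. The pairwise-product crossing function, the dictionary representation, and the naming convention are all assumptions; the claim does not fix a particular crossing operation.

```python
def cross_features(features):
    """Feature crossing: combine each pair of target operation features
    into a crossed feature. A simple numeric product is used here as an
    illustrative crossing function (an assumption)."""
    crossed = {}
    names = sorted(features)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            crossed[f"{a}_x_{b}"] = features[a] * features[b]
    return crossed
```

The training sample would then concatenate the initial operation features with these crossed features.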
5. The method according to claim 1, wherein constructing a decision tree based on the training sample subsets, to obtain initial decision trees respectively corresponding to the training sample subsets, comprises:
randomly determining a plurality of candidate feature classes from each current feature class;
in the current training sample subset, calculating first splitting coefficients corresponding to feature boxes based on feature box sets corresponding to candidate feature categories, and obtaining a plurality of first splitting coefficients respectively corresponding to each candidate feature category;
determining a second splitting coefficient based on each first splitting coefficient corresponding to the same candidate feature class, obtaining second splitting coefficients corresponding to each candidate feature class, and determining a target feature class from each candidate feature class based on the second splitting coefficient;
generating decision tree nodes based on the target feature categories;
updating the current feature category based on the target feature category, returning to the step of randomly determining a plurality of candidate feature categories from each current feature category, and executing until a preset condition is met, so as to obtain a plurality of decision tree nodes;
the initial decision tree is generated based on the individual decision tree nodes.
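One node-construction step of claim 5 — randomly drawing a handful of candidate feature categories and keeping the one with the best split score — might look like the sketch below. The function names, the score callback standing in for the second splitting coefficient, and the min-is-best convention are assumptions.

```python
import random

def choose_split_feature(current_features, split_score, k, seed=0):
    """Randomly draw up to k candidate feature categories from the current
    feature categories, score each with `split_score` (a stand-in for the
    second splitting coefficient), and keep the lowest scorer as the
    target feature category for the new decision tree node."""
    rng = random.Random(seed)
    candidates = rng.sample(sorted(current_features),
                            min(k, len(current_features)))
    return min(candidates, key=split_score)
```

Repeating this step while removing each chosen category from the current set yields the successive decision tree nodes of the initial tree.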
6. The method of claim 5, wherein in the current training sample subset, calculating the first split coefficients corresponding to the feature bins based on the feature bin set corresponding to the candidate feature classes, to obtain a plurality of first split coefficients corresponding to the candidate feature classes, respectively, includes:
determining a current feature sub-box from all feature sub-boxes corresponding to a current feature sub-box set, and dividing all feature sub-boxes corresponding to the current feature sub-box set into a first class sub-box and a second class sub-box based on the current feature sub-box;
based on the total number of the labels respectively corresponding to the first type of sub-boxes and the second type of sub-boxes, obtaining the label proportion respectively corresponding to the first type of sub-boxes and the second type of sub-boxes;
obtaining label distribution coefficients respectively corresponding to the first class sub-boxes and the second class sub-boxes based on the number of labels and the total number of labels corresponding to various training labels in the same class sub-boxes;
and obtaining a first splitting coefficient corresponding to the candidate feature class corresponding to the current feature sub-box based on the label proportion and the label distribution coefficient corresponding to the first class sub-box and the second class sub-box.
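The two-group split scoring of claim 6 resembles the classic weighted impurity used in CART-style trees: each group's "label distribution coefficient" measures how mixed its labels are, and the "label proportion" weights the two groups. The sketch below assumes Gini impurity as the distribution coefficient, which the claim does not name explicitly.

```python
def gini(label_counts):
    """Label distribution coefficient of one bin group (Gini impurity
    here, an assumption): 0 when the group is pure, larger when mixed."""
    total = sum(label_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in label_counts)

def split_coefficient(group_a, group_b):
    """First splitting coefficient of a two-way bin split: each group's
    impurity weighted by its share of the total label count."""
    n_a, n_b = sum(group_a), sum(group_b)
    n = n_a + n_b
    return (n_a / n) * gini(group_a) + (n_b / n) * gini(group_b)
```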
7. The method of claim 5, wherein determining the second split coefficients based on the first split coefficients corresponding to the same candidate feature class, to obtain the second split coefficients corresponding to the candidate feature classes, and determining the target feature class from the candidate feature classes based on the second split coefficients, comprises:
acquiring a first splitting coefficient with the smallest value from each first splitting coefficient corresponding to the same candidate feature class as a second splitting coefficient;
and acquiring a candidate feature class corresponding to the second splitting coefficient with the minimum value as the target feature class.
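Claim 7's two-level minimum — the per-class minimum over first splitting coefficients gives the second splitting coefficient, and the global minimum over classes gives the target feature class — reduces to a few lines. The dictionary layout (class name mapped to its per-bin coefficients) is an assumption.

```python
def select_target_feature(first_coeffs):
    """first_coeffs: candidate feature class -> list of first splitting
    coefficients (one per feature bin). Each class's second splitting
    coefficient is its minimum; the target class minimizes that minimum."""
    second = {cls: min(vals) for cls, vals in first_coeffs.items()}
    return min(second, key=second.get)
```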
8. The method of claim 5, wherein generating decision tree nodes based on the target feature class comprises:
acquiring a feature box corresponding to the first splitting coefficient with the smallest value from each first splitting coefficient corresponding to the target feature class as a target feature box;
and generating decision tree nodes based on the target characteristic bin.
9. The method according to any one of claims 1 to 8, wherein the training samples comprise a positive training sample and a negative training sample, and the operation authority corresponding to the positive training sample is smaller than the operation authority corresponding to the negative training sample;
the step of adjusting the corresponding initial decision tree based on the training label and the initial prediction label corresponding to each training sample in the same training sample subset until the first convergence condition is satisfied, to obtain the target decision tree corresponding to each training sample subset, includes:
based on training labels and initial prediction labels corresponding to all training samples of a current training sample subset, adjusting an initial decision tree corresponding to the current training sample subset until a first convergence condition is met, and obtaining an intermediate decision tree corresponding to the current training sample subset;
inputting each training sample in the current training sample subset into the intermediate decision tree to obtain an intermediate prediction label corresponding to each training sample;
updating the training label corresponding to the negative training sample with the middle predictive label being the positive label to be the positive label;
and adjusting the intermediate decision tree based on the intermediate prediction labels corresponding to the training samples and the updated training labels until a second convergence condition is met, so as to obtain a target decision tree corresponding to the current training sample subset.
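The relabeling step in claim 9 flips the training label of any negative sample whose intermediate prediction is positive, before the intermediate tree is adjusted again. A minimal sketch, with a 0/1 label encoding assumed (0 = negative, 1 = positive):

```python
def update_negative_labels(labels, predictions, positive=1, negative=0):
    """Flip to positive the training label of every negative training
    sample whose intermediate prediction label is positive; all other
    labels (including all positives) are left unchanged."""
    return [positive if (y == negative and p == positive) else y
            for y, p in zip(labels, predictions)]
```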
10. An object classification method, the method comprising:
acquiring target characteristics corresponding to a target object, wherein the target characteristics are obtained based on operation data of the target object in a target application;
inputting the target characteristics into an object classification model to obtain target prediction labels corresponding to the target objects; the target prediction label is used for determining the operation authority of the target object in the target application, and the target prediction label is obtained based on the prediction result of each target decision tree in the object classification model;
The training process of the object classification model comprises the following steps:
acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set;
constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of the initial decision tree are determined from feature categories corresponding to all training features contained in the training sample based on randomly selected feature categories;
inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples;
based on training labels and initial prediction labels corresponding to all training samples in the same training sample subset, adjusting corresponding initial decision trees until a first convergence condition is met, and obtaining target decision trees corresponding to all training sample subsets respectively;
an object classification model is generated based on each of the target decision trees.
11. The method of claim 10, wherein inputting the target feature into an object classification model to obtain a target prediction tag corresponding to the target object comprises:
inputting the target features into each target decision tree to obtain prediction results respectively corresponding to each target decision tree;
respectively carrying out normalization processing on each prediction result to obtain each normalization result;
and carrying out statistical analysis on each normalization result to obtain the target prediction label.
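Claim 11's aggregation — normalize each tree's prediction result, then statistically analyze the normalized results — admits several readings. The sketch below assumes min-max normalization across the trees' raw scores, an average as the statistic, and a 0.5 decision threshold; all three choices are assumptions.

```python
def aggregate_predictions(raw_scores, threshold=0.5):
    """Min-max normalize each target decision tree's raw prediction
    result, average the normalized results, and threshold the average
    into a 0/1 target prediction label."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        normalized = [1.0] * len(raw_scores)  # degenerate: all trees agree
    else:
        normalized = [(s - lo) / (hi - lo) for s in raw_scores]
    mean = sum(normalized) / len(normalized)
    return 1 if mean >= threshold else 0
```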
12. The method according to claim 10, wherein the method further comprises:
when the target prediction tag is a target tag, generating an object authentication request, and sending the object authentication request to a target terminal corresponding to the target object;
acquiring authentication information returned by the target terminal according to the object authentication request, and determining an authentication result of the target object based on the authentication information;
and when the authentication result is that the authentication is not passed, limiting the operation of the target object in the target application.
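The control flow of claim 12 — trigger authentication when the predicted tag is the target tag, and restrict the object's operation in the target application only if authentication fails — can be sketched with hypothetical callbacks. `request_auth` and `restrict`, and the string return values, are illustrative assumptions.

```python
def handle_prediction(tag, request_auth, restrict, target_tag="suspect"):
    """When the target prediction tag equals the target tag, request
    authentication from the target object's terminal; if authentication
    does not pass, restrict the object's operation in the application."""
    if tag != target_tag:
        return "allowed"          # no authentication needed
    if request_auth():            # authentication passed
        return "allowed"
    restrict()                    # authentication failed
    return "restricted"
```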
13. An object classification apparatus, the apparatus comprising:
the training set acquisition module is used for acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set, the training sample subset comprises training samples and training labels corresponding to the training samples, the training samples are obtained based on operation data of a training object in a target application, and the training labels are used for determining the operation authority of the training object in the target application;
the initial decision tree construction module is used for constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of the initial decision tree are determined from feature categories corresponding to all training features contained in the training sample based on randomly selected feature categories;
the decision tree prediction module is used for inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples;
the target decision tree generation module is used for adjusting the corresponding initial decision tree based on the training label and the initial prediction label corresponding to each training sample in the same training sample subset until the first convergence condition is met, so as to obtain target decision trees corresponding to each training sample subset respectively;
the model generation module is used for generating an object classification model based on each target decision tree; the object classification model is used for inputting target features corresponding to target objects into each target decision tree, and obtaining target prediction labels corresponding to the target objects based on prediction results of each target decision tree.
14. An object classification apparatus, the apparatus comprising:
the data acquisition module is used for acquiring target characteristics corresponding to a target object, wherein the target characteristics are obtained based on operation data of the target object in a target application;
the label prediction module is used for inputting the target characteristics into an object classification model to obtain target prediction labels corresponding to the target objects; the target prediction label is used for determining the operation authority of the target object in the target application, and the target prediction label is obtained based on the prediction result of each target decision tree in the object classification model;
the training process of the object classification model comprises the following steps:
acquiring a plurality of training sample subsets; each training sample subset is obtained by randomly sampling the same training sample set; constructing decision trees based on the training sample subsets to obtain initial decision trees respectively corresponding to the training sample subsets; the decision tree nodes of the initial decision tree are determined from feature categories corresponding to all training features contained in the training sample based on randomly selected feature categories; inputting training samples in the training sample subset into corresponding initial decision trees to obtain initial prediction labels corresponding to the training samples; based on training labels and initial prediction labels corresponding to all training samples in the same training sample subset, adjusting corresponding initial decision trees until a first convergence condition is met, and obtaining target decision trees corresponding to all training sample subsets respectively; an object classification model is generated based on each of the target decision trees.
15. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 9 or 10 to 12 when the computer program is executed.
16. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 9 or 10 to 12.
17. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 9 or 10 to 12.
CN202210232245.1A 2022-03-09 2022-03-09 Object classification method, device, computer equipment and storage medium Pending CN116796264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210232245.1A CN116796264A (en) 2022-03-09 2022-03-09 Object classification method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210232245.1A CN116796264A (en) 2022-03-09 2022-03-09 Object classification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116796264A true CN116796264A (en) 2023-09-22

Family

ID=88045425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210232245.1A Pending CN116796264A (en) 2022-03-09 2022-03-09 Object classification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116796264A (en)

Similar Documents

Publication Publication Date Title
CN110084377B (en) Method and device for constructing decision tree
CN110705683B (en) Random forest model construction method and device, electronic equipment and storage medium
CN110855648B (en) Early warning control method and device for network attack
CN113034145B (en) Method and device for judging transaction category of user abnormal encrypted digital asset
CN112749749A (en) Classification method and device based on classification decision tree model and electronic equipment
CN112580902B (en) Object data processing method and device, computer equipment and storage medium
CN109033148A (en) One kind is towards polytypic unbalanced data preprocess method, device and equipment
CN107633257B (en) Data quality evaluation method and device, computer readable storage medium and terminal
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN111310918B (en) Data processing method, device, computer equipment and storage medium
CN112365007A (en) Model parameter determination method, device, equipment and storage medium
CN109121133B (en) Location privacy protection method and device
CN112819499A (en) Information transmission method, information transmission device, server and storage medium
CN116186629B (en) Financial customer classification and prediction method and device based on personalized federal learning
CN110874609B (en) User clustering method, storage medium, device and system based on user behaviors
WO2020093817A1 (en) Identity verification method and device
CN114495137B (en) Bill abnormity detection model generation method and bill abnormity detection method
CN113448876B (en) Service testing method, device, computer equipment and storage medium
CN116796264A (en) Object classification method, device, computer equipment and storage medium
CN113011875B (en) Text processing method, text processing device, computer equipment and storage medium
CN116796265A (en) Object classification method, device, computer equipment and storage medium
CN116150663A (en) Data classification method, device, computer equipment and storage medium
CN114510988A (en) Site selection model construction method and device, computer equipment and storage medium
CN112418307B (en) Radiation source individual identification method combining deep learning and integrated learning
US11741432B1 (en) Systems and methods for predictive scoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40095363

Country of ref document: HK