CN113379301A - Method, device and equipment for classifying users through decision tree model - Google Patents

Method, device and equipment for classifying users through decision tree model

Info

Publication number
CN113379301A
CN113379301A (application CN202110728460.6A)
Authority
CN
China
Prior art keywords
attribute
decision tree
tree model
user
node
Prior art date
Legal status (assumed, not a legal conclusion): Pending
Application number
CN202110728460.6A
Other languages
Chinese (zh)
Inventor
梁炀潇
Current Assignee
Weikun Shanghai Technology Service Co Ltd
Original Assignee
Weikun Shanghai Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Weikun Shanghai Technology Service Co Ltd
Priority application: CN202110728460.6A
PCT application: PCT/CN2021/108779 (WO2023272852A1)
Publication: CN113379301A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Abstract

The invention discloses a method, a device and equipment for classifying users through a decision tree model, relating to artificial intelligence and comprising the following steps: obtaining at least one user attribute of a user to be classified and a pre-established decision tree model; acquiring a Gini coefficient corresponding to each attribute node of the decision tree model under the current scene, wherein the Gini coefficient carries weight information of the attribute node corresponding to the current scene; starting from the root node of the decision tree model, sequentially determining the branch node leading to the attribute nodes of the next level according to the Gini coefficients; selecting, from the user attributes, those stored in the attribute nodes of the decision tree model to obtain matching attributes; and matching the obtained matching attributes with each node attribute in the decision tree model, and classifying the user according to the branch nodes determined at each level of the decision tree model.

Description

Method, device and equipment for classifying users through decision tree model
Technical Field
The invention relates to the field of intelligent decision making in artificial intelligence, and in particular to a method and a device for classifying users through a decision tree model, a computer device, and a storage medium.
Background
The LOOKALIKE model is an operational tool widely used across companies and businesses. Its core principle is to discover, starting from a target user group in a specific scene, other users with high similarity to that group; it is often applied to potential-customer operations and to the promotion of new products or features. LOOKALIKE methods fall roughly into two types. The first is clustering: users closest to the target group under a Cartesian distance over the indexes in an index library are retrieved directly. The second uses a supervised classification model: the target group serves as the positive sample, and after the model is trained it is applied in reverse to the candidate set to select users with a high predicted probability of being positive samples.
The second method performs better in practice because it optimizes the model in the direction of the target group's most salient characteristics, but it still lacks pertinence. For example, in a scene where other users who may potentially purchase product A must be found from the users who have already purchased product A, more attention should be paid to the users' transaction and activity attributes, yet the influence of other basic attributes (such as demographic attributes) cannot be removed entirely.
Disclosure of Invention
The embodiment of the invention provides a method, a device, computer equipment and a storage medium for classifying users through a decision tree model, aiming to solve the technical problems of the narrow scene application range and low flexibility of conventional decision tree models.
A method of classifying a user through a decision tree model, the method comprising:
obtaining at least one user attribute of a user to be classified and a pre-established decision tree model;
acquiring a Gini coefficient corresponding to each attribute node of the decision tree model in a current scene, wherein the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene;
starting from the root node of the decision tree model, sequentially determining the branch node leading to the attribute nodes of the next level according to the Gini coefficients;
selecting the user attributes stored in the attribute nodes of the decision tree model from the user attributes to obtain matching attributes;
and matching the obtained matching attributes with each node attribute in the decision tree model, and classifying the user according to the branch nodes determined at each level of the decision tree model.
An apparatus for classifying a user through a decision tree model, the apparatus comprising:
the model acquisition module is used for acquiring at least one user attribute of a user to be classified and a pre-established decision tree model;
a coefficient obtaining module, configured to obtain a Gini coefficient corresponding to each attribute node of the decision tree model in a current scene, where the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene;
a node determining module, configured to sequentially determine, starting from the root node of the decision tree model, the branch node of the next-level attribute nodes according to the Gini coefficients;
the attribute selection module is used for selecting the user attributes stored in the attribute nodes of the decision tree model from the user attributes to obtain matching attributes;
and the matching module is used for matching the obtained matching attributes with each node attribute in the decision tree model and classifying the user according to the branch nodes determined at each level of the decision tree model.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method for classifying a user by means of a decision tree model when executing the computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the above-mentioned steps of the method of classifying a user by means of a decision tree model.
The invention provides a method, a device, computer equipment and a storage medium for classifying users through a decision tree model. The method first obtains at least one user attribute of a user to be classified and a pre-created decision tree model, then obtains the Gini coefficient corresponding to each attribute node of the decision tree model in the current scene and, starting from the root node, sequentially determines the branch node of the next level according to the Gini coefficients. It then selects the user attributes stored in the attribute nodes of the decision tree model from the user attributes to obtain matching attributes, matches these against each node attribute in the decision tree model, and classifies the user according to the branch nodes determined at each level. Because the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene, presetting the weight of the corresponding attribute node reflects that node's influence on the classification result in the current scene. The decision tree model therefore achieves a better scene-specific effect when classifying users, its scene application range is widened, and the method of classifying users through the decision tree model gains flexibility.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a diagram illustrating an application environment of a method for classifying users through a decision tree model according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for classifying users through a decision tree model in one embodiment of the present invention;
FIG. 3 is a flowchart of a further implementation of step S105 in FIG. 2 according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for classifying users through a decision tree model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for classifying users through a decision tree model provided by the present application can be applied in an application environment as shown in fig. 1, wherein the computer device can communicate with a server through a network. Wherein the computer device may be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a method for classifying users through a decision tree model is provided, which is described by taking the computer device in fig. 1 as an example, and includes the following steps S101 to S105.
S101, obtaining at least one user attribute of a user to be classified and a pre-established decision tree model.
In one embodiment, the user attributes include, but are not limited to, user gender, age, assets, activity metrics, historical transaction metrics, and the like. It can be understood that, to make the classification result more accurate, it is preferable to acquire all available user attributes of the user to be classified.
In one embodiment, the step of creating the decision tree model includes the following steps S201 to S205.
S201, obtaining a current scene and a user sample in the current scene, wherein the user sample carries an identifier of a positive label or a negative label.
In one embodiment, the current scene may be divided according to the type of product purchased by the user; the product types may be, for example, cars whose absolute difference falls within a preset range, or different types of insurance. Further, a positive-label user sample represents a user who purchased the corresponding product, and a negative-label user sample represents a user who did not.
S202, obtaining sample characteristics of the positive label user sample and sample attributes to which the sample characteristics belong.
It is understood that each sample feature belongs to a sample attribute: for example, the sample feature "gender" belongs to the gender attribute, and the sample feature "annual income" belongs to the asset attribute.
S203, calculating the Gini coefficient of each sample attribute.
In one embodiment, the Gini coefficient of each sample attribute is calculated by the following formulas:

Gini(D) = 1 - Σ_k (|C_k| / |D|)^2

Gini(D, N_i) = t_i × Σ_k (|D_k| / |D|) × Gini(D_k)

wherein N_i represents the sample attribute to which sample feature A belongs, D represents the positive-label user sample set, t_i represents the weight of the corresponding sample attribute preset according to the current scene, |D| represents the total number of user samples, |D_k| represents the number of samples in the k-th partition of the sample attribute, and |C_k| represents the number of samples belonging to the k-th class.
In one embodiment, the weight of the sample attribute with the greatest influence in the current scene may be set smaller, and the weight of the sample attribute with the least influence may be set larger. When the Gini coefficients are calculated, the attribute with the greatest influence then obtains a very small Gini coefficient, which favours the decision tree model taking it as a branch node; conversely, the attribute with the least influence obtains a very large Gini coefficient, so it is ignored when the model determines branch nodes. In this way, the created decision tree model takes more scene factors into account when classifying users to be classified, which widens the scene application range of the decision tree model.
In one embodiment, when the influence of a sample attribute on the classification result is not affected by the current scene, the weight t_i of that sample attribute is set to 1.
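As a concrete illustration of the weighted Gini calculation above, the following is a minimal Python sketch. The function and parameter names (gini, weighted_gini, weight) are illustrative assumptions, not taken from the patent; the weight argument plays the role of the preset scene weight t_i, defaulting to 1 when the scene has no effect.

```python
# Minimal sketch of the scene-weighted Gini calculation (illustrative names;
# the patent itself does not provide code).
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def weighted_gini(samples, labels, attribute, weight=1.0):
    """Scene-weighted Gini index of splitting `samples` on `attribute`.

    samples -- list of dicts mapping attribute name -> value
    labels  -- parallel list of labels (1 = positive, 0 = negative)
    weight  -- preset scene weight t_i (1.0 when the scene has no effect)
    """
    total = len(samples)
    partitions = {}  # attribute value -> labels of samples with that value
    for sample, label in zip(samples, labels):
        partitions.setdefault(sample[attribute], []).append(label)
    # Weighted average impurity of the partitions, scaled by the scene weight.
    split = sum(len(part) / total * gini(part) for part in partitions.values())
    return weight * split
```

Under this sketch, halving the preset weight of a highly influential attribute halves its Gini index, making it more likely to be selected as a branch node, exactly the behaviour the paragraph above describes.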
According to one usage scenario of the embodiment, for example, when the current scene is screening potential customers for trading a specific product, the sample attribute "historical trading related index" has a relatively large influence on the selection of these potential customers, while the sample attribute "user activity" has a relatively small influence. Accordingly, a smaller weight can be set for the Gini coefficient corresponding to "historical trading related index" in this scene, and a larger weight for the Gini coefficient corresponding to "user activity", which makes it convenient for the constructed decision tree model to take "historical trading related index" as the branch node of the current level.
S204, taking the sample attribute with the minimum Gini coefficient as the branch node of the current level.
It will be appreciated that the created decision tree model comprises at least one level, each level comprising at least one attribute node, and the node from which the next level starts is determined as the branch node of the current level. A smaller Gini coefficient indicates that the sample attribute has a larger influence on the classification result; conversely, a larger Gini coefficient indicates a smaller influence.
S205, after a branch node is determined, repeating the steps from calculating the Gini coefficient of each remaining sample attribute to taking the attribute with the minimum Gini coefficient as the branch node of the current level, until a preset stop condition is met, thereby obtaining the created decision tree model.
In one embodiment, the preset stop condition includes, but is not limited to: all sample attributes whose Gini coefficient is smaller than a preset value have been allocated, or all sample attributes of the user samples have been allocated. Further, the preset value may be, for example, 0.3, meaning that once all sample attributes with a Gini coefficient below 0.3 have been allocated, the created decision tree model comprises a root node, common attribute nodes, branch nodes, and leaf nodes on the last level.
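The construction loop of steps S203 to S205 can be sketched in Python as follows. All names (build_levels, gini_for_attribute, the 0.3 stop value) are illustrative assumptions rather than the patent's implementation, and for brevity the sketch returns only the ordered list of branch attributes, one per level, rather than a full tree.

```python
# Self-contained sketch of the greedy construction in steps S203-S205:
# at each level, compute the (scene-weighted) Gini coefficient of every
# remaining attribute, branch on the smallest one, and stop once every
# remaining coefficient reaches the preset value.
from collections import Counter

def gini_for_attribute(samples, labels, attr, weights):
    """Scene-weighted Gini coefficient of splitting on `attr`."""
    total = len(samples)
    parts = {}
    for s, y in zip(samples, labels):
        parts.setdefault(s[attr], []).append(y)
    def impurity(ys):
        return 1.0 - sum((c / len(ys)) ** 2 for c in Counter(ys).values())
    split = sum(len(ys) / total * impurity(ys) for ys in parts.values())
    return weights.get(attr, 1.0) * split  # weight defaults to 1 (no scene effect)

def build_levels(samples, labels, attrs, weights, stop=0.3):
    """Return the ordered branch attributes, one per tree level."""
    remaining, order = list(attrs), []
    while remaining:
        scored = {a: gini_for_attribute(samples, labels, a, weights) for a in remaining}
        best = min(scored, key=scored.get)
        if scored[best] >= stop:  # preset stop condition
            break
        order.append(best)
        remaining.remove(best)
    return order
```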
S102, acquiring the Gini coefficient corresponding to each attribute node of the decision tree model in the current scene, wherein the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene.
It can be understood that the weight of an attribute node for the current scene is preset according to the influence of the attribute on the classification result observed during experiments, and it can be preset manually.
In other words, the preset weights of the same attribute node in different scenes are determined by the degree to which the current scene changes that attribute node's influence on the classification result.
S103, starting from the root node of the decision tree model, sequentially determining the branch node of the attribute nodes of the next level according to the Gini coefficients.
In one embodiment, the step of sequentially determining the branch node of the next level according to the Gini coefficient further comprises:
acquiring the Gini coefficient corresponding to each attribute node in the same level under the current scene; and
taking the attribute node with the minimum Gini coefficient as the branch node from the current level to the next level.
S104, selecting the user attributes stored in the attribute nodes of the decision tree model from the user attributes to obtain matching attributes.
It can be understood that the matching attributes are a subset (or the complete set) of the user attributes. By determining which attribute nodes exist in the decision tree model, the corresponding matching attributes are screened out of the user attributes, so that subsequent matching is performed directly against the attribute nodes in the decision tree model; this reduces invalid user-attribute matching and improves user classification efficiency.
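A minimal sketch of the attribute selection in step S104 follows, assuming user attributes are held in a dict and the tree's node attributes in a set; both representations and the function name are assumptions for illustration only.

```python
# Sketch of step S104: keep only the user attributes that actually appear as
# attribute nodes in the decision tree model, so later matching never tests
# attributes the tree does not contain.
def select_matching_attributes(user_attributes, tree_node_attributes):
    """user_attributes: dict of attribute name -> value for the user.
    tree_node_attributes: set of attribute names stored in the tree's nodes.
    Returns the subset used for traversal (the "matching attributes")."""
    return {name: value for name, value in user_attributes.items()
            if name in tree_node_attributes}
```

For example, a user with a "favourite_colour" attribute that no tree node stores would have that attribute dropped before traversal, which is the reduction of invalid matching described above.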
S105, matching the obtained matching attributes with each node attribute in the decision tree model, and classifying the user according to the branch nodes determined at each level of the decision tree model.
Fig. 3 is a flowchart of a further implementation of step S105 in fig. 2 according to an embodiment of the present invention. In one embodiment, as shown in fig. 3, the step of matching the obtained matching attributes with each node attribute in the decision tree model specifically includes the following steps S301 to S304.
S301, screening out the matching attribute corresponding to the root node of the decision tree model from the matching attributes to obtain first remaining matching attributes;
S302, sequentially screening out the matching attributes corresponding to the branch nodes of the decision tree model from the first remaining matching attributes to obtain final remaining matching attributes;
S303, judging whether a leaf node on the last level of the decision tree model contains the final remaining matching attribute; if so, executing step S304;
S304, classifying the user to be classified into the same class as the positive-label user samples.
It can be understood that when the user attributes of the user to be classified are identical to the sample attributes of the positive-label users in the decision tree model, the user to be classified can be classified into the same class as the positive-label user samples.
In one embodiment, as shown in fig. 3, when the leaf node on the last level of the decision tree model does not contain the final remaining matching attribute, the method further includes the following steps S305 to S307.
S305, obtaining the total number of levels of the decision tree model;
S306, obtaining the level number of the last node attribute matched by the matching attributes;
S307, taking the ratio of the level number of that last node attribute to the total number of levels, expressed as a percentage, as the probability that the user to be classified belongs to the same class as the positive-label user samples.
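Steps S305 to S307 reduce to a single ratio; a hypothetical helper (all names illustrative, not from the patent) might look like this:

```python
# Sketch of steps S305-S307: when traversal stops before a leaf, score the
# user as (deepest matched level / total levels) expressed as a percentage.
def match_probability(matched_level, total_levels):
    """matched_level: level number of the last node attribute the user matched.
    total_levels: total number of levels in the decision tree model.
    Returns the probability (as a percentage) that the user belongs to the
    same class as the positive-label samples."""
    if total_levels <= 0:
        raise ValueError("total_levels must be positive")
    return 100.0 * matched_level / total_levels
```

A user who matches down to level 3 of a 4-level tree would thus be scored at 75%, and a user who matches every level at 100%, which is the ordering used for message pushing below.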
By providing this way of classifying a user when the leaf node on the last level of the decision tree model does not contain the final remaining matching attribute, the embodiment can divide users into more categories through the decision tree model. Thus, when products need to be pushed, push messages can first be sent to users whose probability of being a potential customer is 100%, and then sent to the remaining users in descending order of that probability.
This implementation first obtains at least one user attribute of the user to be classified and a pre-created decision tree model, then obtains the Gini coefficient corresponding to each attribute node of the decision tree model in the current scene and, starting from the root node, sequentially determines the branch node of the next level according to the Gini coefficients. It then selects the user attributes stored in the attribute nodes of the decision tree model from the user attributes to obtain the matching attributes, matches these against each node attribute in the decision tree model, and classifies the user according to the branch nodes determined at each level. Because the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene, presetting the weight of that attribute node reflects its influence on the classification result in the current scene, so the decision tree model achieves a better scene-specific effect when classifying users to be classified, its scene application range is widened, and the method of classifying users through the decision tree model gains flexibility.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an apparatus for classifying users through a decision tree model is provided; it corresponds one-to-one with the method for classifying users through a decision tree model in the above embodiment. As shown in fig. 4, the apparatus 100 for classifying users through a decision tree model includes a model obtaining module 11, a coefficient obtaining module 12, a node determining module 13, an attribute selecting module 14, and a matching module 15. The functional modules are explained in detail as follows:
the model obtaining module 11 is configured to obtain at least one user attribute of a user to be classified and a pre-created decision tree model.
In one embodiment, the user attributes include, but are not limited to, user gender, age, assets, activity metrics, historical transaction metrics, and the like. It can be understood that, to make the classification result more accurate, it is preferable to acquire all available user attributes of the user to be classified.
A coefficient obtaining module 12, configured to obtain a Gini coefficient corresponding to each attribute node of the decision tree model in a current scene, where the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene.
It can be understood that the weight of an attribute node for the current scene is preset according to the influence of the attribute on the classification result observed during experiments, and it can be preset manually.
In other words, the preset weights of the same attribute node in different scenes are determined by the degree to which the current scene changes that attribute node's influence on the classification result.
And a node determining module 13, configured to sequentially determine, starting from the root node of the decision tree model, the branch node of the next-level attribute nodes according to the Gini coefficients.
And an attribute selecting module 14, configured to select, from the user attributes, the user attributes that exist in the attribute node of the decision tree model, so as to obtain a matching attribute.
It can be understood that the matching attributes are a subset (or the complete set) of the user attributes. By determining which attribute nodes exist in the decision tree model, the corresponding matching attributes are screened out of the user attributes, so that subsequent matching is performed directly against the attribute nodes in the decision tree model; this reduces invalid user-attribute matching and improves user classification efficiency.
And the matching module 15 is configured to match the obtained matching attributes with each node attribute in the decision tree model, and classify the user according to the branch nodes determined at each level of the decision tree model.
In one embodiment, the apparatus 100 for classifying users through a decision tree model further comprises:
and the user sample acquisition module is used for acquiring the current scene and a user sample in the current scene, wherein the user sample carries the identifier of the positive label or the negative label. The current scene can be divided according to the types of products purchased by the user, wherein the types of the products can be automobiles with the absolute value of the difference value within a preset range, and can also be different types of insurance. Further, a positive label user pattern represents users who purchased the corresponding product, and a negative label user pattern represents users who did not purchase the corresponding product.
And the sample attribute acquisition module is used for acquiring the sample features of the positive-label user samples and the sample attributes to which those features belong. It is understood that each sample feature belongs to a sample attribute: for example, the sample feature "gender" belongs to the gender attribute, and the sample feature "annual income" belongs to the asset attribute.
And the calculating module is used for calculating the Gini coefficient of each sample attribute.
And the branch node determining module is used for taking the sample attribute with the minimum Gini coefficient as the branch node of the current level.
It will be appreciated that the created decision tree model comprises at least one level, each level comprising at least one attribute node, and the node from which the next level starts is determined as the branch node of the current level. A smaller Gini coefficient indicates that the sample attribute has a larger influence on the classification result; conversely, a larger Gini coefficient indicates a smaller influence.
And the circulating module is used for, after a branch node is determined, repeating the steps from calculating the Gini coefficient of each remaining sample attribute to taking the attribute with the minimum Gini coefficient as the branch node of the current level, until a preset stop condition is met, thereby obtaining the created decision tree model.
In one embodiment, the preset stop condition includes, but is not limited to: all sample attributes whose Gini coefficient is smaller than a preset value have been allocated, or all sample attributes of the user samples have been allocated. Further, the preset value may be, for example, 0.3, meaning that once all sample attributes with a Gini coefficient below 0.3 have been allocated, the created decision tree model comprises a root node, common attribute nodes, branch nodes, and leaf nodes on the last level.
In one embodiment, the calculation module is specifically configured to calculate the Gini coefficient of each sample attribute by the following formulas:
Gini(D) = 1 - Σ_k (|C_k| / |D|)^2

Gini(D, N_i) = t_i × Σ_k (|D_k| / |D|) × Gini(D_k)

wherein N_i represents the sample attribute to which sample feature A belongs, D represents the positive-label user sample set, t_i represents the weight of the corresponding sample attribute preset according to the current scene, |D| represents the total number of user samples, |D_k| represents the number of samples in the k-th partition of the sample attribute, and |C_k| represents the number of samples belonging to the k-th class.
In one embodiment, the weight of the sample attribute with the greatest influence in the current scene may be set smaller, and the weight of the sample attribute with the least influence may be set larger. When the Gini coefficient is calculated, the resulting Gini coefficient of the most influential sample attribute is then very small, which favors the decision tree model taking that sample attribute as a branch node. Similarly, because the weight of the least influential sample attribute is set larger, its calculated Gini coefficient is very large, so the decision tree model ignores it when determining branch nodes. As a result, when the created decision tree model is used for classifying users to be classified, more scene factors are considered, which widens the scene application range of the decision tree model.
In one embodiment, when the influence of a sample attribute on the classification result is not affected by the current scene, the weight t_i of the corresponding sample attribute preset according to the current scene is set to 1.
According to a usage scenario of this embodiment, for example, when the current scene is screening potential customers for trading a specific product, the sample attribute "historical trading related index" has a relatively large influence on the selection of such potential customers, while the sample attribute "user activity" has a relatively small influence. Therefore, the weight of the Gini coefficient corresponding to "historical trading related index" in this scene can be set to a smaller value, and the weight of the Gini coefficient corresponding to "user activity" can be set to a larger value, so that the constructed decision tree model conveniently takes the sample attribute "historical trading related index" as a branch node of the current level.
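The weighting effect in this scenario can be illustrated with a toy comparison; the attribute names mirror the example above, but the Gini values and weights are invented for illustration.

```python
# Two attributes with similar unweighted Gini values; the scene-specific
# weights decide which one becomes the branch node (numbers are illustrative).
unweighted = {"historical_trading_index": 0.40, "user_activity": 0.38}
weights = {"historical_trading_index": 0.5, "user_activity": 1.5}

weighted = {a: weights[a] * g for a, g in unweighted.items()}
branch = min(weighted, key=weighted.get)  # smallest weighted Gini wins
```

Without weights, "user_activity" would narrowly win (0.38 < 0.40); with the scene weights applied, "historical_trading_index" (0.20) is selected as the branch node instead.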
In one embodiment, the node determining module 13 specifically includes:
the Gini coefficient acquisition unit is used for acquiring the Gini coefficient corresponding to each attribute node in the same level under the current scene;
and the branch node determining unit is used for taking the acquired attribute node with the minimum Gini coefficient as the branch node from the current level to the next level.
In one embodiment, the matching module 15 further comprises:
the first screening unit is used for screening out the matching attributes corresponding to the root node of the decision tree model from the matching attributes to obtain first remaining matching attributes;
a second screening unit, configured to sequentially screen out the matching attributes corresponding to the branch nodes of the decision tree model from the first remaining matching attributes to obtain final remaining matching attributes;
and the judging unit is used for judging whether the leaf nodes of the last level of the decision tree model contain the final remaining matching attributes, and if so, classifying the user to be classified and the positive label user samples into the same class.
It can be understood that when the user attributes of the user to be classified are completely the same as the sample attributes of the positive label user samples in the decision tree model, the user to be classified and the positive label user samples can be classified into the same class.
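The two-stage screening and leaf check performed by the first screening unit, second screening unit, and judging unit above can be sketched as follows; the function and attribute names are illustrative assumptions, not the patented implementation.

```python
def same_class(root, branch_nodes, leaf_attrs, matching_attrs):
    """True if the user matches the positive-label class along the tree.

    root:           attribute stored at the root node
    branch_nodes:   branch-node attributes, level by level
    leaf_attrs:     attributes stored on the last-level leaf nodes
    matching_attrs: user attributes that exist in the tree's attribute nodes
    """
    # first screening: drop the attribute matching the root node
    remaining = [a for a in matching_attrs if a != root]
    # second screening: drop attributes matching the branch nodes
    remaining = [a for a in remaining if a not in branch_nodes]
    # the leaf nodes must contain all final remaining matching attributes
    return all(a in leaf_attrs for a in remaining)
```

For example, a user matching the root ("age"), a branch node ("income"), and a leaf attribute ("region") is classified with the positive-label samples, while a final remaining attribute absent from the leaves is not.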
In one embodiment, when the leaf nodes of the last level of the decision tree model do not contain the final remaining matching attributes, the apparatus 100 for classifying a user through the decision tree model further includes:
the total level acquisition module is used for acquiring the total number of levels of the decision tree model;
the level number obtaining module is used for obtaining the number of the level where the last node attribute matched with the matching attributes is located;
and the probability determining unit is used for taking the percentage of the ratio of the number of that level to the total number of levels as the probability that the user to be classified and the positive label user samples are in the same class.
By providing the above processing for the case where the leaf nodes of the last level of the decision tree model do not contain the final remaining matching attributes, this embodiment can classify users into more categories through the decision tree model. Therefore, when product pushing is required, push messages can first be sent to users whose probability of being potential customers is 100%, and then sent to the remaining users in descending order of that probability.
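The probability computation and probability-ordered pushing described above might be sketched as follows; both function names are illustrative assumptions.

```python
def same_class_probability(total_levels, last_matched_level):
    """Percentage of levels matched: the probability that the user to be
    classified and the positive-label samples are in the same class."""
    return 100.0 * last_matched_level / total_levels

def push_order(users):
    """users: (user_id, probability) pairs; push higher probabilities first."""
    return [uid for uid, _ in sorted(users, key=lambda u: u[1], reverse=True)]
```

A user whose last matched node attribute sits on level 3 of a 5-level tree gets a 60% probability, and a fully matched user (100%) is pushed to first.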
The apparatus for classifying users through a decision tree model provided by this embodiment obtains at least one user attribute of a user to be classified and a pre-created decision tree model, and then obtains the Gini coefficient corresponding to each attribute node of the decision tree model in the current scene. Starting from the root node of the decision tree model, the branch nodes among the attribute nodes of the next level are sequentially determined according to the Gini coefficients; the user attributes that exist in the attribute nodes of the decision tree model are selected from the user attributes to obtain matching attributes; finally, the obtained matching attributes are matched with each node attribute in the decision tree model, and the user is classified according to the branch nodes of each level determined in the decision tree model. Because the Gini coefficient of at least one attribute node carries weight information of that attribute node corresponding to the current scene, presetting the weight of the corresponding attribute node in the current scene can reflect the influence of that attribute node on the classification result in that scene. The decision tree model therefore has a better scene effect when classifying users to be classified, the scene application range of the decision tree model is improved, and the method for classifying users through the decision tree model has better flexibility.
Wherein the meaning of "first" and "second" in the above modules/units is only to distinguish different modules/units, and is not used to define which module/unit has higher priority or other defining meaning. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not explicitly listed or inherent to such process, method, article, or apparatus, and such that a division of modules presented in this application is merely a logical division and may be implemented in a practical application in a further manner.
For specific limitations of the apparatus for classifying users through the decision tree model, reference may be made to the above limitations of the method for classifying users through the decision tree model, and details are not repeated here. The modules in the apparatus for classifying users through the decision tree model may be implemented wholly or partially through software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external server through a network connection. The computer program is executed by a processor to implement a method of classifying a user through a decision tree model.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the steps of the method for classifying a user through a decision tree model in the above embodiments, such as steps 101 to 105 shown in fig. 2, as well as extensions of the method and of related steps. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units of the apparatus for classifying users through the decision tree model in the above embodiments, such as the functions of the modules 11 to 15 shown in fig. 4. To avoid repetition, further description is omitted here.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable gate array (FPGA) or other Programmable logic device, discrete gate or transistor logic device, discrete hardware component, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like which is the control center for the computer device and which connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, video data, etc.) created according to the use of the cellular phone, etc.
The memory may be integrated in the processor or may be provided separately from the processor.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the method for classifying a user through a decision tree model in the above embodiments, such as steps 101 to 105 shown in fig. 2, as well as extensions of the method and of related steps. Alternatively, the computer program, when executed by the processor, implements the functions of the modules/units of the apparatus for classifying users through the decision tree model in the above-described embodiments, such as the functions of the modules 11 to 15 shown in fig. 4. To avoid repetition, further description is omitted here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method for classifying a user through a decision tree model, the method comprising:
obtaining at least one user attribute of a user to be classified and a pre-established decision tree model;
acquiring a Gini coefficient corresponding to each attribute node of the decision tree model in a current scene, wherein the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene;
sequentially determining branch nodes among the attribute nodes of the next level from a root node of the decision tree model according to the Gini coefficients;
selecting the user attributes stored in the attribute nodes of the decision tree model from the user attributes to obtain matching attributes;
and matching the obtained matching attributes with the attributes of each node in the decision tree model, and classifying the users according to the determined branch nodes of each level in the decision tree model.
2. The method of classifying users through a decision tree model according to claim 1, wherein before the obtaining at least one user attribute of the user to be classified and the pre-created decision tree model, further comprising:
acquiring a current scene and a user sample in the current scene, wherein the user sample carries an identifier of a positive label or a negative label;
obtaining sample characteristics of a positive label user sample and sample attributes to which the sample characteristics belong;
calculating a Gini coefficient of each sample attribute;
taking the sample attribute with the minimum Gini coefficient as a branch node of the current level;
and after the branch node is determined, circularly performing the steps from calculating the Gini coefficient of each remaining sample attribute to taking the sample attribute with the minimum Gini coefficient as the branch node of the current level, until a preset stop condition is met, so as to obtain the created decision tree model.
3. The method of classifying users through a decision tree model according to claim 2, wherein the Gini coefficient of each of the sample attributes is calculated by the following formula:
Gini(D, N_i) = t_i · Σ_{k=1}^{K} (|C_k| / |D|) · Gini(D_k)

Gini(D_k) = 1 − (|D_k| / |C_k|)² − (1 − |D_k| / |C_k|)²
wherein N_i represents the sample attribute to which the sample characteristic A belongs, D represents the positive label user sample, t_i represents the weight of the corresponding sample attribute preset according to the current scene, |D| represents the total number of the user samples, |D_k| represents the number of samples of the positive label user sample in the k-th sample attribute, and |C_k| represents the number of samples of the k-th sample attribute.
4. The method of claim 3, wherein when the influence of a sample attribute on the classification result is not affected by the current scene, the weight t_i of the corresponding sample attribute preset according to the current scene is set to 1.
5. The method for classifying users through a decision tree model according to claim 1, wherein the step of sequentially determining the branch nodes among the attribute nodes of the next level according to the Gini coefficient comprises:
acquiring a Gini coefficient corresponding to each attribute node in the same level under the current scene;
and taking the acquired attribute node with the minimum Gini coefficient as a branch node from the current level to the next level.
6. The method of classifying users through a decision tree model according to claim 2, wherein the matching the obtained matching attributes with the attributes of the nodes in the decision tree model, and the classifying the users according to the determined branch nodes of each level in the decision tree model comprises:
screening out matching attributes corresponding to the root node of the decision tree model from the matching attributes to obtain first remaining matching attributes;
sequentially screening out matching attributes corresponding to the branch nodes of the decision tree model from the first remaining matching attributes to obtain final remaining matching attributes;
and judging whether the leaf nodes of the last level of the decision tree model contain the final remaining matching attributes, and if so, classifying the user to be classified and the positive label user samples into the same class.
7. The method of classifying a user through a decision tree model according to claim 6, wherein when the final remaining matching attributes are not contained in the leaf nodes of the last level of the decision tree model, the method further comprises:
acquiring the total number of levels of the decision tree model;
acquiring the number of the level where the last node attribute matched with the matching attributes is located;
and taking the percentage of the ratio of the number of that level to the total number of levels as the probability that the user to be classified and the positive label user samples are in the same class.
8. An apparatus for classifying a user through a decision tree model, the apparatus comprising:
the model acquisition module is used for acquiring at least one user attribute of a user to be classified and a pre-established decision tree model;
a coefficient obtaining module, configured to obtain a Gini coefficient corresponding to each attribute node of the decision tree model in a current scene, wherein the Gini coefficient of at least one attribute node carries weight information of the attribute node corresponding to the current scene;
the node determination module is used for sequentially determining the branch nodes among the attribute nodes of the next level from the root node of the decision tree model according to the Gini coefficients;
the attribute selection module is used for selecting the user attributes stored in the attribute nodes of the decision tree model from the user attributes to obtain matching attributes;
and the matching module is used for matching the obtained matching attributes with the node attributes in the decision tree model and classifying the users according to the determined branch nodes of each level in the decision tree model.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor when executing the computer program implements the steps of the method of classifying a user by means of a decision tree model according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of classifying a user by means of a decision tree model according to any one of claims 1 to 7.
CN202110728460.6A 2021-06-29 2021-06-29 Method, device and equipment for classifying users through decision tree model Pending CN113379301A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110728460.6A CN113379301A (en) 2021-06-29 2021-06-29 Method, device and equipment for classifying users through decision tree model
PCT/CN2021/108779 WO2023272852A1 (en) 2021-06-29 2021-07-28 Method and apparatus for classifying user by using decision tree model, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110728460.6A CN113379301A (en) 2021-06-29 2021-06-29 Method, device and equipment for classifying users through decision tree model

Publications (1)

Publication Number Publication Date
CN113379301A true CN113379301A (en) 2021-09-10

Family

ID=77579828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110728460.6A Pending CN113379301A (en) 2021-06-29 2021-06-29 Method, device and equipment for classifying users through decision tree model

Country Status (2)

Country Link
CN (1) CN113379301A (en)
WO (1) WO2023272852A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393659A (en) * 2022-10-27 2022-11-25 珠海横琴圣澳云智科技有限公司 Personalized classification process optimization method and device based on multi-level decision tree

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562769A (en) * 2023-06-15 2023-08-08 深圳爱巧网络有限公司 Cargo data analysis method and system based on cargo attribute classification
CN116561650B (en) * 2023-07-10 2023-09-19 中汽智联技术有限公司 Scene file classification and updating method, device and equipment based on tree structure

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182058B1 (en) * 1997-02-28 2001-01-30 Silicon Graphics, Inc. Bayes rule based and decision tree hybrid classifier
CN107292186A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of model training method and device based on random forest
US20190147350A1 (en) * 2016-04-27 2019-05-16 The Fourth Paradigm (Beijing) Tech Co Ltd Method and device for presenting prediction model, and method and device for adjusting prediction model
CN110632191A (en) * 2019-09-10 2019-12-31 福建工程学院 Transformer chromatographic peak qualitative method and system based on decision tree algorithm
CN111783840A (en) * 2020-06-09 2020-10-16 苏宁金融科技(南京)有限公司 Visualization method and device for random forest model and storage medium
CN112801231A (en) * 2021-04-07 2021-05-14 支付宝(杭州)信息技术有限公司 Decision model training method and device for business object classification

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6871201B2 (en) * 2001-07-31 2005-03-22 International Business Machines Corporation Method for building space-splitting decision tree
CN112437469B (en) * 2019-08-26 2024-04-05 中国电信股份有限公司 Quality of service guarantee method, apparatus and computer readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182058B1 (en) * 1997-02-28 2001-01-30 Silicon Graphics, Inc. Bayes rule based and decision tree hybrid classifier
CN107292186A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of model training method and device based on random forest
US20190147350A1 (en) * 2016-04-27 2019-05-16 The Fourth Paradigm (Beijing) Tech Co Ltd Method and device for presenting prediction model, and method and device for adjusting prediction model
CN110632191A (en) * 2019-09-10 2019-12-31 福建工程学院 Transformer chromatographic peak qualitative method and system based on decision tree algorithm
CN111783840A (en) * 2020-06-09 2020-10-16 苏宁金融科技(南京)有限公司 Visualization method and device for random forest model and storage medium
CN112801231A (en) * 2021-04-07 2021-05-14 支付宝(杭州)信息技术有限公司 Decision model training method and device for business object classification

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393659A (en) * 2022-10-27 2022-11-25 珠海横琴圣澳云智科技有限公司 Personalized classification process optimization method and device based on multi-level decision tree
CN115393659B (en) * 2022-10-27 2023-01-24 珠海横琴圣澳云智科技有限公司 Personalized classification process optimization method and device based on multi-level decision tree

Also Published As

Publication number Publication date
WO2023272852A1 (en) 2023-01-05

Similar Documents

Publication Publication Date Title
WO2020135535A1 (en) Recommendation model training method and related apparatus
CN113379301A (en) Method, device and equipment for classifying users through decision tree model
CN109543925B (en) Risk prediction method and device based on machine learning, computer equipment and storage medium
CN110458324B (en) Method and device for calculating risk probability and computer equipment
CN115082209A (en) Business data risk early warning method and device, computer equipment and storage medium
CN114416998A (en) Text label identification method and device, electronic equipment and storage medium
CN115115004A (en) Decision tree model construction and application method, device and related equipment
CN111209929A (en) Access data processing method and device, computer equipment and storage medium
CN116663505B (en) Comment area management method and system based on Internet
CN113656699A (en) User feature vector determination method, related device and medium
CN110705889A (en) Enterprise screening method, device, equipment and storage medium
CN111275071A (en) Prediction model training method, prediction device and electronic equipment
CN115330487A (en) Product recommendation method and device, computer equipment and storage medium
CN110765387A (en) User interface generation method and device, computing equipment and storage medium
CN114547257A (en) Class matching method and device, computer equipment and storage medium
CN114020853A (en) Data model management method and device, computer equipment and storage medium
CN113901328A (en) Information recommendation method and device, electronic equipment and storage medium
CN112989183B (en) Product information recommendation method and device based on life cycle and related equipment
CN117459576A (en) Data pushing method and device based on edge calculation and computer equipment
CN113792163B (en) Multimedia recommendation method and device, electronic equipment and storage medium
CN109740671B (en) Image identification method and device
TWI706334B (en) Storage device, electronic device and method for classifying images
CN114881761A (en) Determination method of similar sample and determination method of credit limit
CN112269860A (en) Automatic response processing method and device, electronic equipment and readable storage medium
CN117853171A (en) Marketable user data identification allocation processing method, marketable user data identification allocation processing device, marketable user data identification allocation processing equipment and marketable user data identification allocation processing medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210910